Data Exchange
Subscribe to and use third-party datasets directly within AWS
AWS Data Exchange is a marketplace service for finding, subscribing to, and using third-party datasets directly within AWS, without negotiating bespoke data licenses, managing file transfers, or building custom ingestion pipelines. Publishers list datasets and add revisions as their data changes. Subscribers can configure auto-export so that each new revision is delivered to their own S3 bucket. It is relevant for data engineering roles at organizations that enrich internal analytics with external data, such as financial data, geospatial data, demographic data, and industry benchmarks.
How Data Exchange Works - Publishers, Subscribers, and Revisions
Data Exchange uses a publisher-subscriber model built around datasets and revisions:
| Concept | Description |
|---|---|
| Dataset | A named collection of data assets (files) from a publisher |
| Revision | A versioned snapshot of a dataset - publishers add revisions when data updates |
| Asset | An individual file or S3 object within a revision |
| Product | An AWS Marketplace listing: one or more datasets plus pricing and subscription terms |
| Subscription | Access agreement between subscriber and publisher |
| Auto-export | A subscriber-configured rule (event action) that exports each newly published revision to an S3 bucket |
Once you are subscribed and have configured auto-export (an event action on the entitled dataset), each new revision is exported to your designated S3 bucket when the publisher publishes it. You can then query it with Athena, process it with Glue, or load it into Redshift, with no manual download required.
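To make the dataset, revision, and asset hierarchy concrete from the subscriber side, here is a minimal sketch built on AWS CLI list calls. It assumes AWS CLI v2 with credentials for the subscribing account. The helper function names are illustrative, and the data set and revision IDs you pass in come from the earlier calls' output.

```bash
# Subscriber-side view of the Data Exchange hierarchy (illustrative helpers).

# Print tab-separated "id name" rows for every data set this account is entitled to.
list_entitled_data_sets() {
  aws dataexchange list-data-sets --origin ENTITLED \
    --query 'DataSets[].[Id,Name]' --output text
}

# Print tab-separated "revision-id created-at" rows, one per revision of a data set.
list_revisions() {
  local data_set_id="$1"
  aws dataexchange list-data-set-revisions --data-set-id "$data_set_id" \
    --query 'Revisions[].[Id,CreatedAt]' --output text
}

# Print the names of the assets (files) in one revision.
list_assets() {
  local data_set_id="$1" revision_id="$2"
  aws dataexchange list-revision-assets \
    --data-set-id "$data_set_id" --revision-id "$revision_id" \
    --query 'Assets[].Name' --output text
}
```

Running `list_revisions` on a data set and then `list_assets` on its newest revision shows which files arrived in the latest delivery.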
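AWS Data Exchange also supports API-based data products, where the publisher exposes a REST API and Data Exchange handles subscriber authentication and metering. Subscribers call the API through Data Exchange using their AWS credentials. This suits real-time or on-demand data such as stock prices or weather, rather than bulk file delivery.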
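For API products, the AWS CLI provides a `send-api-asset` command. The sketch below wraps it in a helper. It assumes you are subscribed to an API product and have copied its data set, revision, and asset IDs from the console. The function name, the GET method, and the example path and query parameter are hypothetical; the provider's API documentation defines the real ones.

```bash
# Usage: call_api_asset DATA_SET_ID REVISION_ID ASSET_ID API_PATH [QUERY_JSON]
# Prints the response body returned by the provider's API.
call_api_asset() {
  local data_set_id="$1" revision_id="$2" asset_id="$3" api_path="$4"
  local query_json="${5:-}"
  # Rebuild the positional parameters as the CLI argument list.
  set -- dataexchange send-api-asset \
    --data-set-id "$data_set_id" --revision-id "$revision_id" \
    --asset-id "$asset_id" --method GET --path "$api_path"
  # Add query-string parameters only when the caller supplied some.
  if [ -n "$query_json" ]; then
    set -- "$@" --query-string-parameters "$query_json"
  fi
  # The CLI signs the call with your AWS credentials; Data Exchange checks
  # the subscription and forwards the request to the provider's API.
  aws "$@" --query Body --output text
}
```

For example, `call_api_asset <data-set-id> <revision-id> <asset-id> /quotes '{"symbol":"AMZN"}'` would request a hypothetical `/quotes` endpoint with one query parameter.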
Common Use Cases and Data Categories
| Category | Examples | Typical Consumer |
|---|---|---|
| Financial data | Stock prices, options chains, fundamentals, sentiment | Fintech, quant funds, insurance |
| Geospatial | Maps, POI data, satellite imagery, traffic patterns | Logistics, retail site selection |
| Demographic and consumer | Census data, consumer behavior, household income | Marketing, retail analytics |
| Healthcare | Clinical trial data, claims data, genomics reference | Health tech, pharma |
| Weather and climate | Historical weather, forecasts, climate indices | Insurance, agriculture, energy |
| Cybersecurity | Threat intelligence feeds, IP reputation, vulnerability data | Security teams, SOCs |
Integrating Data Exchange with Your Data Lake
The typical integration pattern delivers new revisions automatically to S3; a Lambda function or an EventBridge rule then starts the processing that loads them into the data lake.
```bash
# Enable auto-export to S3 for a Data Exchange subscription.
# Single quotes stop bash from expanding ${...} as shell variables and from
# brace-expanding the structure. Replace <dataset-id> with the entitled data set ID.
aws dataexchange create-event-action \
  --action '{
    "ExportRevisionToS3": {
      "RevisionDestination": {
        "Bucket": "my-data-lake-bucket",
        "KeyPattern": "${Revision.CreatedAt}/${Asset.Name}"
      }
    }
  }' \
  --event '{
    "RevisionPublished": {
      "DataSetId": "<dataset-id>"
    }
  }'
```

After auto-export, run a Glue crawler to update the catalog schema so Athena can query the latest data. For latency-sensitive cases, trigger a Lambda function directly on the S3 object-created (PutObject) event to start a Glue ETL job.
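A minimal sketch of the crawler-then-Athena step with the AWS CLI follows. It assumes a Glue crawler already points at the auto-export prefix and that an Athena database and query-results location exist. The crawler, database, table, and bucket names are hypothetical, and `MAX_POLLS` and `POLL_SECONDS` are illustrative tuning knobs, not AWS settings.

```bash
# Start a Glue crawler, wait for it to return to READY, and succeed only if
# the last crawl reports SUCCEEDED.
refresh_catalog() {
  local crawler="$1" state status i=0
  aws glue start-crawler --name "$crawler" || return 1
  while [ "$i" -lt "${MAX_POLLS:-60}" ]; do
    # Sleep first so the crawler has time to move out of READY.
    sleep "${POLL_SECONDS:-30}"
    state=$(aws glue get-crawler --name "$crawler" \
      --query 'Crawler.State' --output text)
    if [ "$state" = "READY" ]; then
      status=$(aws glue get-crawler --name "$crawler" \
        --query 'Crawler.LastCrawl.Status' --output text)
      [ "$status" = "SUCCEEDED" ]
      return
    fi
    i=$((i + 1))
  done
  return 1  # timed out waiting for the crawler
}

# Start an Athena query against a Glue database and print its execution ID.
run_athena_query() {
  local sql="$1" database="$2" output_location="$3"
  aws athena start-query-execution \
    --query-string "$sql" \
    --query-execution-context "Database=$database" \
    --result-configuration "OutputLocation=$output_location" \
    --query QueryExecutionId --output text
}
```

A run might look like `refresh_catalog dx-crawler && run_athena_query "SELECT count(*) FROM dx_table" dx_db s3://my-athena-results/`. Polling is needed because `start-crawler` returns as soon as the crawl is accepted, not when it finishes.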
Data Exchange does not provide compute; it is purely a data delivery mechanism. All processing happens in your account after delivery to S3. Subscribers pay the product price set by the publisher plus standard charges for S3 storage, requests, and data transfer; AWS adds no separate fee for the Data Exchange service itself.
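Because processing runs in your own account, you wire up the trigger yourself. One way to implement the EventBridge option is sketched below: enable EventBridge notifications on the delivery bucket, match S3 `Object Created` events for that bucket, and target a Lambda function. The rule name, target ID, and function ARN are hypothetical, and the function is assumed to already exist.

```bash
# Print an EventBridge event pattern matching new objects in one bucket.
s3_object_created_pattern() {
  printf '{"source":["aws.s3"],"detail-type":["Object Created"],"detail":{"bucket":{"name":["%s"]}}}' "$1"
}

# Route new objects in BUCKET to a Lambda function via an EventBridge rule.
# Usage: route_bucket_to_lambda BUCKET RULE_NAME FUNCTION_ARN
route_bucket_to_lambda() {
  local bucket="$1" rule="$2" function_arn="$3" rule_arn
  # Send the bucket's events to EventBridge. WARNING: this call replaces the
  # bucket's entire notification configuration; merge in any existing
  # Lambda/SQS/SNS notifications before running it on a shared bucket.
  aws s3api put-bucket-notification-configuration --bucket "$bucket" \
    --notification-configuration '{"EventBridgeConfiguration":{}}' || return 1
  # Create (or update) the rule and keep its ARN for the Lambda permission.
  rule_arn=$(aws events put-rule --name "$rule" \
    --event-pattern "$(s3_object_created_pattern "$bucket")" \
    --query RuleArn --output text) || return 1
  aws events put-targets --rule "$rule" \
    --targets "Id=dx-ingest,Arn=$function_arn" || return 1
  # Allow EventBridge, and only this rule, to invoke the function.
  aws lambda add-permission --function-name "$function_arn" \
    --statement-id "${rule}-invoke" --action lambda:InvokeFunction \
    --principal events.amazonaws.com --source-arn "$rule_arn"
}
```

Matching on S3 events rather than on Data Exchange events ties the trigger to objects actually landing in the bucket, which is what the downstream job consumes.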
Publishing Data on Data Exchange
Organizations can also publish datasets to monetize their data or share it with partners.
| Publishing Concern | How It Works |
|---|---|
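| Product listing | Create a product in the AWS Data Exchange console; public products are listed on AWS Marketplace |
| Pricing | Set a subscription price, offer the product for free, or use bring-your-own-subscription (BYOS) for customers with existing contracts |
| Revision publishing | Import assets into a revision, finalize it, and publish it to the product; subscribers see the new revision and receive it automatically if they configured auto-export |
| API products | Back the product with an Amazon API Gateway REST API; Data Exchange handles subscriber authentication and metering |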
| Revenue share | AWS takes a percentage of subscription revenue |
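The revision-publishing row can be sketched with the AWS CLI: create a revision in a data set you own, import an S3 object into it with an `IMPORT_ASSETS_FROM_S3` job, wait for the job, and finalize the revision. IDs, bucket, and key are hypothetical. Publishing the finalized revision to your product (from the console or, as we understand it, through the AWS Marketplace Catalog API) is not shown and is worth confirming against the current provider documentation.

```bash
# Create an empty revision in a data set you own and print its ID.
create_revision() {
  aws dataexchange create-revision --data-set-id "$1" \
    --comment "$2" --query Id --output text
}

# Start a job that imports one S3 object into the revision; print the job ID.
# Keys containing double quotes or backslashes would need JSON escaping.
import_s3_object() {
  local data_set_id="$1" revision_id="$2" bucket="$3" key="$4" details job_id
  details=$(printf '{"ImportAssetsFromS3":{"DataSetId":"%s","RevisionId":"%s","AssetSources":[{"Bucket":"%s","Key":"%s"}]}}' \
    "$data_set_id" "$revision_id" "$bucket" "$key")
  job_id=$(aws dataexchange create-job --type IMPORT_ASSETS_FROM_S3 \
    --details "$details" --query Id --output text) || return 1
  aws dataexchange start-job --job-id "$job_id" >/dev/null || return 1
  echo "$job_id"
}

# Poll a job until it reaches a terminal state; succeed only on COMPLETED.
wait_for_job() {
  local job_id="$1" state i=0
  while [ "$i" -lt "${MAX_POLLS:-60}" ]; do
    state=$(aws dataexchange get-job --job-id "$job_id" --query State --output text)
    case "$state" in
      COMPLETED) return 0 ;;
      ERROR|CANCELLED|TIMED_OUT) return 1 ;;
    esac
    sleep "${POLL_SECONDS:-10}"
    i=$((i + 1))
  done
  return 1
}

# Finalize the revision so it can be published to the product.
finalize_revision() {
  aws dataexchange update-revision --data-set-id "$1" --revision-id "$2" --finalized
}
```

A typical run chains them: `create_revision`, then `import_s3_object`, then `wait_for_job` on the returned job ID, and finally `finalize_revision`.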
Interview Focus Points
1. What is AWS Data Exchange and how does it simplify third-party data acquisition compared to manual downloads?
2. How would you integrate a Data Exchange subscription into an existing S3 data lake pipeline?
3. What is the difference between file-based and API-based Data Exchange products?
4. How does Data Exchange handle data freshness, and how do subscribers get updated data?
5. What are the cost components when using Data Exchange as a subscriber?
6. Describe a scenario where you would recommend Data Exchange over building a custom data ingestion pipeline from an external provider.