Data Exchange

Subscribe to and use third-party datasets directly within AWS

AWS Data Exchange is a marketplace service that lets you subscribe to, access, and use third-party datasets directly within AWS without negotiating individual data licenses, managing data transfer, or building custom ingestion pipelines. Publishers list their datasets and publish revisions; subscribers can have each new revision exported automatically to S3. It is relevant for data engineering roles at organizations that use external data - financial data, geospatial data, demographic data, and industry benchmarks - to enrich internal analytics.

How Data Exchange Works - Publishers, Subscribers, and Revisions

Data Exchange uses a publisher-subscriber model built around datasets and revisions:

Concept	Description
Dataset	A named collection of data assets (files) from a publisher
Revision	A versioned snapshot of a dataset - publishers add revisions when data updates
Asset	An individual file or S3 object within a revision
Product	A marketplace listing - one or more datasets with a price and subscription terms
Subscription	Access agreement between subscriber and publisher
Auto-export	Automatically deliver new revisions to a subscriber S3 bucket

Once you are subscribed and have auto-export configured, new revisions are delivered to your designated S3 bucket without any manual download. You can then query them with Athena, process them with Glue, or load them into Redshift.
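To make the dataset, revision, and asset hierarchy concrete, here is a minimal sketch of how a subscriber might browse what they are entitled to with the AWS CLI. The function names are illustrative wrappers, not part of any AWS tooling, and the IDs you pass in come from your own account.

```shell
# List data sets you are entitled to through active subscriptions.
list_entitled_data_sets() {
  aws dataexchange list-data-sets \
    --origin ENTITLED \
    --query 'DataSets[].{Id:Id,Name:Name,AssetType:AssetType}' \
    --output table
}

# List the revisions of one data set. $1 = data set ID.
list_revisions() {
  aws dataexchange list-data-set-revisions \
    --data-set-id "$1" \
    --query 'Revisions[].{Id:Id,CreatedAt:CreatedAt,Finalized:Finalized}' \
    --output table
}

# List the assets (files) inside one revision. $1 = data set ID, $2 = revision ID.
list_assets() {
  aws dataexchange list-revision-assets \
    --data-set-id "$1" \
    --revision-id "$2" \
    --query 'Assets[].{Id:Id,Name:Name}' \
    --output table
}
```

A typical session would call list_entitled_data_sets first, then pass one of the returned IDs to list_revisions, and finally a revision ID to list_assets.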

💡

AWS Data Exchange also supports API-based data products where the publisher exposes a REST API and Data Exchange handles authentication. This is useful for real-time data (stock prices, weather) rather than bulk file delivery.
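As a rough illustration of the API-based model, a subscriber can call a provider's API through Data Exchange with the send-api-asset command, signed with their normal AWS credentials. The wrapper name, the path, and the query parameter in the usage note are hypothetical; the real values are defined by each provider's API.

```shell
# Call a subscribed API product through Data Exchange.
# $1 = data set ID, $2 = revision ID, $3 = asset ID (from your entitlement)
# $4 = provider-defined request path
# $5 = query string parameters in key=value form
call_api_asset() {
  aws dataexchange send-api-asset \
    --data-set-id "$1" \
    --revision-id "$2" \
    --asset-id "$3" \
    --method GET \
    --path "$4" \
    --query-string-parameters "$5"
}
```

For example, call_api_asset "$DATA_SET_ID" "$REVISION_ID" "$ASSET_ID" /v1/quotes symbol=EXAMPLE would send a GET request to a made-up /v1/quotes endpoint; the response body and headers come back in the command's JSON output.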

Common Use Cases and Data Categories

Category	Examples	Typical Consumer
Financial data	Stock prices, options chains, fundamentals, sentiment	Fintech, quant funds, insurance
Geospatial	Maps, POI data, satellite imagery, traffic patterns	Logistics, retail site selection
Demographic and consumer	Census data, consumer behavior, household income	Marketing, retail analytics
Healthcare	Clinical trial data, claims data, genomics reference	Health tech, pharma
Weather and climate	Historical weather, forecasts, climate indices	Insurance, agriculture, energy
Cybersecurity	Threat intelligence feeds, IP reputation, vulnerability data	Security teams, SOCs

Integrating Data Exchange with Your Data Lake

The typical integration pattern delivers new revisions automatically to S3, then a Lambda trigger or EventBridge rule processes them into the data lake.

bash

# Enable auto-export to S3 for a Data Exchange subscription.
# Single quotes stop the shell from expanding ${...} in KeyPattern
# and from brace-expanding the JSON.
aws dataexchange create-event-action \
  --action '{
    "ExportRevisionToS3": {
      "RevisionDestination": {
        "Bucket": "my-data-lake-bucket",
        "KeyPattern": "${Revision.CreatedAt}/${Asset.Name}"
      }
    }
  }' \
  --event '{"RevisionPublished": {"DataSetId": "<dataset-id>"}}'
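
The EventBridge side of this pattern could be sketched as follows. Data Exchange emits events with source aws.dataexchange; the detail type used here is "Revision Published To Data Set". Filtering on the resources field assumes the data set ID appears there, which is worth checking against a sample event from your account. The function names, rule name, and target ID are illustrative.

```shell
# Create an EventBridge rule matching "revision published" events.
# $1 = rule name (your choice), $2 = data set ID.
# Assumption: the data set ID is listed in the event's "resources" array.
create_revision_rule() {
  aws events put-rule \
    --name "$1" \
    --event-pattern "$(printf '{"source":["aws.dataexchange"],"detail-type":["Revision Published To Data Set"],"resources":["%s"]}' "$2")"
}

# Attach a target (for example a Lambda function) to the rule.
# $1 = rule name, $2 = target ARN.
add_rule_target() {
  aws events put-targets \
    --rule "$1" \
    --targets "Id=process-revision,Arn=$2"
}
```

If the target is a Lambda function, it also needs a resource-based permission that lets EventBridge invoke it (aws lambda add-permission), which is omitted here.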

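After auto-export, trigger a Glue crawler to update the catalog schema so that Athena can query the latest data. For latency-sensitive cases, trigger a Lambda function directly on the S3 ObjectCreated event to start a Glue ETL job. Match all ObjectCreated events rather than only Put, since large objects can arrive through multipart upload.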
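
The post-delivery steps can be sketched with the AWS CLI as below. The crawler, database, job name, and the --input_key job argument are placeholders for resources you would have created yourself; nothing here is specific to Data Exchange.

```shell
# Refresh the Glue Data Catalog after new files land. $1 = crawler name.
# The crawler runs asynchronously; this call returns before it finishes.
refresh_catalog() {
  aws glue start-crawler --name "$1"
}

# Start an Athena query and print its execution ID.
# $1 = SQL text, $2 = Glue database, $3 = s3:// location for results.
run_athena_query() {
  aws athena start-query-execution \
    --query-string "$1" \
    --query-execution-context "Database=$2" \
    --result-configuration "OutputLocation=$3" \
    --query 'QueryExecutionId' \
    --output text
}

# Latency-sensitive path: start a Glue ETL job for one delivered object.
# $1 = job name, $2 = S3 key of the new object.
# --input_key is a hypothetical job argument your ETL script would read.
start_etl_job() {
  aws glue start-job-run \
    --job-name "$1" \
    --arguments "$(printf '{"--input_key":"%s"}' "$2")"
}
```

In practice you would wait for the crawler to finish (for example by polling aws glue get-crawler) before running the Athena query, because both start calls return immediately.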

💡

Data Exchange does not provide compute - it is purely a data delivery mechanism. All processing happens in your account after delivery to S3. Pricing for each product is set by the publisher. As a subscriber you pay that subscription price plus the standard AWS charges for whatever you do with the data afterwards, such as S3 storage, data transfer, and query or ETL compute.

Publishing Data on Data Exchange

Organizations can also publish datasets to monetize their data or share with partners.

Publishing Concern	How It Works
Product listing	Create a product in the AWS Data Exchange console; it is listed on AWS Marketplace
Pricing	Set a subscription price, offer the product for free, or use Bring Your Own Subscription (BYOS) for existing contracts
Revision publishing	Upload assets to a revision and publish it - subscribers receive it automatically
API products	Expose a REST API with Data Exchange handling auth/metering
Revenue share	AWS takes a percentage of subscription revenue

🎯

Interview Focus Points

1. What is AWS Data Exchange and how does it simplify third-party data acquisition compared to manual downloads?
2. How would you integrate a Data Exchange subscription into an existing S3 data lake pipeline?
3. What is the difference between file-based and API-based Data Exchange products?
4. How does Data Exchange handle data freshness - how do subscribers get updated data?
5. What are the cost components when using Data Exchange as a subscriber?
6. Describe a scenario where you would recommend Data Exchange over building a custom data ingestion pipeline from an external provider.