AWS Internet of Things
IoT Analytics
Fully managed analytics service for IoT device data at scale
AWS IoT Analytics is a fully managed service that collects, processes, stores, and analyzes IoT device data at scale without requiring you to manage infrastructure. It provides a pipeline-based architecture for cleaning, transforming, and enriching raw device data before it reaches your analytics tools. Cloud engineers use it to separate raw data ingestion from analytics-ready datasets, enabling complex queries on time-series device data using standard SQL.
How IoT Analytics Works: Channels, Pipelines, and Datastores
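IoT Analytics has four core components that data flows through in order: channel, pipeline, datastore, and dataset. The final row of the table describes how a service-managed datastore lays out its data rather than a separate stage. Understanding this pipeline architecture is essential for designing IoT data processing systems.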
| Component | Role | Key Configuration |
|---|---|---|
| Channel | Ingest point - receives raw messages from IoT Core rules or direct API calls | Configures retention period for raw messages |
| Pipeline | Series of processing activities that transform data as it flows through | Activities include filter, lambda (custom enrichment), addAttributes, math, deviceRegistryEnrich, deviceShadowEnrich, selectAttributes, removeAttributes |
| Datastore | Long-term storage for processed messages, queryable via SQL | Supports managed storage or customer-managed S3; configures retention period |
| Dataset | SQL query result materialized on a schedule or on-demand | Can trigger export to S3 or call IoT Events; scheduled via cron or on-demand |
| Data Store (S3 managed) | Optimized columnar storage for analytics queries | Automatically partitioned; queryable via dataset SQL |
Data flow: IoT Core rule action sends messages to a Channel. The Channel feeds a Pipeline. The Pipeline writes to a Datastore. Datasets query the Datastore on a schedule and export results.
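The "direct API calls" ingestion path mentioned in the table can be sketched in Python with boto3's `batch_put_message`. This is a minimal sketch under stated assumptions, not a reference implementation: the helper names `build_messages` and `send_readings`, the default channel name, and the chunk size of 100 are all chosen for illustration.

```python
import json
import uuid


def build_messages(readings):
    """Wrap each reading dict in the messageId/payload shape the API takes."""
    return [
        {
            "messageId": str(uuid.uuid4()),  # must be unique within one batch
            "payload": json.dumps(reading).encode("utf-8"),
        }
        for reading in readings
    ]


def send_readings(readings, channel_name="TelemetryChannel", chunk_size=100):
    """Send readings to a channel in chunks and return any failed entries.

    channel_name and chunk_size are illustrative assumptions.
    """
    import boto3  # imported lazily so build_messages works without AWS access

    client = boto3.client("iotanalytics")
    messages = build_messages(readings)
    failed = []
    for start in range(0, len(messages), chunk_size):
        response = client.batch_put_message(
            channelName=channel_name,
            messages=messages[start:start + chunk_size],
        )
        failed.extend(response.get("batchPutMessageErrorEntries", []))
    return failed
```

In most deployments an IoT Core rule action does this work for you; calling the API directly is mainly useful for backfills or for data that never passes through the message broker.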
Pipeline Activities: Transforming Raw Device Data
Pipeline activities are the processing steps that execute on each message as it passes through the pipeline. They run in the order defined and can be chained.
| Activity | What It Does | Example Use Case |
|---|---|---|
| filter | Drop messages that do not match a condition (SQL WHERE clause) | Discard messages where temperature is null or negative |
| selectAttributes | Keep only specified fields from the message | Strip internal device metadata before storing |
| removeAttributes | Remove specified fields from the message | Remove PII fields like user_id before analytics storage |
| math | Compute a derived field from a math expression | Convert Fahrenheit to Celsius: (temp - 32) * 5/9 |
| lambda | Invoke a Lambda function for complex enrichment | Reverse geocode GPS coordinates to city/region |
| deviceRegistryEnrich | Append Thing attributes from the IoT registry | Add device manufacturer, firmware version, location from registry metadata |
| datastore | Terminal activity that writes to the configured datastore | Final step in every pipeline |
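To make the "run in the order defined and can be chained" behaviour concrete, here is a small local model of three activities in plain Python. It only imitates the semantics described in the table; it does not call the service, and the function names and the `temp_f`, `temp_c`, and `user_id` fields are assumptions made for the example.

```python
def run_pipeline(message, activities):
    """Apply activities in order; stop as soon as one drops the message."""
    for activity in activities:
        message = activity(message)
        if message is None:
            return None
    return message


def filter_has_temperature(msg):
    # Models a filter activity: drop messages with a missing reading.
    return msg if msg.get("temp_f") is not None else None


def remove_attributes(*names):
    # Models removeAttributes: return a copy without the named fields.
    def activity(msg):
        return {k: v for k, v in msg.items() if k not in names}
    return activity


def fahrenheit_to_celsius(msg):
    # Models a math activity that adds a derived field.
    return {**msg, "temp_c": (msg["temp_f"] - 32) * 5 / 9}


ACTIVITIES = [
    filter_has_temperature,
    remove_attributes("user_id"),
    fahrenheit_to_celsius,
]

result = run_pipeline({"deviceId": "d1", "temp_f": 212, "user_id": "u-42"}, ACTIVITIES)
dropped = run_pipeline({"deviceId": "d2", "temp_f": None}, ACTIVITIES)
```

Placing the filter first means later activities never see messages that will be discarded anyway, which is the same ordering argument that applies to a real pipeline.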
```bash
# Create a pipeline with filter and Lambda enrichment
aws iotanalytics create-pipeline \
  --pipeline-name "TelemetryPipeline" \
  --pipeline-activities '[
    {
      "channel": {
        "name": "ChannelActivity",
        "channelName": "TelemetryChannel",
        "next": "FilterActivity"
      }
    },
    {
      "filter": {
        "name": "FilterActivity",
        "filter": "temperature > -40 AND temperature < 120",
        "next": "EnrichActivity"
      }
    },
    {
      "lambda": {
        "name": "EnrichActivity",
        "lambdaName": "EnrichDeviceLocation",
        "batchSize": 10,
        "next": "DatastoreActivity"
      }
    },
    {
      "datastore": {
        "name": "DatastoreActivity",
        "datastoreName": "TelemetryDatastore"
      }
    }
  ]'
```

Lambda enrichment in a pipeline batches messages up to the configured batchSize (max 1000). Design your Lambda to handle arrays of messages, not single messages, to avoid per-message invocation costs.
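A batch-aware handler for the lambda activity might look like the sketch below. The pipeline passes the handler a list of messages and expects a list back. The `REGION_BY_DEVICE_PREFIX` table and the `region` field are hypothetical stand-ins for a real lookup such as reverse geocoding.

```python
# Hypothetical lookup table standing in for a real enrichment source.
REGION_BY_DEVICE_PREFIX = {"nyc": "us-east", "sfo": "us-west"}


def lambda_handler(event, context):
    """Enrich every message in the batch and return the whole batch."""
    enriched = []
    for message in event:  # event is a list of message dicts, not one message
        prefix = str(message.get("deviceId", "")).split("-")[0]
        message["region"] = REGION_BY_DEVICE_PREFIX.get(prefix, "unknown")
        enriched.append(message)
    return enriched
```

Returning the full list keeps every message in the pipeline; a handler that leaves a message out of its return value effectively acts as a filter.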
Datasets: Materializing Query Results on a Schedule
Datasets are SQL queries that run against a Datastore. Results are materialized to S3 and can trigger downstream systems. They are the primary way to expose analytics-ready data to BI tools, notebooks, and applications.
```bash
# Create a dataset with scheduled refresh
# The JSON is wrapped in shell single quotes, so each single quote inside
# the SQL is written as '\'' to keep the shell quoting intact.
aws iotanalytics create-dataset \
  --dataset-name "HourlyTemperatureAverages" \
  --actions '[{
    "actionName": "SqlAction",
    "queryAction": {
      "sqlQuery": "SELECT deviceId, AVG(temperature) as avg_temp, MAX(temperature) as max_temp, COUNT(*) as reading_count, date_trunc('\''hour'\'', __dt) as hour FROM TelemetryDatastore WHERE __dt > current_timestamp - interval '\''1'\'' hour GROUP BY deviceId, date_trunc('\''hour'\'', __dt)"
    }
  }]' \
  --triggers '[{
    "schedule": {
      "expression": "cron(0 * * * ? *)"
    }
  }]' \
  --content-delivery-rules '[{
    "destination": {
      "s3DestinationConfiguration": {
        "bucket": "my-analytics-bucket",
        "key": "hourly-averages/!{iotanalytics:scheduleTime}/results.csv",
        "roleArn": "arn:aws:iam::123456789012:role/iotanalytics-role"
      }
    }
  }]'
```

Dataset SQL follows the Presto (now Trino) dialect rather than strict ANSI SQL, so test date arithmetic and window functions carefully. The special column __dt is added by IoT Analytics at ingestion and partitions the datastore; filter on it to bound the time range a query scans. As far as the AWS documentation describes it, __dt has day granularity, so confirm that date_trunc('hour', __dt) produces the hourly buckets you expect, and group on a timestamp attribute from the message payload if it does not.
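Once a dataset has run, a notebook or application can fetch the materialized result through boto3's `get_dataset_content`. The sketch below assumes the dataset name from the CLI example; `extract_data_uris` and `latest_result_uris` are illustrative helper names, not part of any AWS library.

```python
def extract_data_uris(response):
    """Return download URIs, or an empty list if content is not ready."""
    state = response.get("status", {}).get("state")
    if state != "SUCCEEDED":  # other states include CREATING and FAILED
        return []
    return [entry["dataURI"] for entry in response.get("entries", [])]


def latest_result_uris(dataset_name="HourlyTemperatureAverages"):
    """Look up the most recent dataset content (requires AWS credentials)."""
    import boto3  # imported lazily so extract_data_uris is testable offline

    client = boto3.client("iotanalytics")
    response = client.get_dataset_content(
        datasetName=dataset_name,
        versionId="$LATEST",
    )
    return extract_data_uris(response)
```

Checking the state before reading `entries` matters with `$LATEST`, because the newest version may still be generating when the call is made.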
IoT Analytics vs Alternatives: When to Use Each
IoT Analytics is not always the right choice. Understand its tradeoffs against other AWS analytics options.
| Service | Best For | Not Ideal For |
|---|---|---|
| IoT Analytics | IoT-specific pipelines, device registry enrichment, managed schema-on-read storage for device telemetry | General-purpose analytics, sub-second latency queries, very high-velocity streams (use Kinesis first) |
| Kinesis Data Streams + Firehose | High-velocity real-time streaming, fan-out to multiple consumers, millisecond-latency processing | Complex device enrichment, scheduled dataset materialization, built-in IoT integration |
| Timestream | Time-series specific workloads, adaptive query engine for recent vs historical data, millisecond query latency | Complex ETL transformations, non-time-series IoT data |
| S3 + Athena | Ad hoc SQL on raw data, lowest cost for infrequent queries, existing S3 data lake | Real-time ingestion, device-aware enrichment, managed pipeline |
| OpenSearch | Full-text search on device logs, real-time dashboards, anomaly detection | Long-term cost-effective storage, SQL analytics, structured telemetry aggregation |
Pricing and Cost Considerations
| Dimension | Price | Notes |
|---|---|---|
| Message ingestion | $0.08 per GB ingested | Charged at the Channel; applies to all messages regardless of pipeline outcome |
| Pipeline processing | $0.05 per GB processed | Each activity in the pipeline counts separately |
| Datastore storage | $0.023 per GB per month | Same as S3 standard for managed storage |
| Dataset query | $5 per TB queried | Same pricing model as Athena |
| Dataset results storage | $0.023 per GB per month | Results stored in S3 via content delivery rules |
Cost optimization: use selectAttributes or removeAttributes early in your pipeline to drop fields you do not need. This reduces the GB processed by downstream activities and reduces datastore storage costs.
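The effect of trimming fields early can be estimated with simple arithmetic. The sketch below uses the rates from the table above; the workload figures (100 GB ingested, three activities, a trim step that halves message size) are made-up assumptions for illustration, and `monthly_cost` is a hypothetical helper.

```python
# Rates copied from the pricing table above (USD).
INGEST_PER_GB = 0.08
PROCESS_PER_GB = 0.05
STORAGE_PER_GB_MONTH = 0.023
QUERY_PER_TB = 5.0


def monthly_cost(ingested_gb, gb_per_activity, stored_gb, queried_tb):
    """Sum the four cost dimensions for one month, rounded to cents."""
    ingestion = ingested_gb * INGEST_PER_GB
    processing = sum(gb_per_activity) * PROCESS_PER_GB  # each activity billed
    storage = stored_gb * STORAGE_PER_GB_MONTH
    queries = queried_tb * QUERY_PER_TB
    return round(ingestion + processing + storage + queries, 2)


# Without trimming: every activity sees the full 100 GB.
baseline = monthly_cost(100, [100, 100, 100], stored_gb=100, queried_tb=0.1)

# With selectAttributes first: later activities and storage see half the data.
trimmed = monthly_cost(100, [100, 50, 50], stored_gb=50, queried_tb=0.05)
```

Under these assumptions the monthly bill falls from roughly $25.80 to $19.40. Ingestion is unchanged because it is charged at the channel, before any pipeline activity runs.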
Interview Focus Points
1. Walk me through the data flow from an IoT device publishing a message to a queryable dataset in IoT Analytics.
2. How would you enrich device messages with data from an external database using IoT Analytics pipelines?
3. When would you choose IoT Analytics over Kinesis Data Firehose for IoT telemetry?
4. What is the __dt column in IoT Analytics and why does it matter for queries?
5. How do you schedule a dataset to refresh hourly and deliver results to S3 for a downstream Athena query?
6. Explain how the deviceRegistryEnrich activity reduces the need to embed metadata in device messages.
7. What are the retention period settings in IoT Analytics and how do they affect cost?