IoT Analytics

Fully managed analytics service for IoT device data at scale

AWS IoT Analytics is a fully managed service that collects, processes, stores, and analyzes IoT device data at scale without requiring you to manage infrastructure. It provides a pipeline-based architecture for cleaning, transforming, and enriching raw device data before it reaches your analytics tools. Cloud engineers use it to separate raw data ingestion from analytics-ready datasets, which makes it possible to run complex SQL queries on time-series device data.

How IoT Analytics Works: Channels, Pipelines, and Datastores

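Data flows sequentially through four core components: channel, pipeline, datastore, and dataset. The table below also lists the service-managed storage option for the datastore. Understanding this pipeline architecture is essential for designing IoT data processing systems.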

Component	Role	Key Configuration
Channel	Ingest point - receives raw messages from IoT Core rules or direct API calls	Configures retention period for raw messages
Pipeline	Series of processing activities that transform data as it flows through	Activities include filter, lambda (custom enrichment), addAttributes, math, deviceRegistryEnrich, deviceShadowEnrich, selectAttributes, removeAttributes
Datastore	Long-term storage for processed messages, queryable via SQL	Supports managed storage or customer-managed S3; configures retention period
Dataset	SQL query result materialized on a schedule or on-demand	Can trigger export to S3 or call IoT Events; scheduled via cron or on-demand
Data Store (S3 managed)	Optimized columnar storage for analytics queries	Automatically partitioned; queryable via dataset SQL

Data flow: IoT Core rule action sends messages to a Channel. The Channel feeds a Pipeline. The Pipeline writes to a Datastore. Datasets query the Datastore on a schedule and export results.
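
The "direct API calls" path into a channel can be sketched as follows. This is a minimal sketch, assuming boto3's `iotanalytics` client and its `batch_put_message` operation; the helper names `to_batch_messages` and `send_to_channel` are hypothetical and not part of any AWS library.

```python
import json
import uuid


def to_batch_messages(readings):
    """Wrap each reading (a dict) in the shape BatchPutMessage expects:
    a unique messageId plus the JSON payload as bytes."""
    return [
        {
            "messageId": uuid.uuid4().hex,
            "payload": json.dumps(reading).encode("utf-8"),
        }
        for reading in readings
    ]


def send_to_channel(channel_name, readings):
    """Send readings to a channel. Needs boto3 and AWS credentials, so the
    import is local and this function is not called in the demo below."""
    import boto3

    client = boto3.client("iotanalytics")
    response = client.batch_put_message(
        channelName=channel_name,
        messages=to_batch_messages(readings),
    )
    # Messages that failed to ingest are reported individually.
    return response.get("batchPutMessageErrorEntries", [])


if __name__ == "__main__":
    sample = [{"deviceId": "sensor-1", "temperature": 21.5}]
    for message in to_batch_messages(sample):
        print(message["messageId"], message["payload"])
```

In most deployments an IoT Core rule action does this for you; calling the API directly is mainly useful for backfills or for devices that do not publish through IoT Core.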

Pipeline Activities: Transforming Raw Device Data

Pipeline activities are the processing steps that execute on each message as it passes through the pipeline. They run in the order defined and can be chained.

Activity	What It Does	Example Use Case
filter	Drop messages that do not match a condition (SQL WHERE clause)	Discard messages where temperature is null or negative
selectAttributes	Keep only specified fields from the message	Strip internal device metadata before storing
removeAttributes	Remove specified fields from the message	Remove PII fields like user_id before analytics storage
math	Compute a derived field from a math expression and store it in a new attribute	Convert Fahrenheit to Celsius: (temp - 32) * 5/9
lambda	Invoke a Lambda function for complex enrichment	Reverse geocode GPS coordinates to city/region
deviceRegistryEnrich	Append Thing attributes from the IoT registry	Add device manufacturer, firmware version, location from registry metadata
datastore	Terminal activity that writes to the configured datastore	Final step in every pipeline
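
Because every activity except the last must name its successor in `next`, it can help to generate the chain rather than write it by hand. The sketch below assumes the activity field names used by the IoT Analytics API (`filter`, `attributes`, `attribute`, `math`); `chain_activities` and the activity names such as `DropPii` are hypothetical.

```python
def chain_activities(steps):
    """Given ordered (activity_type, config) pairs, return the list of
    pipeline activities with each "next" pointing at the following step."""
    activities = []
    for index, (activity_type, config) in enumerate(steps):
        body = dict(config)
        if index + 1 < len(steps):
            body["next"] = steps[index + 1][1]["name"]
        activities.append({activity_type: body})
    return activities


# Order matters: filter first, drop unwanted fields, then derive temp_c.
steps = [
    ("channel", {"name": "ChannelActivity", "channelName": "TelemetryChannel"}),
    ("filter", {"name": "DropInvalid", "filter": "temperature > -40 AND temperature < 120"}),
    ("removeAttributes", {"name": "DropPii", "attributes": ["user_id"]}),
    ("math", {"name": "ToCelsius", "attribute": "temp_c", "math": "(temperature - 32) * 5 / 9"}),
    ("datastore", {"name": "DatastoreActivity", "datastoreName": "TelemetryDatastore"}),
]

if __name__ == "__main__":
    import json

    print(json.dumps(chain_activities(steps), indent=2))
```

The printed JSON has the same shape as the `--pipeline-activities` argument of `aws iotanalytics create-pipeline`.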

bash

# Create a pipeline with filter and Lambda enrichment
aws iotanalytics create-pipeline \
  --pipeline-name "TelemetryPipeline" \
  --pipeline-activities '[
    {
      "channel": {
        "name": "ChannelActivity",
        "channelName": "TelemetryChannel",
        "next": "FilterActivity"
      }
    },
    {
      "filter": {
        "name": "FilterActivity",
        "filter": "temperature > -40 AND temperature < 120",
        "next": "EnrichActivity"
      }
    },
    {
      "lambda": {
        "name": "EnrichActivity",
        "lambdaName": "EnrichDeviceLocation",
        "batchSize": 10,
        "next": "DatastoreActivity"
      }
    },
    {
      "datastore": {
        "name": "DatastoreActivity",
        "datastoreName": "TelemetryDatastore"
      }
    }
  ]'

💡

Lambda enrichment in a pipeline batches messages up to the configured batchSize (max 1000). Design your Lambda to handle arrays of messages, not single messages, to avoid per-message invocation costs.
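
A batch-aware handler might look like the sketch below. It assumes the Lambda activity passes the batch as a list of message dicts and expects a list back; the lookup table is a hypothetical stand-in for a real reverse-geocoding call, and `enrich` is an illustrative helper.

```python
# Hypothetical lookup table standing in for a real reverse-geocoding service.
REGION_BY_DEVICE_PREFIX = {"eu": "Europe", "us": "North America"}


def enrich(message):
    """Return a copy of one message with a 'region' field added."""
    enriched = dict(message)
    prefix = str(message.get("deviceId", ""))[:2].lower()
    enriched["region"] = REGION_BY_DEVICE_PREFIX.get(prefix, "unknown")
    return enriched


def lambda_handler(event, context):
    """Process the whole batch in one invocation and return one output
    message per input message."""
    return [enrich(message) for message in event]
```

Returning a shorter list would drop messages from the pipeline, so any filtering is usually left to a dedicated filter activity rather than done inside the function.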

Datasets: Materializing Query Results on a Schedule

Datasets are SQL queries that run against a Datastore. Results are materialized to S3 and can trigger downstream systems. They are the primary way to expose analytics-ready data to BI tools, notebooks, and applications.

bash

# Create a dataset with scheduled refresh.
# The --actions JSON is single-quoted for the shell, so each single quote
# inside the SQL is written as '\'' to keep it in the query text.
aws iotanalytics create-dataset \
  --dataset-name "HourlyTemperatureAverages" \
  --actions '[{
    "actionName": "SqlAction",
    "queryAction": {
      "sqlQuery": "SELECT deviceId, AVG(temperature) as avg_temp, MAX(temperature) as max_temp, COUNT(*) as reading_count, date_trunc('\''hour'\'', __dt) as hour FROM TelemetryDatastore WHERE __dt > current_timestamp - interval '\''1'\'' hour GROUP BY deviceId, date_trunc('\''hour'\'', __dt)"
    }
  }]' \
  --triggers '[{
    "schedule": {
      "expression": "cron(0 * * * ? *)"
    }
  }]' \
  --content-delivery-rules '[{
    "destination": {
      "s3DestinationConfiguration": {
        "bucket": "my-analytics-bucket",
        "key": "hourly-averages/!{iotanalytics:scheduleTime}/results.csv",
        "roleArn": "arn:aws:iam::123456789012:role/iotanalytics-role"
      }
    }
  }]'

⚠️

Dataset SQL follows Presto (now Trino) syntax, which differs from ANSI SQL in places, so test date arithmetic and window functions carefully. The special column __dt is added by IoT Analytics based on ingestion time and is used to partition the datastore. Filter on it in time-range queries to limit the data each query scans.
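
Dataset queries usually contain single quotes, which are awkward to embed in a single-quoted shell argument. One option, sketched here, is to build the `--actions` payload in Python and hand it to the CLI as a file. The query is a simplified variant of the hourly example, and `dataset_actions` is a hypothetical helper.

```python
import json

# Simplified query; the single quotes survive because json.dumps handles
# the escaping instead of the shell.
SQL = (
    "SELECT deviceId, AVG(temperature) AS avg_temp "
    "FROM TelemetryDatastore "
    "WHERE __dt > current_date - interval '1' day "
    "GROUP BY deviceId"
)


def dataset_actions(sql, action_name="SqlAction"):
    """Build the list expected by the --actions argument of create-dataset."""
    return [{"actionName": action_name, "queryAction": {"sqlQuery": sql}}]


if __name__ == "__main__":
    print(json.dumps(dataset_actions(SQL), indent=2))
```

Saving the printed JSON as `actions.json` lets you pass it with `--actions file://actions.json`, which keeps the SQL readable and avoids shell escaping entirely.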

IoT Analytics vs Alternatives: When to Use Each

IoT Analytics is not always the right choice. Understand its tradeoffs against other AWS analytics options.

Service	Best For	Not Ideal For
IoT Analytics	IoT-specific pipelines, device registry enrichment, managed schema-on-read storage for device telemetry	General-purpose analytics, sub-second latency queries, very high-velocity streams (use Kinesis first)
Kinesis Data Streams + Firehose	High-velocity real-time streaming, fan-out to multiple consumers, millisecond-latency processing	Complex device enrichment, scheduled dataset materialization, built-in IoT integration
Timestream	Time-series specific workloads, adaptive query engine for recent vs historical data, millisecond query latency	Complex ETL transformations, non-time-series IoT data
S3 + Athena	Ad hoc SQL on raw data, lowest cost for infrequent queries, existing S3 data lake	Real-time ingestion, device-aware enrichment, managed pipeline
OpenSearch	Full-text search on device logs, real-time dashboards, anomaly detection	Long-term cost-effective storage, SQL analytics, structured telemetry aggregation

Pricing and Cost Considerations

Dimension	Price (USD)	Notes
Message ingestion	$0.08 per GB ingested	Charged at the Channel; applies to all messages regardless of pipeline outcome
Pipeline processing	$0.05 per GB processed	Each activity in the pipeline counts separately
Datastore storage	$0.023 per GB per month	Same as S3 standard for managed storage
Dataset query	$5 per TB queried	Same pricing model as Athena
Dataset results storage	$0.023 per GB per month	Results stored in S3 via content delivery rules

💡

Cost optimization: use selectAttributes or removeAttributes early in your pipeline to drop fields you do not need. This reduces the GB processed by downstream activities and reduces datastore storage costs.
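
To see why the position of the trimming step matters, here is a rough estimate using the per-GB figures from the pricing table above. It is only a sketch: the prices are copied from that table as illustrative values, the model follows the table's note that each activity is charged separately, and the 100 GB volume and 60% size reduction are made-up inputs.

```python
# Per-GB prices copied from the pricing table above (USD); illustrative only.
INGEST_PER_GB = 0.08
PROCESS_PER_GB = 0.05
STORAGE_PER_GB_MONTH = 0.023


def monthly_cost(raw_gb, activity_sizes_gb, stored_gb):
    """Estimate one month of cost. activity_sizes_gb lists the GB each
    pipeline activity processes, assuming each is charged separately."""
    ingest = raw_gb * INGEST_PER_GB
    processing = sum(activity_sizes_gb) * PROCESS_PER_GB
    storage = stored_gb * STORAGE_PER_GB_MONTH
    return round(ingest + processing + storage, 2)


# Assumed workload: 100 GB raw per month, three activities, and a
# selectAttributes step that shrinks messages to 40% of their size.
trim_last = monthly_cost(100, [100, 100, 100], stored_gb=40)
trim_first = monthly_cost(100, [100, 40, 40], stored_gb=40)

if __name__ == "__main__":
    print("selectAttributes last: ", trim_last)
    print("selectAttributes first:", trim_first)
```

Under these assumptions ingestion and storage are identical in both cases; the saving comes entirely from the two downstream activities processing smaller messages.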

🎯

Interview Focus Points

1. Walk me through the data flow from an IoT device publishing a message to a queryable dataset in IoT Analytics.
2. How would you enrich device messages with data from an external database using IoT Analytics pipelines?
3. When would you choose IoT Analytics over Kinesis Data Firehose for IoT telemetry?
4. What is the __dt column in IoT Analytics and why does it matter for queries?
5. How do you schedule a dataset to refresh hourly and deliver results to S3 for a downstream Athena query?
6. Explain how the deviceRegistryEnrich activity reduces the need to embed metadata in device messages.
7. What are the retention period settings in IoT Analytics and how do they affect cost?