Ace Cloud Interviews

AWS Internet of Things

IoT Analytics

Fully managed analytics service for IoT device data at scale

AWS IoT Analytics is a fully managed service that collects, processes, stores, and analyzes IoT device data at scale without requiring you to manage infrastructure. It provides a pipeline-based architecture for cleaning, transforming, and enriching raw device data before it reaches your analytics tools. Cloud engineers use it to separate raw data ingestion from analytics-ready datasets and to run SQL queries over time-series device data.

How IoT Analytics Works: Channels, Pipelines, and Datastores

IoT Analytics has four core components, and data flows through them in sequence. Understanding this pipeline architecture is essential for designing IoT data processing systems.

| Component | Role | Key Configuration |
|---|---|---|
| Channel | Ingest point; receives raw messages from IoT Core rule actions or direct BatchPutMessage API calls | Retention period for raw messages |
| Pipeline | Series of processing activities that transform data as it flows through | Activities: channel, filter, selectAttributes, removeAttributes, addAttributes, math, lambda, deviceRegistryEnrich, deviceShadowEnrich, datastore |
| Datastore | Long-term storage for processed messages, queryable via dataset SQL | Service-managed or customer-managed S3 storage; retention period; JSON or Parquet file format |
| Dataset | SQL query result materialized on a schedule or on demand | Scheduled via cron expression or run on demand; results can be delivered to S3 or sent to IoT Events |

Data flow: IoT Core rule action sends messages to a Channel. The Channel feeds a Pipeline. The Pipeline writes to a Datastore. Datasets query the Datastore on a schedule and export results.
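The flow above can be wired up from the CLI. This sketch creates the channel and datastore that the pipeline example in the next section refers to, then adds an IoT Core topic rule whose iotAnalytics action forwards messages to the channel. The topic filter devices/+/telemetry, the rule name, the retention periods, and the IAM role ARN are hypothetical placeholders; the role needs permission for iotanalytics:BatchPutMessage. Commands are printed rather than executed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: IoT Core rule -> channel, plus the datastore the pipeline writes to.
# All names, retention periods and the role ARN are hypothetical placeholders.
# Commands are printed, not executed, unless DRY_RUN=0.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

ROLE_ARN="arn:aws:iam::123456789012:role/iot-to-iotanalytics"

setup_ingestion() {
  # Channel keeps raw messages for 30 days so they can be reprocessed later
  run aws iotanalytics create-channel \
    --channel-name TelemetryChannel \
    --retention-period numberOfDays=30

  # Datastore keeps processed messages for one year
  run aws iotanalytics create-datastore \
    --datastore-name TelemetryDatastore \
    --retention-period numberOfDays=365

  # Topic rule whose iotAnalytics action forwards every telemetry message to the channel
  local payload
  payload=$(cat <<EOF
{
  "sql": "SELECT * FROM 'devices/+/telemetry'",
  "actions": [
    {"iotAnalytics": {"channelName": "TelemetryChannel", "roleArn": "$ROLE_ARN"}}
  ]
}
EOF
)
  run aws iot create-topic-rule \
    --rule-name TelemetryToIoTAnalytics \
    --topic-rule-payload "$payload"
}

setup_ingestion
```

The retention periods are where raw-versus-processed cost tradeoffs show up: a short channel retention limits how far back you can reprocess, while the datastore retention controls how much history dataset queries can see.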

Pipeline Activities: Transforming Raw Device Data

Pipeline activities are the processing steps that execute on each message as it passes through the pipeline. Each activity names its successor in a next field, so they run as a chain from the channel activity to the datastore activity.

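| Activity | What It Does | Example Use Case |
|---|---|---|
| channel | Reads messages from the named channel; first activity in every pipeline | Entry point that hands raw messages to the next activity |
| filter | Drops messages that do not match a condition (SQL WHERE clause syntax) | Discard messages where temperature is null or negative |
| selectAttributes | Keeps only the specified fields of the message | Strip internal device metadata before storing |
| removeAttributes | Removes the specified fields from the message | Remove PII fields such as user_id before analytics storage |
| addAttributes | Copies existing attributes into new attribute names | Keep the raw reading as temp_raw before converting it |
| math | Computes a new attribute from a math expression over existing attributes | Convert Fahrenheit to Celsius: (temperature - 32) * 5 / 9 |
| lambda | Invokes a Lambda function on batches of messages for complex enrichment | Reverse geocode GPS coordinates to city/region |
| deviceRegistryEnrich | Appends Thing data from the IoT device registry | Add device manufacturer, firmware version, or location stored as Thing attributes |
| datastore | Terminal activity that writes to the configured datastore | Final step in every pipeline |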
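The reshaping activities in the table chain the same way as in the create-pipeline example that follows: each one names its successor in next. This sketch builds an activity list that drops user_id, adds a Celsius reading with math, and attaches registry data, then checks the conversion formula locally with awk. It assumes messages carry deviceId (matching the registry Thing name), temperature in Fahrenheit, and user_id; the activity names, the registry attribute name, and the role ARN are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a pipeline activity list using removeAttributes, math and
# deviceRegistryEnrich. Field names (deviceId, temperature, user_id), activity
# names and the role ARN are hypothetical.
set -euo pipefail

ACTIVITIES=$(cat <<'EOF'
[
  {"channel": {"name": "ReadRaw", "channelName": "TelemetryChannel", "next": "DropPII"}},
  {"removeAttributes": {"name": "DropPII", "attributes": ["user_id"], "next": "ToCelsius"}},
  {"math": {"name": "ToCelsius", "attribute": "temperature_c",
            "math": "(temperature - 32) * 5 / 9", "next": "AddRegistryData"}},
  {"deviceRegistryEnrich": {"name": "AddRegistryData", "attribute": "registry",
            "thingName": "deviceId",
            "roleArn": "arn:aws:iam::123456789012:role/iotanalytics-registry-read",
            "next": "Store"}},
  {"datastore": {"name": "Store", "datastoreName": "TelemetryDatastore"}}
]
EOF
)

# Pass it as: aws iotanalytics create-pipeline --pipeline-name TelemetryCleanupPipeline \
#               --pipeline-activities "$ACTIVITIES"
printf '%s\n' "$ACTIVITIES"

# Check the math expression locally: 98.6 F should come out as 37.0 C
awk 'BEGIN { temperature = 98.6; printf "%.1f\n", (temperature - 32) * 5 / 9 }'
```

Placing removeAttributes right after the channel means every later activity, and the datastore, handles smaller messages.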
```bash
# Create a pipeline with filter and Lambda enrichment
aws iotanalytics create-pipeline \
  --pipeline-name "TelemetryPipeline" \
  --pipeline-activities '[
    {
      "channel": {
        "name": "ChannelActivity",
        "channelName": "TelemetryChannel",
        "next": "FilterActivity"
      }
    },
    {
      "filter": {
        "name": "FilterActivity",
        "filter": "temperature > -40 AND temperature < 120",
        "next": "EnrichActivity"
      }
    },
    {
      "lambda": {
        "name": "EnrichActivity",
        "lambdaName": "EnrichDeviceLocation",
        "batchSize": 10,
        "next": "DatastoreActivity"
      }
    },
    {
      "datastore": {
        "name": "DatastoreActivity",
        "datastoreName": "TelemetryDatastore"
      }
    }
  ]'
```

Tip: The lambda activity sends the function a JSON array of up to batchSize messages (batchSize can be 1 to 1000), and the function must return an array of messages. Write the handler to process the whole array in one invocation; larger batches mean fewer invocations and lower Lambda cost.
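Before the pipeline can call EnrichDeviceLocation, the function needs a resource-based policy that lets the IoT Analytics service principal invoke it. This sketch grants that permission and then invokes the function with a hand-made two-message batch to confirm it accepts and returns an array. It assumes AWS CLI v2 (for --cli-binary-format); the statement ID and the sample message fields are hypothetical, and commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: grant IoT Analytics permission to invoke the enrichment Lambda and
# smoke-test it with a two-message batch. The statement ID and the message
# fields are hypothetical. Requires AWS CLI v2 when DRY_RUN=0.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

grant_and_test_enrichment() {
  # Resource-based policy: lets the pipeline's lambda activity call the function
  run aws lambda add-permission \
    --function-name EnrichDeviceLocation \
    --statement-id iotanalytics-invoke \
    --action lambda:InvokeFunction \
    --principal iotanalytics.amazonaws.com

  # The activity sends a JSON array of messages; expect an array back in response.json
  run aws lambda invoke \
    --function-name EnrichDeviceLocation \
    --cli-binary-format raw-in-base64-out \
    --payload '[{"deviceId": "sensor-1", "lat": 47.6, "lon": -122.3}, {"deviceId": "sensor-2", "lat": 40.7, "lon": -74.0}]' \
    response.json
}

grant_and_test_enrichment
```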

Datasets: Materializing Query Results on a Schedule

Datasets are SQL queries that run against a Datastore. Results are materialized to S3 and can trigger downstream systems. They are the primary way to expose analytics-ready data to BI tools, notebooks, and applications.

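```bash
# Create a dataset that refreshes hourly and delivers its results to S3.
# The SQL contains single quotes, so the actions JSON is kept in a quoted
# heredoc rather than inside a single-quoted shell argument.
# event_time_ms is a hypothetical device-supplied epoch-milliseconds field.
ACTIONS=$(cat <<'EOF'
[{
  "actionName": "SqlAction",
  "queryAction": {
    "sqlQuery": "SELECT deviceId, date_trunc('hour', from_unixtime(event_time_ms / 1000)) AS hour_start, AVG(temperature) AS avg_temp, MAX(temperature) AS max_temp, COUNT(*) AS reading_count FROM TelemetryDatastore WHERE __dt >= current_date - interval '1' day AND from_unixtime(event_time_ms / 1000) > current_timestamp - interval '1' hour GROUP BY deviceId, date_trunc('hour', from_unixtime(event_time_ms / 1000))"
  }
}]
EOF
)

aws iotanalytics create-dataset \
  --dataset-name "HourlyTemperatureAverages" \
  --actions "$ACTIONS" \
  --triggers '[{
    "schedule": {
      "expression": "cron(0 * * * ? *)"
    }
  }]' \
  --content-delivery-rules '[{
    "destination": {
      "s3DestinationConfiguration": {
        "bucket": "my-analytics-bucket",
        "key": "hourly-averages/!{iotanalytics:scheduleTime}/results.csv",
        "roleArn": "arn:aws:iam::123456789012:role/iotanalytics-role"
      }
    }
  }]'
```

Warning: Dataset SQL uses Presto-based syntax (the dialect of Amazon Athena), which does not match ANSI SQL in every case, so test date arithmetic and window functions carefully. The special column __dt is a partition column that IoT Analytics derives from each message's ingestion time, and it has day granularity: filter on it to limit how much data a query scans, and use a timestamp from the payload for hourly or finer buckets, as the query above does.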
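A scheduled dataset can also be run on demand, which helps when testing its SQL. This sketch starts a run with create-dataset-content and then reads the newest successful version with get-dataset-content, whose entries carry a presigned dataURI for the result file. It assumes the HourlyTemperatureAverages dataset from the example above exists; commands are printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: run the dataset on demand and fetch the newest successful result.
# Assumes HourlyTemperatureAverages already exists. Commands are printed
# unless DRY_RUN=0.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

refresh_and_fetch() {
  local dataset="HourlyTemperatureAverages"

  # Start a run of the dataset's SQL action outside its hourly schedule
  run aws iotanalytics create-dataset-content --dataset-name "$dataset"

  # Print the presigned URI of the latest successful result file
  run aws iotanalytics get-dataset-content \
    --dataset-name "$dataset" \
    --version-id '$LATEST_SUCCEEDED' \
    --query 'entries[0].dataURI' \
    --output text
}

refresh_and_fetch
```

create-dataset-content starts the run asynchronously, so the fetch may still return the previous version until the new run completes.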

IoT Analytics vs Alternatives: When to Use Each

IoT Analytics is not always the right choice. Understand its tradeoffs against other AWS analytics options.

| Service | Best For | Not Ideal For |
|---|---|---|
| IoT Analytics | IoT-specific pipelines, device registry enrichment, managed schema-on-read storage for device telemetry | General-purpose analytics, sub-second query latency, very high-velocity streams (ingest through Kinesis first) |
| Kinesis Data Streams + Firehose | High-velocity real-time streaming, fan-out to multiple consumers, sub-second processing with Data Streams | Complex device enrichment, scheduled dataset materialization, built-in IoT integration |
| Timestream | Time-series workloads, adaptive query engine for recent vs. historical data, millisecond query latency | Complex ETL transformations, non-time-series IoT data |
| S3 + Athena | Ad hoc SQL on raw data, lowest cost for infrequent queries, existing S3 data lake | Real-time ingestion, device-aware enrichment, managed pipeline |
| OpenSearch | Full-text search on device logs, real-time dashboards, anomaly detection | Long-term cost-effective storage, SQL analytics, structured telemetry aggregation |

Pricing and Cost Considerations

| Dimension | Price | Notes |
|---|---|---|
| Message ingestion | $0.08 per GB ingested | Charged at the Channel; applies to all messages regardless of pipeline outcome |
| Pipeline processing | $0.05 per GB processed | Each activity in the pipeline counts separately |
| Datastore storage | $0.023 per GB per month | Same as S3 Standard for managed storage |
| Dataset query | $5 per TB queried | Same pricing model as Athena |
| Dataset results storage | $0.023 per GB per month | Results stored in S3 via content delivery rules |

Cost optimization: use selectAttributes or removeAttributes early in your pipeline to drop fields you do not need. This reduces the GB processed by downstream activities and reduces datastore storage costs.
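The effect of trimming early can be checked with rough arithmetic. This sketch plugs the per-unit prices from the pricing table into awk for a hypothetical workload (100 GB ingested per month, a four-activity pipeline, 0.5 TB scanned by dataset queries) and compares no trimming with a first-position removeAttributes step that cuts message size by 40%. It assumes, following the table's notes, that every activity is billed on the GB it processes and that one month of datastore storage is counted; check current regional prices before relying on the figures.

```bash
#!/usr/bin/env bash
# Rough monthly cost model using the per-unit prices from the pricing table.
# Hypothetical assumptions: every pipeline activity is billed on the GB it
# processes, and the first activity (removeAttributes) shrinks messages by
# trim_ratio before the remaining activities and the datastore see them.
set -euo pipefail

# Usage: estimate_monthly_cost GB_INGESTED ACTIVITIES TRIM_RATIO TB_SCANNED
estimate_monthly_cost() {
  local gb_in=$1 activities=$2 trim_ratio=$3 tb_scanned=$4
  awk -v gb="$gb_in" -v n="$activities" -v r="$trim_ratio" -v tb="$tb_scanned" 'BEGIN {
    ingest  = gb * 0.08                             # channel ingestion, full size
    trimmed = gb * (1 - r)                          # size after the trimming activity
    process = gb * 0.05 + (n - 1) * trimmed * 0.05  # first activity sees full size
    storage = trimmed * 0.023                       # one month of datastore storage
    query   = tb * 5                                # dataset SQL, per TB scanned
    printf "%.2f\n", ingest + process + storage + query
  }'
}

# 100 GB/month, 4 activities, 0.5 TB scanned: no trimming vs. 40% smaller messages
estimate_monthly_cost 100 4 0 0.5
estimate_monthly_cost 100 4 0.4 0.5
```

In this model the two calls print 32.80 and 25.88 (dollars): trimming first saves about a fifth of the bill, mostly on downstream processing, with a smaller storage saving.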


Interview Focus Points

1. Walk me through the data flow from an IoT device publishing a message to a queryable dataset in IoT Analytics.
2. How would you enrich device messages with data from an external database using IoT Analytics pipelines?
3. When would you choose IoT Analytics over Kinesis Data Firehose for IoT telemetry?
4. What is the __dt column in IoT Analytics and why does it matter for queries?
5. How do you schedule a dataset to refresh hourly and deliver results to S3 for a downstream Athena query?
6. Explain how the deviceRegistryEnrich activity reduces the need to embed metadata in device messages.
7. What are the retention period settings in IoT Analytics and how do they affect cost?