IoT Device Management
Onboard, organize, monitor, and remotely manage IoT devices at scale
AWS IoT Device Management is a service for onboarding, organizing, monitoring, and remotely managing IoT devices at scale. It covers the full device lifecycle from provisioning millions of new devices to pushing remote commands, bulk operations, and fleet-wide index queries. Cloud engineers use it to operationalize large IoT deployments where manual device management is not feasible.
Device Provisioning: Five Methods for Onboarding at Scale
IoT Device Management supports several provisioning strategies. The right choice depends on whether devices are pre-provisioned in manufacturing, provisioned in the field, or provisioned by untrusted actors.
| Provisioning Method | How It Works | Best For |
|---|---|---|
| Single-thing provisioning | You call CreateThing, CreateKeysAndCertificate, AttachPolicy, AttachThingPrincipal via API or console | Development, small batches, individually managed devices |
| Bulk provisioning | Upload a newline-delimited JSON file of device records (one object per device) to S3; AWS runs a provisioning template against each record in batch | Factory provisioning of thousands of devices before shipping |
| Fleet provisioning (by claim) | Device ships with a claim certificate; on first boot it calls RegisterThing which exchanges the claim for a unique certificate | Field provisioning by untrusted installers; devices installed by end customers |
| Just-in-Time Provisioning (JITP) | Device connects with a certificate from a registered CA; IoT Core auto-runs a provisioning template on first connect | Factory-issued CA certificates; fully automated zero-touch provisioning |
| Just-in-Time Registration (JITR) | Similar to JITP but you handle registration via Lambda triggered by an MQTT event | Custom provisioning logic - validation, external lookups, conditional approval |
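The JITR row above can be sketched as a small Lambda-style handler. This is a minimal sketch, not a reference implementation: it assumes the registration event carries `certificateId`, `caCertificateId`, and `awsAccountId`, and the allow-list `APPROVED_CA_IDS` and the policy name `DevicePolicy` are hypothetical. The IoT client is passed in so the logic can be exercised without AWS access; in a real function it would be `boto3.client("iot")`.

```python
import os

# Hypothetical allow-list of CA certificate IDs approved by manufacturing.
APPROVED_CA_IDS = {"example-ca-id"}


def handle_certificate_registered(event, iot_client, policy_name="DevicePolicy"):
    """Activate a pending device certificate if its issuing CA is approved.

    Returns True when the certificate was activated, False when it was
    left untouched (and therefore still unusable).
    """
    certificate_id = event["certificateId"]

    # Custom validation step: reject certificates from unknown CAs.
    if event["caCertificateId"] not in APPROVED_CA_IDS:
        return False

    # Lambda sets AWS_REGION; the fallback is only for local runs (assumption).
    region = os.environ.get("AWS_REGION", "us-east-1")
    certificate_arn = (
        f"arn:aws:iot:{region}:{event['awsAccountId']}:cert/{certificate_id}"
    )

    # Attach the device policy first, then activate the certificate.
    iot_client.attach_policy(policyName=policy_name, target=certificate_arn)
    iot_client.update_certificate(certificateId=certificate_id, newStatus="ACTIVE")
    return True
```

The validation step is where external lookups or conditional approval would go; anything that fails the check simply stays inactive.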
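Fleet provisioning claim certificates must be handled carefully. A claim certificate is shared by many devices, so if one is leaked an attacker can attempt to provision devices in your account. Scope the claim certificate's IoT policy to only the fleet provisioning MQTT topics (certificate creation and RegisterThing for your template), validate each request with a pre-provisioning hook Lambda, and keep the permissions of the role referenced by provisioningRoleArn as narrow as possible. Deactivate and replace a claim certificate promptly if you suspect it has been compromised.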
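A tightly scoped claim policy might look like the sketch below. It assumes the device uses the certificate-creation topics and the provisioning topics of a single template; the region, account ID, and template name are placeholders, and the function name `build_claim_policy` is only for illustration.

```python
import json


def build_claim_policy(region, account_id, template_name):
    """Return an IoT policy document limited to fleet provisioning topics."""
    prefix = f"arn:aws:iot:{region}:{account_id}"
    # Reserved topics used while exchanging the claim for a unique certificate.
    topics = [
        "$aws/certificates/create/*",
        f"$aws/provisioning-templates/{template_name}/provision/*",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["iot:Connect"], "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": ["iot:Publish", "iot:Receive"],
                "Resource": [f"{prefix}:topic/{t}" for t in topics],
            },
            {
                "Effect": "Allow",
                "Action": ["iot:Subscribe"],
                "Resource": [f"{prefix}:topicfilter/{t}" for t in topics],
            },
        ],
    }


if __name__ == "__main__":
    policy = build_claim_policy("us-east-1", "123456789012", "FleetTemplate")
    print(json.dumps(policy, indent=2))
```

Because the policy grants nothing outside the provisioning topics, a leaked claim certificate cannot publish telemetry or read shadows; devices that submit a CSR instead would need the corresponding create-from-csr topic added.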
Fleet Indexing: Query Your Entire Device Fleet with Search Queries
Fleet indexing creates a searchable index of your Things, Thing Groups, Device Shadows, and Device Defender violations. You query the index with a search query string (field:value terms and comparisons combined with AND/OR, rather than SQL) to find devices by any indexed attribute or state.
# Enable fleet indexing with shadow and connectivity data
# Note: fields you add yourself go under customFields; managedFields are maintained by AWS
aws iot update-indexing-configuration \
  --thing-indexing-configuration '{
    "thingIndexingMode": "REGISTRY_AND_SHADOW",
    "thingConnectivityIndexingMode": "STATUS",
    "deviceDefenderIndexingMode": "VIOLATIONS",
    "customFields": [
      {"name": "shadow.reported.temperature", "type": "Number"},
      {"name": "shadow.reported.firmware_version", "type": "String"}
    ]
  }'
# Query: find all connected devices running firmware older than 2.1.0
aws iot search-index \
  --index-name "AWS_Things" \
  --query-string "connectivity.connected:true AND shadow.reported.firmware_version < \"2.1.0\""
# Query: find devices with temperature above 75 that are in the FactoryFloor group
aws iot search-index \
  --index-name "AWS_Things" \
  --query-string "thingGroupNames:FactoryFloor AND shadow.reported.temperature > 75"
# Get aggregation stats across the fleet
aws iot get-statistics \
  --index-name "AWS_Things" \
  --query-string "connectivity.connected:true" \
  --aggregation-field "shadow.reported.temperature"
Fleet indexing has an additional cost per indexed field operation. Only index the shadow fields you actually query. Indexing every shadow field in a large fleet can add significant cost. Review the indexed field list quarterly and remove unused fields.
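One caveat with the firmware query above: dotted version strings do not order correctly when compared as plain strings ("2.10.0" sorts before "2.9.0"). A possible workaround, sketched here under assumptions, is to have devices also report a numeric build number and index that as a Number field. The field name `shadow.reported.firmware_build` and the encoding (two digits each for minor and patch) are hypothetical choices, not part of the service.

```python
def firmware_build_number(version):
    """Encode 'major.minor.patch' as one integer, e.g. '2.1.0' -> 20100."""
    major, minor, patch = (int(part) for part in version.split("."))
    if not (0 <= minor < 100 and 0 <= patch < 100):
        raise ValueError("minor and patch must be in the range 0-99")
    return major * 10000 + minor * 100 + patch


def older_than_query(version, field="shadow.reported.firmware_build"):
    """Build a query string for connected devices below the given version."""
    return f"connectivity.connected:true AND {field}<{firmware_build_number(version)}"


if __name__ == "__main__":
    # Plain string comparison gets this wrong; the numeric encoding does not.
    print("2.10.0" < "2.9.0")
    print(firmware_build_number("2.10.0") > firmware_build_number("2.9.0"))
    print(older_than_query("2.1.0"))
```

The resulting string can be passed to `--query-string` in place of the version-string comparison, provided the numeric field has been added to the indexing configuration.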
IoT Jobs: Remote Operations with Controlled Rollouts
IoT Jobs enable you to send remote operations (firmware updates, configuration changes, diagnostic commands) to a target set of devices. Jobs support rollout rate control, abort criteria, and retry policies to protect fleet availability.
| Jobs Feature | Description | Why It Matters |
|---|---|---|
| Job document | JSON document stored in S3 describing the operation - typically a download URL + commands | The device agent reads this to know what to do |
| Target | A Thing, Thing Group, or explicit list of Thing ARNs | Scope the job to the right devices; Thing Groups enable dynamic membership |
| Rollout config | Controls how many devices receive the job per minute; supports exponential ramp-up | Prevents a bad firmware from bricking your entire fleet simultaneously |
| Abort criteria | Cancels the job if a percentage of devices fail or time out | Automatic safety net - bad firmware update stops before reaching all devices |
| Retry criteria | Retries the job for devices that fail or time out | Handles transient connectivity failures during OTA updates |
| Continuous jobs | Job runs on all new devices that join a Thing Group dynamically | Bootstrap configuration - ensures new devices always get the standard config |
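The job document format is up to you, since only your device agent interprets it. As an illustration, the sketch below builds a firmware-update document; every key in it (`operation`, `version`, `firmwareUrl`, `sha256`, `steps`) is a hypothetical schema chosen for this example, not something AWS IoT defines.

```python
import json


def build_firmware_job_document(version, firmware_url, sha256_hex):
    """Return a job document dict describing a firmware update (hypothetical schema)."""
    digest = sha256_hex.lower()
    # A SHA-256 digest is 64 hexadecimal characters; reject anything else early.
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("sha256_hex must be 64 hexadecimal characters")
    return {
        "operation": "firmware_update",
        "version": version,
        "firmwareUrl": firmware_url,
        "sha256": digest,
        "steps": ["download", "verify", "install", "reboot"],
    }


if __name__ == "__main__":
    doc = build_firmware_job_document(
        "2.2.0",
        "https://example.com/firmware/2.2.0.bin",  # placeholder URL
        "ab" * 32,  # placeholder digest
    )
    print(json.dumps(doc, indent=2))
```

Including a checksum in the document lets the device agent verify the download before installing, which pairs well with the abort criteria described in the table.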
# Create a firmware update job with controlled rollout and abort criteria
# (job IDs may contain only letters, digits, hyphens, and underscores;
#  the job document is referenced by its S3 object URL)
aws iot create-job \
  --job-id "firmware-2-2-0-rollout" \
  --targets "arn:aws:iot:us-east-1:123456789012:thinggroup/ProductionDevices" \
  --document-source "https://s3.us-east-1.amazonaws.com/my-firmware-bucket/jobs/firmware-2.2.0.json" \
  --job-executions-rollout-config '{
    "exponentialRate": {
      "baseRatePerMinute": 10,
      "incrementFactor": 1.5,
      "rateIncreaseCriteria": {
        "numberOfSucceededThings": 50
      }
    }
  }' \
  --abort-config '{
    "criteriaList": [{
      "failureType": "FAILED",
      "action": "CANCEL",
      "thresholdPercentage": 5.0,
      "minNumberOfExecutedThings": 20
    }]
  }' \
  --timeout-config '{"inProgressTimeoutInMinutes": 30}'
Design your abort criteria with a minNumberOfExecutedThings that is large enough to be statistically meaningful but small enough to catch problems early. With minNumberOfExecutedThings set to 20 and thresholdPercentage set to 5, the abort rule is not evaluated until 20 devices have executed the job; from then on, a failure rate of 5% or more (as little as 1 failure among the first 20) cancels the job. That is appropriate for a fleet of 10,000.
Thing Groups, Thing Types, and Billing Groups
Organizing your device fleet is essential for operational management. IoT Device Management provides three complementary constructs for categorizing devices: thing types, thing groups (static or dynamic), and billing groups.
| Concept | Purpose | Dynamic vs Static |
|---|---|---|
| Thing Type | Template for common attributes shared by a device model (e.g., "TemperatureSensor_v3"); enforces consistent attribute names | Static - defined at creation, one type per thing |
| Static Thing Group | Manually curated group; membership managed explicitly via API or console | Static - you add/remove devices explicitly |
| Dynamic Thing Group | Fleet index search query; membership updates automatically as device attributes/state changes | Dynamic - membership based on real-time indexed state |
| Billing Group | Associates a device with a cost allocation tag for billing attribution | Static - one billing group per thing |
Dynamic Thing Groups require fleet indexing to be enabled. They are extremely powerful for operational targeting - for example, create a dynamic group "devices with firmware < 2.0" and target all IoT Jobs at that group. As devices update, they automatically leave the group without any manual management.
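The firmware example above can be sketched as follows. The request builder produces the arguments one might pass to boto3's `create_dynamic_thing_group`; the numeric field `shadow.reported.firmware_build` and the group name are hypothetical. The second function only simulates membership locally to show how devices drop out of the group once they update.

```python
def dynamic_group_request(group_name, below_build,
                          build_field="shadow.reported.firmware_build"):
    """Build keyword arguments for creating a dynamic thing group."""
    return {
        "thingGroupName": group_name,
        "indexName": "AWS_Things",
        "queryString": f"{build_field}<{below_build}",
    }


def current_members(reported_builds, below_build):
    """Local simulation: names of devices whose build is below the cutoff."""
    return sorted(name for name, build in reported_builds.items()
                  if build < below_build)


if __name__ == "__main__":
    request = dynamic_group_request("FirmwareBelow2", 20000)
    print(request)
    # With AWS access this could be sent as (not executed here):
    #   boto3.client("iot").create_dynamic_thing_group(**request)

    builds = {"Device001": 10900, "Device002": 20100}
    print(current_members(builds, 20000))
    builds["Device001"] = 20100  # device finishes its update
    print(current_members(builds, 20000))
```

Targeting a continuous job at such a group means the set of devices still needing the update shrinks on its own as the rollout progresses.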
Bulk Operations: Managing Thousands of Devices at Once
When you need to perform operations across thousands or millions of devices, individual API calls are not practical. IoT Device Management provides bulk operations for Thing CRUD, group management, and certificate operations.
# Bulk provision devices from an input file in S3
# The input file is newline-delimited JSON: one object per device,
# with keys matching the parameters of the provisioning template
# mydevices.json:
# {"ThingName": "Device001", "SerialNumber": "SN123", "Location": "warehouse-a"}
# {"ThingName": "Device002", "SerialNumber": "SN124", "Location": "warehouse-b"}
# Start bulk provisioning job
aws iot start-thing-registration-task \
  --template-body file://provisioning-template.json \
  --input-file-bucket "my-provisioning-bucket" \
  --input-file-key "batch/mydevices.json" \
  --role-arn "arn:aws:iam::123456789012:role/iot-provisioning-role"
# Monitor progress
aws iot describe-thing-registration-task \
  --task-id <task-id>
# List failures for a bulk task
aws iot list-thing-registration-task-reports \
  --task-id <task-id> \
  --report-type ERRORS
| Bulk Operation | API | Max Batch Size |
|---|---|---|
| Bulk Thing provisioning | StartThingRegistrationTask | One input file per task, one JSON record per device; AWS manages batching internally |
| Add Things to Group | AddThingToThingGroup (one thing per call) or UpdateThingGroupsForThing (several groups for one thing) | One thing per call; script the calls or use dynamic thing groups for fleet-scale membership changes |
| Bulk certificate update | UpdateCertificate | One certificate per call; loop for large batches |
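StartThingRegistrationTask reads a newline-delimited JSON input file, so a CSV export from manufacturing needs converting first. A minimal sketch of that conversion is below; the column names are assumptions and must match the parameter names declared in your provisioning template.

```python
import csv
import io
import json


def csv_to_registration_input(csv_text, field_names):
    """Convert headerless CSV rows to newline-delimited JSON, one object per device."""
    lines = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # skip blank lines
        values = [cell.strip() for cell in row]
        if len(values) != len(field_names):
            raise ValueError(f"row {row_number}: expected {len(field_names)} columns")
        lines.append(json.dumps(dict(zip(field_names, values))))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = "Device001,SN123,warehouse-a\nDevice002,SN124,warehouse-b\n"
    print(csv_to_registration_input(
        sample, ["ThingName", "SerialNumber", "Location"]), end="")
```

The output can be written to a file, uploaded to the provisioning bucket, and referenced with `--input-file-key`.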
Interview Focus Points
1. Compare fleet provisioning by claim versus Just-in-Time Provisioning (JITP) - when would you use each?
2. How would you design a zero-touch provisioning flow for consumer IoT devices that customers install themselves?
3. Explain IoT Jobs rollout configuration - how do exponential rollout and abort criteria work together?
4. How does fleet indexing enable operational visibility and what are its cost implications?
5. What is the difference between a static Thing Group and a dynamic Thing Group - give a real use case for each?
6. You need to find all 50,000 devices in your fleet running firmware older than version 3.0 and push an update. Walk me through the complete process.
7. How would you use Continuous Jobs to ensure every new device added to a fleet gets a baseline configuration?
8. What are the security risks of fleet provisioning claim certificates and how do you mitigate them?