Ace Cloud Interviews

AWS Internet of Things

IoT Device Management

Onboard, organize, monitor, and remotely manage IoT devices at scale

AWS IoT Device Management is a service for onboarding, organizing, monitoring, and remotely managing IoT devices at scale. It covers the full device lifecycle from provisioning millions of new devices to pushing remote commands, bulk operations, and fleet-wide index queries. Cloud engineers use it to operationalize large IoT deployments where manual device management is not feasible.

Device Provisioning: Five Methods for Onboarding at Scale

IoT Device Management supports several provisioning strategies. The right choice depends on whether devices are pre-provisioned in manufacturing, provisioned in the field, or provisioned by untrusted actors.

| Provisioning Method | How It Works | Best For |
| --- | --- | --- |
| Single-thing provisioning | You call CreateThing, CreateKeysAndCertificate, AttachPolicy, and AttachThingPrincipal via API or console | Development, small batches, individually managed devices |
| Bulk provisioning | Upload a JSON file of device parameters (one object per line) to S3; AWS runs a provisioning template in batch | Factory provisioning of thousands of devices before shipping |
| Fleet provisioning (by claim) | Device ships with a claim certificate; on first boot it uses the claim to request a unique certificate (CreateKeysAndCertificate or CreateCertificateFromCsr), then calls RegisterThing to run the provisioning template | Field provisioning by untrusted installers; devices installed by end customers |
| Just-in-Time Provisioning (JITP) | Device connects with a certificate from a registered CA; IoT Core auto-runs a provisioning template on first connect | Factory-issued CA certificates; fully automated zero-touch provisioning |
| Just-in-Time Registration (JITR) | Similar to JITP, but you handle registration yourself in a Lambda function triggered by an MQTT event | Custom provisioning logic - validation, external lookups, conditional approval |
⚠️

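Fleet provisioning claim certificates must be protected carefully. If a claim certificate is leaked, an attacker can keep registering devices in your account until you deactivate it. Scope the claim certificate's IoT policy to the fleet provisioning MQTT topics for your template only (certificate creation and RegisterThing), keep the role referenced by provisioningRoleArn limited to the resources the template creates, and add a pre-provisioning hook (a Lambda function) that validates each request, for example against an allow list of serial numbers.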
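As a rough illustration of a tightly scoped claim policy, the sketch below prints an IoT policy document that allows only connecting and using the fleet provisioning topics for one template. The function name claim_policy, the template name FleetTemplate, the region, and the account ID are placeholders chosen for this example. It covers the CreateKeysAndCertificate flow; a CSR-based flow would need the create-from-csr topics instead.

```shell
#!/bin/sh
# Print a claim-certificate IoT policy scoped to one provisioning template.
# Arguments (all placeholders for this sketch): region, account ID, template name.
claim_policy() {
  region=$1 account=$2 template=$3
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "iot:Connect", "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": ["iot:Publish", "iot:Receive"],
      "Resource": [
        "arn:aws:iot:${region}:${account}:topic/\$aws/certificates/create/*",
        "arn:aws:iot:${region}:${account}:topic/\$aws/provisioning-templates/${template}/provision/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "iot:Subscribe",
      "Resource": [
        "arn:aws:iot:${region}:${account}:topicfilter/\$aws/certificates/create/*",
        "arn:aws:iot:${region}:${account}:topicfilter/\$aws/provisioning-templates/${template}/provision/*"
      ]
    }
  ]
}
EOF
}

# Example: write the policy to stdout for review.
claim_policy us-east-1 123456789012 FleetTemplate
```

The output can be saved to a file and passed to aws iot create-policy with --policy-document file://claim-policy.json. Because the policy grants nothing beyond the provisioning topics, a leaked claim certificate cannot be used to publish or subscribe to application topics.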

Fleet Indexing: Query Your Entire Device Fleet

Fleet indexing creates a searchable index of your Things, Thing Groups, Device Shadows, and Device Defender violations. You query the index with a simple search syntax (field:value terms, ranges, and AND/OR/NOT operators) rather than SQL, and can find devices by any indexed attribute or state.

bash
# Enable fleet indexing with shadow and connectivity data
aws iot update-indexing-configuration \
  --thing-indexing-configuration '{
    "thingIndexingMode": "REGISTRY_AND_SHADOW",
    "thingConnectivityIndexingMode": "STATUS",
    "deviceDefenderIndexingMode": "VIOLATIONS",
    "customFields": [
      {"name": "shadow.reported.temperature", "type": "Number"},
      {"name": "shadow.reported.firmware_version", "type": "String"}
    ]
  }'

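# Query: find all connected devices running firmware older than 2.1.0
# Caution: firmware_version is indexed as a String, so this is not a semantic-version
# comparison (as plain strings, "10.0.0" sorts before "2.1.0")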
aws iot search-index \
  --index-name "AWS_Things" \
  --query-string "connectivity.connected:true AND shadow.reported.firmware_version < \"2.1.0\""

# Query: find devices with temperature above 75 that are in the FactoryFloor group
aws iot search-index \
  --index-name "AWS_Things" \
  --query-string "thingGroupNames:FactoryFloor AND shadow.reported.temperature > 75"

# Get aggregation stats across the fleet
aws iot get-statistics \
  --index-name "AWS_Things" \
  --query-string "connectivity.connected:true" \
  --aggregation-field "shadow.reported.temperature"
💡

Fleet indexing is billed separately, based on index updates and queries. Only index the shadow fields you actually query. Indexing every shadow field in a large fleet can add significant cost. Review the indexed field list quarterly and remove unused fields.
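The firmware query above compares version strings, which does not order versions the way a human would. One possible workaround, sketched here, is to have devices also report a numeric build value that can be indexed as a Number. The helper name version_to_number and the encoding (major * 10000 + minor * 100 + patch) are assumptions for this example, not part of the service.

```shell
#!/bin/sh
# Convert "MAJOR.MINOR.PATCH" to a single integer, e.g. 2.1.0 -> 20100.
# Assumption: minor and patch are each below 100, so the fields never overlap.
version_to_number() {
  echo "$1" | awk -F. '{ print $1 * 10000 + $2 * 100 + $3 }'
}

version_to_number 2.1.0    # prints 20100
version_to_number 10.0.0   # prints 100000
```

With a numeric field in the shadow, "older than 2.1.0" becomes a plain numeric range (below 20100), and 10.0.0 correctly sorts above 2.1.0.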

IoT Jobs: Remote Operations with Controlled Rollouts

IoT Jobs enable you to send remote operations (firmware updates, configuration changes, diagnostic commands) to a target set of devices. Jobs support rollout rate control, abort criteria, and retry policies to protect fleet availability.

| Jobs Feature | Description | Why It Matters |
| --- | --- | --- |
| Job document | JSON document stored in S3 describing the operation - typically a download URL + commands | The device agent reads this to know what to do |
| Target | A Thing, Thing Group, or explicit list of Thing ARNs | Scope the job to the right devices; Thing Groups enable dynamic membership |
| Rollout config | Controls how many devices receive the job per minute; supports exponential ramp-up | Prevents a bad firmware from bricking your entire fleet simultaneously |
| Abort criteria | Cancels the job if a percentage of devices fail or time out | Automatic safety net - bad firmware update stops before reaching all devices |
| Retry criteria | Retries the job for devices that fail or time out | Handles transient connectivity failures during OTA updates |
| Continuous jobs | Job runs on all new devices that join a Thing Group dynamically | Bootstrap configuration - ensures new devices always get the standard config |
bash
# Create a firmware update job with controlled rollout and abort criteria
aws iot create-job \
  --job-id "firmware-2.2.0-rollout" \
  --targets "arn:aws:iot:us-east-1:123456789012:thinggroup/ProductionDevices" \
  --document-source "https://s3.us-east-1.amazonaws.com/my-firmware-bucket/jobs/firmware-2.2.0.json" \
  --job-executions-rollout-config '{
    "exponentialRate": {
      "baseRatePerMinute": 10,
      "incrementFactor": 1.5,
      "rateIncreaseCriteria": {
        "numberOfSucceededThings": 50
      }
    }
  }' \
  --abort-config '{
    "criteriaList": [{
      "failureType": "FAILED",
      "action": "CANCEL",
      "thresholdPercentage": 5.0,
      "minNumberOfExecutedThings": 20
    }]
  }' \
  --timeout-config '{"inProgressTimeoutInMinutes": 30}'
💡

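Design your abort criteria with a minNumberOfExecutedThings that is large enough to be statistically meaningful but small enough to catch problems early. With minNumberOfExecutedThings set to 20 and thresholdPercentage set to 5%, the job can cancel as soon as 20 executions have completed and at least one of them has failed (1/20 = 5%). That stops a bad release very early in a fleet of 10,000, but a single transient failure can also trip it, so raise the minimum if false aborts are a concern.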
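The arithmetic behind an abort criterion can be checked with a small sketch. The function abort_decision is hypothetical, and it assumes the criterion fires once at least the minimum number of things have executed and the failure percentage is at or above the threshold; the service's exact evaluation may differ.

```shell
#!/bin/sh
# Decide whether an abort criterion would fire.
# Arguments: failed count, executed count, minNumberOfExecutedThings, thresholdPercentage.
# Assumption: fires when executed >= minimum AND failed/executed*100 >= threshold.
abort_decision() {
  awk -v f="$1" -v e="$2" -v m="$3" -v t="$4" 'BEGIN {
    if (e >= m && e > 0 && (f * 100 / e) >= t) print "abort"
    else print "continue"
  }'
}

abort_decision 1 19 20 5.0   # continue: fewer than 20 executions so far
abort_decision 1 20 20 5.0   # abort: 1/20 = 5%
abort_decision 1 40 20 5.0   # continue: 1/40 = 2.5%
```

Running the same check with a larger minimum (for example 100) shows how many failures are tolerated before the job cancels, which helps when tuning for a specific fleet size.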

Thing Groups, Thing Types, and Billing Groups

Organizing your device fleet is essential for operational management. IoT Device Management provides three orthogonal ways to categorize devices.

| Concept | Purpose | Dynamic vs Static |
| --- | --- | --- |
| Thing Type | Template for common attributes shared by a device model (e.g., "TemperatureSensor_v3"); encourages consistent attribute names | Static - defined at creation, one type per thing |
| Static Thing Group | Manually curated group; membership managed explicitly via API or console | Static - you add/remove devices explicitly |
| Dynamic Thing Group | Fleet index query; membership updates automatically as device attributes/state change | Dynamic - membership based on real-time indexed state |
| Billing Group | Associates a device with a cost allocation tag for billing attribution | Static - one billing group per thing |
💡

Dynamic Thing Groups require fleet indexing to be enabled. They are extremely powerful for operational targeting - for example, create a dynamic group "devices with firmware < 2.0" and target all IoT Jobs at that group. As devices update, they automatically leave the group without any manual management.
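A dynamic group like the one described could be created roughly as follows. This is a sketch: the group name FirmwareBelow2 and the numeric shadow field firmware_build (with 2.0.0 encoded as 20000) are assumptions for illustration, and the run wrapper only prints the command unless DRY_RUN=0 is set, so the script can be reviewed without touching an account.

```shell
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to actually call the AWS CLI.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical field: shadow.reported.firmware_build holds a numeric build value.
run aws iot create-dynamic-thing-group \
  --thing-group-name "FirmwareBelow2" \
  --query-string "shadow.reported.firmware_build<20000"
```

A job that targets the resulting group ARN then only reaches devices that still match the query; once a device reports a newer build it drops out of the group.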

Bulk Operations: Managing Thousands of Devices at Once

When you need to perform operations across thousands or millions of devices, individual API calls are not practical. IoT Device Management provides bulk operations for Thing CRUD, group management, and certificate operations.

bash
# Bulk provision devices from a parameters file in S3
# The input file is JSON, one object per line; keys must match the template's parameters
# mydevices.json:
# {"ThingName": "Device001", "SerialNumber": "SN123", "Location": "warehouse-a"}
# {"ThingName": "Device002", "SerialNumber": "SN124", "Location": "warehouse-b"}

# Start bulk provisioning job
aws iot start-thing-registration-task \
  --template-body file://provisioning-template.json \
  --input-file-bucket "my-provisioning-bucket" \
  --input-file-key "batch/mydevices.json" \
  --role-arn "arn:aws:iam::123456789012:role/iot-provisioning-role"

# Monitor progress
aws iot describe-thing-registration-task \
  --task-id <task-id>

# List failures for a bulk task
aws iot list-thing-registration-task-reports \
  --task-id <task-id> \
  --report-type ERRORS
| Bulk Operation | API | Max Batch Size |
| --- | --- | --- |
| Bulk Thing provisioning | StartThingRegistrationTask | Set by the input file in S3; AWS manages batching internally |
| Add Things to Group | AddThingToThingGroup | One thing per call; script the calls, or use dynamic Thing Groups for fleet-scale membership |
| Bulk certificate update | UpdateCertificate | One certificate per call; loop over the certificate list for large batches |
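The registration task reads a parameters file with one JSON object per line. If the device inventory starts out as CSV, a conversion along these lines may help. This is a minimal sketch: the function name csv_to_params is made up for this example, the key names ThingName, SerialNumber, and Location are assumed to match the provisioning template's parameters, and the input is assumed to be simple CSV without quoted fields or embedded commas.

```shell
#!/bin/sh
# Convert "ThingName,SerialNumber,Location" CSV rows on stdin into
# newline-delimited JSON objects on stdout.
# Assumption: no quoted fields, embedded commas, or header row.
csv_to_params() {
  awk -F, '{
    printf "{\"ThingName\": \"%s\", \"SerialNumber\": \"%s\", \"Location\": \"%s\"}\n", $1, $2, $3
  }'
}

printf 'Device001,SN123,warehouse-a\nDevice002,SN124,warehouse-b\n' | csv_to_params
```

The output can be redirected to a file and uploaded to the bucket and key passed to start-thing-registration-task.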
🎯

Interview Focus Points

  1. Compare fleet provisioning by claim versus Just-in-Time Provisioning (JITP) - when would you use each?
  2. How would you design a zero-touch provisioning flow for consumer IoT devices that customers install themselves?
  3. Explain IoT Jobs rollout configuration - how do exponential rollout and abort criteria work together?
  4. How does fleet indexing enable operational visibility and what are its cost implications?
  5. What is the difference between a static Thing Group and a dynamic Thing Group - give a real use case for each?
  6. You need to find all 50,000 devices in your fleet running firmware older than version 3.0 and push an update. Walk me through the complete process.
  7. How would you use Continuous Jobs to ensure every new device added to a fleet gets a baseline configuration?
  8. What are the security risks of fleet provisioning claim certificates and how do you mitigate them?