IoT Device Defender
Audit device configurations and detect anomalous behavior to secure your IoT fleet
AWS IoT Device Defender is a security service that audits IoT device configurations against security best practices and detects anomalous device behavior using static thresholds or machine learning. It covers two distinct functions: Audit (finding configuration violations in your IoT setup, on a schedule or on demand) and Detect (monitoring device behavior at runtime for anomalies). For cloud engineers, it is the primary tool for maintaining security posture across large IoT fleets.
Audit vs Detect: Two Distinct Security Functions
Device Defender has two independent pillars. Audit runs scheduled checks against your IoT configuration in the cloud. Detect monitors device behavior metrics reported by device agents or IoT Core at runtime.
| | Audit | Detect |
|---|---|---|
| What it checks | IoT Core configuration: policies, certificates, logging settings | Device runtime behavior: message rates, connection patterns, port/IP usage |
| When it runs | Scheduled (daily, weekly, etc.) or on-demand | Continuous real-time monitoring |
| Data source | IoT Core service APIs and metadata | Metrics reported by a Device Defender agent on the device, or cloud-side metrics from IoT Core |
| Finding output | Audit findings categorized by severity (Critical, High, Medium, Low) | Violations when metrics deviate from ML model or static thresholds |
| Remediation | Manual, or automated via IoT Core Jobs + Lambda | Alerts via CloudWatch or SNS; quarantine by moving the device into a restricted Thing Group |
Audit Checks: What Gets Scanned and Why
Audit checks cover common IoT security misconfigurations. Each check can be enabled or disabled independently.
| Audit Check | What It Finds | Risk if Violated |
|---|---|---|
| DEVICE_CERTIFICATE_EXPIRING_CHECK | Certificates expiring within 30 days | Devices will lose connectivity when cert expires - can take down entire fleet |
| REVOKED_DEVICE_CERTIFICATE_STILL_ACTIVE_CHECK | Device certificates that their CA has revoked but that are still active in AWS IoT | A device holding a revoked certificate can still connect until the certificate is deactivated |
| LOGGING_DISABLED_CHECK | IoT Core logging not enabled | No visibility into connection failures, rule errors, or policy denials |
| CA_CERTIFICATE_EXPIRING_CHECK | CA certificates used for Just-In-Time Registration expiring within 30 days | New devices cannot auto-register; mass provisioning failure |
| IOT_POLICY_OVERLY_PERMISSIVE_CHECK | Policies with wildcards (*) on topics or actions | A compromised device can publish/subscribe to any topic in the account |
| DEVICE_CERTIFICATE_SHARED_CHECK | Multiple concurrent connections authenticating with the same device certificate | Cannot revoke one device without affecting all; blast radius for compromise is entire group |
| IOT_ROLE_ALIAS_OVERLY_PERMISSIVE_CHECK | Role aliases granting excessive AWS permissions | Compromised device gets broad AWS API access beyond IoT |
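To make the IOT_POLICY_OVERLY_PERMISSIVE_CHECK row concrete, here is a rough local pre-check that flags the most obvious wildcard patterns in a policy document before it is deployed. It is a plain text match and not a reimplementation of the audit check, which runs server-side and is more thorough. The function name `policy_has_wildcards` and both sample policies are hypothetical.

```shell
#!/bin/sh
# Heuristic only: succeeds (exit 0) when the policy JSON contains a bare "*"
# value or the action "iot:*". Narrower wildcards such as "topic/foo/*" are
# deliberately not flagged by this sketch.
policy_has_wildcards() {
    # $1: policy document as a JSON string
    printf '%s' "$1" | grep -Eq '"(iot:)?\*"'
}

# Hypothetical sample policies for illustration.
permissive='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"iot:*","Resource":"*"}]}'
scoped='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"iot:Publish","Resource":"arn:aws:iot:us-east-1:123456789012:topic/telemetry/${iot:Connection.Thing.ThingName}"}]}'

for doc in "$permissive" "$scoped"; do
    if policy_has_wildcards "$doc"; then
        echo "overly permissive"
    else
        echo "ok"
    fi
done
```

Run as written, the loop prints `overly permissive` for the first policy and `ok` for the second. The usual fix for a real finding follows the same idea as the second sample: name specific actions and scope resources with policy variables such as `${iot:Connection.Thing.ThingName}`.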
# Enable selected audit checks and send audit notifications to SNS.
# On the first call for an account, also pass --role-arn (the role the audit runs under).
aws iot update-account-audit-configuration \
  --audit-notification-target-configurations '{
    "SNS": {
      "targetArn": "arn:aws:sns:us-east-1:123456789012:iot-security-alerts",
      "roleArn": "arn:aws:iam::123456789012:role/iot-defender-role",
      "enabled": true
    }
  }' \
  --audit-check-configurations '{
    "DEVICE_CERTIFICATE_EXPIRING_CHECK": {"enabled": true},
    "LOGGING_DISABLED_CHECK": {"enabled": true},
    "IOT_POLICY_OVERLY_PERMISSIVE_CHECK": {"enabled": true},
    "CA_CERTIFICATE_EXPIRING_CHECK": {"enabled": true}
  }'
# Schedule a daily audit
aws iot create-scheduled-audit \
  --scheduled-audit-name "DailySecurityAudit" \
  --frequency "DAILY" \
  --target-check-names DEVICE_CERTIFICATE_EXPIRING_CHECK LOGGING_DISABLED_CHECK IOT_POLICY_OVERLY_PERMISSIVE_CHECK
Detect: Defining and Monitoring Device Behaviors
Detect uses security profiles to define expected behavior for a device or group. Violations are raised when metrics fall outside the defined criteria. You can use ML-based detection (AWS learns the baseline automatically) or static threshold detection.
| Metric | Type | Example Threshold |
|---|---|---|
| Messages sent (cloud-side) | Cloud-side metric from IoT Core | Alert if > 1000 messages in 5 minutes (possible data exfiltration) |
| Messages received (cloud-side) | Cloud-side metric from IoT Core | Alert if > 500 messages received (command injection pattern) |
| Authorization failures | Cloud-side metric from IoT Core | Alert if > 5 authorization failures in 5 minutes (policy probing; 5 minutes is the shortest evaluation window Detect supports) |
| Source IP | Cloud-side metric from IoT Core | Alert if device connects from an address outside the allowed CIDR set |
| Listening TCP ports | Device-side metric (requires agent) | Alert if device opens a port not in the allowed list |
| Listening UDP ports | Device-side metric (requires agent) | Alert if device opens UDP port outside baseline |
| Packets sent/received | Device-side metric (requires agent) | ML-based anomaly detection on traffic volume |
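Device-side metrics such as listening ports and packet counts reach Detect as a JSON report that the agent publishes over MQTT to the reserved topic `$aws/things/THING_NAME/defender/metrics/json`. The sketch below builds a minimal report by hand to show its rough shape. The function name `build_metrics_report` and the sample values are hypothetical, and the field names follow the documented device metrics report format as recalled here, so verify them against the current specification before relying on them.

```shell
#!/bin/sh
# Build a minimal device-side metrics report as a single-line JSON string.
# $1: report id (an integer, commonly the epoch timestamp)
# $2: one listening TCP port number
# $3: packets received, $4: packets sent
build_metrics_report() {
    printf '{"header":{"report_id":%d,"version":"1.0"},' "$1"
    printf '"metrics":{"listening_tcp_ports":{"ports":[{"port":%d}],"total":1},' "$2"
    printf '"network_stats":{"packets_in":%d,"packets_out":%d}}}\n' "$3" "$4"
}

# Example: a device with one open port (8883) and some packet counters.
build_metrics_report 1700000000 8883 120 95
```

Publishing the report is left to the device's MQTT client. A production agent would normally come from an AWS IoT Device SDK or the AWS IoT Device Client rather than hand-built strings.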
# Create a security profile with static threshold behaviors
# (durationSeconds must be one of 300, 600, 900, 3600, or 86400)
aws iot create-security-profile \
  --security-profile-name "TelemetryDeviceProfile" \
  --security-profile-description "Monitors message rates and auth failures" \
  --behaviors '[
    {
      "name": "HighMessageRate",
      "metric": "aws:num-messages-sent",
      "criteria": {
        "comparisonOperator": "greater-than",
        "value": {"count": 1000},
        "durationSeconds": 300,
        "consecutiveDatapointsToAlarm": 2,
        "consecutiveDatapointsToClear": 2
      }
    },
    {
      "name": "AuthFailures",
      "metric": "aws:num-authorization-failures",
      "criteria": {
        "comparisonOperator": "greater-than",
        "value": {"count": 5},
        "durationSeconds": 300
      }
    }
  ]'
# Attach the profile to a Thing Group
aws iot attach-security-profile \
  --security-profile-name "TelemetryDeviceProfile" \
  --security-profile-target-arn "arn:aws:iot:us-east-1:123456789012:thinggroup/TelemetryDevices"
ML-based detection requires at least 14 days of training data before it starts generating meaningful violations. Do not rely on ML Detect immediately after enabling it for a new device fleet; use static thresholds initially, then transition to ML after the learning period.
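For the transition to ML, a behavior replaces the static `comparisonOperator`/`value` pair with an `mlDetectionConfig` block, and the training progress can be polled per profile. The following is a minimal sketch under those assumptions; the function names, the behavior name `MessagesSentML`, and the choice of `HIGH` confidence are illustrative rather than recommended settings.

```shell
#!/bin/sh
# Print the behaviors JSON for a single ML Detect behavior.
# $1: behavior name, $2: metric name, $3: confidence level (LOW, MEDIUM, HIGH)
ml_behavior_json() {
    printf '[{"name":"%s","metric":"%s","criteria":{"mlDetectionConfig":{"confidenceLevel":"%s"},"consecutiveDatapointsToAlarm":2,"consecutiveDatapointsToClear":2}}]\n' "$1" "$2" "$3"
}

# Create an ML-based security profile. $1: profile name (placeholder).
create_ml_profile() {
    aws iot create-security-profile \
        --security-profile-name "$1" \
        --behaviors "$(ml_behavior_json "MessagesSentML" "aws:num-messages-sent" "HIGH")"
}

# Show model status and data collection progress for a profile's behaviors.
show_training_status() {
    aws iot get-behavior-model-training-summaries \
        --security-profile-name "$1" \
        --query 'summaries[].[behaviorName,modelStatus,datapointsCollectionPercentage]' \
        --output text
}
```

Calling `create_ml_profile "TelemetryDeviceProfileML"` once and then `show_training_status "TelemetryDeviceProfileML"` periodically gives a simple way to see when a model has left the pending state, at which point the static-threshold profile can be retired or kept as a backstop.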
Mitigation Actions: Automated Remediation
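Device Defender lets you apply predefined mitigation actions to audit findings and Detect violations. Actions run as a mitigation task started from the console, CLI, or API, so remediation without manual intervention usually means starting that task from automation, for example a Lambda function subscribed to the alert SNS topic.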
| Mitigation Action Type | What It Does | Example Trigger |
|---|---|---|
| ADD_THINGS_TO_THING_GROUP | Moves violating devices to a quarantine group with restrictive policy | Device showing anomalous message rates gets isolated from production topics |
| PUBLISH_FINDING_TO_SNS | Publishes finding details to an SNS topic for alerting or downstream automation | Trigger a Lambda that creates a Jira ticket for the security team |
| REPLACE_DEFAULT_POLICY_VERSION | Replaces a device's IoT policy with a blank (deny-all) policy version | Immediately revoke access for a device with active violations |
| UPDATE_CA_CERTIFICATE | Marks a CA certificate as inactive to prevent new device registrations | Compromised CA stops issuing new device credentials |
| UPDATE_DEVICE_CERTIFICATE | Marks a specific device certificate as inactive | Decommission a known-compromised device |
Combine ADD_THINGS_TO_THING_GROUP with a pre-created quarantine Thing Group that has a policy allowing only a specific MQTT topic where the device can report its status. This creates a soft quarantine - the device stays connected but cannot send production data until cleared.
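The soft-quarantine pattern can be sketched with two CLI calls: one that registers the mitigation action and one that applies it to a specific Detect violation. Every name and ARN below (`Quarantine`, `QuarantineDevice`, the IAM role) is a placeholder, and the sketch assumes the quarantine Thing Group and its restrictive policy already exist.

```shell
#!/bin/sh
# Placeholder names and ARNs for illustration only.
QUARANTINE_GROUP="Quarantine"
ACTION_NAME="QuarantineDevice"
ROLE_ARN="arn:aws:iam::123456789012:role/iot-mitigation-role"

# One-time setup: register an action that moves things into the quarantine group.
create_quarantine_action() {
    aws iot create-mitigation-action \
        --action-name "$ACTION_NAME" \
        --role-arn "$ROLE_ARN" \
        --action-params "{\"addThingsToThingGroupParams\":{\"thingGroupNames\":[\"$QUARANTINE_GROUP\"],\"overrideDynamicGroups\":true}}"
}

# Apply the action to one Detect violation.
# $1: unique task id, $2: violation id (for example from list-active-violations)
quarantine_violation() {
    aws iot start-detect-mitigation-actions-task \
        --task-id "$1" \
        --target "{\"violationIds\":[\"$2\"]}" \
        --actions "$ACTION_NAME"
}
```

Violation IDs can be obtained with `aws iot list-active-violations`. Wrapping `quarantine_violation` in a Lambda or a small script triggered by the SNS alert is one way to close the loop for a message-rate spike.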
Interview Focus Points
1. What is the difference between Device Defender Audit and Detect, and what does each one monitor?
2. How would you automatically quarantine a device that starts sending an unusually high volume of messages?
3. Explain what IOT_POLICY_OVERLY_PERMISSIVE_CHECK catches and how to fix the finding.
4. How long does ML-based anomaly detection take to start working, and what should you use in the meantime?
5. Walk me through a complete incident response flow when Device Defender flags an authorization failure spike from 50 devices simultaneously.
6. Which device-side metrics require installing a Device Defender agent, and which metrics are available with no agent?
7. How would you handle certificate expiry for a fleet of 100,000 devices at scale using Device Defender findings and IoT Jobs?