IoT Device Defender

Audit device configurations and detect anomalous behavior to secure your IoT fleet

AWS IoT Device Defender is a security service that continuously audits IoT device configurations against security best practices and detects anomalous device behavior using static thresholds or machine learning. It covers two distinct functions: audit (finding configuration violations in your IoT setup) and detect (monitoring device behavior at runtime for anomalies). For cloud engineers, it is the primary tool for maintaining security posture across large IoT fleets.

Audit vs Detect: Two Distinct Security Functions

Device Defender has two independent pillars. Audit runs scheduled checks against your IoT configuration in the cloud. Detect monitors device behavior metrics reported by device agents or IoT Core at runtime.

	Audit	Detect
What it checks	IoT Core configuration: policies, certificates, logging settings	Device runtime behavior: message rates, connection patterns, port/IP usage
When it runs	Scheduled (daily, weekly, etc.) or on-demand	Continuous real-time monitoring
Data source	IoT Core service APIs and metadata	Metrics reported by a Device Defender agent on the device, or cloud-side metrics from IoT Core
Finding output	Audit findings categorized by severity (Critical, High, Medium, Low)	Violations when metrics deviate from ML model or static thresholds
Remediation	Manual fixes, mitigation actions, or custom automation with IoT Jobs and Lambda	Alerts via CloudWatch or SNS; quarantine by moving the device into a restricted thing group

Audit Checks: What Gets Scanned and Why

Audit checks cover common IoT security misconfigurations. Each check can be enabled or disabled independently.

Audit Check	What It Finds	Risk if Violated
DEVICE_CERTIFICATE_EXPIRING_CHECK	Certificates expiring within 30 days	Devices will lose connectivity when cert expires - can take down entire fleet
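REVOKED_DEVICE_CERTIFICATE_STILL_ACTIVE_CHECK	Device certificates that are on their CA's revocation list but still active in AWS IoT	A device with a revoked certificate can keep connecting until the certificate is deactivated in AWS IoT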
LOGGING_DISABLED_CHECK	IoT Core logging not enabled	No visibility into connection failures, rule errors, or policy denials
CA_CERTIFICATE_EXPIRING_CHECK	CA certificates used for Just-In-Time Registration expiring within 30 days	New devices cannot auto-register; mass provisioning failure
IOT_POLICY_OVERLY_PERMISSIVE_CHECK	Policies with wildcards (*) on topics or actions	A compromised device can publish/subscribe to any topic in the account
DEVICE_CERTIFICATE_SHARED_CHECK	Multiple concurrent connections using the same device certificate	Cannot revoke one device without affecting all; blast radius for compromise is entire group
IOT_ROLE_ALIAS_OVERLY_PERMISSIVE_CHECK	Role aliases granting excessive AWS permissions	Compromised device gets broad AWS API access beyond IoT

bash

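# Enable a core set of audit checks and send audit notifications to SNS.
# 123456789012 is a placeholder; AWS account IDs are always 12 digits.
# --role-arn is the role Device Defender assumes to read your IoT resources.
# It is needed the first time you configure audit; the role name is a placeholder.
aws iot update-account-audit-configuration \
  --role-arn "arn:aws:iam::123456789012:role/iot-defender-audit-role" \
  --audit-notification-target-configurations '{
    "SNS": {
      "targetArn": "arn:aws:sns:us-east-1:123456789012:iot-security-alerts",
      "roleArn": "arn:aws:iam::123456789012:role/iot-defender-role",
      "enabled": true
    }
  }' \
  --audit-check-configurations '{
    "DEVICE_CERTIFICATE_EXPIRING_CHECK": {"enabled": true},
    "LOGGING_DISABLED_CHECK": {"enabled": true},
    "IOT_POLICY_OVERLY_PERMISSIVE_CHECK": {"enabled": true},
    "CA_CERTIFICATE_EXPIRING_CHECK": {"enabled": true}
  }'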

# Schedule a daily audit
aws iot create-scheduled-audit \
  --scheduled-audit-name "DailySecurityAudit" \
  --frequency "DAILY" \
  --target-check-names DEVICE_CERTIFICATE_EXPIRING_CHECK LOGGING_DISABLED_CHECK IOT_POLICY_OVERLY_PERMISSIVE_CHECK
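
Scheduled audits cover the routine case, but after a policy or certificate change it is often useful to run the same checks immediately. The following is a minimal sketch of that on-demand flow, assuming the checks are already enabled in the account audit configuration. The function names (start_audit, audit_status, list_findings) are illustrative helpers, not AWS CLI commands.

```bash
#!/usr/bin/env bash
# Sketch: run an on-demand audit and read back its findings.
# The helper names below are hypothetical; only the `aws iot ...` calls are real.

# Start an on-demand audit for the given check names and print the task ID.
start_audit() {
  aws iot start-on-demand-audit-task \
    --target-check-names "$@" \
    --query 'taskId' --output text
}

# Print the task status (IN_PROGRESS, COMPLETED, FAILED, or CANCELED).
audit_status() {
  aws iot describe-audit-task --task-id "$1" \
    --query 'taskStatus' --output text
}

# Print one line per finding: check name, severity, non-compliant resource type.
list_findings() {
  aws iot list-audit-findings --task-id "$1" \
    --query 'findings[].[checkName,severity,nonCompliantResource.resourceType]' \
    --output text
}

# Example usage (requires AWS credentials with IoT permissions):
#   task_id=$(start_audit DEVICE_CERTIFICATE_EXPIRING_CHECK LOGGING_DISABLED_CHECK)
#   audit_status "$task_id"      # repeat until it prints COMPLETED
#   list_findings "$task_id"
```

Keeping the task ID around is worthwhile: the same ID is what a mitigation task later uses to target the findings from this audit run.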

Detect: Defining and Monitoring Device Behaviors

Detect uses security profiles to define expected behavior for a device or group. Each behavior states what normal looks like (for example, "at most 1,000 messages sent per 5 minutes"), and a violation is raised when a device's metrics stop matching that description. You can use ML-based detection (AWS learns the baseline automatically) or static threshold detection.

Metric	Type	Example Threshold
Messages sent (cloud-side)	Cloud-side metric from IoT Core	Alert if > 1000 messages in 5 minutes (possible data exfiltration)
Messages received (cloud-side)	Cloud-side metric from IoT Core	Alert if > 500 messages received in 5 minutes (command injection pattern)
Authorization failures	Cloud-side metric from IoT Core	Alert if > 5 authorization failures in 5 minutes (policy probing)
Source IP	Cloud-side metric from IoT Core	Alert if device connects from an address outside the allowed CIDR list
Listening TCP ports	Device-side metric (requires agent)	Alert if device opens a port not in the allowed list
Listening UDP ports	Device-side metric (requires agent)	Alert if device opens UDP port outside baseline
Packets sent/received	Device-side metric (requires agent)	ML-based anomaly detection on traffic volume
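
The source IP and listening-port rows in the table are allow-list metrics: instead of a count, the behavior names the set of values a healthy device is expected to stay inside. A minimal sketch of such behaviors follows. The helper names, the profile name NetworkAllowListProfile, and the example CIDR and ports are assumptions for illustration; the listening-port behavior only produces data if a device-side agent reports that metric.

```bash
#!/usr/bin/env bash
# Sketch: allow-list behaviors for source IP (cloud-side) and TCP ports (device-side).
# Helper names, profile name, CIDR, and ports are hypothetical placeholders.

# Print a behaviors JSON array.
#   $1: allowed CIDR, e.g. "203.0.113.0/24"
#   $2: comma-separated allowed listening ports, e.g. "443,8883"
allowlist_behaviors() {
  local cidr="$1" ports="$2"
  cat <<EOF
[
  {
    "name": "ExpectedSourceCidr",
    "metric": "aws:source-ip-address",
    "criteria": {
      "comparisonOperator": "in-cidr-set",
      "value": {"cidrs": ["${cidr}"]}
    }
  },
  {
    "name": "AllowedTcpPorts",
    "metric": "aws:listening-tcp-ports",
    "criteria": {
      "comparisonOperator": "in-port-set",
      "value": {"ports": [${ports}]}
    }
  }
]
EOF
}

# Create a separate security profile from those behaviors.
create_allowlist_profile() {
  aws iot create-security-profile \
    --security-profile-name "NetworkAllowListProfile" \
    --behaviors "$(allowlist_behaviors "$1" "$2")"
}

# Example usage (requires AWS credentials):
#   create_allowlist_profile "203.0.113.0/24" "443,8883"
```

Because a behavior describes the expected state, "in-cidr-set" and "in-port-set" are the operators to use here: a connection from outside the CIDR, or a listening port outside the list, is what raises the violation.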

bash

# Create a security profile with static thresholds.
# A behavior describes EXPECTED behavior: Detect raises a violation when the
# metric no longer satisfies the criteria. "Alert above 1000 messages" is
# therefore written as "expect less-than-equals 1000".
# durationSeconds is 300 (5 minutes), the shortest evaluation window Detect offers.
aws iot create-security-profile \
  --security-profile-name "TelemetryDeviceProfile" \
  --security-profile-description "Monitors message rates and auth failures" \
  --behaviors '[
    {
      "name": "MessageRateWithinLimit",
      "metric": "aws:num-messages-sent",
      "criteria": {
        "comparisonOperator": "less-than-equals",
        "value": {"count": 1000},
        "durationSeconds": 300,
        "consecutiveDatapointsToAlarm": 2,
        "consecutiveDatapointsToClear": 2
      }
    },
    {
      "name": "AuthFailuresWithinLimit",
      "metric": "aws:num-authorization-failures",
      "criteria": {
        "comparisonOperator": "less-than-equals",
        "value": {"count": 5},
        "durationSeconds": 300
      }
    }
  ]'

# Attach the profile to a Thing Group (123456789012 is a placeholder account ID)
aws iot attach-security-profile \
  --security-profile-name "TelemetryDeviceProfile" \
  --security-profile-target-arn "arn:aws:iot:us-east-1:123456789012:thinggroup/TelemetryDevices"
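
Once a profile is attached, the next practical question is which devices are currently in violation. A small sketch of how that could be queried from the CLI is shown below. The helper names are illustrative, and the `--query` projections only pick a few of the fields the API returns.

```bash
#!/usr/bin/env bash
# Sketch: inspect active Detect violations. Helper names are hypothetical.

# List active violations for one security profile:
# thing name, violated behavior name, and when the violation started.
active_violations() {
  aws iot list-active-violations \
    --security-profile-name "$1" \
    --query 'activeViolations[].[thingName,behavior.name,violationStartTime]' \
    --output text
}

# List active violations for a single thing, including the violation ID
# (the ID is what a Detect mitigation task can target).
violations_for_thing() {
  aws iot list-active-violations \
    --thing-name "$1" \
    --query 'activeViolations[].[violationId,securityProfileName,behavior.name]' \
    --output text
}

# Example usage (requires AWS credentials):
#   active_violations "TelemetryDeviceProfile"
#   violations_for_thing "sensor-0042"   # hypothetical thing name
```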

⚠️

ML-based detection requires at least 14 days of training data before it starts generating meaningful violations. Do not rely on ML detect immediately after enabling it for a new device fleet - use static thresholds initially, then transition to ML after the learning period.
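
For the ML side of that transition, a behavior swaps the comparison operator and value for an ML detection setting. The sketch below shows one way this could look, assuming a hypothetical profile name passed by the caller and a single messages-sent behavior. Alerts are suppressed in the sketch so the model can train quietly while static thresholds remain the active alerting path.

```bash
#!/usr/bin/env bash
# Sketch: an ML Detect security profile plus a training-status check.
# Helper names and the behavior name are hypothetical.

# Create a profile whose baseline is learned by ML Detect.
#   $1: security profile name
create_ml_profile() {
  aws iot create-security-profile \
    --security-profile-name "$1" \
    --security-profile-description "ML baseline for messages sent" \
    --behaviors '[
      {
        "name": "MessagesSentML",
        "metric": "aws:num-messages-sent",
        "criteria": {
          "consecutiveDatapointsToAlarm": 1,
          "consecutiveDatapointsToClear": 1,
          "mlDetectionConfig": {"confidenceLevel": "HIGH"}
        },
        "suppressAlerts": true
      }
    ]'
}

# Show model training progress per behavior:
# behavior name, model status, percentage of required datapoints collected.
ml_model_status() {
  aws iot get-behavior-model-training-summaries \
    --security-profile-name "$1" \
    --query 'summaries[].[behaviorName,modelStatus,datapointsCollectionPercentage]' \
    --output text
}

# Example usage (requires AWS credentials):
#   create_ml_profile "TelemetryDeviceProfileML"
#   ml_model_status "TelemetryDeviceProfileML"   # wait for ACTIVE before relying on it
```

Checking the model status before turning alerts back on avoids acting on a model that is still pending its first build.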

Mitigation Actions: Automated Remediation

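Device Defender lets you define mitigation actions in advance and apply them to audit findings or detect violations by starting a mitigation task. The task itself is started by you or by your automation: to remediate without manual intervention, start it from something like a Lambda function that is triggered by the SNS alert.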

Mitigation Action Type	What It Does	Example Trigger
ADD_THINGS_TO_THING_GROUP	Moves violating devices to a quarantine group with restrictive policy	Device showing anomalous message rates gets isolated from production topics
PUBLISH_FINDING_TO_SNS	Publishes finding details to an SNS topic for alerting or downstream automation	Trigger a Lambda that creates a Jira ticket for the security team
REPLACE_DEFAULT_POLICY_VERSION	Replaces a device's IoT policy with a blank (deny-all) policy version	Immediately revoke access for a device with active violations
UPDATE_CA_CERTIFICATE	Deactivates a CA certificate so it can no longer be used to register new device certificates	A compromised CA can no longer onboard new devices
UPDATE_DEVICE_CERTIFICATE	Deactivates a specific device certificate so connections that use it are rejected	Cut off a known-compromised device
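
As a sketch of how one row of this table turns into CLI calls, the example below defines an UPDATE_DEVICE_CERTIFICATE action and applies it to the findings of a completed audit. The action name DeactivateRevokedCert, the helper names, and the role ARN argument are assumptions; the role must allow Device Defender to update certificates on your behalf.

```bash
#!/usr/bin/env bash
# Sketch: define a mitigation action and apply it to audit findings.
# Helper names and the action name are hypothetical.

# Create an action that deactivates the device certificate named in a finding.
#   $1: ARN of the IAM role used to run the action
create_deactivate_cert_action() {
  aws iot create-mitigation-action \
    --action-name "DeactivateRevokedCert" \
    --role-arn "$1" \
    --action-params '{"updateDeviceCertificateParams": {"action": "DEACTIVATE"}}'
}

# Apply the action to every finding of one check in a completed audit.
#   $1: ID you choose for this mitigation task
#   $2: ID of the audit task whose findings should be remediated
mitigate_revoked_certs() {
  aws iot start-audit-mitigation-actions-task \
    --task-id "$1" \
    --target "{\"auditTaskId\": \"$2\"}" \
    --audit-check-to-actions-mapping \
      '{"REVOKED_DEVICE_CERTIFICATE_STILL_ACTIVE_CHECK": ["DeactivateRevokedCert"]}'
}

# Example usage (requires AWS credentials; IDs and ARN are placeholders):
#   create_deactivate_cert_action "arn:aws:iam::123456789012:role/iot-mitigation-role"
#   mitigate_revoked_certs "mitigation-2024-01" "<audit-task-id>"
```

The mapping from check name to action names is what keeps this safe to automate: each action only runs against findings from the check it was designed for.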

💡

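Combine ADD_THINGS_TO_THING_GROUP with a pre-created quarantine Thing Group whose policy leaves the device only a specific MQTT topic where it can report its status. Because AWS IoT evaluates all policies that apply to a device together and an explicit Deny overrides any Allow, the quarantine policy should explicitly deny the production topics rather than just omit them. This creates a soft quarantine - the device stays connected but cannot send production data until cleared.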
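
The soft-quarantine setup described above can be sketched as follows. Everything named here is an assumption for illustration: the group name Quarantine, the policy name QuarantineStatusOnly, the action name QuarantineDevice, and the topic layout (production traffic under prod/, status reports under quarantine/<thing name>/status).

```bash
#!/usr/bin/env bash
# Sketch: soft quarantine with a restrictive thing group policy.
# All resource names and topic prefixes are hypothetical.

# Print the quarantine policy: explicit Deny on production topics,
# Allow publish on one per-device status topic.
#   $1: region, $2: 12-digit account ID
quarantine_policy() {
  local region="$1" account="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": ["iot:Publish", "iot:Receive"],
      "Resource": "arn:aws:iot:${region}:${account}:topic/prod/*"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Publish",
      "Resource": "arn:aws:iot:${region}:${account}:topic/quarantine/\${iot:Connection.Thing.ThingName}/status"
    }
  ]
}
EOF
}

# One-time setup: group, policy, and policy attachment to the group.
setup_quarantine() {
  local region="$1" account="$2"
  aws iot create-thing-group --thing-group-name "Quarantine"
  aws iot create-policy --policy-name "QuarantineStatusOnly" \
    --policy-document "$(quarantine_policy "$region" "$account")"
  aws iot attach-policy --policy-name "QuarantineStatusOnly" \
    --target "arn:aws:iot:${region}:${account}:thinggroup/Quarantine"
}

# One-time setup: mitigation action that moves a device into the group.
#   $1: ARN of the IAM role used to run the action
create_quarantine_action() {
  aws iot create-mitigation-action \
    --action-name "QuarantineDevice" \
    --role-arn "$1" \
    --action-params '{"addThingsToThingGroupParams": {"thingGroupNames": ["Quarantine"], "overrideDynamicGroups": true}}'
}

# Per incident: quarantine the device behind one Detect violation.
#   $1: ID you choose for the mitigation task, $2: violation ID
quarantine_violation() {
  aws iot start-detect-mitigation-actions-task \
    --task-id "$1" \
    --target "{\"violationIds\": [\"$2\"]}" \
    --actions "QuarantineDevice"
}
```

The explicit Deny is the part doing the isolation; the Allow only guarantees the device can still report in. Clearing the quarantine is then a matter of removing the thing from the group once the investigation is complete.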

🎯

Interview Focus Points

1. What is the difference between Device Defender Audit and Detect - what does each one monitor?
2. How would you automatically quarantine a device that starts sending an unusually high volume of messages?
3. Explain what IOT_POLICY_OVERLY_PERMISSIVE_CHECK catches and how to fix the finding.
4. How long does ML-based anomaly detection take to start working and what should you use in the meantime?
5. Walk me through a complete incident response flow when Device Defender flags an authorization failure spike from 50 devices simultaneously.
6. What device-side metrics require installing a Device Defender agent, versus metrics available with no agent?
7. How would you handle certificate expiry for a fleet of 100,000 devices at scale using Device Defender findings and IoT Jobs?