Ace Cloud Interviews

AWS Internet of Things

IoT Device Defender

Audit device configurations and detect anomalous behavior to secure your IoT fleet

AWS IoT Device Defender is a security service that audits IoT device configurations against security best practices and detects anomalous device behavior using either static rules or machine learning (ML). It covers two distinct functions: audit (finding configuration violations in your IoT setup) and detect (monitoring device behavior at runtime for anomalies). For cloud engineers, it is the main AWS-native tool for maintaining security posture across large IoT fleets.

Audit vs Detect: Two Distinct Security Functions

Device Defender has two independent pillars. Audit runs scheduled or on-demand checks against your IoT configuration in the cloud. Detect monitors device behavior metrics, reported by device agents or by IoT Core, at runtime.

| Aspect | Audit | Detect |
|---|---|---|
| What it checks | IoT Core configuration: policies, certificates, logging settings | Device runtime behavior: message rates, connection patterns, port/IP usage |
| When it runs | Scheduled (daily, weekly, biweekly, or monthly) or on-demand | Continuous monitoring at runtime |
| Data source | IoT Core service APIs and metadata | Metrics reported by a Device Defender agent on the device, or cloud-side metrics from IoT Core |
| Finding output | Audit findings categorized by severity (Critical, High, Medium, Low) | Violations when metrics deviate from an ML model or static thresholds |
| Remediation | Manual, audit mitigation actions, or custom automation (e.g. IoT Jobs + Lambda) | SNS alerts and CloudWatch metrics; quarantine via the ADD_THINGS_TO_THING_GROUP mitigation action and a restricted thing group |

Audit Checks: What Gets Scanned and Why

Audit checks cover common IoT security misconfigurations. Each check can be enabled or disabled independently.

| Audit check | What it finds | Risk if violated |
|---|---|---|
| DEVICE_CERTIFICATE_EXPIRING_CHECK | Device certificates that have expired or expire within 30 days | Devices lose connectivity when the certificate expires; if the fleet was provisioned in one batch, most of it can go offline at once |
| REVOKED_DEVICE_CERTIFICATE_STILL_ACTIVE_CHECK | Device certificates that have been revoked but are still active in IoT Core | A compromised or retired device can keep connecting until the certificate is deactivated in IoT Core |
| LOGGING_DISABLED_CHECK | IoT Core logging to CloudWatch is not enabled | No visibility into connection failures, rule errors, or policy denials |
| CA_CERTIFICATE_EXPIRING_CHECK | CA certificates (for example, those used for just-in-time registration) that have expired or expire within 30 days | New devices cannot auto-register; mass provisioning fails |
| IOT_POLICY_OVERLY_PERMISSIVE_CHECK | Policies with wildcards (*) on topics or actions | A compromised device can publish/subscribe to any topic in the account |
| DEVICE_CERTIFICATE_SHARED_CHECK | Multiple devices connecting concurrently with the same certificate | You cannot revoke one device without cutting off all of them; a compromise exposes the whole group |
| IOT_ROLE_ALIAS_OVERLY_PERMISSIVE_CHECK | Role aliases whose IAM role grants excessive AWS permissions | A compromised device gets broad AWS API access beyond IoT |
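A common fix for an IOT_POLICY_OVERLY_PERMISSIVE_CHECK finding is to replace wildcards with resources scoped to the connecting device through IoT policy variables such as ${iot:Connection.Thing.ThingName}. The function below is a minimal sketch that prints such a policy. The function name, the telemetry/ and commands/ topic layout, and the region/account values are hypothetical placeholders, and the sketch assumes each device connects with its thing name as the MQTT client ID and has its certificate attached to that thing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a least-privilege IoT policy in which every
# resource is scoped to the connecting device's own thing name.
# The topic layout (telemetry/<thing>, commands/<thing>) is an assumed example.
scoped_device_policy() {
  local region=$1 account=$2
  local arn="arn:aws:iot:${region}:${account}"
  # IoT Core resolves this policy variable at connect time; the single
  # quotes stop bash from trying to expand it.
  local thing='${iot:Connection.Thing.ThingName}'
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "iot:Connect",
     "Resource": "${arn}:client/${thing}"},
    {"Effect": "Allow", "Action": "iot:Publish",
     "Resource": "${arn}:topic/telemetry/${thing}"},
    {"Effect": "Allow", "Action": "iot:Subscribe",
     "Resource": "${arn}:topicfilter/commands/${thing}"},
    {"Effect": "Allow", "Action": "iot:Receive",
     "Resource": "${arn}:topic/commands/${thing}"}
  ]
}
EOF
}

# Usage (placeholder values):
#   scoped_device_policy us-east-1 123456789012 > device-policy.json
#   aws iot create-policy --policy-name TelemetryDevicePolicy \
#     --policy-document file://device-policy.json
```

Scoping by policy variable keeps one policy for the whole fleet while still limiting each device to its own topics, so a single compromised device cannot reach anyone else's data.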
```bash
# Configure account-level audit settings: the role Device Defender uses to read
# your IoT configuration, SNS notifications, and the checks to enable.
# Account ID, role names, and topic name are placeholders.
aws iot update-account-audit-configuration \
  --role-arn "arn:aws:iam::123456789012:role/iot-defender-audit-role" \
  --audit-notification-target-configurations '{
    "SNS": {
      "targetArn": "arn:aws:sns:us-east-1:123456789012:iot-security-alerts",
      "roleArn": "arn:aws:iam::123456789012:role/iot-defender-role",
      "enabled": true
    }
  }' \
  --audit-check-configurations '{
    "DEVICE_CERTIFICATE_EXPIRING_CHECK": {"enabled": true},
    "LOGGING_DISABLED_CHECK": {"enabled": true},
    "IOT_POLICY_OVERLY_PERMISSIVE_CHECK": {"enabled": true},
    "CA_CERTIFICATE_EXPIRING_CHECK": {"enabled": true}
  }'

# Schedule a daily audit (only checks enabled above can be targeted)
aws iot create-scheduled-audit \
  --scheduled-audit-name "DailySecurityAudit" \
  --frequency "DAILY" \
  --target-check-names DEVICE_CERTIFICATE_EXPIRING_CHECK LOGGING_DISABLED_CHECK \
    IOT_POLICY_OVERLY_PERMISSIVE_CHECK CA_CERTIFICATE_EXPIRING_CHECK
```
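The 30-day window behind DEVICE_CERTIFICATE_EXPIRING_CHECK is plain date arithmetic, which is also handy when deciding how far ahead of a daily audit to start certificate rotation. The function below is a hypothetical sketch that classifies a certificate from its notAfter time and the current time, both as Unix epoch seconds; it reproduces the idea of the check, not Device Defender's implementation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: classify a certificate against a 30-day expiry window.
# Both arguments are Unix epoch seconds.
EXPIRY_WINDOW_DAYS=30

cert_expiry_status() {
  local not_after=$1 now=$2
  local window=$(( EXPIRY_WINDOW_DAYS * 86400 ))
  if [ "$not_after" -le "$now" ]; then
    echo "EXPIRED"
  elif [ $(( not_after - now )) -le "$window" ]; then
    echo "EXPIRING"   # inside the window the expiring-certificate check uses
  else
    echo "OK"
  fi
}

# Usage with GNU date and openssl (cert.pem is a placeholder):
#   not_after=$(date -d "$(openssl x509 -enddate -noout -in cert.pem | cut -d= -f2)" +%s)
#   cert_expiry_status "$not_after" "$(date +%s)"
```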

Detect: Defining and Monitoring Device Behaviors

Detect uses security profiles to define expected behavior for a device or group. Violations are raised when metrics fall outside the defined criteria. You can use ML-based detection (AWS learns the baseline automatically) or static threshold detection.

| Metric | Type | Example threshold |
|---|---|---|
| Messages sent (aws:num-messages-sent) | Cloud-side metric from IoT Core | Alert if > 1000 messages in 5 minutes (possible data exfiltration) |
| Messages received (aws:num-messages-received) | Cloud-side metric from IoT Core | Alert if > 500 messages received in 5 minutes (possible command flooding) |
| Authorization failures (aws:num-authorization-failures) | Cloud-side metric from IoT Core | Alert if > 5 authorization failures in 5 minutes (policy probing) |
| Source IP (aws:source-ip-address) | Cloud-side metric from IoT Core | Alert if the device connects from an IP address outside an allowed CIDR list |
| Listening TCP ports (aws:listening-tcp-ports) | Device-side metric (requires agent) | Alert if the device opens a port not in the allowed list |
| Listening UDP ports (aws:listening-udp-ports) | Device-side metric (requires agent) | Alert if the device opens a UDP port outside the allowed list |
| Packets sent/received (aws:all-packets-out, aws:all-packets-in) | Device-side metric (requires agent) | ML-based anomaly detection on traffic volume |
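A source-IP behavior checks each connection's address against a list of allowed CIDR blocks. The sketch below shows that membership test for IPv4 in plain shell arithmetic, which can help when reasoning about which addresses a given CIDR list would let through. The function names are hypothetical, and Device Defender evaluates the real rule server-side.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: IPv4 CIDR membership, mirroring an allowed-CIDR rule.
# Assumes dotted-decimal addresses without leading zeros (e.g. 10.0.0.1)
# and CIDRs written as a.b.c.d/n.

ip_to_int() {
  local IFS=.
  set -- $1            # split "a.b.c.d" into four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Returns 0 if IP falls inside any allowed CIDR, 1 otherwise (a violation).
source_ip_allowed() {
  local ip=$1 cidr
  shift
  for cidr in "$@"; do
    ip_in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}

# Usage:
#   source_ip_allowed 203.0.113.7 10.0.0.0/8 198.51.100.0/24 || echo "violation"
```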
```bash
# Create a security profile with static-threshold (rules-based) behaviors
# and send violations to SNS. Account ID and names are placeholders.
aws iot create-security-profile \
  --security-profile-name "TelemetryDeviceProfile" \
  --security-profile-description "Monitors message rates and auth failures" \
  --behaviors '[
    {
      "name": "HighMessageRate",
      "metric": "aws:num-messages-sent",
      "criteria": {
        "comparisonOperator": "greater-than",
        "value": {"count": 1000},
        "durationSeconds": 300,
        "consecutiveDatapointsToAlarm": 2,
        "consecutiveDatapointsToClear": 2
      }
    },
    {
      "name": "AuthFailures",
      "metric": "aws:num-authorization-failures",
      "criteria": {
        "comparisonOperator": "greater-than",
        "value": {"count": 5},
        "durationSeconds": 300
      }
    }
  ]' \
  --alert-targets '{
    "SNS": {
      "alertTargetArn": "arn:aws:sns:us-east-1:123456789012:iot-security-alerts",
      "roleArn": "arn:aws:iam::123456789012:role/iot-defender-role"
    }
  }'

# Attach the profile to a Thing Group
aws iot attach-security-profile \
  --security-profile-name "TelemetryDeviceProfile" \
  --security-profile-target-arn "arn:aws:iot:us-east-1:123456789012:thinggroup/TelemetryDevices"
```
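The consecutiveDatapointsToAlarm and consecutiveDatapointsToClear fields add hysteresis: a violation opens only after that many consecutive breaching datapoints and closes only after that many consecutive non-breaching ones. The function below is a hypothetical sketch of that logic for a greater-than behavior over a list of per-period counts; it illustrates the semantics and is not Device Defender's evaluator.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of alarm/clear hysteresis for a "greater-than" behavior.
# Usage: evaluate_behavior THRESHOLD TO_ALARM TO_CLEAR DATAPOINT...
# Prints "ALARM <n>" / "CLEAR <n>" with the 1-based index of the datapoint
# that changed the state.
evaluate_behavior() {
  local threshold=$1 to_alarm=$2 to_clear=$3
  shift 3
  local breaches=0 clears=0 in_violation=0 i=0 dp
  for dp in "$@"; do
    i=$((i + 1))
    if [ "$dp" -gt "$threshold" ]; then
      breaches=$((breaches + 1)); clears=0
      if [ "$in_violation" -eq 0 ] && [ "$breaches" -ge "$to_alarm" ]; then
        in_violation=1
        echo "ALARM $i"
      fi
    else
      clears=$((clears + 1)); breaches=0
      if [ "$in_violation" -eq 1 ] && [ "$clears" -ge "$to_clear" ]; then
        in_violation=0
        echo "CLEAR $i"
      fi
    fi
  done
}

# With the aws:num-messages-sent behavior above (threshold 1000, alarm 2, clear 2):
#   evaluate_behavior 1000 2 2 800 1200 1500 900 950
# prints "ALARM 3" then "CLEAR 5"; a single spike (800 1200 900) prints nothing.
```

Requiring two datapoints in a row trades a few minutes of detection latency for far fewer alerts on one-off bursts.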
⚠️

ML-based detection (ML Detect) needs at least 14 days of training data, and at least 25 devices in the security profile, before it starts generating meaningful violations. Do not rely on ML Detect immediately after enabling it for a new device fleet: use static thresholds initially, then transition to ML after the training period.

Mitigation Actions: Automated Remediation

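Device Defender provides predefined mitigation actions that you can apply to audit findings or detect violations, either from the console or by starting a mitigation actions task through the API or CLI. Wired to SNS notifications and a Lambda function, these actions enable remediation without manual intervention.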

| Mitigation action type | What it does | Example trigger |
|---|---|---|
| ADD_THINGS_TO_THING_GROUP | Adds violating devices to a quarantine thing group with a restrictive policy | A device showing anomalous message rates is isolated from production topics |
| PUBLISH_FINDING_TO_SNS | Publishes finding details to an SNS topic for alerting or downstream automation | Trigger a Lambda function that creates a Jira ticket for the security team |
| REPLACE_DEFAULT_POLICY_VERSION | Adds a blank (deny-all) version as the new default version of the flagged IoT policy, affecting every certificate the policy is attached to | Immediately neutralize a policy flagged by IOT_POLICY_OVERLY_PERMISSIVE_CHECK |
| UPDATE_CA_CERTIFICATE | Marks a CA certificate as inactive in IoT Core so it can no longer be used to register new devices | A CA is suspected to be compromised |
| UPDATE_DEVICE_CERTIFICATE | Deactivates a specific device certificate (sets it to INACTIVE) | Cut off a known-compromised device |
| ENABLE_IOT_LOGGING | Enables IoT Core logging with a specified role and log level | A LOGGING_DISABLED_CHECK finding |
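Each mitigation action type corresponds to one parameter block when you create the action with aws iot create-mitigation-action. The function below is a hypothetical sketch that prints that --action-params JSON per type, including ENABLE_IOT_LOGGING, another built-in action type. The key names reflect our understanding of the CreateMitigationAction API and should be checked against the current reference; the quarantine group name, topic ARN, logging role, and ERROR log level are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the --action-params JSON for a mitigation action type.
# ARG is the thing group name, SNS topic ARN, or logging role ARN, depending on type.
mitigation_params() {
  local type=$1 arg=$2
  case "$type" in
    ADD_THINGS_TO_THING_GROUP)
      printf '{"addThingsToThingGroupParams":{"thingGroupNames":["%s"],"overrideDynamicGroups":false}}\n' "$arg" ;;
    PUBLISH_FINDING_TO_SNS)
      printf '{"publishFindingToSnsParams":{"topicArn":"%s"}}\n' "$arg" ;;
    REPLACE_DEFAULT_POLICY_VERSION)
      printf '{"replaceDefaultPolicyVersionParams":{"templateName":"BLANK_POLICY"}}\n' ;;
    UPDATE_CA_CERTIFICATE)
      printf '{"updateCACertificateParams":{"action":"DEACTIVATE"}}\n' ;;
    UPDATE_DEVICE_CERTIFICATE)
      printf '{"updateDeviceCertificateParams":{"action":"DEACTIVATE"}}\n' ;;
    ENABLE_IOT_LOGGING)
      printf '{"enableIoTLoggingParams":{"roleArnForLogging":"%s","logLevel":"ERROR"}}\n' "$arg" ;;
    *)
      echo "unknown mitigation action type: $type" >&2
      return 1 ;;
  esac
}

# Usage (placeholder names and ARNs):
#   aws iot create-mitigation-action \
#     --action-name QuarantineDevice \
#     --role-arn arn:aws:iam::123456789012:role/iot-mitigation-role \
#     --action-params "$(mitigation_params ADD_THINGS_TO_THING_GROUP Quarantine)"
```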
💡

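Combine ADD_THINGS_TO_THING_GROUP with a pre-created quarantine thing group whose attached policy explicitly denies the production topics and allows publishing only to a dedicated status topic. IoT Core evaluates all policies that apply to a device together, and an explicit Deny overrides any Allow, so the Deny is what actually restricts a device that still holds its normal policy. This creates a soft quarantine: the device stays connected but cannot send production data until it is cleared and removed from the group.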
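A quarantine policy along those lines might pair an explicit Deny on the production topic prefix with an Allow on a per-device status topic. This is a hypothetical sketch: the telemetry/ and quarantine/ topic prefixes, the function name, and the Quarantine group name in the usage comment are assumptions, and the device's iot:Connect permission is assumed to come from its normal policy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a quarantine policy to attach to a thing group.
# The explicit Deny blocks production topics even if another attached policy
# allows them; iot:Connect is assumed to be granted by the device's own policy.
quarantine_policy() {
  local region=$1 account=$2
  local arn="arn:aws:iot:${region}:${account}"
  local thing='${iot:Connection.Thing.ThingName}'   # resolved by IoT Core, not bash
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Deny", "Action": "iot:Publish",
     "Resource": "${arn}:topic/telemetry/*"},
    {"Effect": "Allow", "Action": "iot:Publish",
     "Resource": "${arn}:topic/quarantine/${thing}/status"}
  ]
}
EOF
}

# Usage (placeholder names):
#   quarantine_policy us-east-1 123456789012 > quarantine-policy.json
#   aws iot create-policy --policy-name QuarantinePolicy \
#     --policy-document file://quarantine-policy.json
#   aws iot attach-policy --policy-name QuarantinePolicy \
#     --target arn:aws:iot:us-east-1:123456789012:thinggroup/Quarantine
```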

🎯

Interview Focus Points

1. What is the difference between Device Defender Audit and Detect - what does each one monitor?
2. How would you automatically quarantine a device that starts sending an unusually high volume of messages?
3. Explain what IOT_POLICY_OVERLY_PERMISSIVE_CHECK catches and how to fix the finding.
4. How long does ML-based anomaly detection take to start working and what should you use in the meantime?
5. Walk me through a complete incident response flow when Device Defender flags an authorization failure spike from 50 devices simultaneously.
6. What device-side metrics require installing a Device Defender agent, versus metrics available with no agent?
7. How would you handle certificate expiry for a fleet of 100,000 devices at scale using Device Defender findings and IoT Jobs?