Ace Cloud Interviews

AWS Internet of Things

IoT Greengrass

Extend AWS compute, ML, and messaging to edge devices locally

AWS IoT Greengrass extends AWS services to edge devices, enabling local compute, messaging, data caching, and ML inference to run on-premises even without an internet connection. It uses a component model where you deploy Lambda functions, Docker containers, or native processes to edge hardware. Cloud engineers need to understand Greengrass for architectures requiring low latency, offline resilience, or data sovereignty at the edge.

How Greengrass Works: Core, Components, and Deployments

Greengrass V2 (the current version) uses a component-based architecture. The Greengrass Core software runs on the edge device and manages the lifecycle of deployed components. Components can be AWS-provided (Lambda runner, stream manager, ML inference) or custom-built.

| Concept | Description | Analogy |
| --- | --- | --- |
| Greengrass Core | Software daemon running on the edge device; manages component lifecycle and cloud connectivity | Like a local Kubernetes node agent |
| Component | Deployable unit: can be a Lambda function, Docker container, process, or plugin | Like a Kubernetes pod/deployment spec |
| Deployment | A set of components targeted to a device or group, managed from the cloud | Like a Helm release applied to a node group |
| Thing Group | Logical grouping of Greengrass cores; deploy to the group to reach all members | Like a Kubernetes node selector label |
| Recipe | JSON/YAML manifest defining a component's dependencies, configuration, and lifecycle scripts | Like a Dockerfile + docker-compose config |
| Artifact | Binary assets (code, models, config files) stored in S3 that a component downloads | Like container image layers |

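Deployments are created in the cloud and picked up by Greengrass cores. Each core keeps an outbound MQTT connection to AWS IoT Core and learns of new deployments over it (thing group deployments are delivered as AWS IoT jobs), then downloads and installs or updates the components locally. Because the device initiates every connection, it only needs outbound internet access; no inbound ports are required.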
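As a rough illustration of the deployment model above, the sketch below builds the request body for a deployment that targets a thing group. The field names (`targetArn`, `deploymentName`, `components`, `componentVersion`) are intended to mirror the Greengrass V2 CreateDeployment API and should be verified against the API reference. The helper `build_deployment_request`, the ARN, the account ID, and the component names are hypothetical placeholders.

```python
import json


def build_deployment_request(target_arn, deployment_name, component_versions):
    """Return a dict shaped like a Greengrass V2 CreateDeployment request.

    component_versions maps component name -> version string.
    Nothing is sent to AWS; this only assembles the payload.
    """
    if not component_versions:
        raise ValueError("a deployment needs at least one component")
    return {
        "targetArn": target_arn,
        "deploymentName": deployment_name,
        "components": {
            name: {"componentVersion": version}
            for name, version in sorted(component_versions.items())
        },
    }


# Placeholder values for illustration only.
request = build_deployment_request(
    target_arn="arn:aws:iot:us-east-1:123456789012:thinggroup/factory-floor",
    deployment_name="image-classifier-rollout",
    component_versions={
        "com.example.ImageClassifier": "1.0.0",
        "aws.greengrass.Cli": "2.12.0",
    },
)
print(json.dumps(request, indent=2))
```

With boto3 installed and credentials configured, a dict like this could be passed as keyword arguments to the `greengrassv2` client's `create_deployment` call. Targeting a thing group rather than a single thing is what lets one deployment reach every member core.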

💡

Greengrass V2 was released in December 2020 and supersedes V1. In interviews, clarify which version you are discussing. V2 uses a modular component architecture; V1 used Greengrass groups, Lambda functions, and connectors. AWS recommends that all new deployments use V2.

Edge ML Inference with Greengrass

One of the most common Greengrass use cases is running ML inference locally to avoid round-trip latency to the cloud. Greengrass provides built-in ML components for SageMaker Neo-compiled models and DLR (Deep Learning Runtime).

| ML Component | Framework | Use Case |
| --- | --- | --- |
| DLR Inference | DLR (Deep Learning Runtime); supports TensorFlow, MXNet, PyTorch | General ML inference on compiled models |
| TensorFlow Lite | TensorFlow Lite | Computer vision and NLP on constrained devices |
| Greengrass ML Inference (V1 pattern) | MXNet, TensorFlow via connectors | Legacy V1 pattern; migrate to V2 components |
```bash
# Deploy a SageMaker Neo model to Greengrass V2 devices
# 1. Compile the model with SageMaker Neo for the target hardware.
#    S3Uri must point to a single .tar.gz model archive. DataInputConfig is a
#    JSON string nested inside JSON, so its inner quotes are escaped.
aws sagemaker create-compilation-job \
  --compilation-job-name "ResNet50-RaspberryPi4" \
  --role-arn arn:aws:iam::123456789012:role/sagemaker-role \
  --input-config '{"S3Uri": "s3://my-bucket/resnet50/model.tar.gz", "DataInputConfig": "{\"input\": [1,3,224,224]}", "Framework": "PYTORCH"}' \
  --output-config '{"S3OutputLocation": "s3://my-bucket/compiled/", "TargetPlatform": {"Os": "LINUX", "Arch": "ARM64"}}' \
  --stopping-condition '{"MaxRuntimeInSeconds": 900}'

# 2. Create a Greengrass component referencing the compiled model (recipe.json below).
#    inference.py must also be listed as an artifact, otherwise
#    {artifacts:path}/inference.py will not exist on the device.
#    Both S3 paths are placeholders.
```

```json
{
  "RecipeFormatVersion": "2020-01-25",
  "ComponentName": "com.example.ImageClassifier",
  "ComponentVersion": "1.0.0",
  "ComponentDependencies": {
    "aws.greengrass.DLRImageClassification": { "VersionRequirement": ">=2.0.0" }
  },
  "Manifests": [{
    "Lifecycle": {
      "Run": "python3 {artifacts:path}/inference.py"
    },
    "Artifacts": [
      { "URI": "s3://my-bucket/artifacts/inference.py" },
      { "URI": "s3://my-bucket/compiled/model.tar.gz" }
    ]
  }]
}
```
⚠️

SageMaker Neo compilation must target the exact CPU architecture of your edge device. An ARM64 model will not run on x86_64 and vice versa. Always specify both Os and Arch in the TargetPlatform.
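The architecture warning above can be turned into a small pre-flight check that a component runs before loading a model. This is a minimal sketch: the mapping from `platform.machine()` strings to Neo `Arch` values is an assumption that covers common Linux devices only, and the helper names `neo_arch_for` and `check_target` are hypothetical.

```python
import platform

# Assumed mapping from platform.machine() strings to SageMaker Neo Arch values.
# Extend it for other hardware; 32-bit ARM is assumed to be hard-float.
MACHINE_TO_NEO_ARCH = {
    "aarch64": "ARM64",
    "arm64": "ARM64",
    "x86_64": "X86_64",
    "amd64": "X86_64",
    "armv7l": "ARM_EABIHF",
}


def neo_arch_for(machine):
    """Translate a machine string into the Neo Arch name it is assumed to need."""
    try:
        return MACHINE_TO_NEO_ARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"no known Neo Arch for machine type {machine!r}") from None


def check_target(compiled_arch, machine=None):
    """Return True when a model compiled for compiled_arch suits this machine."""
    machine = machine or platform.machine()
    return neo_arch_for(machine) == compiled_arch.upper()


# A model compiled for ARM64 fits a 64-bit Raspberry Pi OS but not an x86_64 host.
print(check_target("ARM64", machine="aarch64"))
print(check_target("ARM64", machine="x86_64"))
```

Failing fast with a clear message at component startup is usually easier to debug than a runtime loader error from a mismatched binary.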

Local Messaging, IPC, and Offline Resilience

Greengrass provides local pub/sub messaging so components can communicate without reaching the cloud. It also provides stream manager for buffering data when the cloud is unreachable.

| Capability | How It Works | Offline Behavior |
| --- | --- | --- |
| Local MQTT (Moquette broker) | Client devices publish and subscribe to topics on the core's local MQTT broker; the MQTT bridge component relays selected topics to local pub/sub and IoT Core | Local delivery keeps working offline; messages bound for IoT Core are spooled and sent after reconnection, up to the spooler's size limit |
| IPC (Inter-Process Communication) | Direct API between components over Unix domain sockets; supports pub/sub, config, secrets, and local shadow | Always local, no cloud dependency |
| Stream Manager | Persistent queue that stores data streams locally and exports to IoT Analytics, Kinesis Data Streams, IoT SiteWise, or S3 when connected | Buffers up to the configured size; when full, either rejects new data or overwrites the oldest data, depending on the stream's configuration |
| Local Shadow | Device shadow stored on the Greengrass core; syncs to the cloud shadow when connected | Components read/write the shadow locally without cloud dependency |
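Relaying only selected topics to the cloud comes down to matching each message topic against a set of topic filters. The sketch below implements standard MQTT wildcard matching (`+` for one level, `#` for the remaining levels). It is a generic illustration, not Greengrass code: the function names and the example filters are hypothetical, and the special handling of `$`-prefixed topics is omitted.

```python
def topic_matches(topic_filter, topic):
    """Return True if topic matches an MQTT topic filter with + and # wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            # The multi-level wildcard is only valid as the final level.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    # Without a trailing #, both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


def should_bridge(bridge_filters, topic):
    """Return True if any configured filter selects this topic."""
    return any(topic_matches(f, topic) for f in bridge_filters)


# Hypothetical filters: forward all alarms and each line's summary topic.
filters = ["factory/alarms/#", "factory/+/summary"]
for candidate in ["factory/alarms/line1/overheat",
                  "factory/line2/summary",
                  "factory/line2/raw/vibration"]:
    print(candidate, should_bridge(filters, candidate))
```

Keeping high-volume raw topics local and forwarding only summaries or alarms is a common way to limit bandwidth and cloud messaging cost.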
💡

Stream Manager is the key component for offline resilience. Design your edge applications to write to Stream Manager rather than directly to IoT Core or Kinesis. Stream Manager handles the buffering, retry, and ordered delivery when connectivity is restored.
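To make the buffering behavior concrete, here is a toy model of a bounded local stream. It is not the Stream Manager SDK: the class `BoundedStream` and its methods are hypothetical, and the two full-buffer behaviors (reject new data, or overwrite the oldest data) are modeled on the options Stream Manager is understood to offer per stream.

```python
from collections import deque


class BoundedStream:
    """Toy in-memory stand-in for a size-limited local stream."""

    def __init__(self, max_messages, overwrite_oldest=False):
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self.max_messages = max_messages
        self.overwrite_oldest = overwrite_oldest
        self.dropped = 0
        self._buffer = deque()

    def append(self, message):
        """Buffer a message; return False if it was rejected because the stream is full."""
        if len(self._buffer) >= self.max_messages:
            if not self.overwrite_oldest:
                return False
            self._buffer.popleft()  # discard the oldest message to make room
            self.dropped += 1
        self._buffer.append(message)
        return True

    def export(self, connected):
        """Return buffered messages in order and clear them, but only when online."""
        if not connected:
            return []
        exported = list(self._buffer)
        self._buffer.clear()
        return exported


# Five readings arrive while offline; each stream holds at most three.
rejecting = BoundedStream(3)
overwriting = BoundedStream(3, overwrite_oldest=True)
accepted = [rejecting.append(n) for n in range(1, 6)]
for n in range(1, 6):
    overwriting.append(n)

print(accepted)                          # [True, True, True, False, False]
print(rejecting.export(connected=True))  # [1, 2, 3]
print(overwriting.export(connected=True), overwriting.dropped)  # [3, 4, 5] 2
```

Rejecting new data preserves the earliest readings and forces the producer to handle the failure; overwriting keeps the most recent readings at the cost of silent loss. Which is right depends on whether history or freshness matters more for the pipeline.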

Security Model: Token Exchange, Certificates, and Secrets

Greengrass uses a Token Exchange Service (TES) to provide temporary AWS credentials to components. Components assume a configured IAM role via TES rather than having long-lived credentials embedded on the device.
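The sketch below shows roughly how a component could ask TES for credentials by hand. It assumes the component declares a dependency on the token exchange service and that the nucleus then exposes the local endpoint and token through the `AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_CONTAINER_AUTHORIZATION_TOKEN` environment variables. The helper names, the example URL, and the sample response are illustrative only, and no request is actually sent.

```python
import json
import os
import urllib.request

URI_VAR = "AWS_CONTAINER_CREDENTIALS_FULL_URI"
TOKEN_VAR = "AWS_CONTAINER_AUTHORIZATION_TOKEN"


def build_credentials_request(env=None):
    """Build (but do not send) the HTTP request for temporary credentials."""
    env = os.environ if env is None else env
    missing = [name for name in (URI_VAR, TOKEN_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("TES variables not set: " + ", ".join(missing))
    return urllib.request.Request(
        env[URI_VAR], headers={"Authorization": env[TOKEN_VAR]}
    )


def parse_credentials(body):
    """Extract the temporary credential fields from a JSON response body."""
    doc = json.loads(body)
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
        "expiration": doc["Expiration"],
    }


# Made-up environment and response, for illustration only.
fake_env = {
    URI_VAR: "http://localhost:8080/2016-11-01/credentialprovider/",
    TOKEN_VAR: "example-token",
}
req = build_credentials_request(fake_env)
print(req.full_url)

sample_body = json.dumps({
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-session-token",
    "Expiration": "2030-01-01T00:00:00Z",
})
print(parse_credentials(sample_body)["expiration"])
```

In practice component code rarely needs to do this itself, because the AWS SDKs read the same environment variables through their container credentials provider. The point is that the device stores only its X.509 identity, and every AWS credential a component sees is short-lived.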

| Security Mechanism | What It Provides | Best Practice |
| --- | --- | --- |
| Device certificate (X.509) | Authenticates the Greengrass core to IoT Core via mutual TLS | Rotate annually; use AWS IoT Device Defender to monitor expiry |
| Token Exchange Service role | IAM role, reached through an IoT role alias, that components use to call AWS APIs (S3, DynamoDB, etc.) | Scope permissions tightly to only what the components need; use conditions on resource ARNs |
| Secrets Manager integration | Secrets are fetched at deployment time and cached locally on the device | Use for database passwords and API keys; never embed secrets in artifacts |
| Component IPC authorization | Per-component access control policies in the recipe that limit which IPC operations and resources (topics, secrets, shadows) a component may use | Grant each component only the IPC operations it needs; all components on a core share the one TES role, so keep that role minimal |
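```bash
# Check Greengrass core status and component health.
# These cloud API calls can run from any machine with AWS CLI credentials.
aws greengrassv2 get-core-device \
  --core-device-thing-name my-edge-device
aws greengrassv2 list-installed-components \
  --core-device-thing-name my-edge-device

# List the deployments that apply to a core device
aws greengrassv2 list-effective-deployments \
  --core-device-thing-name my-edge-device

# On the edge device: follow the Greengrass service log
sudo journalctl -u greengrass.service -f
# Or follow a component-specific log (the logs directory is root-owned by default):
sudo tail -f /greengrass/v2/logs/com.example.MyComponent.log
```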
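For fleet health checks it can help to post-process the CLI output rather than read it by eye. This minimal sketch filters the JSON printed by `list-installed-components` for components in a failed state. The response shape (`installedComponents`, `componentName`, `lifecycleState`) and the `ERRORED`/`BROKEN` states are assumed from the Greengrass V2 API; the sample output is made up.

```python
import json

# Lifecycle states treated as unhealthy in this sketch.
UNHEALTHY_STATES = {"ERRORED", "BROKEN"}


def unhealthy_components(cli_output):
    """Return (name, state) pairs for components in an unhealthy state."""
    doc = json.loads(cli_output)
    return [
        (component["componentName"], component["lifecycleState"])
        for component in doc.get("installedComponents", [])
        if component.get("lifecycleState") in UNHEALTHY_STATES
    ]


# Made-up sample output for illustration.
sample_output = json.dumps({
    "installedComponents": [
        {"componentName": "aws.greengrass.Nucleus",
         "componentVersion": "2.12.0", "lifecycleState": "FINISHED"},
        {"componentName": "com.example.ImageClassifier",
         "componentVersion": "1.0.0", "lifecycleState": "BROKEN"},
        {"componentName": "aws.greengrass.StreamManager",
         "componentVersion": "2.1.0", "lifecycleState": "RUNNING"},
    ]
})
print(unhealthy_components(sample_output))
```

One way to use it is to pipe the CLI's JSON output into a script that calls `unhealthy_components(sys.stdin.read())` and exits non-zero when the list is not empty.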

Greengrass vs Outposts vs Wavelength: Edge Compute Comparison

| Service | Best For | Hardware Requirement | Latency Target |
| --- | --- | --- | --- |
| IoT Greengrass | Edge ML inference, local device orchestration, offline-capable IoT gateways, constrained hardware | Any supported Linux device, from Raspberry Pi to industrial gateway; no dedicated AWS hardware | Device-local; no network hop, so latency is bounded by on-device inference time |
| AWS Outposts | Full AWS services running on-premises in a datacenter; latency-sensitive enterprise apps needing AWS APIs locally | AWS-provided Outposts rack or 1U/2U servers installed on-premises | Single-digit milliseconds to on-premises AWS infrastructure |
| AWS Wavelength | Ultra-low latency to mobile devices via 5G carrier networks; gaming, AR/VR, autonomous vehicles | AWS compute deployed inside a telecom provider's 5G network | Single-digit milliseconds to devices on the carrier's 5G network |
💡

Interview framing: Greengrass is for device-edge (software on your existing hardware), Outposts is for datacenter-edge (AWS hardware in your building), and Wavelength is for network-edge (AWS hardware in telecom 5G datacenters).

🎯

Interview Focus Points

  1. Explain the difference between Greengrass V1 and V2 component models.
  2. How does Greengrass handle connectivity loss? What happens to data and local processing?
  3. A factory robot needs to run ML inference on camera images with under 50ms latency. Why would Greengrass be preferred over sending images to Rekognition?
  4. How does the Token Exchange Service work and why is it more secure than embedding IAM credentials on edge devices?
  5. Walk me through deploying a new ML model version to 5,000 Greengrass devices without downtime.
  6. What is Stream Manager and how does it provide offline resilience for IoT data pipelines?
  7. Compare Greengrass, Outposts, and Wavelength. When would you choose each?
  8. How would you organize Greengrass devices into Thing Groups to enable phased rollouts of new component versions?