AWS Compute
Lambda
Event-driven serverless compute - write code, AWS handles the infrastructure
AWS Lambda is a serverless compute service that executes your code in response to events - HTTP requests, S3 uploads, DynamoDB changes, SQS messages, and more. You write a function; Lambda handles provisioning, scaling, and high availability. You pay only for the compute time your function actually uses, billed in 1 ms increments.
How Lambda Executes
Lambda runs functions inside isolated execution environments (microVMs powered by Firecracker). When a function is invoked, Lambda either reuses an existing warm environment or creates a new one.
- Cold start: Lambda provisions a new execution environment, downloads the deployment package, initializes the runtime, and runs the function's initialization code (outside the handler). This adds latency - typically 100ms to 1s depending on runtime and package size.
- Warm invocation: An existing environment is reused. The handler runs immediately. Objects declared outside the handler (DB connections, SDK clients) persist across invocations in the same environment.
- Concurrent executions: Lambda scales by creating more execution environments in parallel. Each environment handles one invocation at a time.
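- Burst limits: Each function can scale by up to 1,000 additional concurrent executions every 10 seconds until the account concurrency limit is reached; requests beyond that are throttled. (Before late 2023 the model was an initial burst of 500-3,000 concurrent executions, region-dependent, followed by 500 more per minute.)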
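Because each environment handles one invocation at a time, the number of environments needed at steady state is roughly the request rate multiplied by the average duration. A minimal sketch of that arithmetic follows; the function name `estimate_concurrency` is hypothetical, and the estimate ignores traffic spikes and cold-start overhead.

```python
import math


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Approximate how many execution environments run in parallel at steady state."""
    # One environment serves one invocation at a time, so
    # concurrency ~= arrival rate x time each invocation holds an environment.
    return math.ceil(requests_per_second * avg_duration_seconds)


if __name__ == "__main__":
    # 100 requests per second, each taking about 250 ms
    print(estimate_concurrency(100, 0.25))
```

Comparing this figure against the account concurrency limit gives a quick sense of whether throttling is likely.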
Reuse execution environments by initializing expensive resources (DB connections, HTTP clients) outside the handler. They are cached between invocations in the same environment.
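The reuse pattern can be sketched as follows. This is an illustration under assumptions, not a reference implementation: an in-memory `sqlite3` connection stands in for a real database or SDK client, and `INIT_COUNT` and `_connect` are hypothetical names added only to make the reuse visible.

```python
import sqlite3

INIT_COUNT = 0  # hypothetical counter, only here to show how often setup runs


def _connect():
    """Stand-in for an expensive setup step such as opening a DB connection."""
    global INIT_COUNT
    INIT_COUNT += 1
    return sqlite3.connect(":memory:")


# Module-level code runs once per execution environment (during the cold start),
# not once per invocation.
CONNECTION = _connect()


def handler(event, context):
    # Warm invocations reuse the connection created above.
    row = CONNECTION.execute("SELECT ?", (event.get("value", 0),)).fetchone()
    return {"echo": row[0], "init_count": INIT_COUNT}
```

Calling `handler` repeatedly in the same process leaves `init_count` at 1, which mirrors what happens across warm invocations in one environment. A new environment would run the module-level code again.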
Memory, Timeout, and Concurrency
| Setting | Range | Key Points |
|---|---|---|
| Memory | 128 MB to 10,240 MB | CPU allocation scales proportionally with memory. To get more CPU, increase memory. |
| Timeout | 1 second to 15 minutes | Default is 3 seconds. Set close to your expected duration to fail fast. |
| Ephemeral Storage (/tmp) | 512 MB to 10,240 MB | Files can persist across invocations within the same execution environment, but are not shared between environments and are lost when the environment is recycled. |
| Reserved Concurrency | 0 to account limit | Guarantees a minimum capacity for the function; also acts as a hard cap. |
| Provisioned Concurrency | 1 to reserved limit | Pre-warms environments to eliminate cold starts. Costs more. |
Setting Reserved Concurrency to 0 throttles every invocation, which effectively disables the function. This is useful for stopping runaway functions during incidents.
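As a sketch of that incident "kill switch", the helpers below wrap the Lambda API operations `PutFunctionConcurrency` and `DeleteFunctionConcurrency`. The helper names `stop_function` and `resume_function` are hypothetical, and `lambda_client` is assumed to be a boto3 Lambda client (for example `boto3.client("lambda")`) with permission to change the function's concurrency.

```python
def stop_function(lambda_client, function_name: str):
    """Throttle all new invocations by reserving zero concurrency."""
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )


def resume_function(lambda_client, function_name: str):
    """Remove the reserved-concurrency setting so the function can run again."""
    # Assumption: the function had no reserved concurrency before the incident.
    # If it did, restore the previous value with put_function_concurrency instead.
    return lambda_client.delete_function_concurrency(FunctionName=function_name)
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub before pointing them at a real account.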
Invocation Types
Lambda supports three invocation models that determine how errors and retries behave:
| Type | Who Uses It | Error Handling | Retry Behavior |
|---|---|---|---|
| Synchronous | API Gateway, ALB, SDK direct calls | Error returned to caller | Caller handles retries |
| Asynchronous | S3 events, SNS, EventBridge | Lambda retries twice internally | 2 automatic retries, then DLQ or destination |
| Stream/Queue Poll | SQS, Kinesis, DynamoDB Streams | Depends on source config | Records retried until success or expiry |
- Destinations (Lambda Destinations): route asynchronous invocation results (success or failure) to SNS, SQS, EventBridge, or another Lambda - more flexible than DLQs
- Dead Letter Queues (DLQ): for async invocations, failed events after retries go to an SQS queue or SNS topic for inspection
- Function URLs: built-in HTTPS endpoint for a Lambda function without needing API Gateway
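The asynchronous row of the table (one initial attempt, two automatic retries, then a DLQ or failure destination) can be modeled with a toy simulation. This only illustrates the retry accounting; it is not how the Lambda service is implemented, it omits the delays the service applies between attempts, and `invoke_async` and `dead_letters` are hypothetical names.

```python
from typing import Any, Callable

MAX_ATTEMPTS = 3  # 1 initial attempt + 2 automatic retries


def invoke_async(handler: Callable[[dict], Any], event: dict, dead_letters: list) -> dict:
    """Toy model of asynchronous invocation with retries and a dead-letter list."""
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return {"status": "success", "attempts": attempt, "result": handler(event)}
        except Exception as exc:  # any handler error counts as a failed attempt
            last_error = exc
    # All attempts failed: hand the event over for inspection, like a DLQ would.
    dead_letters.append({"event": event, "error": str(last_error)})
    return {"status": "failed", "attempts": MAX_ATTEMPTS}
```

One practical consequence the model makes visible: a handler invoked asynchronously may run up to three times for the same event, so its side effects should be idempotent.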
Cold Starts and How to Minimize Them
Cold starts add latency to function invocations and are a major concern for latency-sensitive APIs. The main factors and mitigations:
| Factor | Impact | Mitigation |
|---|---|---|
| Runtime | .NET and Java are slowest; Node.js and Python are fastest | Choose lightweight runtimes; for Java, consider Lambda SnapStart or a GraalVM native image |
| Package size | Larger packages take longer to download/unzip | Minimize dependencies; Lambda Layers help share them across functions but still count toward the unzipped size |
| VPC attachment | Was historically slow (added ~10s); fixed in 2019 via pre-created ENIs | Use VPC only when needed; no action required for existing functions |
| Provisioned Concurrency | Eliminates cold starts by keeping environments pre-initialized | Use for latency-sensitive APIs; costs ~1.5x On-Demand |
| Keep-warm (EventBridge ping) | Reduces cold starts for low-traffic functions | Ping every 5 min; cheap but fragile and not fully reliable |
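For the keep-warm row, a handler usually short-circuits on the ping so that warm-up traffic does not run business logic. The sketch below assumes the ping comes from a scheduled EventBridge rule using the default event payload (`"source": "aws.events"`, `"detail-type": "Scheduled Event"`); `is_warmup_ping` and the `order_id` field are hypothetical. If the function also has a genuine scheduled trigger, the ping would need a custom marker instead.

```python
def is_warmup_ping(event: dict) -> bool:
    """Detect the default payload sent by a scheduled EventBridge rule."""
    return (
        event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handler(event, context):
    if is_warmup_ping(event):
        # Return immediately: the only goal was to keep this environment alive.
        return {"warmed": True}
    # Hypothetical business logic for real requests.
    return {"warmed": False, "message": f"processed {event.get('order_id')}"}
```

A single ping keeps only one environment warm, which is part of why the table calls this approach fragile.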
Layers, Container Images, and Deployment
- Deployment package limit: 50 MB zipped for direct upload; larger zip files are uploaded via S3. The unzipped package, including layers, must stay under 250 MB.
- Lambda Layers: up to 5 layers per function, 250 MB total unzipped. Useful for sharing dependencies (SDKs, bindings) across functions.
- Container images: up to 10 GB. Package code + runtime + dependencies into a Docker image pushed to ECR. No 250 MB limit.
- Lambda Extensions: run alongside the function for monitoring, security, and telemetry. Internal extensions run in the same process; external run as a separate process.
- Lambda@Edge: run functions at CloudFront edge locations for Viewer Request, Viewer Response, Origin Request, Origin Response. Viewer triggers are limited to a 5-second timeout and a 1 MB package; origin triggers allow up to 30 seconds and 50 MB. No VPC access.
- CloudFront Functions: lighter, faster JS-only execution for URL rewrites, header manipulation. Sub-millisecond. Cheaper than Lambda@Edge.
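The size limits in the list above suggest a simple decision rule for choosing a packaging format. The following is a rough sketch only: `choose_packaging` and its return labels are hypothetical, and it treats the unzipped size as a proxy for the eventual container image size, which ignores the base image.

```python
MB = 1024 * 1024
GB = 1024 * MB


def choose_packaging(zipped_bytes: int, unzipped_bytes: int) -> str:
    """Pick a deployment format from the size limits listed above."""
    if unzipped_bytes <= 250 * MB:
        # Zip archives work; only the upload path depends on the zipped size.
        return "zip-direct-upload" if zipped_bytes <= 50 * MB else "zip-via-s3"
    if unzipped_bytes <= 10 * GB:
        return "container-image"
    raise ValueError("exceeds the 10 GB container image limit")


if __name__ == "__main__":
    print(choose_packaging(zipped_bytes=30 * MB, unzipped_bytes=120 * MB))
```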
Pricing Model
| Component | Free Tier | Beyond Free Tier |
|---|---|---|
| Requests | 1M requests/month | $0.20 per 1M requests |
| Duration (x86) | 400,000 GB-seconds/month | $0.0000166667 per GB-second |
| Duration (Graviton/arm64) | 400,000 GB-seconds/month | $0.0000133334 per GB-second (20% cheaper) |
| Provisioned Concurrency | Not included | $0.000004646 per GB-second allocated |
Graviton2 (arm64) Lambda functions are 20% cheaper and up to 19% faster. Use arm64 for new functions when your code and dependencies are compatible.
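To make the pricing table concrete, here is a small estimator that applies the request and duration prices listed above. It is a sketch under assumptions: `monthly_cost` is a hypothetical helper, it applies the full free tier to a single function, and it leaves out Provisioned Concurrency, data transfer, and any regional price differences.

```python
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = {"x86_64": 0.0000166667, "arm64": 0.0000133334}
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int,
                 arch: str = "x86_64") -> float:
    """Estimate monthly on-demand cost in USD from the table's prices."""
    # GB-seconds = invocations x seconds per invocation x memory in GB
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    billable_requests = max(0, requests - FREE_REQUESTS)
    return (billable_gb_seconds * PRICE_PER_GB_SECOND[arch]
            + billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS)


if __name__ == "__main__":
    # 5M requests per month, 200 ms average, 512 MB memory
    print(round(monthly_cost(5_000_000, 200, 512), 2))
    print(round(monthly_cost(5_000_000, 200, 512, arch="arm64"), 2))
```

For this workload the function consumes 500,000 GB-seconds, of which 100,000 are billable, so the x86 estimate comes to about $2.47 and the arm64 estimate to about $2.13.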
Interview Focus Points
- Cold start causes and all mitigation strategies (provisioned concurrency, lightweight runtimes, package size)
- Sync vs async vs poll-based invocation - retry behavior and error handling for each
- How memory and CPU are linked - to get more CPU, increase memory
- Reserved vs Provisioned Concurrency - difference and use cases
- Lambda Destinations vs Dead Letter Queues - when to use which
- Lambda@Edge vs CloudFront Functions - limits, latency, and use cases
- Container images for Lambda - benefits and the 10 GB limit
- How Lambda execution environments are reused - initialization code outside handler