Lambda

Event-driven serverless compute - write code, AWS handles the infrastructure

AWS Lambda is a serverless compute service that executes your code in response to events - HTTP requests, S3 uploads, DynamoDB changes, SQS messages, and more. You write a function; Lambda handles provisioning, scaling, and high availability. You pay only for the compute time your function actually uses, billed in 1 ms increments.

How Lambda Executes

Lambda runs functions inside isolated execution environments (microVMs powered by Firecracker). When a function is invoked, Lambda either reuses an existing warm environment or creates a new one.

Cold start: Lambda provisions a new execution environment, downloads the deployment package, initializes the runtime, and runs the function's initialization code (outside the handler). This adds latency - typically 100ms to 1s depending on runtime and package size.
Warm invocation: An existing environment is reused. The handler runs immediately. Objects declared outside the handler (DB connections, SDK clients) persist across invocations in the same environment.
Concurrent executions: Lambda scales by creating more execution environments in parallel. Each environment handles one invocation at a time.
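Burst limits: each function can scale up by 1,000 concurrent executions every 10 seconds until the account concurrency limit is reached; invocations beyond that rate are throttled. (Before late 2023 the model was an initial burst of 500-3,000 depending on region, then 500 more per minute.)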

💡

Reuse execution environments by initializing expensive resources (DB connections, HTTP clients) outside the handler. They are cached between invocations in the same environment.
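
A minimal sketch of this pattern, assuming a stand-in `FakeDbClient` class in place of a real database or SDK client (the class and its `query` method are hypothetical, used only so the example runs without AWS access):

```python
import time


class FakeDbClient:
    """Hypothetical stand-in for an expensive resource such as a DB connection."""

    instances_created = 0  # counts how many times the client was constructed

    def __init__(self):
        FakeDbClient.instances_created += 1
        self.created_at = time.time()

    def query(self, key):
        return {"key": key, "value": f"value-for-{key}"}


# Module-level code runs once per execution environment (the init phase of a
# cold start), so the client below is shared by every warm invocation.
db = FakeDbClient()


def handler(event, context):
    # Handler code runs on every invocation and reuses the module-level client.
    return db.query(event["key"])
```

Calling `handler` repeatedly in the same process constructs the client only once, which mirrors what happens across warm invocations in one environment.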

Memory, Timeout, and Concurrency

Setting	Range	Key Points
Memory	128 MB to 10,240 MB	CPU allocation scales proportionally with memory. To get more CPU, increase memory.
Timeout	1 second to 15 minutes	Default is 3 seconds. Set close to your expected duration to fail fast.
Ephemeral Storage (/tmp)	512 MB to 10,240 MB	Persists across invocations within a single execution environment, but is not shared between environments and is lost when the environment is recycled.
Reserved Concurrency	0 to account limit	Sets aside part of the account's concurrency for this function so other functions cannot consume it; also acts as a hard cap on the function.
Provisioned Concurrency	1 to reserved limit	Pre-warms environments to eliminate cold starts. Costs more.
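
One way to work within the timeout setting is to check how much time is left before starting each unit of work. The sketch below assumes an event shaped like `{"items": [...]}` and a 500 ms safety margin (both are illustrative choices); `get_remaining_time_in_millis()` is the method the Lambda Python runtime exposes on the context object.

```python
SAFETY_MARGIN_MS = 500  # assumed buffer; tune to how long one item takes


def handler(event, context):
    """Process items until the invocation gets close to its timeout."""
    processed = []
    for item in event["items"]:
        # Stop early instead of being killed mid-item by the timeout.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        processed.append(item * 2)  # placeholder for real work
    # Anything not processed is returned so the caller can resubmit it.
    remaining = event["items"][len(processed):]
    return {"processed": processed, "remaining": remaining}
```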

⚠️

Setting Reserved Concurrency to 0 disables the function. This is useful to stop runaway functions during incidents.
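
As a rough sketch of that incident procedure using boto3: the helper names `stop_function` and `resume_function` are hypothetical, while `put_function_concurrency` and `delete_function_concurrency` are the boto3 Lambda client calls. The client is passed in as a parameter so the helpers can be exercised with a stub.

```python
def stop_function(lambda_client, function_name):
    """Set reserved concurrency to 0 so new invocations are throttled."""
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )


def resume_function(lambda_client, function_name):
    """Remove the reserved-concurrency setting so the function can run again."""
    return lambda_client.delete_function_concurrency(FunctionName=function_name)


# Example usage (requires AWS credentials; "my-function" is a placeholder):
#   import boto3
#   stop_function(boto3.client("lambda"), "my-function")
```

Note that `resume_function` removes the setting entirely; if the function had a non-zero reserved value before the incident, it would need to be set again.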

Invocation Types

Lambda supports three invocation models that determine how errors and retries behave:

Type	Who Uses It	Error Handling	Retry Behavior
Synchronous	API Gateway, ALB, SDK direct calls	Error returned to caller	Caller handles retries
Asynchronous	S3 events, SNS, EventBridge	Event is queued by Lambda; errors are not returned to the caller	2 automatic retries, then DLQ or destination
Stream/Queue Poll	SQS, Kinesis, DynamoDB Streams	Depends on source config	Records retried until success or expiry

Destinations (Lambda Destinations): route asynchronous invocation results (success or failure) to SNS, SQS, EventBridge, or another Lambda - more flexible than DLQs
Dead Letter Queues (DLQ): for async invocations, failed events after retries go to an SQS queue or SNS topic for inspection
Function URLs: built-in HTTPS endpoint for a Lambda function without needing API Gateway
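
For the poll-based model, "depends on source config" matters most with SQS: by default one failed record causes the whole batch to be retried. The sketch below assumes the event source mapping has `ReportBatchItemFailures` enabled, in which case the handler can return only the failed message IDs. The `process` function and the `order_id` field are hypothetical placeholders for real business logic.

```python
import json


def process(body):
    # Hypothetical business logic: reject messages without an order_id.
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; the rest of the batch
            # is treated as successfully processed.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

An empty `batchItemFailures` list signals that the whole batch succeeded.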

Cold Starts and How to Minimize Them

Cold starts add latency to function invocations and are a major concern for latency-sensitive APIs. The main factors and mitigations:

Factor	Impact	Mitigation
Runtime	.NET, Java are slowest; Node.js, Python fastest	Choose lightweight runtimes; use GraalVM for Java
Package size	Larger packages take longer to download/unzip	Minimize dependencies and strip unused files; Lambda Layers help share dependencies but still count toward the unzipped size
VPC attachment	Was historically slow (added ~10s); fixed in 2019 via pre-created ENIs	Use VPC only when needed; no action required for existing functions
Provisioned Concurrency	Eliminates cold starts by keeping environments pre-initialized	Use for latency-sensitive APIs; you pay for the configured capacity whether or not it is used
Keep-warm (EventBridge ping)	Reduces cold starts for low-traffic functions	Ping every 5 min; cheap but fragile and not fully reliable
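
A sketch of a handler that supports the keep-warm approach and also records whether an invocation was a cold start. It assumes the ping comes from an EventBridge scheduled rule, whose events carry `"source": "aws.events"`; the response shapes are illustrative.

```python
# Module-level flag: True only until the first invocation in this environment.
_is_cold = True


def handler(event, context):
    global _is_cold
    was_cold = _is_cold
    _is_cold = False

    # Scheduled EventBridge pings only keep the environment alive,
    # so return before doing any real work.
    if event.get("source") == "aws.events":
        return {"warmed": True, "cold_start": was_cold}

    # Normal request path; logging was_cold helps measure cold-start frequency.
    return {"statusCode": 200, "cold_start": was_cold, "body": "hello"}
```

A single ping keeps at most one environment warm, which is one reason this approach is fragile under concurrent traffic.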

Layers, Container Images, and Deployment

Deployment package limit: 50 MB zipped for direct upload; 250 MB unzipped (including layers), with larger zip files uploaded via S3
Lambda Layers: up to 5 layers per function, 250 MB total unzipped. Useful for sharing dependencies (SDKs, bindings) across functions.
Container images: up to 10 GB. Package code + runtime + dependencies into a Docker image pushed to ECR. No 250 MB limit.
Lambda Extensions: run alongside the function for monitoring, security, and telemetry. Internal extensions run in the same process; external run as a separate process.
Lambda@Edge: run functions at CloudFront edge locations for Viewer Request, Viewer Response, Origin Request, Origin Response. Viewer triggers are limited to a 5-second timeout and a 1 MB package; origin triggers allow up to 30 seconds and 50 MB. No VPC access.
CloudFront Functions: lighter, faster JS-only execution for URL rewrites, header manipulation. Sub-millisecond. Cheaper than Lambda@Edge.
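
To illustrate what an edge function looks like, here is a minimal Lambda@Edge request handler in Python. It assumes the standard CloudFront event shape (`Records[0].cf.request`) and a made-up rule that appends `index.html` to directory-style URIs; returning the request object tells CloudFront to continue with the modified request.

```python
def handler(event, context):
    # CloudFront passes the request under Records[0].cf.request.
    request = event["Records"][0]["cf"]["request"]

    # Illustrative rewrite rule: serve index.html for paths ending in "/".
    uri = request["uri"]
    if uri.endswith("/"):
        request["uri"] = uri + "index.html"

    # Returning the request lets CloudFront continue processing it.
    return request
```

A rewrite this simple is usually a better fit for CloudFront Functions, which are JavaScript-only; the Python version here is only meant to show the event structure.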

Pricing Model

Component	Free Tier	Beyond Free Tier
Requests	1M requests/month	$0.20 per 1M requests
Duration (x86)	400,000 GB-seconds/month	$0.0000166667 per GB-second
Duration (Graviton/arm64)	400,000 GB-seconds/month	$0.0000133334 per GB-second (20% cheaper)
Provisioned Concurrency	Not included	$0.000004646 per GB-second allocated

💡

Graviton2 (arm64) Lambda functions are 20% cheaper and up to 19% faster. Use arm64 for new functions when your code and dependencies are compatible.
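
The pricing table can be turned into a rough monthly estimate. This sketch uses the request and duration rates listed above, applies the free tier, and ignores Provisioned Concurrency, data transfer, and regional price differences; the function name `monthly_cost` is illustrative.

```python
REQUEST_PRICE_PER_MILLION = 0.20
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000
GB_SECOND_PRICE = {"x86_64": 0.0000166667, "arm64": 0.0000133334}


def monthly_cost(invocations, avg_duration_ms, memory_mb, arch="x86_64"):
    """Estimate monthly Lambda cost in USD after the free tier."""
    # GB-seconds = invocations x duration in seconds x memory in GB
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    billable_requests = max(0, invocations - FREE_REQUESTS)

    duration_cost = billable_gb_seconds * GB_SECOND_PRICE[arch]
    request_cost = billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    return round(duration_cost + request_cost, 2)
```

For example, 5 million invocations a month at 200 ms and 512 MB come to 500,000 GB-seconds, of which 100,000 are billable: about $2.47 on x86 and about $2.13 on arm64.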

🎯

Interview Focus Points

1. Cold start causes and all mitigation strategies (provisioned concurrency, lightweight runtimes, package size)
2. Sync vs async vs poll-based invocation - retry behavior and error handling for each
3. How memory and CPU are linked - to get more CPU, increase memory
4. Reserved vs Provisioned Concurrency - difference and use cases
5. Lambda Destinations vs Dead Letter Queues - when to use which
6. Lambda@Edge vs CloudFront Functions - limits, latency, and use cases
7. Container images for Lambda - benefits and the 10 GB limit
8. How Lambda execution environments are reused - initialization code outside handler