App Mesh
Service mesh for application-level networking between microservices
AWS App Mesh is a service mesh that provides application-level networking for microservices, giving you consistent visibility, traffic control, and security across services, regardless of which compute platform they run on (ECS, EKS, or EC2). App Mesh runs an Envoy proxy sidecar alongside each service to handle routing, retries, circuit breaking, and TLS/mTLS without requiring application code changes. It integrates with AWS X-Ray and CloudWatch for observability and with ACM (backed by AWS Private CA) for certificates.
Core Concepts: Mesh, Virtual Services, and Virtual Nodes
App Mesh has its own resource model that maps to your service topology. Understanding these resources is essential for configuration and troubleshooting.
| Resource | Purpose | Maps To |
|---|---|---|
| Mesh | Top-level container - defines the service mesh boundary | Your entire application or a bounded context |
| Virtual Node | Logical pointer to an actual service (ECS service, K8s deployment, EC2) | One microservice with its listeners and backends |
| Virtual Service | Abstraction of a real service - what other services call | DNS name other services use to call this service |
| Virtual Router | Routes traffic to one or more virtual nodes | Load balancer / traffic splitter |
| Route | Traffic matching rules (path, headers, method) on a virtual router | Individual routing rules with weights |
| Gateway Route | Routes ingress traffic from a virtual gateway | API gateway routing rules |
| Virtual Gateway | Ingress point for external traffic into the mesh | Ingress controller / API gateway |
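The typical call flow: Service A calls virtual-service-b.mesh.local -> the Envoy sidecar on Service A intercepts the outbound call -> Envoy matches the virtual service name and applies the routes of its virtual router -> the request is forwarded to an endpoint of the selected virtual node for Service B, discovered through DNS or AWS Cloud Map. The virtual service name only has to be resolvable by the application; Envoy, not DNS, decides where the request actually goes.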
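The resource model above can be made concrete as plain data. The following is a minimal sketch, not a deployment script: it builds spec dictionaries for a virtual node, a virtual router, and a virtual service in the shape the App Mesh API takes for its `spec` argument, and it makes no AWS calls. The names `service-a-node`, `service-b-node`, `service-b-router`, `service-b.mesh.local`, the Cloud Map namespace `mesh.local`, and port 8080 are illustrative assumptions.

```python
def virtual_node_spec(port, namespace, service, backends=()):
    """Virtual node: where a real service listens, how its endpoints are
    discovered (Cloud Map here), and which virtual services it may call."""
    return {
        "listeners": [{"portMapping": {"port": port, "protocol": "http"}}],
        "serviceDiscovery": {
            "awsCloudMap": {"namespaceName": namespace, "serviceName": service}
        },
        "backends": [
            {"virtualService": {"virtualServiceName": name}} for name in backends
        ],
    }


def virtual_router_spec(port):
    """Virtual router: listens on a port; routes are attached to it separately."""
    return {"listeners": [{"portMapping": {"port": port, "protocol": "http"}}]}


def virtual_service_spec(router_name):
    """Virtual service: the name callers use, backed here by a virtual router."""
    return {"provider": {"virtualRouter": {"virtualRouterName": router_name}}}


# Hypothetical topology: service A calls service B through the mesh.
resources = {
    "virtualNode/service-b-node": virtual_node_spec(8080, "mesh.local", "service-b"),
    "virtualRouter/service-b-router": virtual_router_spec(8080),
    "virtualService/service-b.mesh.local": virtual_service_spec("service-b-router"),
    "virtualNode/service-a-node": virtual_node_spec(
        8080, "mesh.local", "service-a", backends=["service-b.mesh.local"]
    ),
}

for name, spec in resources.items():
    print(name, spec)
```

Note that Service A's virtual node lists `service-b.mesh.local` as a backend: a virtual node can only send mesh traffic to virtual services it declares this way.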
Envoy Sidecar: Injection and Traffic Interception
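App Mesh works by running an Envoy proxy container alongside your application container. On ECS you add the sidecar to the task definition yourself; on EKS the App Mesh controller can inject it into pods. iptables rules redirect inbound and outbound traffic through Envoy, so application code does not need to change.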
| Component | Function | Configuration |
|---|---|---|
| Envoy sidecar container | Intercepts and manages all traffic | Add as second container in task/pod definition |
| APPMESH_RESOURCE_ARN env var | Tells Envoy which virtual node it represents | Set to the virtual node ARN |
| iptables rules | Redirect inbound traffic to Envoy on port 15000 and outbound traffic on port 15001 | Applied from proxyConfiguration on ECS, or by an init container on EKS |
| Envoy Admin interface | Debug endpoint for Envoy config and stats | Port 9901 - never expose externally |
| xds-grpc | gRPC channel to App Mesh control plane for config | Automatically handled by Envoy |
ECS task definition with Envoy sidecar (abbreviated):

```json
{
  "containerDefinitions": [
    {
      "name": "app",
      "image": "my-app:latest",
      "portMappings": [{ "containerPort": 8080 }],
      "dependsOn": [{ "containerName": "envoy", "condition": "HEALTHY" }]
    },
    {
      "name": "envoy",
      "image": "840364872350.dkr.ecr.us-east-1.amazonaws.com/aws-appmesh-envoy:v1.29.6.0-prod",
      "user": "1337",
      "environment": [
        { "name": "APPMESH_RESOURCE_ARN", "value": "arn:aws:appmesh:us-east-1:123456789012:mesh/my-mesh/virtualNode/service-a-node" }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE"],
        "interval": 5, "timeout": 10, "retries": 10
      }
    }
  ],
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      { "name": "IgnoredUID", "value": "1337" },
      { "name": "ProxyIngressPort", "value": "15000" },
      { "name": "ProxyEgressPort", "value": "15001" },
      { "name": "AppPorts", "value": "8080" }
    ]
  }
}
```

The app container must declare a dependsOn condition for the Envoy container to be HEALTHY. Traffic redirection is already in place when the containers start, so if the app starts before Envoy is ready, its calls are sent to a proxy that is not yet listening and early requests can fail. The Envoy container runs as UID 1337 so that its own traffic matches IgnoredUID and is not redirected back into the proxy.
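The wiring rules in this section (an APPMESH proxy configuration, an Envoy container with a health check and a resource ARN, and app containers that wait for Envoy to be HEALTHY) are easy to get wrong by hand. The sketch below is a hypothetical pre-deployment check over a task definition loaded as a Python dict. The function name `check_mesh_wiring` and the sample task definition are assumptions made for illustration, and the check that the Envoy container's `user` matches `IgnoredUID` reflects the usual setup rather than a rule ECS enforces.

```python
def check_mesh_wiring(task_def):
    """Return a list of human-readable problems; an empty list means OK."""
    containers = {c["name"]: c for c in task_def.get("containerDefinitions", [])}
    proxy = task_def.get("proxyConfiguration")
    if not proxy or proxy.get("type") != "APPMESH":
        return ["missing proxyConfiguration of type APPMESH"]

    proxy_name = proxy.get("containerName")
    envoy = containers.get(proxy_name)
    if envoy is None:
        return [f"proxy container {proxy_name!r} is not defined"]

    problems = []
    if "healthCheck" not in envoy:
        problems.append("Envoy container has no healthCheck")
    env_names = {e["name"] for e in envoy.get("environment", [])}
    if "APPMESH_RESOURCE_ARN" not in env_names:
        problems.append("Envoy container does not set APPMESH_RESOURCE_ARN")

    # Assumption: Envoy should run as the UID excluded from redirection.
    props = {p["name"]: p["value"] for p in proxy.get("properties", [])}
    if "IgnoredUID" in props and envoy.get("user") != props["IgnoredUID"]:
        problems.append("Envoy 'user' does not match IgnoredUID")

    # Every other container should wait for Envoy to report HEALTHY.
    for name, container in containers.items():
        if name == proxy_name:
            continue
        deps = {(d["containerName"], d["condition"])
                for d in container.get("dependsOn", [])}
        if (proxy_name, "HEALTHY") not in deps:
            problems.append(
                f"container {name!r} does not wait for {proxy_name!r} to be HEALTHY"
            )
    return problems


# Hypothetical task definition with one mistake: the app waits for START.
task_def = {
    "containerDefinitions": [
        {"name": "app",
         "dependsOn": [{"containerName": "envoy", "condition": "START"}]},
        {"name": "envoy", "user": "1337",
         "environment": [{"name": "APPMESH_RESOURCE_ARN",
                          "value": "arn:aws:appmesh:us-east-1:123456789012:"
                                   "mesh/my-mesh/virtualNode/service-a-node"}],
         "healthCheck": {"command": ["CMD-SHELL", "true"]}},
    ],
    "proxyConfiguration": {
        "type": "APPMESH", "containerName": "envoy",
        "properties": [{"name": "IgnoredUID", "value": "1337"},
                       {"name": "AppPorts", "value": "8080"}],
    },
}

problems = check_mesh_wiring(task_def)
print(problems)
```

A check like this could run in CI against the rendered task definition before it is registered.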
Traffic Management: Canary Releases and Circuit Breaking
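App Mesh traffic management supports canary releases, A/B testing, and fault tolerance without application changes. Weighted and header-based routing, retries, and timeouts are configured on routes; connection pool limits and outlier detection are configured on virtual node listeners.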
| Feature | Use Case | Configuration |
|---|---|---|
| Weighted routing | Canary deployments - send 10% to v2 | Route with two targets and weights (90/10) |
| Header-based routing | A/B testing - route beta users to v2 | Match on x-user-group: beta header |
| Retry policy | Retry on 5xx or connection errors | maxRetries, perRetryTimeout, httpRetryEvents / tcpRetryEvents |
| Timeout | Set per-route response time limit | timeout.perRequest (and timeout.idle) in route spec |
| Circuit breaker | Stop overloading or sending to unhealthy endpoints | connectionPool limits + outlierDetection on the virtual node listener |
| Outlier detection | Eject failing hosts from load balancing | maxServerErrors (consecutive 5xx), interval, baseEjectionDuration, maxEjectionPercent |
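The circuit breaker and outlier detection rows are set on a virtual node listener rather than on a route. The sketch below builds such a listener as a plain dictionary in the shape the App Mesh API uses for a virtual node spec. The helper names `duration` and `resilient_listener` and every numeric value (100 connections, 5 consecutive errors, a 10 s interval, a 30 s ejection, a 50% ejection cap) are illustrative assumptions, not recommended settings.

```python
def duration(value, unit):
    """App Mesh durations are {"unit": "ms" | "s", "value": int}."""
    if unit not in ("ms", "s"):
        raise ValueError("unit must be 'ms' or 's'")
    return {"unit": unit, "value": value}


def resilient_listener(port, max_connections, max_pending_requests,
                       max_server_errors, max_ejection_percent):
    """HTTP listener with connection pool limits and outlier detection."""
    if not 0 <= max_ejection_percent <= 100:
        raise ValueError("max_ejection_percent must be between 0 and 100")
    return {
        "portMapping": {"port": port, "protocol": "http"},
        # Circuit breaking: cap concurrent connections and queued requests.
        "connectionPool": {
            "http": {
                "maxConnections": max_connections,
                "maxPendingRequests": max_pending_requests,
            }
        },
        # Outlier detection: eject a host after N consecutive 5xx responses,
        # checked every `interval`, for at least `baseEjectionDuration`.
        "outlierDetection": {
            "maxServerErrors": max_server_errors,
            "interval": duration(10, "s"),
            "baseEjectionDuration": duration(30, "s"),
            "maxEjectionPercent": max_ejection_percent,
        },
    }


listener = resilient_listener(
    port=8080, max_connections=100, max_pending_requests=50,
    max_server_errors=5, max_ejection_percent=50,
)
print(listener)
```

Capping `maxEjectionPercent` below 100 keeps some hosts in rotation even when many of them are failing at once, which avoids ejecting the whole fleet during a widespread but transient error.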
```bash
# Virtual router route - weighted canary deployment
aws appmesh create-route \
  --mesh-name my-mesh \
  --virtual-router-name service-a-router \
  --route-name canary-route \
  --spec '{
    "httpRoute": {
      "match": { "prefix": "/" },
      "action": {
        "weightedTargets": [
          { "virtualNode": "service-a-v1-node", "weight": 90 },
          { "virtualNode": "service-a-v2-node", "weight": 10 }
        ]
      },
      "retryPolicy": {
        "maxRetries": 3,
        "perRetryTimeout": { "unit": "ms", "value": 2000 },
        "httpRetryEvents": ["server-error", "gateway-error"],
        "tcpRetryEvents": ["connection-error"]
      }
    }
  }'
```

Mutual TLS and Service-to-Service Authentication
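App Mesh supports TLS and mutual TLS (mTLS) between services. With mTLS both sides present certificates, so only services holding a certificate from a trusted CA can communicate. Server certificates can come from ACM (issued by AWS Private CA), local files, or Envoy's Secret Discovery Service (SDS); client certificates for mTLS come from files or SDS.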
| TLS Mode | Description | Certificate Source |
|---|---|---|
| DISABLED | No TLS - plaintext traffic | N/A |
| PERMISSIVE | Accept both TLS and plaintext - migration mode | ACM, file, or SDS |
| STRICT (TLS) | Require TLS; the client certificate is not verified | ACM, file, or SDS |
| STRICT (mTLS) | Require TLS + verify client certificate | File or SDS - both sides present certificates |
With ACM, certificates issued by AWS Private CA are renewed automatically and picked up by Envoy without restarting containers. With SDS (Secret Discovery Service), Envoy fetches certificates from a local provider such as a SPIRE agent, which also allows rotation without restarts.
Use PERMISSIVE mode during the migration phase when rolling out mTLS to an existing mesh. This lets already-deployed services (without certs yet) continue to communicate while you progressively add certificates. Switch to STRICT only after all services are running with valid certs.
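The migration described above can be written down as two listener configurations. The sketch below builds the `tls` block of a virtual node listener and the matching client policy for a caller, in the shape the App Mesh API uses, with file-based certificates. The helper names and all certificate paths (`/certs/...`) are hypothetical; phase 1 accepts plaintext and TLS, phase 2 requires TLS and validates the client certificate.

```python
VALID_MODES = ("DISABLED", "PERMISSIVE", "STRICT")


def file_cert(chain_path, key_path):
    """File-based certificate source (paths inside the Envoy container)."""
    return {"file": {"certificateChain": chain_path, "privateKey": key_path}}


def listener_tls(mode, certificate, client_ca_path=None):
    """Server side: TLS mode, server certificate, optional client validation."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    tls = {"mode": mode, "certificate": certificate}
    if client_ca_path is not None:
        # Presence of `validation` is what turns TLS into mutual TLS.
        tls["validation"] = {"trust": {"file": {"certificateChain": client_ca_path}}}
    return tls


def client_policy(enforce, ca_path, certificate=None):
    """Caller side: whether TLS is required, which CA to trust, and
    (for mTLS) the client certificate to present."""
    tls = {
        "enforce": enforce,
        "validation": {"trust": {"file": {"certificateChain": ca_path}}},
    }
    if certificate is not None:
        tls["certificate"] = certificate
    return {"tls": tls}


server_cert = file_cert("/certs/service-b_chain.pem", "/certs/service-b_key.pem")
client_cert = file_cert("/certs/service-a_chain.pem", "/certs/service-a_key.pem")

# Phase 1: migration. Server accepts both; callers do not yet require TLS.
phase1_listener = listener_tls("PERMISSIVE", server_cert)
phase1_client = client_policy(False, "/certs/ca.pem")

# Phase 2: every service has certificates. Require TLS and verify clients.
phase2_listener = listener_tls("STRICT", server_cert, client_ca_path="/certs/ca.pem")
phase2_client = client_policy(True, "/certs/ca.pem", certificate=client_cert)

print(phase1_listener)
print(phase2_listener)
```

Keeping the two phases as separate, reviewable configurations makes the cut-over to STRICT an explicit change that can be rolled back per service.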
App Mesh vs Istio vs Linkerd
App Mesh competes with other service meshes. For AWS workloads, the main alternatives are Istio (complex but feature-rich) and Linkerd (lightweight, simpler). AWS has also introduced VPC Lattice as a simpler alternative to App Mesh.
| Factor | App Mesh | Istio | Linkerd |
|---|---|---|---|
| AWS integration | Native - CloudWatch, X-Ray, ACM PCA | Requires manual integration | Requires manual integration |
| Complexity | Medium - AWS resource model | High - many CRDs and concepts | Low - simple API |
| Multi-platform | ECS + EKS + EC2 | Kubernetes-first; VM workloads can join the mesh | Kubernetes-first |
| Proxy | Envoy | Envoy | Linkerd2-proxy (Rust) |
| Performance overhead | Low-medium (Envoy) | Medium (Envoy + istiod) | Very low (Rust proxy) |
| Feature depth | Core mesh features | Most comprehensive | Core features only |
| Cost | No additional charge; you pay for the compute Envoy uses | Open source; you run and pay for the control plane | Open source; you run and pay for the control plane |
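AWS App Mesh is no longer receiving major new features, and AWS has announced its end of support: new customers have not been able to onboard since September 24, 2024, and the service is scheduled to be discontinued on September 30, 2026. AWS has shifted focus to Amazon VPC Lattice (generally available since 2023) for service-to-service connectivity and to Amazon ECS Service Connect for ECS workloads. For new projects, evaluate these before committing to App Mesh; VPC Lattice has a simpler model and works without sidecar proxies.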
Interview Focus Points
1. What problem does a service mesh solve that a load balancer or API gateway does not?
2. Explain how Envoy sidecar injection works in App Mesh - what happens to traffic at the network level?
3. How would you implement a canary deployment using App Mesh weighted routing?
4. What is the difference between TLS and mTLS in App Mesh - when would you require mTLS?
5. How does App Mesh handle service discovery - what is the relationship with AWS Cloud Map?
6. What is outlier detection and circuit breaking in App Mesh and how does it improve resilience?
7. How does App Mesh differ from AWS VPC Lattice - which would you choose for a new project?
8. What are the operational costs of running a service mesh - what overhead does it add to each service?