

App Mesh

Service mesh for application-level networking between microservices

AWS App Mesh is a service mesh that provides application-level networking for microservices, giving you consistent visibility, traffic control, and security across services regardless of which compute platform they run on (ECS, EKS, or EC2). App Mesh runs an Envoy proxy sidecar alongside each service to handle routing, retries, circuit breaking, and mTLS without requiring application code changes. It integrates with AWS X-Ray and CloudWatch for observability, and with AWS Certificate Manager (ACM) and AWS Private CA for certificates.

Core Concepts: Mesh, Virtual Services, and Virtual Nodes

App Mesh has its own resource model that maps to your service topology. Understanding these resources is essential for configuration and troubleshooting.

| Resource | Purpose | Maps To |
| --- | --- | --- |
| Mesh | Top-level container - defines the service mesh boundary | Your entire application or a bounded context |
| Virtual Node | Logical pointer to an actual service (ECS service, K8s deployment, EC2) | One microservice with its listeners and backends |
| Virtual Service | Abstraction of a real service - what other services call | DNS name other services use to call this service |
| Virtual Router | Routes traffic to one or more virtual nodes | Load balancer / traffic splitter |
| Route | Traffic matching rules (path, headers, method) on a virtual router | Individual routing rules with weights |
| Gateway Route | Routes ingress traffic from a virtual gateway | API gateway routing rules |
| Virtual Gateway | Ingress point for external traffic into the mesh | Ingress controller / API gateway |

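The typical call flow: Service A calls virtual-service-b.mesh.local -> the Envoy sidecar on Service A intercepts the outbound call and matches it to the virtual service -> the routes on the virtual router select a target virtual node for Service B -> Envoy resolves that node's endpoints through its service discovery setting (DNS or AWS Cloud Map) and forwards the request to Service B's sidecar. The virtual service name must still resolve in DNS so that the application can open the connection in the first place.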
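The resource chain in this flow can be sketched as a small lookup. This is a toy in-memory model, not an AWS API: the `MESH` dictionary, the `resolve` helper, the router and node names, and the `service-b.internal` hostname are all hypothetical, and route matching is simplified to "first matching prefix wins".

```python
# Toy model of the App Mesh resource graph (hypothetical names, not an AWS API).
MESH = {
    # virtual service name -> the virtual router that provides it
    "virtual_services": {
        "virtual-service-b.mesh.local": {"virtualRouter": "service-b-router"},
    },
    # virtual router -> ordered routes, each with weighted virtual node targets
    "virtual_routers": {
        "service-b-router": [
            {"prefix": "/", "weightedTargets": [
                {"virtualNode": "service-b-node", "weight": 100},
            ]},
        ],
    },
    # virtual node -> where its real endpoints are discovered (assumed DNS here)
    "virtual_nodes": {
        "service-b-node": {"hostname": "service-b.internal"},
    },
}


def resolve(mesh, virtual_service, path):
    """Follow virtual service -> router -> route -> virtual nodes for one request."""
    router_name = mesh["virtual_services"][virtual_service]["virtualRouter"]
    for route in mesh["virtual_routers"][router_name]:
        if path.startswith(route["prefix"]):  # simplification: first match wins
            return [
                (t["virtualNode"],
                 mesh["virtual_nodes"][t["virtualNode"]]["hostname"],
                 t["weight"])
                for t in route["weightedTargets"]
            ]
    return []  # no route matched


print(resolve(MESH, "virtual-service-b.mesh.local", "/orders"))
```

When troubleshooting, walking the same chain by hand (virtual service, then router, then route, then virtual node) is usually the quickest way to find the resource that is misconfigured.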

Envoy Sidecar: Injection and Traffic Interception

App Mesh works by running an Envoy proxy container alongside your application container. iptables rules redirect the application's inbound and outbound traffic to Envoy, so no changes to application code are required.

| Component | Function | Configuration |
| --- | --- | --- |
| Envoy sidecar container | Intercepts and manages all traffic | Add as second container in task/pod definition |
| APPMESH_RESOURCE_ARN env var | Tells Envoy which virtual node it represents | Set to the virtual node ARN |
| iptables rules | Redirect inbound traffic to Envoy on port 15000 and outbound traffic on port 15001 | Applied by the ECS agent from proxyConfiguration, or by an init container on EKS |
| Envoy Admin interface | Debug endpoint for Envoy config and stats | Port 9901 - never expose externally |
| xds-grpc | gRPC channel to App Mesh control plane for config | Automatically handled by Envoy |
ECS task definition with Envoy sidecar (abbreviated):
json
{
  "containerDefinitions": [
    {
      "name": "app",
      "image": "my-app:latest",
      "portMappings": [{ "containerPort": 8080 }],
      "dependsOn": [{ "containerName": "envoy", "condition": "HEALTHY" }]
    },
    {
      "name": "envoy",
      "image": "840364872350.dkr.ecr.us-east-1.amazonaws.com/aws-appmesh-envoy:v1.29.6.0-prod",
      "essential": true,
      "user": "1337",
      "environment": [
        { "name": "APPMESH_RESOURCE_ARN", "value": "arn:aws:appmesh:us-east-1:123456789012:mesh/my-mesh/virtualNode/service-a-node" }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE"],
        "startPeriod": 10, "interval": 5, "timeout": 2, "retries": 3
      }
    }
  ],
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      { "name": "IgnoredUID", "value": "1337" },
      { "name": "ProxyIngressPort", "value": "15000" },
      { "name": "ProxyEgressPort", "value": "15001" },
      { "name": "AppPorts", "value": "8080" }
    ]
  }
}
💡

The app container should declare a dependsOn condition that waits for the Envoy container to be HEALTHY. Traffic is already redirected to the proxy ports when the app starts, so connections opened before Envoy is ready are likely to fail instead of reaching their destination.
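The effect of the proxyConfiguration properties can be sketched as a small decision function. This is a simplified model of the redirect rules, not the actual iptables ruleset: `redirect_port` is a hypothetical helper, and options such as EgressIgnoredIPs and EgressIgnoredPorts are left out.

```python
# Properties copied from the proxyConfiguration block in the task definition.
PROPS = {
    "IgnoredUID": "1337",
    "ProxyIngressPort": "15000",
    "ProxyEgressPort": "15001",
    "AppPorts": "8080",
}


def redirect_port(direction, dest_port, uid=None, props=PROPS):
    """Return the Envoy port a connection is redirected to, or None if untouched."""
    if direction == "egress":
        # Traffic sent by Envoy itself (running as IgnoredUID) must not loop back in.
        if uid == int(props["IgnoredUID"]):
            return None
        return int(props["ProxyEgressPort"])
    if direction == "ingress":
        app_ports = {int(p) for p in props["AppPorts"].split(",")}
        return int(props["ProxyIngressPort"]) if dest_port in app_ports else None
    raise ValueError(f"unknown direction: {direction}")


print(redirect_port("egress", 443, uid=1000))   # app calling out -> 15001
print(redirect_port("egress", 443, uid=1337))   # Envoy calling out -> None
print(redirect_port("ingress", 8080))           # inbound to the app port -> 15000
```

The IgnoredUID exemption is why the Envoy container has to run as that user: without it, Envoy's own outbound connections would be redirected back into Envoy.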

Traffic Management: Canary Releases and Circuit Breaking

App Mesh supports canary releases, A/B testing, and fault tolerance without application changes. Weighted targets, header matching, retries, and timeouts are configured on routes; connection pool limits and outlier detection are configured on virtual node listeners.

| Feature | Use Case | Configuration |
| --- | --- | --- |
| Weighted routing | Canary deployments - send 10% to v2 | Route with two targets and weights (90/10) |
| Header-based routing | A/B testing - route beta users to v2 | Match on x-user-group: beta header |
| Retry policy | Retry on 5xx or connection errors | maxRetries, perRetryTimeout, retryOn events |
| Timeout | Set per-route response time limit | perRequest timeout in route spec |
| Circuit breaker | Stop sending to unhealthy endpoints | Connection pool limits + outlier detection |
| Outlier detection | Eject failing hosts from load balancing | Consecutive 5xx threshold configuration |
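To make the outlier detection row concrete, here is a minimal simulation of consecutive-5xx ejection. It is a toy model, not Envoy's implementation: the `OutlierDetector` class is hypothetical, its two parameters are named after the `maxServerErrors` and `maxEjectionPercent` settings, and time-based re-admission of ejected hosts is omitted.

```python
class OutlierDetector:
    """Toy consecutive-5xx outlier detection; re-admission after a timeout is omitted."""

    def __init__(self, hosts, max_server_errors, max_ejection_percent):
        self.hosts = list(hosts)
        self.max_server_errors = max_server_errors
        self.max_ejection_percent = max_ejection_percent
        self.consecutive = {h: 0 for h in self.hosts}
        self.ejected = set()

    def record(self, host, status):
        """Record one response status and eject the host if it crosses the threshold."""
        if status >= 500:
            self.consecutive[host] += 1
        else:
            self.consecutive[host] = 0  # any non-5xx response resets the streak
        if (host not in self.ejected
                and self.consecutive[host] >= self.max_server_errors
                and self._may_eject_one_more()):
            self.ejected.add(host)

    def _may_eject_one_more(self):
        # Cap ejections so that a bad deploy cannot empty the whole pool.
        return (len(self.ejected) + 1) * 100 <= self.max_ejection_percent * len(self.hosts)

    def healthy_hosts(self):
        return [h for h in self.hosts if h not in self.ejected]


detector = OutlierDetector(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                           max_server_errors=5, max_ejection_percent=50)
for _ in range(5):
    detector.record("10.0.0.1", 503)
print(detector.healthy_hosts())
```

The ejection cap is the part that is easy to overlook: with three hosts and a 50% cap, this sketch ejects only one host even if a second one starts failing.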
bash
# Virtual router route - weighted canary deployment
aws appmesh create-route \
  --mesh-name my-mesh \
  --virtual-router-name service-a-router \
  --route-name canary-route \
  --spec '{
    "httpRoute": {
      "match": { "prefix": "/" },
      "action": {
        "weightedTargets": [
          { "virtualNode": "service-a-v1-node", "weight": 90 },
          { "virtualNode": "service-a-v2-node", "weight": 10 }
        ]
      },
      "retryPolicy": {
        "maxRetries": 3,
        "perRetryTimeout": { "unit": "ms", "value": 2000 },
        "httpRetryEvents": ["server-error", "gateway-error"],
        "tcpRetryEvents": ["connection-error"]
      }
    }
  }'
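A canary usually moves through several weights rather than stopping at 90/10. The sketch below generates the route spec for each step so it can be passed to `aws appmesh update-route --spec`. The helper names and the 10/25/50/100 schedule are assumptions for illustration. The retry policy is repeated in every spec because an update supplies the whole route spec, not just the changed weights.

```python
import json

# Retry policy copied from the canary route above.
RETRY_POLICY = {
    "maxRetries": 3,
    "perRetryTimeout": {"unit": "ms", "value": 2000},
    "httpRetryEvents": ["server-error", "gateway-error"],
    "tcpRetryEvents": ["connection-error"],
}


def canary_route_spec(v2_weight, v1_node="service-a-v1-node", v2_node="service-a-v2-node"):
    """Build an httpRoute spec that sends v2_weight percent of traffic to v2."""
    if not 0 <= v2_weight <= 100:
        raise ValueError("v2_weight must be between 0 and 100")
    return {
        "httpRoute": {
            "match": {"prefix": "/"},
            "action": {
                "weightedTargets": [
                    {"virtualNode": v1_node, "weight": 100 - v2_weight},
                    {"virtualNode": v2_node, "weight": v2_weight},
                ]
            },
            "retryPolicy": RETRY_POLICY,
        }
    }


def max_attempt_time_ms(max_retries, per_retry_timeout_ms):
    """Upper bound on time spent in attempts: the first try plus every retry.

    Ignores backoff between retries and any overall route timeout.
    """
    return (max_retries + 1) * per_retry_timeout_ms


# Hypothetical rollout schedule: check error rates and latency between steps.
for weight in (10, 25, 50, 100):
    print(json.dumps(canary_route_spec(weight)))

print(max_attempt_time_ms(3, 2000))  # 8000
```

The last line is a reminder that retries multiply latency: with three retries at 2000 ms each, a caller can wait on the order of 8 seconds for a failing request unless a shorter route timeout cuts it off.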

Mutual TLS and Service-to-Service Authentication

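App Mesh supports TLS and mutual TLS (mTLS) between services. With mTLS, only services that present a trusted client certificate can connect. Server certificates can come from ACM (issued by AWS Private CA), from files on the Envoy container's file system, or from Envoy's Secret Discovery Service (SDS); client certificates for mTLS are supplied as files or through SDS.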

| TLS Mode | Description | Certificate Source |
| --- | --- | --- |
| DISABLED | No TLS - plaintext traffic | N/A |
| PERMISSIVE | Accept both TLS and plaintext - migration mode | ACM PCA, file, or SDS |
| STRICT (TLS) | Require TLS but do not verify client cert | ACM PCA, file, or SDS |
| STRICT (mTLS) | Require TLS + verify client certificate | File or SDS - both sides present certificates |

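With ACM-managed certificates issued by AWS Private CA, issuance and renewal are automated. When certificates are delivered through Envoy's Secret Discovery Service (SDS), Envoy can pick up rotated certificates without a container restart; file-based certificates typically need a redeploy or restart to rotate.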

💡

Use PERMISSIVE mode during the migration phase when rolling out mTLS to an existing mesh. This lets already-deployed services (without certs yet) continue to communicate while you progressively add certificates. Switch to STRICT only after all services are running with valid certs.
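The mode table and the migration advice can be condensed into one predicate. This is a sketch of the listener's accept/reject behavior as described above, not App Mesh code: `accepts` and its parameters are hypothetical names.

```python
def accepts(mode, client_uses_tls, validates_client=False, client_cert_valid=False):
    """Return True if a listener in the given TLS mode would accept the connection."""
    if mode == "DISABLED":
        return not client_uses_tls       # listener only speaks plaintext
    if mode not in ("PERMISSIVE", "STRICT"):
        raise ValueError(f"unknown TLS mode: {mode}")
    if not client_uses_tls:
        return mode == "PERMISSIVE"      # plaintext is tolerated only while migrating
    # TLS connection: with client validation configured (mTLS), the cert must verify.
    return client_cert_valid if validates_client else True


# A service that has no certificate yet keeps working under PERMISSIVE...
print(accepts("PERMISSIVE", client_uses_tls=False))
# ...and is cut off once the listener moves to STRICT.
print(accepts("STRICT", client_uses_tls=False))
```

Running through the callers of a service with this kind of check before flipping to STRICT is a cheap way to spot clients that would be locked out.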

App Mesh vs Istio vs Linkerd

App Mesh competes with other service meshes. For AWS workloads, the main alternatives are Istio (complex but feature-rich) and Linkerd (lightweight, simpler). AWS has also introduced VPC Lattice as a simpler alternative to App Mesh.

| Factor | App Mesh | Istio | Linkerd |
| --- | --- | --- | --- |
| AWS integration | Native - CloudWatch, X-Ray, ACM PCA | Requires manual integration | Requires manual integration |
| Complexity | Medium - AWS resource model | High - many CRDs and concepts | Low - simple API |
| Multi-platform | ECS + EKS + EC2 | Kubernetes-first, with VM workload support | Kubernetes only |
| Proxy | Envoy | Envoy | Linkerd2-proxy (Rust) |
| Performance overhead | Low-medium (Envoy) | Medium (Envoy + istiod) | Very low (Rust proxy) |
| Feature depth | Core mesh features | Most comprehensive | Core features only |
| Cost | No additional charge for the control plane | Open source - you run and pay for istiod compute | Open source - you run and pay for control plane compute |
⚠️

AWS App Mesh is being retired: AWS has announced end of support on September 30, 2026, and stopped onboarding new customers in September 2024. AWS has shifted focus to Amazon VPC Lattice (launched 2023) for service-to-service connectivity, and to Amazon ECS Service Connect for ECS workloads. For new projects, evaluate these instead of App Mesh - VPC Lattice has a simpler model and tighter AWS integration without sidecar proxies.

🎯

Interview Focus Points

  1. What problem does a service mesh solve that a load balancer or API gateway does not?
  2. Explain how Envoy sidecar injection works in App Mesh - what happens to traffic at the network level?
  3. How would you implement a canary deployment using App Mesh weighted routing?
  4. What is the difference between TLS and mTLS in App Mesh - when would you require mTLS?
  5. How does App Mesh handle service discovery - what is the relationship with AWS Cloud Map?
  6. What is outlier detection and circuit breaking in App Mesh and how does it improve resilience?
  7. How does App Mesh differ from AWS VPC Lattice - which would you choose for a new project?
  8. What are the operational costs of running a service mesh - what overhead does it add to each service?