Ace Cloud Interviews

AWS Containers

ECS

Fully managed Docker container orchestration with deep AWS integration

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that makes it easy to deploy, manage, and scale Docker containers on AWS. It integrates deeply with IAM, VPC, ALB, CloudWatch, and other AWS services, making it the go-to choice for teams already invested in the AWS ecosystem. ECS abstracts away the orchestration complexity while giving you fine-grained control over networking, storage, and compute.

Launch Types: EC2 vs Fargate

ECS offers two main launch types, EC2 and Fargate, that determine where your containers run. The choice affects cost, operational overhead, and scaling behavior.

| Dimension | EC2 Launch Type | Fargate Launch Type |
| --- | --- | --- |
| Infrastructure | You manage EC2 instances in the cluster | AWS manages compute - serverless |
| Cost model | Pay for instances whether containers run or not | Pay per vCPU and memory per second |
| Scaling | Must scale instances AND tasks separately | Tasks scale independently - no instance scaling |
| Control | Full control over instance type, storage, GPU | Limited to supported vCPU/memory combinations |
| Startup time | Fast if instance is already running | Slightly slower cold start (10-30s typical) |
| Use case | Predictable, sustained workloads; GPU; custom AMIs | Variable workloads; new projects; batch jobs |

Tip: Use Fargate for most new workloads. The EC2 launch type makes sense when you need GPU instances or custom Linux kernel parameters, or when a predictable baseline load lets Reserved Instances save significant cost.
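The cost-model row above can be made concrete with a little arithmetic. The sketch below compares a per-task Fargate bill with a per-instance EC2 bill; every price in it is a made-up round number chosen for illustration (an assumption, not a current AWS list price), and the function names are hypothetical.

```python
def fargate_cost(tasks, vcpu_per_task, gb_per_task, hours, vcpu_hour_price, gb_hour_price):
    """Fargate bills each task for the vCPU and memory it requests, only while it runs."""
    per_task_hour = vcpu_per_task * vcpu_hour_price + gb_per_task * gb_hour_price
    return tasks * hours * per_task_hour


def ec2_cost(instances, hours, instance_hour_price):
    """EC2 bills per instance-hour, whether or not any task is placed on the instance."""
    return instances * hours * instance_hour_price


# Illustrative prices only (assumptions, not real AWS pricing).
VCPU_HOUR = 0.04
GB_HOUR = 0.004
INSTANCE_HOUR = 0.10  # hypothetical instance large enough to host all four tasks
HOURS_IN_MONTH = 730

# Four tasks of 0.5 vCPU / 1 GB each.
always_on = fargate_cost(4, 0.5, 1, HOURS_IN_MONTH, VCPU_HOUR, GB_HOUR)
business_hours = fargate_cost(4, 0.5, 1, 8 * 22, VCPU_HOUR, GB_HOUR)  # 8 h/day, 22 days
one_instance = ec2_cost(1, HOURS_IN_MONTH, INSTANCE_HOUR)

print(f"Fargate, always on:      ${always_on:.2f}")
print(f"Fargate, business hours: ${business_hours:.2f}")
print(f"EC2, one instance:       ${one_instance:.2f}")
```

With these assumed inputs the always-on Fargate bill lands close to the single-instance bill, while the part-time workload is much cheaper on Fargate because the EC2 instance is paid for around the clock. Real break-even points depend on actual prices, instance utilization, and any committed-use discounts.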

Task Definitions and Service Configuration

A task definition is a blueprint for your containers: it specifies the Docker image, CPU and memory, networking mode, environment variables, secrets, volumes, and IAM roles. A service runs a desired number of tasks from that definition and replaces any that fail.

| Concept | Description | Key Setting |
| --- | --- | --- |
| Task Definition | Immutable versioned blueprint for containers | taskRoleArn, executionRoleArn |
| Service | Runs N copies of a task, replaces failed tasks | desiredCount, deploymentConfiguration |
| Task | Running instance of a task definition | Ephemeral - not restarted on same host |
| Cluster | Logical grouping of services and tasks | Can span multiple AZs |
| Container Definition | Per-container config inside task definition | image, portMappings, environment, secrets |
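To show how the settings in the table fit together, here is a minimal sketch that assembles a Fargate-style task definition as a Python dictionary and prints it as JSON. The field names follow the ECS task definition format; the helper name, image URI, account ID, role names, and secret ARN are all hypothetical placeholders.

```python
import json


def build_task_definition(family, image, container_port,
                          task_role_arn, execution_role_arn, db_secret_arn):
    """Return a task definition document with one container (illustrative values)."""
    return {
        "family": family,
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",      # task-level CPU units, given as a string
        "memory": "512",   # task-level memory in MiB, given as a string
        "taskRoleArn": task_role_arn,            # used by application code
        "executionRoleArn": execution_role_arn,  # used to pull the image, write logs
        "containerDefinitions": [
            {
                "name": "web",
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": container_port, "protocol": "tcp"}],
                "environment": [{"name": "LOG_LEVEL", "value": "info"}],
                # Secrets are referenced by ARN and injected at start-up,
                # so the value never appears in the task definition itself.
                "secrets": [{"name": "DB_PASSWORD", "valueFrom": db_secret_arn}],
            }
        ],
    }


task_definition = build_task_definition(
    family="my-task",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/web:1.0",  # hypothetical
    container_port=8080,
    task_role_arn="arn:aws:iam::123456789012:role/my-task-role",  # hypothetical
    execution_role_arn="arn:aws:iam::123456789012:role/my-execution-role",  # hypothetical
    db_secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password",  # hypothetical
)
print(json.dumps(task_definition, indent=2))
```

Saved to a file, a document shaped like this could be registered with `aws ecs register-task-definition --cli-input-json file://taskdef.json`; each registration creates a new revision of the family rather than changing an existing one.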

Two IAM roles are critical to understand:

| Role | Purpose | Example Permissions |
| --- | --- | --- |
| Task Execution Role | Used by ECS agent to pull images and write logs | ECR pull, CloudWatch Logs create |
| Task Role | Used by your application code at runtime | S3 read, DynamoDB write, SQS send |

Warning: Never put AWS credentials in environment variables. Always use the task role: ECS serves temporary credentials, scoped to the task and rotated automatically, through the container credentials endpoint, and the AWS SDKs pick them up without extra configuration.
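For intuition about where task role credentials come from, the sketch below rebuilds the URL that an AWS SDK queries inside an ECS task. ECS sets the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable in the container, and the credentials are served from the link-local address `169.254.170.2`. The helper name and the sample URI value are hypothetical, and application code normally never needs to do this by hand because the SDK does it automatically.

```python
import os

# Link-local address where ECS serves task role credentials.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def container_credentials_url(environ=None):
    """Return the credentials URL for the current task, or None outside ECS."""
    environ = os.environ if environ is None else environ
    relative_uri = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative_uri is None:
        return None  # not running in an ECS task with a task role
    return ECS_CREDENTIALS_HOST + relative_uri


# Simulated environment; the identifier in the path is a made-up example.
fake_env = {"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/example-id"}
print(container_credentials_url(fake_env))
print(container_credentials_url({}))
```

The point of the design is that no long-lived secret is stored in the task definition or the image: the container only learns a path, and the short-lived credentials behind it are refreshed for you.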

Networking Modes and Service Discovery

ECS supports multiple Docker networking modes, each with different implications for port management, security, and performance.

| Mode | How It Works | Best For |
| --- | --- | --- |
| awsvpc | Each task gets its own ENI and private IP | Production - security groups per task; required for Fargate |
| bridge | Docker bridge network with port mapping | EC2 only - legacy, dynamic host port assignment |
| host | Container uses host network namespace directly | EC2 only - high performance, no port isolation |
| none | No external network connectivity | Batch/sidecar containers with no network needs |

With awsvpc mode, services communicate via AWS Cloud Map (service discovery) or an internal ALB. Cloud Map registers task IPs as DNS records automatically.

```bash
# Register a service with Cloud Map via ECS
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:5 \
  --desired-count 3 \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc,subnet-def],securityGroups=[sg-123],assignPublicIp=DISABLED}" \
  --service-registries "registryArn=arn:aws:servicediscovery:us-east-1:123:service/srv-abc"
```
Tip: awsvpc is the only mode supported by Fargate and the recommended mode for all new ECS workloads on EC2. It lets you apply security group rules at the task level rather than the instance level.

Deployment Strategies and Rolling Updates

ECS services support several deployment strategies. The right choice depends on your tolerance for downtime, rollback speed requirements, and traffic shift needs.

| Strategy | How It Works | Downtime Risk | Rollback Speed |
| --- | --- | --- | --- |
| Rolling Update | Replace tasks incrementally using min/max healthy percent | Low with correct settings | Slow - re-deploy old version |
| Blue/Green (CodeDeploy) | Shift traffic between two task sets via ALB target groups | Zero downtime | Fast - flip traffic back while the old task set is still running |
| External | Third-party or custom controller manages task sets through the ECS API | Depends on implementation | Depends on implementation |

For rolling updates, minimumHealthyPercent and maximumPercent control the pace:

| Setting | Meaning | Recommended Value |
| --- | --- | --- |
| minimumHealthyPercent | Minimum % of desired count that must be running during update | 100 for zero-downtime |
| maximumPercent | Maximum % of desired count that can be running during update | 200 for rolling (run double temporarily) |

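The two percentages translate into concrete task counts. The sketch below works them out for a given desiredCount, assuming the rounding the ECS documentation describes (the minimum is rounded up, the maximum is rounded down); the function name and the "headroom" figure are illustrative, not ECS terminology.

```python
def rolling_update_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_running, max_total, headroom) for a rolling update.

    min_running: tasks that must stay running (percentage rounded up)
    max_total:   most tasks allowed at once (percentage rounded down)
    headroom:    tasks that can be swapped at a time; 0 means the update stalls
    """
    min_running = -(-desired_count * minimum_healthy_percent // 100)  # ceiling division
    max_total = desired_count * maximum_percent // 100                # floor division
    return min_running, max_total, max_total - min_running


for desired, min_pct, max_pct in [(4, 100, 200), (4, 50, 100), (3, 50, 150), (1, 100, 100)]:
    min_running, max_total, headroom = rolling_update_bounds(desired, min_pct, max_pct)
    print(f"desired={desired} min%={min_pct} max%={max_pct} -> "
          f"keep >= {min_running}, allow <= {max_total}, headroom {headroom}")
```

The last case is the one worth remembering: with both values at 100 there is no room to start a new task or stop an old one, so the deployment cannot make progress.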
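```yaml
# Blue/Green deployment with CodeDeploy
# appspec.yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        # Placeholder filled in with the new task definition ARN at deploy time
        TaskDefinition: "<TASK_DEFINITION>"
        LoadBalancerInfo:
          # Container and port that the ALB target group routes to
          ContainerName: "web"
          ContainerPort: 8080
# Each hook names a Lambda function that CodeDeploy invokes at that lifecycle event
Hooks:
  - BeforeAllowTraffic: "LambdaFunctionToRunTests"
  - AfterAllowTraffic: "LambdaFunctionToRunSmokeTests"
```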

Auto Scaling: Service and Capacity Provider

ECS has two levels of scaling that work together: Service Auto Scaling (scales task count) and Capacity Providers (scales EC2 instances for EC2 launch type).

| Scaling Layer | What It Scales | Trigger Mechanism |
| --- | --- | --- |
| Service Auto Scaling | Number of running tasks | CloudWatch metrics (CPU, memory, ALB request count, custom) |
| Capacity Provider (cluster auto scaling) | EC2 instance count in cluster | Based on pending task capacity requirements |
| Fargate | No instance scaling needed | Tasks are provisioned on demand, up to account limits |
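Service Auto Scaling is configured through Application Auto Scaling. As a rough sketch, the dictionaries below mirror the arguments that the `register_scalable_target` and `put_scaling_policy` operations accept for an ECS service with a CPU target-tracking policy. Nothing here calls AWS; the helper name, cluster name, service name, policy name, and numeric values are hypothetical.

```python
def cpu_target_tracking_config(cluster, service, min_tasks, max_tasks, target_cpu_percent):
    """Build request parameters for scaling an ECS service on average CPU."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    scalable_target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    scaling_policy = {
        **common,
        "PolicyName": f"{service}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds to wait after adding tasks (assumed)
            "ScaleInCooldown": 120,   # seconds to wait after removing tasks (assumed)
        },
    }
    return scalable_target, scaling_policy


scalable_target, scaling_policy = cpu_target_tracking_config(
    "my-cluster", "my-service", min_tasks=2, max_tasks=10, target_cpu_percent=60
)
print(scalable_target)
print(scaling_policy)
```

Target tracking is usually the simplest policy type to reason about: you state the utilization you want, and the task count is adjusted within the min/max bounds to hold it there.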

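ECS cluster auto scaling (CAS), enabled through a capacity provider's managed scaling, adjusts the Auto Scaling group so that cluster utilization tracks a target capacity percentage. A target of 100% keeps instances fully utilized with no spare capacity; a lower value, such as 90%, leaves headroom for faster task placement at the cost of some idle capacity.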
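The target percentage is easier to reason about with numbers. AWS describes the metric behind managed scaling, CapacityProviderReservation, as M / N x 100, where N is the number of instances currently running and M is the number needed to run all tasks. The sketch below reproduces only that arithmetic, including the zero-instance special cases as AWS has described them; it is not the actual scaling algorithm, and the function names are hypothetical.

```python
def capacity_provider_reservation(instances_needed, instances_running):
    """CapacityProviderReservation = M / N * 100, per AWS's published description."""
    if instances_running == 0:
        # Special cases as described by AWS: an empty, idle cluster reports 100;
        # an empty cluster with tasks waiting reports 200 to force a scale-out.
        return 100.0 if instances_needed == 0 else 200.0
    return instances_needed / instances_running * 100


def instances_for_target(instances_needed, target_capacity):
    """Smallest instance count that keeps the reservation at or below the target."""
    return -(-instances_needed * 100 // target_capacity)  # ceiling division


print(capacity_provider_reservation(8, 10))  # more instances running than needed
print(capacity_provider_reservation(10, 8))  # tasks need more than is running
print(instances_for_target(8, 100))          # no spare capacity
print(instances_for_target(8, 80))           # spare capacity kept as headroom
```

In this model a target of 100 sizes the group to exactly what the tasks need, while a target of 80 keeps two extra instances warm for the same eight instances' worth of work.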

Tip: For spiky workloads on the EC2 launch type, combine ECS Capacity Providers with Spot instances. Giving a Spot capacity provider weight 4 and an On-Demand provider weight 1 yields an 80% Spot, 20% On-Demand mix.
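The weight arithmetic behind that split can be sketched as follows. The strategy items use the `capacityProvider`, `weight`, and `base` fields of an ECS capacity provider strategy; the provider names are hypothetical. How ECS rounds uneven splits is not covered by the text above, so handing any leftover tasks to the heaviest provider is purely an assumption of this sketch.

```python
def distribute_tasks(task_count, strategy):
    """Estimate how many tasks each capacity provider receives."""
    placed = {item["capacityProvider"]: 0 for item in strategy}
    remaining = task_count

    # "base" tasks are placed first, before any weighting applies.
    for item in strategy:
        base = min(item.get("base", 0), remaining)
        placed[item["capacityProvider"]] += base
        remaining -= base

    total_weight = sum(item.get("weight", 0) for item in strategy)
    if remaining == 0 or total_weight == 0:
        return placed

    # Split the rest in proportion to weight (rounded down).
    assigned = 0
    for item in strategy:
        share = remaining * item.get("weight", 0) // total_weight
        placed[item["capacityProvider"]] += share
        assigned += share

    # Assumption: tasks left over from rounding go to the heaviest provider.
    heaviest = max(strategy, key=lambda item: item.get("weight", 0))
    placed[heaviest["capacityProvider"]] += remaining - assigned
    return placed


spot_heavy = [
    {"capacityProvider": "spot-cp", "weight": 4},      # hypothetical provider name
    {"capacityProvider": "ondemand-cp", "weight": 1},  # hypothetical provider name
]
print(distribute_tasks(10, spot_heavy))

with_base = [
    {"capacityProvider": "spot-cp", "weight": 4},
    {"capacityProvider": "ondemand-cp", "weight": 1, "base": 2},
]
print(distribute_tasks(12, with_base))
```

Adding a `base` on the On-Demand provider is a common way to guarantee a floor of interruption-free tasks before the Spot-heavy weighting takes over.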

ECS vs EKS - When to Choose Each

This is one of the most common interview questions about ECS. The answer depends on team expertise, ecosystem requirements, and operational trade-offs.

| Factor | Choose ECS | Choose EKS |
| --- | --- | --- |
| Team expertise | AWS-native team, unfamiliar with K8s | K8s experience, portable skills needed |
| Ecosystem | Deep AWS integration (IAM, ALB, CloudWatch) | CNCF ecosystem (Prometheus, Istio, ArgoCD) |
| Complexity | Simpler - fewer concepts to learn | More complex - pods, namespaces, RBAC, CRDs |
| Portability | AWS-only workloads | Multi-cloud or hybrid requirements |
| Cost | No control plane fee | $0.10/hr per cluster (about $72/mo) for EKS control plane |
| Custom schedulers | Limited - task placement strategies and constraints | Full scheduler extensibility |
| Compliance | Good for most | Some enterprises require K8s for tooling compatibility |

Tip: ECS is usually the simpler starting point for AWS-native teams that are new to containers. EKS is the better fit when you need Kubernetes-specific tooling, have existing K8s expertise, or plan to run on multiple clouds.


Interview Focus Points

  1. ECS vs EKS - when would you choose one over the other and what are the trade-offs?
  2. Explain the difference between a task role and a task execution role in ECS.
  3. How does ECS Blue/Green deployment work with CodeDeploy and ALB target groups?
  4. What is awsvpc networking mode and why is it preferred over bridge mode?
  5. How would you handle secrets management for ECS tasks - environment variables vs Secrets Manager vs Parameter Store?
  6. Explain ECS Capacity Providers and how they differ from traditional EC2 Auto Scaling for ECS clusters.
  7. How does ECS service auto scaling work and what metrics can trigger it?
  8. Walk me through what happens when an ECS task fails health checks during a rolling deployment.
  9. How would you design a multi-region ECS architecture for high availability?
  10. What are the cost optimization strategies for ECS workloads - when does Fargate become more expensive than EC2?