ECS

Fully managed Docker container orchestration with deep AWS integration

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that makes it easy to deploy, manage, and scale Docker containers on AWS. It integrates deeply with IAM, VPC, ALB, CloudWatch, and other AWS services, making it the go-to choice for teams already invested in the AWS ecosystem. ECS abstracts away the orchestration complexity while giving you fine-grained control over networking, storage, and compute.

Launch Types: EC2 vs Fargate

ECS has two main launch types, EC2 and Fargate, that determine where your containers run. The choice affects cost, operational overhead, and scaling behavior.

Dimension	EC2 Launch Type	Fargate Launch Type
Infrastructure	You manage EC2 instances in the cluster	AWS manages compute - serverless
Cost model	Pay for instances whether containers run or not	Pay per vCPU and memory per second
Scaling	Must scale instances AND tasks separately	Tasks scale independently - no instance scaling
Control	Full control over instance type, storage, GPU	Limited to supported vCPU/memory combinations
Startup time	Fast if instance is already running and the image is cached	Slower cold start - capacity is provisioned and the image pulled for each task (often tens of seconds)
Use case	Predictable, sustained workloads; GPU; custom AMIs	Variable workloads; new projects; batch jobs

💡

Use Fargate for most new workloads. EC2 launch type makes sense when you need GPU instances, custom Linux kernel parameters, or have predictable baseline load where reserved instances save significant cost.

Task Definitions and Service Configuration

A task definition is a blueprint for your containers - it specifies the Docker image, CPU/memory, networking mode, environment variables, secrets, volumes, and IAM roles. A service runs and maintains a desired number of tasks launched from that definition.

Concept	Description	Key Setting
Task Definition	Immutable versioned blueprint for containers	taskRoleArn, executionRoleArn
Service	Runs N copies of a task, replaces failed tasks	desiredCount, deploymentConfiguration
Task	Running instance of a task definition	Ephemeral - not restarted on same host
Cluster	Logical grouping of services and tasks	Can span multiple AZs
Container Definition	Per-container config inside task definition	image, portMappings, environment, secrets

Two IAM roles are critical to understand:

Role	Purpose	Example Permissions
Task Execution Role	Used by ECS agent to pull images and write logs	ECR pull, CloudWatch Logs create
Task Role	Used by your application code at runtime	S3 read, DynamoDB write, SQS send

⚠️

Never put AWS credentials in environment variables. Always use the task role - the AWS SDKs fetch temporary credentials from the container credentials endpoint, and those credentials are rotated automatically and scoped to the task.
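To make the split between the two roles concrete, here is a minimal sketch of a Fargate task definition. Every ARN, account ID, image URI, log group, and name in it is a hypothetical placeholder, not something taken from a real account. The secret is injected by reference (valueFrom) rather than as a literal value, and the register call is wrapped in a function that is defined but not invoked.

```shell
#!/usr/bin/env bash
# Sketch only: all ARNs, names, and the image URI are hypothetical placeholders.
taskdef_json=$(cat <<'EOF'
{
  "family": "web",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/web-execution-role",
  "taskRoleArn": "arn:aws:iam::123456789012:role/web-task-role",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
      "portMappings": [{ "containerPort": 8080 }],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:web/db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "web"
        }
      }
    }
  ]
}
EOF
)

# Defined but not called here; run it only after replacing the placeholders.
register_taskdef() {
  aws ecs register-task-definition --cli-input-json "$taskdef_json"
}

# Show the two role fields side by side to confirm they are distinct.
echo "$taskdef_json" | grep -E '"(taskRoleArn|executionRoleArn)"'
```

In this sketch the execution role is the one that would need permission to read the referenced secret and write to the log group, while the task role carries only what the application itself calls at runtime.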

Networking Modes and Service Discovery

ECS supports multiple Docker networking modes, each with different implications for port management, security, and performance.

Mode	How It Works	Best For
awsvpc	Each task gets its own ENI and private IP	Production - security groups per task; the only mode Fargate supports
bridge	Docker bridge network with port mapping	EC2 only - legacy, dynamic host port assignment
host	Container uses host network namespace directly	EC2 only - high performance, no port isolation
none	No external network connectivity	Batch/sidecar containers with no network needs

With awsvpc mode, services communicate via AWS Cloud Map (service discovery) or an internal ALB. Cloud Map registers task IPs as DNS records automatically.

bash

# Register a service with Cloud Map via ECS
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:5 \
  --desired-count 3 \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc,subnet-def],securityGroups=[sg-123],assignPublicIp=DISABLED}" \
  --service-registries "registryArn=arn:aws:servicediscovery:us-east-1:123:service/srv-abc"

💡

awsvpc is the only mode supported by Fargate and the recommended mode for all new ECS workloads on EC2. It allows you to apply security group rules at the task level rather than the instance level.

Deployment Strategies and Rolling Updates

ECS services support several deployment strategies. The right choice depends on your tolerance for downtime, rollback speed requirements, and traffic shift needs.

Strategy	How It Works	Downtime Risk	Rollback Speed
Rolling Update	Replace tasks incrementally using min/max healthy percent	Low with correct settings	Slow - re-deploy old version
Blue/Green (CodeDeploy)	Shift traffic between two task sets via ALB target groups	Zero downtime	Fast - flip traffic back instantly
External	Third-party controller (e.g. Spinnaker) manages task sets through the ECS API	Depends on implementation	Depends on implementation

Rolling update minimumHealthyPercent and maximumPercent control the pace:

Setting	Meaning	Recommended Value
minimumHealthyPercent	Minimum % of desired count that must be running during update	100 for zero-downtime
maximumPercent	Maximum % of desired count that can be running during update	200 for rolling (run double temporarily)
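The effect of these two settings is easier to see with numbers. The sketch below assumes the rounding described in the ECS API reference (the lower bound rounds up, the upper bound rounds down); the helper names min_running and max_running are made up for illustration.

```shell
#!/usr/bin/env bash
# Lower bound during a deployment: minimumHealthyPercent of desiredCount, rounded up.
min_running() {
  local desired=$1 min_pct=$2
  echo $(( (desired * min_pct + 99) / 100 ))
}

# Upper bound during a deployment: maximumPercent of desiredCount, rounded down.
max_running() {
  local desired=$1 max_pct=$2
  echo $(( desired * max_pct / 100 ))
}

# Zero-downtime settings from the table: never drop below 4, may run up to 8.
echo "desired=4 min=100 max=200: keep $(min_running 4 100), allow up to $(max_running 4 200)"

# Tighter settings: one task can be stopped first, only one extra task may start.
echo "desired=3 min=50 max=150: keep $(min_running 3 50), allow up to $(max_running 3 150)"
```

With 100/200 the scheduler has to start new tasks before stopping old ones, which needs spare capacity; with 50/150 it can stop a task first, which deploys on a full cluster at the cost of reduced capacity mid-rollout.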

yaml

# Blue/Green deployment with CodeDeploy
# appspec.yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "<TASK_DEFINITION>"
        LoadBalancerInfo:
          ContainerName: "web"
          ContainerPort: 8080
Hooks:
  - BeforeAllowTraffic: "LambdaFunctionToRunTests"
  - AfterAllowTraffic: "LambdaFunctionToRunSmokeTests"

Auto Scaling: Service and Capacity Provider

ECS has two levels of scaling that work together: Service Auto Scaling (scales task count) and Capacity Providers (scales EC2 instances for EC2 launch type).

Scaling Layer	What It Scales	Trigger Mechanism
Service Auto Scaling	Number of running tasks	CloudWatch metrics (CPU, memory, ALB request count, custom)
Capacity Provider (cluster auto scaling)	EC2 instance count in cluster	Based on the capacity that running and pending tasks require
Fargate	No instance scaling needed	Tasks are provisioned on demand, up to account quotas
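As a sketch of the first row, Service Auto Scaling is configured through Application Auto Scaling. The example below reuses the my-cluster and my-service names from earlier; the 60% CPU target, the 2-10 task range, the cooldowns, and the policy name are illustrative assumptions. The AWS calls sit in a function that is defined but not invoked.

```shell
#!/usr/bin/env bash
# Target tracking policy: keep average service CPU near 60% (illustrative value).
resource_id="service/my-cluster/my-service"
policy_json=$(cat <<'EOF'
{
  "TargetValue": 60.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
  },
  "ScaleOutCooldown": 60,
  "ScaleInCooldown": 120
}
EOF
)

# Defined but not called here; requires AWS credentials and an existing service.
apply_scaling() {
  # Step 1: declare the service's desired count as a scalable target.
  aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$resource_id" \
    --min-capacity 2 --max-capacity 10
  # Step 2: attach the target tracking policy to that target.
  aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$resource_id" \
    --policy-name cpu-target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration "$policy_json"
}

echo "$policy_json"
```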

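ECS cluster auto scaling (CAS) attaches a managed scaling policy to the Auto Scaling group behind a capacity provider and tracks a target capacity percentage. A target of 100% keeps instances fully utilized with no spare capacity; a lower target (for example 80-90%) leaves headroom so new tasks can be placed without waiting for an instance to launch.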
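A rough way to reason about the target percentage: managed scaling tracks a reservation metric of the form (instances needed / instances running) x 100 and adjusts the instance count until that metric matches the target. The sketch below is a simplified model of that arithmetic only; it ignores step limits, warm-up, and mixed instance sizes, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Instances to run so that needed/running*100 is at or below the target (rounded up).
instances_for_target() {
  local needed=$1 target=$2
  echo $(( (needed * 100 + target - 1) / target ))
}

# With 8 instances' worth of tasks:
echo "target=100 -> $(instances_for_target 8 100) instances (no headroom)"
echo "target=80  -> $(instances_for_target 8 80) instances (2 spare)"
```

In this model a target of 100 runs exactly as many instances as the tasks need, while 80 keeps roughly a quarter more than needed as headroom.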

💡

For spiky workloads on the EC2 launch type, combine ECS Capacity Providers with Spot instances. Giving a Spot capacity provider weight 4 and an On-Demand provider weight 1 yields roughly an 80% Spot / 20% On-Demand mix of tasks.
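The 4:1 weighting works out as follows. This sketch computes each provider's approximate share and prints the matching strategy argument; spot-cp and ondemand-cp are hypothetical capacity provider names, and the exact per-task distribution is up to the ECS scheduler.

```shell
#!/usr/bin/env bash
# Approximate percentage of tasks a provider receives for a given weight.
weight_share() {
  local weight=$1 total=$2
  echo $(( 100 * weight / total ))
}

spot_weight=4
ondemand_weight=1
total=$(( spot_weight + ondemand_weight ))

echo "Spot share:      $(weight_share "$spot_weight" "$total")%"
echo "On-Demand share: $(weight_share "$ondemand_weight" "$total")%"

# Value for --capacity-provider-strategy on create-service (provider names are placeholders).
strategy="capacityProvider=spot-cp,weight=${spot_weight} capacityProvider=ondemand-cp,weight=${ondemand_weight}"
echo "--capacity-provider-strategy $strategy"
```

Adding base=N to the On-Demand entry would reserve the first N tasks for On-Demand before the weights apply, which is a common way to protect a minimum baseline from Spot interruptions.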

ECS vs EKS - When to Choose Each

This is one of the most common interview questions about ECS. The answer depends on team expertise, ecosystem requirements, and operational trade-offs.

Factor	Choose ECS	Choose EKS
Team expertise	AWS-native team, unfamiliar with K8s	K8s experience, portable skills needed
Ecosystem	Deep AWS integration (IAM, ALB, CloudWatch)	CNCF ecosystem (Prometheus, Istio, ArgoCD)
Complexity	Simpler - fewer concepts to learn	More complex - pods, namespaces, RBAC, CRDs
Portability	AWS-only workloads	Multi-cloud or hybrid requirements
Cost	No control plane fee	$0.10/hr per cluster ($72/mo) for EKS control plane
Custom schedulers	Limited - placement strategies and constraints, or your own scheduler calling StartTask	Full scheduler extensibility
Compliance	Good for most	Some enterprises require K8s for tooling compatibility

💡

ECS is usually the simpler starting point for AWS-native teams that are new to containers. EKS is better when you need Kubernetes-specific tooling, have existing K8s expertise, or plan to run on multiple clouds.

🎯

Interview Focus Points

1. ECS vs EKS - when would you choose one over the other and what are the trade-offs?
2. Explain the difference between a task role and a task execution role in ECS.
3. How does ECS Blue/Green deployment work with CodeDeploy and ALB target groups?
4. What is awsvpc networking mode and why is it preferred over bridge mode?
5. How would you handle secrets management for ECS tasks - environment variables vs Secrets Manager vs Parameter Store?
6. Explain ECS Capacity Providers and how they differ from traditional EC2 Auto Scaling for ECS clusters.
7. How does ECS service auto scaling work and what metrics can trigger it?
8. Walk me through what happens when an ECS task fails health checks during a rolling deployment.
9. How would you design a multi-region ECS architecture for high availability?
10. What are the cost optimization strategies for ECS workloads - when does Fargate become more expensive than EC2?