AWS Containers
ECS
Fully managed Docker container orchestration with deep AWS integration
Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that makes it easy to deploy, manage, and scale Docker containers on AWS. It integrates deeply with IAM, VPC, ALB, CloudWatch, and other AWS services, making it the go-to choice for teams already invested in the AWS ecosystem. ECS abstracts away the orchestration complexity while giving you fine-grained control over networking, storage, and compute.
Launch Types: EC2 vs Fargate
ECS supports two launch types that determine where your containers run. The choice affects cost, operational overhead, and scaling behavior.
| Dimension | EC2 Launch Type | Fargate Launch Type |
|---|---|---|
| Infrastructure | You manage EC2 instances in the cluster | AWS manages compute - serverless |
| Cost model | Pay for instances whether containers run or not | Pay per vCPU and memory per second |
| Scaling | Must scale instances AND tasks separately | Tasks scale independently - no instance scaling |
| Control | Full control over instance type, storage, GPU | Limited to supported vCPU/memory combinations |
| Startup time | Fast if instance is already running | Slightly slower cold start (10-30s typical) |
| Use case | Predictable, sustained workloads; GPU; custom AMIs | Variable workloads; new projects; batch jobs |
Use Fargate for most new workloads. EC2 launch type makes sense when you need GPU instances, custom Linux kernel parameters, or have predictable baseline load where reserved instances save significant cost.
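The cost side of this trade-off can be sketched with a little arithmetic. The following is a minimal illustration, not a pricing tool: the function names are made up for this example, and every price passed in is a placeholder you would replace with current rates for your region.

```python
import math

HOURS_PER_MONTH = 730  # average number of hours in a month


def fargate_monthly_cost(tasks, vcpu, memory_gb, hours_per_task,
                         vcpu_hour_price, gb_hour_price):
    """Fargate bills per vCPU and per GB of memory, only while tasks run."""
    per_task_hour = vcpu * vcpu_hour_price + memory_gb * gb_hour_price
    return tasks * hours_per_task * per_task_hour


def ec2_monthly_cost(tasks, tasks_per_instance, instance_hour_price):
    """EC2 instances are billed for the whole month, busy or idle."""
    instances = math.ceil(tasks / tasks_per_instance)
    return instances * HOURS_PER_MONTH * instance_hour_price


# Hypothetical prices: 0.04 per vCPU-hour, 0.004 per GB-hour, 0.10 per instance-hour.
bursty = fargate_monthly_cost(10, 1, 2, 100, 0.04, 0.004)
always_on = fargate_monthly_cost(10, 1, 2, HOURS_PER_MONTH, 0.04, 0.004)
fleet = ec2_monthly_cost(10, 4, 0.10)
```

With these placeholder numbers, ten tasks that run 100 hours a month cost far less on Fargate than on an always-on three-instance fleet, while the same tasks running all month cost more. The crossover point depends entirely on real prices and how densely tasks pack onto instances.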
Task Definitions and Service Configuration
A task definition is a blueprint for your containers - it specifies the Docker image, CPU/memory, networking mode, environment variables, secrets, volumes, and IAM roles. A service runs and maintains a desired number of tasks launched from that definition.
| Concept | Description | Key Setting |
|---|---|---|
| Task Definition | Immutable versioned blueprint for containers | taskRoleArn, executionRoleArn |
| Service | Runs N copies of a task, replaces failed tasks | desiredCount, deploymentConfiguration |
| Task | Running instance of a task definition | Ephemeral - not restarted on same host |
| Cluster | Logical grouping of services and tasks | Can span multiple AZs |
| Container Definition | Per-container config inside task definition | image, portMappings, environment, secrets |
Two IAM roles are critical to understand:
| Role | Purpose | Example Permissions |
|---|---|---|
| Task Execution Role | Used by ECS agent to pull images and write logs | ECR pull, CloudWatch Logs create |
| Task Role | Used by your application code at runtime | S3 read, DynamoDB write, SQS send |
Never put AWS credentials in environment variables. Always use the task role: the AWS SDK fetches temporary credentials from the task's container credentials endpoint, and those credentials are rotated automatically and scoped to the task.
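As a rough illustration of how these pieces fit together, the sketch below builds a task definition as a Python dictionary and scans it for environment variables that look like credentials. The field names follow the ECS task definition JSON; the ARNs, image name, and the `find_plaintext_secrets` helper are hypothetical and exist only for this example.

```python
import re

# Hypothetical ARNs and names, for illustration only.
task_definition = {
    "family": "my-task",
    "networkMode": "awsvpc",
    # Used by ECS to pull the image and write logs.
    "executionRoleArn": "arn:aws:iam::123456789012:role/exampleExecutionRole",
    # Used by the application code at runtime.
    "taskRoleArn": "arn:aws:iam::123456789012:role/exampleTaskRole",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "example/web:1.0",
            "portMappings": [{"containerPort": 8080}],
            "environment": [{"name": "LOG_LEVEL", "value": "info"}],
            # Secrets are referenced by ARN and injected at start-up.
            "secrets": [
                {
                    "name": "DB_PASSWORD",
                    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-db",
                }
            ],
        }
    ],
}

# Variable names that suggest a credential was pasted in as plain text.
SUSPICIOUS = re.compile(r"ACCESS_KEY|SECRET|PASSWORD|TOKEN", re.IGNORECASE)


def find_plaintext_secrets(task_def):
    """Return (container, variable) pairs whose env var name looks like a credential."""
    findings = []
    for container in task_def.get("containerDefinitions", []):
        for env in container.get("environment", []):
            if SUSPICIOUS.search(env["name"]):
                findings.append((container["name"], env["name"]))
    return findings
```

A check like this could run in CI before a task definition is registered. It is only a name-based heuristic, so it will miss credentials hidden under innocuous variable names.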
Networking Modes and Service Discovery
ECS supports multiple Docker networking modes, each with different implications for port management, security, and performance.
| Mode | How It Works | Best For |
|---|---|---|
| awsvpc | Each task gets its own ENI and private IP | Production - security groups per task; required for Fargate |
| bridge | Docker bridge network with port mapping | EC2 only - legacy, dynamic host port assignment |
| host | Container uses host network namespace directly | EC2 only - high performance, no port isolation |
| none | No external network connectivity | Batch/sidecar containers with no network needs |
With awsvpc mode, services communicate via AWS Cloud Map (service discovery) or an internal ALB. Cloud Map registers task IPs as DNS records automatically.
```shell
# Register a service with Cloud Map via ECS
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:5 \
  --desired-count 3 \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc,subnet-def],securityGroups=[sg-123],assignPublicIp=DISABLED}" \
  --service-registries "registryArn=arn:aws:servicediscovery:us-east-1:123:service/srv-abc"
```
awsvpc is the only mode supported by Fargate and the recommended mode for all new ECS workloads on EC2. It allows you to apply security group rules at the task level rather than the instance level.
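The compatibility rules from the networking table can be captured in a small pre-deployment check. This is a sketch under the assumptions stated in this section (Fargate supports only awsvpc; the other modes are EC2 only); `validate_network_mode` is a hypothetical helper, not part of any AWS SDK.

```python
# Launch types each network mode supports, as described in this section.
SUPPORTED_LAUNCH_TYPES = {
    "awsvpc": {"EC2", "FARGATE"},
    "bridge": {"EC2"},
    "host": {"EC2"},
    "none": {"EC2"},
}


def validate_network_mode(network_mode, launch_type):
    """Raise ValueError if the network mode cannot be used with the launch type."""
    if network_mode not in SUPPORTED_LAUNCH_TYPES:
        raise ValueError(f"unknown network mode: {network_mode!r}")
    if launch_type not in SUPPORTED_LAUNCH_TYPES[network_mode]:
        raise ValueError(
            f"network mode {network_mode!r} is not supported on {launch_type}"
        )
    return True
```

Calling `validate_network_mode("bridge", "FARGATE")` raises a `ValueError`, which is the kind of mistake that is cheaper to catch in a build step than at service creation time.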
Deployment Strategies and Rolling Updates
ECS services support several deployment strategies. The right choice depends on your tolerance for downtime, rollback speed requirements, and traffic shift needs.
| Strategy | How It Works | Downtime Risk | Rollback Speed |
|---|---|---|---|
| Rolling Update | Replace tasks incrementally using min/max healthy percent | Low with correct settings | Slow - re-deploy old version |
| Blue/Green (CodeDeploy) | Shift traffic between two task sets via ALB target groups | Zero downtime | Fast - flip traffic back instantly |
| External | Third-party deployment controller manages task sets through the ECS API | Depends on implementation | Depends on implementation |
Rolling update minimumHealthyPercent and maximumPercent control the pace:
| Setting | Meaning | Recommended Value |
|---|---|---|
| minimumHealthyPercent | Minimum % of desired count that must be running during update | 100 for zero-downtime |
| maximumPercent | Maximum % of desired count that can be running during update | 200 for rolling (run double temporarily) |
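To see how the two percentages translate into task counts, here is a minimal sketch. It assumes the minimum is rounded up and the maximum is rounded down to whole tasks, which matches my reading of the ECS documentation but should be treated as an assumption. The function name is illustrative.

```python
def rolling_update_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_running, max_running) task counts allowed during a rolling update.

    Assumes the lower bound rounds up and the upper bound rounds down.
    """
    # Integer arithmetic avoids floating-point surprises when rounding.
    min_running = -(-desired_count * minimum_healthy_percent // 100)  # ceiling
    max_running = desired_count * maximum_percent // 100              # floor
    return min_running, max_running


def batch_size(desired_count, minimum_healthy_percent, maximum_percent):
    """Number of tasks the scheduler has room to replace in one step."""
    low, high = rolling_update_bounds(
        desired_count, minimum_healthy_percent, maximum_percent
    )
    return high - low
```

With 4 desired tasks, 100/200 allows the service to start 4 new tasks before stopping any old ones, while 50/100 forces it to stop 2 old tasks first and replace them in place. A setting of 100/100 leaves no room to move, so the deployment cannot make progress.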
```yaml
# Blue/Green deployment with CodeDeploy
# appspec.yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "<TASK_DEFINITION>"
        LoadBalancerInfo:
          ContainerName: "web"
          ContainerPort: 8080
Hooks:
  - BeforeAllowTraffic: "LambdaFunctionToRunTests"
  - AfterAllowTraffic: "LambdaFunctionToRunSmokeTests"
```
Auto Scaling: Service and Capacity Provider
ECS has two levels of scaling that work together: Service Auto Scaling (scales task count) and Capacity Providers (scales EC2 instances for EC2 launch type).
| Scaling Layer | What It Scales | Trigger Mechanism |
|---|---|---|
| Service Auto Scaling | Number of running tasks | CloudWatch metrics (CPU, memory, ALB request count, custom) |
| Capacity Provider (CAS) | EC2 instance count in cluster | Based on pending task capacity requirements |
| Fargate | No instance scaling needed | Tasks provision on demand up to account quotas |
ECS Cluster Auto Scaling (CAS) uses a managed scaling policy that targets a specific utilization percentage (targetCapacity) across the capacity provider's Auto Scaling group. A target of 100% keeps instances fully utilized with no spare capacity; a lower target, such as 90%, keeps a buffer of capacity for fast task placement at the cost of some idle instances.
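The effect of the target percentage can be worked through numerically. The sketch below assumes the managed scaling metric is the ratio of instances needed to instances running, expressed as a percentage, and that the Auto Scaling group is sized so the ratio meets the target. The function names are illustrative and edge cases (such as an empty cluster) are ignored.

```python
def capacity_provider_reservation(instances_needed, instances_running):
    """Percentage of running instances that current and pending tasks require."""
    return instances_needed * 100 / instances_running


def instances_for_target(instances_needed, target_capacity):
    """Smallest instance count that keeps the reservation at or below the target."""
    # Ceiling division with integers: needed * 100 / target, rounded up.
    return -(-instances_needed * 100 // target_capacity)
```

If tasks need 9 instances, a target of 100 sizes the group at exactly 9 with no spare room, while a target of 90 sizes it at 10, leaving one instance of headroom for new tasks to land on immediately.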
For spiky workloads on the EC2 launch type, combine ECS Capacity Providers with Spot instances. Give a Spot capacity provider a weight of 4 and an On-Demand provider a weight of 1 to get an 80% Spot, 20% On-Demand mix, since each provider receives tasks in proportion to its weight.
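The weight arithmetic can be checked with a short sketch. It splits a task count proportionally and hands leftover tasks to the largest remainders. This is an approximation for illustration, not the placement algorithm ECS actually runs, and the provider names are hypothetical.

```python
def split_by_weight(task_count, weights):
    """Split task_count across capacity providers in proportion to their weights."""
    total = sum(weights.values())
    # Whole-task share for each provider, rounded down.
    shares = {name: task_count * w // total for name, w in weights.items()}
    leftover = task_count - sum(shares.values())
    # Give remaining tasks to the providers with the largest fractional remainder.
    by_remainder = sorted(
        weights, key=lambda name: (task_count * weights[name]) % total, reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


strategy = {"spot-cp": 4, "on-demand-cp": 1}  # hypothetical provider names
placement = split_by_weight(10, strategy)
```

For 10 tasks, the 4:1 weighting places 8 on the Spot provider and 2 on the On-Demand provider. Real strategies can also set a `base` count on one provider to guarantee a minimum number of On-Demand tasks, which this sketch leaves out.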
ECS vs EKS - When to Choose Each
This is the most common interview question about ECS. The answer depends on team expertise, ecosystem requirements, and operational trade-offs.
| Factor | Choose ECS | Choose EKS |
|---|---|---|
| Team expertise | AWS-native team, unfamiliar with K8s | K8s experience, portable skills needed |
| Ecosystem | Deep AWS integration (IAM, ALB, CloudWatch) | CNCF ecosystem (Prometheus, Istio, ArgoCD) |
| Complexity | Simpler - fewer concepts to learn | More complex - pods, namespaces, RBAC, CRDs |
| Portability | AWS-only workloads | Multi-cloud or hybrid requirements |
| Cost | No control plane fee | $0.10/hr per cluster ($72/mo) for EKS control plane |
| Custom schedulers | Not supported | Full scheduler extensibility |
| Compliance | Good for most | Some enterprises require K8s for tooling compatibility |
ECS is generally the better fit for AWS-native teams starting fresh on containers. EKS is better when you need Kubernetes-specific tooling, have existing K8s expertise, or plan to run on multiple clouds.
Interview Focus Points
1. ECS vs EKS - when would you choose one over the other and what are the trade-offs?
2. Explain the difference between a task role and a task execution role in ECS.
3. How does ECS Blue/Green deployment work with CodeDeploy and ALB target groups?
4. What is awsvpc networking mode and why is it preferred over bridge mode?
5. How would you handle secrets management for ECS tasks - environment variables vs Secrets Manager vs Parameter Store?
6. Explain ECS Capacity Providers and how they differ from traditional EC2 Auto Scaling for ECS clusters.
7. How does ECS service auto scaling work and what metrics can trigger it?
8. Walk me through what happens when an ECS task fails health checks during a rolling deployment.
9. How would you design a multi-region ECS architecture for high availability?
10. What are the cost optimization strategies for ECS workloads - when does Fargate become more expensive than EC2?