AWS Compute
EC2
Resizable virtual machines with full OS control and flexible pricing models
Amazon Elastic Compute Cloud (EC2) is the foundational compute service of AWS - it provides resizable virtual machines in the cloud. You choose the hardware profile (CPU, RAM, network), the operating system, storage, and networking. EC2 underpins almost every AWS architecture and is one of the most tested services in cloud interviews.
Instance Families and Types
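EC2 instances are grouped into families optimized for different workloads. The instance name format is: Family + Generation + Attributes + Size. In m7g.2xlarge, for example, m is the family (general purpose), 7 is the generation, g is an attribute (Graviton processor), and 2xlarge is the size.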
| Family | Optimized For | Examples | Use Cases |
|---|---|---|---|
| T (Burstable) | Variable CPU workloads | t3.micro, t4g.small | Dev/test, small web apps, CI runners |
| M (General Purpose) | Balanced CPU/RAM | m7i.large, m6g.xlarge | Web servers, app servers, code repos |
| C (Compute) | High CPU performance | c7g.4xlarge, c6i.8xlarge | HPC, scientific modeling, batch processing |
| R (Memory) | High RAM-to-CPU ratio | r7g.16xlarge, r6i.2xlarge | In-memory DBs, Hadoop, SAP HANA |
| X (Extra Memory) | Extreme memory | x2gd.16xlarge | SAP HANA, in-memory analytics |
| I (Storage) | NVMe SSD throughput | i4i.4xlarge | NoSQL DBs, data warehousing |
| D (Dense Storage) | HDD throughput | d3.8xlarge | Distributed file systems, Hadoop |
| G / P / Trn | GPU (G, P) / AWS Trainium (Trn) | g5.48xlarge, p4d.24xlarge, trn1.32xlarge | ML training, inference, video rendering |
| Inf | ML Inference | inf2.48xlarge | Low-latency inference with AWS Inferentia |
Graviton (g suffix, e.g., m7g) instances use AWS-designed ARM chips and typically offer 20-40% better price/performance than equivalent x86 types. Prefer Graviton when your workload is ARM-compatible.
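To make the naming format concrete, here is a minimal Python sketch that splits an instance type name into the four parts described above. The `InstanceType` tuple, the `parse_instance_type` helper, and the regular expression are illustrative assumptions, not an AWS API. The pattern is deliberately simplified and will reject unusual names such as the high-memory `u-` types.

```python
import re
from typing import NamedTuple

# Simplified pattern (an assumption, not an official grammar):
# letters = family, digits = generation, optional attributes, ".", size.
_INSTANCE_TYPE = re.compile(
    r"^(?P<family>[a-z]+)(?P<generation>\d+)"
    r"(?P<attributes>[a-z0-9-]*)\.(?P<size>[a-z0-9-]+)$"
)


class InstanceType(NamedTuple):
    family: str
    generation: int
    attributes: str
    size: str

    @property
    def is_graviton(self) -> bool:
        # Per the naming convention above, a "g" attribute marks Graviton.
        return "g" in self.attributes


def parse_instance_type(name: str) -> InstanceType:
    """Split a name like 'm7g.2xlarge' into its components."""
    match = _INSTANCE_TYPE.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    return InstanceType(
        family=match["family"],
        generation=int(match["generation"]),
        attributes=match["attributes"],
        size=match["size"],
    )


print(parse_instance_type("m7g.2xlarge"))
print(parse_instance_type("g5.48xlarge").is_graviton)  # G family, but not Graviton
```

Note that the family letter and the attribute letter are distinct: `g5` is a GPU family on x86, while the trailing `g` in `m7g` is the Graviton attribute.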
Purchasing Options
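Choosing the right pricing model can cut EC2 costs substantially: up to 72% with a 1-3 year commitment (Reserved Instances or Savings Plans) and up to 90% with Spot. This is a common interview topic.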
| Model | Discount vs On-Demand | Commitment | Best For |
|---|---|---|---|
| On-Demand | 0% (baseline) | None | Unpredictable workloads, testing, spiky traffic |
| Reserved Instances (1yr) | Up to 40% | 1 year | Steady-state workloads with known instance type/region |
| Reserved Instances (3yr) | Up to 72% | 3 years | Long-running databases, always-on services |
| Savings Plans (Compute) | Up to 66% | 1-3 years | Flexible - covers EC2, Lambda, Fargate across regions |
| Savings Plans (EC2) | Up to 72% | 1-3 years | Fixed instance family in one region, more flexible than RIs |
| Spot Instances | Up to 90% | None | Fault-tolerant, batch jobs, ML training, stateless workloads |
| Dedicated Hosts | Varies | On-demand or 1-3yr | Bring-your-own-license (BYOL), compliance, hardware isolation |
| Dedicated Instances | Higher price | On-demand | Single-tenant hardware without full host control |
Spot instances can be interrupted with a 2-minute warning. Never use Spot for workloads that cannot tolerate interruption (e.g., primary databases, synchronous payment processing).
A common architecture for cost optimization: use Reserved Instances or Savings Plans for the baseline load, On-Demand for unpredictable spikes above that baseline, and Spot for batch/background jobs that tolerate interruption.
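The arithmetic behind such a mixed fleet can be sketched as follows. The on-demand rate of $0.10/hour and the fleet sizes are hypothetical, and the discounts are the "up to" figures from the table, so the result is a best case rather than a quote.

```python
def blended_hourly_cost(on_demand_rate, baseline, spike, batch,
                        commit_discount=0.72, spot_discount=0.90):
    """Hourly cost when the baseline runs on a commitment,
    spikes run On-Demand, and batch work runs on Spot.

    The default discounts are the table's upper bounds (assumed best case).
    """
    committed = baseline * on_demand_rate * (1 - commit_discount)
    on_demand = spike * on_demand_rate
    spot = batch * on_demand_rate * (1 - spot_discount)
    return committed + on_demand + spot


rate = 0.10  # hypothetical on-demand price per instance-hour
mixed = blended_hourly_cost(rate, baseline=10, spike=4, batch=6)
all_on_demand = (10 + 4 + 6) * rate
savings = 1 - mixed / all_on_demand

print(f"mixed: ${mixed:.2f}/h, all on-demand: ${all_on_demand:.2f}/h, "
      f"savings: {savings:.0%}")
```

With these assumed numbers the mixed fleet costs about $0.74 per hour against $2.00 for the same 20 instances on On-Demand, a saving of roughly 63%.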
Storage: EBS Volumes
EBS (Elastic Block Store) volumes are network-attached block storage for EC2. They persist independently of the instance lifecycle and can be snapshotted to S3.
| Volume Type | Max IOPS | Max Throughput | Best For |
|---|---|---|---|
| gp3 (General Purpose SSD) | 16,000 | 1,000 MB/s | Most workloads - default choice. IOPS independent of size. |
| gp2 (General Purpose SSD) | 16,000 | 250 MB/s | Legacy - gp3 costs less per GiB and decouples IOPS from volume size. Migrate to gp3. |
| io2 Block Express (Provisioned IOPS) | 256,000 | 4,000 MB/s | Mission-critical DBs, Oracle RAC, sub-millisecond latency |
| io1 (Provisioned IOPS) | 64,000 | 1,000 MB/s | I/O intensive databases requiring consistent performance |
| st1 (Throughput HDD) | 500 | 500 MB/s | Big data, Kafka, log processing - sequential reads |
| sc1 (Cold HDD) | 250 | 250 MB/s | Infrequently accessed data, lowest cost HDD option |
Only gp2, gp3, io1, and io2 can be used as boot volumes. st1 and sc1 are data volumes only.
- EBS Multi-Attach allows io1/io2 volumes to be attached to up to 16 instances in the same AZ simultaneously
- EBS snapshots are incremental and stored in S3 - you pay only for changed blocks
- Snapshots can be copied across regions for disaster recovery
- EBS Encryption uses KMS and is transparent to the OS - the performance impact on Nitro instances is negligible
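The "IOPS independent of size" note on gp3 in the table is easier to see with numbers. The sketch below compares baseline IOPS for the two general purpose types. The gp2 rule (3 IOPS per GiB, minimum 100, capped at 16,000) and the gp3 baseline of 3,000 IOPS are not stated in the text above. They are taken from AWS's published volume characteristics, so treat them as assumptions to verify against the current documentation. gp2 burst credits are ignored.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    # gp2 (assumed rule): 3 IOPS per GiB, floor of 100, cap of 16,000.
    return max(100, min(3 * size_gib, 16_000))


def gp3_baseline_iops(size_gib: int) -> int:
    # gp3 (assumed baseline): flat 3,000 IOPS at any size;
    # extra IOPS are provisioned separately, up to the table's maximum.
    return 3_000


for size in (100, 500, 1_000, 6_000):
    print(f"{size:>5} GiB  gp2={gp2_baseline_iops(size):>6}  "
          f"gp3={gp3_baseline_iops(size):>6}")
```

Under these assumptions a gp2 volume has to reach 1,000 GiB before its baseline matches what any gp3 volume starts with, which is the practical argument for migrating small volumes.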
Networking and Security
EC2 networking is built around VPCs, subnets, security groups, and network ACLs. Understanding the difference between security groups and NACLs is a must for interviews.
| Feature | Security Groups | Network ACLs |
|---|---|---|
| Level | Instance (ENI) | Subnet |
| Statefulness | Stateful - return traffic allowed automatically | Stateless - must explicitly allow both directions |
| Rules | Allow rules only | Allow and deny rules |
| Evaluation | All rules evaluated together | Rules evaluated in order (lowest number first) |
| Default | New security group: deny all inbound, allow all outbound | Default NACL: allow all inbound and outbound (custom NACLs deny all until rules are added) |
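The "Evaluation" row is the one people most often get wrong, so here is a toy model of it in Python. It is a sketch under strong simplifications: rules match on a single port only (no CIDR, protocol, or direction), and the `NaclRule` class and both functions are hypothetical names, not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    port: int
    allow: bool


def nacl_allows(rules, port):
    """Ordered evaluation: the first matching rule decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.allow
    return False  # nothing matched: the implicit final deny applies


def security_group_allows(allowed_ports, port):
    """Allow rules only, evaluated together: any match permits the traffic."""
    return port in allowed_ports


nacl = [NaclRule(200, 22, allow=True), NaclRule(100, 22, allow=False)]
print(nacl_allows(nacl, 22))                 # rule 100 (deny) wins over rule 200
print(security_group_allows({22, 443}, 22))  # a single allow rule is enough
```

In the NACL example, SSH is blocked even though an allow rule exists, because the deny rule carries the lower number. A security group has no equivalent, since it cannot express a deny at all.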
- Elastic IPs - static IPv4 addresses that can be remapped between instances for failover
- Enhanced Networking (ENA) - up to 100 Gbps, lower latency, fewer CPU cycles for network processing
- Placement Groups: Cluster (low latency, instances packed close together in a single AZ), Spread (max 7 running instances per AZ per group, each on distinct hardware for fault isolation), Partition (up to 7 partitions per AZ, used by Hadoop/Cassandra/Kafka)
- Instance Metadata Service (IMDS) at 169.254.169.254 - provides instance ID, IAM role credentials, AMI ID. IMDSv2 uses session tokens and is more secure.
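The IMDSv2 session-token flow mentioned in the last bullet can be sketched with the Python standard library. A client first sends a PUT to obtain a token, then passes that token as a header on every metadata GET. The helper names here are hypothetical, and `fetch_metadata` only works from inside an EC2 instance, so it is defined but not called.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: GET request for a metadata path, carrying the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Run both steps. Only reachable from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_metadata_request(path, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()


# On an instance: fetch_metadata("instance-id")
```

The security benefit comes from the first step: a simple GET-only request forgery (for example through a misconfigured proxy or an SSRF bug) cannot issue the PUT with the custom header, so it cannot obtain a token.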
Auto Scaling and High Availability
EC2 Auto Scaling Groups (ASGs) maintain a fleet of instances to handle varying load. They replace unhealthy instances automatically and scale based on policies.
- Scaling policies: Target Tracking (maintain a metric like 70% CPU), Step Scaling (specific thresholds trigger step changes), Scheduled Scaling (known traffic patterns)
- Cooldown period prevents rapid scale-in/out oscillation - default 300 seconds
- Launch Templates define instance configuration (AMI, type, security groups, user data) - preferred over older Launch Configurations
- Instance refresh performs rolling replacement to update the fleet to a new AMI without downtime
- Lifecycle hooks pause instance launch/termination to run custom actions (e.g., drain connections, copy logs)
- Warm pools pre-launch and pre-configure instances in a stopped state so they can join the fleet faster
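As a rough illustration of how step thresholds, size limits, and a cooldown interact, here is a toy model in Python. It is not how the Auto Scaling service is implemented: the `Step` class, the CPU bands, and the min/max sizes are all hypothetical, and in AWS the cooldown setting is associated with simple scaling policies, while step and target tracking policies rely on instance warmup instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    lower: float     # inclusive lower bound of average CPU, in percent
    upper: float     # exclusive upper bound
    adjustment: int  # instances to add (positive) or remove (negative)


# Hypothetical policy: scale in when idle, scale out harder when very busy.
STEPS = [
    Step(0, 30, -1),
    Step(30, 70, 0),
    Step(70, 85, +1),
    Step(85, float("inf"), +3),
]


def next_capacity(current, cpu_percent, seconds_since_last_scale,
                  min_size=2, max_size=10, cooldown=300, steps=STEPS):
    """Return the desired capacity after applying one scaling decision."""
    if seconds_since_last_scale < cooldown:
        return current  # still cooling down: ignore the metric for now
    for step in steps:
        if step.lower <= cpu_percent < step.upper:
            # Clamp the result to the group's min/max size.
            return max(min_size, min(max_size, current + step.adjustment))
    return current


print(next_capacity(4, cpu_percent=90, seconds_since_last_scale=600))
print(next_capacity(4, cpu_percent=90, seconds_since_last_scale=100))
```

The first call scales out by three instances. The second leaves capacity unchanged because the last scaling activity was only 100 seconds ago, which is the oscillation guard the cooldown bullet describes.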
For a highly available architecture, always spread ASG instances across at least 3 Availability Zones behind a load balancer. An AZ failure should not impact the application.
Amazon Machine Images (AMIs)
An AMI is a template containing the OS, application software, and configuration needed to launch an EC2 instance. It is the "blueprint" for instances in an ASG.
- AMIs are region-specific - copy them to other regions for multi-region deployments
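- AMI types: EBS-backed (root volume persists, instance can be stopped/started) vs Instance Store-backed (ephemeral root volume; the instance cannot be stopped, only rebooted or terminated, and data is lost on termination or hardware failure)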
- Custom AMIs ("baked AMIs" or "golden AMIs") pre-install software to reduce boot time vs installing via user data
- AWS Systems Manager Parameter Store can track the latest AMI ID for automation
- Sharing: AMIs can be private (default), shared with specific AWS accounts, or made public
Interview Focus Points
1. Difference between Spot, Reserved, On-Demand, and Savings Plans - when to use each
2. Security Groups vs Network ACLs - statefulness, level, rule types
3. Placement group types and when to choose each
4. EBS volume types - gp3 vs io2 vs st1, and when to use each
5. How IMDSv2 improves security over IMDSv1
6. How Auto Scaling lifecycle hooks work and why you need them
7. Graviton instances - benefits and workload compatibility
8. How EBS Multi-Attach works and its limitations