S3

Infinitely scalable object storage with 99.999999999% (11 nines) durability

Amazon S3 (Simple Storage Service) is object storage with virtually unlimited capacity, designed for 11 nines (99.999999999%) of durability. Most storage classes achieve this by storing data redundantly across multiple Availability Zones; One Zone-IA is the exception and keeps data in a single zone. S3 serves as the backbone for data lakes, static website hosting, backup archives, and application asset storage. Every cloud architect must understand S3 deeply because it underpins dozens of AWS services and is almost always present in production architectures.

S3 Storage Classes and When to Use Each

S3 offers multiple storage classes optimized for different access patterns and cost profiles. Choosing the wrong class is one of the most common sources of unexpected AWS bills.

Storage Class	Availability	Min Duration	Retrieval	Use Case
Standard	99.99%	None	Immediate	Frequently accessed data
Intelligent-Tiering	99.9%	None	Immediate	Unknown or changing access patterns
Standard-IA	99.9%	30 days	Immediate	Infrequent access, rapid when needed
One Zone-IA	99.5%	30 days	Immediate	Infrequent, non-critical, reproducible
Glacier Instant Retrieval	99.9%	90 days	Milliseconds	Archive with immediate access
Glacier Flexible Retrieval	99.99%	90 days	Minutes to 12 hours	Backups, disaster recovery
Glacier Deep Archive	99.99%	180 days	12-48 hours	Long-term compliance archives

💡

Intelligent-Tiering automatically moves objects between tiers based on access patterns. There is a monitoring fee per 1,000 objects but no retrieval fees - ideal when you cannot predict access patterns.

⚠️

Standard-IA and One Zone-IA charge a retrieval fee per GB. If you access IA data frequently, you can end up paying more than Standard pricing. Always model expected access before choosing IA classes.
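The break-even point behind this warning can be sketched with simple arithmetic. The Standard storage price is the one quoted in the pricing table of this guide; the Standard-IA storage and retrieval prices below are assumptions for illustration only, and request charges are ignored, so check current AWS pricing before relying on the numbers.

```python
# Assumed prices for illustration; verify against current AWS pricing.
STANDARD_STORAGE = 0.023   # $/GB-month (from the pricing table in this guide)
IA_STORAGE = 0.0125        # $/GB-month (assumption)
IA_RETRIEVAL = 0.01        # $/GB retrieved (assumption)


def standard_cost(stored_gb: float) -> float:
    """Monthly Standard cost: storage only, no per-GB retrieval fee."""
    return stored_gb * STANDARD_STORAGE


def ia_cost(stored_gb: float, retrieved_gb: float) -> float:
    """Monthly Standard-IA cost: cheaper storage plus a per-GB retrieval fee."""
    return stored_gb * IA_STORAGE + retrieved_gb * IA_RETRIEVAL


def breakeven_retrieval_ratio() -> float:
    """GB retrieved per GB stored per month at which IA stops being cheaper."""
    return (STANDARD_STORAGE - IA_STORAGE) / IA_RETRIEVAL


if __name__ == "__main__":
    stored = 1000.0
    for retrieved in (100.0, 1000.0, 2000.0):
        print(retrieved, standard_cost(stored), ia_cost(stored, retrieved))
    print("break-even ratio:", round(breakeven_retrieval_ratio(), 2))
```

Under these assumed prices, IA stays cheaper until you read back roughly 1.05 GB per GB stored each month; beyond that, Standard wins.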

Bucket Configuration and Security Model

S3 uses a layered security model combining bucket policies, IAM policies, ACLs, and Block Public Access settings. Understanding how these interact is critical for preventing data leaks.

Control Layer	Scope	Best Practice
Block Public Access	Account or bucket level	Enable on all buckets unless intentionally public
Bucket Policy	Bucket and object level	Use for cross-account access and enforce HTTPS
IAM Policy	Principal (user/role) level	Use for granting AWS principals access
ACLs	Object level	Disable ACLs - use bucket policies instead (AWS now recommends this)
S3 Access Points	Application level	Use for multiple apps sharing one bucket with different permissions
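How these layers combine can be illustrated with a deliberately simplified model. This is a teaching sketch, not the real AWS evaluation engine: it assumes a same-account request, ignores conditions, resources, ACLs, and access points, and treats Block Public Access as simply neutralizing public `Allow` grants for anonymous callers. The `Statement` class and `is_allowed` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    effect: str           # "Allow" or "Deny"
    action: str           # e.g. "s3:GetObject" or "s3:*"
    public: bool = False  # True when the statement's Principal is "*"


def matches(statement: Statement, action: str) -> bool:
    """Rough action match: exact name or the s3:* wildcard."""
    return statement.action in (action, "s3:*")


def is_allowed(action, iam_statements, bucket_statements,
               block_public_access=True, anonymous=False):
    """Simplified same-account decision: explicit Deny wins, then any Allow."""
    candidates = []
    if not anonymous:
        # Anonymous callers have no IAM identity, so IAM policies never apply.
        candidates.extend(iam_statements)
    for s in bucket_statements:
        if anonymous and not s.public:
            continue  # statements naming specific principals skip anonymous callers
        if anonymous and s.public and s.effect == "Allow" and block_public_access:
            continue  # Block Public Access neutralizes public grants
        # Assumption: non-public statements name the calling principal.
        candidates.append(s)
    relevant = [s for s in candidates if matches(s, action)]
    if any(s.effect == "Deny" for s in relevant):
        return False
    return any(s.effect == "Allow" for s in relevant)


if __name__ == "__main__":
    iam = [Statement("Allow", "s3:GetObject")]
    deny_all = [Statement("Deny", "s3:*", public=True)]
    print(is_allowed("s3:GetObject", iam, []))        # IAM allow alone
    print(is_allowed("s3:GetObject", iam, deny_all))  # bucket-policy deny overrides
```

The point of the sketch is the ordering: an explicit Deny in any layer overrides every Allow, and with no matching Allow the request is denied by default.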

bash

# Enforce HTTPS-only access via bucket policy
aws s3api put-bucket-policy --bucket my-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyHTTP",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
    "Condition": {"Bool": {"aws:SecureTransport": "false"}}
  }]
}'

💡

As of April 2023, S3 Object Ownership defaults to "Bucket owner enforced" for new buckets, which disables ACLs entirely. This is the recommended setting.

Versioning, Replication, and Lifecycle Rules

Versioning preserves every version of an object, enabling recovery from accidental deletes and overwrites. Replication copies objects to another bucket, optionally in another region or account.
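The recovery behavior is easier to see with a toy in-memory model of a single key. This is a sketch of the concept only; the `VersionedKey` class and its integer version IDs are hypothetical and do not mirror the S3 API. The idea it illustrates: a plain delete adds a delete marker on top of the version stack, and removing that marker brings the previous version back.

```python
class VersionedKey:
    """Toy model of one key in a versioning-enabled bucket."""

    def __init__(self):
        # Each entry is (version_id, body); body None represents a delete marker.
        self._versions = []
        self._next_id = 1

    def _push(self, body):
        version_id = self._next_id
        self._next_id += 1
        self._versions.append((version_id, body))
        return version_id

    def put(self, body):
        """Overwrites never destroy data: they add a new version."""
        return self._push(body)

    def delete(self):
        """A delete without a version ID only adds a delete marker."""
        return self._push(None)

    def delete_version(self, version_id):
        """Deleting a specific version (or marker) removes it permanently."""
        self._versions = [(v, b) for v, b in self._versions if v != version_id]

    def get(self, version_id=None):
        """Return the latest version, or a specific one if version_id is given."""
        if version_id is None:
            found = self._versions[-1:]
        else:
            found = [(v, b) for v, b in self._versions if v == version_id]
        if not found or found[0][1] is None:
            raise KeyError("not found (missing version or delete marker)")
        return found[0][1]


if __name__ == "__main__":
    key = VersionedKey()
    key.put(b"v1")
    key.put(b"v2")
    marker = key.delete()       # object now appears deleted
    key.delete_version(marker)  # remove the marker to "undelete"
    print(key.get())
```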

Feature	CRR (Cross-Region)	SRR (Same-Region)
Primary use	Compliance, latency reduction, DR	Log aggregation, dev/prod sync
Versioning required	Yes - on source and destination	Yes - on source and destination
Existing objects	Not replicated automatically	Not replicated automatically
Delete markers	Optional replication	Optional replication
Cost	Data transfer + replication requests	Replication requests only

Lifecycle rules automate transitioning objects between storage classes and expiring old versions:

bash

# Example: transition to IA after 30 days, Glacier after 90, expire after 365

# Write the lifecycle rule to lifecycle.json
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "archive-old-objects",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ],
    "Expiration": {"Days": 365}
  }]
}
EOF

# Apply the rule to the bucket
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
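To check what a rule like the one above does to an object over time, a small simulation can help. This is a minimal sketch: `storage_class_at` is a hypothetical helper, it assumes objects start in STANDARD, and it ignores the fact that S3 applies lifecycle actions asynchronously, so real transitions can lag the configured day.

```python
# Same rule as the lifecycle.json example, expressed as a Python dict.
RULE = {
    "ID": "archive-old-objects",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 365},
}


def storage_class_at(age_days, rule, initial="STANDARD"):
    """Return the storage class at a given object age, or None once expired."""
    expiration_days = rule.get("Expiration", {}).get("Days")
    if expiration_days is not None and age_days >= expiration_days:
        return None
    current = initial
    # Apply every transition whose day threshold has been reached, in order.
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


if __name__ == "__main__":
    for age in (0, 30, 90, 365):
        print(age, storage_class_at(age, RULE))
```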

⚠️

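Replication only applies to objects written after replication is enabled. To copy existing objects, run an S3 Batch Replication job (part of S3 Batch Operations). Also note that by default S3 does not re-replicate objects that are themselves replicas created by another replication rule, so replication does not chain from bucket to bucket.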

Performance Optimization and Common Patterns

S3 automatically partitions data by key prefix, and request-rate limits apply per prefix. Understanding this helps you design key naming conventions that scale without throttling.

Pattern	Description	Use Case
Prefix randomization	Add hash prefix to avoid hot partitions (old guidance - now largely unnecessary)	Very high-throughput legacy workloads
Multipart upload	Upload objects >100MB in parallel parts	Large files, resumable uploads
Transfer Acceleration	Route uploads via CloudFront edge network	Global upload performance
S3 Select	Query CSV/JSON/Parquet with SQL - retrieve subset (closed to new customers since 2024; existing users only)	Reduce data transfer for analytics
Requester Pays	Downloader pays transfer costs	Public datasets, cost sharing
Presigned URLs	Temporary authenticated URLs	Direct browser upload/download without proxy
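For the multipart upload row, the main planning question is how large each part should be. The sketch below assumes the commonly documented S3 limits (parts of 5 MiB to 5 GiB, at most 10,000 parts per upload); the `plan_parts` helper and its 8 MiB default are hypothetical choices for illustration.

```python
MIB = 1024 ** 2
MIN_PART = 5 * MIB          # minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024 * MIB   # maximum part size (5 GiB)
MAX_PARTS = 10_000          # maximum number of parts per upload


def plan_parts(object_size, preferred_part_size=8 * MIB):
    """Return (part_size, part_count) that fits within the assumed limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that keeps the upload within MAX_PARTS (integer ceil).
    size_for_limit = -(-object_size // MAX_PARTS)
    part_size = max(preferred_part_size, MIN_PART, size_for_limit)
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    part_count = -(-object_size // part_size)
    return part_size, part_count


if __name__ == "__main__":
    print(plan_parts(100 * MIB))    # small object: preferred part size is used
    print(plan_parts(1024 ** 4))    # 1 TiB: part size grows to stay under 10,000 parts
```

Parts can then be uploaded in parallel and retried individually, which is what makes large uploads resumable.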

💡

S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix. With multiple prefixes, throughput scales linearly. For most workloads this is not a bottleneck.
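As a rough illustration of the per-prefix figures above, the sketch below estimates how many prefixes a target request rate would need and shows one way to spread keys across them. Both helpers are hypothetical, the hash-shard naming is only one possible scheme, and it matters only when a single prefix would otherwise exceed these rates.

```python
import hashlib
import math

GET_RPS_PER_PREFIX = 5500   # GET/HEAD requests per second per prefix
PUT_RPS_PER_PREFIX = 3500   # PUT/COPY/POST/DELETE requests per second per prefix


def prefixes_needed(target_rps, per_prefix_rps):
    """Minimum number of prefixes to sustain target_rps at the given per-prefix rate."""
    return max(1, math.ceil(target_rps / per_prefix_rps))


def sharded_key(key, shards):
    """Prepend a stable shard number derived from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    return f"{shard:02d}/{key}"


if __name__ == "__main__":
    shards = prefixes_needed(20_000, GET_RPS_PER_PREFIX)
    print(shards)
    print(sharded_key("logs/2024/01/01/app.json", shards))
```

The trade-off is that sharded keys can no longer be listed in natural order under a single prefix, which is one reason this technique is reserved for very high request rates.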

Presigned URLs are a critical pattern for serverless architectures - they allow clients to upload directly to S3 without routing large files through your application servers:

bash

# Generate a presigned download (GET) URL, valid for 1 hour.
# Note: `aws s3 presign` only creates GET URLs, not upload URLs.
aws s3 presign s3://my-bucket/uploads/video.mp4 \
  --expires-in 3600

python

# For uploads (PUT), generate the URL with an AWS SDK such as boto3
import boto3

s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/video.mp4'},
    ExpiresIn=3600,
)
print(url)
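On the client side, using a presigned PUT URL is just an HTTP PUT with the file as the request body. The sketch below uses only the Python standard library; `build_upload_request` and `upload` are hypothetical helpers, and the URL is assumed to come from a backend that generated it as shown above. If the URL was signed with a specific content type, the header sent here has to match it.

```python
import urllib.request


def build_upload_request(presigned_url, body, content_type="application/octet-stream"):
    """Build (but do not send) an HTTP PUT request for a presigned upload URL."""
    request = urllib.request.Request(presigned_url, data=body, method="PUT")
    request.add_header("Content-Type", content_type)
    return request


def upload(presigned_url, body):
    """Send the bytes straight to S3; no AWS credentials are needed on the client."""
    request = build_upload_request(presigned_url, body)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder URL for illustration; a real one comes from your backend.
    req = build_upload_request("https://example.com/upload?signature=abc", b"hello")
    print(req.get_method(), req.full_url)
```

A browser would do the same thing with a `PUT` request from JavaScript, so the application server only hands out URLs and never touches the file bytes.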

S3 Pricing Model and Cost Optimization

S3 costs have four main dimensions: storage, requests, data transfer, and optional features. Data transfer out to the internet is often the largest surprise cost.

Cost Component	Standard Pricing (us-east-1)	Optimization
Storage	$0.023/GB/month first 50TB	Use lifecycle rules to tier down to IA/Glacier
PUT/COPY/POST/LIST	$0.005 per 1,000 requests	Batch small writes, avoid excessive list operations
GET/SELECT	$0.0004 per 1,000 requests	Use CloudFront to cache and reduce origin GETs
Data transfer out	$0.09/GB (after the first 100 GB/month free)	Use CloudFront - no transfer fee S3 to CloudFront
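Replication	About $0.02/GB inter-region transfer (varies by destination) plus request charges; $0.015/GB extra with Replication Time Control	Replicate only what is needed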
S3 Inventory	$0.0025 per million objects listed	Replace frequent LIST operations with Inventory

💡

Data transfer between S3 and EC2/Lambda in the same region is free. S3 to CloudFront is free. The expensive transfer is S3 to the internet or to another region.
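A back-of-the-envelope estimate makes it clear why transfer out tends to dominate. The sketch below uses the us-east-1 figures from the table above and applies only the first-tier storage price; `estimate_monthly_cost` is a hypothetical helper, and any free transfer allowance is passed in explicitly rather than assumed.

```python
# Prices taken from the table above (us-east-1, first storage tier only).
STORAGE_PER_GB = 0.023    # $/GB-month
PUT_PER_1000 = 0.005      # $ per 1,000 PUT/COPY/POST/LIST requests
GET_PER_1000 = 0.0004     # $ per 1,000 GET/SELECT requests
EGRESS_PER_GB = 0.09      # $/GB transferred out to the internet


def estimate_monthly_cost(storage_gb, put_requests, get_requests,
                          egress_gb, free_egress_gb=0.0):
    """Return a cost breakdown in dollars, including a 'total' entry."""
    billable_egress = max(0.0, egress_gb - free_egress_gb)
    breakdown = {
        "storage": storage_gb * STORAGE_PER_GB,
        "put": put_requests / 1000 * PUT_PER_1000,
        "get": get_requests / 1000 * GET_PER_1000,
        "egress": billable_egress * EGRESS_PER_GB,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    costs = estimate_monthly_cost(
        storage_gb=1000, put_requests=1_000_000,
        get_requests=10_000_000, egress_gb=500,
    )
    for name, dollars in costs.items():
        print(f"{name}: ${dollars:.2f}")
```

For this example workload the total comes to about $77 per month, of which $45 is transfer out, more than storage and requests combined.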

🎯

Interview Focus Points

1. How would you design an S3-based data lake for a company ingesting 10TB of logs per day?
2. What is the difference between S3 bucket policies and IAM policies - when do you use each?
3. A developer accidentally deleted important files from S3. How do you recover them and prevent this in the future?
4. Explain how you would use presigned URLs to allow customers to upload files directly to S3 from a browser.
5. What causes S3 throttling and how would you redesign a key naming scheme to avoid it?
6. Walk me through the S3 storage classes - how would you choose between Standard-IA and Glacier Instant Retrieval?
7. How does S3 Cross-Region Replication work and what are its limitations?
8. Your S3 data transfer costs are unexpectedly high. What are the top causes and how do you diagnose them?
9. How would you enforce encryption at rest and in transit for all objects in an S3 bucket?
10. What is S3 Object Lock and when would a compliance team require it?