RDS

Managed relational databases - MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora

Amazon RDS (Relational Database Service) is a managed service that handles provisioning, patching, backups, and failover for six database engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora. It removes the undifferentiated heavy lifting of running a relational database so engineers can focus on schema design and query optimization rather than OS maintenance. RDS is the default choice for any OLTP workload that needs SQL semantics and ACID guarantees.

How RDS Works: Instance, Storage, and Replication

An RDS deployment consists of a DB instance (compute), storage (EBS-backed gp2/gp3/io1/io2), and optionally a Multi-AZ standby or read replicas. The primary instance writes to EBS, which is synchronously replicated to the standby in Multi-AZ mode. Read replicas use asynchronous replication and serve read traffic.

Component	Description	Key Behaviour
DB Instance	Compute running the database engine	Sized by instance class (db.t3, db.r6g, etc.)
EBS Storage	Persistent block storage attached to the instance	gp3 is default; io1/io2 for high IOPS
Multi-AZ Standby	Synchronous replica in a different AZ	Automatic failover in ~60-120 seconds
Read Replica	Asynchronous read-only copy	Up to 15 per primary for MySQL, MariaDB, and PostgreSQL (5 for Oracle and SQL Server); can be cross-region
Parameter Group	Engine configuration (e.g. max_connections)	Changes may require a reboot
Option Group	Optional engine features (e.g. Oracle APEX)	Engine-specific add-ons

💡

Multi-AZ is for high availability (HA), not for read scaling. Read replicas are for read scaling but do not provide automatic failover.
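The distinction maps onto two different AWS CLI calls. The sketch below wraps each in a small function; the function names and the identifiers `mydb` and `mydb-replica-1` are placeholders chosen for illustration, not names from any real environment.

```bash
#!/usr/bin/env bash
# Sketch only: function names and identifiers are illustrative placeholders.

# High availability: convert an existing instance to Multi-AZ
# (adds a synchronous standby in another AZ).
enable_multi_az() {
  local db_id="$1"
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --multi-az \
    --apply-immediately
}

# Read scaling: add an asynchronous read replica of the same instance.
add_read_replica() {
  local source_id="$1" replica_id="$2"
  aws rds create-db-instance-read-replica \
    --db-instance-identifier "$replica_id" \
    --source-db-instance-identifier "$source_id"
}

# Example usage (requires AWS credentials and an existing instance):
#   enable_multi_az mydb
#   add_read_replica mydb mydb-replica-1
```

The two are independent: an instance can have a Multi-AZ standby, read replicas, both, or neither.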

Storage Types and IOPS Sizing

Choosing the right storage type is one of the most common RDS sizing mistakes. Under-provisioning IOPS leads to queue depth buildup and latency spikes that are hard to diagnose.

Storage Type	Max IOPS	Max Throughput	Use Case
gp2	16,000 (3 IOPS/GB baseline; small volumes burst to 3,000)	250 MB/s	General purpose, legacy default
gp3	64,000	4,000 MB/s	General purpose, cost-optimized default
io1	64,000	1,000 MB/s	I/O-intensive OLTP (legacy)
io2 Block Express	256,000	4,000 MB/s	Mission-critical, sub-millisecond latency

💡

Migrate existing gp2 instances to gp3 - you get a 3,000 IOPS and 125 MB/s baseline at no extra cost, whereas gp2 scales at 3 IOPS per GB (minimum 100 IOPS). For volumes under 1 TB, where gp2's baseline is below 3,000 IOPS, this is almost always cheaper and faster.
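The arithmetic behind that tip can be sketched in a few lines of shell. This assumes gp2's published baseline of 3 IOPS per GB with a 100 IOPS floor and a 16,000 IOPS cap, and the flat 3,000 IOPS gp3 baseline quoted above; the function names are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: compare baseline IOPS for a gp2 and a gp3 volume of the same size.
# Assumptions: gp2 = 3 IOPS/GB, floor 100, cap 16,000; gp3 = flat 3,000 baseline.

gp2_baseline_iops() {
  local size_gb="$1"
  local iops=$(( size_gb * 3 ))
  if (( iops < 100 )); then iops=100; fi
  if (( iops > 16000 )); then iops=16000; fi
  echo "$iops"
}

GP3_BASELINE_IOPS=3000

# Prints how many baseline IOPS a gp2 -> gp3 migration gains (negative = loses).
gp3_iops_gain() {
  local size_gb="$1"
  echo $(( GP3_BASELINE_IOPS - $(gp2_baseline_iops "$size_gb") ))
}

gp2_baseline_iops 200    # 600
gp3_iops_gain 200        # 2400
gp3_iops_gain 2000       # -3000
```

A negative gain means the gp3 volume needs additional provisioned IOPS to match the gp2 baseline, which changes the cost comparison. Baselines can also differ by engine and volume size on RDS, so it is worth checking the current limits before migrating large volumes.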

⚠️

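Storage autoscaling only expands the volume - it never shrinks it. Plan your initial size with headroom, because allocated storage cannot be reduced in place and a snapshot restore cannot shrink it either; going smaller means migrating the data to a new instance (for example with a dump and restore or AWS DMS).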
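A minimal sketch of the two guard rails this implies: cap autoscaling with a maximum allocated storage, and alarm before free space runs low. The function names, the alarm name pattern, and the thresholds are assumptions for illustration; the alarm is created without notification actions, which you would normally add.

```bash
#!/usr/bin/env bash
# Sketch only: function names, alarm naming, and thresholds are placeholders.

# Enable storage autoscaling by setting an upper limit (in GiB).
set_storage_ceiling() {
  local db_id="$1" max_gb="$2"
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --max-allocated-storage "$max_gb" \
    --apply-immediately
}

# Alarm when average free storage drops below a threshold.
# FreeStorageSpace is reported in bytes, so convert from GiB.
alarm_on_low_storage() {
  local db_id="$1" threshold_gb="$2"
  local threshold_bytes=$(( threshold_gb * 1024 * 1024 * 1024 ))
  aws cloudwatch put-metric-alarm \
    --alarm-name "${db_id}-low-free-storage" \
    --namespace AWS/RDS \
    --metric-name FreeStorageSpace \
    --dimensions "Name=DBInstanceIdentifier,Value=${db_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold "$threshold_bytes" \
    --comparison-operator LessThanThreshold
}

# Example usage (requires AWS credentials):
#   set_storage_ceiling mydb 500
#   alarm_on_low_storage mydb 10
```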

HA and Disaster Recovery Patterns

RDS provides several layers of protection. Understanding the RTO and RPO of each is essential for architecture decisions and disaster recovery planning interviews.

Pattern	RTO	RPO	Notes
Multi-AZ (synchronous)	60-120 sec	Near zero	Automatic failover, same region
Read Replica promoted	Minutes (manual)	Seconds of lag	Manual intervention required
Cross-region read replica	Minutes (manual)	Seconds to minutes	Good DR target for another region
Automated backups	Hours	Up to 5 min (PITR)	Point-in-time recovery within retention window
Manual snapshots	Hours	Snapshot age	Persists after instance deletion
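The "manual intervention" in the replica rows comes down to a promotion call. A minimal sketch, assuming a replica named `mydb-replica-1` (a placeholder) and a hypothetical wrapper function:

```bash
#!/usr/bin/env bash
# Sketch only: the function name and replica identifier are placeholders.

# Promote a read replica to a standalone, writable instance.
# Promotion is one-way: the instance stops replicating from its source.
promote_replica() {
  local replica_id="$1"
  aws rds promote-read-replica \
    --db-instance-identifier "$replica_id"
}

# Example usage (requires AWS credentials):
#   promote_replica mydb-replica-1
```

After promotion the application still has to be pointed at the new instance's endpoint, which is part of why the RTO for this pattern is measured in minutes rather than seconds.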

💡

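Enable automated backups to get Point-in-Time Recovery (PITR); any retention period of at least 1 day turns it on, and 7 days or more is a sensible floor for production. PITR restores from the most recent backup plus transaction logs, giving you recovery to any second within the retention window, up to the latest restorable time (typically within the last 5 minutes).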
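As a rough sketch, setting the retention period and checking how far forward a restore can reach might look like this. The function names are invented for the example and `mydb` is a placeholder identifier.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the instance identifier are placeholders.

# Set the automated backup retention period (in days); a value above 0 enables PITR.
set_backup_retention() {
  local db_id="$1" days="$2"
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --backup-retention-period "$days" \
    --apply-immediately
}

# Print the most recent timestamp the instance can currently be restored to.
latest_restorable_time() {
  local db_id="$1"
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].LatestRestorableTime' \
    --output text
}

# Example usage (requires AWS credentials):
#   set_backup_retention mydb 7
#   latest_restorable_time mydb
```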

RDS Proxy: Connection Pooling for Serverless Workloads

RDS Proxy sits between your application and RDS, pooling and sharing database connections. It is critical when using Lambda: each concurrent Lambda execution environment typically opens its own connection, so without a proxy a burst to 1,000 concurrent invocations creates up to 1,000 database connections, which can exhaust max_connections on small instances.

Feature	Without Proxy	With RDS Proxy
Connection count	One per app thread/Lambda	Pooled - far fewer to DB
Failover time	60-120 seconds	Reduced: the proxy keeps client connections open and routes them to the new primary without waiting for DNS changes
IAM auth	Possible but complex	Native IAM auth support
Cost	No extra cost	$0.015/vCPU-hour of DB instance
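To put the cost row in perspective, here is a back-of-the-envelope calculation. It assumes the $0.015 per vCPU-hour rate from the table and a 730-hour month; actual rates vary by region and may carry minimum charges, so treat the output as an estimate. The function name is made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: estimate monthly RDS Proxy cost from the DB instance's vCPU count.
# Assumptions: rate of $0.015 per vCPU-hour (from the table above), 730 hours per month.

proxy_monthly_cost() {
  local vcpus="$1" hours="${2:-730}" rate="${3:-0.015}"
  awk -v v="$vcpus" -v h="$hours" -v r="$rate" \
    'BEGIN { printf "%.2f\n", v * h * r }'
}

proxy_monthly_cost 2    # 21.90
proxy_monthly_cost 8    # 87.60
```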

bash

# Create an RDS Proxy
aws rds create-db-proxy \
  --db-proxy-name my-proxy \
  --engine-family MYSQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:...","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::123456789012:role/rds-proxy-role \
  --vpc-subnet-ids subnet-abc subnet-def
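Creating the proxy does not by itself attach it to a database. A sketch of the follow-up steps, assuming the proxy `my-proxy` from the command above and a placeholder instance `mydb`; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only: function names and the instance identifier are placeholders.

# Attach a DB instance to the proxy's default target group.
register_proxy_target() {
  local proxy_name="$1" db_id="$2"
  aws rds register-db-proxy-targets \
    --db-proxy-name "$proxy_name" \
    --db-instance-identifiers "$db_id"
}

# Print the proxy endpoint that applications should connect to
# instead of the instance endpoint.
proxy_endpoint() {
  local proxy_name="$1"
  aws rds describe-db-proxies \
    --db-proxy-name "$proxy_name" \
    --query 'DBProxies[0].Endpoint' \
    --output text
}

# Example usage (requires AWS credentials):
#   register_proxy_target my-proxy mydb
#   proxy_endpoint my-proxy
```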

Pricing Model and Cost Optimization

RDS pricing has six components. Optimizing each one independently can cut costs significantly without sacrificing performance.

Component	Pricing Basis	Optimization Tip
Instance hours	Per hour by instance class	Reserve 1-3 years for production (up to 69% savings)
Storage	Per GB-month (gp3 cheaper than gp2)	Migrate to gp3; enable storage autoscaling
I/O and provisioned IOPS	Per provisioned IOPS-month on io1/io2 and on gp3 above its baseline; gp2 and gp3 baseline I/O is included	Monitor read/write IOPS; io1/io2 only if needed
Backup storage	Free up to DB size; per GB beyond	Reduce retention window on dev/staging
Data transfer	Per GB for cross-AZ/cross-region	Keep app and DB in same AZ to avoid cross-AZ fees
Multi-AZ	~2x instance + storage cost	Use only in production; use single-AZ in dev

⚠️

Cross-AZ data transfer between your application and RDS is charged even within the same VPC, so if they sit in different AZs you pay for every byte. When latency and cost matter, place app instances in the same AZ as the primary, and remember that a Multi-AZ failover moves the primary to the standby's AZ.
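One way to check placement is to compare the AZ of the DB instance with the AZ of an application host. The sketch below assumes the app runs on an EC2 instance; the function names and the identifiers in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: function names and identifiers are placeholders.

# AZ currently hosting the (primary) DB instance.
db_primary_az() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].AvailabilityZone' \
    --output text
}

# AZ of an EC2 application instance.
ec2_instance_az() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].Placement.AvailabilityZone' \
    --output text
}

# Report whether the app and the database share an AZ; returns 1 if they do not.
check_same_az() {
  local db_az app_az
  db_az="$(db_primary_az "$1")"
  app_az="$(ec2_instance_az "$2")"
  if [ "$db_az" = "$app_az" ]; then
    echo "same AZ: $db_az"
  else
    echo "cross-AZ: db=$db_az app=$app_az"
    return 1
  fi
}

# Example usage (requires AWS credentials):
#   check_same_az mydb i-0123456789abcdef0
```

Because the primary's AZ changes after a failover, a check like this is more useful as a recurring audit than as a one-time verification.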

CLI Commands and Operational Runbook

bash

# Describe all RDS instances
aws rds describe-db-instances --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus,MultiAZ]' --output table

# Create a manual snapshot before a risky migration
aws rds create-db-snapshot \
  --db-instance-identifier mydb \
  --db-snapshot-identifier mydb-pre-migration-$(date +%Y%m%d)

# Initiate manual Multi-AZ failover (for testing)
aws rds reboot-db-instance \
  --db-instance-identifier mydb \
  --force-failover

# Restore to a point in time
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier mydb \
  --target-db-instance-identifier mydb-restored \
  --restore-time 2024-01-15T12:00:00Z

# Modify instance class (causes an outage while the instance restarts; with Multi-AZ the downtime is limited to a failover)
aws rds modify-db-instance \
  --db-instance-identifier mydb \
  --db-instance-class db.r6g.large \
  --apply-immediately
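Several of the commands above return before the operation finishes. In a scripted runbook it can help to block until the resource is ready using the CLI's built-in waiters. A minimal sketch, with hypothetical function names and placeholder identifiers:

```bash
#!/usr/bin/env bash
# Sketch only: function names and identifiers are placeholders.

# Take a manual snapshot and block until it is available.
snapshot_and_wait() {
  local db_id="$1" snap_id="$2"
  aws rds create-db-snapshot \
    --db-instance-identifier "$db_id" \
    --db-snapshot-identifier "$snap_id"
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$snap_id"
}

# Block until an instance (for example a freshly restored one) is available.
wait_for_instance() {
  local db_id="$1"
  aws rds wait db-instance-available \
    --db-instance-identifier "$db_id"
}

# Example usage (requires AWS credentials):
#   snapshot_and_wait mydb "mydb-pre-migration-$(date +%Y%m%d)"
#   wait_for_instance mydb-restored
```

The waiters poll on an interval and give up after a fixed number of attempts, so a very large snapshot or restore may need the wait to be retried.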

🎯

Interview Focus Points

1. What is the difference between Multi-AZ and a read replica? When would you use each?
2. Walk me through what happens during an RDS Multi-AZ failover. What is the RTO and RPO?
3. A Lambda function is throwing "too many connections" errors against RDS. How do you fix it?
4. When would you choose io2 Block Express storage over gp3? What metrics would guide that decision?
5. How does Point-in-Time Recovery work in RDS? What are its limitations?
6. How would you migrate an on-premises MySQL database to RDS with minimal downtime?
7. What happens to RDS when the EBS volume runs out of space? How do you prevent it?
8. Explain RDS Proxy - how does it work and what problems does it solve?
9. What is the difference between a parameter group and an option group in RDS?
10. A production RDS instance is running slow. What metrics and tools do you use to diagnose it?