AWS Database

RDS

Managed relational databases - MySQL, PostgreSQL, Oracle, SQL Server, MariaDB

Amazon RDS (Relational Database Service) is a managed service that handles provisioning, patching, backups, and failover for six database engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora. It removes the undifferentiated heavy lifting of running a relational database so engineers can focus on schema design and query optimization rather than OS maintenance. RDS is the default choice for any OLTP workload that needs SQL semantics and ACID guarantees.

How RDS Works: Instance, Storage, and Replication

An RDS deployment consists of a DB instance (compute), storage (EBS-backed gp2/gp3/io1/io2), and optionally a Multi-AZ standby or read replicas. The primary instance writes to EBS, which is synchronously replicated to the standby in Multi-AZ mode. Read replicas use asynchronous replication and serve read traffic.

| Component | Description | Key Behaviour |
| --- | --- | --- |
| DB Instance | Compute running the database engine | Sized by instance class (db.t3, db.r6g, etc.) |
| EBS Storage | Persistent block storage attached to the instance | gp3 is default; io1/io2 for high IOPS |
| Multi-AZ Standby | Synchronous replica in a different AZ | Automatic failover in ~60-120 seconds |
| Read Replica | Asynchronous read-only copy | Up to 15 per primary for MySQL, MariaDB, and PostgreSQL (5 for Oracle and SQL Server); can be cross-region |
| Parameter Group | Engine configuration (e.g. max_connections) | Changes may require a reboot |
| Option Group | Optional engine features (e.g. Oracle APEX) | Engine-specific add-ons |

💡

Multi-AZ is for high availability (HA), not for read scaling. Read replicas are for read scaling but do not provide automatic failover.
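
As a rough illustration of that distinction, the two features are turned on through different AWS CLI calls. The sketch below wraps each call in a small shell function; the function names and the `mydb` / `mydb-replica` identifiers are placeholders invented for this example.

```bash
# Sketch only: function names and identifiers are hypothetical.

# Convert an existing instance to Multi-AZ (adds a synchronous standby for HA).
enable_multi_az() {
  db_id="$1"
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --multi-az \
    --apply-immediately
}

# Create an asynchronous read replica (read scaling, no automatic failover).
create_read_replica() {
  source_id="$1"
  replica_id="$2"
  aws rds create-db-instance-read-replica \
    --db-instance-identifier "$replica_id" \
    --source-db-instance-identifier "$source_id"
}

# Usage:
#   enable_multi_az mydb
#   create_read_replica mydb mydb-replica
```

The replica gets its own endpoint, so the application has to send read traffic to it explicitly; the Multi-AZ standby has no endpoint of its own and only takes over on failover.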

Storage Types and IOPS Sizing

Choosing the right storage type is one of the most common RDS sizing mistakes. Under-provisioning IOPS leads to queue depth buildup and latency spikes that are hard to diagnose.

| Storage Type | Max IOPS | Max Throughput | Use Case |
| --- | --- | --- | --- |
| gp2 | 16,000 (burst) | 250 MB/s | General purpose, legacy default |
| gp3 | 64,000 | 4,000 MB/s | General purpose, cost-optimized default |
| io1 | 64,000 | 1,000 MB/s | I/O-intensive OLTP (legacy) |
| io2 Block Express | 256,000 | 4,000 MB/s | Mission-critical, sub-millisecond latency |

💡

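Migrate existing gp2 instances to gp3: you get a baseline of 3,000 IOPS and 125 MB/s regardless of volume size and at no extra cost, whereas gp2 ties its baseline to volume size at 3 IOPS per GB (with a 100 IOPS minimum). For volumes under 1 TB, where the gp2 baseline sits below 3,000 IOPS, gp3 is almost always the better deal.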
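
The gp2 versus gp3 comparison comes down to simple arithmetic. The sketch below assumes the standard EBS gp2 rule of 3 IOPS per GiB, floored at 100 and capped at 16,000 per volume, and compares it with the 3,000 IOPS gp3 baseline. Both function names are made up for illustration, and RDS may stripe several volumes on large allocations, so treat the numbers as approximate.

```bash
# Sketch only: assumes the per-volume EBS gp2 rule (3 IOPS/GiB, min 100, max 16,000).

# Print the gp2 baseline IOPS for a volume of the given size in GiB.
gp2_baseline_iops() {
  size_gib="$1"
  iops=$(( size_gib * 3 ))
  if [ "$iops" -lt 100 ]; then
    iops=100
  fi
  if [ "$iops" -gt 16000 ]; then
    iops=16000
  fi
  echo "$iops"
}

# Print "yes" if the 3,000 IOPS gp3 baseline beats the gp2 baseline at this size.
gp3_baseline_wins() {
  if [ "$(gp2_baseline_iops "$1")" -lt 3000 ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# Usage:
#   gp2_baseline_iops 500    # 500 GiB volume
#   gp3_baseline_wins 500
```

At 1,000 GiB the two baselines are equal, which is where the "under 1 TB" rule of thumb comes from.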

⚠️

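Storage autoscaling only expands the volume; it never shrinks it. Choose your initial size deliberately, because allocated storage cannot be reduced in place and a snapshot restore cannot shrink it either. Getting to a smaller volume means creating a new instance and migrating the data into it (for example, with a dump and restore).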
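
Autoscaling is controlled by a maximum storage threshold on the instance. A minimal sketch of setting and inspecting it with the AWS CLI follows; the function names, the `mydb` identifier, and the 1,000 GiB ceiling are assumptions chosen for the example.

```bash
# Sketch only: function names and values are hypothetical.

# Enable storage autoscaling by setting a ceiling (in GiB) the volume may grow to.
enable_storage_autoscaling() {
  db_id="$1"
  max_gib="$2"
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --max-allocated-storage "$max_gib" \
    --apply-immediately
}

# Show the current allocation and the autoscaling ceiling for an instance.
show_storage_limits() {
  db_id="$1"
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].[AllocatedStorage,MaxAllocatedStorage]' \
    --output text
}

# Usage:
#   enable_storage_autoscaling mydb 1000
#   show_storage_limits mydb
```

Because growth is one-way, the ceiling is also a cost guardrail: a runaway table can only expand the volume up to that value.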

HA and Disaster Recovery Patterns

RDS provides several layers of protection. Understanding the RTO and RPO of each is essential for architecture decisions and disaster recovery planning interviews.

| Pattern | RTO | RPO | Notes |
| --- | --- | --- | --- |
| Multi-AZ (synchronous) | 60-120 sec | Near zero | Automatic failover, same region |
| Read Replica promoted | Minutes (manual) | Seconds of lag | Manual intervention required |
| Cross-region read replica | Minutes (manual) | Seconds to minutes | Good DR target for another region |
| Automated backups | Hours | Up to 5 min (PITR) | Point-in-time recovery within retention window |
| Manual snapshots | Hours | Snapshot age | Persists after instance deletion |

💡

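Enable automated backups to get Point-in-Time Recovery (PITR). Any retention period of at least one day turns it on; 7 days or more is a sensible floor for production. PITR restores from the most recent backup and then replays transaction logs, so you can recover to any second within the retention window, up to the latest restorable time, which typically trails the present by about five minutes. The restore always creates a new DB instance rather than overwriting the existing one.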
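
A short sketch of the two operations this tip implies: setting the retention period and checking how far forward a restore can go. The function names and the `mydb` identifier are placeholders for this example; the 7-day default mirrors the recommendation above.

```bash
# Sketch only: function names and identifiers are hypothetical.

# Set the automated backup retention period (defaults to 7 days).
# Note: moving from 0 to a non-zero value can cause a brief outage.
enable_pitr() {
  db_id="$1"
  days="${2:-7}"
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --backup-retention-period "$days" \
    --apply-immediately
}

# Print the most recent timestamp the instance can be restored to.
latest_restorable_time() {
  db_id="$1"
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].LatestRestorableTime' \
    --output text
}

# Usage:
#   enable_pitr mydb
#   latest_restorable_time mydb
```

Checking the latest restorable time before an incident is a cheap way to confirm the real RPO of the backup path.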

RDS Proxy: Connection Pooling for Serverless Workloads

RDS Proxy sits between your application and RDS, pooling and sharing database connections. It matters most with Lambda: each concurrent execution environment typically opens its own connection, so without a proxy a burst to 1,000 concurrent invocations can create 1,000 database connections and exhaust max_connections on small instances.
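
To make the exhaustion risk concrete, the sketch below estimates the default `max_connections` for RDS MySQL, assuming the default parameter-group formula `DBInstanceClassMemory/12582880`. It uses the nominal instance memory, which slightly overstates the real value because RDS reserves some memory for the OS. The function names are invented for this example.

```bash
# Sketch only: assumes the RDS MySQL default max_connections formula and
# nominal instance memory; actual limits are somewhat lower.

# Approximate default max_connections for an instance with the given memory in GiB.
approx_max_connections() {
  mem_gib="$1"
  mem_bytes=$(( mem_gib * 1024 * 1024 * 1024 ))
  echo $(( mem_bytes / 12582880 ))
}

# Print "ok" if the expected concurrent clients fit under the limit, else "exhausted".
connections_fit() {
  concurrency="$1"
  mem_gib="$2"
  limit="$(approx_max_connections "$mem_gib")"
  if [ "$concurrency" -le "$limit" ]; then
    echo "ok"
  else
    echo "exhausted"
  fi
}

# Usage:
#   approx_max_connections 1     # roughly a 1 GiB instance
#   connections_fit 1000 1       # 1,000 concurrent Lambdas against it
```

A 1 GiB instance lands well under 100 connections by this estimate, which is why a Lambda burst hits the limit so quickly without pooling.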

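| Feature | Without Proxy | With RDS Proxy |
| --- | --- | --- |
| Connection count | One per app thread/Lambda | Pooled; far fewer connections reach the DB |
| Failover time | 60-120 seconds | Reduced; the proxy keeps client connections open and routes to the new primary without waiting on DNS |
| IAM auth | Possible but complex | Native IAM auth support |
| Cost | No extra cost | $0.015/vCPU-hour of DB instance (varies by region) |
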
```bash
# Create an RDS Proxy
aws rds create-db-proxy \
  --db-proxy-name my-proxy \
  --engine-family MYSQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:...","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::123456789012:role/rds-proxy-role \
  --vpc-subnet-ids subnet-abc subnet-def
```

Pricing Model and Cost Optimization

RDS pricing has five components, and enabling Multi-AZ roughly doubles the instance and storage portions. Optimizing each one independently can cut costs significantly without sacrificing performance.

| Component | Pricing Basis | Optimization Tip |
| --- | --- | --- |
| Instance hours | Per hour by instance class | Reserve for 1 or 3 years in production (up to 69% savings) |
| Storage | Per GB-month (gp3 cheaper than gp2) | Migrate to gp3; enable storage autoscaling |
| Provisioned IOPS | Per IOPS-month on io1/io2, and on gp3 above the included baseline; gp2 has no separate I/O charge | Monitor read/write IOPS; use io1/io2 only if needed |
| Backup storage | Free up to DB size; per GB beyond | Reduce retention window on dev/staging |
| Data transfer | Per GB for cross-AZ/cross-region | Keep app and DB in same AZ to avoid cross-AZ fees |
| Multi-AZ | ~2x instance + storage cost | Use only in production; use single-AZ in dev |

⚠️

Cross-AZ data transfer is charged even within the same VPC. If your application and RDS are in different AZs, you pay for every byte that crosses between them. When latency and cost matter, place your app instances in the same AZ as the primary DB instance, keeping in mind that a Multi-AZ failover moves the primary to a different AZ.
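
Co-locating the app with the database starts with knowing which AZ the primary is in. A minimal sketch using the AWS CLI follows; the function name and the `mydb` identifier are placeholders for this example.

```bash
# Sketch only: function name and identifier are hypothetical.

# Print the AZ of the primary and, for Multi-AZ instances, the standby's AZ.
db_availability_zones() {
  db_id="$1"
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].[AvailabilityZone,SecondaryAvailabilityZone]' \
    --output text
}

# Usage:
#   db_availability_zones mydb
```

Re-running this after a failover test shows the two values swapped, which is a useful reminder that AZ placement is not fixed.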

CLI Commands and Operational Runbook

```bash
# Describe all RDS instances
aws rds describe-db-instances --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus,MultiAZ]' --output table

# Create a manual snapshot before a risky migration
aws rds create-db-snapshot \
  --db-instance-identifier mydb \
  --db-snapshot-identifier mydb-pre-migration-$(date +%Y%m%d)

# Initiate manual Multi-AZ failover (for testing)
aws rds reboot-db-instance \
  --db-instance-identifier mydb \
  --force-failover

# Restore to a point in time (creates a new instance, mydb-restored)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier mydb \
  --target-db-instance-identifier mydb-restored \
  --restore-time 2024-01-15T12:00:00Z

# Modify instance class (causes downtime; with Multi-AZ it is limited to roughly a failover)
aws rds modify-db-instance \
  --db-instance-identifier mydb \
  --db-instance-class db.r6g.large \
  --apply-immediately
```

🎯

Interview Focus Points

  1. What is the difference between Multi-AZ and a read replica? When would you use each?
  2. Walk me through what happens during an RDS Multi-AZ failover. What is the RTO and RPO?
  3. A Lambda function is throwing "too many connections" errors against RDS. How do you fix it?
  4. When would you choose io2 Block Express storage over gp3? What metrics would guide that decision?
  5. How does Point-in-Time Recovery work in RDS? What are its limitations?
  6. How would you migrate an on-premises MySQL database to RDS with minimal downtime?
  7. What happens to RDS when the EBS volume runs out of space? How do you prevent it?
  8. Explain RDS Proxy: how does it work and what problems does it solve?
  9. What is the difference between a parameter group and an option group in RDS?
  10. A production RDS instance is running slow. What metrics and tools do you use to diagnose it?