
AWS Database

Keyspaces

Serverless managed Apache Cassandra-compatible wide-column database

Amazon Keyspaces (for Apache Cassandra) is a serverless, scalable, and highly available managed database service that is compatible with Apache Cassandra. It allows you to run Cassandra workloads on AWS without managing Cassandra clusters, while using the same CQL (Cassandra Query Language) drivers and data model. Keyspaces is the right choice for teams with existing Cassandra applications or developers who need wide-column NoSQL with massive write throughput and time-series-friendly data modeling.

Cassandra Data Model: Keyspaces, Tables, Partition Keys, Clustering Columns

The Cassandra data model is optimized for fast writes and efficient reads of related rows. Understanding partition keys and clustering columns is essential before designing any table.

| Concept | Description | Analogy |
|---|---|---|
| Keyspace | Top-level namespace; analogous to a database schema | SQL schema/database |
| Table | Collection of rows with defined columns | SQL table |
| Partition Key | Determines which node stores the data; primary distribution mechanism | Shard key in MongoDB |
| Clustering Column | Orders rows within a partition; enables range queries within a partition | Sort key in DynamoDB |
| Primary Key | Partition key + clustering columns; must be unique per row | Composite primary key |
```cql
-- Create a keyspace
CREATE KEYSPACE sensor_data
WITH replication = {'class': 'SingleRegionStrategy'}
AND tags = {'env': 'production'};

-- Create a time-series table: sensor readings partitioned by sensor_id,
-- ordered by timestamp descending
CREATE TABLE sensor_data.readings (
  sensor_id   text,
  recorded_at timestamp,
  temperature double,
  humidity    double,
  PRIMARY KEY (sensor_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC)
  AND default_time_to_live = 2592000; -- 30 days TTL
```
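To make the roles of the partition key and clustering column concrete, here is a minimal in-memory sketch in Python. It only imitates the access pattern of `PRIMARY KEY (sensor_id, recorded_at)` with descending clustering order; it is not how Keyspaces stores data, and the `ReadingsTable` class and its methods are hypothetical names invented for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


class ReadingsTable:
    """Toy model of PRIMARY KEY (sensor_id, recorded_at) with DESC clustering."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self._partitions = defaultdict(dict)

    def insert(self, sensor_id, recorded_at, temperature):
        # Writing the same primary key twice overwrites the row (an upsert).
        self._partitions[sensor_id][recorded_at] = {
            "sensor_id": sensor_id,
            "recorded_at": recorded_at,
            "temperature": temperature,
        }

    def select(self, sensor_id, since=None, limit=None):
        # A read addresses one partition, then walks its rows newest-first.
        rows = self._partitions.get(sensor_id, {})
        keys = sorted(rows, reverse=True)
        if since is not None:
            keys = [k for k in keys if k >= since]  # range filter on clustering column
        return [rows[k] for k in keys[:limit]]


def ts(hour):
    """Helper for the example: a UTC timestamp on 2024-01-01 at the given hour."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


table = ReadingsTable()
table.insert("s-1", ts(9), 20.5)
table.insert("s-1", ts(11), 22.0)
table.insert("s-1", ts(10), 21.0)
table.insert("s-2", ts(10), 18.0)

latest = table.select("s-1", limit=2)
print([row["temperature"] for row in latest])  # [22.0, 21.0]
```

The point of the sketch is that a query naming the partition key touches a single partition, and "latest N readings" falls out of the clustering order without any sorting at query time.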
⚠️ A partition that receives a disproportionate share of requests (a hot partition) or that keeps growing without bound (a large partition) degrades read and write performance. Design your partition key to distribute both data and traffic evenly. For time-series data, consider including a time bucket (e.g. year-month) in the partition key to bound partition size.
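The time-bucket idea can be sketched in Python as follows. This assumes a hypothetical variant of the table declared with `PRIMARY KEY ((sensor_id, bucket), recorded_at)`, where `bucket` is an extra text column; the function names are likewise invented for the example. The application computes the bucket on write, and on read it enumerates the buckets that a time range spans.

```python
from datetime import datetime, timezone


def month_bucket(recorded_at: datetime) -> str:
    """Return a year-month bucket such as '2024-03' (computed in UTC).

    Expects a timezone-aware datetime.
    """
    return recorded_at.astimezone(timezone.utc).strftime("%Y-%m")


def partition_key(sensor_id: str, recorded_at: datetime) -> tuple:
    # Composite partition key: one partition per sensor per month.
    return (sensor_id, month_bucket(recorded_at))


def buckets_between(start: datetime, end: datetime) -> list:
    """List every bucket a range query has to visit, oldest first."""
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    year, month = start.year, start.month
    buckets = []
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


reading_time = datetime(2024, 3, 15, 12, 30, tzinfo=timezone.utc)
print(partition_key("sensor-42", reading_time))  # ('sensor-42', '2024-03')

query_buckets = buckets_between(
    datetime(2023, 11, 20, tzinfo=timezone.utc),
    datetime(2024, 2, 3, tzinfo=timezone.utc),
)
print(query_buckets)  # ['2023-11', '2023-12', '2024-01', '2024-02']
```

The trade-off is that a range query spanning several months becomes one query per bucket, so the bucket width should match the typical query window.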

Keyspaces vs Self-Managed Cassandra: Key Differences

| Feature | Amazon Keyspaces | Self-Managed Cassandra |
|---|---|---|
| Infrastructure | Serverless; no nodes to manage | Manage nodes, storage, JVM, GC tuning |
| Scaling | Automatic (on-demand or provisioned throughput) | Manual node addition; requires rebalancing |
| Replication factor | Always 3x across Availability Zones within a Region; not configurable | Configurable replication factor and strategy |
| Consistency levels | LOCAL_QUORUM only for writes; ONE, LOCAL_ONE, and LOCAL_QUORUM for reads | Full range: ONE, QUORUM, ALL, etc. |
| Lightweight Transactions (LWT) | Supported (IF NOT EXISTS, IF conditions) | Supported |
| Secondary indexes | Not supported (no CREATE INDEX); design the primary key around each query | Supported, but queries can fan out across many nodes |
| CQL compatibility | CQL v3 compatible; some features missing | Full CQL support |
| Multi-region replication | Supported through multi-Region keyspaces (active-active, asynchronous) | Supported with NetworkTopologyStrategy |
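⚠️ Keyspaces supports multi-Region replication through multi-Region keyspaces, which replicate data asynchronously between AWS Regions in an active-active configuration. It does not expose Cassandra's per-datacenter tuning, such as custom replication factors or cross-region consistency levels like EACH_QUORUM. If your workload depends on that level of control, self-managed Cassandra or DataStax Astra remains the alternative.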

Capacity Modes: On-Demand vs Provisioned

Like DynamoDB, Keyspaces offers on-demand and provisioned capacity modes. Provisioned mode supports auto scaling.

| Mode | Pricing Unit | Best For |
|---|---|---|
| On-Demand | Per read/write request unit | Unpredictable or new workloads |
| Provisioned | Per read/write capacity unit-hour | Steady-state, cost-optimized production |
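💡 One Keyspaces Write Request Unit (WRU) covers one LOCAL_QUORUM write of a row up to 1 KB. One Read Request Unit (RRU) covers one LOCAL_QUORUM read of up to 4 KB; a LOCAL_ONE read costs half as much (0.5 RRU per 4 KB). This mirrors DynamoDB, where a strongly consistent read costs twice as much as an eventually consistent one.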
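As a rough worked example of request-unit arithmetic, the Python sketch below estimates the cost of a single-row operation. It assumes the sizing rules AWS documents for Keyspaces (writes metered per 1 KB, reads per 4 KB, a LOCAL_QUORUM read costing one RRU per 4 KB and a LOCAL_ONE read half of that), treats 1 KB as 1024 bytes, and ignores per-row storage overhead. The function names are made up for the illustration.

```python
import math

READ_BLOCK_BYTES = 4 * 1024   # assumption: reads are metered in 4 KB steps
WRITE_BLOCK_BYTES = 1 * 1024  # assumption: writes are metered in 1 KB steps


def read_request_units(row_bytes: int, consistency: str = "LOCAL_QUORUM") -> float:
    """Estimate RRUs for reading one row at the given consistency level."""
    blocks = max(1, math.ceil(row_bytes / READ_BLOCK_BYTES))
    if consistency == "LOCAL_QUORUM":
        return float(blocks)
    if consistency in ("LOCAL_ONE", "ONE"):
        return blocks / 2  # half the cost of a LOCAL_QUORUM read
    raise ValueError(f"unsupported read consistency: {consistency}")


def write_request_units(row_bytes: int) -> int:
    """Estimate WRUs for writing one row (writes are always LOCAL_QUORUM)."""
    return max(1, math.ceil(row_bytes / WRITE_BLOCK_BYTES))


print(write_request_units(2500))               # 3
print(read_request_units(2500))                # 1.0
print(read_request_units(2500, "LOCAL_ONE"))   # 0.5
print(read_request_units(10 * 1024))           # 3.0
```

A 2,500-byte row therefore costs three WRUs to write but only one RRU to read at LOCAL_QUORUM, which is why row size matters more for write-heavy workloads.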

Point-in-Time Recovery, Encryption, and Client Connectivity

| Feature | Detail |
|---|---|
| Encryption at rest | AES-256 via AWS-owned key or customer managed KMS key |
| Encryption in transit | TLS required; use Starfield certificate |
| Authentication | SigV4 plugin (IAM credentials) or service-specific username/password generated in IAM |
| PITR | Point-in-Time Recovery: restore to any second in past 35 days |
| Backups | On-demand snapshots to S3 in addition to PITR |
| Multi-AZ | Data stored in 3 AZs automatically; no configuration needed |
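The 35-day PITR window in the table above can be expressed as a small date calculation. The Python sketch below is only an illustration with hypothetical function names; it also assumes that no restore point exists from before PITR was enabled on the table.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)  # maximum look-back stated in the table above


def earliest_restorable_time(now: datetime, pitr_enabled_at: datetime) -> datetime:
    """Oldest restore point: start of the window, or PITR enablement if later."""
    return max(now - PITR_WINDOW, pitr_enabled_at)


def can_restore_to(target: datetime, now: datetime, pitr_enabled_at: datetime) -> bool:
    """True if the target timestamp falls inside the restorable window."""
    return earliest_restorable_time(now, pitr_enabled_at) <= target <= now


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
enabled = datetime(2024, 1, 1, tzinfo=timezone.utc)

print(earliest_restorable_time(now, enabled))  # 2024-05-26 00:00:00+00:00
print(can_restore_to(datetime(2024, 5, 1, tzinfo=timezone.utc), now, enabled))  # False
```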
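```bash
# Connect with SigV4 auth. The --auth-provider flag is provided by the
# cqlsh-expansion package (pip install cqlsh-expansion), not by stock cqlsh.
# The Starfield certificate path is set in ~/.cassandra/cqlshrc:
#   [ssl]
#   certfile = /path/to/sf-class2-root.crt
cqlsh-expansion cassandra.us-east-1.amazonaws.com 9142 \
  --ssl \
  --auth-provider "SigV4AuthProvider"

# Or use service-specific credentials (username/password) generated in IAM
cqlsh cassandra.us-east-1.amazonaws.com 9142 \
  --ssl \
  --username "your-iam-user-at-123456789012" \
  --password "your-service-specific-credential"
```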
🎯

Interview Focus Points

  1. What is the Cassandra data model? Explain partition keys and clustering columns.
  2. What are the key differences between Amazon Keyspaces and self-managed Apache Cassandra?
  3. When would you choose Keyspaces over DynamoDB for a NoSQL workload?
  4. What consistency levels does Keyspaces support? How does this differ from native Cassandra?
  5. What is a hot partition in Cassandra/Keyspaces and how do you design to avoid it?
  6. How do you migrate an existing Cassandra application to Keyspaces?
  7. What are the limitations of Keyspaces compared to native Cassandra?
  8. How does PITR work in Keyspaces and what is the maximum recovery window?