Keyspaces

Serverless managed Apache Cassandra-compatible wide-column database

Amazon Keyspaces (for Apache Cassandra) is a serverless, scalable, highly available managed database service that is compatible with Apache Cassandra. It lets you run Cassandra workloads on AWS without operating Cassandra clusters, while keeping the same CQL (Cassandra Query Language) drivers, tools, and data model. Keyspaces fits teams with existing Cassandra applications, and developers who need a wide-column NoSQL store with high write throughput and time-series-friendly data modeling.

Cassandra Data Model: Keyspaces, Tables, Partition Keys, Clustering Columns

The Cassandra data model is optimized for fast writes and efficient reads of related rows. Understanding partition keys and clustering columns is essential before designing any table.

Concept	Description	Analogy
Keyspace	Top-level namespace that groups tables and defines replication	SQL schema/database
Table	Collection of rows with defined columns	SQL table
Partition Key	Hashed to determine where a partition is stored; the primary distribution mechanism	Shard key in MongoDB
Clustering Column	Orders rows within a partition; enables range queries within a partition	Sort key in DynamoDB
Primary Key	Partition key + clustering columns; uniquely identifies a row (writing an existing key overwrites the row)	Composite primary key
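A toy in-memory model can make the table above concrete. This Python sketch is for intuition only, because Keyspaces and Cassandra store data very differently. `WideColumnTable`, `upsert`, and `range_query` are hypothetical names. The sketch shows two rules that drive table design:

- A range query is served from a single partition whose rows are already sorted by the clustering column.
- Writing an existing primary key overwrites the row.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from datetime import datetime


class WideColumnTable:
    """Toy model of a table with PRIMARY KEY (partition_key, clustering_column).

    Rows are grouped by partition key and kept sorted by clustering value
    inside each partition. Writing an existing primary key overwrites the
    row, mirroring CQL upsert behavior. This is NOT how Cassandra stores data.
    """

    def __init__(self):
        self._clustering = defaultdict(list)  # partition key -> sorted clustering values
        self._rows = {}                        # (partition key, clustering value) -> columns

    def upsert(self, partition_key, clustering_value, **columns):
        key = (partition_key, clustering_value)
        if key not in self._rows:
            insort(self._clustering[partition_key], clustering_value)
        self._rows[key] = columns  # same primary key: the row is replaced

    def range_query(self, partition_key, start, end, descending=False):
        """Rows of ONE partition with start <= clustering value < end."""
        values = self._clustering.get(partition_key, [])
        selected = values[bisect_left(values, start):bisect_left(values, end)]
        if descending:
            selected.reverse()  # slice is a copy, so stored order is untouched
        return [(v, self._rows[(partition_key, v)]) for v in selected]


table = WideColumnTable()
table.upsert("sensor-1", datetime(2024, 5, 1, 10), temperature=21.5)
table.upsert("sensor-1", datetime(2024, 5, 1, 9), temperature=20.0)
table.upsert("sensor-1", datetime(2024, 5, 1, 10), temperature=22.0)  # overwrites the 10:00 row
table.upsert("sensor-2", datetime(2024, 5, 1, 9), temperature=18.0)

# Newest-first readings for one sensor on 1 May 2024
rows = table.range_query("sensor-1", datetime(2024, 5, 1), datetime(2024, 5, 2), descending=True)
```

The query never touches `sensor-2`'s partition. That isolation is what makes partition-scoped range reads cheap.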

cql

-- Create a keyspace
CREATE KEYSPACE sensor_data
WITH replication = {'class': 'SingleRegionStrategy'}
AND tags = {'env': 'production'};

-- Create a time-series table: sensor readings partitioned by sensor_id
-- ordered by timestamp descending
CREATE TABLE sensor_data.readings (
  sensor_id   text,
  recorded_at timestamp,
  temperature double,
  humidity    double,
  PRIMARY KEY (sensor_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC)
  AND default_time_to_live = 2592000; -- 30 days TTL

⚠️

A partition that receives a disproportionate share of traffic (a hot partition) or accumulates too many rows (a large partition) degrades read and write performance. Choose a high-cardinality partition key so that data and requests spread evenly. For time-series data, add a time bucket (e.g. year-month) to the partition key to bound partition size.
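Here is a minimal Python sketch of the time-bucket idea. It assumes a hypothetical table keyed as `PRIMARY KEY ((sensor_id, month), recorded_at)`, where `month` is a `'YYYY-MM'` text column. An efficient query must supply the full partition key, so a time-range read that spans several months becomes one query per bucket. The helper lists the buckets to hit. `month_bucket` and `buckets_between` are illustrative names.

```python
from datetime import datetime


def month_bucket(ts: datetime) -> str:
    """Value for the hypothetical `month` partition-key column."""
    return f"{ts.year:04d}-{ts.month:02d}"


def buckets_between(start: datetime, end: datetime) -> list[str]:
    """Every month bucket a query over [start, end] must read, oldest first."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month == 13:  # roll over into January of the next year
            year, month = year + 1, 1
    return buckets


# A read from 20 Nov 2024 to 3 Feb 2025 touches four partitions per sensor
print(month_bucket(datetime(2024, 11, 20)))                          # 2024-11
print(buckets_between(datetime(2024, 11, 20), datetime(2025, 2, 3)))  # ['2024-11', '2024-12', '2025-01', '2025-02']
```

Monthly buckets are an assumption. Pick a bucket width that keeps each partition to a bounded number of rows at your write rate.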

Keyspaces vs Self-Managed Cassandra: Key Differences

Feature	Amazon Keyspaces	Self-Managed Cassandra
Infrastructure	Serverless - no nodes to manage	Manage nodes, storage, JVM, GC tuning
Scaling	Automatic (on-demand or provisioned throughput)	Manual node addition; requires rebalancing
Replication factor	Always 3 copies across three Availability Zones per Region; not configurable	Configurable replication factor and strategy
Consistency levels	LOCAL_QUORUM only for writes; LOCAL_ONE and LOCAL_QUORUM for reads	Full range: ONE, QUORUM, ALL, etc.
Lightweight Transactions (LWT)	Supported (IF NOT EXISTS, IF conditions)	Supported
Secondary indexes	Not supported (no CREATE INDEX); model each query pattern as its own table	Supported (local per-node indexes; queries without the partition key fan out across the cluster)
CQL compatibility	Compatible with the Apache Cassandra 3.11 CQL API; some features unsupported (e.g. materialized views, user-defined functions)	Full CQL support
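Multi-region replication	Supported via multi-Region keyspaces (active-active, asynchronous replication)	Supported with NetworkTopologyStrategy

⚠️

Keyspaces multi-Region replication is asynchronous. Conflicting writes to the same row in different Regions are resolved with last-writer-wins. Keyspaces has no cross-Region consistency level such as EACH_QUORUM. If your application needs writes acknowledged synchronously in several Regions, you may still need self-managed Cassandra or DataStax Astra. This can be a hard blocker for some use cases.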

Capacity Modes: On-Demand vs Provisioned

Like DynamoDB, Keyspaces offers on-demand and provisioned capacity modes. Provisioned mode supports auto scaling.

Mode	Pricing Unit	Best For
On-Demand	Per read/write request unit	Unpredictable or new workloads
Provisioned	Per read/write capacity unit-hour	Steady-state, cost-optimized production
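The "Best For" column comes down to simple arithmetic:

- On-demand bills only the units you consume.
- Provisioned bills the capacity you reserve, which must cover peak load, whether or not you use it.

This Python sketch compares one hour of each mode. The prices are invented placeholders chosen to keep the arithmetic easy; they are not AWS rates, so substitute current pricing for your Region. `UnitPrices` and `hourly_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    """Hypothetical placeholder prices, NOT real AWS pricing."""
    per_million_wru: float  # on-demand, per million write request units
    per_million_rru: float  # on-demand, per million read request units
    per_wcu_hour: float     # provisioned, per write capacity unit-hour
    per_rcu_hour: float     # provisioned, per read capacity unit-hour


def hourly_cost(avg_wru_per_s, avg_rru_per_s, provisioned_wcu, provisioned_rcu, prices):
    """Return (on_demand, provisioned) cost for one hour.

    On-demand: units actually consumed over 3600 seconds.
    Provisioned: reserved per-second capacity, billed per unit-hour.
    """
    on_demand = 3600 * (
        avg_wru_per_s * prices.per_million_wru + avg_rru_per_s * prices.per_million_rru
    ) / 1_000_000
    provisioned = provisioned_wcu * prices.per_wcu_hour + provisioned_rcu * prices.per_rcu_hour
    return on_demand, provisioned


prices = UnitPrices(per_million_wru=1.0, per_million_rru=0.25, per_wcu_hour=0.001, per_rcu_hour=0.0002)

# Steady workload that keeps reserved capacity busy: provisioned is cheaper
steady = hourly_cost(100, 400, provisioned_wcu=150, provisioned_rcu=600, prices=prices)
# Spiky workload: same peak-sized reservation, much lower average use: on-demand is cheaper
spiky = hourly_cost(5, 20, provisioned_wcu=150, provisioned_rcu=600, prices=prices)
```

Auto scaling narrows the gap for provisioned mode by shrinking the reservation between peaks. The underlying tradeoff stays the same: provisioned pays off when average usage stays close to reserved capacity.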

💡

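One Keyspaces write request unit (WRU) covers one write of a row up to 1 KB. One read request unit (RRU) covers one LOCAL_QUORUM read, or two LOCAL_ONE reads, of a row up to 4 KB. Larger rows consume additional units, rounded up to the next 1 KB for writes or 4 KB for reads. This is similar to DynamoDB capacity units.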
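The unit arithmetic can be sketched in Python. The sketch follows AWS's published definition:

- One WRU per 1 KB written.
- One RRU per 4 KB read at LOCAL_QUORUM, half that at LOCAL_ONE.
- Row sizes are rounded up to the next block.

It covers single-row reads only. `write_units` and `read_units` are hypothetical helpers, and real billing is based on Keyspaces' own row-size calculation.

```python
import math

KB = 1024


def write_units(row_bytes: int) -> int:
    """WRUs for one write: one unit per started 1 KB of row size."""
    return max(1, math.ceil(row_bytes / KB))


def read_units(row_bytes: int, consistency: str = "LOCAL_QUORUM") -> float:
    """RRUs for one single-row read: one unit per started 4 KB at
    LOCAL_QUORUM, half of that at LOCAL_ONE."""
    blocks = max(1, math.ceil(row_bytes / (4 * KB)))
    if consistency == "LOCAL_QUORUM":
        return float(blocks)
    if consistency == "LOCAL_ONE":
        return blocks / 2
    raise ValueError(f"read consistency not covered by this sketch: {consistency}")


row = 10 * KB  # a 10 KB row
print(write_units(row))              # 10
print(read_units(row))               # 3.0 (10 KB spans three 4 KB blocks)
print(read_units(row, "LOCAL_ONE"))  # 1.5
```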

Point-in-Time Recovery, Encryption, and Client Connectivity

Feature	Detail
Encryption at rest	AES-256 via AWS-owned key or customer KMS key
Encryption in transit	TLS required; use Starfield certificate
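Authentication	SigV4 authentication plugin (IAM credentials) or service-specific credentials (username/password generated for an IAM user, optionally stored in Secrets Manager)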
PITR	Point-in-time recovery (enabled per table): restore to any second in the past 35 days, into a new table
Backups	On-demand snapshots to S3 in addition to PITR
Multi-AZ	Data stored in 3 AZs automatically; no configuration needed
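Here is a small Python sketch of the PITR window arithmetic, assuming the 35-day window from the table. The earliest restorable point is the later of two times: 35 days ago, and the moment PITR was enabled on the table, since nothing earlier was captured. The function names are hypothetical, and the service itself is the source of truth for a table's restorable range.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)  # recovery window stated in the table above


def earliest_restorable(now: datetime, pitr_enabled_at: datetime) -> datetime:
    """Later of (now - 35 days) and the time PITR was turned on."""
    return max(now - PITR_WINDOW, pitr_enabled_at)


def restorable(target: datetime, now: datetime, pitr_enabled_at: datetime) -> bool:
    """True if `target` lies inside the recovery window (simplified: up to `now`)."""
    return earliest_restorable(now, pitr_enabled_at) <= target <= now


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
enabled = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(earliest_restorable(now, enabled))  # 2024-05-26 12:00:00+00:00
```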

bash

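# Both commands need the Starfield root certificate; cqlsh reads its path from SSL_CERTFILE
export SSL_CERTFILE=/path/to/sf-class2-root.crt

# Connect with SigV4 auth via cqlsh-expansion (AWS's cqlsh package that
# bundles the SigV4 authentication plugin: pip install cqlsh-expansion)
cqlsh-expansion cassandra.us-east-1.amazonaws.com 9142 \
  --ssl \
  --auth-provider "SigV4AuthProvider"

# Or standard cqlsh with service-specific credentials (IAM user name + generated password)
cqlsh cassandra.us-east-1.amazonaws.com 9142 \
  --ssl \
  --username "your-iam-user-at-123456789012" \
  --password "your-service-specific-credential"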

🎯

Interview Focus Points

1. What is the Cassandra data model? Explain partition keys and clustering columns.
2. What are the key differences between Amazon Keyspaces and self-managed Apache Cassandra?
3. When would you choose Keyspaces over DynamoDB for a NoSQL workload?
4. What consistency levels does Keyspaces support? How does this differ from native Cassandra?
5. What is a hot partition in Cassandra/Keyspaces and how do you design to avoid it?
6. How do you migrate an existing Cassandra application to Keyspaces?
7. What are the limitations of Keyspaces compared to native Cassandra?
8. How does PITR work in Keyspaces and what is the maximum recovery window?