Keyspaces
Serverless managed Apache Cassandra-compatible wide-column database
Amazon Keyspaces (for Apache Cassandra) is a serverless, scalable, and highly available managed database service that is compatible with Apache Cassandra. It allows you to run Cassandra workloads on AWS without managing Cassandra clusters, while using the same CQL (Cassandra Query Language) drivers and data model. Keyspaces is the right choice for teams with existing Cassandra applications or developers who need wide-column NoSQL with massive write throughput and time-series-friendly data modeling.
Cassandra Data Model: Keyspaces, Tables, Partition Keys, Clustering Columns
The Cassandra data model is optimized for fast writes and efficient reads of related rows. Understanding partition keys and clustering columns is essential before designing any table.
| Concept | Description | Analogy |
|---|---|---|
| Keyspace | Top-level namespace; analogous to a database schema | SQL Schema/Database |
| Table | Collection of rows with defined columns | SQL Table |
| Partition Key | Determines which node stores the data; primary distribution mechanism | Shard key in MongoDB |
| Clustering Column | Orders rows within a partition; enables range queries within a partition | Sort key in DynamoDB |
| Primary Key | Partition key + clustering columns; must be unique per row | Composite primary key |
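To make the roles in the table above concrete, the following is a toy in-memory sketch, not how Keyspaces or Cassandra is implemented. The class name `ToyWideColumnTable` and its methods are hypothetical. It only illustrates that the partition key selects a group of rows, the clustering column keeps that group sorted, and a repeated primary key overwrites the earlier row.

```python
from collections import defaultdict


class ToyWideColumnTable:
    """Toy model: partition key -> rows kept sorted by clustering column."""

    def __init__(self):
        # partition key -> list of (clustering value, row) pairs, kept sorted
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_value, row):
        rows = self._partitions[partition_key]
        # Primary key = partition key + clustering column, so writing the
        # same pair again replaces the existing row (an upsert).
        for i, (existing, _) in enumerate(rows):
            if existing == clustering_value:
                rows[i] = (clustering_value, row)
                return
        rows.append((clustering_value, row))
        rows.sort(key=lambda pair: pair[0])

    def range_query(self, partition_key, start, end):
        """Return rows of ONE partition with start <= clustering value <= end."""
        rows = self._partitions.get(partition_key, [])
        return [row for value, row in rows if start <= value <= end]


table = ToyWideColumnTable()
table.insert("sensor-1", 3, {"temperature": 20.0})
table.insert("sensor-1", 1, {"temperature": 18.0})
table.insert("sensor-1", 2, {"temperature": 19.0})
table.insert("sensor-2", 2, {"temperature": 30.0})
print(table.range_query("sensor-1", 2, 3))
```

The range query never looks outside a single partition, which is why efficient Cassandra-style reads specify the full partition key.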
```sql
-- Create a keyspace
CREATE KEYSPACE sensor_data
  WITH replication = {'class': 'SingleRegionStrategy'}
  AND tags = {'env': 'production'};

-- Create a time-series table: sensor readings partitioned by sensor_id,
-- ordered by timestamp descending
CREATE TABLE sensor_data.readings (
  sensor_id text,
  recorded_at timestamp,
  temperature double,
  humidity double,
  PRIMARY KEY (sensor_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC)
  AND default_time_to_live = 2592000; -- 30-day TTL
```
A partition that grows too large, or that receives a disproportionate share of the traffic (a hot partition), degrades read and write performance. Design your partition key to distribute data evenly. For time-series data, consider including a time bucket (e.g. year-month) in the partition key to bound partition size.
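The time-bucket idea can be sketched on the client side as follows. This is a minimal illustration under stated assumptions: the helper names (`month_bucket`, `partition_key`, `buckets_between`) are hypothetical, naive timestamps are treated as UTC, and the table is assumed to declare a composite partition key such as `PRIMARY KEY ((sensor_id, bucket), recorded_at)` with an extra `bucket text` column that the original table does not have.

```python
from datetime import datetime, timezone


def _to_utc(ts):
    """Treat naive datetimes as UTC (an assumption of this sketch)."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def month_bucket(recorded_at):
    """Return the 'YYYY-MM' bucket a reading belongs to."""
    return _to_utc(recorded_at).strftime("%Y-%m")


def partition_key(sensor_id, recorded_at):
    """Composite partition key: one partition per sensor per month."""
    return (sensor_id, month_bucket(recorded_at))


def buckets_between(start, end):
    """List every month bucket that a time-range read has to query."""
    start, end = _to_utc(start), _to_utc(end)
    year, month = start.year, start.month
    buckets = []
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return buckets


print(partition_key("sensor-42", datetime(2024, 3, 15, tzinfo=timezone.utc)))
print(buckets_between(datetime(2023, 11, 20), datetime(2024, 2, 1)))
```

The trade-off is that a read spanning several months becomes one query per bucket, so the bucket width should roughly match the most common query window.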
Keyspaces vs Self-Managed Cassandra: Key Differences
| Feature | Amazon Keyspaces | Self-Managed Cassandra |
|---|---|---|
| Infrastructure | Serverless - no nodes to manage | Manage nodes, storage, JVM, GC tuning |
| Scaling | Automatic (on-demand or provisioned throughput) | Manual node addition; requires rebalancing |
| Replication factor | Always 3x in one region; no configuration needed | Configurable replication factor and strategy |
| Consistency levels | LOCAL_QUORUM only for writes; ONE, LOCAL_ONE, and LOCAL_QUORUM for reads | Full range: ONE, QUORUM, ALL, etc. |
| Lightweight Transactions (LWT) | Supported (IF NOT EXISTS, IF conditions) | Supported |
| Secondary indexes | Not supported (no CREATE INDEX); model each access pattern with its own table | Supported, with performance caveats |
| CQL compatibility | CQL v3 compatible; some features missing | Full CQL support |
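| Multi-region replication | Supported through multi-Region keyspaces (asynchronous, active-active) | Supported with NetworkTopologyStrategy |
Keyspaces supports multi-Region replication: a multi-Region keyspace replicates its tables asynchronously across the AWS Regions you select, and each Region accepts both reads and writes. Replication between Regions is eventually consistent, and cross-Region consistency levels such as EACH_QUORUM are not available, so workloads that depend on them may still call for self-managed Cassandra or DataStax Astra.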
Capacity Modes: On-Demand vs Provisioned
Like DynamoDB, Keyspaces offers on-demand and provisioned capacity modes. Provisioned mode supports auto scaling.
| Mode | Pricing Unit | Best For |
|---|---|---|
| On-Demand | Per read/write request unit | Unpredictable or new workloads |
| Provisioned | Per read/write capacity unit-hour | Steady-state, cost-optimized production |
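One Keyspaces Write Request Unit (WRU) = one LOCAL_QUORUM write of up to 1 KB. One Read Request Unit (RRU) = one LOCAL_QUORUM read of up to 4 KB; a LOCAL_ONE read costs half as much (0.5 RRU). Larger rows consume additional units in 1 KB (write) or 4 KB (read) increments. This is similar to DynamoDB capacity units.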
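As a rough worked example of the unit arithmetic, the sketch below estimates request units per operation. It assumes the accounting described in the AWS documentation (a LOCAL_QUORUM read of up to 4 KB costs one unit, a LOCAL_ONE read costs half of that, and a write costs one unit per 1 KB), treats 1 KB as 1024 bytes, and ignores extras such as lightweight transactions. The function names are hypothetical.

```python
import math

WRITE_UNIT_BYTES = 1024      # assumption: "1 KB" means 1024 bytes
READ_UNIT_BYTES = 4 * 1024   # assumption: "4 KB" means 4096 bytes


def write_units(row_bytes):
    """Write units for one LOCAL_QUORUM write, rounded up per 1 KB."""
    return max(1, math.ceil(row_bytes / WRITE_UNIT_BYTES))


def read_units(row_bytes, consistency="LOCAL_QUORUM"):
    """Read units for one read, rounded up per 4 KB."""
    chunks = max(1, math.ceil(row_bytes / READ_UNIT_BYTES))
    if consistency == "LOCAL_QUORUM":
        return float(chunks)
    if consistency in ("LOCAL_ONE", "ONE"):
        return chunks * 0.5  # half the cost of a LOCAL_QUORUM read
    raise ValueError(f"unsupported consistency level: {consistency}")


# A 3,000-byte row: 3 write units, 1.0 read unit at LOCAL_QUORUM,
# 0.5 read units at LOCAL_ONE.
print(write_units(3000), read_units(3000), read_units(3000, "LOCAL_ONE"))
```

Multiplying these per-operation figures by the expected requests per second gives a first estimate for provisioned capacity, before any auto scaling headroom.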
Point-in-Time Recovery, Encryption, and Client Connectivity
| Feature | Detail |
|---|---|
| Encryption at rest | AES-256 via AWS-owned key or customer KMS key |
| Encryption in transit | TLS required; use Starfield certificate |
| Authentication | SigV4 plugin (IAM credentials) or username/password using service-specific credentials generated in IAM |
| PITR | Point-in-Time Recovery: restore to any second in past 35 days |
| Backups | On-demand snapshots to S3 in addition to PITR |
| Multi-AZ | Data stored in 3 AZs automatically; no configuration needed |
```shell
# Connect using cqlsh with SigV4 auth
cqlsh cassandra.us-east-1.amazonaws.com 9142 \
  --ssl \
  --auth-provider "SigV4AuthProvider" \
  --ssl-certificate /path/to/sf-class2-root.crt

# Or using username/password (service-specific credentials generated in IAM)
cqlsh cassandra.us-east-1.amazonaws.com 9142 \
  --ssl \
  --username "your-iam-user-at-123456789012" \
  --password "your-service-specific-credential"
```
Interview Focus Points
1. What is the Cassandra data model? Explain partition keys and clustering columns.
2. What are the key differences between Amazon Keyspaces and self-managed Apache Cassandra?
3. When would you choose Keyspaces over DynamoDB for a NoSQL workload?
4. What consistency levels does Keyspaces support? How does this differ from native Cassandra?
5. What is a hot partition in Cassandra/Keyspaces and how do you design to avoid it?
6. How do you migrate an existing Cassandra application to Keyspaces?
7. What are the limitations of Keyspaces compared to native Cassandra?
8. How does PITR work in Keyspaces and what is the maximum recovery window?