Timestream
Fast, scalable, serverless time-series database for IoT and operational applications
Amazon Timestream is a fast, scalable, serverless time-series database designed for IoT telemetry, DevOps metrics, and operational data that is naturally timestamped. It automatically tiers data between an in-memory store for recent data and a cost-optimized magnetic store for historical data, separating hot and cold storage transparently. Timestream can store and analyze trillions of time-series data points per day at a fraction of the cost of a relational database storing the same data.
Timestream Architecture: Memory Store and Magnetic Store
Timestream automatically manages two storage tiers. Recent data lives in the in-memory store for high-speed ingest and query. As data ages past the memory store retention period, Timestream automatically moves it to the magnetic store. Queries transparently span both tiers.
| Attribute | Memory Store | Magnetic Store |
|---|---|---|
| Purpose | Recent data; high-speed writes and queries | Historical data; cost-optimized |
| Latency | Microseconds to milliseconds | Milliseconds to seconds |
| Retention | Configurable (1 hour to 12 months) | Configurable (1 day to 200 years) |
| Cost | Higher (per GB-hour) | Lower (per GB-month) |
| Data movement | Automatic when memory retention expires | Transparent to queries |
A common configuration for IoT telemetry is a 24-hour memory store with a 1-year magnetic store. Recent dashboard queries hit the memory store and are fast; historical analytics queries go to the magnetic store at lower cost.
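As a rough sketch of the 24-hour / 1-year configuration described above, the retention settings can be expressed as the `RetentionProperties` structure accepted by the Timestream Write `CreateTable` API. The helper name `retention_properties` is hypothetical, and the boto3 call appears only as a comment because it needs AWS credentials; the database and table names are borrowed from this article's examples.

```python
def retention_properties(memory_hours: int, magnetic_days: int) -> dict:
    """Build the RetentionProperties structure for a Timestream table."""
    if memory_hours < 1 or magnetic_days < 1:
        raise ValueError("retention periods must be at least 1")
    return {
        "MemoryStoreRetentionPeriodInHours": memory_hours,
        "MagneticStoreRetentionPeriodInDays": magnetic_days,
    }


# 24 hours in the memory store, then 1 year in the magnetic store.
iot_retention = retention_properties(memory_hours=24, magnetic_days=365)

# With credentials configured, this could be passed to boto3 roughly as:
#   boto3.client("timestream-write").create_table(
#       DatabaseName="iot-sensors",
#       TableName="temperature-readings",
#       RetentionProperties=iot_retention,
#   )
```

Keeping the two periods in one small helper makes it easy to review the cost trade-off (memory store hours versus magnetic store days) in a single place.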
Data Model: Databases, Tables, Dimensions, Measures, and Time
Timestream organizes data differently from relational databases. Every record has a timestamp, dimensions (metadata that describes the series), and measures (the actual measured values).
| Concept | Description | Example |
|---|---|---|
| Database | Logical container for tables | 'iot-sensors' |
| Table | Container for time-series records | 'temperature-readings' |
| Dimension | Metadata identifying the time series; low cardinality | device_id, location, sensor_type |
| Measure name | What is being measured | 'cpu_utilization', 'temperature' |
| Measure value | The actual reading | 72.5, 98 |
| Time | Timestamp of the measurement (nanosecond precision) | 2024-01-15 10:00:00.000000000 |
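To make the concepts in the table concrete, the sketch below assembles one single-measure record in the shape the Timestream Write `WriteRecords` API expects. The helper `build_record` and the sample dimension values are illustrative assumptions; the API takes dimension values, measure values, and timestamps as strings.

```python
import time
from typing import Optional


def build_record(dimensions: dict, measure_name: str, value: float,
                 time_ms: Optional[int] = None) -> dict:
    """Build one single-measure Timestream record (hypothetical helper)."""
    if time_ms is None:
        time_ms = int(time.time() * 1000)  # default to the current time
    return {
        # Dimensions identify the series; values are sent as strings.
        "Dimensions": [{"Name": k, "Value": str(v)} for k, v in dimensions.items()],
        "MeasureName": measure_name,
        "MeasureValue": str(value),
        "MeasureValueType": "DOUBLE",
        "Time": str(time_ms),
        "TimeUnit": "MILLISECONDS",
    }


# 1705312800000 ms is 2024-01-15 10:00:00 UTC, matching the table's example.
record = build_record(
    {"device_id": "sensor-001", "location": "warehouse-a"},
    "temperature",
    72.5,
    time_ms=1705312800000,
)
```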
```sql
-- Timestream uses a SQL-like query language
-- Query average temperature per device in the last hour
SELECT device_id,
       AVG(measure_value::double) AS avg_temp,
       bin(time, 5m) AS time_bucket
FROM "iot-sensors"."temperature-readings"
WHERE measure_name = 'temperature'
  AND time BETWEEN ago(1h) AND now()
GROUP BY device_id, bin(time, 5m)
ORDER BY time_bucket DESC
```
```sql
-- Use built-in time-series functions
SELECT device_id,
       INTERPOLATE_LINEAR(
           CREATE_TIME_SERIES(time, measure_value::double),
           SEQUENCE(min(time), max(time), 1m)
       ) AS interpolated_readings
FROM "iot-sensors"."temperature-readings"
WHERE measure_name = 'temperature'
  AND time BETWEEN ago(6h) AND now()
GROUP BY device_id
```
Integrations: IoT Core, Kinesis, Grafana, and SageMaker
| Integration | How It Works | Use Case |
|---|---|---|
| AWS IoT Core | IoT Core rules can route MQTT messages directly to Timestream | Ingest IoT device telemetry without custom code |
| Amazon Kinesis Data Streams | Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) reads the stream and writes to Timestream | High-throughput streaming telemetry |
| Amazon Managed Grafana | Native Timestream data source plugin | Real-time operational dashboards |
| AWS Lambda | Write records via the Timestream SDK in Lambda | Serverless telemetry pipeline |
| Amazon SageMaker | Export Timestream data to S3 for ML training | Anomaly detection, forecasting models |
The native Grafana integration is one of Timestream's strongest selling points for DevOps and IoT teams. You can stand up a real-time operational dashboard in minutes using Amazon Managed Grafana with a Timestream data source, without building any custom query layer.
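For the IoT Core row in the table above, the following is a minimal sketch of a topic rule payload with a Timestream action, in the shape used by the AWS IoT `CreateTopicRule` API. The helper name, the topic filter `sensors/+/telemetry`, and the role ARN are placeholders invented for illustration; the substitution template `${topic(2)}` assumes the device ID is the second topic segment.

```python
def timestream_rule_payload(topic_filter: str, role_arn: str, database: str,
                            table: str, dimensions: dict) -> dict:
    """Build a topic rule payload that routes MQTT messages to Timestream."""
    return {
        # Fields selected by the rule SQL are written as measures.
        "sql": f"SELECT * FROM '{topic_filter}'",
        "actions": [
            {
                "timestream": {
                    "roleArn": role_arn,
                    "databaseName": database,
                    "tableName": table,
                    "dimensions": [
                        {"name": name, "value": value}
                        for name, value in dimensions.items()
                    ],
                }
            }
        ],
    }


payload = timestream_rule_payload(
    topic_filter="sensors/+/telemetry",                       # hypothetical topic
    role_arn="arn:aws:iam::123456789012:role/iot-timestream",  # placeholder ARN
    database="iot-sensors",
    table="temperature-readings",
    dimensions={"device_id": "${topic(2)}"},
)

# With credentials configured, this could be registered roughly as:
#   boto3.client("iot").create_topic_rule(
#       ruleName="telemetry_to_timestream", topicRulePayload=payload)
```

The role referenced by `roleArn` must allow IoT Core to write to the target table; that IAM policy is outside the scope of this sketch.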
Timestream vs InfluxDB vs TimescaleDB vs DynamoDB for Time-Series
| Database | Model | Strengths | Weaknesses |
|---|---|---|---|
| Timestream | Serverless, managed, AWS-native | No ops, auto-tiering, AWS integrations, SQL-like | AWS lock-in, limited query flexibility vs SQL |
| InfluxDB (Cloud) | Purpose-built time-series, open source + managed | Flux query language, rich ecosystem, multi-cloud | Flux learning curve, OSS version self-managed |
| TimescaleDB | PostgreSQL extension for time-series | Full SQL, ACID, rich ecosystem, familiar | Requires PostgreSQL management (unless Timescale Cloud) |
| DynamoDB | Key-value NoSQL | Single-digit-millisecond reads (microseconds with DAX), serverless, global tables | No time-series functions, expensive for high-frequency writes |
Choose Timestream when you are already on AWS, want zero infrastructure management, and need native IoT Core or Kinesis integration. Choose TimescaleDB when you need full SQL and your team is comfortable with PostgreSQL. Choose InfluxDB when you need the Flux ecosystem or multi-cloud portability.
Pricing Model
| Component | Pricing Basis | Tip |
|---|---|---|
| Writes | Per million write requests (per 1 KB chunk) | Batch writes (up to 100 records) to reduce per-request cost |
| Memory store | Per GB-hour stored | Keep retention short; move to magnetic quickly |
| Magnetic store | Per GB-month stored | Enable magnetic store writes for late-arriving data |
| Queries | Per GB of data scanned | Use time predicates to minimize data scanned |
| Scheduled queries | Per scheduled query execution + data scanned | Materialize aggregates to reduce ad-hoc scan costs |
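The batching tip in the Writes row can be sketched as a small chunking helper that splits a list of records into groups of at most 100, the limit mentioned in the table. The helper name `batched` and the sample records are illustrative; the actual write call is shown only as a comment because it needs AWS credentials.

```python
from typing import Iterator, List

MAX_RECORDS_PER_REQUEST = 100  # batch limit cited in the pricing table


def batched(records: List[dict],
            size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive slices of `records`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# 250 placeholder records become three write requests instead of 250.
records = [
    {"MeasureName": "temperature", "MeasureValue": str(20 + i % 5)}
    for i in range(250)
]
batch_sizes = [len(batch) for batch in batched(records)]

# Each batch could then be written roughly as:
#   client.write_records(DatabaseName="iot-sensors",
#                        TableName="temperature-readings",
#                        Records=batch)
```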
Timestream query costs are based on data scanned, similar to Athena. A query without a time range predicate can scan the entire table, which on a large table means a large bill. Always include a time predicate such as WHERE time BETWEEN ago(1h) AND now() in production queries.
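One way to enforce the time-predicate rule is to build queries through a helper that cannot omit it. The sketch below is an assumption about how a team might do this, not a Timestream feature: `binned_average_query` is a hypothetical function that validates the interval literals and always emits a bounded `time BETWEEN` clause.

```python
import re

# Interval literals such as 30s, 5m, 1h, 7d.
_INTERVAL = re.compile(r"\d+(ns|us|ms|s|m|h|d)")


def binned_average_query(database: str, table: str, measure_name: str,
                         lookback: str = "1h", bin_size: str = "5m") -> str:
    """Return a binned-average query that always has a time range predicate."""
    for label, value in (("lookback", lookback), ("bin_size", bin_size)):
        if not _INTERVAL.fullmatch(value):
            raise ValueError(f"{label} must look like '15m' or '1h', got {value!r}")
    if "'" in measure_name:
        raise ValueError("measure_name must not contain single quotes")
    return (
        f"SELECT device_id, bin(time, {bin_size}) AS time_bucket, "
        f"AVG(measure_value::double) AS avg_value "
        f'FROM "{database}"."{table}" '
        f"WHERE measure_name = '{measure_name}' "
        f"AND time BETWEEN ago({lookback}) AND now() "
        f"GROUP BY device_id, bin(time, {bin_size}) "
        f"ORDER BY time_bucket DESC"
    )


query = binned_average_query("iot-sensors", "temperature-readings", "temperature")
```

Rejecting malformed intervals up front means an unbounded scan cannot slip through as a typo in the lookback argument.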
Interview Focus Points
1. What is Timestream and what types of workloads is it optimized for?
2. Explain the memory store and magnetic store architecture. How does automatic tiering work?
3. How does Timestream pricing work? What are the main cost levers?
4. Compare Timestream to storing time-series data in DynamoDB or RDS. When does each make sense?
5. How do you ingest IoT telemetry from AWS IoT Core into Timestream?
6. What are Timestream scheduled queries and why would you use them?
7. What are dimensions and measures in the Timestream data model?
8. What time-series specific query functions does Timestream provide?