Timestream

Fast, scalable, serverless time-series database for IoT and operational applications

Amazon Timestream is a serverless time-series database designed for IoT telemetry, DevOps metrics, and other operational data that is naturally timestamped. It automatically tiers data between a memory store for recent data and a cost-optimized magnetic store for historical data, so hot and cold storage are separated without application changes. AWS positions Timestream as able to store and analyze trillions of time-series data points per day at a fraction of the cost of a relational database storing the same data.

Timestream Architecture: Memory Store and Magnetic Store

Timestream automatically manages two storage tiers. Recent data lives in the in-memory store for high-speed ingest and query. As data ages past the memory store retention period, Timestream automatically moves it to the magnetic store. Queries transparently span both tiers.

Attribute	Memory Store	Magnetic Store
Purpose	Recent data; high-speed writes and queries	Historical data; cost-optimized
Latency	Microseconds to milliseconds	Milliseconds to seconds
Retention	Configurable (1 hour to 12 months)	Configurable (1 day to 200 years)
Cost	Higher (per GB-hour)	Lower (per GB-month)
Data movement	Automatic when memory retention expires	Transparent to queries

💡

A common configuration for IoT telemetry is a 24-hour memory store with a 1-year magnetic store. Recent dashboard queries hit the memory store and are fast; historical analytics queries go to the magnetic store at lower cost.
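
The configuration in this tip can be sketched in Python. The `RetentionProperties` keys below are the ones the Timestream CreateTable API accepts; the `storage_tier` helper is a hypothetical, simplified model of the tiering rule (age is computed from the record timestamp), not something Timestream exposes.

```python
from datetime import datetime, timedelta, timezone

# Retention settings from the tip: 24 hours in memory, 1 year on magnetic.
MEMORY_RETENTION_HOURS = 24
MAGNETIC_RETENTION_DAYS = 365


def retention_properties(memory_hours: int, magnetic_days: int) -> dict:
    """Build the RetentionProperties argument for a CreateTable request."""
    return {
        "MemoryStoreRetentionPeriodInHours": memory_hours,
        "MagneticStoreRetentionPeriodInDays": magnetic_days,
    }


def storage_tier(record_time: datetime, now: datetime) -> str:
    """Simplified model (an assumption) of which tier holds a record."""
    age = now - record_time
    if age <= timedelta(hours=MEMORY_RETENTION_HOURS):
        return "memory"
    if age <= timedelta(days=MAGNETIC_RETENTION_DAYS):
        return "magnetic"
    return "expired"


now = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
print(retention_properties(MEMORY_RETENTION_HOURS, MAGNETIC_RETENTION_DAYS))
print(storage_tier(now - timedelta(hours=2), now))   # memory
print(storage_tier(now - timedelta(days=30), now))   # magnetic
print(storage_tier(now - timedelta(days=400), now))  # expired
```

With boto3, the dictionary would typically be passed as the `RetentionProperties` argument of `create_table` on a `timestream-write` client.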

Data Model: Databases, Tables, Dimensions, Measures, and Time

Timestream organizes data differently from relational databases. Every record has a timestamp, dimensions (metadata that describes the series), and measures (the actual measured values).

Concept	Description	Example
Database	Logical container for tables	'iot-sensors'
Table	Container for time-series records	'temperature-readings'
Dimension	Metadata attributes that identify the time series	device_id, location, sensor_type
Measure name	What is being measured	'cpu_utilization', 'temperature'
Measure value	The actual reading	72.5, 98
Time	Timestamp of the measurement (nanosecond precision)	2024-01-15 10:00:00.000000000
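
To make the mapping concrete, here is a minimal Python sketch of one record in the shape the WriteRecords API expects. The `build_record` helper and the sample values (`sensor-001`, `warehouse-a`) are hypothetical; the dictionary keys are the WriteRecords field names.

```python
from datetime import datetime, timezone


def build_record(device_id: str, location: str, measure_name: str,
                 value: float, time_ms: int) -> dict:
    """Map the data-model concepts onto one WriteRecords record."""
    return {
        # Dimensions: metadata that identifies the time series.
        "Dimensions": [
            {"Name": "device_id", "Value": device_id},
            {"Name": "location", "Value": location},
        ],
        # Measure name and value: what was measured and the reading.
        "MeasureName": measure_name,
        "MeasureValue": str(value),  # the API takes values as strings
        "MeasureValueType": "DOUBLE",
        # Time: the timestamp, here in epoch milliseconds.
        "Time": str(time_ms),
        "TimeUnit": "MILLISECONDS",
    }


reading_time = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
record = build_record(
    device_id="sensor-001",      # hypothetical device
    location="warehouse-a",      # hypothetical location
    measure_name="temperature",
    value=72.5,
    time_ms=int(reading_time.timestamp() * 1000),
)
print(record)
```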

sql

-- Timestream uses a SQL-like query language
-- Query average temperature per device in the last hour
SELECT device_id,
       AVG(measure_value::double) AS avg_temp,
       bin(time, 5m) AS time_bucket
FROM "iot-sensors"."temperature-readings"
WHERE measure_name = 'temperature'
  AND time BETWEEN ago(1h) AND now()
GROUP BY device_id, bin(time, 5m)
ORDER BY time_bucket DESC

-- Use built-in time-series functions
SELECT device_id,
       INTERPOLATE_LINEAR(
         CREATE_TIME_SERIES(time, measure_value::double),
         SEQUENCE(min(time), max(time), 1m)
       ) AS interpolated_readings
FROM "iot-sensors"."temperature-readings"
WHERE measure_name = 'temperature'
  AND time BETWEEN ago(6h) AND now()
GROUP BY device_id
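
To show what the first query computes, the following Python sketch reproduces the `bin(time, 5m)` grouping locally: each timestamp is rounded down to the start of its 5-minute bucket and the values are averaged per device and bucket. The helper names and sample readings are hypothetical, and the sketch only illustrates the semantics; it does not call Timestream.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_time(ts: datetime, width: timedelta) -> datetime:
    """Round a timestamp down to the start of its bucket, like bin(time, 5m)."""
    buckets = (ts - EPOCH) // width  # whole buckets elapsed since the epoch
    return EPOCH + buckets * width


def average_per_bucket(readings, width=timedelta(minutes=5)):
    """Average (device_id, timestamp, value) tuples per device and bucket."""
    groups = defaultdict(list)
    for device_id, ts, value in readings:
        groups[(device_id, bin_time(ts, width))].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def at(minute: int) -> datetime:
    """Sample timestamp on 2024-01-15 at 10:<minute> UTC."""
    return datetime(2024, 1, 15, 10, minute, tzinfo=timezone.utc)


readings = [
    ("sensor-001", at(1), 70.0),
    ("sensor-001", at(4), 72.0),
    ("sensor-001", at(6), 74.0),
]
averages = average_per_bucket(readings)
for (device_id, bucket), avg in sorted(averages.items()):
    print(device_id, bucket.isoformat(), avg)
```

The first two readings fall in the 10:00 bucket and average to 71.0; the third lands alone in the 10:05 bucket.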

Integrations: IoT Core, Kinesis, Grafana, and SageMaker

Integration	How It Works	Use Case
AWS IoT Core	IoT Core rules can route MQTT messages directly to Timestream	Ingest IoT device telemetry without custom code
Amazon Kinesis Data Streams	Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) can read the stream and write to Timestream	High-throughput streaming telemetry
Amazon Managed Grafana	Native Timestream data source plugin	Real-time operational dashboards
AWS Lambda	Write records via the Timestream SDK in Lambda	Serverless telemetry pipeline
Amazon SageMaker	Export Timestream data to S3 for ML training	Anomaly detection, forecasting models
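
As a rough illustration of the IoT Core row, the sketch below builds a topic rule payload with a Timestream action in Python. The MQTT topic, role ARN, and helper name are hypothetical placeholders; the payload keys follow the shape of the IoT Core CreateTopicRule API as I understand it, so treat this as a starting point to check against the IoT Core documentation rather than a verified configuration.

```python
import json


def timestream_rule_payload(role_arn: str, database: str, table: str) -> dict:
    """Build a topic rule payload that routes MQTT messages to Timestream."""
    return {
        # IoT SQL statement; the topic filter is a hypothetical example.
        "sql": "SELECT temperature FROM 'sensors/+/telemetry'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "timestream": {
                    "roleArn": role_arn,
                    "databaseName": database,
                    "tableName": table,
                    # Dimension values use IoT substitution templates.
                    "dimensions": [
                        {"name": "device_id", "value": "${clientId()}"},
                    ],
                    # Record time taken from when the rule processed the message.
                    "timestamp": {
                        "value": "${timestamp()}",
                        "unit": "MILLISECONDS",
                    },
                }
            }
        ],
    }


payload = timestream_rule_payload(
    role_arn="arn:aws:iam::123456789012:role/example-iot-timestream-role",
    database="iot-sensors",
    table="temperature-readings",
)
print(json.dumps(payload, indent=2))
```

The fields selected by the SQL statement (here `temperature`) become measures in the target table, which is why no custom ingestion code is needed.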

💡

The native Grafana integration is one of Timestream's strongest selling points for DevOps and IoT teams. You can stand up a real-time operational dashboard in minutes using Amazon Managed Grafana with a Timestream data source, without building any custom query layer.

Timestream vs InfluxDB vs TimescaleDB vs DynamoDB for Time-Series

Database	Model	Strengths	Weaknesses
Timestream	Serverless, managed, AWS-native	No ops, auto-tiering, AWS integrations, SQL-like	AWS lock-in, limited query flexibility vs SQL
InfluxDB (Cloud)	Purpose-built time-series, open source + managed	Flux query language, rich ecosystem, multi-cloud	Flux learning curve, OSS version self-managed
TimescaleDB	PostgreSQL extension for time-series	Full SQL, ACID, rich ecosystem, familiar	Requires PostgreSQL management (unless Timescale Cloud)
DynamoDB	Key-value NoSQL	Single-digit-millisecond reads, serverless, global tables	No time-series functions, expensive for high-frequency writes

💡

Choose Timestream when you are already on AWS, want zero infrastructure management, and need native IoT Core or Kinesis integration. Choose TimescaleDB when you need full SQL and your team is comfortable with PostgreSQL. Choose InfluxDB when you need the Flux ecosystem or multi-cloud portability.

Pricing Model

Component	Pricing Basis	Tip
Writes	Per million writes, metered in 1 KB units	Batch writes (up to 100 records per request) to reduce per-request cost
Memory store	Per GB-hour stored	Keep retention short; move to magnetic quickly
Magnetic store	Per GB-month stored	Enable magnetic store writes for late-arriving data
Queries	Per GB of data scanned	Use time predicates to minimize data scanned
Scheduled queries	Per scheduled query execution + data scanned	Materialize aggregates to reduce ad-hoc scan costs
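
The batching tip in the Writes row can be sketched as follows. This is a minimal Python example that only splits records into batches of at most 100; the `chunk_records` helper and the placeholder records are hypothetical, and the actual write call is left as a comment because it needs AWS credentials.

```python
MAX_RECORDS_PER_REQUEST = 100  # per-request record limit cited in the table


def chunk_records(records, size=MAX_RECORDS_PER_REQUEST):
    """Yield consecutive batches of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# 250 placeholder records stand in for real WriteRecords record dicts.
records = [{"MeasureName": "temperature", "MeasureValue": str(i)}
           for i in range(250)]

batches = list(chunk_records(records))
print(len(batches))                # 3
print([len(b) for b in batches])   # [100, 100, 50]

# With boto3, each batch would typically be sent like this (not executed here):
# client = boto3.client("timestream-write")
# client.write_records(DatabaseName="iot-sensors",
#                      TableName="temperature-readings",
#                      Records=batch)
```

Sending 250 records one at a time would take 250 requests; batching reduces that to 3.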

⚠️

Timestream query costs are based on data scanned, similar to Athena. A query without a time range predicate can scan the entire table and generate an unexpectedly large bill. Always include WHERE time BETWEEN ... AND ... in production queries.
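
One way to enforce this rule is to generate queries through a helper that refuses to build one without a time range. The Python sketch below is a hypothetical example: `build_avg_query` is not part of any AWS SDK, and it assumes its inputs come from trusted code, since it interpolates them directly into the query string.

```python
import re

# Accepts simple lookback intervals such as 15m, 1h, or 7d.
_LOOKBACK = re.compile(r"^\d+(s|m|h|d)$")


def build_avg_query(database: str, table: str,
                    measure_name: str, lookback: str) -> str:
    """Build an average-per-device query that always has a time predicate."""
    if not _LOOKBACK.match(lookback):
        raise ValueError(f"lookback must look like '1h' or '7d', got {lookback!r}")
    return (
        f"SELECT device_id, AVG(measure_value::double) AS avg_value\n"
        f'FROM "{database}"."{table}"\n'
        f"WHERE measure_name = '{measure_name}'\n"
        f"  AND time BETWEEN ago({lookback}) AND now()\n"
        f"GROUP BY device_id"
    )


query = build_avg_query("iot-sensors", "temperature-readings",
                        "temperature", "1h")
print(query)
```

Because the lookback is mandatory and validated, a caller cannot accidentally issue a full-table scan through this path.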

🎯

Interview Focus Points

1. What is Timestream and what types of workloads is it optimized for?
2. Explain the memory store and magnetic store architecture. How does automatic tiering work?
3. How does Timestream pricing work? What are the main cost levers?
4. Compare Timestream to storing time-series data in DynamoDB or RDS. When does each make sense?
5. How do you ingest IoT telemetry from AWS IoT Core into Timestream?
6. What are Timestream scheduled queries and why would you use them?
7. What are dimensions and measures in the Timestream data model?
8. What time-series specific query functions does Timestream provide?