AWS Developer Tools & CI/CD
X-Ray
Distributed tracing to analyze latency and debug microservices and serverless apps
AWS X-Ray is a distributed tracing service that helps you analyze and debug distributed applications, including microservices and serverless architectures. It collects data about requests as they travel through your application, maps service dependencies automatically, and helps you identify performance bottlenecks, errors, and throttling in complex multi-service systems.
Core Concepts: Traces, Segments, and Subsegments
X-Ray organizes telemetry data into a hierarchy of traces, segments, and subsegments, with annotations and metadata attached to segments and subsegments:
| Concept | Description | Generated By |
|---|---|---|
| Trace | End-to-end request journey through all services, identified by one trace ID | First instrumented service or entry point (e.g. API Gateway, ALB) assigns the trace ID |
| Segment | Work done by a single service for one request | X-Ray SDK in each service |
| Subsegment | Granular unit within a segment (DB call, HTTP request) | SDK auto-instrumentation or manual |
| Annotation | Key-value pair indexed for search/filtering | Developer adds to segment or subsegment |
| Metadata | Non-indexed key-value data for debugging | Developer adds to segment or subsegment |
| Service graph | Visual map of services and their connections | Generated from trace data |
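Under the hood, the SDKs send segments to X-Ray as JSON segment documents. The sketch below builds one minimal segment with a single subsegment, an annotation, and namespaced metadata, using field names from the X-Ray segment document format (`name`, `id`, `trace_id`, `start_time`, `end_time`, `annotations`, `metadata`, `subsegments`). The helper `make_segment`, the `checkout-service` name, and the timings are illustrative assumptions; in practice the SDK builds and sends these documents for you.

```python
import json
import secrets
import time


def new_id() -> str:
    """Segment and subsegment IDs are 64-bit values written as 16 hex digits."""
    return secrets.token_hex(8)


def make_segment(trace_id: str, service: str) -> dict:
    """Hypothetical helper: a minimal segment for one service handling one request."""
    start = time.time()
    db_call = {
        "id": new_id(),
        "name": "DynamoDB",          # subsegment: one downstream call
        "namespace": "aws",          # marks an AWS SDK call
        "start_time": start + 0.01,
        "end_time": start + 0.05,
    }
    return {
        "name": service,
        "id": new_id(),
        "trace_id": trace_id,        # shared by every segment in the trace
        "start_time": start,
        "end_time": start + 0.12,
        "annotations": {"order_id": "ORD-12345"},          # indexed: usable in filter expressions
        "metadata": {"default": {"cart": {"items": 3}}},   # not indexed: debugging detail only
        "subsegments": [db_call],
    }


segment = make_segment("1-5e1b4d3e-fb1234567890abcdef012345", "checkout-service")
print(json.dumps(segment, indent=2))
```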
X-Ray uses trace IDs in HTTP headers to correlate requests across services. The header name is X-Amzn-Trace-Id.
```
# X-Ray trace ID header example
X-Amzn-Trace-Id: Root=1-5e1b4d3e-fb1234567890abcdef012345;Parent=53995c3f42cd8ad8;Sampled=1
# Root: trace ID (version 1, 8-hex-digit Unix timestamp, 24-hex-digit random identifier)
# Parent: ID of the calling (parent) segment or subsegment
# Sampled: 1 = send to X-Ray, 0 = do not send
```
If a service calls a downstream service without propagating the X-Amzn-Trace-Id header, the downstream service starts a new trace with its own root ID. The two traces are not linked, and the service map shows a broken connection. Always propagate the header in HTTP clients: the SDK's instrumented clients (for example, a patched requests library in Python) add it automatically, while uninstrumented clients must forward it explicitly.
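The propagation rule above can be sketched in a few lines: parse the incoming header, keep its Root and Sampled values, and set Parent to the current service's segment ID before calling downstream. The helper names `parse_trace_header`, `downstream_header`, and `new_trace_id` are hypothetical (the SDKs do this internally); the trace ID layout (version `1`, 8 hex digits of Unix epoch seconds, 24 hex digits of randomness) follows the X-Ray documentation.

```python
import os
import time


def new_trace_id() -> str:
    """Version 1, epoch seconds as 8 hex digits, 96 random bits as 24 hex digits."""
    return f"1-{int(time.time()):08x}-{os.urandom(12).hex()}"


def parse_trace_header(value: str) -> dict:
    """Split 'Root=...;Parent=...;Sampled=1' into a dict of fields."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:
            fields[key] = val
    return fields


def downstream_header(incoming: str, current_segment_id: str) -> str:
    """Header for an outgoing call: same Root and Sampled, Parent = this service's segment."""
    fields = parse_trace_header(incoming)
    root = fields.get("Root") or new_trace_id()  # no incoming trace: start a new one
    header = f"Root={root};Parent={current_segment_id}"
    if "Sampled" in fields:
        header += f";Sampled={fields['Sampled']}"  # downstream honors the upstream decision
    return header


incoming = "Root=1-5e1b4d3e-fb1234567890abcdef012345;Parent=53995c3f42cd8ad8;Sampled=1"
print(downstream_header(incoming, "70de5b6f19ff9a0a"))
# Root=1-5e1b4d3e-fb1234567890abcdef012345;Parent=70de5b6f19ff9a0a;Sampled=1
```

Keeping the same Root is what links the two services into one trace; dropping it is exactly the "broken connection" case described above.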
SDK Instrumentation: Automatic and Manual
X-Ray SDKs are available for Java, Python, Go, Node.js, Ruby, and .NET. They provide automatic instrumentation for popular frameworks and manual instrumentation via the capture API.
```python
# Python - instrument a Flask app and AWS SDK calls
from flask import Flask
from aws_xray_sdk.core import xray_recorder, patch_all
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware

app = Flask(__name__)
xray_recorder.configure(service='my-flask-app')
XRayMiddleware(app, xray_recorder)  # each incoming request becomes a segment
patch_all()  # auto-instrument supported libraries such as botocore/boto3, requests, httplib

# capture() records this function as a subsegment
@xray_recorder.capture('process_order')
def process_order(order_id, order_data):
    # Annotation: indexed, filterable
    xray_recorder.current_segment().put_annotation('order_id', order_id)
    # Metadata: not indexed, stored for debugging
    xray_recorder.current_segment().put_metadata('order_detail', order_data)
    # ... business logic
```
```javascript
// Node.js - Lambda instrumentation (AWS SDK for JavaScript v2)
const AWSXRay = require('aws-xray-sdk-core')
const AWS = AWSXRay.captureAWS(require('aws-sdk'))
// Now all AWS SDK calls are automatically traced
// For SDK v3 clients, wrap each client instead: AWSXRay.captureAWSv3Client(client)
```
| Integration | What It Auto-Instruments | SDK/Method |
|---|---|---|
| AWS SDK calls | All API calls to AWS services | patch_all() or captureAWS() |
| Outgoing HTTP | Calls to external APIs and services | patch requests/http.client |
| SQL databases | Queries to MySQL, PostgreSQL | patch sqlalchemy/pg8000 |
| Flask/Django | Incoming HTTP requests as segments | XRayMiddleware |
| Express.js | Incoming HTTP requests as segments | AWSXRay.express.openSegment() / closeSegment() |
| Lambda | Function invocation as segment | Enable active tracing in function config |
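Conceptually, patch_all() and captureAWS() instrument code by wrapping library functions so that each call is timed and recorded as a subsegment of the current segment. The sketch below shows that wrapping technique in plain Python; `instrument`, `FakeHttpClient`, and the `recorded_subsegments` list are hypothetical stand-ins, not the X-Ray SDK's internals.

```python
import functools
import time

# Hypothetical stand-in for the current segment's list of subsegments.
recorded_subsegments = []


def instrument(func, name):
    """Wrap func so every call is recorded as a timed subsegment, including failures."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        sub = {"name": name, "start_time": time.time(), "fault": False}
        try:
            return func(*args, **kwargs)
        except Exception:
            sub["fault"] = True  # mark the subsegment as failed, then re-raise to the caller
            raise
        finally:
            sub["end_time"] = time.time()
            recorded_subsegments.append(sub)
    return wrapper


class FakeHttpClient:
    """Stand-in for a third-party library whose method gets patched in place."""
    def get(self, url):
        if "fail" in url:
            raise ConnectionError(url)
        return 200


# Patching: replace the attribute on the library class, so existing callers
# are traced without changing their code.
FakeHttpClient.get = instrument(FakeHttpClient.get, "http.get")

client = FakeHttpClient()
client.get("https://example.com/ok")
try:
    client.get("https://example.com/fail")
except ConnectionError:
    pass
print([(s["name"], s["fault"]) for s in recorded_subsegments])
# [('http.get', False), ('http.get', True)]
```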
Sampling Rules: Controlling Trace Volume
X-Ray samples requests to avoid tracing 100% of traffic (which would be expensive and noisy). Sampling rules determine which requests get traced.
| Rule Component | Description | Example |
|---|---|---|
| Fixed rate | Percentage of requests sampled after reservoir | 5% = 0.05 |
| Reservoir | Minimum traces per second guaranteed | 5 requests/second always traced |
| Priority | Lower number = higher priority when rules overlap | 1 = highest priority |
| Match criteria | Filter by service name/type, host, HTTP method, URL path, resource ARN | URLPath=/api/health/* with fixed rate 0 and reservoir 0 (ignore health checks) |
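The interaction between priority, reservoir, and fixed rate can be sketched as a small local evaluator. This is a simplification under stated assumptions: the real service coordinates reservoir quota across all instances, while this sketch keeps one reservoir per rule inside a single process. `SamplingRule` and `Sampler` are hypothetical names; the demo rules mimic the default rule (reservoir 1, fixed rate 5%, priority 10000) plus a health-check rule that samples nothing.

```python
import fnmatch
import random
from dataclasses import dataclass


@dataclass
class SamplingRule:
    """Subset of X-Ray rule fields; the evaluator below is a local simplification."""
    name: str
    priority: int          # lower number is evaluated first
    fixed_rate: float      # fraction sampled once the reservoir is used up
    reservoir_size: int    # requests per second traced before fixed_rate applies
    service_name: str = "*"
    http_method: str = "*"
    url_path: str = "*"

    def matches(self, service: str, method: str, path: str) -> bool:
        # X-Ray match fields accept * and ? wildcards; fnmatchcase approximates that here
        return (fnmatch.fnmatchcase(service, self.service_name)
                and fnmatch.fnmatchcase(method, self.http_method)
                and fnmatch.fnmatchcase(path, self.url_path))


class Sampler:
    def __init__(self, rules, rng=random.random):
        self.rules = sorted(rules, key=lambda r: r.priority)
        self.rng = rng
        self.window = {}  # rule name -> (second, reservoir slots used in that second)

    def should_sample(self, service, method, path, now_second):
        for rule in self.rules:
            if not rule.matches(service, method, path):
                continue
            second, used = self.window.get(rule.name, (now_second, 0))
            if second != now_second:
                used = 0  # a new second refills the reservoir
            if used < rule.reservoir_size:
                self.window[rule.name] = (now_second, used + 1)
                return True  # inside the reservoir: traced regardless of fixed_rate
            return self.rng() < rule.fixed_rate  # beyond the reservoir: fixed-rate draw
        return False  # no rule matched


rules = [
    SamplingRule("health", 1, 0.0, 0, url_path="/api/health/*"),
    SamplingRule("default", 10000, 0.05, 1),
]
sampler = Sampler(rules, rng=lambda: 0.5)  # deterministic draw for the demo
decisions = [sampler.should_sample("web", "GET", "/api/orders", now_second=0) for _ in range(3)]
print(decisions)  # [True, False, False]
print(sampler.should_sample("web", "GET", "/api/health/live", now_second=0))  # False
```

Only the first request in the second fills the reservoir; the rest depend on the 5% draw, which is why a single intermittent error can go unrecorded.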
```bash
# Custom sampling rule via CLI - trace every POST to /api/checkout on checkout-service
aws xray create-sampling-rule --sampling-rule '{
  "RuleName": "checkout-100pct",
  "ResourceARN": "*",
  "Priority": 1,
  "FixedRate": 1.0,
  "ReservoirSize": 50,
  "ServiceName": "checkout-service",
  "ServiceType": "*",
  "Host": "*",
  "HTTPMethod": "POST",
  "URLPath": "/api/checkout",
  "Version": 1
}'
```
The default sampling rule traces the first request each second plus 5% of any additional requests. Because the sampling decision is made when a request starts, before its outcome is known, the default rule may miss intermittent errors on high-traffic services. Create higher-rate rules for critical paths (payments, authentication) and lower-rate rules for health check endpoints.
Service Map, Traces Console, and Analytics
The X-Ray console provides several views for analyzing application behavior:
| View | What It Shows | Use Case |
|---|---|---|
| Service Map | Visual graph of all services with latency/error metrics | Identify which service is causing errors |
| Traces | List of individual trace records with timeline | Debug a specific slow or failed request |
| Trace Analytics | Aggregate statistics, percentiles, histograms | Latency trends, error rates over time |
| Insights | ML-detected anomalies and root cause analysis | Proactive issue detection |
| Groups | Filtered subsets of traces with their own service maps | Isolate traces for one customer or feature |
Trace filter expressions let you search for specific traces:
```
# X-Ray filter expressions
# Traces with a fault (server error, 5xx); use error = true for client errors (4xx)
fault = true
# Slow requests over 2 seconds
duration > 2
# Traces touching a specific service
service("users-service")
# Filter by annotation
annotation.order_id = "ORD-12345"
# Combine conditions
service("payment-service") AND fault = true AND duration > 1
```
Pricing, Limits, and Integration with CloudWatch
| Dimension | Free Tier | Paid Price |
|---|---|---|
| Traces recorded | 100,000 traces/month | $5.00 per 1 million traces |
| Traces retrieved (console/API) | 1 million traces/month, shared with traces scanned | $0.50 per 1 million traces |
| Traces scanned (analytics) | Included in the shared 1 million retrieved-or-scanned allowance | $0.50 per 1 million traces scanned |
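A rough monthly cost estimate follows from these dimensions. The sketch assumes $5.00 per million traces recorded beyond 100,000, and $0.50 per million traces retrieved or scanned beyond a shared 1 million allowance; `monthly_xray_cost` is a hypothetical helper, and the rates are parameters because pricing can change and vary by Region.

```python
def monthly_xray_cost(recorded, retrieved, scanned,
                      recorded_rate=5.00, read_rate=0.50,
                      free_recorded=100_000, free_read=1_000_000):
    """Estimate monthly X-Ray cost in USD. Rates are USD per 1 million traces (assumed)."""
    billable_recorded = max(0, recorded - free_recorded)
    # Assumption: retrieved and scanned traces draw from one shared free allowance.
    billable_read = max(0, retrieved + scanned - free_read)
    return (billable_recorded * recorded_rate + billable_read * read_rate) / 1_000_000


# Example month: 500,000 traces recorded, 200,000 retrieved, 3,000,000 scanned
print(monthly_xray_cost(recorded=500_000, retrieved=200_000, scanned=3_000_000))  # 3.1
```

Here 400,000 billable recorded traces cost $2.00 and 2.2 million billable retrieved-plus-scanned traces cost $1.10, which shows why sampling rate is usually the main cost lever.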
X-Ray integrates with CloudWatch ServiceLens, which combines X-Ray traces, CloudWatch metrics, and CloudWatch Logs into a unified observability view. Traces from Lambda functions with active tracing enabled, and from ECS/EKS workloads that run the X-Ray daemon (or another collector that forwards traces to X-Ray), populate the ServiceLens service map; CloudWatch Container Insights adds container-level metrics to the same view.
X-Ray retains trace data for 30 days, after which it is deleted automatically. If you need longer retention for compliance or trend analysis, retrieve traces with the GetTraceSummaries and BatchGetTraces APIs and store them in S3 yourself, or analyze your application logs with CloudWatch Logs Insights instead.
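A minimal sketch of such a copy loop, assuming a boto3 X-Ray client (`boto3.client("xray")`), whose `get_trace_summaries` and `batch_get_traces` methods wrap the two APIs and page with `NextToken`. The function `export_traces`, the `sink` callback (which could write each trace to S3, for example with `put_object`), and the batch size of 5 trace IDs per `BatchGetTraces` call are assumptions for illustration; `FakeXRay` is an in-memory stand-in so the sketch runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def export_traces(xray, start, end, sink, filter_expression=None, batch_size=5):
    """Page through trace summaries in [start, end] and pass each full trace to sink().

    `xray` is a boto3 X-Ray client or any object with the same two methods.
    batch_size=5 is an assumed per-call limit for BatchGetTraces.
    """
    summary_args = {"StartTime": start, "EndTime": end}
    if filter_expression:
        summary_args["FilterExpression"] = filter_expression
    while True:
        page = xray.get_trace_summaries(**summary_args)
        ids = [s["Id"] for s in page.get("TraceSummaries", [])]
        for i in range(0, len(ids), batch_size):
            batch_args = {"TraceIds": ids[i:i + batch_size]}
            while True:  # BatchGetTraces can also return a NextToken
                traces = xray.batch_get_traces(**batch_args)
                for trace in traces.get("Traces", []):
                    sink(trace)
                if not traces.get("NextToken"):
                    break
                batch_args["NextToken"] = traces["NextToken"]
        if not page.get("NextToken"):
            break
        summary_args["NextToken"] = page["NextToken"]


class FakeXRay:
    """In-memory stand-in: two pages of summaries, one trace returned per ID."""
    def get_trace_summaries(self, **kwargs):
        if "NextToken" not in kwargs:
            return {"TraceSummaries": [{"Id": "t1"}, {"Id": "t2"}], "NextToken": "p2"}
        return {"TraceSummaries": [{"Id": "t3"}]}

    def batch_get_traces(self, TraceIds, **kwargs):
        return {"Traces": [{"Id": t, "Segments": []} for t in TraceIds]}


exported = []
now = datetime.now(timezone.utc)
export_traces(FakeXRay(), now - timedelta(hours=1), now, exported.append)
print([t["Id"] for t in exported])  # ['t1', 't2', 't3']

# With real AWS credentials (not run here):
#   import boto3
#   export_traces(boto3.client("xray"), now - timedelta(hours=1), now, my_s3_writer,
#                 filter_expression='service("payment-service") AND fault = true')
```

Running this on a schedule (for example, hourly) before the 30-day window expires keeps a durable copy without changing how traces are collected.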
Interview Focus Points
1. What is a trace vs a segment vs a subsegment in X-Ray?
2. How does X-Ray correlate requests across multiple services?
3. What is sampling in X-Ray and why is it important?
4. How do you add custom annotations and metadata to X-Ray traces?
5. How does X-Ray active tracing work with Lambda functions?
6. What is a service map and what information does it show?
7. How would you debug a latency issue using X-Ray?
8. What is the X-Ray daemon and when is it needed?
9. How does X-Ray integrate with CloudWatch ServiceLens?
10. What are the data retention limits for X-Ray traces?