AWS AI & Machine Learning
SageMaker
End-to-end ML platform to build, train, tune, and deploy models at scale
Amazon SageMaker is a fully managed end-to-end machine learning platform that covers every stage of the ML lifecycle - from data labeling and preparation through model training, hyperparameter tuning, and deployment at scale. It removes the undifferentiated heavy lifting of infrastructure management so data scientists and ML engineers can focus on building better models. For cloud engineers, SageMaker is the go-to service whenever an organization needs to operationalize ML workflows in production.
How SageMaker Works - The ML Lifecycle
SageMaker provides a collection of tightly integrated tools that cover each stage of ML development. Understanding the pipeline is essential for both interviews and real-world deployments.
| Stage | SageMaker Tool | What It Does |
|---|---|---|
| Data Labeling | Ground Truth | Human-in-the-loop labeling with active learning to reduce labeling cost |
| Data Prep | Data Wrangler | Visual data transformation and feature engineering without code |
| Feature Storage | Feature Store | Centralized repository for ML features, online and offline access |
| Experiment Tracking | Experiments | Track runs, parameters, metrics, and artifacts across training jobs |
| Training | Training Jobs | Managed single-node or distributed training on CPU and GPU ML instance types |
| Tuning | Automatic Model Tuning | Bayesian and random hyperparameter optimization |
| Model Registry | Model Registry | Version, approve, and audit models before deployment |
| Deployment | Endpoints / Batch Transform | Real-time inference or batch scoring at scale |
| Monitoring | Model Monitor | Detect data drift and model quality degradation in production |
| Pipelines | Pipelines | CI/CD for ML - DAG-based workflow orchestration |
SageMaker Studio is the web-based IDE that unifies all these tools into a single interface. Most teams start there rather than using the SDK or console directly.
Training Instances, Built-in Algorithms, and Custom Containers
SageMaker training jobs run your code inside Docker containers on managed infrastructure. You can use built-in algorithms, AWS-optimized framework containers (TensorFlow, PyTorch, XGBoost), or bring your own container (BYOC).
| Option | Use Case | Flexibility | Effort |
|---|---|---|---|
| Built-in Algorithms | XGBoost, Linear Learner, k-means, etc. | Low - fixed algorithm code, tunable hyperparameters | Minimal - just pass data |
| Framework Containers | TensorFlow, PyTorch, MXNet, Scikit-learn | High - your script, AWS infra | Medium - write training script |
| BYOC | Custom frameworks, proprietary code | Full control | High - build and push container |
For distributed training, SageMaker supports data parallelism (SageMaker Distributed Data Parallel library) and model parallelism (SageMaker Distributed Model Parallel) for large models that do not fit on a single GPU.
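To make the data-parallel idea concrete, here is a minimal pure-Python sketch that does not use any SageMaker library. It splits a batch across hypothetical workers, lets each one compute a gradient for a one-parameter linear model, and averages the results, which is the role an all-reduce plays in real data-parallel training. The toy model, the function names, and the equal-sized shards are all illustrative assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def gradient(w: float, samples: List[Sample]) -> float:
    """Mean gradient of the squared error (w*x - y)^2 with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def shard(samples: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the batch into equal-sized contiguous shards, one per worker."""
    if len(samples) % num_workers != 0:
        raise ValueError("this sketch assumes the batch divides evenly")
    size = len(samples) // num_workers
    return [samples[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_gradient(w: float, samples: List[Sample], num_workers: int) -> float:
    """Each worker computes a local gradient; averaging stands in for all-reduce."""
    local_grads = [gradient(w, part) for part in shard(samples, num_workers)]
    return sum(local_grads) / len(local_grads)


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
full = gradient(0.5, batch)
parallel = data_parallel_gradient(0.5, batch, num_workers=2)
print(abs(full - parallel) < 1e-9)  # the averaged gradient matches the full-batch one
```

Every worker holds a full copy of the model and sees only part of the data. Model parallelism is the opposite trade: the model itself is split across devices because it does not fit on one.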
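```python
# Launch a training job using the Python SDK
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point='train.py',
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    # The SageMaker data parallel library needs multi-GPU instances such as
    # ml.p4d.24xlarge; single-GPU types like ml.p3.2xlarge are not supported.
    instance_type='ml.p4d.24xlarge',
    instance_count=2,
    framework_version='2.1.0',
    py_version='py310',
    hyperparameters={'epochs': 10, 'batch-size': 64},
    distribution={'smdistributed': {'dataparallel': {'enabled': True}}},
)

# The 'training' channel is mounted at /opt/ml/input/data/training in the container
estimator.fit({'training': 's3://my-bucket/data/train'})
```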
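Managed spot training runs jobs on Spot capacity and can reduce training costs by up to 90%. SageMaker restarts an interrupted job and syncs the local checkpoint directory (/opt/ml/checkpoints by default) with the S3 location you configure, but it does not create checkpoints for you: your training script must save them periodically and load the latest one on startup.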
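A script that runs safely under managed spot training needs to save its state periodically and resume from the latest checkpoint after a restart. The following is a minimal sketch of that save-and-resume pattern using only the standard library. The JSON file layout, the function names, and the toy training loop are assumptions for illustration; a real script would save framework state such as model and optimizer weights instead.

```python
import json
import os
import tempfile
from typing import Optional

# SageMaker's default local checkpoint directory; any writable path works here.
DEFAULT_CHECKPOINT_DIR = "/opt/ml/checkpoints"


def save_checkpoint(checkpoint_dir: str, epoch: int, state: dict) -> str:
    """Write atomically so an interruption never leaves a half-written file."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    final_path = os.path.join(checkpoint_dir, f"epoch-{epoch:04d}.json")
    fd, tmp_path = tempfile.mkstemp(dir=checkpoint_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"epoch": epoch, "state": state}, handle)
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_checkpoint(checkpoint_dir: str) -> Optional[dict]:
    """Return the newest checkpoint, or None when starting from scratch."""
    if not os.path.isdir(checkpoint_dir):
        return None
    names = sorted(n for n in os.listdir(checkpoint_dir) if n.endswith(".json"))
    if not names:
        return None
    with open(os.path.join(checkpoint_dir, names[-1])) as handle:
        return json.load(handle)


def train(checkpoint_dir: str, total_epochs: int, stop_after: Optional[int] = None) -> int:
    """Toy loop that resumes after the last saved epoch.

    stop_after simulates a Spot interruption; the return value is the epoch
    the run started from.
    """
    latest = load_latest_checkpoint(checkpoint_dir)
    start_epoch = latest["epoch"] + 1 if latest else 0
    for epoch in range(start_epoch, total_epochs):
        # ... one epoch of real training would run here ...
        save_checkpoint(checkpoint_dir, epoch, {"loss": 1.0 / (epoch + 1)})
        if stop_after is not None and epoch == stop_after:
            break
    return start_epoch
```

On the launch side, the estimator arguments that pair with this pattern are `use_spot_instances=True`, `max_wait`, and `checkpoint_s3_uri`.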
Inference Options - Real-time, Serverless, Batch, and Async
SageMaker offers four distinct inference modes depending on your latency, throughput, and cost requirements.
| Mode | Latency | Scale to Zero | Best For | Max Payload |
|---|---|---|---|---|
| Real-time Endpoint | Milliseconds | No (always on) | Interactive apps, sub-second SLA | 6 MB |
| Serverless Inference | Milliseconds when warm; cold starts add latency (often seconds) | Yes | Infrequent or unpredictable traffic | 4 MB |
| Asynchronous Inference | Minutes (queued) | Yes (with auto-scaling) | Large payloads, long processing time | 1 GB |
| Batch Transform | Job-based | Yes (ephemeral) | Offline scoring of large datasets | No dataset size limit; up to 100 MB per request |
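The comparison above can be read as a decision procedure. Here is a rough sketch of one, assuming the payload limits listed in the table (which should be verified against current AWS quotas). The `Workload` fields and the `choose_inference_mode` helper are hypothetical names, not part of any SageMaker SDK.

```python
from dataclasses import dataclass

MB = 1024 * 1024
# Payload limits copied from the table above; verify against current AWS quotas.
REALTIME_MAX_BYTES = 6 * MB
SERVERLESS_MAX_BYTES = 4 * MB
ASYNC_MAX_BYTES = 1024 * MB


@dataclass(frozen=True)
class Workload:
    payload_bytes: int
    offline_dataset: bool = False          # scoring a stored dataset, no caller waiting
    needs_immediate_response: bool = True  # caller blocks on the prediction
    traffic_is_steady: bool = True         # False means spiky or infrequent traffic


def choose_inference_mode(w: Workload) -> str:
    """Hypothetical helper that mirrors the comparison table."""
    if w.offline_dataset:
        return "batch-transform"
    if not w.needs_immediate_response or w.payload_bytes > REALTIME_MAX_BYTES:
        if w.payload_bytes > ASYNC_MAX_BYTES:
            raise ValueError("payload exceeds the async limit; consider batch transform")
        return "asynchronous"
    if not w.traffic_is_steady and w.payload_bytes <= SERVERLESS_MAX_BYTES:
        return "serverless"
    return "real-time"


print(choose_inference_mode(Workload(payload_bytes=200_000, traffic_is_steady=False)))
```

Real decisions also weigh cost, cold-start tolerance, and GPU requirements, so treat this as a first filter and not a rule.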
Real-time endpoints do NOT auto-scale to zero. If you deploy an endpoint and forget about it, you pay for idle instances 24/7. Use serverless inference or shut down endpoints when not in use for dev/test.
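A quick calculation shows why a forgotten endpoint matters. The sketch below compares an always-on endpoint with one that only runs during working hours. The hourly rate of 0.23 USD is a made-up placeholder, not a quoted AWS price, and 730 hours is the usual approximation for one month.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year divided by 12


def monthly_endpoint_cost(hourly_rate_usd: float, instance_count: int,
                          active_hours: float = HOURS_PER_MONTH) -> float:
    """Cost of keeping instance_count instances running for active_hours."""
    return round(hourly_rate_usd * instance_count * active_hours, 2)


HYPOTHETICAL_RATE = 0.23  # placeholder USD per instance-hour, not an AWS price

always_on = monthly_endpoint_cost(HYPOTHETICAL_RATE, instance_count=1)
# Dev endpoint that exists only 8 hours a day on 22 working days
working_hours_only = monthly_endpoint_cost(HYPOTHETICAL_RATE, 1, active_hours=8 * 22)

print(always_on, working_hours_only)
```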
Multi-model endpoints (MME) and multi-container endpoints allow you to host multiple models on a single endpoint, significantly reducing costs for many low-traffic models.
MLOps with SageMaker Pipelines and Model Registry
SageMaker Pipelines is a purpose-built CI/CD service for ML. A pipeline is a DAG of steps. Most steps map to a SageMaker job (processing, training, tuning, batch transform), while others handle control flow and registration (condition, model creation, model registration).
```python
# Minimal pipeline definition
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep, ProcessingStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet

# Assumed to be defined elsewhere: training_step, eval_step, model_register_step,
# eval_report (the PropertyFile emitted by eval_step), and role (an IAM role ARN).

# Read the accuracy that the evaluation step wrote to its report
condition = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=eval_step.name,
        property_file=eval_report,
        json_path='metrics.accuracy.value',
    ),
    right=0.80,
)

# Register the model only when accuracy is at least 0.80
condition_step = ConditionStep(
    name='CheckAccuracy',
    conditions=[condition],
    if_steps=[model_register_step],
    else_steps=[],
)

pipeline = Pipeline(
    name='MyMLPipeline',
    steps=[training_step, eval_step, condition_step],
)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
pipeline.start()                # launch an execution
```
The Model Registry stores approved model versions with metadata, metrics, and approval status. Models move from PendingManualApproval to Approved before they can be deployed to production endpoints.
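The approval gate can be pictured with a small in-memory model. This sketch does not call the Model Registry API. The `ModelVersion` class and the `approve` and `latest_approved` helpers are hypothetical; only the status strings match the values the registry uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ApprovalStatus(str, Enum):
    PENDING = "PendingManualApproval"
    APPROVED = "Approved"
    REJECTED = "Rejected"


@dataclass
class ModelVersion:
    version: int
    accuracy: float
    status: ApprovalStatus = ApprovalStatus.PENDING


def approve(model: ModelVersion, min_accuracy: float) -> ModelVersion:
    """Stand-in for a review step: approve only versions that clear the bar."""
    if model.accuracy >= min_accuracy:
        model.status = ApprovalStatus.APPROVED
    else:
        model.status = ApprovalStatus.REJECTED
    return model


def latest_approved(versions: List[ModelVersion]) -> Optional[ModelVersion]:
    """Deployment should pick the newest Approved version and ignore the rest."""
    approved = [v for v in versions if v.status is ApprovalStatus.APPROVED]
    return max(approved, key=lambda v: v.version) if approved else None


versions = [ModelVersion(1, 0.78), ModelVersion(2, 0.85), ModelVersion(3, 0.91)]
approve(versions[0], min_accuracy=0.80)
approve(versions[1], min_accuracy=0.80)
# Version 3 is still pending review, so version 2 is the deployable one.
print(latest_approved(versions).version)
```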
Integrate SageMaker Pipelines with GitHub Actions or CodePipeline to trigger ML retraining pipelines on code commits or data drift alerts from Model Monitor.
SageMaker Pricing Model
SageMaker pricing has many dimensions - each component is billed separately. This surprises teams used to all-in-one pricing.
| Component | Billing Unit | Cost Optimization |
|---|---|---|
| Studio notebooks | Per instance-hour | Shut down idle instances, not just kernels; use lifecycle configs to auto-stop |
| Training jobs | Per instance-second | Use Spot instances (up to 90% savings) with checkpointing |
| Hyperparameter tuning | Per instance-second of each training job launched (no separate tuning fee) | Limit max parallel jobs; use Bayesian strategy over random |
| Real-time endpoints | Per instance-hour (always on) | Use multi-model endpoints; enable auto-scaling with a low minimum instance count; delete idle dev/test endpoints |
| Serverless endpoints | Per millisecond of compute (priced by configured memory) + data processed | Best for intermittent or low-volume traffic |
| Batch Transform | Per instance-second of job duration | Parallelize with MaxConcurrentTransforms |
| Feature Store | Per write/read unit + storage GB | Only put high-reuse features in online store |
| Ground Truth labeling | Per labeled object | Enable automated data labeling so only low-confidence objects go to human labelers |
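Data transfer between S3 and SageMaker within the same Region is not charged, but that does not make data movement free. Cross-Region transfer is billed, S3 request charges apply, and large datasets need larger attached storage volumes, all of which can add meaningful cost at scale.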
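The training row in the pricing table can be turned into simple arithmetic. The sketch below estimates a job's cost from billable seconds and computes managed spot savings as `(1 - billable / training) * 100`, using the two durations that a training job description reports (`TrainingTimeInSeconds` and `BillableTimeInSeconds`). The hourly rate is a made-up placeholder, not an AWS price.

```python
def training_cost_usd(hourly_rate_usd: float, instance_count: int,
                      billable_seconds: int) -> float:
    """Per-second billing: rate per hour, scaled by instances and billed time."""
    return hourly_rate_usd * instance_count * billable_seconds / 3600


def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Savings relative to paying on-demand for the full training time."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


HYPOTHETICAL_RATE = 3.0  # placeholder USD per instance-hour, not an AWS price

on_demand = training_cost_usd(HYPOTHETICAL_RATE, 2, billable_seconds=10_000)
with_spot = training_cost_usd(HYPOTHETICAL_RATE, 2, billable_seconds=2_500)
print(round(on_demand, 2), round(with_spot, 2), spot_savings_percent(10_000, 2_500))
```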
Interview Focus Points
1. Explain the difference between SageMaker real-time, serverless, async, and batch transform inference modes. When would you use each?
2. How does managed spot training work in SageMaker and what do you need to implement in your training script to use it safely?
3. What is SageMaker Model Monitor and how do you detect data drift in production?
4. Walk me through how you would set up an MLOps pipeline using SageMaker Pipelines, Model Registry, and CodePipeline.
5. What is a multi-model endpoint and when would you use it instead of separate endpoints per model?
6. How does SageMaker Feature Store differ from just storing features in S3 or DynamoDB?
7. What are the IAM permissions required for a SageMaker training job to access S3 data and write artifacts?
8. How would you reduce SageMaker costs for a team running frequent experiments in development?
9. Explain SageMaker distributed training - what is data parallelism vs model parallelism?