AWS Monitoring & Management
Systems Manager
Operational hub for patching, configuring, and managing EC2 and on-premises servers
AWS Systems Manager (SSM) is the operational management hub for AWS infrastructure, providing a unified interface to patch, configure, run commands, and manage EC2 instances and on-premises servers without SSH or RDP. It eliminates the need for bastion hosts and enables compliance-driven automation at scale across thousands of instances.
SSM Agent, IAM Roles, and Managed Instance Registration
Every capability in Systems Manager depends on the SSM Agent running on the target instance and the instance having an IAM instance profile with the AmazonSSMManagedInstanceCore policy. The agent establishes an outbound HTTPS connection to the SSM service endpoints - no inbound ports need to be open.
| Requirement | EC2 Instances | On-Premises Servers |
|---|---|---|
| Agent | Pre-installed on many AWS-provided AMIs (for example Amazon Linux 2 and Windows Server); manual install on others | Manual install required |
| IAM | EC2 instance profile with SSM permissions | Hybrid activation (creates managed instance ID) |
| Network | SSM, EC2Messages, SSMMessages endpoints (public or VPC endpoints) | Same - outbound HTTPS to endpoints |
| Operating Systems | Linux, Windows, macOS | Linux, Windows |
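The IAM requirement for EC2 instances can be sketched with the AWS CLI. This is a minimal outline, not a definitive setup: the role and profile name `ssm-managed-instance-role` is a hypothetical placeholder, and the function is only defined here, not run.

```bash
# Hypothetical name, used for both the role and the instance profile.
ROLE_NAME="${ROLE_NAME:-ssm-managed-instance-role}"

# Trust policy that lets the EC2 service assume the role.
ec2_trust_policy() {
  cat <<'JSON'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
JSON
}

# Create the role, attach AmazonSSMManagedInstanceCore, and wrap the role
# in an instance profile that can be attached to EC2 instances.
create_ssm_instance_profile() {
  aws iam create-role \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document "$(ec2_trust_policy)" &&
  aws iam attach-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" &&
  aws iam create-instance-profile \
    --instance-profile-name "$ROLE_NAME" &&
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" \
    --role-name "$ROLE_NAME"
}

# Usage (needs IAM permissions): create_ssm_instance_profile
```

Once the profile is attached to an instance that runs the agent and can reach the endpoints, the instance should appear in `describe-instance-information`.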
VPC endpoints for SSM (com.amazonaws.region.ssm, com.amazonaws.region.ec2messages, com.amazonaws.region.ssmmessages) are required if your instances are in private subnets with no NAT Gateway. This is a common interview question about how Session Manager works in locked-down environments.
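As a rough sketch of the private-subnet setup described above, the three interface endpoints could be created in a loop. The VPC, subnet, and security group IDs are placeholders supplied by the caller, and the security group is assumed to allow inbound HTTPS (443) from the instances.

```bash
# Print the three endpoint service names SSM needs in a given region.
ssm_endpoint_services() {
  region="$1"
  for svc in ssm ec2messages ssmmessages; do
    printf 'com.amazonaws.%s.%s\n' "$region" "$svc"
  done
}

# Create one interface endpoint per service.
# Arguments: region, VPC ID, subnet ID, security group ID (all placeholders).
create_ssm_endpoints() {
  region="$1" vpc_id="$2" subnet_id="$3" sg_id="$4"
  ssm_endpoint_services "$region" | while read -r service; do
    # </dev/null keeps the CLI from consuming the loop's input.
    aws ec2 create-vpc-endpoint \
      --region "$region" \
      --vpc-id "$vpc_id" \
      --vpc-endpoint-type Interface \
      --service-name "$service" \
      --subnet-ids "$subnet_id" \
      --security-group-ids "$sg_id" \
      --private-dns-enabled </dev/null
  done
}

# Usage: create_ssm_endpoints us-east-1 vpc-0abc subnet-0abc sg-0abc
```

Private DNS is enabled so the agent resolves the regular service hostnames to the endpoint addresses without extra configuration.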
```bash
# Verify SSM Agent status on Linux
sudo systemctl status amazon-ssm-agent

# Check if instance is managed by SSM
aws ssm describe-instance-information \
  --filters "Key=InstanceIds,Values=i-0123456789abcdef0"

# Register an on-premises server with hybrid activation
aws ssm create-activation \
  --default-instance-name "on-prem-web-01" \
  --iam-role "AmazonEC2RunCommandRoleForManagedInstances" \
  --registration-limit 1 \
  --region us-east-1
```

Session Manager vs Run Command - Interactive vs Batch Operations
Session Manager and Run Command are the two most commonly used SSM features. Session Manager provides an interactive shell (or port forwarding) without SSH keys or open ports. Run Command executes scripts or documents across fleets of instances.
| Feature | Session Manager | Run Command |
|---|---|---|
| Interaction | Interactive shell session | One-shot script/command execution |
| Use case | Debugging, troubleshooting, port forwarding | Fleet-wide patching, config changes, app deployments |
| Output | Session logs to S3 or CloudWatch Logs | Command output per instance to S3 or console |
| Targeting | One instance at a time | Tags, resource groups, instance IDs - thousands at once |
| Concurrency | N/A | Configurable rate control (MaxConcurrency, MaxErrors) |
| Audit trail | Session history + full session log | Command invocation history |
Session Manager port forwarding is a powerful but underused feature. It tunnels a TCP port on a private instance, or on a remote host that the instance can reach, to your local machine through the SSM channel - useful for accessing RDS, Elasticsearch, or internal web apps from your laptop without a VPN. The Session Manager plugin for the AWS CLI must be installed locally.
```bash
# Start a shell session (no SSH, no bastion host needed)
aws ssm start-session --target i-0123456789abcdef0

# Port forward RDS port to localhost:5432
aws ssm start-session \
  --target i-0123456789abcdef0 \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters "host=my-db.cluster-xyz.us-east-1.rds.amazonaws.com,portNumber=5432,localPortNumber=5432"

# Run command across all instances tagged env=prod
aws ssm send-command \
  --targets "Key=tag:env,Values=prod" \
  --document-name "AWS-RunShellScript" \
  --parameters "commands=['systemctl restart nginx']" \
  --max-concurrency "20%" \
  --max-errors "5%"
```

Patch Manager, Patch Baselines, and Maintenance Windows
Patch Manager automates the process of patching managed instances with security-related updates. It uses Patch Baselines to define which patches are approved for installation and Maintenance Windows to define when patching occurs.
| Component | What It Does |
|---|---|
| Patch Baseline | Defines approved patches by severity, classification, CVE IDs, or individual patches. Default baselines exist per OS; you create custom ones for stricter control. |
| Patch Group | Tag-based grouping of instances (tag: Patch Group = production). Associates instances with a specific baseline. |
| Maintenance Window | Scheduled time window with max concurrency and error thresholds. Targets resources (instances) and runs tasks (patch, run command, Lambda, Step Functions). |
| Scan vs Install | Scan reports compliance without installing. Install mode applies approved patches and can reboot instances. |
The default AWS-managed patch baselines auto-approve critical and security patches 7 days after release. For production databases or stateful workloads, create a custom baseline with a longer delay and explicit patch approval to avoid surprise reboots.
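A custom baseline with a longer approval delay might look like the sketch below. The baseline name, the 14-day delay, the choice of `AMAZON_LINUX_2`, and the patch group value are illustrative assumptions; adjust the filters to your operating system.

```bash
# Create a baseline that approves security patches rated Critical or
# Important only after a caller-supplied number of days.
create_delayed_baseline() {
  name="$1" days="$2"
  aws ssm create-patch-baseline \
    --name "$name" \
    --operating-system "AMAZON_LINUX_2" \
    --approval-rules "PatchRules=[{PatchFilterGroup={PatchFilters=[{Key=CLASSIFICATION,Values=[Security]},{Key=SEVERITY,Values=[Critical,Important]}]},ApproveAfterDays=${days}}]" \
    --description "Security patches approved after ${days} days"
}

# Associate the baseline with instances tagged "Patch Group = <value>".
register_patch_group() {
  baseline_id="$1" patch_group="$2"
  aws ssm register-patch-baseline-for-patch-group \
    --baseline-id "$baseline_id" \
    --patch-group "$patch_group"
}

# Usage:
#   create_delayed_baseline "prod-db-baseline" 14
#   register_patch_group "<baseline ID from the previous call>" "production"
```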
Patch compliance state is reported per instance and visible in the Systems Manager Compliance dashboard and in AWS Config. You can create CloudWatch alarms on non-compliant instance counts and integrate this with your security reporting.
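The per-instance compliance state can also be pulled from the CLI, for example to feed a report. This is a small sketch; the `--query` expressions are one possible projection of the response, and the instance IDs and patch group name are placeholders.

```bash
# Show missing and failed patch counts for specific instances.
patch_state_for_instances() {
  aws ssm describe-instance-patch-states \
    --instance-ids "$@" \
    --query "InstancePatchStates[].{Id:InstanceId,Missing:MissingCount,Failed:FailedCount}" \
    --output table
}

# List the instances in a patch group that still have missing patches.
patch_state_for_group() {
  aws ssm describe-instance-patch-states-for-patch-group \
    --patch-group "$1" \
    --query "InstancePatchStates[?MissingCount > \`0\`].InstanceId" \
    --output text
}

# Usage:
#   patch_state_for_instances i-0123456789abcdef0
#   patch_state_for_group production
```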
Parameter Store - Configuration and Secret Management
Parameter Store provides secure, hierarchical storage for configuration data and secrets. It integrates natively with IAM for access control, CloudTrail for audit, and most AWS services (Lambda, ECS, EC2 UserData, CodeBuild) for secret injection.
| | Standard Parameter | Advanced Parameter |
|---|---|---|
| Max value size | 4 KB | 8 KB |
| Cost | Free | $0.05/parameter/month |
| Parameter policies | No | Yes (expiration, notification) |
| Higher throughput | 40 TPS | 100 TPS |
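To illustrate the parameter policies row, an advanced parameter with an expiration policy could be created roughly as follows. The parameter name and timestamp in the usage line are hypothetical, and the value is passed as an argument so that it does not have to be hard-coded in a script.

```bash
# Store a SecureString in the advanced tier with an Expiration policy.
# Arguments: parameter name, value, ISO 8601 expiration timestamp.
put_expiring_parameter() {
  name="$1" value="$2" expires_at="$3"
  aws ssm put-parameter \
    --name "$name" \
    --value "$value" \
    --type SecureString \
    --tier Advanced \
    --policies "[{\"Type\":\"Expiration\",\"Version\":\"1.0\",\"Attributes\":{\"Timestamp\":\"${expires_at}\"}}]"
}

# Usage:
#   put_expiring_parameter "/myapp/prod/temp-token" "$TOKEN" "2030-01-01T00:00:00.000Z"
```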
| | Parameter Store (SecureString) | Secrets Manager |
|---|---|---|
| Cost | Free (standard params) | $0.40/secret/month |
| Automatic rotation | No (manual Lambda possible) | Yes - built-in for RDS, Redshift, DocumentDB |
| Cross-account access | Possible with CMK | Native cross-account support |
| Versioning | Labeled versions | Full version staging (AWSCURRENT, AWSPENDING) |
| Best for | App config, non-rotating secrets | Credentials that must rotate automatically |
```bash
# Store a secret string
aws ssm put-parameter \
  --name "/myapp/prod/db-password" \
  --value "supersecretpassword" \
  --type SecureString \
  --key-id "alias/myapp-key"

# Retrieve all parameters under a path
aws ssm get-parameters-by-path \
  --path "/myapp/prod/" \
  --with-decryption \
  --recursive
```

Use a hierarchical naming convention like /app-name/environment/parameter-name. This lets you use path-based IAM policies to grant a Lambda function access to all parameters under /myapp/prod/ without listing each one.
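A path-based policy of that kind could be generated as in the sketch below. The region and account ID are placeholders. Listing the bare path ARN alongside the wildcard ARN is a cautious assumption for path lookups and is worth verifying against your own access tests. If the parameters are encrypted with a customer managed key, the caller also needs `kms:Decrypt` on that key, which this sketch leaves out.

```bash
# Print an IAM policy granting read access to everything under a path.
# Arguments: region, account ID, parameter path (e.g. /myapp/prod/).
parameter_path_policy() {
  region="$1" account_id="$2" path="$3"
  # Parameter ARNs omit the leading slash; also trim a trailing one.
  path="${path#/}"
  path="${path%/}"
  cat <<JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:GetParametersByPath"
      ],
      "Resource": [
        "arn:aws:ssm:${region}:${account_id}:parameter/${path}",
        "arn:aws:ssm:${region}:${account_id}:parameter/${path}/*"
      ]
    }
  ]
}
JSON
}

# Usage: parameter_path_policy us-east-1 123456789012 /myapp/prod/ > policy.json
```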
Automation Documents (Runbooks) and State Manager
SSM documents are the automation primitives in Systems Manager; Automation-type documents are also called runbooks. There are several document types for different use cases:
| Document Type | Use Case | Example |
|---|---|---|
| Command | Run scripts on instances | AWS-RunShellScript, AWS-RunPowerShellScript |
| Automation | Multi-step orchestration of AWS API calls + instance commands | AMI creation, EC2 restart with approval step, cross-account operations |
| Session | Define Session Manager preferences (logging, shell prefs) | AWS-StartSSHSession, AWS-StartPortForwardingSession |
| Policy | Enforce configuration compliance (used by State Manager) | Enforce CloudWatch agent config, install software |
| Package | Distribute and install software packages via Distributor | Install custom agents, security tools |
State Manager ensures instances maintain a defined configuration over time. Associations link a document to target instances and run on a schedule. If an instance drifts from the desired configuration (e.g., CloudWatch agent gets uninstalled), the next association run will re-apply it.
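The CloudWatch agent example above could be expressed as an association along these lines. It is a sketch under assumptions: the association name, the daily schedule, and the rate-control values are illustrative choices, and the tag key and value come from the caller.

```bash
# Keep the CloudWatch agent installed on all instances carrying a tag.
# Arguments: tag key, tag value.
create_cw_agent_association() {
  tag_key="$1" tag_value="$2"
  aws ssm create-association \
    --association-name "install-cloudwatch-agent" \
    --name "AWS-ConfigureAWSPackage" \
    --targets "Key=tag:${tag_key},Values=${tag_value}" \
    --parameters "action=Install,name=AmazonCloudWatchAgent" \
    --schedule-expression "rate(1 day)" \
    --max-concurrency "10%" \
    --max-errors "5%"
}

# Usage: create_cw_agent_association env prod
```

Unlike a one-off `send-command`, the association is stored and re-evaluated on its schedule, so an instance that loses the agent is brought back in line on the next run.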
Automation documents support approval steps, multi-account execution, and rate control. They're the right tool for operational runbooks that used to be manual SOPs - things like "rotate RDS credentials", "create golden AMI", or "respond to a security finding".
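As a small illustration of rate-controlled Automation, the AWS-owned `AWS-RestartEC2Instance` runbook could be run against tagged instances one at a time. The tag values are placeholders, and the concurrency and error thresholds are example settings rather than recommendations.

```bash
# Restart tagged instances one at a time, stopping after the first failure.
# Prints the automation execution ID.
restart_tagged_instances() {
  tag_key="$1" tag_value="$2"
  aws ssm start-automation-execution \
    --document-name "AWS-RestartEC2Instance" \
    --target-parameter-name "InstanceId" \
    --targets "Key=tag:${tag_key},Values=${tag_value}" \
    --max-concurrency "1" \
    --max-errors "1" \
    --query "AutomationExecutionId" \
    --output text
}

# Look up the current status of an execution by ID.
automation_status() {
  aws ssm get-automation-execution \
    --automation-execution-id "$1" \
    --query "AutomationExecution.AutomationExecutionStatus" \
    --output text
}

# Usage:
#   exec_id="$(restart_tagged_instances env staging)"
#   automation_status "$exec_id"
```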
Interview Focus Points
1. How does Session Manager work and why is it preferred over SSH + bastion hosts in modern AWS environments?
2. What three things does an EC2 instance need to be managed by Systems Manager?
3. What VPC endpoints are required for SSM to work in a private subnet with no NAT Gateway?
4. Explain the difference between Parameter Store and Secrets Manager - when would you use each?
5. How does Patch Manager handle patching without causing downtime - what are Maintenance Windows?
6. What is an SSM Association (State Manager) and how does it differ from Run Command?
7. How would you use SSM Automation to create a self-healing runbook that restarts a service when a CloudWatch alarm fires?
8. How do you audit who ran Session Manager sessions and what commands were executed?
9. How would you use SSM Session Manager port forwarding to access an RDS database in a private subnet?