Storage Gateway
Hybrid cloud storage bridge connecting on-premises environments to AWS
AWS Storage Gateway is a hybrid cloud storage service that connects on-premises environments to AWS storage. Local applications see a standard file share, iSCSI volume, or virtual tape library, while the data is stored in S3, as EBS snapshots, in Amazon FSx for Windows File Server, or in S3 Glacier. Because applications keep using the protocols they already speak, organizations can extend on-premises storage to the cloud without re-architecting, which makes Storage Gateway a key tool for migration and for hybrid connectivity between legacy workloads and cloud-native storage.
Storage Gateway Types - Choosing the Right Mode
Storage Gateway offers four gateway types (Volume Gateway runs in either stored or cached mode, so the table has five rows). Each type emulates a storage protocol so that existing on-premises applications integrate without code changes.
| Gateway Type | Protocol | Backend Storage | Primary Use Case |
|---|---|---|---|
| S3 File Gateway | NFS v3/v4.1, SMB | S3 (objects) | Hybrid file storage, on-prem NFS/SMB backed by S3 |
| FSx File Gateway | SMB | FSx for Windows File Server | Local cache for FSx Windows shares, AD integration |
| Volume Gateway - Stored | iSCSI | S3 (EBS snapshots) | Entire volume kept on-prem, asynchronously backed up to S3 as point-in-time snapshots |
| Volume Gateway - Cached | iSCSI | S3 (primary) + local cache | S3 as primary, frequently accessed data cached locally |
| Tape Gateway | iSCSI VTL | S3 (virtual tapes) and Glacier | Replace physical tape libraries with virtual tapes |
S3 File Gateway is the most commonly used type. On-premises applications write to an NFS or SMB share backed by S3; the gateway uploads each file asynchronously as an S3 object, after which cloud workloads such as Lambda, EMR, and SageMaker can access it.
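The table can be read as a decision rule. As a minimal sketch, the hypothetical shell function `recommend_gateway` below encodes it. The argument values (`nfs`, `smb`, `iscsi`, `vtl`, and the `fsx`, `full-local`, `cache-only` hints) are invented for illustration; this is not an AWS CLI command.

```bash
# Hypothetical helper (not an AWS CLI command): encodes the gateway-type table.
# Usage: recommend_gateway <nfs|smb|iscsi|vtl> [hint]
#   smb hint:   s3 (default) or fsx
#   iscsi hint: full-local (whole volume on-prem) or cache-only (S3 as primary)
recommend_gateway() {
  case "$1" in
    nfs)
      echo "S3 File Gateway" ;;
    smb)
      if [ "${2:-s3}" = "fsx" ]; then
        echo "FSx File Gateway"
      else
        echo "S3 File Gateway"
      fi ;;
    iscsi)
      case "${2:-}" in
        full-local) echo "Volume Gateway - Stored" ;;
        cache-only) echo "Volume Gateway - Cached" ;;
        *) echo "iscsi needs a hint: full-local or cache-only" >&2; return 1 ;;
      esac ;;
    vtl)
      echo "Tape Gateway" ;;
    *)
      echo "unknown protocol: $1" >&2
      return 1 ;;
  esac
}

# Example: an existing SMB workload whose shares should live in FSx for Windows
# recommend_gateway smb fsx
```

Real selection also weighs factors the table does not capture, such as available bandwidth and whether cloud workloads need to read the data as S3 objects.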
Deployment Architecture and Local Cache
Storage Gateway runs on-premises as a virtual appliance (VMware ESXi, Microsoft Hyper-V, or Linux KVM) or a dedicated hardware appliance, or in AWS as an EC2 instance. It maintains a local cache for low-latency access to frequently used data.
| Deployment Option | Platform | Use Case |
|---|---|---|
| Hardware appliance | Dedicated 1U rack hardware (Storage Gateway Hardware Appliance) | Sites without virtualization infrastructure |
| VMware ESXi | OVF template on VMware | Most common on-prem enterprise deployment |
| Microsoft Hyper-V | VHD template | Windows-centric on-prem environments |
| Linux KVM | QCOW2 disk image | Open-source virtualization environments |
| EC2 instance | Storage Gateway AMI | Gateway inside AWS (e.g., for EC2-hosted applications), disaster recovery testing |
The local cache is critical for S3 File Gateway performance. Cache hits are served from local disk with low latency; cache misses must be fetched from S3 over the network, which typically adds tens of milliseconds or more per request. Size the cache to hold your working set, subject to these minimums:
```bash
# Minimum cache disk sizes:
#   S3 File Gateway:          150 GB minimum
#   Volume Gateway (cached):  at least 20% of stored data (recommended)
#   Tape Gateway:             150 GB minimum

# Monitor the cache hit ratio (hourly averages over one day) via CloudWatch
aws cloudwatch get-metric-statistics \
  --namespace AWS/StorageGateway \
  --metric-name CacheHitPercent \
  --dimensions Name=GatewayId,Value=sgw-xxxxxxxxx \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 3600 \
  --statistics Average
```

If the local cache fills up, write performance degrades significantly: the gateway throttles new writes until data waiting in the cache has been uploaded to S3 and space is freed. Monitor CachePercentUsed and add cache capacity proactively when it exceeds 80%.
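To make the sizing and monitoring guidance concrete, here is a minimal sketch with two hypothetical shell helpers, `recommend_cache_gb` and `check_cache_used` (neither is part of any AWS tool). The first applies the minimums listed above to an estimated working set; the second flags the 80% `CachePercentUsed` threshold and is meant to be fed a value pulled from CloudWatch. Only the 150 GB floor, the 20% rule, and the 80% threshold come from the text; everything else is an assumption.

```bash
# Hypothetical sizing helper based on the minimums listed above (not an AWS tool).
# Usage: recommend_cache_gb <file|volume|tape> <size_gb>
#   file, tape: size_gb is the estimated hot working set
#   volume:     size_gb is the total data on the volumes (20% rule)
recommend_cache_gb() {
  case "$1" in
    file|tape)
      # Cover the working set, but never go below the 150 GB minimum.
      if [ "$2" -lt 150 ]; then echo 150; else echo "$2"; fi ;;
    volume)
      # 20% of stored data, rounded up to a whole GB.
      echo $(( ($2 * 20 + 99) / 100 )) ;;
    *)
      echo "unknown gateway type: $1" >&2
      return 1 ;;
  esac
}

# Hypothetical check for the 80% CachePercentUsed threshold.
# Returns non-zero when capacity should be added, so it can drive an alert.
# Usage: check_cache_used <percent>   (accepts decimals such as 83.4)
check_cache_used() {
  # awk does the floating-point comparison that [ ... ] cannot.
  if awk -v p="$1" 'BEGIN { exit !(p + 0 > 80) }'; then
    echo "WARN: cache ${1}% used, add cache capacity"
    return 1
  fi
  echo "OK: cache ${1}% used"
}

# Example feed from CloudWatch (not run here; gateway ID is a placeholder):
# used=$(aws cloudwatch get-metric-statistics \
#   --namespace AWS/StorageGateway --metric-name CachePercentUsed \
#   --dimensions Name=GatewayId,Value=sgw-xxxxxxxxx \
#   --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T01:00:00Z \
#   --period 300 --statistics Average \
#   --query 'sort_by(Datapoints, &Timestamp)[-1].Average' --output text)
# check_cache_used "$used"
```

Returning a non-zero status on the warning path lets the check run from cron or a monitoring agent without parsing its output; in production a CloudWatch alarm on the same metric is the more natural fit.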
S3 File Gateway - Deep Dive
S3 File Gateway creates a 1:1 mapping between NFS/SMB file shares and S3 bucket prefixes. Each file becomes an S3 object, preserving the file path as the object key.
| Feature | Detail |
|---|---|
| Object naming | File path maps directly to S3 key: /share/dir/file.txt -> s3://bucket/prefix/dir/file.txt |
| File size limit | 5 TB per file (S3 object size limit) |
| Metadata | POSIX metadata stored as S3 user-defined metadata |
| Refresh cache | S3 changes made outside the gateway require a cache refresh to be visible |
| Notifications | S3 event notifications fire when the gateway finishes uploading a file as an object; integrates with Lambda, SQS, SNS |
| Access control | IAM role on gateway controls S3 access; SMB share supports AD or guest access |
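The object-naming rule can be sketched as a small path transformation. The hypothetical helper below (not an AWS tool) assumes a share mounted at a known local path and configured with a bucket and optional prefix; it strips the mount point and joins the rest onto the prefix. It does not model any special-character handling the gateway may apply.

```bash
# Hypothetical helper (not an AWS tool): derive the S3 URI a file will have,
# following the "file path maps directly to S3 key" rule in the table above.
# Usage: share_path_to_s3_uri <share-mount> <bucket> <prefix> <file-path>
share_path_to_s3_uri() {
  mount=${1%/}        # tolerate a trailing slash on the mount point
  bucket=$2
  prefix=${3%/}       # tolerate a trailing slash on the prefix
  case "$4" in
    "$mount"/*) ;;    # file must live under the share's mount point
    *) echo "not under $mount: $4" >&2; return 1 ;;
  esac
  rel=${4#"$mount"/}  # path relative to the share root
  if [ -n "$prefix" ]; then
    echo "s3://$bucket/$prefix/$rel"
  else
    echo "s3://$bucket/$rel"
  fi
}

# Example: share_path_to_s3_uri /share bucket prefix /share/dir/file.txt
#   prints s3://bucket/prefix/dir/file.txt
```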
```bash
# Refresh the cached listing for the whole share ("/" recursively)
# after S3 objects were changed outside the gateway
aws storagegateway refresh-cache \
  --file-share-arn arn:aws:storagegateway:us-east-1:123456789012:share/share-xxxxxxxxx \
  --folder-list "/" \
  --recursive
```

The most common S3 File Gateway integration pattern: on-premises applications write files to the NFS share, S3 event notifications trigger Lambda for processing, and results are written back to S3 or DynamoDB. The gateway acts as the bridge that lets legacy applications feed cloud-native data pipelines.
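The pipeline pattern needs an S3 event notification on the gateway's bucket. A minimal sketch, assuming the Lambda function already exists: the hypothetical shell function `lambda_notification_config` prints a notification configuration in the JSON shape accepted by `aws s3api put-bucket-notification-configuration`. The function ARN, bucket name, and `incoming/` prefix are placeholders.

```bash
# Hypothetical helper: print an S3 event notification configuration that
# invokes a Lambda function for every object created under a key prefix.
# Usage: lambda_notification_config <lambda-arn> <key-prefix>
lambda_notification_config() {
  cat <<EOF
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "gateway-uploads",
      "LambdaFunctionArn": "$1",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "$2" }
          ]
        }
      }
    }
  ]
}
EOF
}

# Example (not run here; ARN, bucket and prefix are placeholders):
# lambda_notification_config \
#   arn:aws:lambda:us-east-1:123456789012:function:process-upload incoming/ \
#   > notification.json
# aws s3api put-bucket-notification-configuration \
#   --bucket my-gateway-bucket \
#   --notification-configuration file://notification.json
```

Two caveats: `put-bucket-notification-configuration` replaces the bucket's entire notification configuration, so merge with any existing rules first; and S3 must be allowed to invoke the function (for example via `aws lambda add-permission` with principal `s3.amazonaws.com`). S3 event notifications are delivered at least once, so the handler should tolerate duplicates.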
Tape Gateway - Replacing Physical Tape Libraries
Tape Gateway presents a virtual tape library (VTL) via iSCSI and works with leading backup applications (Veeam, Veritas, Commvault, and others) without requiring software changes. Virtual tapes are backed by S3 and can be archived to Glacier.
| Component | Physical Tape Equivalent | AWS Backend |
|---|---|---|
| Virtual Tape Library (VTL) | Physical tape library robot | Managed by Storage Gateway |
| Virtual Tape Drive | Physical tape drive | Up to 10 per gateway |
| Virtual Tape | Physical tape cartridge | S3 object (100 GiB to 5 TiB) |
| Virtual Tape Shelf (VTS) | Offsite tape vault | S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive |
Tape Gateway eliminates physical tape management, courier costs, and off-site storage fees while preserving existing backup software investments. Retrieving an archived tape from the Virtual Tape Shelf typically takes 3-5 hours from S3 Glacier Flexible Retrieval and up to about 12 hours from S3 Glacier Deep Archive.
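For capacity planning, a minimal sketch of how many virtual tapes a backup set needs, using the 100 GiB to 5 TiB tape size range from the table above. The helper `tapes_needed` is hypothetical, and it ignores compression and the backup application's own tape-spanning behavior.

```bash
# Hypothetical helper: number of virtual tapes of a given size for a backup set.
# Tape size range (100 GiB to 5 TiB = 5120 GiB) taken from the table above.
# Usage: tapes_needed <backup_gib> <tape_gib>
tapes_needed() {
  backup=$1
  tape=$2
  if [ "$tape" -lt 100 ] || [ "$tape" -gt 5120 ]; then
    echo "tape size must be between 100 and 5120 GiB" >&2
    return 1
  fi
  echo $(( (backup + tape - 1) / tape ))   # ceiling division
}

# Example: a 2500 GiB backup on 1024 GiB tapes
# tapes_needed 2500 1024   # prints 3
```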
Interview Focus Points
1. What are the four Storage Gateway types and when would you use each?
2. How does S3 File Gateway handle the case where objects are modified directly in S3 without going through the gateway?
3. A company wants to keep using their existing Veeam backup software but store backups in AWS instead of physical tape. What Storage Gateway type would you recommend?
4. Explain the difference between Volume Gateway Stored mode and Cached mode.
5. What happens when the Storage Gateway local cache becomes full?
6. How would you use Storage Gateway as part of a data migration strategy from on-premises to S3?
7. What are the networking requirements for deploying Storage Gateway on-premises?