Ace Cloud Interviews

AWS Storage

Storage Gateway

Hybrid cloud storage bridge connecting on-premises environments to AWS

AWS Storage Gateway is a hybrid cloud storage service that connects on-premises environments to AWS storage. To local applications it appears as a standard file share (NFS or SMB), an iSCSI volume, or a virtual tape library, while the data is stored in S3, FSx for Windows File Server, EBS snapshots, or S3 Glacier. Because applications keep using the protocols they already speak, organizations can extend on-premises storage to the cloud without re-architecting them, which makes Storage Gateway a key tool for migration and hybrid connectivity.

Storage Gateway Types - Choosing the Right Mode

Storage Gateway offers four gateway types (Volume Gateway has two modes, Stored and Cached). Each emulates a different storage protocol so that existing on-premises applications can integrate without code changes.

| Gateway Type | Protocol | Backend Storage | Primary Use Case |
| --- | --- | --- | --- |
| S3 File Gateway | NFS v3/v4.1, SMB | S3 (objects) | Hybrid file storage, on-prem NFS/SMB backed by S3 |
| FSx File Gateway | SMB | FSx for Windows File Server | Local cache for FSx Windows shares, AD integration |
| Volume Gateway - Stored | iSCSI | S3 (snapshots) | Full volume on-prem with async backup to S3 |
| Volume Gateway - Cached | iSCSI | S3 (primary) + local cache | S3 as primary, frequently accessed data cached locally |
| Tape Gateway | iSCSI VTL | S3 (virtual tapes) and Glacier | Replace physical tape libraries with virtual tapes |
💡

S3 File Gateway is the most commonly used type. On-premises applications write to an NFS or SMB share backed by S3; the gateway uploads each file asynchronously, after which it is available as an S3 object to cloud workloads such as Lambda, EMR, and SageMaker.

Deployment Architecture and Local Cache

Storage Gateway runs on-premises as a virtual appliance (VMware ESXi, Hyper-V, or KVM) or on the dedicated hardware appliance, and in AWS as an EC2 instance. It maintains a local cache for low-latency access to frequently used data.

| Deployment Option | Platform | Use Case |
| --- | --- | --- |
| Hardware appliance | Dedicated 1U rack hardware (Storage Gateway Hardware Appliance) | Sites without virtualization infrastructure |
| VMware ESXi | OVF template on VMware | Most common on-prem enterprise deployment |
| Microsoft Hyper-V | VHD template | Windows-centric on-prem environments |
| Linux KVM | QCOW2 disk image | Open-source virtualization environments |
| EC2 instance | AMI in AWS Marketplace | Cloud-to-cloud gateway, disaster recovery testing |

The local cache is critical for S3 File Gateway performance. Reads served from the cache complete at local-disk speed; cache misses require fetching the data from S3, which adds tens of milliseconds or more depending on the network link. Size the cache based on your working set:

bash
# Cache disk sizing:
# S3 File Gateway: 150 GiB minimum
# Volume Gateway (Cached): 150 GiB minimum; as a starting estimate, about 20% of the data set
# Tape Gateway: 150 GiB minimum

# Monitor cache hit ratio via CloudWatch (hourly averages for one day)
# If no datapoints come back, confirm the metric's dimensions with:
#   aws cloudwatch list-metrics --namespace AWS/StorageGateway
aws cloudwatch get-metric-statistics \
  --namespace AWS/StorageGateway \
  --metric-name CacheHitPercent \
  --dimensions Name=GatewayId,Value=sgw-xxxxxxxxx \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 3600 \
  --statistics Average
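The sizing guidance in the comments above can be expressed as a small calculation. The sketch below is only a starting estimate: `recommended_cache_gib` is a hypothetical helper, and it assumes the 150 GiB floor and the 20% rule of thumb from this section rather than any measured working set.

```bash
# Hypothetical sizing helper: prints a starting cache size in GiB.
# Takes the larger of the 150 GiB floor and 20% of the data set.
# Usage: recommended_cache_gib <data_set_size_gib>
recommended_cache_gib() {
  local data_set_gib="$1"
  local floor_gib=150
  local estimate=$(( (data_set_gib * 20 + 99) / 100 ))   # 20%, rounded up
  if [ "$estimate" -lt "$floor_gib" ]; then
    echo "$floor_gib"
  else
    echo "$estimate"
  fi
}

recommended_cache_gib 500    # prints: 150
recommended_cache_gib 4000   # prints: 800
```

Treat the result as a first guess and adjust it once CacheHitPercent data from real traffic is available.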
⚠️

If the local cache fills with data that has not yet been uploaded, write throughput drops sharply, because new writes must wait for cached data to reach S3 before space can be freed. Monitor CachePercentUsed and CachePercentDirty, and add cache capacity proactively when usage stays above 80%.
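The 80% guidance can be turned into a simple check. This is a minimal sketch, not an official tool: `cache_needs_capacity` and `recent_cache_percent_used` are hypothetical helper names, the gateway ID you pass in is your own, and the CloudWatch call assumes a configured AWS CLI and GNU `date`. The CloudWatch call sits inside a function, so nothing touches an account until you invoke it.

```bash
# Hypothetical helper: exit 0 when cache usage is at or above the threshold,
# exit 1 when below it, exit 2 when the input is not a number.
# Usage: cache_needs_capacity <percent_used> [threshold]
cache_needs_capacity() {
  local used="${1%.*}"          # drop any fractional part: 83.4 -> 83
  local threshold="${2:-80}"    # 80% is this section's rule of thumb
  case "$used" in
    ''|*[!0-9]*) return 2 ;;    # e.g. "None" when CloudWatch has no datapoints
  esac
  [ "$used" -ge "$threshold" ]
}

# Fetch one hourly average of CachePercentUsed from the last hour.
# Assumes AWS CLI credentials and GNU date; not called automatically.
recent_cache_percent_used() {
  local gateway_id="$1"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/StorageGateway \
    --metric-name CachePercentUsed \
    --dimensions Name=GatewayId,Value="$gateway_id" \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 3600 \
    --statistics Average \
    --query 'Datapoints[0].Average' \
    --output text
}
```

A cron job could combine the two, for example `cache_needs_capacity "$(recent_cache_percent_used sgw-xxxxxxxxx)" && echo "add cache disk"`.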

S3 File Gateway - Deep Dive

S3 File Gateway maps each NFS/SMB file share to an S3 bucket or to a prefix within a bucket. Each file becomes one S3 object, and the file path relative to the share root becomes the object key.

| Feature | Detail |
| --- | --- |
| Object naming | File path maps directly to S3 key: /share/dir/file.txt -> s3://bucket/prefix/dir/file.txt |
| File size limit | 5 TB per file (S3 object size limit) |
| Metadata | POSIX metadata stored as S3 user-defined metadata |
| Refresh cache | S3 changes made outside the gateway require a cache refresh to be visible |
| Notifications | S3 events triggered on write completion - integrates with Lambda, SQS, SNS |
| Access control | IAM role on gateway controls S3 access; SMB share supports AD or guest access |
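The object-naming rule in the table can be sketched as a small function. `share_path_to_s3_key` is a hypothetical helper written for illustration; it assumes the path is given relative to the share root and that the share is configured with an optional bucket prefix.

```bash
# Hypothetical helper: derive the S3 object key for a file on the share.
# Usage: share_path_to_s3_key <path_relative_to_share_root> [bucket_prefix]
share_path_to_s3_key() {
  local rel_path="${1#/}"     # strip one leading slash: /dir/file.txt -> dir/file.txt
  local prefix="${2:-}"
  prefix="${prefix%/}"        # strip one trailing slash: prefix/ -> prefix
  if [ -n "$prefix" ]; then
    printf '%s/%s\n' "$prefix" "$rel_path"
  else
    printf '%s\n' "$rel_path"
  fi
}

share_path_to_s3_key /dir/file.txt prefix   # prints: prefix/dir/file.txt
```

The resulting key can then be passed to a command such as `aws s3api head-object --bucket <bucket> --key <key>` to confirm that the gateway has finished uploading a file.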
bash
# Refresh the gateway's view of the share after S3 objects changed outside the gateway.
# "/" refreshes from the share root; pass a narrower folder to limit the scan.
aws storagegateway refresh-cache \
  --file-share-arn arn:aws:storagegateway:us-east-1:123456789012:share/share-xxxxxxxxx \
  --folder-list "/" \
  --recursive
💡

The most common S3 File Gateway integration pattern: on-prem applications write files to the NFS share, S3 events trigger Lambda for processing, and results are written back to S3 or DynamoDB. The gateway acts as the bridge enabling legacy applications to feed cloud-native data pipelines.
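One hedged way to wire up the S3-to-Lambda step of this pattern is an S3 event notification on the bucket behind the share. Everything named below is a placeholder: the function ARN, the bucket `my-gateway-bucket`, the `prefix/` filter, and the `Id`. The sketch only writes the configuration file; the command that applies it is left commented out.

```bash
# Write an S3 notification configuration that invokes a Lambda function
# for every object created under the share's prefix.
# The function ARN and prefix are placeholders, not values from this tutorial.
cat > notification.json <<'EOF'
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "gateway-uploads-to-lambda",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "prefix/" }
          ]
        }
      }
    }
  ]
}
EOF

# Apply it once the placeholders are replaced and the Lambda function's
# resource policy allows S3 to invoke it:
# aws s3api put-bucket-notification-configuration \
#   --bucket my-gateway-bucket \
#   --notification-configuration file://notification.json
```

Note that `put-bucket-notification-configuration` replaces the bucket's entire notification configuration, so any existing rules need to be included in the same file.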

Tape Gateway - Replacing Physical Tape Libraries

Tape Gateway presents a virtual tape library (VTL) over iSCSI. It is compatible with most major backup applications (Veeam, Veritas, Commvault, and others), so existing backup software keeps working without modification. Virtual tapes are stored in S3 and can be archived to S3 Glacier or S3 Glacier Deep Archive.

| Component | Physical Tape Equivalent | AWS Backend |
| --- | --- | --- |
| Virtual Tape Library (VTL) | Physical tape library robot | Managed by Storage Gateway |
| Virtual Tape Drive | Physical tape drive | Up to 10 per gateway |
| Virtual Tape | Physical tape cartridge | S3 object (100 GiB to 5 TiB) |
| Virtual Tape Shelf (VTS) | Offsite tape vault | S3 Glacier or Glacier Deep Archive |
💡

Tape Gateway eliminates physical tape management, courier costs, and off-site storage fees while maintaining compatibility with existing backup software investments. Retrieving a tape from the Virtual Tape Shelf typically takes 3-5 hours from S3 Glacier (standard tier) and up to about 12 hours from S3 Glacier Deep Archive.
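For restore planning, the retrieval step can be sketched as follows. Both functions are hypothetical helpers: `tape_retrieval_hours` encodes typical retrieval times (3-5 hours for the S3 Glacier pool, about 12 hours for Deep Archive) keyed by the pool IDs `GLACIER` and `DEEP_ARCHIVE`, and `retrieve_tape` wraps the `retrieve-tape-archive` CLI call with ARNs you supply. Nothing is called against AWS automatically.

```bash
# Typical retrieval time, in hours, for a tape archived in a given pool.
# Pool IDs follow the Storage Gateway values GLACIER and DEEP_ARCHIVE.
tape_retrieval_hours() {
  case "$1" in
    GLACIER)      echo "3-5" ;;
    DEEP_ARCHIVE) echo "12"  ;;
    *)            echo "unknown pool: $1" >&2; return 1 ;;
  esac
}

# Ask Tape Gateway to bring an archived tape back into a gateway's VTL.
# Both ARNs are supplied by the caller; assumes a configured AWS CLI.
retrieve_tape() {
  local tape_arn="$1" gateway_arn="$2"
  aws storagegateway retrieve-tape-archive \
    --tape-arn "$tape_arn" \
    --gateway-arn "$gateway_arn"
}
```

A runbook might print `tape_retrieval_hours GLACIER` alongside the restore request so operators know roughly when the tape will appear in the VTL.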

🎯

Interview Focus Points

  1. What are the four Storage Gateway types and when would you use each?
  2. How does S3 File Gateway handle the case where objects are modified directly in S3 without going through the gateway?
  3. A company wants to keep using their existing Veeam backup software but store backups in AWS instead of physical tape - what Storage Gateway type would you recommend?
  4. Explain the difference between Volume Gateway Stored mode and Cached mode.
  5. What happens when the Storage Gateway local cache becomes full?
  6. How would you use Storage Gateway as part of a data migration strategy from on-premises to S3?
  7. What are the networking requirements for deploying Storage Gateway on-premises?