DataSync

Automate high-speed data transfers between on-premises storage and AWS

AWS DataSync is a managed data transfer service that automates moving large amounts of data between on-premises storage (NFS, SMB, HDFS, object storage) and AWS services like S3, EFS, and FSx at speeds up to 10 Gbps per agent. It handles checksums, retry logic, and bandwidth scheduling automatically, removing the need to write custom scripts. DataSync is the preferred tool when you need fast, reliable, auditable data transfers as part of a migration or ongoing synchronization workflow.

DataSync Architecture and Components

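DataSync has three main components: the agent (runs on-premises or in another cloud), locations (source and destination endpoints), and tasks (define what to transfer and how). The agent is only needed when one side of the transfer sits outside AWS; transfers between AWS storage services do not require one.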

| Component | Where It Runs | Responsibility |
| --- | --- | --- |
| DataSync Agent | On-premises VM (VMware, KVM, Hyper-V) or EC2 | Reads source data, compresses, encrypts, sends to AWS |
| Source Location | Defined in AWS console/API | Points to NFS share, SMB share, S3, HDFS, or object storage |
| Destination Location | Defined in AWS console/API | Points to S3 bucket, EFS file system, FSx share |
| Task | Runs on agent via DataSync service | Orchestrates the transfer: filters, bandwidth, schedule, verify |

The agent connects back to the DataSync service endpoint over TLS. All data is encrypted in transit and you can optionally encrypt at rest using KMS keys on the destination.
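
As a rough sketch of how the components map onto CLI calls, the functions below register an already-deployed agent and then define an NFS source and an S3 destination. The hostname, export path, bucket, role, and the `AWS_CMD` dry-run switch are placeholders assumed for illustration, not values from this guide; check the flags against your installed CLI version before relying on them.

```bash
#!/usr/bin/env bash
# Sketch only: every ARN, hostname, and path passed in is a placeholder.
# AWS_CMD defaults to the real CLI; set AWS_CMD=echo to print commands instead.
AWS_CMD="${AWS_CMD:-aws}"

# Register an agent that is already deployed and has an activation key.
register_agent() {
  local activation_key="$1" agent_name="$2"
  "$AWS_CMD" datasync create-agent \
    --activation-key "$activation_key" \
    --agent-name "$agent_name"
}

# Source location: an on-premises NFS export, read through the agent.
create_nfs_source() {
  local agent_arn="$1" nfs_host="$2" export_path="$3"
  "$AWS_CMD" datasync create-location-nfs \
    --server-hostname "$nfs_host" \
    --subdirectory "$export_path" \
    --on-prem-config "AgentArns=$agent_arn"
}

# Destination location: an S3 bucket, accessed via an IAM role DataSync assumes.
create_s3_destination() {
  local bucket_arn="$1" role_arn="$2" prefix="$3"
  "$AWS_CMD" datasync create-location-s3 \
    --s3-bucket-arn "$bucket_arn" \
    --subdirectory "$prefix" \
    --s3-config "BucketAccessRoleArn=$role_arn"
}
```

Each call returns the ARN of the resource it creates; the two location ARNs are what a task later references as its source and destination.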

💡

A DataSync agent runs only one task at a time; any additional tasks assigned to it run sequentially. For parallel transfers across many source directories, deploy multiple agents and create a separate task for each.
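
A minimal sketch of kicking off several tasks at once, assuming each task ARN passed in is bound to a different agent (the ARNs and the `AWS_CMD` dry-run switch are hypothetical). If two of the tasks share an agent, expect them to run one after the other rather than in parallel.

```bash
#!/usr/bin/env bash
# AWS_CMD defaults to the real CLI; set AWS_CMD=echo for a dry run.
AWS_CMD="${AWS_CMD:-aws}"

# Start one execution per task ARN in the background, then wait until
# every start call has returned. Prints one execution ARN per task.
start_all_tasks() {
  local task_arn
  for task_arn in "$@"; do
    "$AWS_CMD" datasync start-task-execution \
      --task-arn "$task_arn" \
      --query TaskExecutionArn --output text &
  done
  wait
}
```

Only the start calls run concurrently here; the transfers themselves proceed inside the DataSync service and need to be monitored separately.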

DataSync vs Transfer Alternatives

Several AWS services can move data to S3. The right choice depends on volume, frequency, source type, and whether you need transformation.

| Service | Best For | Throughput | Cost Model |
| --- | --- | --- | --- |
| DataSync | NFS/SMB/HDFS migrations, large file sets, ongoing sync | Up to 10 Gbps/agent | Per GB transferred |
| S3 Transfer Acceleration | Client uploads to S3 over public internet | Varies by location | Per GB premium over standard |
| Storage Gateway | Hybrid access: on-prem apps accessing S3 via NFS/SMB long term | Moderate | Per GB + gateway instance |
| Snowball Edge | Offline bulk transfer, limited bandwidth, > 10 TB | Physical device | Per device + shipping |
| Direct Connect | Dedicated network link, not a transfer service itself | Up to 100 Gbps | Per port-hour + data transfer |

⚠️

DataSync is a poor fit for directories holding millions of tiny files that change constantly: the per-task overhead makes it inefficient for that workload. For that pattern, consider S3 replication or custom rsync pipelines.
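
When weighing an online transfer against an offline device, a back-of-the-envelope transfer time is often the deciding number. The helper below is a simple sketch: it assumes decimal terabytes, a steady link, and a usable-bandwidth fraction (0.8 by default, an arbitrary assumption), and it ignores compression and per-file overhead.

```bash
#!/usr/bin/env bash
# Estimate wall-clock days to move a data set over a network link.
# Args: size in TB (decimal, 10^12 bytes), link speed in Mbps,
#       usable fraction of the link (0-1, defaults to 0.8).
transfer_days() {
  awk -v tb="$1" -v mbps="$2" -v util="${3:-0.8}" 'BEGIN {
    bits = tb * 1e12 * 8        # total bits to move
    rate = mbps * 1e6 * util    # usable bits per second
    printf "%.1f\n", bits / rate / 86400
  }'
}

transfer_days 500 1000 0.8     # 500 TB over 1 Gbps  -> prints 57.9
transfer_days 500 10000 0.8    # 500 TB over 10 Gbps -> prints 5.8
```

Roughly two months on a 1 Gbps link versus under a week at 10 Gbps is the kind of gap that pushes a migration toward more bandwidth or a physical device.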

Task Options and Transfer Behavior

DataSync task options control how files are compared, which files are included, and how the transfer behaves on conflict.

| Option | Values | Recommendation |
| --- | --- | --- |
| Verify mode | ONLY_FILES_TRANSFERRED, POINT_IN_TIME_CONSISTENT, NONE | ONLY_FILES_TRANSFERRED for speed; POINT_IN_TIME_CONSISTENT (verifies all data at the destination) for compliance migrations |
| Overwrite mode | ALWAYS, NEVER | ALWAYS for initial migration; NEVER for archive sync where destination is authoritative |
| Preserve deleted files | PRESERVE, REMOVE | PRESERVE during migration testing; REMOVE for ongoing sync |
| POSIX permissions | PRESERVE, NONE | PRESERVE when moving to EFS; NONE when moving to S3 (S3 has no POSIX permissions) |
| Bandwidth limit | Rate limit (BytesPerSecond in the API), or -1 for no limit | Set a limit during business hours to avoid saturating your WAN link |

```bash
# Create a DataSync task via CLI
aws datasync create-task \
  --source-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-SOURCE \
  --destination-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-DEST \
  --name "nightly-sync-media-files" \
  --schedule "ScheduleExpression=cron(0 2 * * ? *)" \
  --options '{"VerifyMode":"ONLY_FILES_TRANSFERRED","OverwriteMode":"ALWAYS","PreserveDeletedFiles":"REMOVE","TransferMode":"CHANGED"}'
```
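
Bandwidth limits do not have to be baked into the task. As a sketch, the functions below cap a single execution at start time and adjust the cap on an execution that is already running; the API takes the limit as `BytesPerSecond`. The ARNs are placeholders and `AWS_CMD` is a hypothetical dry-run switch added for illustration.

```bash
#!/usr/bin/env bash
# AWS_CMD defaults to the real CLI; set AWS_CMD=echo for a dry run.
AWS_CMD="${AWS_CMD:-aws}"

# Convert MiB/s to the bytes-per-second integer the API expects.
mib_to_bytes_per_sec() {
  echo $(( $1 * 1024 * 1024 ))
}

# Start an execution with a bandwidth cap that applies to this run only.
start_throttled() {
  local task_arn="$1" limit_mib="$2"
  "$AWS_CMD" datasync start-task-execution \
    --task-arn "$task_arn" \
    --override-options "BytesPerSecond=$(mib_to_bytes_per_sec "$limit_mib")"
}

# Change the cap on an execution that is already running.
# Pass -1 as the value to remove the limit.
set_running_limit() {
  local execution_arn="$1" bytes_per_sec="$2"
  "$AWS_CMD" datasync update-task-execution \
    --task-execution-arn "$execution_arn" \
    --options "BytesPerSecond=$bytes_per_sec"
}
```

One way to use this is to start a run at 50 MiB/s during business hours and call `set_running_limit` with `-1` from a scheduled job once the office closes.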

Monitoring DataSync Transfers

DataSync publishes CloudWatch metrics per task execution and can send detailed logs to CloudWatch Logs for per-file transfer status.

| CloudWatch Metric | What It Measures |
| --- | --- |
| BytesTransferred | Total bytes sent to the network, counted before compression (BytesCompressed reports the compressed figure) |
| BytesWritten | Total bytes written to the destination (uncompressed) |
| FilesTransferred | Number of files successfully written |
| FilesVerificationFailed | Files where checksum comparison failed; investigate immediately |
| TaskExecutionDuration | Total time for the task execution in seconds |

```bash
# List the executions of a task and their status
aws datasync list-task-executions \
  --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-EXAMPLE

# Get status, transfer counters, and the error code/detail for one execution
# (per-file details come from CloudWatch Logs, not from this call)
aws datasync describe-task-execution \
  --task-execution-arn arn:aws:datasync:us-east-1:123456789012:task/task-EXAMPLE/execution/exec-EXAMPLE
```
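
In a script you usually want to block until an execution finishes. The loop below is one possible sketch: it polls `describe-task-execution` and treats `SUCCESS` and `ERROR` as terminal states. The 30-second default interval and the `AWS_CMD` dry-run switch are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# AWS_CMD defaults to the real CLI; it can be pointed at a stub for testing.
AWS_CMD="${AWS_CMD:-aws}"

# Poll an execution until it reaches a terminal status.
# Prints each status it observes. Returns 0 on SUCCESS, 1 on ERROR,
# and 2 if the describe call itself fails.
wait_for_execution() {
  local execution_arn="$1" interval="${2:-30}" status
  while :; do
    status="$("$AWS_CMD" datasync describe-task-execution \
      --task-execution-arn "$execution_arn" \
      --query Status --output text)" || return 2
    echo "$status"
    case "$status" in
      SUCCESS) return 0 ;;
      ERROR)   return 1 ;;
    esac
    sleep "$interval"
  done
}
```

On a non-zero return, the `Result` section of the `describe-task-execution` output is the first place to look for the error code and detail.
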
💡

Enable CloudWatch Logs on the task to get per-file transfer details. Without this, you only see aggregate metrics and cannot identify which specific files failed during a transfer.
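
A minimal sketch of turning per-file logging on for an existing task. The ARNs are placeholders and `AWS_CMD` is a hypothetical dry-run switch; the log group is assumed to exist already and to carry a resource policy that lets DataSync write to it.

```bash
#!/usr/bin/env bash
# AWS_CMD defaults to the real CLI; set AWS_CMD=echo for a dry run.
AWS_CMD="${AWS_CMD:-aws}"

# Attach a log group to a task and log a record for every file.
# LogLevel values: OFF (no logs), BASIC (errors only), TRANSFER (per-file).
enable_file_logging() {
  local task_arn="$1" log_group_arn="$2"
  "$AWS_CMD" datasync update-task \
    --task-arn "$task_arn" \
    --cloud-watch-log-group-arn "$log_group_arn" \
    --options "LogLevel=TRANSFER"
}
```

After updating, it is worth running `aws datasync describe-task` to confirm the task's other options still hold the values you expect.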

DataSync Pricing

DataSync pricing is per GB of data copied. The agent itself is free - you pay for the EC2 or VM running it, plus standard AWS storage costs at the destination.

| Transfer Type | Price (us-east-1) |
| --- | --- |
| Data transferred to S3, EFS, FSx | $0.0125/GB |
| Data transferred between AWS storage services | $0.0125/GB |
| Agent VM cost (if EC2) | Standard EC2 pricing for instance type chosen |

💡

DataSync compresses data in transit, so the billed GB may be less than the actual file size on disk. A 10 TB migration of typical business data often transfers 6-8 TB after compression, saving 20-40% on transfer costs.
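
The per-GB charge is easy to estimate up front. This sketch multiplies the data size by the us-east-1 rate quoted in this section; it assumes decimal units (1 TB = 1000 GB) and leaves out agent, storage, and request costs. Confirm the rate against the current price list before using the result in a budget.

```bash
#!/usr/bin/env bash
# Estimate the DataSync per-GB charge for a migration.
# Args: size in TB (decimal), price per GB (defaults to 0.0125 USD).
datasync_cost_usd() {
  awk -v tb="$1" -v rate="${2:-0.0125}" 'BEGIN {
    printf "%.2f\n", tb * 1000 * rate
  }'
}

datasync_cost_usd 10     # prints 125.00
datasync_cost_usd 500    # prints 6250.00
```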

🎯

Interview Focus Points

  1. How does DataSync differ from AWS Storage Gateway, and when would you use each?
  2. What is the role of the DataSync agent and where does it need to be deployed?
  3. How would you migrate 500 TB of NFS data from on-premises to S3 with DataSync while minimizing impact on production?
  4. Explain how DataSync verifies data integrity during transfers.
  5. How do you schedule recurring DataSync tasks and handle incremental transfers after the initial sync?
  6. What bandwidth throttling options does DataSync provide and how do you configure them?
  7. How would you troubleshoot a DataSync task that is reporting FilesVerificationFailed?
  8. Can DataSync transfer data between two AWS regions? What are the cost implications?