AWS AI & Machine Learning
Translate
Neural machine translation supporting 75 languages
Amazon Translate is a fully managed neural machine translation service that translates between 75 languages, which works out to thousands of language pairs. It uses neural network models trained on large-scale multilingual datasets and supports real-time translation, batch document translation, and custom terminology. For cloud engineers, Translate is the building block for multilingual applications, document workflows, and customer support systems that need to operate globally.
Real-time Translation vs Batch Document Translation
Translate offers two modes: synchronous (TranslateText) for real-time use cases and asynchronous batch jobs (StartTextTranslationJob) for large document sets.
| Aspect | Real-time (TranslateText) | Batch (StartTextTranslationJob) |
|---|---|---|
| Input | String of up to 10,000 bytes | S3 folder with TXT, HTML, DOCX, PPTX, XLSX, or XLIFF files |
| Output | Synchronous JSON response | Translated files in S3 output folder |
| Max size | 10,000 bytes (UTF-8) per request | Up to 20 MB per document (5 GB per job) |
| Latency | Milliseconds | Minutes to hours depending on volume |
| Formatting | Plain text or HTML tags preserved | Document structure (tables, headings) preserved for DOCX/PPTX |
| Use case | Chat, search, API responses | Content localization, document translation at scale |
# Real-time translation
import boto3
translate = boto3.client('translate', region_name='us-east-1')
response = translate.translate_text(
    Text='The deployment pipeline failed due to missing environment variables.',
    SourceLanguageCode='en',
    TargetLanguageCode='es'
)
print(response['TranslatedText'])
# "La canalización de implementación falló debido a variables de entorno faltantes."
# Use 'auto' for automatic language detection (integrates with Comprehend)
response = translate.translate_text(
    Text='Merci pour votre aide.',
    SourceLanguageCode='auto',
    TargetLanguageCode='en'
)
print(response['SourceLanguageCode']) # fr
print(response['TranslatedText']) # Thank you for your help.
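Text longer than the 10,000-byte TranslateText limit has to be split before it is sent. The helper below is a minimal sketch of one way to do that, not part of the Translate API: the name `split_for_translate` and the sentence-boundary regex are assumptions made for illustration. It measures UTF-8 bytes rather than characters, because the limit is expressed in bytes.

```python
import re

MAX_BYTES = 10_000  # TranslateText input limit quoted in the comparison table


def split_for_translate(text, max_bytes=MAX_BYTES):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes.

    Chunks end after sentence punctuation where possible. A single sentence
    larger than max_bytes is cut at character boundaries instead.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one UTF-8 character")
    # Each match keeps its trailing whitespace, so joining chunks restores the input.
    sentences = re.findall(r".+?(?:[.!?]+\s+|\Z)", text, flags=re.DOTALL)
    chunks = []
    current, current_size = "", 0
    for sentence in sentences:
        size = len(sentence.encode("utf-8"))
        if current_size + size <= max_bytes:
            current += sentence
            current_size += size
            continue
        if current:
            chunks.append(current)
        current, current_size = "", 0
        if size <= max_bytes:
            current, current_size = sentence, size
            continue
        # Oversized sentence: fall back to cutting between characters.
        for ch in sentence:
            ch_size = len(ch.encode("utf-8"))
            if current_size + ch_size > max_bytes:
                chunks.append(current)
                current, current_size = "", 0
            current += ch
            current_size += ch_size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `translate_text` in turn and the results concatenated. Splitting on sentence boundaries matters because the model translates each request independently, so a chunk that ends mid-sentence loses context.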
Custom Terminology - Controlling Brand and Technical Terms
Custom terminology lets you define how specific terms (brand names, product names, technical jargon) are always translated, overriding the default neural translation. This is critical for consistent brand voice and accurate technical documentation.
Terminology files use CSV, TSV, or TMX (Translation Memory eXchange) format. You define source terms and their required translations into one or more target languages.
# terminology.csv format
# en,es,fr,de
# AWS Lambda,AWS Lambda,AWS Lambda,AWS Lambda
# Auto Scaling,Auto Scaling,Auto Scaling,Auto Scaling
# "blue/green deployment","despliegue azul/verde","déploiement bleu/vert","Blau/Grün-Deployment"
# Upload custom terminology
import boto3
translate = boto3.client('translate')
with open('terminology.csv', 'rb') as f:
    translate.import_terminology(
        Name='aws-technical-terms',
        MergeStrategy='OVERWRITE',
        TerminologyData={'File': f.read(), 'Format': 'CSV'}
    )
# Use the terminology in a translation
response = translate.translate_text(
    Text='Set up a blue/green deployment using Auto Scaling.',
    SourceLanguageCode='en',
    TargetLanguageCode='es',
    TerminologyNames=['aws-technical-terms']
)
Custom terminology works for exact string matches only. It does not handle morphological variations (plural forms, verb conjugations). For highly inflected languages, you may need entries for multiple word forms.
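Because matching is on exact strings, one practical approach is to generate the terminology file from a list that spells out each word form as its own row. The sketch below shows that idea. The function name `build_terminology_csv` and the plural entry are illustrative assumptions, not part of the Translate API or of the original terminology file.

```python
import csv
import io


def build_terminology_csv(languages, entries):
    """Return CSV bytes: a header row of language codes, then one row per term.

    languages: list of language codes with the source language first.
    entries: list of dicts mapping every language code to the term text.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(languages)
    for entry in entries:
        writer.writerow([entry[code] for code in languages])
    return buffer.getvalue().encode("utf-8")


# Illustrative entries: singular and plural are separate rows because
# a row for one form does not match the other.
entries = [
    {"en": "blue/green deployment", "es": "despliegue azul/verde"},
    {"en": "blue/green deployments", "es": "despliegues azul/verde"},
]
csv_bytes = build_terminology_csv(["en", "es"], entries)
print(csv_bytes.decode("utf-8"))
```

The resulting bytes can be passed as the `File` value of `TerminologyData` in `import_terminology`, in place of reading `terminology.csv` from disk.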
Parallel Data - Customizing Translation Style
Active Custom Translation (ACT) uses your existing human-translated document pairs (parallel data) to adapt translation output to your specific domain and style. ACT applies to asynchronous batch jobs: you reference the parallel data resource by name when you call StartTextTranslationJob. This is more powerful than custom terminology - it affects style, tone, and domain accuracy across all sentences.
| Feature | Custom Terminology | Parallel Data (ACT) |
|---|---|---|
| What it controls | Specific word or phrase translations | Overall style, tone, and domain adaptation |
| Training required | No - just upload a CSV | Yes - creates a Custom Translation job using parallel data |
| Input format | CSV, TSV, or TMX with term pairs | TMX, CSV, or TSV file with parallel sentence pairs |
| Minimum data | 1 term pair | 1000+ sentence pairs recommended |
| Cost | Free to store, charged at translation time | Storage per GB + training time |
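To make the comparison concrete, the sketch below assembles the request for a batch job that uses both a parallel data resource and a custom terminology. `ParallelDataNames` and `TerminologyNames` are parameters of `StartTextTranslationJob`. The helper name `build_batch_job_request`, the bucket paths, the role ARN, and the parallel data resource name are all hypothetical placeholders.

```python
def build_batch_job_request(job_name, input_s3_uri, output_s3_uri, role_arn,
                            source_lang, target_lang,
                            content_type="text/plain",
                            parallel_data_names=None,
                            terminology_names=None):
    """Build keyword arguments for translate.start_text_translation_job()."""
    request = {
        "JobName": job_name,
        "InputDataConfig": {"S3Uri": input_s3_uri, "ContentType": content_type},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCodes": [target_lang],
    }
    # Optional customizations are added only when provided.
    if parallel_data_names:
        request["ParallelDataNames"] = list(parallel_data_names)
    if terminology_names:
        request["TerminologyNames"] = list(terminology_names)
    return request


# All resource names below are placeholders for illustration.
request = build_batch_job_request(
    job_name="docs-en-es",
    input_s3_uri="s3://example-bucket/input/",
    output_s3_uri="s3://example-bucket/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleTranslateRole",
    source_lang="en",
    target_lang="es",
    parallel_data_names=["example-parallel-data"],
    terminology_names=["aws-technical-terms"],
)
# With a boto3 client: translate.start_text_translation_job(**request)
```

Keeping the request construction separate from the API call makes it easy to run the same job with and without `ParallelDataNames` and compare the output quality.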
Translate Pricing
| Feature | Pricing | Notes |
|---|---|---|
| Real-time translation | $15.00 per million characters | Free tier: 2M characters/month for 12 months |
| Batch translation | $15.00 per million characters | Same rate as real-time |
| Custom terminology | Free to store | Charged only at translation time (same character rate) |
| Active Custom Translation training | $60.00 per million characters of training data | One-time cost per training run |
| Active Custom Translation inference | $60.00 per million characters | 4x premium over standard for customized translations |
Active Custom Translation inference is 4x more expensive than standard. Measure quality improvement before committing to ACT for all traffic - for many use cases, custom terminology alone achieves sufficient consistency.
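A quick calculation helps when weighing that premium. The sketch below uses the per-million-character rates from the pricing table. The monthly volume is a hypothetical figure, and the function ignores the free tier and any training charge.

```python
STANDARD_RATE = 15.00  # USD per million characters, from the pricing table
ACT_RATE = 60.00       # USD per million characters, from the pricing table


def estimate_cost(characters, rate_per_million):
    """Return the translation charge in USD for a given character count."""
    return characters / 1_000_000 * rate_per_million


monthly_chars = 5_000_000  # hypothetical monthly volume
standard_cost = estimate_cost(monthly_chars, STANDARD_RATE)
act_cost = estimate_cost(monthly_chars, ACT_RATE)
print(f"Standard: ${standard_cost:.2f}")  # Standard: $75.00
print(f"ACT:      ${act_cost:.2f}")       # ACT:      $300.00
```

At this volume the difference is $225 per month, which is the amount the quality improvement from ACT needs to justify.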
Interview Focus Points
1. What is the difference between custom terminology and Active Custom Translation in Amazon Translate?
2. How would you build a real-time multilingual customer support chat using Translate and other AWS services?
3. What document formats does batch translation support and how is document formatting preserved?
4. How does automatic language detection work in Translate and which service powers it?
5. When would you choose Amazon Translate vs a third-party translation API like Google Translate?
6. What are the cost implications of using Active Custom Translation at scale?