Ace Cloud Interviews

AWS AI & Machine Learning

Translate

Neural machine translation across 75 languages

Amazon Translate is a fully managed neural machine translation service that translates between 75 languages. Its neural network models are trained on large-scale multilingual datasets, and the service supports real-time translation, batch document translation, and custom terminology. For cloud engineers, Translate is the building block for multilingual applications, document workflows, and customer support systems that need to operate globally.

Real-time Translation vs Batch Document Translation

Translate offers two modes: synchronous (TranslateText) for real-time use cases and asynchronous batch jobs (StartTextTranslationJob) for large document sets.

| Aspect | Real-time (TranslateText) | Batch (StartTextTranslationJob) |
|---|---|---|
| Input | String of up to 10,000 bytes | S3 folder with TXT, HTML, DOCX, PPTX, XLSX, or XLIFF files |
| Output | Synchronous JSON response | Translated files in S3 output folder |
| Max size | 10,000 bytes per request | Up to 20 MB per document (5 GB per job) |
| Latency | Milliseconds | Minutes to hours depending on volume |
| Formatting | Plain text or HTML tags preserved | Document structure (tables, headings) preserved for DOCX/PPTX |
| Use case | Chat, search, API responses | Content localization, document translation at scale |
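
The 10,000-byte cap on TranslateText input means longer text has to be split before it is sent. The following is a minimal sketch of one way to do that, not an official utility: `chunk_text` and `MAX_BYTES` are hypothetical names, and the sentence-splitting regex is a simplifying assumption that will not suit every language.

```python
import re

MAX_BYTES = 10_000  # TranslateText input limit cited in the table above


def chunk_text(text, max_bytes=MAX_BYTES):
    """Split text into chunks whose UTF-8 encoding fits within max_bytes.

    Sentences are kept whole where possible. A single sentence that is
    larger than max_bytes is cut at character boundaries.
    """
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ''
    for sentence in sentences:
        candidate = f'{current} {sentence}'.strip()
        if len(candidate.encode('utf-8')) <= max_bytes:
            current = candidate
            continue
        # The sentence does not fit in the current chunk: flush and restart.
        if current:
            chunks.append(current)
        piece, piece_bytes = '', 0
        for ch in sentence:
            ch_bytes = len(ch.encode('utf-8'))
            if piece and piece_bytes + ch_bytes > max_bytes:
                chunks.append(piece)
                piece, piece_bytes = ch, ch_bytes
            else:
                piece += ch
                piece_bytes += ch_bytes
        current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `translate_text` in turn. Measuring in bytes rather than characters matters because accented and non-Latin characters take more than one byte in UTF-8.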
python
# Real-time translation
import boto3

translate = boto3.client('translate', region_name='us-east-1')

response = translate.translate_text(
    Text='The deployment pipeline failed due to missing environment variables.',
    SourceLanguageCode='en',
    TargetLanguageCode='es'
)
print(response['TranslatedText'])
# "La canalización de implementación falló debido a variables de entorno faltantes."

# Use 'auto' for automatic language detection (integrates with Comprehend)
response = translate.translate_text(
    Text='Merci pour votre aide.',
    SourceLanguageCode='auto',
    TargetLanguageCode='en'
)
print(response['SourceLanguageCode'])  # fr
print(response['TranslatedText'])  # Thank you for your help.
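
The batch mode can be sketched in the same way. This is an illustrative outline rather than a complete workflow: the helper names, bucket URIs, and role ARN are placeholders, and the client is passed in so the functions can be exercised without AWS credentials. The role must allow Translate to read the input prefix and write to the output prefix.

```python
import time

# Job states after which polling can stop
TERMINAL_STATES = {'COMPLETED', 'COMPLETED_WITH_ERROR', 'FAILED', 'STOPPED'}


def start_batch_translation(client, job_name, input_s3_uri, output_s3_uri,
                            role_arn, source_lang, target_langs,
                            content_type='text/plain'):
    """Submit an asynchronous batch job and return its JobId."""
    response = client.start_text_translation_job(
        JobName=job_name,
        InputDataConfig={'S3Uri': input_s3_uri, 'ContentType': content_type},
        OutputDataConfig={'S3Uri': output_s3_uri},
        DataAccessRoleArn=role_arn,
        SourceLanguageCode=source_lang,
        TargetLanguageCodes=list(target_langs),
    )
    return response['JobId']


def wait_for_job(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Poll the job until it reaches a terminal state and return that state."""
    while True:
        props = client.describe_text_translation_job(JobId=job_id)
        status = props['TextTranslationJobProperties']['JobStatus']
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)


# Usage (placeholder names; requires real AWS resources):
# import boto3
# client = boto3.client('translate', region_name='us-east-1')
# job_id = start_batch_translation(
#     client, 'docs-en-es', 's3://my-bucket/input/', 's3://my-bucket/output/',
#     'arn:aws:iam::123456789012:role/TranslateBatchRole', 'en', ['es'])
# print(wait_for_job(client, job_id))
```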

Custom Terminology - Controlling Brand and Technical Terms

Custom terminology lets you define how specific terms (brand names, product names, technical jargon) are always translated, overriding the default neural translation. This is critical for consistent brand voice and accurate technical documentation.

Terminology files are CSV or TMX (Translation Memory Exchange) format. You define source terms and their required translations into one or more target languages.

python
# terminology.csv format
# en,es,fr,de
# AWS Lambda,AWS Lambda,AWS Lambda,AWS Lambda
# Auto Scaling,Auto Scaling,Auto Scaling,Auto Scaling
# "blue/green deployment","despliegue azul/verde","déploiement bleu/vert","Blau/Grün-Deployment"

# Upload custom terminology
import boto3

translate = boto3.client('translate')

with open('terminology.csv', 'rb') as f:
    translate.import_terminology(
        Name='aws-technical-terms',
        MergeStrategy='OVERWRITE',
        TerminologyData={'File': f.read(), 'Format': 'CSV'}
    )

# Use the terminology in a translation
response = translate.translate_text(
    Text='Set up a blue/green deployment using Auto Scaling.',
    SourceLanguageCode='en',
    TargetLanguageCode='es',
    TerminologyNames=['aws-technical-terms']
)
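
To check whether a terminology entry actually took effect, the TranslateText response can be inspected. The sketch below assumes the response carries an `AppliedTerminologies` list with `Terms` entries containing `SourceText` and `TargetText`; `applied_terms` is a hypothetical helper name.

```python
def applied_terms(response):
    """Return (source, target) pairs for terms reported as applied."""
    pairs = []
    for terminology in response.get('AppliedTerminologies', []):
        for term in terminology.get('Terms', []):
            pairs.append((term['SourceText'], term['TargetText']))
    return pairs


# Example with the response from the call above:
# for source, target in applied_terms(response):
#     print(f'{source} -> {target}')
```

An empty result suggests no entry matched the input text, which is worth checking before assuming the terminology is in use.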
💡

Custom terminology works for exact string matches only. It does not handle morphological variations (plural forms, verb conjugations). For highly inflected languages, you may need entries for multiple word forms.
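
Because matching is exact, one practical approach is to generate the terminology file from a list that spells out each word form. This is a minimal sketch: `build_terminology_csv` is a hypothetical helper, and the Spanish plural shown in the usage comment is illustrative rather than a vetted translation.

```python
import csv
import io


def build_terminology_csv(source_lang, target_lang, term_pairs):
    """Build CSV bytes: a language-code header row, then one row per word form."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow([source_lang, target_lang])
    for source_term, target_term in term_pairs:
        writer.writerow([source_term, target_term])
    return buffer.getvalue().encode('utf-8')


# Usage: list singular and plural forms as separate entries, then pass the
# bytes as TerminologyData={'File': data, 'Format': 'CSV'} to import_terminology.
# data = build_terminology_csv('en', 'es', [
#     ('blue/green deployment', 'despliegue azul/verde'),
#     ('blue/green deployments', 'despliegues azul/verde'),
# ])
```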

Parallel Data - Customizing Translation Style

Active Custom Translation (ACT) uses your existing human-translated document pairs (parallel data) to adapt translation output to your specific domain and style. It reaches further than custom terminology: rather than fixing individual terms, it influences style, tone, and domain accuracy across whole sentences.

| Feature | Custom Terminology | Parallel Data (ACT) |
|---|---|---|
| What it controls | Specific word or phrase translations | Overall style, tone, and domain adaptation |
| Training required | No - just upload a CSV | Yes - creates a Custom Translation job using parallel data |
| Input format | CSV or TMX with term pairs | TMX file with parallel sentence pairs |
| Minimum data | 1 term pair | 1000+ sentence pairs recommended |
| Cost | Free to store, charged at translation time | Storage per GB + training time |
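
As a rough sketch of how parallel data is wired into a translation job: the file is registered from S3 with `create_parallel_data`, then referenced by name through `ParallelDataNames` on a batch job. The helper names, S3 URIs, and role ARN are placeholders, and the sketch assumes the parallel data resource has finished processing before the job is submitted.

```python
def register_parallel_data(client, name, s3_uri, data_format='TMX'):
    """Register a parallel data file stored in S3 and return its resource name."""
    response = client.create_parallel_data(
        Name=name,
        ParallelDataConfig={'S3Uri': s3_uri, 'Format': data_format},
    )
    return response['Name']


def act_job_params(job_name, input_s3_uri, output_s3_uri, role_arn,
                   source_lang, target_lang, parallel_data_name):
    """Build StartTextTranslationJob arguments that apply parallel data."""
    return {
        'JobName': job_name,
        'InputDataConfig': {'S3Uri': input_s3_uri, 'ContentType': 'text/plain'},
        'OutputDataConfig': {'S3Uri': output_s3_uri},
        'DataAccessRoleArn': role_arn,
        'SourceLanguageCode': source_lang,
        'TargetLanguageCodes': [target_lang],
        'ParallelDataNames': [parallel_data_name],
    }


# Usage (placeholder names; requires real AWS resources):
# import boto3
# client = boto3.client('translate')
# name = register_parallel_data(client, 'support-docs-en-es',
#                               's3://my-bucket/parallel/en-es.tmx')
# params = act_job_params('act-job', 's3://my-bucket/in/', 's3://my-bucket/out/',
#                         'arn:aws:iam::123456789012:role/TranslateBatchRole',
#                         'en', 'es', name)
# client.start_text_translation_job(**params)
```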

Translate Pricing

| Feature | Pricing | Notes |
|---|---|---|
| Real-time translation | $15.00 per million characters | Free tier: 2M characters/month for 12 months |
| Batch translation | $15.00 per million characters | Same rate as real-time |
| Custom terminology | Free to store | Charged only at translation time (same character rate) |
| Active Custom Translation training | $60.00 per million characters of training data | One-time cost per training run |
| Active Custom Translation inference | $60.00 per million characters | 4x premium over standard for customized translations |
⚠️

Active Custom Translation inference is 4x more expensive than standard. Measure quality improvement before committing to ACT for all traffic - for many use cases, custom terminology alone achieves sufficient consistency.
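
The cost gap is easy to quantify with the rates from the pricing table. The function below is a back-of-the-envelope sketch: the names are hypothetical, and applying the free tier only to standard translation is a simplifying assumption made here, not a statement of AWS billing rules.

```python
STANDARD_RATE = 15.00        # USD per million characters (pricing table)
ACT_RATE = 60.00             # USD per million characters (pricing table)
FREE_TIER_CHARS = 2_000_000  # monthly free tier cited in the pricing table


def estimate_monthly_cost(characters, use_act=False, free_tier=False):
    """Estimate the monthly translation cost in USD for a character volume."""
    billable = characters
    # Assumption: the free tier offsets standard translation only.
    if free_tier and not use_act:
        billable = max(0, characters - FREE_TIER_CHARS)
    rate = ACT_RATE if use_act else STANDARD_RATE
    return billable / 1_000_000 * rate


# 50M characters per month:
# estimate_monthly_cost(50_000_000)                -> 750.0
# estimate_monthly_cost(50_000_000, use_act=True)  -> 3000.0
```

At 50 million characters a month, routing everything through ACT costs an extra 2,250 USD, which is the figure to weigh against any measured quality gain.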

🎯

Interview Focus Points

  1. What is the difference between custom terminology and Active Custom Translation in Amazon Translate?
  2. How would you build a real-time multilingual customer support chat using Translate and other AWS services?
  3. What document formats does batch translation support and how is document formatting preserved?
  4. How does automatic language detection work in Translate and which service powers it?
  5. When would you choose Amazon Translate vs a third-party translation API like Google Translate?
  6. What are the cost implications of using Active Custom Translation at scale?