AWS AI & Machine Learning
Bedrock
Access foundation models from Anthropic, Meta, Mistral, and others via a single API
Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies - including Anthropic, Meta, Mistral, Cohere, and Amazon - through a single unified API. It lets you build generative AI applications without managing any model infrastructure, and includes features like knowledge bases, agents, guardrails, and model evaluation. For cloud engineers, Bedrock is the fastest path to production-grade generative AI on AWS.
Foundation Models Available on Bedrock
Bedrock provides access to models from multiple providers without requiring separate API keys or accounts for each. All models are accessed through the same AWS SDK and IAM authentication.
| Provider | Model Family | Strengths | Common Use Cases |
|---|---|---|---|
| Anthropic | Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 | Reasoning, coding, long context (200k tokens) | Analysis, summarization, code generation, agents |
| Amazon | Titan Text, Titan Embeddings, Titan Image | Cost-effective, AWS-native RAG | Embeddings for RAG, basic text generation |
| Meta | Llama 3 (8B, 70B) | Open-weight, flexible fine-tuning | Custom fine-tuned models, cost-sensitive workloads |
| Mistral | Mistral 7B, Mixtral 8x7B, Mistral Large | Efficient, multilingual | European compliance, multilingual applications |
| Cohere | Command R, Embed | RAG-optimized, enterprise search | RAG pipelines, semantic search |
| Stability AI | Stable Diffusion XL | Image generation | Visual content creation |
Model availability varies by AWS region. us-east-1 has the broadest coverage. Always check regional availability before choosing a model for a production application.
Invocation Modes - On-Demand, Batch, and Provisioned Throughput
Bedrock offers three ways to invoke models depending on your latency, throughput, and cost requirements.
| Mode | How It Works | Cost | Best For |
|---|---|---|---|
| On-Demand | Pay per input/output token, no commitment | Highest per-token rate | Variable or low traffic, development |
| Provisioned Throughput | Reserve model units (MUs) for 1 month or 6 months | Lower per-token rate, fixed monthly cost | Consistent high-volume production traffic |
| Batch Inference | Submit a JSONL file, get results async (up to 24h) | Up to 50% cheaper than on-demand | Large-scale offline processing, evals |
# Invoke a model using boto3 (on-demand)
import boto3, json
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
response = bedrock.invoke_model(
modelId='anthropic.claude-3-sonnet-20240229-v1:0',
body=json.dumps({
'anthropic_version': 'bedrock-2023-05-31',
'max_tokens': 1024,
'messages': [{'role': 'user', 'content': 'Explain VPC peering in 2 sentences.'}]
}),
contentType='application/json'
)
result = json.loads(response['body'].read())
print(result['content'][0]['text'])
Provisioned Throughput is billed for the full commitment period even if you do not use it. Only commit when you have measured consistent baseline usage that justifies the reserved capacity.
Knowledge Bases - Managed RAG on Bedrock
Bedrock Knowledge Bases automates the full retrieval-augmented generation (RAG) pipeline: document ingestion, chunking, embedding, vector storage, and retrieval. You connect a data source (S3, Confluence, SharePoint, web crawl), choose a vector store, and Bedrock handles the rest.
| Component | Options Available |
|---|---|
| Data Sources | S3, Confluence, SharePoint, Salesforce, Web Crawler |
| Embedding Models | Titan Embeddings V2, Cohere Embed |
| Vector Stores | OpenSearch Serverless, Aurora PostgreSQL (pgvector), Pinecone, Redis Enterprise, MongoDB Atlas |
| Chunking Strategies | Fixed-size, Semantic, Hierarchical (parent-child), Custom Lambda chunker |
| Retrieval Strategies | Semantic search, Hybrid search (semantic + keyword) |
The RetrieveAndGenerate API combines retrieval and generation in one call with automatic source attribution. The Retrieve API gives you raw retrieved chunks if you want to handle generation yourself.
Hierarchical chunking stores both large parent chunks and small child chunks. Retrieval uses child chunks for precision, but returns parent chunks for richer context. This significantly improves answer quality for long documents.
Bedrock Agents - Autonomous Multi-Step Task Execution
Bedrock Agents allow a foundation model to autonomously break down tasks, call APIs (via Action Groups backed by Lambda functions), query Knowledge Bases, and iterate until a goal is achieved - without you writing the orchestration logic.
An agent consists of: a foundation model, an instruction prompt, optional Action Groups (API schemas + Lambda), optional Knowledge Bases, and optional Guardrails. The agent uses ReAct-style reasoning to decide which tool to call next.
| Component | Purpose | Implementation |
|---|---|---|
| Action Group | Tools the agent can call (APIs, functions) | OpenAPI schema + Lambda function |
| Knowledge Base | Information retrieval for grounding | Bedrock Knowledge Base (managed RAG) |
| Guardrails | Safety filtering and topic blocking | Bedrock Guardrails resource |
| Memory | Retain context across sessions | Session attributes or external DynamoDB |
| Code Interpreter | Execute Python code dynamically | Built-in sandbox, no Lambda needed |
Agents invoke Lambda functions with the model's chosen parameters. Always validate and sanitize agent-provided inputs in your Lambda code - do not trust them as safe inputs to downstream systems.
Guardrails, Security, and Compliance
Bedrock Guardrails lets you define content policies that are applied on every model invocation - both input (user prompt) and output (model response). This is essential for production applications.
| Guardrail Feature | What It Does |
|---|---|
| Topic Denial | Block responses on specific topics (e.g. competitor comparisons, financial advice) |
| Content Filters | Filter hate speech, violence, sexual content, and prompt injection attacks |
| PII Redaction | Detect and redact/block 30+ PII types (SSN, credit cards, email, phone) |
| Grounding Check | Verify responses are grounded in provided context (reduces hallucination) |
| Word Filters | Block custom profanity or brand-restricted terms |
All Bedrock API calls are logged to CloudTrail. Model invocation logging (input/output payload) can be sent to S3 or CloudWatch Logs for compliance and audit. Bedrock is HIPAA eligible and supports BAAs.
Bedrock does not use your prompts or completions to train AWS or third-party models. Data is encrypted in transit and at rest. Model providers cannot access your data.
Interview Focus Points
- 1What is the difference between on-demand, provisioned throughput, and batch inference in Bedrock? When would you use each?
- 2How does Bedrock Knowledge Bases work end-to-end? What happens when you ingest a document?
- 3Explain the RAG pattern - what problem does it solve and how does Bedrock implement it?
- 4How do Bedrock Agents decide which tool to call? What is the ReAct reasoning pattern?
- 5What are Bedrock Guardrails and why would you use them instead of prompt engineering alone?
- 6How does Bedrock handle data privacy? Can Amazon or model providers see your prompts?
- 7Compare Bedrock to using OpenAI directly - what are the advantages for an AWS-native team?
- 8How would you set up model invocation logging for compliance in a regulated industry?
- 9What is the difference between semantic chunking and hierarchical chunking in Knowledge Bases?