Bedrock

Access foundation models from Anthropic, Meta, Mistral, and others via a single API

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies - including Anthropic, Meta, Mistral, Cohere, and Amazon - through a single unified API. It lets you build generative AI applications without managing any model infrastructure, and includes features like knowledge bases, agents, guardrails, and model evaluation. For cloud engineers, Bedrock is the fastest path to production-grade generative AI on AWS.

Foundation Models Available on Bedrock

Bedrock provides access to models from multiple providers without requiring separate API keys or accounts for each. All models are accessed through the same AWS SDK and IAM authentication.

Provider	Model Family	Strengths	Common Use Cases
Anthropic	Claude 3 (Haiku, Sonnet, Opus), Claude 3.5	Reasoning, coding, long context (200k tokens)	Analysis, summarization, code generation, agents
Amazon	Titan Text, Titan Embeddings, Titan Image	Cost-effective, AWS-native RAG	Embeddings for RAG, basic text generation
Meta	Llama 3 (8B, 70B)	Open-weight, flexible fine-tuning	Custom fine-tuned models, cost-sensitive workloads
Mistral	Mistral 7B, Mixtral 8x7B, Mistral Large	Efficient, multilingual	European compliance, multilingual applications
Cohere	Command R, Embed	RAG-optimized, enterprise search	RAG pipelines, semantic search
Stability AI	Stable Diffusion XL	Image generation	Visual content creation

💡

Model availability varies by AWS region. us-east-1 has the broadest coverage. Always check regional availability before choosing a model for a production application.

Invocation Modes - On-Demand, Batch, and Provisioned Throughput

Bedrock offers three ways to invoke models depending on your latency, throughput, and cost requirements.

Mode	How It Works	Cost	Best For
On-Demand	Pay per input/output token, no commitment	Highest per-token rate	Variable or low traffic, development
Provisioned Throughput	Reserve model units (MUs) for 1 month or 6 months	Lower per-token rate, fixed monthly cost	Consistent high-volume production traffic
Batch Inference	Submit a JSONL file, get results async (up to 24h)	Up to 50% cheaper than on-demand	Large-scale offline processing, evals

bash

# Invoke a model using boto3 (on-demand)
import boto3, json

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 1024,
        'messages': [{'role': 'user', 'content': 'Explain VPC peering in 2 sentences.'}]
    }),
    contentType='application/json'
)
result = json.loads(response['body'].read())
print(result['content'][0]['text'])

⚠️

Provisioned Throughput is billed for the full commitment period even if you do not use it. Only commit when you have measured consistent baseline usage that justifies the reserved capacity.

Knowledge Bases - Managed RAG on Bedrock

Bedrock Knowledge Bases automates the full retrieval-augmented generation (RAG) pipeline: document ingestion, chunking, embedding, vector storage, and retrieval. You connect a data source (S3, Confluence, SharePoint, web crawl), choose a vector store, and Bedrock handles the rest.

Component	Options Available
Data Sources	S3, Confluence, SharePoint, Salesforce, Web Crawler
Embedding Models	Titan Embeddings V2, Cohere Embed
Vector Stores	OpenSearch Serverless, Aurora PostgreSQL (pgvector), Pinecone, Redis Enterprise, MongoDB Atlas
Chunking Strategies	Fixed-size, Semantic, Hierarchical (parent-child), Custom Lambda chunker
Retrieval Strategies	Semantic search, Hybrid search (semantic + keyword)

The RetrieveAndGenerate API combines retrieval and generation in one call with automatic source attribution. The Retrieve API gives you raw retrieved chunks if you want to handle generation yourself.

💡

Hierarchical chunking stores both large parent chunks and small child chunks. Retrieval uses child chunks for precision, but returns parent chunks for richer context. This significantly improves answer quality for long documents.

Bedrock Agents - Autonomous Multi-Step Task Execution

Bedrock Agents allow a foundation model to autonomously break down tasks, call APIs (via Action Groups backed by Lambda functions), query Knowledge Bases, and iterate until a goal is achieved - without you writing the orchestration logic.

An agent consists of: a foundation model, an instruction prompt, optional Action Groups (API schemas + Lambda), optional Knowledge Bases, and optional Guardrails. The agent uses ReAct-style reasoning to decide which tool to call next.

Component	Purpose	Implementation
Action Group	Tools the agent can call (APIs, functions)	OpenAPI schema + Lambda function
Knowledge Base	Information retrieval for grounding	Bedrock Knowledge Base (managed RAG)
Guardrails	Safety filtering and topic blocking	Bedrock Guardrails resource
Memory	Retain context across sessions	Session attributes or external DynamoDB
Code Interpreter	Execute Python code dynamically	Built-in sandbox, no Lambda needed

⚠️

Agents invoke Lambda functions with the model's chosen parameters. Always validate and sanitize agent-provided inputs in your Lambda code - do not trust them as safe inputs to downstream systems.

Guardrails, Security, and Compliance

Bedrock Guardrails lets you define content policies that are applied on every model invocation - both input (user prompt) and output (model response). This is essential for production applications.

Guardrail Feature	What It Does
Topic Denial	Block responses on specific topics (e.g. competitor comparisons, financial advice)
Content Filters	Filter hate speech, violence, sexual content, and prompt injection attacks
PII Redaction	Detect and redact/block 30+ PII types (SSN, credit cards, email, phone)
Grounding Check	Verify responses are grounded in provided context (reduces hallucination)
Word Filters	Block custom profanity or brand-restricted terms

All Bedrock API calls are logged to CloudTrail. Model invocation logging (input/output payload) can be sent to S3 or CloudWatch Logs for compliance and audit. Bedrock is HIPAA eligible and supports BAAs.

💡

Bedrock does not use your prompts or completions to train AWS or third-party models. Data is encrypted in transit and at rest. Model providers cannot access your data.

🎯

Interview Focus Points

1What is the difference between on-demand, provisioned throughput, and batch inference in Bedrock? When would you use each?
2How does Bedrock Knowledge Bases work end-to-end? What happens when you ingest a document?
3Explain the RAG pattern - what problem does it solve and how does Bedrock implement it?
4How do Bedrock Agents decide which tool to call? What is the ReAct reasoning pattern?
5What are Bedrock Guardrails and why would you use them instead of prompt engineering alone?
6How does Bedrock handle data privacy? Can Amazon or model providers see your prompts?
7Compare Bedrock to using OpenAI directly - what are the advantages for an AWS-native team?
8How would you set up model invocation logging for compliance in a regulated industry?
9What is the difference between semantic chunking and hierarchical chunking in Knowledge Bases?