Ace Cloud Interviews
🤖

AWS AI & Machine Learning

Bedrock

Access foundation models from Anthropic, Meta, Mistral, and others via a single API

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies - including Anthropic, Meta, Mistral, Cohere, and Amazon - through a single unified API. It lets you build generative AI applications without managing any model infrastructure, and includes features like knowledge bases, agents, guardrails, and model evaluation. For cloud engineers, Bedrock is the fastest path to production-grade generative AI on AWS.

Foundation Models Available on Bedrock

Bedrock provides access to models from multiple providers without requiring separate API keys or accounts for each. All models are accessed through the same AWS SDK and IAM authentication.

ProviderModel FamilyStrengthsCommon Use Cases
AnthropicClaude 3 (Haiku, Sonnet, Opus), Claude 3.5Reasoning, coding, long context (200k tokens)Analysis, summarization, code generation, agents
AmazonTitan Text, Titan Embeddings, Titan ImageCost-effective, AWS-native RAGEmbeddings for RAG, basic text generation
MetaLlama 3 (8B, 70B)Open-weight, flexible fine-tuningCustom fine-tuned models, cost-sensitive workloads
MistralMistral 7B, Mixtral 8x7B, Mistral LargeEfficient, multilingualEuropean compliance, multilingual applications
CohereCommand R, EmbedRAG-optimized, enterprise searchRAG pipelines, semantic search
Stability AIStable Diffusion XLImage generationVisual content creation
💡

Model availability varies by AWS region. us-east-1 has the broadest coverage. Always check regional availability before choosing a model for a production application.

Invocation Modes - On-Demand, Batch, and Provisioned Throughput

Bedrock offers three ways to invoke models depending on your latency, throughput, and cost requirements.

ModeHow It WorksCostBest For
On-DemandPay per input/output token, no commitmentHighest per-token rateVariable or low traffic, development
Provisioned ThroughputReserve model units (MUs) for 1 month or 6 monthsLower per-token rate, fixed monthly costConsistent high-volume production traffic
Batch InferenceSubmit a JSONL file, get results async (up to 24h)Up to 50% cheaper than on-demandLarge-scale offline processing, evals
bash
# Invoke a model using boto3 (on-demand)
import boto3, json

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 1024,
        'messages': [{'role': 'user', 'content': 'Explain VPC peering in 2 sentences.'}]
    }),
    contentType='application/json'
)
result = json.loads(response['body'].read())
print(result['content'][0]['text'])
⚠️

Provisioned Throughput is billed for the full commitment period even if you do not use it. Only commit when you have measured consistent baseline usage that justifies the reserved capacity.

Knowledge Bases - Managed RAG on Bedrock

Bedrock Knowledge Bases automates the full retrieval-augmented generation (RAG) pipeline: document ingestion, chunking, embedding, vector storage, and retrieval. You connect a data source (S3, Confluence, SharePoint, web crawl), choose a vector store, and Bedrock handles the rest.

ComponentOptions Available
Data SourcesS3, Confluence, SharePoint, Salesforce, Web Crawler
Embedding ModelsTitan Embeddings V2, Cohere Embed
Vector StoresOpenSearch Serverless, Aurora PostgreSQL (pgvector), Pinecone, Redis Enterprise, MongoDB Atlas
Chunking StrategiesFixed-size, Semantic, Hierarchical (parent-child), Custom Lambda chunker
Retrieval StrategiesSemantic search, Hybrid search (semantic + keyword)

The RetrieveAndGenerate API combines retrieval and generation in one call with automatic source attribution. The Retrieve API gives you raw retrieved chunks if you want to handle generation yourself.

💡

Hierarchical chunking stores both large parent chunks and small child chunks. Retrieval uses child chunks for precision, but returns parent chunks for richer context. This significantly improves answer quality for long documents.

Bedrock Agents - Autonomous Multi-Step Task Execution

Bedrock Agents allow a foundation model to autonomously break down tasks, call APIs (via Action Groups backed by Lambda functions), query Knowledge Bases, and iterate until a goal is achieved - without you writing the orchestration logic.

An agent consists of: a foundation model, an instruction prompt, optional Action Groups (API schemas + Lambda), optional Knowledge Bases, and optional Guardrails. The agent uses ReAct-style reasoning to decide which tool to call next.

ComponentPurposeImplementation
Action GroupTools the agent can call (APIs, functions)OpenAPI schema + Lambda function
Knowledge BaseInformation retrieval for groundingBedrock Knowledge Base (managed RAG)
GuardrailsSafety filtering and topic blockingBedrock Guardrails resource
MemoryRetain context across sessionsSession attributes or external DynamoDB
Code InterpreterExecute Python code dynamicallyBuilt-in sandbox, no Lambda needed
⚠️

Agents invoke Lambda functions with the model's chosen parameters. Always validate and sanitize agent-provided inputs in your Lambda code - do not trust them as safe inputs to downstream systems.

Guardrails, Security, and Compliance

Bedrock Guardrails lets you define content policies that are applied on every model invocation - both input (user prompt) and output (model response). This is essential for production applications.

Guardrail FeatureWhat It Does
Topic DenialBlock responses on specific topics (e.g. competitor comparisons, financial advice)
Content FiltersFilter hate speech, violence, sexual content, and prompt injection attacks
PII RedactionDetect and redact/block 30+ PII types (SSN, credit cards, email, phone)
Grounding CheckVerify responses are grounded in provided context (reduces hallucination)
Word FiltersBlock custom profanity or brand-restricted terms

All Bedrock API calls are logged to CloudTrail. Model invocation logging (input/output payload) can be sent to S3 or CloudWatch Logs for compliance and audit. Bedrock is HIPAA eligible and supports BAAs.

💡

Bedrock does not use your prompts or completions to train AWS or third-party models. Data is encrypted in transit and at rest. Model providers cannot access your data.

🎯

Interview Focus Points

  • 1What is the difference between on-demand, provisioned throughput, and batch inference in Bedrock? When would you use each?
  • 2How does Bedrock Knowledge Bases work end-to-end? What happens when you ingest a document?
  • 3Explain the RAG pattern - what problem does it solve and how does Bedrock implement it?
  • 4How do Bedrock Agents decide which tool to call? What is the ReAct reasoning pattern?
  • 5What are Bedrock Guardrails and why would you use them instead of prompt engineering alone?
  • 6How does Bedrock handle data privacy? Can Amazon or model providers see your prompts?
  • 7Compare Bedrock to using OpenAI directly - what are the advantages for an AWS-native team?
  • 8How would you set up model invocation logging for compliance in a regulated industry?
  • 9What is the difference between semantic chunking and hierarchical chunking in Knowledge Bases?