Ace Cloud Interviews

AWS Analytics & Big Data

MSK

Fully managed Apache Kafka for real-time event streaming pipelines

Amazon MSK (Managed Streaming for Apache Kafka) is a fully managed service that runs Apache Kafka on AWS, so you do not have to install, operate, or patch Kafka brokers or the metadata quorum (ZooKeeper or KRaft controllers) yourself. MSK handles broker provisioning and replacement, security patching, multi-AZ placement, and optional storage auto scaling, while giving you full access to the native Kafka producer/consumer APIs. It is a strong choice when you need Kafka protocol compatibility or very high throughput, or when you are migrating an existing Kafka workload to AWS.

MSK Architecture - Brokers, ZooKeeper, and KRaft

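An MSK cluster consists of Kafka brokers spread evenly across multiple Availability Zones for high availability. MSK sets each broker's rack to its AZ, so Kafka's rack-aware replica assignment places a partition's replicas in different zones. MSK also runs and manages the metadata layer (ZooKeeper or KRaft controllers) for you.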

| Component | MSK Managed? | Your Responsibility |
| --- | --- | --- |
| Kafka broker EC2 instances | Yes - provisioned, patched | Choose instance type and count |
| ZooKeeper / KRaft quorum | Yes - fully managed | None |
| Broker storage (EBS) | Yes - auto-expand available | Set initial size; enable auto-expand |
| Kafka topics and partitions | No | You create, size, and manage topics |
| Producer/consumer clients | No | Your application code |
| MSK Connect (Kafka Connect) | Yes - managed connectors | Configure connector workers |


MSK Serverless is a hands-off mode: there is no broker count or instance type to choose. You create a cluster and start producing. It is billed per cluster-hour and per partition-hour, plus per GB of data written, read, and stored, which suits variable or unpredictable workloads. Provisioned MSK is better for steady high-throughput workloads where you can right-size brokers.
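Right-sizing a provisioned cluster is mostly arithmetic. The sketch below is a rough starting point, not an official sizing method: the function name `brokers_needed` is illustrative, and the per-broker write capacity and target utilization are hypothetical inputs you would take from your own load tests. It counts replication traffic and rounds up to a multiple of the AZ count, since provisioned MSK requires the broker count to be a multiple of the number of Availability Zones.

```python
import math


def brokers_needed(
    ingress_mb_s: float,
    replication_factor: int,
    per_broker_write_mb_s: float,      # hypothetical: measure for your instance type
    target_utilization: float = 0.5,   # assumption: keep headroom for broker failure
    az_count: int = 3,
) -> int:
    """Estimate broker count from write throughput alone.

    Ignores consumer reads, partition limits, and storage, so treat the
    result as a lower bound to refine with load testing.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Every produced byte is written once per replica somewhere in the cluster.
    cluster_write_mb_s = ingress_mb_s * replication_factor
    usable_per_broker = per_broker_write_mb_s * target_utilization
    raw = math.ceil(cluster_write_mb_s / usable_per_broker)
    # Round up to a multiple of the AZ count so brokers spread evenly.
    return max(az_count, math.ceil(raw / az_count) * az_count)


if __name__ == "__main__":
    # Illustrative numbers only, not a benchmark of any instance type.
    print(brokers_needed(500, 3, per_broker_write_mb_s=100))
```

With these made-up inputs, 500 MB/s of producer traffic at replication factor 3 becomes 1,500 MB/s of cluster writes, which is why replication has to be part of any sizing estimate.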

MSK vs Kinesis Data Streams - Detailed Comparison

MSK and Kinesis Data Streams solve similar problems. The choice depends on throughput, existing ecosystem, and operational preference.

| Dimension | MSK (Kafka) | Kinesis Data Streams |
| --- | --- | --- |
| Protocol | Native Kafka API | AWS proprietary API |
| Throughput limit | No fixed cluster limit - add brokers/partitions | 1 MB/s write and 2 MB/s read per shard (provisioned mode) |
| Partition scaling | Add partitions to existing topics anytime (partitions cannot be removed) | Shard split or merge (takes minutes) |
| Consumer groups | Yes - Kafka consumer groups with offset commits | KCL checkpointing in DynamoDB |
| Retention | Configurable per topic (hours to unlimited with tiered storage) | 24h default, up to 365 days |
| Message ordering | Per partition | Per shard |
| Ecosystem | Kafka Connect, Kafka Streams, Flink, Spark | Lambda, Managed Service for Apache Flink (formerly KDA), smaller ecosystem |
| Operational burden | Medium - you manage topics, retention, ACLs | Low - fewer knobs |
| Cost at low volume | Higher - about $0.21/hr per kafka.m5.large broker (varies by region) | Lower - about $0.015 per shard-hour (varies by region) |


Choose MSK when: you need Kafka protocol compatibility, you are migrating from on-premises Kafka, you need Kafka Streams or Kafka Connect, or you need very high throughput (100 MB/s+). Choose Kinesis when: you are building a new AWS-native pipeline, throughput is moderate, and you want minimal operational overhead.
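To make the cost and throughput trade-off concrete, here is a minimal sketch that uses only the per-shard limit and the hourly prices quoted in the comparison table. The function names and the three-broker default are assumptions for illustration, and the result covers base hourly charges only; storage, Kinesis PUT payload units, and data transfer are left out.

```python
import math

# Figures taken from the comparison table above; both prices vary by region.
KINESIS_WRITE_MB_S_PER_SHARD = 1.0
KINESIS_USD_PER_SHARD_HOUR = 0.015
MSK_USD_PER_BROKER_HOUR = 0.21


def kinesis_shards_needed(write_mb_s: float) -> int:
    """Provisioned-mode shards needed to absorb a given write rate."""
    return max(1, math.ceil(write_mb_s / KINESIS_WRITE_MB_S_PER_SHARD))


def hourly_base_cost(write_mb_s: float, msk_brokers: int = 3) -> dict:
    """Compare base hourly charges only.

    msk_brokers is a hypothetical cluster size; a small three-broker
    cluster will not necessarily sustain the higher rates.
    """
    shards = kinesis_shards_needed(write_mb_s)
    return {
        "kinesis_shards": shards,
        "kinesis_usd_per_hour": round(shards * KINESIS_USD_PER_SHARD_HOUR, 3),
        "msk_usd_per_hour": round(msk_brokers * MSK_USD_PER_BROKER_HOUR, 3),
    }


if __name__ == "__main__":
    for rate in (2, 100):
        print(rate, hourly_base_cost(rate))
```

At a couple of MB/s the shard charge is a small fraction of the broker charge, which matches the "cost at low volume" row. At 100 MB/s the shard count reaches 100, and the comparison then depends on how many brokers the workload really needs.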

MSK Security - Encryption, Authentication, and Authorization

MSK supports multiple security layers that can be combined:

| Security Layer | Options | Notes |
| --- | --- | --- |
| Encryption in transit | TLS (enforced or optional) | Enforce TLS on all production clusters |
| Encryption at rest | AWS KMS (AWS managed key or customer managed key) | Always on |
| Client authentication | IAM, SASL/SCRAM, mTLS | IAM preferred for AWS-native clients |
| Authorization | IAM policies or Kafka ACLs | IAM policies for IAM-authenticated clients; Kafka ACLs for SASL/SCRAM and mTLS clients; both allow per-topic control |
| Network access | VPC with security groups | Public access is off by default |

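For IAM-authenticated clients, per-topic access is expressed as an IAM policy over `kafka-cluster:*` actions. The sketch below builds a read-only policy for one consumer group on one topic. The helper name `consumer_policy` and all argument values in the usage example are hypothetical; the action names and ARN layout follow the MSK IAM access control scheme as I understand it, so check them against the current AWS documentation before use.

```python
import json


def consumer_policy(region, account_id, cluster_name, cluster_uuid, topic, group):
    """Build an IAM policy that lets one consumer group read one topic."""
    base = f"arn:aws:kafka:{region}:{account_id}"
    scope = f"{cluster_name}/{cluster_uuid}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Allow the client to connect to the cluster at all.
                "Effect": "Allow",
                "Action": ["kafka-cluster:Connect"],
                "Resource": f"{base}:cluster/{scope}",
            },
            {   # Read access limited to a single topic.
                "Effect": "Allow",
                "Action": ["kafka-cluster:DescribeTopic", "kafka-cluster:ReadData"],
                "Resource": f"{base}:topic/{scope}/{topic}",
            },
            {   # Join the consumer group and commit offsets.
                "Effect": "Allow",
                "Action": ["kafka-cluster:DescribeGroup", "kafka-cluster:AlterGroup"],
                "Resource": f"{base}:group/{scope}/{group}",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    policy = consumer_policy(
        "us-east-1", "111122223333", "demo-cluster", "cluster-uuid",
        topic="user-events", group="analytics",
    )
    print(json.dumps(policy, indent=2))
```

A producer policy would follow the same shape with `kafka-cluster:WriteData` on the topic resource and no group statement.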

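MSK IAM authentication requires the AWS MSK IAM auth library (aws-msk-iam-auth) on the classpath of your producer/consumer clients. Standard Kafka CLI tools work with it only after you add that library and the matching client properties. For ops tools and migration tasks, SASL/SCRAM is often simpler. Many teams use IAM for application clients and SASL/SCRAM for self-managed Kafka Connect workers or third-party tools that cannot easily load the IAM library. (MSK Connect itself authenticates to the cluster with IAM or with no authentication.)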
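The "additional configuration" is a small set of client properties. As a hedged sketch, the hypothetical helper `client_properties` below renders a properties file for either mechanism. The property names and class names are the ones used by the Java Kafka client and the aws-msk-iam-auth library; the username and password in the test values are placeholders.

```python
def client_properties(mechanism: str, username: str = "", password: str = "") -> str:
    """Render Kafka client properties for MSK IAM or SASL/SCRAM authentication."""
    lines = ["security.protocol=SASL_SSL"]
    if mechanism == "iam":
        # Requires the aws-msk-iam-auth JAR on the client's classpath.
        lines += [
            "sasl.mechanism=AWS_MSK_IAM",
            "sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class="
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        ]
    elif mechanism == "scram":
        if not username or not password:
            raise ValueError("SCRAM needs a username and password")
        lines += [
            "sasl.mechanism=SCRAM-SHA-512",
            "sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule "
            f'required username="{username}" password="{password}";',
        ]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(client_properties("iam"))
```

The output can be saved as `client.properties` and passed to the Kafka CLI tools, for example with `kafka-topics.sh --command-config client.properties`. Inside the VPC, IAM listeners use port 9098 and SASL/SCRAM listeners use port 9096.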

MSK Connect - Managed Kafka Connect Workers

MSK Connect is a managed Kafka Connect runtime. Instead of running Kafka Connect workers on EC2 yourself, you deploy connectors to MSK Connect, which runs the workers for you and can auto-scale them.

| Connector Type | Use Case | Example |
| --- | --- | --- |
| Source connector | Pull data from external systems into Kafka | Debezium CDC from RDS, S3 source connector |
| Sink connector | Push data from Kafka to a destination | S3 Sink (write to data lake), OpenSearch Sink |

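```bash
# Create an MSK Connect connector (S3 Sink).
# The file:// arguments are placeholder JSON files you supply:
#   kafka-cluster.json: {"apacheKafkaCluster": {"bootstrapServers": "...",
#                        "vpc": {"subnets": [...], "securityGroups": [...]}}}
#   plugins.json:       [{"customPlugin": {"customPluginArn": "...", "revision": 1}}]
#   capacity.json:      {"autoScaling": {...}}
# SERVICE_ROLE_ARN and the Kafka Connect version are assumptions; set them for your account.
# ParquetFormat needs records that carry a schema (for example Avro).
aws kafkaconnect create-connector \
  --connector-name "s3-sink" \
  --kafka-connect-version "2.7.1" \
  --kafka-cluster file://kafka-cluster.json \
  --kafka-cluster-client-authentication authenticationType=IAM \
  --kafka-cluster-encryption-in-transit encryptionType=TLS \
  --plugins file://plugins.json \
  --service-execution-role-arn "$SERVICE_ROLE_ARN" \
  --capacity file://capacity.json \
  --connector-configuration \
    "connector.class=io.confluent.connect.s3.S3SinkConnector,\
tasks.max=4,\
topics=user-events,\
s3.region=us-east-1,\
s3.bucket.name=my-data-lake,\
flush.size=1000,\
storage.class=io.confluent.connect.s3.storage.S3Storage,\
format.class=io.confluent.connect.s3.format.parquet.ParquetFormat"
```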

Debezium running on MSK Connect is one of the most popular patterns for Change Data Capture (CDC) - it reads database transaction logs (PostgreSQL WAL, MySQL binlog) and produces Kafka events for every row change. This powers real-time data lake synchronization without polling.
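To show what a consumer does with those row-change events, here is a minimal sketch that replays Debezium-style events into an in-memory copy of a table. It assumes the events have already been deserialized from JSON and follow the standard Debezium envelope (`op`, `before`, `after`). The function name `apply_change`, the `id` key field, and the sample rows are illustrative.

```python
def apply_change(table: dict, event, key_field: str = "id") -> None:
    """Apply one Debezium change event to an in-memory copy of a table."""
    if event is None:
        return  # tombstone record that follows a delete; nothing to apply
    # With schemas enabled the envelope sits under "payload"; otherwise it is top-level.
    envelope = event.get("payload", event)
    op = envelope["op"]  # c=create, u=update, d=delete, r=snapshot read
    if op in ("c", "u", "r"):
        row = envelope["after"]
        table[row[key_field]] = row
    elif op == "d":
        table.pop(envelope["before"][key_field], None)
    else:
        raise ValueError(f"unsupported op: {op}")


if __name__ == "__main__":
    users = {}
    events = [
        {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
        {"op": "u", "before": {"id": 1, "email": "a@example.com"},
         "after": {"id": 1, "email": "b@example.com"}},
        {"op": "d", "before": {"id": 1, "email": "b@example.com"}, "after": None},
    ]
    for event in events:
        apply_change(users, event)
        print(event["op"], users)
```

A real sink (the S3 or OpenSearch connectors, for instance) does the same thing at scale: inserts and updates become upserts keyed by primary key, and deletes remove the row.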

MSK Tiered Storage and Cost Optimization

MSK Tiered Storage automatically offloads older log segments to S3, reducing broker EBS storage costs significantly while maintaining consumer access to historical data.

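| Storage Tier | Location | Cost | Latency |
| --- | --- | --- | --- |
| Local (hot) | Broker EBS volumes | $0.10-0.16/GB-month | Lowest (often served from page cache) |
| Tiered | MSK-managed low-cost tier | About $0.06/GB-month plus a per-GB retrieval fee (MSK's own rate, not the raw S3 price; varies by region) | Milliseconds (on first read) |
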

Enable tiered storage when topic retention is longer than a few days. Broker EBS cost can dominate MSK spend at high retention because every replica is stored on EBS. For topics with weeks or months of retention, tiered storage can cut storage costs sharply, potentially by 80% or more depending on replication factor, local retention, and regional pricing. Consumers do not need to change - they use the same offset-based API.
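The saving can be estimated with a few lines of arithmetic. The sketch below is a simplified model, not a pricing calculator. It assumes steady ingest, no compression, that local storage is paid once per replica, and that the tiered copy holds the full retention window and is billed once per logical GB. It ignores retrieval fees. The function name and every rate passed in are placeholders; use current regional prices.

```python
from typing import Optional


def monthly_storage_cost(
    ingress_mb_s: float,
    retention_days: float,
    replication_factor: int,
    local_usd_per_gb_month: float,
    tiered_usd_per_gb_month: float = 0.0,
    local_retention_days: Optional[float] = None,
) -> float:
    """Approximate steady-state monthly storage cost for one workload.

    Leave local_retention_days as None to model a cluster without
    tiered storage (everything stays on broker EBS).
    """
    daily_gb = ingress_mb_s * 86_400 / 1_000  # MB per day -> GB per day
    if local_retention_days is None:
        return daily_gb * retention_days * replication_factor * local_usd_per_gb_month
    local_days = min(local_retention_days, retention_days)
    local = daily_gb * local_days * replication_factor * local_usd_per_gb_month
    # Assumption: the tiered copy covers the whole retention window, billed once.
    tiered = daily_gb * retention_days * tiered_usd_per_gb_month
    return local + tiered


if __name__ == "__main__":
    # Placeholder rates for illustration only.
    ebs_only = monthly_storage_cost(1, 30, 3, local_usd_per_gb_month=0.10)
    with_tier = monthly_storage_cost(1, 30, 3, 0.10,
                                     tiered_usd_per_gb_month=0.05,
                                     local_retention_days=2)
    print(f"EBS only: ${ebs_only:.2f}  tiered: ${with_tier:.2f}")
```

The model makes the lever visible: the shorter the local retention window relative to total retention, the larger the share of data that is paid for at the tiered rate instead of once per replica on EBS.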

Interview Focus Points

  1. When would you choose MSK over Kinesis Data Streams for a new event streaming pipeline on AWS?
  2. Explain MSK Serverless - how does it differ from provisioned MSK and what workloads suit it?
  3. What is Change Data Capture (CDC) and how would you implement it using MSK and Debezium?
  4. How does MSK handle high availability across Availability Zones?
  5. Compare IAM authentication vs SASL/SCRAM for MSK client authentication - when do you use each?
  6. What is MSK Tiered Storage and how does it affect producer/consumer behavior?
  7. How do Kafka ACLs work in MSK and how do you grant per-topic access to specific consumers?
  8. Walk me through sizing an MSK cluster for a workload that produces 500 MB/s peak throughput.