Ace Cloud Interviews

AWS Security & Identity

Macie

Discover and protect sensitive data in S3 using machine learning classifiers

Amazon Macie is a data security service that uses machine learning and pattern matching to automatically discover and classify sensitive data stored in Amazon S3. It identifies personally identifiable information (PII), financial data, credentials, and other sensitive content, and it generates findings when it detects sensitive data or a bucket configuration that weakens the security or privacy of your data. Macie is widely used to support data governance and GDPR compliance work and to detect accidental S3 data exposure.

How Macie Discovers and Classifies Sensitive Data

Macie uses a combination of managed data identifiers (AWS-maintained classifiers) and custom data identifiers (your own regex patterns) to scan S3 object content. It also continuously monitors S3 bucket configurations for security posture issues.

| Detection type | Examples |
|---|---|
| Managed data identifiers - credentials | AWS secret access keys; OpenSSH, PGP, PKCS, and PuTTY private keys; HTTP Basic Authorization headers; JSON Web Tokens |
| Managed data identifiers - financial | Credit card numbers, expiration dates, and verification codes (relevant to PCI DSS); bank account numbers, including IBANs |
| Managed data identifiers - PII | Social Security numbers, passport numbers, driver's license numbers, names, email addresses, phone numbers |
| Managed data identifiers - health | DEA registration numbers, NHS numbers, health insurance IDs |
| Custom data identifiers | Your own regex plus optional keyword patterns (e.g., internal employee IDs, proprietary codes) |
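
To make the custom data identifier row concrete, here is a minimal sketch that checks a pattern locally before registering it. The `EMP-123456` ID format, the keyword list, and the function names are hypothetical. `create-custom-data-identifier` is a real Macie CLI command, but confirm the parameters against the CLI reference before relying on them. Macie supports only a subset of regex syntax, so a local `grep -E` check is a first approximation, not a guarantee.

```bash
#!/usr/bin/env bash
# Hypothetical pattern for internal employee IDs such as EMP-123456.
EMPLOYEE_ID_REGEX='EMP-[0-9]{6}'

# Print every substring of the given text that matches the pattern, one per line.
# grep -E only approximates Macie's own regex engine.
find_employee_ids() {
  printf '%s\n' "$1" | grep -Eo "$EMPLOYEE_ID_REGEX" || true
}

# Register the pattern with Macie. Defined but not called here because it
# needs AWS credentials. A match counts only when one of the keywords appears
# within --maximum-match-distance characters of it.
register_employee_id_identifier() {
  aws macie2 create-custom-data-identifier \
    --name "internal-employee-id" \
    --regex "$EMPLOYEE_ID_REGEX" \
    --keywords "employee" "emp id" "staff" \
    --maximum-match-distance 50
}

# Local check against sample text: prints EMP-123456 and EMP-000042.
find_employee_ids "employee EMP-123456 approved the request from EMP-000042"
```

Keywords reduce false positives: a bare six-digit pattern matches many unrelated strings, while the same pattern near the word "employee" is far more likely to be a real ID.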
💡

Macie has two modes of operation: automated sensitive data discovery (continuous; samples representative objects from your S3 buckets) and sensitive data discovery jobs (one-time or scheduled; scan the buckets you specify fully or partially). Automated discovery costs less but is less thorough; use jobs when you need definitive coverage for compliance audits.
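
The two modes map to different CLI calls. The sketch below is illustrative only: the account ID, bucket name, job name, and function names are placeholders, and the functions that call AWS are defined but not invoked.

```bash
#!/usr/bin/env bash
# Build the --s3-job-definition JSON for one bucket in one account.
s3_job_definition() {
  printf '{"bucketDefinitions":[{"accountId":"%s","buckets":["%s"]}]}' "$1" "$2"
}

# Mode 1: turn on automated sensitive data discovery (continuous sampling).
enable_automated_discovery() {
  aws macie2 update-automated-discovery-configuration --status ENABLED
}

# Mode 2: a weekly scheduled job with no sampling (100 percent of eligible
# objects in scope). --initial-run also analyzes objects that already exist
# when the job is created; later runs pick up new or changed objects.
create_weekly_full_scan() {
  aws macie2 create-classification-job \
    --name "weekly-full-scan-$2" \
    --job-type SCHEDULED \
    --schedule-frequency '{"weeklySchedule":{"dayOfWeek":"SUNDAY"}}' \
    --sampling-percentage 100 \
    --initial-run \
    --s3-job-definition "$(s3_job_definition "$1" "$2")"
}

# Show the JSON that the job would receive for a placeholder bucket.
s3_job_definition 123456789012 my-data-bucket
```

For an audit, calling `create_weekly_full_scan 123456789012 my-data-bucket` would give repeatable, full coverage of that one bucket, while automated discovery keeps a lower-cost watch over everything else.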

S3 Bucket Security Posture Assessment

Macie maintains a continuous inventory of all S3 buckets in your account and assesses their security posture independently of sensitive data scanning.

| Posture check | What Macie flags |
|---|---|
| Public access | Buckets accessible by any anonymous user (ACL or bucket policy) |
| Shared externally | Buckets accessible by principals outside your AWS account or organization |
| Encryption disabled | Buckets or objects not encrypted at rest |
| Replication | Buckets configured to replicate objects to other buckets, in particular to accounts outside your organization |
| Versioning disabled | Buckets without versioning (reported as inventory metadata rather than as a finding) |
```bash
# List buckets that Macie reports as publicly accessible
aws macie2 describe-buckets \
  --criteria '{"publicAccess.effectivePermission": {"eq": ["PUBLIC"]}}' \
  --query 'buckets[].{Name:bucketName,Region:region,Public:publicAccess.effectivePermission}'

# List sensitive data findings (category CLASSIFICATION; policy findings use POLICY)
aws macie2 list-findings \
  --finding-criteria '{"criterion": {"category": {"eq": ["CLASSIFICATION"]}}}'

# Create a one-time sensitive data discovery job for a specific bucket
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --s3-job-definition '{"bucketDefinitions": [{"accountId": "123456789012", "buckets": ["my-data-bucket"]}]}' \
  --name audit-my-data-bucket
```

Multi-Account Macie with Organizations

Like other security services, Macie integrates with AWS Organizations to provide centralized management. A delegated administrator sees findings from all member accounts.

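| Feature | How it works in multi-account |
|---|---|
| Finding aggregation | Policy findings for member accounts' buckets, plus sensitive data findings from jobs and automated discovery that the administrator runs, appear in the administrator account's Macie console |
| Bucket inventory | Administrator sees all S3 buckets across the organization |
| Auto-enable | New accounts added to the organization can be auto-enabled |
| Custom data identifiers | The administrator can use its own custom data identifiers in jobs that scan member accounts' buckets |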
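
A rough sketch of the setup steps behind this table follows. The account ID and function names are placeholders, and the functions that call AWS are defined but not invoked. The first step runs in the Organizations management account; the others run in the delegated administrator account.

```bash
#!/usr/bin/env bash
# Succeed only for a 12-digit AWS account ID.
is_account_id() {
  printf '%s' "$1" | grep -Eq '^[0-9]{12}$'
}

# Step 1 (management account): designate the delegated Macie administrator.
designate_macie_admin() {
  is_account_id "$1" || { echo "invalid account ID: $1" >&2; return 1; }
  aws macie2 enable-organization-admin-account --admin-account-id "$1"
}

# Step 2 (delegated administrator): enable Macie automatically for accounts
# that join the organization later.
auto_enable_new_members() {
  aws macie2 update-organization-configuration --auto-enable
}

# Step 3 (delegated administrator): check which members are enabled.
list_member_status() {
  aws macie2 list-members \
    --query 'members[].{Account:accountId,Status:relationshipStatus}'
}
```

Validating the account ID before the API call is a small convenience so a typo fails fast locally instead of producing an AWS error.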
⚠️

Macie charges mainly for the amount of data it inspects. Automated discovery only samples objects, but a full classification job on a bucket holding terabytes of data can be expensive. Always estimate costs before running full scans on production data lakes.

Macie Pricing and Cost Control

| Pricing dimension | Cost (list price in US East; varies by Region) |
|---|---|
| S3 bucket inventory and monitoring | $0.10 per bucket per month |
| Automated data discovery | $0.01 per 100,000 objects monitored per month, plus the per-GB rate below for data inspected |
| Sensitive data discovery jobs | $1.00 per GB inspected for the first 50 TB per month, with lower per-GB rates at higher volumes |
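
A back-of-the-envelope estimate can be scripted before any job is launched. The sketch below assumes flat rates of $0.10 per bucket per month and $1.00 per GB inspected; these are simplifying assumptions, since real pricing varies by Region and volume, so substitute the current rates for your Region. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Assumed flat rates in USD. Replace with the current rates for your Region.
BUCKET_RATE_PER_MONTH=0.10   # per bucket inventoried and monitored
JOB_RATE_PER_GB=1.00         # per GB inspected by a discovery job

# Monthly inventory cost for the given number of buckets.
estimate_inventory_cost() {
  awk -v n="$1" -v rate="$BUCKET_RATE_PER_MONTH" 'BEGIN { printf "%.2f\n", n * rate }'
}

# One-time cost of a job that inspects the given number of GB.
estimate_job_cost() {
  awk -v gb="$1" -v rate="$JOB_RATE_PER_GB" 'BEGIN { printf "%.2f\n", gb * rate }'
}

estimate_inventory_cost 200   # 200 buckets: prints 20.00
estimate_job_cost 5120        # 5,120 GB (about 5 TB): prints 5120.00
```

The gap between the two numbers is the point: inventory is cheap, while a single full scan of a multi-terabyte bucket can cost thousands of dollars.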
💡

Cost control tips: limit discovery jobs to buckets that actually contain customer data (not log buckets or artifact stores). Use include criteria to scan only file extensions likely to contain PII (.csv, .json, .xlsx). Exclude low-risk buckets from automated discovery to limit its spending.
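
The extension filter from these tips could be expressed as a job scope along the following lines. This is a sketch under assumptions: the account ID, bucket, job name, and function names are placeholders, the 25 percent sampling depth is an optional extra that the tips do not mention, and the scoping JSON should be checked against the `CreateClassificationJob` reference. The function that calls AWS is defined but not invoked.

```bash
#!/usr/bin/env bash
# Build an --s3-job-definition that targets one bucket and includes only
# objects whose file name extension is csv, json, or xlsx.
scoped_job_definition() {
  cat <<EOF
{
  "bucketDefinitions": [
    {"accountId": "$1", "buckets": ["$2"]}
  ],
  "scoping": {
    "includes": {
      "and": [
        {"simpleScopeTerm": {
          "comparator": "EQ",
          "key": "OBJECT_EXTENSION",
          "values": ["csv", "json", "xlsx"]
        }}
      ]
    }
  }
}
EOF
}

# Submit a one-time job that inspects a 25 percent sample of the objects
# left in scope after the extension filter.
create_scoped_job() {
  aws macie2 create-classification-job \
    --name "scoped-scan-$2" \
    --job-type ONE_TIME \
    --sampling-percentage 25 \
    --s3-job-definition "$(scoped_job_definition "$1" "$2")"
}

# Print the job definition for a placeholder bucket so it can be reviewed.
scoped_job_definition 123456789012 my-data-bucket
```

Reviewing the printed JSON before submitting the job is a cheap safeguard, because the scope determines how many gigabytes are billed.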

🎯

Interview Focus Points

  1. What is the difference between Macie automated discovery and a classification job?
  2. What types of sensitive data does Macie detect by default?
  3. How would you use Macie to prove GDPR compliance for data stored in S3?
  4. A public S3 bucket is found to contain customer PII. Walk me through the Macie findings you would expect and your response.
  5. How do custom data identifiers work and when would you create one?
  6. What is the pricing model for Macie and how would you control costs for a large S3 data lake?
  7. How does Macie integrate with Security Hub?