Macie
Discover and protect sensitive data in S3 using machine learning classifiers
Amazon Macie is a data security service that uses machine learning and pattern matching to automatically discover, classify, and protect sensitive data stored in Amazon S3. It identifies personally identifiable information (PII), financial data, credentials, and other sensitive content, and it generates policy findings when bucket settings such as public access, external sharing, or encryption could expose data. Macie is widely used for data governance, GDPR compliance, and detecting accidental S3 data exposure.
How Macie Discovers and Classifies Sensitive Data
Macie uses a combination of managed data identifiers (AWS-maintained classifiers) and custom data identifiers (your own regular expressions, optionally paired with keywords that must appear near a match) to scan S3 object content. It also continuously monitors S3 bucket configurations for security posture issues.
| Detection type | Examples |
|---|---|
| Managed data identifiers - credentials | AWS secret access keys, private RSA/PGP keys, SSH keys, database connection strings |
| Managed data identifiers - financial | Credit card numbers (PCI DSS), bank account numbers, SWIFT codes |
| Managed data identifiers - PII | Social Security Numbers, passport numbers, driver's license numbers, names, email addresses, phone numbers |
| Managed data identifiers - health | DEA registration numbers, NHS numbers, health insurance IDs |
| Custom data identifiers | Your own regex + keyword patterns (e.g., internal employee IDs, proprietary codes) |
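A custom data identifier is a regex plus optional keywords and a maximum match distance. The sketch below assumes a hypothetical internal employee ID format (`EMP-` followed by six digits) and the hypothetical name `internal-employee-id`. It first asks Macie to evaluate the pattern against sample text with `test-custom-data-identifier`, then creates the identifier. The `run` helper is an assumption added for safety: it only prints each command unless `DRY_RUN=0` is exported.

```shell
#!/usr/bin/env bash
# Sketch: test and create a custom data identifier for a HYPOTHETICAL
# employee ID format. Commands are only printed unless DRY_RUN=0.
set -euo pipefail

# Dry-run wrapper (hypothetical helper, not part of the AWS CLI).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

REGEX='EMP-[0-9]{6}'   # hypothetical pattern: "EMP-" plus six digits
KEYWORD='employee'     # a match only counts if this keyword appears nearby

# Let Macie evaluate the pattern against sample text before creating it.
run aws macie2 test-custom-data-identifier \
  --regex "$REGEX" \
  --keywords "$KEYWORD" \
  --maximum-match-distance 50 \
  --sample-text "employee id: EMP-004217"

# Create the identifier; classification jobs can then reference its ID.
run aws macie2 create-custom-data-identifier \
  --name internal-employee-id \
  --regex "$REGEX" \
  --keywords "$KEYWORD" \
  --maximum-match-distance 50
```

Requiring a keyword within a short distance of the match is what keeps a loose pattern like this from flagging every six-digit code in your data.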
Macie has two modes of operation: automated sensitive data discovery (runs continuously, analyzing representative samples of objects across all of your S3 buckets) and sensitive data discovery jobs (one-time or scheduled, scanning the buckets you specify either fully or at a sampling percentage you choose). Automated discovery is lower cost but less thorough; use jobs when you need definitive coverage for compliance audits.
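The two modes map to separate API calls. This minimal sketch turns on automated discovery for the account and schedules a weekly full-coverage job. The bucket name `customer-exports` and job name `weekly-pii-audit` are hypothetical, and the `run` dry-run helper is an assumption that prints commands unless `DRY_RUN=0` is exported.

```shell
#!/usr/bin/env bash
# Sketch: enable automated discovery and schedule a weekly discovery job.
# Bucket and job names are HYPOTHETICAL. Commands print unless DRY_RUN=0.
set -euo pipefail

# Dry-run wrapper (hypothetical helper, not part of the AWS CLI).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Mode 1: automated discovery, which samples objects across all buckets.
run aws macie2 update-automated-discovery-configuration --status ENABLED

# Mode 2: a scheduled job that analyzes every eligible object (100%).
SCHEDULE='{"weeklySchedule": {"dayOfWeek": "MONDAY"}}'
JOB_DEF='{"bucketDefinitions": [{"accountId": "123456789012", "buckets": ["customer-exports"]}]}'

run aws macie2 create-classification-job \
  --job-type SCHEDULED \
  --name weekly-pii-audit \
  --schedule-frequency "$SCHEDULE" \
  --sampling-percentage 100 \
  --s3-job-definition "$JOB_DEF"
```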
S3 Bucket Security Posture Assessment
Macie maintains a continuous inventory of all S3 buckets in your account and assesses their security posture independently of sensitive data scanning.
| Posture check | What Macie flags |
|---|---|
| Public access | Buckets accessible by any anonymous user (ACL or bucket policy) |
| Shared externally | Buckets accessible by principals outside your AWS account |
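| Encryption disabled | Buckets whose default encryption settings were disabled; the inventory also counts objects stored unencrypted |
| Replicated externally | Buckets that replicate objects to buckets in accounts outside your organization |
| Versioning status | Reported in the bucket inventory metadata; Macie does not generate a finding for it |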
```shell
# Get Macie's summary of S3 buckets whose effective permission is PUBLIC
aws macie2 describe-buckets \
  --criteria '{"publicAccess.effectivePermission": {"eq": ["PUBLIC"]}}' \
  --query 'buckets[].{Name:bucketName,Region:region,Public:publicAccess.effectivePermission}'

# List sensitive data findings (category CLASSIFICATION; bucket posture findings use POLICY)
aws macie2 list-findings \
  --finding-criteria '{"criterion": {"category": {"eq": ["CLASSIFICATION"]}}}'

# Create a one-time sensitive data discovery job for a specific bucket
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --s3-job-definition '{"bucketDefinitions": [{"accountId": "123456789012", "buckets": ["my-data-bucket"]}]}' \
  --name audit-my-data-bucket
```
Multi-Account Macie with Organizations
Like other security services, Macie integrates with AWS Organizations to provide centralized management. A delegated administrator sees findings from all member accounts.
| Feature | How it works in multi-account |
|---|---|
| Finding aggregation | All member account findings appear in the administrator account's Macie console |
| Bucket inventory | Administrator sees all S3 buckets across the organization |
| Auto-enable | New accounts added to the Organization can be auto-enabled |
| Custom data identifiers | The administrator can use its own custom data identifiers in jobs that analyze member accounts' buckets |
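Setting up the multi-account model takes two calls from two different accounts. In this sketch the delegated administrator account ID `111122223333` is hypothetical, and the `run` helper is an assumed dry-run wrapper that only prints commands unless `DRY_RUN=0` is exported.

```shell
#!/usr/bin/env bash
# Sketch: delegate Macie administration and auto-enable new member accounts.
# The admin account ID is HYPOTHETICAL. Commands print unless DRY_RUN=0.
set -euo pipefail

# Dry-run wrapper (hypothetical helper, not part of the AWS CLI).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

ADMIN_ACCOUNT_ID='111122223333'

# Step 1 (run in the Organizations management account):
# designate the delegated Macie administrator.
run aws macie2 enable-organization-admin-account \
  --admin-account-id "$ADMIN_ACCOUNT_ID"

# Step 2 (run in the delegated administrator account):
# automatically enable Macie for accounts that join the organization.
run aws macie2 update-organization-configuration --auto-enable
```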
Macie pricing is driven largely by the amount of data inspected. Automated discovery inspects only samples, but a full classification job over a bucket holding terabytes of data can be expensive. Always estimate costs before running full scans on production data lakes.
Macie Pricing and Cost Control
| Pricing dimension | Cost |
|---|---|
| S3 bucket inventory and monitoring | $0.10 per bucket per month (first 3,000 buckets; free after 3,000 in most regions) |
| Automated data discovery - objects evaluated | $0.10 per GB |
| Sensitive data discovery jobs - objects scanned | $1.00 per GB (text-like objects); $0.24 per GB (database export objects) |
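A rough pre-flight estimate is simple arithmetic: classifiable bytes divided by bytes per GB, times the per-GB rate from the table. This sketch assumes 1 GB = 1024^3 bytes and a flat rate; it ignores volume tiers, free tiers, and regional price differences, so treat the result as an order-of-magnitude check. The function name `estimate_job_cost` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate a discovery job's cost at a FLAT per-GB rate.
# Assumptions: 1 GB = 1024^3 bytes; no volume tiers or free tier.
set -euo pipefail

# estimate_job_cost BYTES RATE_PER_GB -> prints the estimated dollars
estimate_job_cost() {
  local bytes=$1 rate_per_gb=$2
  awk -v b="$bytes" -v r="$rate_per_gb" 'BEGIN { printf "%.2f\n", b / (1024^3) * r }'
}

# The byte count could come from Macie's bucket inventory, e.g.:
#   aws macie2 describe-buckets \
#     --criteria '{"bucketName": {"eq": ["my-data-bucket"]}}' \
#     --query 'buckets[0].classifiableSizeInBytes' --output text

# Example: 5 TiB of text-like objects at $1.00 per GB
estimate_job_cost 5497558138880 1.00   # prints 5120.00
```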
Cost control tips: limit discovery jobs to buckets that actually contain customer data (not log buckets or artifact stores). Use job scoping to include only file extensions likely to contain PII (.csv, .json, .xlsx). Set a sampling percentage on jobs instead of scanning every object, and exclude low-value buckets from automated discovery to limit its spending.
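The extension filter and sampling tips can be combined in a single job definition. This sketch scopes a one-time job to `.csv`, `.json`, and `.xlsx` objects (Macie expects extensions without the leading dot) and analyzes 25% of eligible objects. The bucket `customer-exports`, the job name, and the 25% figure are hypothetical choices; the `run` helper is an assumed dry-run wrapper that prints commands unless `DRY_RUN=0` is exported.

```shell
#!/usr/bin/env bash
# Sketch: a cost-controlled job (extension scoping + sampling).
# Bucket/job names and the sampling value are HYPOTHETICAL.
set -euo pipefail

# Dry-run wrapper (hypothetical helper, not part of the AWS CLI).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

SAMPLE_PCT=25   # analyze roughly a quarter of eligible objects

# Include only objects whose extension is csv, json, or xlsx
# (multiple values for OBJECT_EXTENSION are joined with OR logic).
JOB_DEF='{
  "bucketDefinitions": [{"accountId": "123456789012", "buckets": ["customer-exports"]}],
  "scoping": {
    "includes": {
      "and": [{
        "simpleScopeTerm": {"comparator": "EQ", "key": "OBJECT_EXTENSION", "values": ["csv", "json", "xlsx"]}
      }]
    }
  }
}'

run aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name customer-exports-sampled \
  --sampling-percentage "$SAMPLE_PCT" \
  --s3-job-definition "$JOB_DEF"
```

Lower sampling percentages trade coverage for cost, so keep full (100%) scans for the buckets an audit actually depends on.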
Interview Focus Points
1. What is the difference between Macie automated discovery and a classification job?
2. What types of sensitive data does Macie detect by default?
3. How would you use Macie to prove GDPR compliance for data stored in S3?
4. A public S3 bucket is found to contain customer PII. Walk me through the Macie findings you would expect and your response.
5. How do custom data identifiers work and when would you create one?
6. What is the pricing model for Macie and how would you control costs for a large S3 data lake?
7. How does Macie integrate with Security Hub?