AWS Analytics & Big Data

Lake Formation

Build, secure, and manage data lakes on Amazon S3 with fine-grained access control

AWS Lake Formation is a service that simplifies building, securing, and managing data lakes on Amazon S3 by providing a central place to define fine-grained access controls, data permissions, and auditing across integrated analytics services. It acts as a permissions layer on top of S3 and the Glue Data Catalog, enabling column-level and row-level access control that is enforced uniformly whether users query via Athena, EMR, or Redshift Spectrum. This makes it a common building block for enterprise data governance and compliance scenarios.

How Lake Formation Controls Data Access

Lake Formation sits between IAM/S3 and analytics services. Instead of managing S3 bucket policies and IAM policies separately per service, you grant permissions on Glue Catalog objects (databases, tables, columns) in Lake Formation. The service then enforces these across all integrated consumers.

| Without Lake Formation | With Lake Formation |
| --- | --- |
| S3 bucket policies + IAM roles per user/service | Single permission grant in Lake Formation console or API |
| No column-level or row-level control in S3 | Column inclusion/exclusion and row filter expressions |
| Athena, EMR, Redshift each need their own IAM config | One policy enforced across all three |
| Audit trail scattered across CloudTrail events | Centralized data access audit in Lake Formation |
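As a rough illustration of the "single place" idea in the comparison above, the sketch below registers an S3 path with Lake Formation and then lists every grant on one table. The bucket, database, and table names are hypothetical, and the commands are wrapped in functions so nothing runs until you call them.

```shell
# Hypothetical bucket, database, and table names; substitute your own.

# register_location S3_ARN
# Registers an S3 path so that Lake Formation governs access to it.
# --use-service-linked-role uses Lake Formation's service-linked role;
# a custom role can be supplied with --role-arn instead.
register_location() {
  aws lakeformation register-resource \
    --resource-arn "$1" \
    --use-service-linked-role
}

# show_table_grants DATABASE TABLE
# Lists the Lake Formation grants on one catalog table.
show_table_grants() {
  aws lakeformation list-permissions \
    --resource "{\"Table\": {\"DatabaseName\": \"$1\", \"Name\": \"$2\"}}"
}

# Example calls (commented out so that sourcing this file changes nothing):
# register_location "arn:aws:s3:::example-data-lake/sales"
# show_table_grants sales_db orders
```

Keeping the calls inside functions makes the account-specific values explicit parameters rather than values baked into the commands.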
💡 Lake Formation supports a model called LF-Tags (tag-based, or attribute-based, access control) for scalable permission management. Instead of granting access table by table, you assign tags to databases, tables, and columns, then grant principals permissions on tag expressions. This scales much better than named resource grants for large data lakes with hundreds of tables.

Permission Types - SUPER, SELECT, Column, and Row Filters

Lake Formation permissions apply at different levels of the catalog. The "Super" permission shown in the console corresponds to ALL in the API.

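| Permission | Level | What It Allows |
| --- | --- | --- |
| CREATE_DATABASE | Catalog | Create databases in the catalog |
| CREATE_TABLE | Database | Create tables in a database |
| ALTER, DROP | Table | Modify or delete table metadata |
| SELECT | Table or column subset | Query data - can be scoped to specific columns |
| DATA_LOCATION_ACCESS | S3 path | Required to create databases or tables that point to a registered S3 location (for example, from ETL jobs or crawlers) |
| Row filter | Table | Filter rows via a filter expression, granted per principal through a data filter |
| Column filter | Column | Include or exclude columns for specific principals; excluded columns are hidden rather than masked |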
bash
# Grant column-level SELECT to an IAM role.
# Only the listed columns are granted; email and credit_card_number are
# left out, so AnalystRole cannot query them.
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::123456789012:role/AnalystRole \
  --permissions SELECT \
  --resource '{
    "TableWithColumns": {
      "DatabaseName": "sales_db",
      "Name": "orders",
      "ColumnNames": ["order_id", "product", "quantity", "created_at"]
    }
  }'
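The command above covers column-level access. Row filters from the permissions table are defined as data filters and then granted like any other resource. The following is a minimal sketch of that flow, assuming a hypothetical `region` column on `orders` and a hypothetical filter name `eu_orders`; neither appears in the original example.

```shell
# create_row_filter ACCOUNT_ID DATABASE TABLE FILTER_NAME EXPRESSION
# Defines a data filter that keeps only rows matching EXPRESSION and
# hides the two sensitive columns from the example above.
create_row_filter() {
  aws lakeformation create-data-cells-filter --table-data "{
    \"TableCatalogId\": \"$1\",
    \"DatabaseName\": \"$2\",
    \"TableName\": \"$3\",
    \"Name\": \"$4\",
    \"RowFilter\": {\"FilterExpression\": \"$5\"},
    \"ColumnWildcard\": {\"ExcludedColumnNames\": [\"email\", \"credit_card_number\"]}
  }"
}

# grant_filter PRINCIPAL_ARN ACCOUNT_ID DATABASE TABLE FILTER_NAME
# Grants SELECT through the data filter instead of on the whole table.
grant_filter() {
  aws lakeformation grant-permissions \
    --principal "DataLakePrincipalIdentifier=$1" \
    --permissions SELECT \
    --resource "{
      \"DataCellsFilter\": {
        \"TableCatalogId\": \"$2\",
        \"DatabaseName\": \"$3\",
        \"TableName\": \"$4\",
        \"Name\": \"$5\"
      }
    }"
}

# Example calls (hypothetical values):
# create_row_filter 123456789012 sales_db orders eu_orders "region = 'EU'"
# grant_filter arn:aws:iam::123456789012:role/AnalystRole \
#   123456789012 sales_db orders eu_orders
```

A principal granted SELECT only through the filter sees just the matching rows and the non-excluded columns when querying the table.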
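⚠️ Lake Formation permissions work alongside IAM: both must allow the action. For S3 locations registered with Lake Formation, the querying role does not need s3:GetObject on the bucket; Lake Formation vends temporary credentials on its behalf, using the role the location was registered with. A common mistake is granting Lake Formation SELECT but forgetting that the IAM role also needs lakeformation:GetDataAccess (plus read access to the Glue Data Catalog). A second pitfall is giving the role direct S3 access as well, because reads that go straight to S3 bypass Lake Formation's column and row controls.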
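For the credential vending described above to work, the querying role needs IAM permission to request those credentials and to read catalog metadata. A minimal sketch, assuming a hypothetical inline policy name `LakeFormationDataAccess`; real policies usually also include the permissions of the query engine itself (for example Athena) and may scope `Resource` more tightly.

```shell
# allow_credential_vending ROLE_NAME
# Attaches an inline policy that lets the role ask Lake Formation for
# temporary data access credentials and read Glue catalog metadata.
# Note that no s3:GetObject statement is included.
allow_credential_vending() {
  aws iam put-role-policy \
    --role-name "$1" \
    --policy-name LakeFormationDataAccess \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": [
          "lakeformation:GetDataAccess",
          "glue:GetDatabase",
          "glue:GetTable",
          "glue:GetPartitions"
        ],
        "Resource": "*"
      }]
    }'
}

# Example call (hypothetical role name):
# allow_credential_vending AnalystRole
```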

Cross-Account Data Sharing with Lake Formation

Lake Formation supports sharing Glue Catalog databases and tables with other AWS accounts or AWS Organizations without copying data. The data stays in the producer account's S3; the consumer account queries it via their own Athena or EMR.

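| Step | Producer Account | Consumer Account |
| --- | --- | --- |
| 1 | Register S3 location with Lake Formation | - |
| 2 | Grant Lake Formation permissions on the database/table to the consumer account; Lake Formation creates the AWS RAM resource share | - |
| 3 | - | Accept RAM resource share |
| 4 | - | Create resource link to shared database in own catalog |
| 5 | - | Query via Athena using the resource link |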
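The steps above can be sketched with the AWS CLI roughly as follows. Account IDs, database names, and the link name are hypothetical, and the sketch assumes the producer shares every table in one database. Each function is meant to be run with credentials for the account named in its comment.

```shell
# Producer account: share_database CONSUMER_ACCOUNT_ID DATABASE
# Granting to an external account ID is what triggers the RAM share.
share_database() {
  aws lakeformation grant-permissions \
    --principal "DataLakePrincipalIdentifier=$1" \
    --permissions DESCRIBE \
    --resource "{\"Database\": {\"Name\": \"$2\"}}"
  aws lakeformation grant-permissions \
    --principal "DataLakePrincipalIdentifier=$1" \
    --permissions SELECT DESCRIBE \
    --resource "{\"Table\": {\"DatabaseName\": \"$2\", \"TableWildcard\": {}}}"
}

# Consumer account: list pending invitations to find the invitation ARN.
list_invitations() {
  aws ram get-resource-share-invitations
}

# Consumer account: accept_share INVITATION_ARN
accept_share() {
  aws ram accept-resource-share-invitation \
    --resource-share-invitation-arn "$1"
}

# Consumer account: create_database_link LINK_NAME PRODUCER_ACCOUNT_ID DATABASE
# Creates a resource link in the local catalog that points at the shared database.
create_database_link() {
  aws glue create-database --database-input "{
    \"Name\": \"$1\",
    \"TargetDatabase\": {\"CatalogId\": \"$2\", \"DatabaseName\": \"$3\"}
  }"
}

# Example calls (hypothetical values):
# share_database 210987654321 sales_db
# create_database_link sales_shared 123456789012 sales_db
```

After the link exists, a consumer-side Athena query would reference the link name (here `sales_shared`) as the database.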
💡 Cross-account Lake Formation sharing is a key alternative to copying datasets between accounts. The consumer account sees only the columns and rows it has been granted access to; the fine-grained controls carry across account boundaries.

LF-Tags - Attribute-Based Access Control for Scale

LF-Tags replace resource-based grants with attribute-based access control (ABAC). You tag catalog resources (tables, columns) with key-value tags, then create tag-based policies that grant access to all resources with matching tags.

| Approach | Scales to 100+ Tables? | Audit Clarity | Setup Complexity |
| --- | --- | --- | --- |
| Named resource grants | Poor: one grant per table per principal | Clear | Simple for small catalogs |
| LF-Tag policies | Excellent: one policy covers all matching resources | Good: tags are self-documenting | Higher upfront |

Example: tag all PII columns with sensitivity=pii and all marketing tables with domain=marketing. Then grant the marketing analyst role access through a tag expression on domain=marketing that does not match sensitivity=pii, so PII columns are left out of the grant without being listed by name.
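A rough CLI sketch of this example follows. LF-Tag expressions match tag values and have no "not equal" operator, so the sketch assumes a second, hypothetical value `sensitivity=public` that tables carry by default and PII columns override. The tag value `public` and the table and role names in the example calls are illustrative assumptions.

```shell
# define_tags: create the tag keys and their allowed values.
define_tags() {
  aws lakeformation create-lf-tag --tag-key domain --tag-values marketing
  aws lakeformation create-lf-tag --tag-key sensitivity --tag-values pii public
}

# tag_table DATABASE TABLE
# Columns inherit the table's tags unless overridden.
tag_table() {
  aws lakeformation add-lf-tags-to-resource \
    --resource "{\"Table\": {\"DatabaseName\": \"$1\", \"Name\": \"$2\"}}" \
    --lf-tags '[{"TagKey": "domain", "TagValues": ["marketing"]},
                {"TagKey": "sensitivity", "TagValues": ["public"]}]'
}

# tag_pii_column DATABASE TABLE COLUMN
# Overrides the inherited sensitivity value on one column.
tag_pii_column() {
  aws lakeformation add-lf-tags-to-resource \
    --resource "{\"TableWithColumns\": {\"DatabaseName\": \"$1\", \"Name\": \"$2\", \"ColumnNames\": [\"$3\"]}}" \
    --lf-tags '[{"TagKey": "sensitivity", "TagValues": ["pii"]}]'
}

# grant_by_tags PRINCIPAL_ARN
# One grant covers every table and column tagged
# domain=marketing AND sensitivity=public.
grant_by_tags() {
  aws lakeformation grant-permissions \
    --principal "DataLakePrincipalIdentifier=$1" \
    --permissions SELECT \
    --resource '{"LFTagPolicy": {"ResourceType": "TABLE", "Expression": [
      {"TagKey": "domain", "TagValues": ["marketing"]},
      {"TagKey": "sensitivity", "TagValues": ["public"]}]}}'
}

# Example calls (hypothetical names):
# define_tags
# tag_table marketing_db campaigns
# tag_pii_column marketing_db campaigns email
# grant_by_tags arn:aws:iam::123456789012:role/MarketingAnalyst
```

New marketing tables only need to be tagged; the existing grant then applies to them without any further per-table permission.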

🎯 Interview Focus Points

  1. What problem does Lake Formation solve that plain S3 bucket policies and IAM cannot?
  2. Explain column-level and row-level security in Lake Formation - give a real-world use case.
  3. How does cross-account data sharing work in Lake Formation - does the data get copied?
  4. What are LF-Tags and why are they better than named resource grants for large data lakes?
  5. How does Lake Formation interact with Athena and EMR - does it replace IAM permissions or add to them?
  6. What is the DATA_LOCATION_ACCESS permission and when is it required?
  7. How does Lake Formation audit data access, and how would you integrate this with a compliance workflow?