Ace Cloud Interviews

AWS Analytics & Big Data

OpenSearch Service

Managed OpenSearch and Elasticsearch clusters for log analytics and full-text search

Amazon OpenSearch Service is a managed service for deploying, operating, and scaling OpenSearch clusters (OpenSearch is the open-source fork of Elasticsearch 7.10) together with OpenSearch Dashboards, the corresponding fork of Kibana. It also runs legacy Elasticsearch versions. It is the go-to AWS service for log analytics, full-text search, real-time application monitoring, and security analytics (SIEM). OpenSearch Service handles provisioning, software patching, failure recovery, and automated snapshots, and it supports cross-cluster replication, so you can focus on indexing and querying data.

Cluster Architecture - Node Types and Roles

An OpenSearch cluster is made up of nodes with different roles. Properly sizing and separating these roles is critical for production performance.

| Node type | Role | Recommendation |
|---|---|---|
| Data nodes | Store shards and serve queries | For hot data, use storage-optimized (I3) or OpenSearch Optimized (OR1) instances |
| Dedicated master nodes | Cluster state management only | Use 3 masters for production so a quorum survives one node failure |
| UltraWarm nodes | Read-only warm tier backed by S3 | Cost-effective for data roughly 7-90 days old |
| Cold storage | Long-term retention in S3; indices must be reattached to UltraWarm before querying | For compliance/audit data |
| Coordinator nodes (OR1) | Route queries and aggregate results | Add when query concurrency is high |
⚠️

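Never skip dedicated master nodes in production. Without them, data nodes also manage cluster state; under heavy indexing or query load, a busy node can fall behind on cluster-state updates, triggering repeated master elections and leaving the cluster unstable (and, with a misconfigured quorum, exposed to split-brain). Use 3 dedicated master nodes with Multi-AZ enabled: a majority quorum of 2 survives the loss of any one master.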
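Why three masters rather than two comes down to majority arithmetic: a master election needs votes from a strict majority of master-eligible nodes, so two masters tolerate no failures while three tolerate one. The Python sketch below shows that arithmetic and builds a `ClusterConfig` dictionary using the key names of the boto3 `opensearch` client's `create_domain` call. It does not call AWS, and the instance types in the usage lines are placeholders, not sizing advice.

```python
"""Quorum arithmetic behind the three-dedicated-master recommendation."""


def master_quorum(master_count: int) -> int:
    """Votes needed to elect a master: a strict majority of master-eligible nodes."""
    return master_count // 2 + 1


def tolerated_master_failures(master_count: int) -> int:
    """How many master-eligible nodes can fail while a quorum still remains."""
    return master_count - master_quorum(master_count)


def production_cluster_config(
    data_instance_type: str, data_node_count: int, master_instance_type: str
) -> dict:
    """Return a ClusterConfig dict using boto3 opensearch create_domain key names.

    Instance types and counts come from the caller; nothing here sizes the cluster.
    """
    return {
        "InstanceType": data_instance_type,
        "InstanceCount": data_node_count,
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": master_instance_type,
        "DedicatedMasterCount": 3,  # quorum of 2 survives losing any one master
        "ZoneAwarenessEnabled": True,  # spread nodes across Availability Zones
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
    }


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} masters: quorum {master_quorum(n)}, "
              f"tolerates {tolerated_master_failures(n)} failure(s)")
    # Placeholder instance types, not a sizing recommendation.
    print(production_cluster_config("r6g.large.search", 3, "m6g.large.search"))
```

The loop makes the point directly: going from 1 to 2 masters adds no fault tolerance, while 3 is the smallest count that survives a failure. Passing the dictionary as `ClusterConfig=` to `create_domain` is left out so the sketch runs without credentials.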

💡

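OpenSearch uses a primary/replica shard model. Each index is split into primary shards, and each primary can have zero or more replica shards; use at least one replica in production for redundancy. The number of primary shards is set at index creation and can only be changed afterwards by reindexing or with the _split/_shrink APIs, so plan it carefully. A typical target is 10-50 GB per shard: AWS suggests 10-30 GB for search-latency-sensitive workloads and 30-50 GB for write-heavy workloads such as log analytics.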
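The sizing rule turns into a quick estimate: divide the expected on-disk index size by a target shard size and round up. The sketch below assumes a 10% indexing overhead on top of the source data and a 30 GB target shard size; both are assumptions to replace with measurements from your own data. Replicas then multiply the total number of shards the cluster has to track.

```python
import math


def primary_shard_count(
    index_size_gb: float, target_shard_gb: float = 30.0, indexing_overhead: float = 0.10
) -> int:
    """Estimate primary shards so each lands near target_shard_gb.

    indexing_overhead inflates raw source size to approximate on-disk size
    (an assumed 10% by default; measure your own data).
    """
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    on_disk_gb = index_size_gb * (1 + indexing_overhead)
    return max(1, math.ceil(on_disk_gb / target_shard_gb))


def total_shards(primaries: int, replicas: int = 1) -> int:
    """Each primary has `replicas` copies; every copy counts against the cluster."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # Hypothetical index expected to hold 200 GB of source data before rollover.
    p = primary_shard_count(200)
    print(p, "primaries,", total_shards(p), "shards with 1 replica")
```

For 200 GB of source data this gives 8 primaries of about 27.5 GB each (16 shards with one replica), inside the range above. Raising the target to 50 GB for a log workload drops the estimate to 5 primaries.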

Index State Management - Lifecycle, Rollover, and Tiering

For log workloads, indices grow continuously. Index State Management (ISM) automates the lifecycle: rollover when an index hits a size or age threshold, move to UltraWarm, then cold storage, then delete.

| ISM action | When to use |
|---|---|
| Rollover | Create a new index when the current one reaches X GB or Y days (use with index aliases) |
| Force merge | Reduce segment count on read-only indices to save memory |
| Move to UltraWarm | When hot queries are no longer expected (typically 7-30 days) |
| Move to cold | Infrequently queried data (30-365 days) |
| Delete | When the retention period expires |
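These actions combine in a way that is easy to misread: rollover fires when any one of its conditions is met, while state transitions are checked against the index's age. The simplified Python model below mirrors the thresholds of the example policy that follows (50 GB or 7 days to roll over, UltraWarm at 30 days, delete at 90 days). It ignores how the real ISM job runs (periodic evaluation, states visited in sequence, retries), so treat it as an illustration of the thresholds only.

```python
from dataclasses import dataclass

# Thresholds mirror the example policy in this section; adjust to your retention needs.
ROLLOVER_MIN_AGE_DAYS = 7
ROLLOVER_MIN_SIZE_GB = 50
WARM_AFTER_DAYS = 30
DELETE_AFTER_DAYS = 90


@dataclass
class IndexStats:
    age_days: float         # days since index creation
    primary_size_gb: float  # primary shard storage, excluding replicas


def should_roll_over(stats: IndexStats) -> bool:
    """Rollover triggers when ANY configured condition is met."""
    return (stats.age_days >= ROLLOVER_MIN_AGE_DAYS
            or stats.primary_size_gb >= ROLLOVER_MIN_SIZE_GB)


def lifecycle_state(stats: IndexStats) -> str:
    """Simplified: the state an index should be in, judged by age alone."""
    if stats.age_days >= DELETE_AFTER_DAYS:
        return "delete"
    if stats.age_days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"


if __name__ == "__main__":
    for s in (IndexStats(2, 60), IndexStats(3, 10), IndexStats(45, 48), IndexStats(120, 48)):
        action = "roll over" if should_roll_over(s) else "keep writing"
        print(s, action, lifecycle_state(s))
```

A 2-day-old index that already holds 60 GB rolls over on size alone, which is why busy log streams often roll over well before the age limit.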
bash
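# Create an ISM policy that rolls over at 50 GB or 7 days (whichever comes first),
# migrates to UltraWarm at 30 days, and deletes at 90 days (ages count from index creation).
# Assumptions: fine-grained access control with an internal user, and placeholder
# environment variables OPENSEARCH_ENDPOINT, OS_USER and OS_PASS.
# The index pattern "logs-*" is illustrative. Rollover also requires each index to
# carry the setting plugins.index_state_management.rollover_alias (its write alias),
# and warm_migration requires UltraWarm to be enabled on the domain.
curl -sS -X PUT "https://${OPENSEARCH_ENDPOINT}/_plugins/_ism/policies/log-policy" \
  -u "${OS_USER}:${OS_PASS}" \
  -H 'Content-Type: application/json' \
  --data-binary @- <<'EOF'
{
  "policy": {
    "description": "Log lifecycle: hot -> UltraWarm -> delete",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [{
          "rollover": {
            "min_index_age": "7d",
            "min_size": "50gb"
          }
        }],
        "transitions": [{"state_name": "warm", "conditions": {"min_index_age": "30d"}}]
      },
      {
        "name": "warm",
        "actions": [{"warm_migration": {}}],
        "transitions": [{"state_name": "delete", "conditions": {"min_index_age": "90d"}}]
      },
      {
        "name": "delete",
        "actions": [{"delete": {}}],
        "transitions": []
      }
    ],
    "ism_template": [{"index_patterns": ["logs-*"], "priority": 100}]
  }
}
EOF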
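Rollover only works if writes go through an alias and each index records which alias ISM should roll. A minimal sketch of the request that creates the first index, assuming a hypothetical alias named `logs` and the conventional `-000001` suffix (rollover increments that number to name the next index):

```python
import json


def bootstrap_rollover_index(alias: str, first_index_suffix: str = "000001") -> tuple[str, str]:
    """Return (path, JSON body) for the PUT that creates the first write index.

    The alias receives all writes; the rollover_alias setting tells ISM which
    alias to move to the next index when a rollover condition is met.
    """
    index_name = f"{alias}-{first_index_suffix}"
    body = {
        "settings": {"plugins.index_state_management.rollover_alias": alias},
        "aliases": {alias: {"is_write_index": True}},
    }
    return f"/{index_name}", json.dumps(body)


if __name__ == "__main__":
    path, body = bootstrap_rollover_index("logs")  # hypothetical alias name
    print("PUT", path)
    print(body)
```

Send the body as a `PUT` to the returned path with whatever authenticated HTTP client you already use. Applications then write to the alias, never to a concrete index name, so rollovers stay invisible to them.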

Ingestion Patterns - Logstash, Fluent Bit, Kinesis, and Direct API

OpenSearch accepts data via its HTTP REST API. Several ingestion patterns are common in production:

| Ingestion method | Best for | Notes |
|---|---|---|
| OpenSearch Ingestion (managed) | Logs from S3, Kinesis, CloudWatch | Fully managed pipeline - replaces self-hosted Logstash |
| Amazon Data Firehose (formerly Kinesis Data Firehose) | High-volume event streams | Built-in buffering and retry; Firehose handles backpressure |
| Fluent Bit DaemonSet (Kubernetes) | Container logs from EKS | Lightweight, low CPU; ships logs through its OpenSearch output plugin |
| Logstash | Complex transformations before indexing | More resource-intensive than Fluent Bit |
| Direct _bulk API | Custom applications, batch loaders | Batch 500-5,000 docs per request for throughput |
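The `_bulk` API takes newline-delimited JSON: one action line followed by one source line per document, and the body must end with a newline. A minimal batching sketch in Python, assuming 1,000 documents per request (inside the 500-5,000 range above) and a hypothetical index name `logs-app`. It only builds request bodies; sending them is left to your HTTP client, and the `opensearch-py` library's `helpers.bulk` offers a ready-made alternative.

```python
import json
from typing import Iterable, Iterator


def bulk_batches(docs: Iterable[dict], index: str, batch_size: int = 1000) -> Iterator[str]:
    """Yield NDJSON bodies for POST /_bulk, each holding up to batch_size docs.

    Every document becomes two lines: an index action and the document source.
    Each body ends with a trailing newline, as the _bulk API requires.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    lines: list[str] = []
    count = 0
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:  # flush the final partial batch
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = ({"message": f"event {i}", "level": "INFO"} for i in range(2500))
    batches = list(bulk_batches(docs, "logs-app"))
    print(len(batches))  # 3 bodies: 1,000 + 1,000 + 500 docs
```

Because it consumes any iterable lazily, the generator can stream from a file or queue without holding every document in memory; only one batch is buffered at a time.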
💡

OpenSearch Ingestion (the managed pipeline service) removes the need to run self-hosted Logstash or Fluentd on EC2. It scales automatically within the capacity limits you configure and uses IAM for authentication. Prefer it for greenfield deployments.

Pricing - Instances, Storage, and UltraWarm

| Component | Cost driver | Optimization |
|---|---|---|
| Data node instances | Instance type x hours | Reserved Instances save 30-60% on baseline nodes |
| EBS storage (gp3) | $0.135/GB-month | Use gp3 - cheaper and faster than gp2 |
| UltraWarm storage | $0.024/GB-month | About 5x cheaper than EBS for warm data |
| Cold storage | $0.01/GB-month | For compliance/audit data that is rarely queried |
| Data transfer out | Standard AWS rates | Keep consumers in the same region |
| Dedicated master nodes | Instance type x hours | Required for production; usually smaller instances than the data nodes |
💡

UltraWarm storage is about 5x cheaper per GB than hot EBS storage (you still pay hourly for the UltraWarm nodes themselves), but queries are slower (seconds rather than milliseconds). For log data older than 30 days that is queried mainly during incidents, UltraWarm delivers large savings while keeping the data searchable.
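To put numbers on the comparison, here is a storage-only calculation using the per-GB prices from the pricing table, for a hypothetical 5 TB of log data. It deliberately ignores UltraWarm node hours, hot-tier replicas, and regional price differences, so read the result as a rough ratio rather than a bill.

```python
# Per-GB-month prices copied from the pricing table in this section; they vary
# by region and change over time, so treat them as placeholders.
PRICE_PER_GB_MONTH = {
    "hot_ebs_gp3": 0.135,
    "ultrawarm": 0.024,
    "cold": 0.01,
}


def monthly_storage_cost(gb: float, tier: str) -> float:
    """Storage-only monthly cost in dollars; excludes node instance-hours."""
    return round(gb * PRICE_PER_GB_MONTH[tier], 2)


if __name__ == "__main__":
    gb = 5_000  # hypothetical: 5 TB of log data older than 30 days
    hot = monthly_storage_cost(gb, "hot_ebs_gp3")
    warm = monthly_storage_cost(gb, "ultrawarm")
    print(f"hot ${hot:,.2f}  ultrawarm ${warm:,.2f}  ratio {hot / warm:.1f}x")
```

With these inputs the script prints `hot $675.00  ultrawarm $120.00  ratio 5.6x`, which is where the "about 5x" figure comes from.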

OpenSearch vs CloudWatch Logs Insights for Log Analytics

| Dimension | OpenSearch Service | CloudWatch Logs Insights |
|---|---|---|
| Query language | Query DSL, Lucene syntax, SQL, PPL | CloudWatch Logs Insights query language |
| Full-text search | Excellent - inverted index | Limited - pattern matching only |
| Cost at high volume | Lower at scale with UltraWarm | Expensive - charged per GB ingested and per GB scanned |
| Dashboards | OpenSearch Dashboards (Kibana fork) | CloudWatch Dashboards (simpler) |
| Alerting | Built-in alerting + anomaly detection | CloudWatch Alarms + Insights scheduled queries |
| Setup complexity | Higher - cluster sizing, ISM | Minimal - fully serverless |
| AWS service logs | Requires Firehose or another delivery pipeline | Native - one-click subscriptions |
💡

For AWS service logs that CloudWatch Logs receives natively (VPC Flow Logs, CloudTrail, Lambda), CloudWatch Logs is simpler to set up. For custom application logs at high volume, or when you need full-text search and complex aggregations, OpenSearch is typically cheaper at scale and more capable.

🎯

Interview Focus Points

1. Why do you need dedicated master nodes in an OpenSearch production cluster, and what happens without them?
2. Explain the UltraWarm tier - how does it work and what workloads justify it?
3. How does Index State Management work and how would you configure a log lifecycle policy?
4. Compare OpenSearch Service to CloudWatch Logs Insights for log analytics - when do you choose each?
5. What is the impact of shard count on OpenSearch performance and how do you size shards correctly?
6. How would you ingest Kubernetes application logs from an EKS cluster into OpenSearch?
7. How does OpenSearch fine-grained access control work with IAM and internal users?
8. A search query that used to return in 100ms now takes 5 seconds - walk me through diagnosing the issue.