Elasticsearch

Metrics collection

Node-Level

Configuration data

  • Version
  • Cluster
  • Health Status
  • Node-name
  • Node-type
  • Node is Master
  • Node is Master Eligible
  • Transport
  • Log Directory
  • Shards
  • Indices

Performance metrics

| Data point | Description | Granularity |
| --- | --- | --- |
| Query Latency | Query latency is collected from NodeIndicesStats#SearchStats. | 1 second |
| Number of Queries | The query count per second is collected from NodeIndicesStats#SearchStats. | 1 second |
| Overall Documents | The total number of documents is collected from DocsStats#count. | 1 second |
| Added Documents | The total number of indexing operations is collected from IndexingStats#indexCount. | 1 second |
| Removed Documents | The number of delete operations executed is collected from IndexingStats#deleteCount. | 1 second |
| Active Shards | The number of active shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
| Active Primary Shards | The number of active primary shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
| Refresh Count | The number of refreshes executed per second is collected from NodeIndicesStats#RefreshStats. | 1 second |
| Refresh Time | The total time spent executing refreshes is collected from NodeIndicesStats#RefreshStats. | 1 second |
| Flush Count | The number of flushes executed per second is collected from NodeIndicesStats#FlushStats. | 1 second |
| Flush Time | The total time spent executing flushes is collected from NodeIndicesStats#FlushStats. | 1 second |
| Indices metrics | Document count, deleted count, and size per index are collected from IndexStats#DocsStats. | 1 second |
| Lucene Segments | The number of segments is collected from NodeIndicesStats#SegmentsStats#count. | 1 second |
| Active Threads | Active counts for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#active. | 1 second |
| Queued Threads | Queued counts for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#queue. | 1 second |
| Rejected Threads | Rejected counts for the Search, Index, Bulk, and Get thread pools are collected from ThreadPoolStats.Stats#rejected. | 1 second |
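Several of the data points above are reported per second, while Elasticsearch search statistics are exposed as cumulative counters. The following is a minimal sketch of how a per-second query count and an average query latency could be derived from two consecutive samples. The SearchSample structure and both function names are hypothetical and are not part of Elasticsearch or of the collector described here; the actual collection logic may differ.

```python
from dataclasses import dataclass


@dataclass
class SearchSample:
    """One reading of cumulative search counters (hypothetical structure)."""
    timestamp_s: float   # time the sample was taken, in seconds
    query_total: int     # cumulative number of queries executed
    query_time_ms: int   # cumulative time spent executing queries, in ms


def queries_per_second(prev: SearchSample, curr: SearchSample) -> float:
    """Query rate between two samples; 0.0 if no time has elapsed."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        return 0.0
    # Clamp at zero so a counter reset (e.g. node restart) is not reported
    # as a negative rate.
    return max(curr.query_total - prev.query_total, 0) / elapsed


def average_latency_ms(prev: SearchSample, curr: SearchSample) -> float:
    """Mean time per query between two samples; 0.0 if no queries ran."""
    queries = curr.query_total - prev.query_total
    if queries <= 0:
        return 0.0
    return (curr.query_time_ms - prev.query_time_ms) / queries


# Two samples taken one second apart, matching the 1 second granularity.
prev = SearchSample(timestamp_s=0.0, query_total=100, query_time_ms=5000)
curr = SearchSample(timestamp_s=1.0, query_total=150, query_time_ms=5500)
print(queries_per_second(prev, curr))   # 50.0
print(average_latency_ms(prev, curr))   # 10.0
```

Working on deltas rather than raw totals keeps the values meaningful at a 1 second granularity, since the raw counters only ever grow until the node restarts.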

Health Signatures

| Event | Description |
| --- | --- |
| Capacity limit while rebalancing | The node is relocating shards while it is at its capacity limit. |
| Heap overallocation | The Elasticsearch heap size setting is too large. |
| High heap usage | Heap usage is too high. |
| Node at capacity limits | High load and CPU usage on the host, and high heap usage and high GC time in the Elasticsearch JVM. |
| Node status | Cluster status as reported by Elasticsearch. |
| Rejected actions | The number of rejected threads is too high. |

Cluster-Level

Configuration data

  • Name
  • Health Status
  • Nodes, Masters

Performance metrics

| Data point | Description | Granularity |
| --- | --- | --- |
| Query Latency | Calculated as the maximum query latency across all nodes. | 1 second |
| Number of Queries | Calculated as the sum of the query counts of all nodes. | 1 second |
| Overall Documents | Calculated as the sum of the overall documents of all nodes. | 1 second |
| Added Documents | Calculated as the sum of the added documents of all nodes. | 1 second |
| Removed Documents | Calculated as the sum of the removed documents of all nodes. | 1 second |
| Indices | The number of indices. | 1 second |
| Shards | Active, Active Primary, Initializing, Relocating, and Unassigned shard counts are collected from ClusterHealth. | 1 second |
| Cluster State size | The size of the ClusterState. | 1 second |
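The aggregation rules above (maximum for latency, sum for the counters) can be illustrated with a short sketch. The dictionary keys and the aggregate_cluster function are hypothetical names chosen for this example; they do not correspond to an Elasticsearch API.

```python
from typing import Dict, List

# Counters that are summed across nodes (hypothetical key names).
SUMMED_KEYS = (
    "queries_per_second",
    "overall_documents",
    "added_documents",
    "removed_documents",
)


def aggregate_cluster(node_metrics: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-node metrics into cluster-level values.

    Query latency is the maximum over all nodes; the remaining data
    points are sums over all nodes. An empty node list yields zeros.
    """
    cluster = {
        "query_latency_ms": max(
            (n["query_latency_ms"] for n in node_metrics), default=0.0
        )
    }
    for key in SUMMED_KEYS:
        cluster[key] = sum(n[key] for n in node_metrics)
    return cluster


nodes = [
    {"query_latency_ms": 12.0, "queries_per_second": 40,
     "overall_documents": 1000, "added_documents": 5, "removed_documents": 1},
    {"query_latency_ms": 30.0, "queries_per_second": 60,
     "overall_documents": 2500, "added_documents": 7, "removed_documents": 0},
]
print(aggregate_cluster(nodes))
```

Taking the maximum latency rather than the mean means a single slow node is visible at the cluster level instead of being averaged away.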

Health Signatures

| Event | Description |
| --- | --- |
| Cluster Health | The status of the Elasticsearch cluster. |
| Split Brain (*) | The Elasticsearch cluster has more than one master node. |

(*) Note: Split Brain is also triggered in environments that contain two Elasticsearch clusters with the same name.
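To illustrate why two clusters that share a name trigger this event, here is a minimal sketch of a check that groups nodes by cluster name and counts masters. It assumes that nodes are grouped by cluster name alone; the find_split_brain function and the tuple layout are hypothetical, and the actual detection logic may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (cluster_name, node_name, is_master): hypothetical node description.
Node = Tuple[str, str, bool]


def find_split_brain(nodes: Iterable[Node]) -> Dict[str, List[str]]:
    """Return cluster names that have more than one master node,
    mapped to the sorted names of those masters."""
    masters: Dict[str, List[str]] = defaultdict(list)
    for cluster_name, node_name, is_master in nodes:
        if is_master:
            masters[cluster_name].append(node_name)
    return {
        name: sorted(names)
        for name, names in masters.items()
        if len(names) > 1
    }


# Two independent clusters are both named "logs"; each has one master.
observed = [
    ("logs", "a1", True),
    ("logs", "a2", False),
    ("logs", "b1", True),     # master of the second "logs" cluster
    ("metrics", "m1", True),
]
print(find_split_brain(observed))   # {'logs': ['a1', 'b1']}
```

Because the grouping key is only the cluster name, the two healthy "logs" clusters are indistinguishable from one cluster with two masters. Giving each cluster a unique name avoids the false alarm.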