Host

Supported OS

  • Linux
  • Windows
  • Mac OS/OS X
  • Solaris on SPARC
  • AIX

Configuration

For detailed information, see our agent configuration documentation.

Metrics collection

Configuration data

  • Operating System name and version
  • CPU model and count
  • GPU model and count
  • Memory
  • Max Open Files
  • Hostname
  • Fully Qualified Domain Name
  • Machine ID
  • Boot ID
  • Startup time

Performance metrics

CPU usage

Overall CPU usage as a percentage.

Data point: Filesystem

Granularity: 1 second

Memory usage

Total memory used as a percentage.

Data point: Filesystem

Granularity: 1 second

CPU load

Average number of processes that are running or waiting to run over the selected time period.

Data point: Filesystem

Granularity: 5 seconds
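To illustrate what a load value represents, the following minimal sketch parses a line in the format of the Linux /proc/loadavg file. That the agent reads this file is an assumption; the function name parse_loadavg and the sample line are hypothetical.

```python
def parse_loadavg(line: str) -> dict:
    """Parse one line in /proc/loadavg format into its components."""
    fields = line.split()
    # First three fields: load averaged over 1, 5, and 15 minutes.
    one, five, fifteen = (float(x) for x in fields[:3])
    # Fourth field: currently runnable entities / total scheduling entities.
    running, total = (int(x) for x in fields[3].split("/"))
    return {"1m": one, "5m": five, "15m": fifteen,
            "running": running, "total": total}


# Hypothetical sample line; on a Linux host you could read /proc/loadavg instead.
sample = "0.52 0.41 0.30 2/613 12345"
load = parse_loadavg(sample)
```

As a rule of thumb, a load close to the number of CPU cores suggests that every core is busy.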

CPU usage

CPU usage values as a percentage: user, system, wait, nice, and steal. The values are displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 1 second
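The percentages above can be pictured as shares of the CPU time that elapsed between two samples. The following sketch derives them from two snapshots of the aggregate cpu line in Linux /proc/stat. It assumes that "wait" corresponds to the iowait counter; how the agent actually computes these values is not stated here, and the helper names and sample lines are hypothetical.

```python
# Order of the cumulative tick counters on a /proc/stat "cpu" line.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Map the first eight counters of a "cpu" or "cpuN" line to their names."""
    parts = line.split()
    # parts[0] is the label; the following values are ticks since boot.
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def usage_percent(before: dict, after: dict) -> dict:
    """Return each counter's share of the ticks elapsed between two snapshots."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


# Hypothetical snapshots taken one interval apart.
t0 = parse_cpu_line("cpu 1000 50 300 8000 100 0 20 30 0 0")
t1 = parse_cpu_line("cpu 1060 50 320 8100 110 0 20 40 0 0")
usage = usage_percent(t0, t1)
```

The same calculation applies to the per-core cpu0, cpu1, and subsequent lines for individual CPU usage.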

Context switches

Total number of context switches. This is supported only on Linux hosts. The value is displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 1 second
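On Linux, the kernel exposes a cumulative context switch counter as the ctxt line in /proc/stat. The sketch below extracts that counter and turns two readings into a per-second rate. Whether the agent uses this file is an assumption, and the function names and sample text are hypothetical.

```python
def read_ctxt(stat_text: str) -> int:
    """Return the cumulative context switch count from /proc/stat content."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before: int, after: int, interval_s: float) -> float:
    """Convert two cumulative readings into an average rate."""
    return (after - before) / interval_s


# Hypothetical excerpts of /proc/stat taken one second apart.
first = read_ctxt("cpu 1 2 3 4\nctxt 500000\nbtime 1700000000\n")
second = read_ctxt("cpu 2 3 4 5\nctxt 501250\nbtime 1700000000\n")
rate = switches_per_second(first, second, 1.0)
```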

CPU load

CPU load. The value is displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 1 second

Individual CPU usage

Individual CPU usage values as a percentage: user, system, wait, nice, and steal. The values are displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 1 second

Individual GPU usage

Individual GPU usage values.

Data point               Collected from   Granularity   Unit
GPU Usage                nvidia-smi       1 second      %
Temperature              nvidia-smi       1 second      °C
Encoder                  nvidia-smi       1 second      %
Decoder                  nvidia-smi       1 second      %
Memory Used              nvidia-smi       1 second      %
Memory Total             nvidia-smi       1 second      bytes
Transmitted throughput   nvidia-smi       1 second      bytes/s
Received throughput      nvidia-smi       1 second      bytes/s
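To give a feel for the kind of data nvidia-smi provides, the following sketch queries a few of its CSV fields and converts them to the units listed in the table. The exact invocation that the agent uses is not documented here, so the field selection, the function names, and the sample output are assumptions. With the nounits option, nvidia-smi reports memory in MiB.

```python
import subprocess

# Query fields passed to nvidia-smi; chosen for illustration only.
QUERY = "index,utilization.gpu,temperature.gpu,memory.used,memory.total"


def parse_gpu_csv(text: str) -> list:
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        idx, util, temp, used, total = (v.strip() for v in line.split(","))
        rows.append({
            "index": int(idx),
            "usage_percent": float(util),
            "temperature_c": float(temp),
            "memory_used_percent": 100.0 * float(used) / float(total),
            "memory_total_bytes": int(total) * 1024 * 1024,  # MiB to bytes
        })
    return rows


def query_gpus() -> list:
    """Run nvidia-smi; requires the NVIDIA driver to be installed."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Hypothetical output for a single GPU, parsed without calling nvidia-smi.
gpus = parse_gpu_csv("0, 35, 61, 2048, 16384\n")
```

On cards with limited support, nvidia-smi can return a placeholder instead of a number for some fields, which this sketch does not handle.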

Supported NVIDIA graphics cards:

Brand     Model
Tesla     S1070, S2050, C1060, C2050/70, M2050/70/90, X2070/90, K10, K20, K20X, K40, K80, M40, P40, P100, V100
Quadro    4000, 5000, 6000, 7000, M2070-Q, K-series, M-series, P-series, RTX-series
GeForce   Varying levels of support, with fewer metrics available than on the Tesla and Quadro products

Supported OS: Linux

Prerequisites: The latest official NVIDIA drivers are installed.

To start the Instana agent Docker container with GPU support, see Enable GPU monitoring through Instana Agent container.

GPU Memory/Process

The list of processes that use the GPU.

Data point     Collected from   Granularity
Process Name   nvidia-smi       1 second
PID            nvidia-smi       1 second
GPU            nvidia-smi       1 second
Memory         nvidia-smi       1 second

Supported NVIDIA graphics cards:

Brand     Model
Tesla     S1070, S2050, C1060, C2050/70, M2050/70/90, X2070/90, K10, K20, K20X, K40, K80, M40, P40, P100, V100
Quadro    4000, 5000, 6000, 7000, M2070-Q, K-series, M-series, P-series, RTX-series
GeForce   Varying levels of support, with fewer metrics available than on the Tesla and Quadro products

Supported OS: Linux

Prerequisites: The latest official NVIDIA drivers are installed.

To start the Instana agent Docker container with GPU support, see Enable GPU monitoring through Instana Agent container.

Memory

Memory used as a percentage, and memory values in bytes: swap total, swap free, buffers, cached, and available. The values are displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 1 second
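The relationship between the byte values and the used percentage can be sketched from content in Linux /proc/meminfo format. The definition of "used" as total minus available is an assumption made for this illustration, not a statement of how the agent computes it; the function names and sample values are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style content into a dict of byte values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most entries are reported in kB; convert those to bytes.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        values[key.strip()] = value
    return values


def used_percent(mem: dict) -> float:
    """Assumed definition: memory that is not available, as a share of total."""
    return 100.0 * (mem["MemTotal"] - mem["MemAvailable"]) / mem["MemTotal"]


# Hypothetical excerpt of /proc/meminfo.
sample = """MemTotal:       16000000 kB
MemAvailable:    4000000 kB
Buffers:          200000 kB
Cached:          3000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""
mem = parse_meminfo(sample)
percent = used_percent(mem)
```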

Open files

Open file usage, when available on the operating system: current versus maximum. The values are displayed on a graph over a selected time period.

Data point: Filesystem

Granularity: 1 second
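On Linux, the current and maximum number of open file handles are exposed in /proc/sys/fs/file-nr as three numbers: allocated handles, allocated but unused handles, and the system-wide maximum. The sketch below parses that format. That the agent reads this file is an assumption, and the function name and sample values are hypothetical.

```python
def parse_file_nr(text: str) -> dict:
    """Parse /proc/sys/fs/file-nr content into current and max open files."""
    allocated, unused, maximum = (int(v) for v in text.split())
    current = allocated - unused
    return {
        "current": current,
        "max": maximum,
        "used_percent": 100.0 * current / maximum,
    }


# Hypothetical content of /proc/sys/fs/file-nr.
open_files = parse_file_nr("2048\t0\t65536\n")
```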

Filesystems

Filesystems per device.

Data point                  Collected from   Granularity
Device                      Filesystem       60 seconds
Mount                       Filesystem       60 seconds
Options                     Filesystem       60 seconds
Type                        Filesystem       60 seconds
Capacity                    Filesystem       60 seconds
Used                        Filesystem       1 second
Leaked*                     Filesystem       1 second
Inode usage                 Filesystem       1 second
Reads/s, Bytes Read/s**     Filesystem       1 second
Writes/s, Bytes Writes/s**  Filesystem       1 second

* Leaked refers to deleted files that are still in use, and equates to capacity - used - free. On Linux, you can find these files with lsof | grep deleted.

** Reads and writes are not supported for NFS (Network File System).
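The leaked value follows directly from the formula capacity - used - free. The following worked example applies it to hypothetical numbers; the function name and the sample sizes are made up for illustration.

```python
GIB = 1024 ** 3


def leaked_bytes(capacity: int, used: int, free: int) -> int:
    """Space held by deleted files that are still open: capacity - used - free."""
    return capacity - used - free


# Hypothetical filesystem: 100 GiB capacity, 60 GiB used, 30 GiB free.
leaked = leaked_bytes(100 * GIB, 60 * GIB, 30 * GIB)
leaked_gib = leaked / GIB
```

In this example, 10 GiB is neither reported as used nor as free, which points to deleted files that a process still holds open.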

Network interfaces

Network traffic and errors per interface.

Data point   Collected from   Granularity
Interface    Filesystem       60 seconds
MAC          Filesystem       60 seconds
IPs          Filesystem       60 seconds
RX Bytes     Filesystem       1 second
RX Errors    Filesystem       1 second
TX Bytes     Filesystem       1 second
TX Errors    Filesystem       1 second
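On Linux, per-interface byte and error counters are available in /proc/net/dev. The sketch below parses that format into the four traffic data points from the table. That the agent uses this file is an assumption; the function name and the sample content are hypothetical.

```python
def parse_net_dev(text: str) -> dict:
    """Parse /proc/net/dev content into per-interface RX/TX counters."""
    interfaces = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines
        name, _, counters = line.partition(":")
        fields = counters.split()
        interfaces[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_errors": int(fields[2]),
            "tx_bytes": int(fields[8]),
            "tx_errors": int(fields[10]),
        }
    return interfaces


# Hypothetical content of /proc/net/dev.
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 123456 789 2 0 0 0 0 0 654321 456 1 0 0 0 0 0
"""
net = parse_net_dev(sample)
```

The counters are cumulative, so a per-second rate is the difference between two readings divided by the sampling interval.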

TCP activity

TCP activity values are displayed on a graph over a selected time period.

Data point           Collected from   Granularity
Established          Filesystem       1 second
Open/s               Filesystem       1 second
In Segments/s        Filesystem       1 second
Out Segments/s       Filesystem       1 second
Established Resets   Filesystem       1 second
Out Resets           Filesystem       1 second
Fail                 Filesystem       1 second
Error                Filesystem       1 second
Retransmission       Filesystem       1 second

Process top list

The process top list is refreshed approximately every 10 seconds and contains only the processes that have significant system usage: more than 10% CPU usage over the previous 10 seconds, and more than 512MB memory usage (RSS).

Linux top semantics are used: 100% CPU refers to a single CPU core. You can search a history of snapshots for the previous month.
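The selection rule and the top semantics can be sketched as follows. The thresholds are taken from the description above and combined exactly as worded there; reading 512MB as 512 MiB is an assumption, and the Proc type, the helper names, and the sample processes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    name: str
    cpu_percent: float  # top semantics: 100.0 means one fully used core
    rss_bytes: int


CPU_THRESHOLD = 10.0               # percent of a single core
RSS_THRESHOLD = 512 * 1024 * 1024  # assumes 512MB means 512 MiB


def is_significant(p: Proc) -> bool:
    """Apply both thresholds as worded in the description above."""
    return p.cpu_percent > CPU_THRESHOLD and p.rss_bytes > RSS_THRESHOLD


def host_share(cpu_percent: float, cores: int) -> float:
    """Convert a top-style value into a share of the whole host's CPU capacity."""
    return cpu_percent / cores


# Hypothetical process snapshot.
procs = [
    Proc(101, "java", 250.0, 2 * 1024 ** 3),
    Proc(202, "sshd", 0.5, 8 * 1024 ** 2),
    Proc(303, "gzip", 95.0, 16 * 1024 ** 2),
]
top_list = [p.name for p in procs if is_significant(p)]
java_share = host_share(250.0, cores=8)
```

Under top semantics, a value of 250% on an 8-core host means that the process uses two and a half cores, or about 31% of the host's total CPU capacity.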

Data point     Collected from   Granularity
PID            Filesystem       30 seconds
Process Name   Filesystem       30 seconds
CPU            Filesystem       30 seconds
Memory         Filesystem       30 seconds

Health signatures

For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in events for the Host sensor, see the Built-in events reference.