Apache Spark

Supported versions

Currently supported Spark versions are from 1.4.x to 2.4.x.

Sensor (Data Collection)

Spark Application

The two main components of a Spark application are the driver process and the executor processes. Executor processes hold only the data relevant to task execution. The driver is the main process and is responsible for coordinating the execution of the Spark application. It therefore holds all data about the performance and execution of the application, including data about each of its executors.

Instana collects all Spark application data (including executor data) from the driver JVM. To monitor Spark applications, the Instana agent must be installed on the host on which the Spark driver JVM is running.

Note that a Spark application can be submitted to the cluster manager in one of two deploy modes. The deploy mode determines where the driver JVM runs.

  • Deploy mode cluster: When submitting with the option --deploy-mode cluster, e.g. ./spark-submit --class org.apache.spark.examples.JavaWordCount --master yarn --deploy-mode cluster /path/to/app.jar, the Spark driver JVM runs on one of the worker nodes of your cluster manager. If the Instana agent is installed on the worker nodes, the Spark application (driver) is discovered automatically.
  • Deploy mode client: When submitting with the option --deploy-mode client, or without the option --deploy-mode (the default value is client), e.g. ./spark-submit --class org.apache.spark.examples.JavaWordCount --master yarn --deploy-mode client /path/to/app.jar or ./spark-submit --class org.apache.spark.examples.JavaWordCount --master yarn /path/to/app.jar, the Spark driver JVM runs on the host on which this command is executed. For Instana to be able to monitor this Spark application, the Instana agent must be installed on the host where spark-submit is executed.
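
As a rough illustration of the two deploy modes above, the following Python sketch infers where the driver JVM is expected to run from a spark-submit argument list. The helper names deploy_mode and driver_location are hypothetical and are not part of Spark or Instana. The sketch only inspects the --deploy-mode flag in the space-separated form shown above and ignores any other way the deploy mode might be configured.

```python
def deploy_mode(argv):
    """Return the value following --deploy-mode in a spark-submit argument list.

    Falls back to "client", the default described above, when the flag is absent.
    Hypothetical helper for illustration only.
    """
    for i, arg in enumerate(argv):
        if arg == "--deploy-mode" and i + 1 < len(argv):
            return argv[i + 1]
    return "client"


def driver_location(argv):
    """Describe the host on which the Spark driver JVM is expected to run."""
    if deploy_mode(argv) == "cluster":
        return "a worker node of the cluster manager"
    return "the host where spark-submit is executed"


if __name__ == "__main__":
    cmd = (
        "./spark-submit --class org.apache.spark.examples.JavaWordCount "
        "--master yarn --deploy-mode cluster /path/to/app.jar"
    )
    # Drop the program name and inspect only the arguments.
    print(driver_location(cmd.split()[1:]))
```

Whichever location the sketch reports is the host on which the Instana agent needs to be installed for the application to be monitored.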

Instana collects different data depending on the type of Spark application it monitors:

Batch Applications

  • Jobs
  • Stages
  • Longest completed stages
  • Executors

Streaming Applications

  • Batching
  • Scheduling delay
  • Total delay
  • Processing time
  • Output operations
  • Input records
  • Receivers
  • Executors
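
The three timing metrics in this list are related. In Spark Streaming, scheduling delay is the time a batch waits between its submission and the start of its processing, processing time is the time spent processing the batch, and total delay is the sum of the two. A minimal sketch of that arithmetic follows. The function name batch_delays and its millisecond timestamp parameters are illustrative assumptions, not Spark or Instana APIs.

```python
def batch_delays(submission_ms, processing_start_ms, processing_end_ms):
    """Derive the timing metrics of one streaming batch from three timestamps.

    All arguments are assumed to be epoch timestamps in milliseconds.
    Hypothetical helper for illustration only.
    """
    scheduling_delay = processing_start_ms - submission_ms
    processing_time = processing_end_ms - processing_start_ms
    return {
        "scheduling_delay": scheduling_delay,
        "processing_time": processing_time,
        # Total delay covers the whole span from submission to completion.
        "total_delay": scheduling_delay + processing_time,
    }


if __name__ == "__main__":
    print(batch_delays(1000, 1200, 1700))
```

A scheduling delay that keeps growing from batch to batch usually means that batches take longer to process than the batch interval allows.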

Spark Standalone Cluster Manager

In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. Spark standalone is a cluster manager made up of master and worker nodes. Instana monitors the whole Spark standalone cluster through the cluster's master node. It collects cluster-wide data as well as data for each worker node of the cluster.

Tracked Configuration

  • Host
  • Port
  • REST URI
  • Version
  • Status

Metrics

  • Alive Workers
  • Dead Workers
  • Decommissioned Workers
  • Workers In Unknown State
  • Used Memory
  • Total Memory
  • Used Cores
  • Total Cores
  • Data and metrics per worker
  • Most recent apps
  • Most recent drivers
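
To make the cluster-wide metrics above more concrete, the sketch below derives worker counts and memory and core totals from a list of per-worker records. The record layout (the keys state, cores, coresused, memory and memoryused) and the choice to sum resources over alive workers only are assumptions made for illustration. This is not a description of how Instana collects or computes these values.

```python
from collections import Counter


def summarize_workers(workers):
    """Aggregate per-worker records into cluster-wide figures.

    Each record is assumed to be a dict with the keys "state", "cores",
    "coresused", "memory" and "memoryused". Hypothetical layout.
    """
    states = Counter(w["state"] for w in workers)
    # Assumption: only alive workers contribute to resource totals.
    alive = [w for w in workers if w["state"] == "ALIVE"]
    return {
        "alive_workers": states["ALIVE"],
        "dead_workers": states["DEAD"],
        "decommissioned_workers": states["DECOMMISSIONED"],
        "unknown_workers": states["UNKNOWN"],
        "used_memory": sum(w["memoryused"] for w in alive),
        "total_memory": sum(w["memory"] for w in alive),
        "used_cores": sum(w["coresused"] for w in alive),
        "total_cores": sum(w["cores"] for w in alive),
    }


if __name__ == "__main__":
    sample = [
        {"state": "ALIVE", "cores": 8, "coresused": 4, "memory": 16384, "memoryused": 4096},
        {"state": "ALIVE", "cores": 8, "coresused": 0, "memory": 16384, "memoryused": 0},
        {"state": "DEAD", "cores": 8, "coresused": 0, "memory": 16384, "memoryused": 0},
    ]
    print(summarize_workers(sample))
```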