Application & Service Management (Deprecated)

Concepts

Traditional Application Performance Management (APM) solutions are about managing the performance and availability of applications.

For APM tools, an application is a static set of code runtimes (e.g. JVMs or CLRs) that are monitored using an agent. Normally, the application is defined as a configuration parameter on each agent.

This concept, which was a good model for classical 3-tier applications, no longer works for modern (micro)service applications. A (micro)service does not always belong to exactly one application. Think of a credit card payment service that is used both in a company’s online store and in its point-of-sale system. One solution to this problem would be to define every service as an application, but that would introduce new issues:

  • Too many applications to monitor. Treating every service as an application would result in hundreds or thousands of applications. Monitoring them with dashboards does not work because there is simply too much data for humans to process.

  • Loss of context. Because every service is treated separately, it would not be possible to understand the dependencies between services or the role a service plays in the context of a problem.

Service Quality Management

Instana introduces the next generation of APM with its Service Quality Management (SQM) approach. Our main goal is to simplify the monitoring of your business’ service quality. Using the data we collect from traces and component sensors, we discover your application landscape directly from the services that are actually implemented. From this logical data, we can determine the health of every individual service and, in turn, the health of the entire application.

Service Map

Usage

Instana’s Service Quality Management function has three essential parts, all of which can be accessed from the “Application” dropdown at the top of the screen.

Application Map

The first is the Map, which brings you to the Service Map mentioned above. Services and their dependencies are represented as gray geometric shapes marked with appropriate graphics and names, and the logical connections between these entities are displayed as lines connecting them. When the Particle View toggle in the bottom right is turned on, the map comes alive with dots representing individual calls moving along the connections between services. You can rearrange the map either with the toggles in the bottom right or by clicking and dragging services. The health of each service is displayed with the usual color scheme of gray, yellow, or red. To see the KPIs for a service, click the name attached to that service and select the link. A flyout appears on the left side with an overview and an option to open the full dashboard for that service.

Trace View

The second item in the Application dropdown is the Trace view. Here you can investigate all traces collected by Instana, laid out in a dynamic table. Click a column header to sort the traces by that metric. Any search terms entered in the Dynamic Focus bar above the table automatically filter out non-matching traces. Select a trace by clicking anywhere in its row, and an overview appears on the right side of the UI. A breakdown and technical explanation of traces in this context can be found on our Tracing page.

Comparison Table

The last part of Instana’s Service Quality Management is the Comparison Table, a powerful tool for seeing how specific processes compare to others and for finding stressed and bottlenecked components. Like every other dynamic table in Instana, the comparison table can be sorted by column header, but you can also narrow down exactly which data you want to see. Choose the content of the table with the dropdown labelled “Table content,” for example “Services.” Then select the metric you would like visualized from the dropdown to its right, say “call/s.” Next, select the service from the table whose calls you would like visualized. A real-time graph appears above the table displaying the call/s count for the selected service. To compare it with another service, select that service from the table and it will appear in the graph as well. Any number of services can be toggled in and out for comparison. Instana can also visualize multiple metrics for multiple services: go back to the metrics dropdown and click another metric, such as “error rate,” to see both call/s and error rates for all currently selected services. You can reset your comparisons with the “Clear Selections” button in the top right above the table.

Shortcuts

Some important functions of Instana can be activated using keyboard shortcuts.

  • c: center the map
  • f: open and focus the search bar
  • esc: close the currently visible sidebar or dashboard
  • shift + ?: open the help dialog for shortcuts

Configuration

Service Configuration

The Service Mapper lets you customize how Instana discovers your services. To find it, click your user profile in the top right corner and select Settings. The configuration applies to how data is processed in Instana’s backend, so there is no need to configure the Instana Agent. Settings take effect after about 30 seconds and only affect new data.

Services are discovered by analyzing traces and grouping them by function. As soon as a service is discovered, Instana automatically begins monitoring it to learn its health and to inform you if something needs your attention. To help Instana better match your architecture, you can customize service detection in a number of ways. You can specify service extraction based on Docker labels and matches on label values, which lets you use a substring of a label instead of the entire label. You can also extract services based on host tags, both existing and newly added ones. This can be used, for example, to create separate services for production and QA/testing, or to group services by region. The following image shows a general service extraction rule that creates services based on the host tag zone and the Docker label component:

Service mapper employing host tag zone and docker label component

If no technology-specific rules (such as HTTP, batch, or EJB) are specified, Instana falls back to the extraction rules in the General configuration section. You can also tell Instana to ignore certain collected traces. Enabling the “Mark as ignored service” flag in the Service Mapper tells Instana not to persist traces matching the specified service definition; the metric definition is ignored as well, so no service is created. While customizing, keep in mind that traces are the data Instana needs to discover services.

example service

The first span has the values Host = localhost:84 and URL = /shop. Both parameters can be taken into account when configuring how a service is discovered.

By default, we recognize HTTP services using the first request path segment. For example, /productsearch/hello would be mapped to the service “productsearch.”
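The default behavior can be sketched in Python as follows. This is a minimal illustration, not Instana’s implementation: the helper name `default_http_service_name` is hypothetical, and the regular expression is borrowed from the “First request path segment” rule under Common Service Rules.

```python
import re
from typing import Optional

# Same pattern as the "First request path segment" rule under
# Common Service Rules: capture everything between the first and second "/".
FIRST_SEGMENT = re.compile(r"^/([^/]*)($|/.*)")


def default_http_service_name(path: str) -> Optional[str]:
    """Hypothetical helper: return the first path segment, or None if the path does not start with '/'."""
    match = FIRST_SEGMENT.match(path)
    if match is None:
        return None
    return match.group(1)


print(default_http_service_name("/productsearch/hello"))  # productsearch
print(default_http_service_name("/shop"))                 # shop
```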

Whenever you change the default behavior by configuring a rule, Instana groups the traces accordingly. Below is an example of how the Service Mapper works:

service mapper

The detected service in this example is called localhost:84/shop because the rule uses the values “Hostname” and “Request Path” to detect the service. A request to a different server would result in a new service: applying the above rule to the second span in the example, with the values Host = 172.17.42.1 and URL = /productsearch, would produce a service named 172.17.42.1/productsearch. You can also set static service names in rules. Here is another example:

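To illustrate how such a rule groups traces, here is a Python sketch that assigns each span to a service named after its host and request path, as in the localhost:84/shop example. The span representation and the helper names are hypothetical simplifications; the real rule is configured in the Service Mapper UI, not in code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A span reduced to the two values the example rule looks at (assumption).
Span = Tuple[str, str]  # (host, request path)


def rule_service_name(host: str, path: str) -> str:
    """Hypothetical rule: join the Host header and the request path."""
    return f"{host}{path}"


def group_by_service(spans: List[Span]) -> Dict[str, List[Span]]:
    """Group spans under the service name the rule produces for them."""
    services: Dict[str, List[Span]] = defaultdict(list)
    for host, path in spans:
        services[rule_service_name(host, path)].append((host, path))
    return dict(services)


spans = [
    ("localhost:84", "/shop"),
    ("172.17.42.1", "/productsearch"),
    ("localhost:84", "/shop"),  # same host and path: joins the existing service
]
for name, members in group_by_service(spans).items():
    print(name, len(members))
# localhost:84/shop 2
# 172.17.42.1/productsearch 1
```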

Rule Editor

The Service Mapper shows you the currently configured rules. By default, HTTP services are named by the first URL segment only. If you define a matching rule, it overrides the default. In the UI you can define rules, turn them on, and sort them. The rules are evaluated from top to bottom and the first matching rule for a trace is applied.

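The top-to-bottom, first-match evaluation can be sketched in Python as below. This is a simplified model under stated assumptions: only request-path conditions are supported, the `Rule` class and `resolve_service` function are hypothetical, and the fallback reproduces the documented default of naming HTTP services after the first path segment.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

PATH_REF = re.compile(r"\{path-(\d+)\}")
FIRST_SEGMENT = re.compile(r"^/([^/]*)($|/.*)")


@dataclass
class Rule:
    """Hypothetical, simplified HTTP rule: one request-path pattern and a name template."""
    name: str
    path_pattern: str
    label: str
    enabled: bool = True


def resolve_service(rules: List[Rule], path: str) -> Optional[str]:
    # Rules are evaluated from top to bottom; the first enabled match wins.
    for rule in rules:
        if not rule.enabled:
            continue
        match = re.match(rule.path_pattern, path)
        if match:
            # Replace {path-N} with the N-th capture group (empty if it did not participate).
            return PATH_REF.sub(lambda ref: match.group(int(ref.group(1))) or "", rule.label)
    # No rule matched: fall back to the default (first path segment).
    default = FIRST_SEGMENT.match(path)
    return default.group(1) if default else None


rules = [
    Rule("Shop frontend", r"^/shop($|/.*)", "shop-frontend"),  # static service name
    Rule("Second segment", r"^/[^/]+/([^/]*)($|/.*)", "/{path-1}"),
]
print(resolve_service(rules, "/shop/cart"))      # shop-frontend
print(resolve_service(rules, "/api/orders/42"))  # /orders
print(resolve_service(rules, "/health"))         # health
```

Reversing the two rules would make /shop/cart resolve to /cart, because the second-segment rule would then match first; this is why the order of rules matters.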

You can also edit existing rules and add new ones as JSON, and add rules of other types such as Elasticsearch and Message Broker. Editing rules as JSON is especially handy for exporting and importing them.

Endpoints can be attached to individual rules of any type. Endpoints are essentially sub-services: the individual operations that make up a service’s API. To begin defining endpoints, click “Add Endpoint” at the bottom of each rule configuration page. See the Endpoint Monitoring page for more details on what endpoints are and how to configure them properly.

Composing Rules

You need to define a Rule Name (any name you like) and then the Match conditions.

A Match condition can be set on Request Path, Host Header, Request Method, and Query Parameters. In all cases, you define capture groups with regular expressions (regex). Capture groups are delimited by parentheses, (). The part of the value matched by each capture group is extracted, and the extracted values can then be referenced in the service name (for example, {path-1} for the first capture group on the request path). You can also configure matching traces to be ignored on the Service Mapper Rule Definitions page: with the “Mark as ignored service” flag enabled, traces matching the service definition are not persisted and their metric definition is ignored, so no service is created. You can also test the rules by clicking Test Rule.

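How capture groups turn into name references can be sketched in Python. Assumptions: each condition is identified by the prefix used in its references (host, path, method, params), all conditions of a rule must match, and the function name `extract_service_name` and the sample host shop.example.com are hypothetical.

```python
import re
from typing import Dict, Optional

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def extract_service_name(
    conditions: Dict[str, str], values: Dict[str, str], label: str
) -> Optional[str]:
    """Hypothetical matcher.

    conditions maps a prefix (e.g. "host", "path") to a regex; values maps the
    same prefix to the observed value. All conditions must match (assumption).
    """
    captures: Dict[str, str] = {}
    for prefix, pattern in conditions.items():
        match = re.match(pattern, values.get(prefix, ""))
        if match is None:
            return None
        # Number the capture groups from 1, as in {path-1}, {host-1}, ...
        for index, group in enumerate(match.groups(), start=1):
            captures[f"{prefix}-{index}"] = group or ""
    # Substitute known references; leave unknown ones untouched.
    return PLACEHOLDER.sub(lambda ref: captures.get(ref.group(1), ref.group(0)), label)


name = extract_service_name(
    conditions={"host": r"^([^.:]+)", "path": r"^/([^/]+)"},
    values={"host": "shop.example.com", "path": "/checkout/pay"},
    label="{host-1}-{path-1}",
)
print(name)  # shop-checkout
```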

Common Service Rules

First path segment as a service name:

[
  {
    "name": "First request path segment as service name",
    "enabled": false,
    "comment": "",
    "matchSpecification": {
      "path": "^/([^/]*)($|/.*)"
    },
    "extractSpecification": {
      "label": "/{path-1}"
    }
  }
]

Second path segment as service name:

[
  {
    "name": "Second request path segment as service name",
    "enabled": false,
    "comment": "",
    "matchSpecification": {
      "path": "^/[^/]+/([^/]*)($|/.*)"
    },
    "extractSpecification": {
      "label": "/{path-1}"
    }
  }
]

Third path segment as service name:

[
  {
    "name": "Third request path segment as service name",
    "enabled": false,
    "comment": "",
    "matchSpecification": {
      "path": "^/[^/]+/[^/]+/([^/]*)($|/.*)"
    },
    "extractSpecification": {
      "label": "/{path-1}"
    }
  }
]

Differentiate singular / plural REST resource names:

[
  {
    "name": "Differentiate singular / plural REST resource name",
    "enabled": false,
    "comment": "",
    "matchSpecification": {
      "path": "^/(article|articles)($|/.*)"
    },
    "extractSpecification": {
      "label": "{path-1} REST resource"
    }
  }
]

EJB bean from method:

[
  {
    "name": "EJB bean from method",
    "enabled": false,
    "comment": "",
    "matchSpecification": {
      "method": "^.*?\\.([^.]+)\\.[^.]+\\(.*$"
    },
    "extractSpecification": {
      "label": "{method-1}"
    }
  }
]
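To check what each of these rules extracts before enabling it, the patterns can be exercised locally. The Python sketch below parses condensed copies of the JSON rules (with shortened, hypothetical rule names) and applies them with Python’s `re` module. It assumes Instana’s regex dialect behaves like Python’s for these simple patterns, which is an assumption rather than a documented guarantee; in the product itself, use Test Rule.

```python
import json
import re
from typing import Dict, Optional

# Condensed copies of the rules above ("enabled" and "comment" omitted).
RULES = json.loads(r"""
[
  {"name": "first",  "matchSpecification": {"path": "^/([^/]*)($|/.*)"},
   "extractSpecification": {"label": "/{path-1}"}},
  {"name": "second", "matchSpecification": {"path": "^/[^/]+/([^/]*)($|/.*)"},
   "extractSpecification": {"label": "/{path-1}"}},
  {"name": "third",  "matchSpecification": {"path": "^/[^/]+/[^/]+/([^/]*)($|/.*)"},
   "extractSpecification": {"label": "/{path-1}"}},
  {"name": "plural", "matchSpecification": {"path": "^/(article|articles)($|/.*)"},
   "extractSpecification": {"label": "{path-1} REST resource"}},
  {"name": "ejb",    "matchSpecification": {"method": "^.*?\\.([^.]+)\\.[^.]+\\(.*$"},
   "extractSpecification": {"label": "{method-1}"}}
]
""")
BY_NAME: Dict[str, dict] = {rule["name"]: rule for rule in RULES}


def try_rule(name: str, value: str) -> Optional[str]:
    """Apply a single-condition rule to a value; None if it does not match."""
    rule = BY_NAME[name]
    ((key, pattern),) = rule["matchSpecification"].items()
    match = re.match(pattern, value)
    if match is None:
        return None
    return rule["extractSpecification"]["label"].replace("{" + key + "-1}", match.group(1))


print(try_rule("second", "/shop/cart/items"))  # /cart
print(try_rule("third", "/shop/cart"))         # None (only two segments)
print(try_rule("plural", "/articles/7"))       # articles REST resource
print(try_rule("ejb", "com.acme.ShopBean.checkout(java.lang.String)"))  # ShopBean
```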

Docker Labels

Docker labels can be used to name services, making it easier to track changes in performance and behavior of separate versions of the same component.

The following screenshot shows an example in which the Docker label com.acme.branch is used to distinguish between different versions of the shop running in a production environment:


Endpoint Configuration

Endpoints can be applied to the various rule types in the Service Mapper, such as Elasticsearch and HTTP. At the bottom of the configuration menu for each rule, there is a list of attached endpoints and the option to define new ones. For example, if a rule matches a service on the request path /myRESTService(/.*|$), endpoints can be assigned to that service to break its calls down into individual operations.

Endpoints are specified paths assigned to a given service. For example, to track all request methods, create a new endpoint with the request method match expression (.*), which matches any method. Predefined placeholders are available for the endpoint name: using {method} {path} results in endpoints such as “GET /myRESTService” and “DELETE /myRESTService”.
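The endpoint example can be modeled in Python to show how the {method} {path} template splits one service into endpoints. This is a hypothetical sketch: the helper names are invented, and it assumes {path} is filled with the observed request path.

```python
import re
from collections import Counter
from typing import List, Tuple

# Service rule and endpoint rule from the example above.
SERVICE_PATH = re.compile(r"/myRESTService(/.*|$)")
ENDPOINT_METHOD = re.compile(r"(.*)")  # matches any request method


def endpoint_name(method: str, path: str, template: str = "{method} {path}") -> str:
    """Fill the predefined {method} and {path} placeholders (hypothetical helper)."""
    return template.replace("{method}", method).replace("{path}", path)


def count_endpoints(calls: List[Tuple[str, str]]) -> Counter:
    counts: Counter = Counter()
    for method, path in calls:
        # Only calls belonging to the service are split into endpoints.
        if SERVICE_PATH.match(path) and ENDPOINT_METHOD.match(method):
            counts[endpoint_name(method, path)] += 1
    return counts


calls = [
    ("GET", "/myRESTService"),
    ("DELETE", "/myRESTService"),
    ("GET", "/myRESTService"),
    ("GET", "/otherService"),
]
print(count_endpoints(calls))
# Counter({'GET /myRESTService': 2, 'DELETE /myRESTService': 1})
```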


Available variables for service mapper

batch

  • job: Define a regular expression to match job names. Capture groups from matches of this regular expression are available in the service name field via the prefix job, e.g. {job-1} references the first capture group.

ejb

  • module: Define a regular expression to match modules. Capture groups from matches of this regular expression are available in the service name field via the prefix module, e.g. {module-1} references the first capture group.

  • app: Define a regular expression to match application names. Capture groups from matches of this regular expression are available in the service name field via the prefix app, e.g. {app-1} references the first capture group.

  • bean: Define a regular expression to match beans. Capture groups from matches of this regular expression are available in the service name field via the prefix bean, e.g. {bean-1} references the first capture group.

  • method: Define a regular expression to match methods. Capture groups from matches of this regular expression are available in the service name field via the prefix method, e.g. {method-1} references the first capture group.

elastic

  • index: Define a regular expression to match indices. Capture groups from matches of this regular expression are available in the service name field via the prefix index, e.g. {index-1} references the first capture group.

  • cluster: Define a regular expression to match clusters. Capture groups from matches of this regular expression are available in the service name field via the prefix cluster, e.g. {cluster-1} references the first capture group.

general

  • host tag: Host tags are split by = or : into a key-value pair, so zone:test would have the value test. To use the value in a service name, use {host.tag-zone}. When a regular expression is used to match parts of the value, they can be accessed with {host.tag-zone-1} (for the first match group).

docker

  • label: To use the value of the Docker label com.amazonaws.ecs.cluster: prod in a service name, use {docker.label-com.amazonaws.ecs.cluster}. When a regular expression is used to match parts of the value, they can be accessed with {docker.label-com.amazonaws.ecs.cluster-1} (for the first match group).

marathon

  • application id: Define a regular expression to match Marathon application IDs. Capture groups from matches of this regular expression are available in the service name field via the prefix marathon.appId, e.g. {marathon.appId-1} references the first capture group.

  • label: To use the value of the Marathon label HAPROXY-GROUP: shop in a service name, use {marathon.label-HAPROXY-GROUP}. When a regular expression is used to match parts of the value, they can be accessed with {marathon.label-HAPROXY-GROUP-1} (for the first match group).

nomad

  • task name: Define a regular expression to match Nomad task names. Capture groups from matches of this regular expression are available in the service name field via the prefix nomad.taskName, e.g. {nomad.taskName-1} references the first capture group.

  • job name: Define a regular expression to match Nomad job names. Capture groups from matches of this regular expression are available in the service name field via the prefix nomad.jobName, e.g. {nomad.jobName-1} references the first capture group.

JVM

  • JVM name: Define a regular expression to match the JVM name, as captured by the JVM sensor. Capture groups from matches of this regular expression are available in the service name field via the prefix jvm.name, e.g. {jvm.name-1} references the first capture group.

http

  • request path: Define a regular expression to match request paths. Capture groups from matches of this regular expression are available in the service name field via the prefix path, e.g. {path-1} references the first capture group.

  • host: Define a regular expression to match HTTP host headers. Capture groups from matches of this regular expression are available in the service name field via the prefix host, e.g. {host-1} references the first capture group.

  • request method: Define a regular expression to match HTTP request methods. Capture groups from matches of this regular expression are available in the service name field via the prefix method, e.g. {method-1} references the first capture group.

  • query params: Define a regular expression to match HTTP query parameters. Capture groups from matches of this regular expression are available in the service name field via the prefix params, e.g. {params-1} references the first capture group. Query parameters are captured in the form key=value&otherKey=otherValue.

  • status code: Define a regular expression to match HTTP status codes. Capture groups from matches of this regular expression are available in the service name field via the prefix status, e.g. {status-1} references the first capture group.

message broker

  • Destination / Queue / Topic: Define a regular expression to match destinations, queues, or topics. Capture groups from matches of this regular expression are available in the service name field via the prefix destination, e.g. {destination-1} references the first capture group.
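The placeholder syntax above can be modeled with a small Python sketch, shown here for the general (host tag) and docker variables. It is a hypothetical model: the function names, the sample tags and label values, and the choice to split a host tag at its first = or : are assumptions, and Instana resolves these variables in its backend rather than through any code you write.

```python
import re
from typing import Dict, List, Optional

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def parse_host_tags(tags: List[str]) -> Dict[str, str]:
    """Split each host tag into key and value at the first '=' or ':' (assumption)."""
    parsed: Dict[str, str] = {}
    for tag in tags:
        parts = re.split(r"[=:]", tag, maxsplit=1)
        if len(parts) == 2:
            parsed[parts[0].strip()] = parts[1].strip()
    return parsed


def build_context(
    host_tags: List[str],
    docker_labels: Dict[str, str],
    value_patterns: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Collect the values that {host.tag-...} and {docker.label-...} can reference."""
    context: Dict[str, str] = {}
    for key, value in parse_host_tags(host_tags).items():
        context[f"host.tag-{key}"] = value
    for key, value in docker_labels.items():
        context[f"docker.label-{key}"] = value
    # Optional regexes on a value expose their groups as <name>-1, <name>-2, ...
    for name, pattern in (value_patterns or {}).items():
        match = re.match(pattern, context.get(name, ""))
        if match:
            for index, group in enumerate(match.groups(), start=1):
                context[f"{name}-{index}"] = group or ""
    return context


def render(template: str, context: Dict[str, str]) -> str:
    """Replace known placeholders; leave unknown ones as they are."""
    return PLACEHOLDER.sub(lambda ref: context.get(ref.group(1), ref.group(0)), template)


context = build_context(
    host_tags=["zone:eu-west-1", "env=prod"],
    docker_labels={"com.amazonaws.ecs.cluster": "shop-cluster"},
    value_patterns={"host.tag-zone": r"^([a-z]+)-"},
)
print(render("{docker.label-com.amazonaws.ecs.cluster}-{host.tag-zone-1}", context))
# shop-cluster-eu
print(render("{host.tag-env}", context))  # prod
```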

Terminology

Service

In an SQM system, a service is a first-class citizen. Instana automatically discovers services, but also lets the user define how services are discovered (see Service Configuration).

Service Endpoints

Each service can have different types of traces, which we call service endpoints. They are the API, or contract, of the service.

Service Group

Services can be semantically grouped based on their role and dependencies. Instana automatically groups services according to their context. Users can group the services manually based on tags or any other parameter (e.g. container labels), and they can be sorted hierarchically.

Service Quality

For each service, service endpoint, and service group, Instana calculates, analyzes, and predicts the quality. Service quality cannot be measured by a simple threshold on a metric, because the right threshold is a moving target that depends on other variables. For example, the number of requests to a service can depend on the number of users visiting the corresponding website, which in turn varies with time, date, commercials shown on TV, Facebook ads, and many other unpredictable factors. We therefore take a KPI approach to measuring the health of a service, adapted from Google’s Four Golden Signals:

  • Load – measures how much demand/traffic is on the service. Normally measured in requests/sec.
  • Latency – response time of the service requests that have no error. Normally measured in milliseconds.
  • Errors – number of errors. Can be measured as errors per second or as the percentage of requests that resulted in an error.
  • Instances – the number of instances of a service. Can be number of containers that are delivering the same service, or number of Tomcat application servers that have a service deployed.

Instana automatically measures these KPIs for every detected service, service endpoint, and service group, and applies machine learning to them to determine the health of each service.
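As a rough illustration of the raw KPIs (not of the machine learning Instana applies on top of them), the Python sketch below computes them over one time window. Assumptions: latency is summarized as a mean, errors as a share of all calls, instances are counted from the calls seen in the window rather than from infrastructure data, and the `Call` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    instance_id: str   # hypothetical identifier of the serving instance
    duration_ms: float
    error: bool


def service_kpis(calls: List[Call], window_seconds: float) -> Dict[str, float]:
    """Compute simple versions of the four KPIs over one time window."""
    ok_durations = [call.duration_ms for call in calls if not call.error]
    error_count = sum(1 for call in calls if call.error)
    return {
        # Load: calls per second in the window.
        "load_per_s": len(calls) / window_seconds,
        # Latency: mean response time of calls without an error.
        "latency_ms": sum(ok_durations) / len(ok_durations) if ok_durations else 0.0,
        # Errors: share of calls that failed.
        "error_rate": error_count / len(calls) if calls else 0.0,
        # Instances: distinct instances that served calls in the window.
        "instances": float(len({call.instance_id for call in calls})),
    }


calls = [
    Call("tomcat-a", 100.0, False),
    Call("tomcat-a", 300.0, False),
    Call("tomcat-b", 50.0, True),
    Call("tomcat-b", 200.0, False),
]
print(service_kpis(calls, window_seconds=2.0))
# {'load_per_s': 2.0, 'latency_ms': 200.0, 'error_rate': 0.25, 'instances': 2.0}
```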

Application Map

Based on the collected traces, Instana extracts services and their dependencies. If Service A calls Service B, the two are connected in the service map, which visualizes the discovered services and their real-time interactions. Within each trace, every service involved has a concrete service context: the actual interaction path, including the specific service instances involved.
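How services and their dependencies can be derived from traces is sketched below in Python: whenever a span’s parent belongs to a different service, the parent’s service is recorded as calling the child’s service. The `Span` fields and the sample services are simplifications chosen for illustration; real spans carry far more data.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]
    service: str


def service_dependencies(spans: List[Span]) -> Set[Tuple[str, str]]:
    """Return (caller, callee) service pairs found in one or more traces."""
    by_id = {span.span_id: span for span in spans}
    edges: Set[Tuple[str, str]] = set()
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id is not None else None
        # A parent span in a different service means "parent service calls this one".
        if parent is not None and parent.service != span.service:
            edges.add((parent.service, span.service))
    return edges


trace = [
    Span(1, None, "shop"),
    Span(2, 1, "productsearch"),
    Span(3, 1, "payment"),
    Span(4, 2, "productsearch"),  # internal call, no new edge
    Span(5, 4, "elasticsearch"),
]
print(sorted(service_dependencies(trace)))
# [('productsearch', 'elasticsearch'), ('shop', 'payment'), ('shop', 'productsearch')]
```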

Distributed Tracing

At the core of service discovery is the concept of Distributed Tracing. Instana’s discovery is based on a Distributed Tracing approach originally described by a team at Google, who published their internal tracing concepts in the Dapper paper.

For more details, please refer to our Tracing documentation.

OpenTracing

To support its Distributed Tracing model, Instana uses OpenTracing, an open, vendor-neutral API. Because it helps avoid vendor lock-in, OpenTracing is a very common basis for tracing. Instana strives to give customers flexibility in how traces are captured by our system, and we treat every trace equally, regardless of its origin. OpenTracing is an option for customers who want more control over exactly what is traced. Our automatic instrumentation is fully compatible with our OpenTracing implementation.

For more details, please refer to our OpenTracing documentation.

Trace

A Trace is the flow of interactions between services: it represents the transaction or request workflow through a distributed system. Traces help troubleshoot performance problems by providing method- and code-level details, such as execution times, of monitored applications. A trace is represented as a directed graph of spans. Traces are highly navigable: from any service you can access all traces that pass through it, including traces in which the service appears in the middle or at the end of the call chain. Traces are also directly accessible from the database calls they contain.

For more details, please refer to our Tracing documentation.

Span

A Span is a timespan of executed code with attached attributes such as timestamp and duration. Different types of spans can carry one or several sets of these attributes, along with metadata annotations. Spans are the building blocks of traces and form a hierarchy. Each span has a unique 64-bit identifier, which serves as the reference between “parent” and “child” spans within a trace. The first span in a trace serves as the root, and its identifier doubles as the trace identifier.
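The relationship between span and trace identifiers can be sketched in Python. The sketch assumes randomly generated 64-bit IDs (drawn with the standard `secrets` module) and a hypothetical `Span` type; it does not describe how Instana’s tracers actually generate IDs.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str
    trace_id: int
    span_id: int
    parent_id: Optional[int] = None


def new_id() -> int:
    """Random 64-bit identifier; collisions are possible but very unlikely."""
    return secrets.randbits(64)


def start_root(name: str) -> Span:
    span_id = new_id()
    # The root span's identifier doubles as the trace identifier.
    return Span(name, trace_id=span_id, span_id=span_id)


def start_child(parent: Span, name: str) -> Span:
    # Children inherit the trace ID and reference their parent's span ID.
    return Span(name, trace_id=parent.trace_id, span_id=new_id(), parent_id=parent.span_id)


def print_tree(spans: List[Span]) -> None:
    children: Dict[Optional[int], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    def walk(parent_id: Optional[int], depth: int) -> None:
        for span in children.get(parent_id, []):
            print("  " * depth + f"{span.name} [{span.span_id:016x}]")
            walk(span.span_id, depth + 1)

    walk(None, 0)


root = start_root("GET /shop")
search = start_child(root, "GET /productsearch")
query = start_child(search, "elasticsearch query")
print_tree([root, search, query])
```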

For more details, please refer to our Tracing documentation.

FAQ

When are services created and when are they removed from the map?

Services are created when they receive traffic, meaning the service is traced and a span has been recorded. A service is removed (expired)

  • after 15 minutes without traffic if the underlying infrastructure (e.g. a JVM) is also offline
  • after 2 hours without traffic if the underlying infrastructure (e.g. a JVM) is still online

You can go back in time with Dynamic Focus to access an expired service.
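The expiry rule reduces to a small decision, sketched here in Python with a hypothetical `is_expired` helper. Treating a service as expired exactly at the 15-minute or 2-hour mark is an assumption; the FAQ does not specify the boundary.

```python
from datetime import timedelta

# Expiry windows stated in this FAQ.
EXPIRY_WHEN_OFFLINE = timedelta(minutes=15)
EXPIRY_WHEN_ONLINE = timedelta(hours=2)


def is_expired(time_without_traffic: timedelta, infrastructure_online: bool) -> bool:
    """Hypothetical check; treating the exact boundary as expired is an assumption."""
    limit = EXPIRY_WHEN_ONLINE if infrastructure_online else EXPIRY_WHEN_OFFLINE
    return time_without_traffic >= limit


print(is_expired(timedelta(minutes=20), infrastructure_online=False))  # True
print(is_expired(timedelta(minutes=20), infrastructure_online=True))   # False
```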

My HTTP service receives traffic, but it does not seem to be created. Why is that?

This happens if all HTTP calls to a service are responded to with HTTP status codes outside of the 200-299 range. A service entity only gets created if it receives at least one call to which it responds with HTTP 2xx. This is a necessary protection against flooding our customers with service entities when they are hit by a URL scanner. As a consequence, the “Service” column in the trace list view is empty for services that never respond with 2xx.

Once a service has been created by a successful call, all calls to it (4xx and 5xx calls, too) are counted towards the service’s metrics.
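The creation rule can be modeled with a short Python sketch. The `ServiceRegistry` class is hypothetical, and it assumes that calls arriving before a service exists are simply not attributed to any service, consistent with the empty “Service” column described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceRegistry:
    """Hypothetical model of the service creation rule."""
    calls_by_service: Dict[str, List[int]] = field(default_factory=dict)

    def record(self, service: str, status: int) -> bool:
        """Record an HTTP call; return True if it counts toward a service."""
        if service not in self.calls_by_service:
            if not 200 <= status <= 299:
                return False  # no 2xx yet: the service is not created
            self.calls_by_service[service] = []
        # Once the service exists, every call counts, including 4xx and 5xx.
        self.calls_by_service[service].append(status)
        return True


registry = ServiceRegistry()
print(registry.record("/admin", 404))  # False: scanner-style traffic creates nothing
print(registry.record("/shop", 200))   # True: first 2xx creates the service
print(registry.record("/shop", 500))   # True: later errors are counted
print(registry.calls_by_service)       # {'/shop': [200, 500]}
```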