Infrastructure Monitoring

Concepts

Infrastructure (physical, virtual, cloud, hybrid, or containerized) is the underlying layer that provides resources and services to applications. Monitoring infrastructure poses several challenges:

  • Installing monitoring software in a way that scales and requires little maintenance;
  • Discovering all relevant components and their configuration, and keeping that data current;
  • Tracking changes;
  • Relating infrastructure issues to application impact, and vice versa.

Modern architectures in particular change constantly and rely on concepts such as clustering and autoscaling to ensure reliability, scalability, and flexibility. Instana ensures that the infrastructure is monitored and represented as it is at any given time. All issues and changes detected on the infrastructure are continuously related to issues and incidents at the application and end-user level, giving users a comprehensive understanding of all the parts that deliver the application.

Usage

Infrastructure Map

The Infrastructure Map provides an overview of all monitored systems, grouped into named zones (two-dimensional colored rectangles). Within each zone are pillars composed of opaque blocks. Each pillar as a whole represents one agent running on the respective system. Each block within a pillar represents a software component running on that system and changes color to reflect the component’s health. Components can be processes (such as a JVM or an Apache process) or specific servers running within those processes, such as a Tomcat server within a JVM.

To zoom in and out use the +/- buttons in the lower right corner, or your mouse wheel.

infrastructure 1

The map is broken out into a rough hierarchy from broad to specific: Zone → System/Host → Processes → Components.
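The hierarchy can be pictured as a simple nested data model. The following is only an illustrative sketch: the class names, the `walk` helper, and the sample values are hypothetical and are not part of Instana or its API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


# Hypothetical classes that mirror the map hierarchy:
# Zone -> System/Host -> Processes -> Components.
@dataclass
class Component:
    name: str


@dataclass
class Process:
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class Host:
    name: str
    processes: List[Process] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    hosts: List[Host] = field(default_factory=list)


def walk(zone: Zone) -> Iterator[Tuple[str, str, str, str]]:
    """Yield one (zone, host, process, component) path per component."""
    for host in zone.hosts:
        for process in host.processes:
            for component in process.components:
                yield (zone.name, host.name, process.name, component.name)


# Made-up example: a Tomcat server running inside a JVM on one host.
zone = Zone("zone-a", [Host("web-01", [Process("JVM", [Component("Tomcat")])])])
paths = list(walk(zone))
```

Each path reads from broad to specific, in the same order in which you drill down on the map.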

As described above, each pillar represents an OS instance (a.k.a. a host), and the blocks within it represent all running processes or components. The default grouping of entities into zones can be defined either automatically or manually, as described in configuring the agent - zones. However, users can temporarily reorganize the zoning with the button in the bottom right. A list of different groupings, based on container technologies, will be available.

Zooming in on a specific pillar reveals visual indicators of the various entities deployed on that system as discovered by Instana’s Agent. They show up as the branded icons running down the right side of the pillar.

In its role as monitoring software, Instana automatically tracks all versions of an entity, including configuration changes, even if the entity briefly goes offline and back online. In dynamic environments, it is normal for entities to not always be online, so knowing when a specific entity is offline means Instana can tell you which entities are actually available for selection.

infrastructure 7

Search Options

The search bar at the top of the infrastructure views is part of the Dynamic Focus feature. It is such a central part of the product that it has its own documentation. Head over there to learn more!

Changing the view

In the lower right corner of the screen you will find a set of icons that help you get the system information you need. The default view is component-based and shows you all monitored boxes in their respective zones, as well as the components running inside them. You can see which entities are running within an OS instance and whether there are any incidents, events, or changes (see the color change in the above image).

What if I have hundreds of servers and I want to know if there are any with high CPU cycles? Or what if I am simply overwhelmed with the number of servers?

The solution is to change the view using the icons for Tag View, Aggregate Metric View, Metric View & 3D, and Table toggle.

The next section explains these different views in greater detail.

Aggregate Metrics

By default, values such as latency or CPU usage are displayed as a single value taken at the end of the selected time window. Because live view is the default, you will usually see only the momentary status of a selected metric. To view metrics aggregated over the selected time period (e.g. one day), use the aggregate metrics view toggle. In this mode, all displayed metrics show the value averaged over the selected time period, so you can see, for example, the average CPU usage over 24 hours.
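To make the difference between the two modes concrete, here is a minimal sketch that computes both values from the same samples. The function names and the sample data are made up for illustration, and the plain arithmetic mean is an assumption about what "averaged" means; this is not Instana's implementation.

```python
from statistics import mean

# Made-up (timestamp_in_seconds, cpu_usage_percent) samples.
samples = [(0, 20.0), (60, 40.0), (120, 90.0), (180, 30.0)]


def latest_value(samples, window_start, window_end):
    """Default view: the single value at the end of the time window."""
    in_window = [(t, v) for t, v in samples if window_start <= t <= window_end]
    if not in_window:
        return None
    return max(in_window, key=lambda sample: sample[0])[1]


def averaged_value(samples, window_start, window_end):
    """Aggregate view: the mean of all values inside the time window."""
    values = [v for t, v in samples if window_start <= t <= window_end]
    return mean(values) if values else None


momentary = latest_value(samples, 0, 180)    # 30.0
averaged = averaged_value(samples, 0, 180)   # 45.0
```

In this example the momentary value hides the spike to 90 percent, while the averaged value reflects it.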

Metric View - Show Metrics

When you look at the dashboard of a Linux box, for example, you will see metrics like CPU usage or the load average. These are extremely useful metrics, and you can view them for several systems at once to spot which ones are overloaded or stand out, even when there is no warning present.

You can do this by using the metric view. To enable it, click the Show Metrics icon (to the right of the Show Tags icon) and select the metric you would like to see on all the boxes. At the moment, you can select one of three metrics:

  1. CPU Load
  2. CPU Usage
  3. Free Memory

Once you select a metric, it will be displayed in real time on the various boxes on the map. This enables you to see if, for example, the CPU usage or load is more or less equally spread throughout a deployment zone. The icon itself will be highlighted as long as a metric is selected to remind you that it is turned on.

To disable the metric view and switch back to the default component view, simply click the icon and select reset in the lower right corner.

Comparison Table

The 3D map is really nice (actually it is awesome), and it is a great help in getting an overview. However, sometimes a little more structure can be helpful too. We have taken care of that by introducing the comparison table. As the name states, all information will be presented in a table sorted by zones. This makes it easy to sort and find the servers and boxes you would like to look at.

To enable it, use the infrastructure dropdown menu at the top of the screen. The table lists all hosts and their main metrics in columns. You can sort by each of those columns and get a comprehensive overview of your landscape in terms of CPU usage or memory consumption.

But wait, there’s more. At the top of the screen there is a small dropdown menu that lets you display metrics over time for selected hosts. To select a host, click its row in the table; clicking it again deselects it. This way you can select all relevant hosts at once.
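If it helps to think of the comparison table as data, sorting by a column amounts to the following sketch. The row layout, column names, and values are hypothetical and only stand in for what the table shows.

```python
# Made-up rows, one per host, with two of the main metrics as columns.
rows = [
    {"host": "web-01", "zone": "zone-a", "cpu_usage": 35.0, "memory_used": 60.0},
    {"host": "web-02", "zone": "zone-a", "cpu_usage": 92.5, "memory_used": 40.0},
    {"host": "db-01", "zone": "zone-b", "cpu_usage": 12.0, "memory_used": 85.0},
]


def sort_rows(rows, column, descending=True):
    """Return the rows ordered by one column, like clicking a column header."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Hosts with the highest CPU usage first.
by_cpu = [row["host"] for row in sort_rows(rows, "cpu_usage")]

# Hosts with the lowest memory consumption first.
by_memory = [row["host"] for row in sort_rows(rows, "memory_used", descending=False)]
```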

Shortcuts

Some important functions of Instana can be activated using keyboard shortcuts.

  • c : center the map
  • f : open and focus the search bar
  • esc : close the sidebar / dashboard that is currently visible
  • shift + ? : open the help dialog for shortcuts

Terminology

Infrastructure monitoring is visualized through the Infrastructure Map. Briefly, the map is composed of Zones, Entities, and Tags, which are explained in more detail below.

Entity Overview

An entity can be the OS instance itself (the host), or any monitored process running on it. Each entity has its own overview sidebar containing all the details that will appear when you click on it.

A feature called Breadcrumbs presents a stacked view of the selected entity’s relationship with the entities around it. It is a light gray box on the right-hand side of the overview sidebar, filled with the insignias of the related entities. If a related entity has a problem, its insignia shows its health in the corresponding color, making it easier to understand how the current problem affects the hierarchy of processes.

To close the overview again, click anywhere else on the screen (beware: there is no close button, so don’t look for it).

Entity Dashboard

Clicking on the dashboard button will lead you to the detailed values over time. The resolution (the amount of time rolled up into one datapoint) depends on the timeframe you selected, ranging from one second up to 10 minutes per datapoint.
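As a rough illustration of what rolling up means, the sketch below groups one-second samples into buckets of a chosen resolution. The `roll_up` function is hypothetical, and averaging each bucket is an assumption made for the example; how Instana actually combines the values is not described here.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, resolution_seconds):
    """Combine (timestamp, value) samples into one datapoint per bucket.

    Each datapoint covers `resolution_seconds` and holds the mean of the
    samples that fall into it (an assumption for this sketch).
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - (timestamp % resolution_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# 20 minutes of made-up one-second samples.
per_second = [(t, float(t)) for t in range(1200)]

# At 10 minutes (600 seconds) per datapoint, only two datapoints remain.
rolled = roll_up(per_second, 600)
```

A short timeframe keeps every one-second sample as its own datapoint, while a long timeframe collapses many samples into each datapoint.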

At the top of the graph, you’ll notice a list of the available groups of metrics, where the ones currently visible are shaded in light gray.

Click on any of those groups to scroll to the desired section and get visibility.

Depending on the type of entity you selected, there will be more information on the left-hand side. For a physical host, there is the name, IP address, number of CPUs, etc., as well as a list of all the software entities running on this machine.

Comparison Table

top, htop, and similar utilities are very helpful day-to-day tools for administrators. The process comparison table provides a top-like list that helps you understand resource utilization across many hosts or on a single host. And of course, the search functionality can be used to restrict the number of matches as you see fit.

infrastructure 6

Tags

Instana users can add one or more tags to each agent in order to organize their systems logically. Please refer to agent configuration - tags for further information on how to define tags. Current tags can be found with either Dynamic Focus or the “Show Tags” button.

When a tag is clicked, only the systems associated with that tag are shown on the Infrastructure Map. Clicking the same tag again deselects it and reverts the map to the whole picture. Multiple tags are combined with AND, so that only the systems associated with ALL selected tags are shown.
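The AND behavior can be sketched as a subset check. The host names, tags, and the `filter_by_tags` helper below are made up for illustration and are not an Instana API.

```python
# Made-up hosts with the tags assigned to them.
hosts = {
    "web-01": {"prod", "frontend"},
    "web-02": {"staging", "frontend"},
    "db-01": {"prod", "database"},
}


def filter_by_tags(hosts, selected_tags):
    """Return the hosts that carry ALL of the selected tags."""
    selected = set(selected_tags)
    # `selected <= tags` is true when every selected tag is in the host's tags.
    return sorted(name for name, tags in hosts.items() if selected <= tags)


one_tag = filter_by_tags(hosts, ["prod"])
two_tags = filter_by_tags(hosts, ["prod", "frontend"])
no_tags = filter_by_tags(hosts, [])
```

Selecting a second tag can only narrow the result, and with no tag selected every host is shown again.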

Tags can be used to organize not just agents, but also hosts. Users can define a prefix that differentiates the groups of hosts. In the below example, hostGroup= is a configured prefix that will sort all the hosts pertaining to the demo.

infrastructure 9

To define individual prefixes, open the Grouping Menu, select Tag Prefix at the bottom of the list, and type the desired prefix into the text field. The map will automatically group the hosts according to that tag. In the event of multiple matches, Instana gives priority to the first tag and ignores the rest.

infrastructure 12
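The grouping rule, including the "first match wins" behavior, can be sketched as follows. The `group_by_prefix` helper, the host names, and the tag values are hypothetical; in particular, the "ungrouped" bucket for hosts without a matching tag is an assumption made for this example.

```python
def group_by_prefix(hosts, prefix):
    """Group hosts by the value of their first tag that starts with `prefix`."""
    groups = {}
    for name, tags in hosts.items():
        # Only the first matching tag counts; later matches are ignored.
        match = next((tag for tag in tags if tag.startswith(prefix)), None)
        # "ungrouped" is a placeholder chosen for this sketch.
        group = match[len(prefix):] if match is not None else "ungrouped"
        groups.setdefault(group, []).append(name)
    return groups


# Made-up hosts with their tags in the order they were defined.
hosts_with_tags = {
    "web-01": ["hostGroup=demo", "hostGroup=legacy"],
    "web-02": ["env=prod", "hostGroup=demo"],
    "db-01": ["env=prod"],
}

groups = group_by_prefix(hosts_with_tags, "hostGroup=")
```

Here web-01 lands in the demo group because its first matching tag is hostGroup=demo, and its second match is ignored.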

To remind you that your view is not complete, the Show Tags icon will be highlighted as long as a tag is selected. If you cannot find a system on the screen, check whether a tag is selected.

Zone

Zones in the infrastructure view have their own dashboards that give users an overview of that particular zone. Click on the colored tag labeling the zone to bring up a sidebar, and then click on Open Dashboard. You are presented with a list of the hosts inside the selected zone, along with information on each, such as general health and KPI metrics. The list can be sorted by information type, and you can drill into individual hosts to see entity-specific data.

infrastructure 4

Details of monitored hosts are available for both custom and auto-discovered zones. However, for unmonitored hosts, only the hostname and IP are available.

Zones are automatically recognized if possible (e.g. AWS availability zone) and can also be set manually, as described in configuring the agent - zones.