Tracing

Motivation

A typical business enterprise application runs on more than one machine (or virtual machine) because it is split into several parts (for example FrontEndServer and BackEndServer). inspectIT can monitor each part on its own. That means you can see invocation sequences for the FrontEndServer and invocation sequences for the BackEndServer, but you cannot combine them into one big trace, because you do not know which ones belong together. It is natural to want these sequences combined automatically, so that a whole transaction can be followed across different machines. This page describes how inspectIT implements remote JVM tracing.

Design

There are significant design decisions that have been made for implementing the remote tracing in inspectIT:

We will continue to use our sensor approach for inserting / reading the tracing information.
inspectIT will implement the opentracing.io API, as it is becoming a standard in tracing. We also want to give users who already use the opentracing.io API more flexibility and the option to switch to inspectIT.
inspectIT will provide its opentracing.io API implementation as part of the agent SDK.
Internally, the inspectIT sensors will use the new tracer implementation, including functionality that might not be available in the public SDK implementation.

Concepts

In order to create the best concept for remote tracing, we analyzed all the available tracing libraries (Zipkin, Brave, Apache HTrace). We created a model that best fits our needs, but took concepts and the best ideas from the existing libraries, especially the ones defined by opentracing.io. We also analyzed the work by Thomas Kluge in [OLD] JVM Crossing Invocation Sequences and concluded that his approach cannot be adopted because of the limitations it imposes:

No correlation between remote data instances; the invocation was treated as the glue
No information on where a trace starts (root unknown)
No differentiation between types of relationships
No support for “in-process tracing”
Questionable design

We adopted the Span as the main entity in the tracing representation. A trace is effectively a collection of spans, where each span (except the root) has a parent and a reference type to its parent (child of or follows from). All spans belonging to the same trace have the same trace ID, but different span IDs. The parent ID defines the relationships. We don't keep the order of child span invocations, but we can fall back on the start time-stamp if needed. All IDs are 64-bit long values, and each span creation requires generating only one random long (a root span uses the same long value for its ID, trace ID and parent ID). This concept gives us the following:

The span identification defines which span is the root one
All spans in one trace have the same trace ID and are thus correlated
Every span holds its own duration and time-stamp and is thus independent with respect to timings
Reference spans (client → server) exist on both sides of a call; they identify where the call occurred and provide timing information from both the client and the server view
Tags provide additional information (URL, message destination, etc.)
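
The identification rules above can be sketched roughly as follows. This is only an illustration under our own assumptions: SpanIdentSketch and its methods are hypothetical names and not the actual inspectIT SpanIdent class. It shows that a single random long per span is enough, and that a root span can be recognized from its identification alone.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical, simplified span identification; not the real inspectIT SpanIdent. */
final class SpanIdentSketch {
    final long id;
    final long traceId;
    final long parentId;

    private SpanIdentSketch(long id, long traceId, long parentId) {
        this.id = id;
        this.traceId = traceId;
        this.parentId = parentId;
    }

    /** Root span: one random long is reused for the ID, the trace ID and the parent ID. */
    static SpanIdentSketch root() {
        long random = ThreadLocalRandom.current().nextLong();
        return new SpanIdentSketch(random, random, random);
    }

    /** Child span: one new random long for the ID; the trace ID is inherited from the parent. */
    static SpanIdentSketch childOf(SpanIdentSketch parent) {
        long random = ThreadLocalRandom.current().nextLong();
        return new SpanIdentSketch(random, parent.traceId, parent.id);
    }

    /** A span is the root of its trace when all three values are equal. */
    boolean isRoot() {
        return id == traceId && id == parentId;
    }
}
```

Calling SpanIdentSketch.childOf(root) repeatedly yields spans that share the trace ID of the root and point back to it through the parent ID.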

Spans are independent of other inspectIT data entities, but invocation sequences do record which span they belong to (by remembering its identification). This basically means that we have spans for a high-level overview, a collection of invocations referring to one span for a more detailed view, and finally the data inside an invocation for even more detail. This design decision allows us to display the trace overview and spans without any other inspectIT data types. Only for a deep drill-down do we need to load the invocations that refer to the trace/span. With this approach we can always answer the following questions:

Spans

Give me all root spans
Give me all spans that belong to a given trace
Give me the parent of a span
Give me the children of a span

Invocations
- Which span do I belong to
- Which remote calls do I make
- Give me all spans that are in the same trace as I am
- Give me all available invocations that are in the same trace as I am
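
As an illustration of why the three IDs are sufficient for the span questions above, here is a minimal in-memory sketch. SpanRecord and SpanStoreSketch are hypothetical names introduced only for this example; they do not reflect how inspectIT actually stores or indexes spans, and the linear scans are chosen for brevity, not performance. The invocation questions would work the same way, using the span identification remembered by each invocation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical minimal span: only the three IDs described on this page. */
final class SpanRecord {
    final long id;
    final long traceId;
    final long parentId;

    SpanRecord(long id, long traceId, long parentId) {
        this.id = id;
        this.traceId = traceId;
        this.parentId = parentId;
    }

    boolean isRoot() {
        return id == traceId && id == parentId;
    }
}

/** Hypothetical in-memory store that answers the span questions by scanning a list. */
final class SpanStoreSketch {
    private final List<SpanRecord> spans = new ArrayList<>();

    void add(SpanRecord span) {
        spans.add(span);
    }

    /** Give me all root spans. */
    List<SpanRecord> rootSpans() {
        return spans.stream().filter(SpanRecord::isRoot).collect(Collectors.toList());
    }

    /** Give me all spans that belong to a given trace. */
    List<SpanRecord> spansOfTrace(long traceId) {
        return spans.stream().filter(s -> s.traceId == traceId).collect(Collectors.toList());
    }

    /** Give me the parent of a span (empty for a root span). */
    Optional<SpanRecord> parentOf(SpanRecord span) {
        if (span.isRoot()) {
            return Optional.empty();
        }
        return spans.stream()
                .filter(s -> s.traceId == span.traceId && s.id == span.parentId)
                .findFirst();
    }

    /** Give me the children of a span (a root is excluded, as it references itself). */
    List<SpanRecord> childrenOf(SpanRecord span) {
        return spans.stream()
                .filter(s -> !s.isRoot() && s.traceId == span.traceId && s.parentId == span.id)
                .collect(Collectors.toList());
    }
}
```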

Relation to opentracing.io

Most of the concepts related to tracing are taken from opentracing.io, so please refer to their Concepts and Terminology first. In addition to the above concepts, we made several decisions with respect to spans:

opentracing.io spans are not a perfect fit for inspectIT, as they lack information about the agent on which the span was created and about the method ID or sensor ID. Thus, opentracing.io-like spans are transformed into ones that extend the MethodSensorData class in inspectIT, and the span context is transformed into SpanIdent instances. These inspectIT-like spans are not available in the SDK project, only in the inspectIT shared projects.
opentracing.io lets the concrete implementation define the relation between the span making a call on one JVM and the span receiving the call on the other JVM. Many libraries chose to use the same identification IDs for both the calling span (client) and the called span (server). We decided on normal child-parent relationships (the server span is a child of the client span), because one call can produce more than one span on the receiving end (for example an internally forwarded HTTP request).
opentracing.io does not force an implementation to provide a sampling rate for tracing frequency. For now inspectIT does not have any sampler and traces every single request. In the future this must change in order to decrease the amount of data captured on the agents.
The developed tracer is thread-context aware, which means that it remembers the order of created spans per thread. Thus, when a new span is created, its parent will by default be the most recently created span of that thread that has not finished yet.
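
The thread-context awareness from the last point can be pictured with a per-thread stack of unfinished spans. The following is a minimal sketch under our own assumptions, not the inspectIT tracer: ThreadSpan and ThreadAwareTracerSketch are hypothetical names, and the real tracer implements the opentracing.io tracer interface, which this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical minimal span holding only its identification. */
final class ThreadSpan {
    final long id;
    final long traceId;
    final long parentId;

    ThreadSpan(long id, long traceId, long parentId) {
        this.id = id;
        this.traceId = traceId;
        this.parentId = parentId;
    }
}

/** Hypothetical tracer that remembers unfinished spans per thread. */
final class ThreadAwareTracerSketch {
    // Each thread gets its own stack; the top is the latest unfinished span.
    private final ThreadLocal<Deque<ThreadSpan>> active = ThreadLocal.withInitial(ArrayDeque::new);

    /** Starts a span whose parent is the latest unfinished span of the current thread. */
    ThreadSpan start() {
        Deque<ThreadSpan> stack = active.get();
        ThreadSpan parent = stack.peek();
        long id = ThreadLocalRandom.current().nextLong();
        ThreadSpan span = (parent == null)
                ? new ThreadSpan(id, id, id)                     // no active span: new root
                : new ThreadSpan(id, parent.traceId, parent.id); // child of the active span
        stack.push(span);
        return span;
    }

    /** Finishes a span, so it is no longer used as a default parent on this thread. */
    void finish(ThreadSpan span) {
        active.get().remove(span);
    }
}
```

With this structure, a span started while another one is still open on the same thread becomes its child without the caller passing any parent explicitly; once all spans on a thread are finished, the next one starts a new trace.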

Remote sensors

In order to keep the design flexible and make it easy to add support for new remote communication frameworks on the agent, we developed a set of remote sensors. The following rules apply:

There are client and server remote sensors, so we can differentiate where a remote call is made / received
One remote sensor is created for each framework we want to support (for example RemoteClientApacheHttpRemoteSensor is responsible for the HTTP client requests made with the Apache HTTP client).
There is only one remote client hook and one remote server hook; they are reused by all remote sensors
The client remote hook uses ClientInterceptor to intercept the client request
The server remote hook uses ServerInterceptor to intercept the server request
Client and server interceptors are independent of the remote technology/framework and are responsible for creating spans using the inspectIT tracer (which implements the opentracing.io tracer interface)
To stay independent, the interceptors must be invoked with request/response adapters that provide information about the request (URL, HTTP status, JMS message destination) and provide opentracing.io carriers that can transfer span context information beyond the JVM boundaries
Remote sensors are responsible for initializing the remote hooks with the correct adapters, based on the framework they are created for (RemoteClientApacheHttpRemoteSensor will provide the HttpRequestAdapter that uses the Apache HttpRequest for reading the data and transferring the IDs in the headers)

With this design approach it's easy to add, for example, an implementation for a new HTTP client framework. The only thing needed is basically an implementation of HttpRequest that corresponds to that specific framework and does the actual information reading, header setting, etc. Everything else is already available via the ClientInterceptor and HttpRequestAdapter/HttpResponseAdapter.
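
To make the split between interceptors and adapters more concrete, here is a heavily simplified sketch. Everything in it is an assumption made for illustration: RequestAdapterSketch, MapRequestAdapter, InterceptorSketch and the header names are hypothetical and do not match the real ClientInterceptor, ServerInterceptor or HttpRequestAdapter signatures. Spans are reduced to a long array holding {id, traceId, parentId}, and one adapter object stands in for the request that would really travel between two JVMs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical adapter: the only view of a request that the interceptors get. */
interface RequestAdapterSketch {
    String getUrl();
    String getHeader(String name);
    void setHeader(String name, String value);
}

/** Hypothetical adapter for an imaginary framework whose request is a URL plus a header map. */
final class MapRequestAdapter implements RequestAdapterSketch {
    private final String url;
    private final Map<String, String> headers = new HashMap<>();

    MapRequestAdapter(String url) {
        this.url = url;
    }

    public String getUrl() {
        return url;
    }

    public String getHeader(String name) {
        return headers.get(name);
    }

    public void setHeader(String name, String value) {
        headers.put(name, value);
    }
}

/** Hypothetical framework-independent interceptor logic. */
final class InterceptorSketch {
    // Assumed header names, chosen only for this example.
    static final String TRACE_ID = "x-sketch-trace-id";
    static final String SPAN_ID = "x-sketch-span-id";

    /** Client side: creates the client span and writes its identification into the carrier. */
    static long[] onClientRequest(RequestAdapterSketch request, long traceId, long parentId) {
        long spanId = ThreadLocalRandom.current().nextLong();
        request.setHeader(TRACE_ID, Long.toHexString(traceId));
        request.setHeader(SPAN_ID, Long.toHexString(spanId));
        return new long[] { spanId, traceId, parentId };
    }

    /** Server side: the server span becomes a child of the client span found in the carrier. */
    static long[] onServerRequest(RequestAdapterSketch request) {
        long spanId = ThreadLocalRandom.current().nextLong();
        String trace = request.getHeader(TRACE_ID);
        String parent = request.getHeader(SPAN_ID);
        if (trace == null || parent == null) {
            return new long[] { spanId, spanId, spanId }; // nothing propagated: new root span
        }
        return new long[] { spanId, Long.parseUnsignedLong(trace, 16), Long.parseUnsignedLong(parent, 16) };
    }
}
```

In this sketch, supporting another framework would only mean writing another implementation of RequestAdapterSketch; the interceptor logic stays untouched, which mirrors the intent of the design described above.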

Flow example

Link: http://prezi.com/kuvkl0vddepu/?utm_campaign=share&utm_medium=copy

Implementation

SDK project

To support the implementation of the opentracing.io API, a new project has been created: inspectit.agent.java.sdk. This project depends only on the opentracing.io API and the Noop tracer implementation.

In the future we need to push the artifact of this project to the Maven repository with a correctly defined POM, so that users can include it when they want to make their own spans part of a trace.

Supported technologies and frameworks

Currently we support two technologies for remote communication: HTTP and JMS. Here is the list of the available framework-related implementations:

Technology	Type	Framework	Version(s)	Comment
HTTP	Server	Java Servlet	All	Does not support response code reading in version 2.0 or less.
HTTP	Client	Apache Http Client	4.x	Not including the asynchronous client library.
HTTP	Client	Jetty Http Client	6.x-8.x
HTTP	Client	URL Connection	java 1.6, 1.7 & 1.8 (1.9)
HTTP	Client	Spring Rest Template	3.x - 4.x	SRT is a wrapper. We are independent of the underlying client used.
MQ	Server / Client	JMS	1.x -

UI adaptations

Since tracing requires a UI that is not agent-focused, we needed to make a compromise and create an intermediate state. This means that the tracing view is an agent-less view and as such must not be in the data explorer tree. We decided that, for now, the tracing view is accessed via the data explorer toolbar.

Develop