Services not responsive enough under certain load

Description

Just had a talk with . He described a problem where reading data from the CMR leads to non-responsive service calls after some time.

His setup did the following every 5 seconds:

  1. get overview on all agents

  2. get details of each connected agent (in his case only 1)

  3. load invocation sequence overview for last 5 seconds
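A minimal sketch of such a polling client, which may help with reproducing the setup above. The `CmrClient` interface and its method names are hypothetical stand-ins for the real service calls, not the actual CMR API; only the three steps and the 5 second period come from the description.

```java
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PollingSketch {

    /** Hypothetical stand-in for the CMR services; not the real API. */
    public interface CmrClient {
        List<Long> getAgentIds();                                   // step 1: overview of all agents
        void loadAgentDetails(long agentId);                        // step 2: details of one agent
        void loadInvocationOverview(long fromMillis, long toMillis); // step 3: invocation sequence overview
    }

    /** Runs one polling cycle and returns how long it took, in milliseconds. */
    public static long pollOnce(CmrClient client, long windowMillis) {
        long start = System.nanoTime();
        for (long agentId : client.getAgentIds()) {
            client.loadAgentDetails(agentId);
        }
        long now = System.currentTimeMillis();
        client.loadInvocationOverview(now - windowMillis, now);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Schedules the cycle at a fixed rate and logs the duration of each run. */
    public static ScheduledFuture<?> start(ScheduledExecutorService scheduler,
                                           CmrClient client, long periodSeconds) {
        long windowMillis = TimeUnit.SECONDS.toMillis(periodSeconds);
        return scheduler.scheduleAtFixedRate(() -> {
            long tookMillis = pollOnce(client, windowMillis);
            System.out.println("poll cycle took " + tookMillis + " ms");
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }
}
```

Logging the duration of each cycle should make it visible when the calls start to slow down. Calling `PollingSketch.start(Executors.newSingleThreadScheduledExecutor(), client, 5)` would match the 5 second period.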

We got a thread dump during such non-responsive calls, and it shows that the "problem" most likely lies in the threads with the prefix btpool0-. This is the Jetty pool that answers HTTP requests. The odd thing is that the thread numbers keep rising, so the thread dump also shows a thread named btpool0-117. Since this may be a pool with min/max thread limits, that can be OK, but still.

Also notice the block on the socketRead0() method. We took another thread dump a few minutes later and thread btpool0-117 was still in this method. This is very strange.
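The same check can be done from code instead of reading the dump by hand. The sketch below uses only `Thread.getAllStackTraces()` from the JDK. The class and method names are made up for illustration, and it only sees threads of the JVM it runs in, so it would have to be executed inside the CMR process (from outside, a dump taken with `jstack` gives the same information).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PoolThreadInspector {

    /**
     * Returns the names of live threads whose name starts with the given prefix
     * and whose current stack contains a frame with the given method name.
     */
    public static List<String> threadsInMethod(String namePrefix, String methodName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            String threadName = entry.getKey().getName();
            if (!threadName.startsWith(namePrefix)) {
                continue;
            }
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getMethodName().equals(methodName)) {
                    hits.add(threadName);
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return hits;
    }
}
```

Calling `PoolThreadInspector.threadsInMethod("btpool0-", "socketRead0")` twice, a few minutes apart, and comparing the results would show whether the same pool threads stay stuck in the socket read.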

This should be easy to reproduce. We should also keep in mind that this happens when the CMR is running on Windows.

Environment

Windows 7

Activity

Ivan Senic
September 7, 2015, 2:57 PM

I was finally able to reproduce this problem. It seems to happen when many HTTP requests are fired from our UI. We most likely did not notice it because we cannot fire that many requests manually during normal work. But after just a few tens of requests fired one after another the problem appears. It seems like something hangs on the CMR: the request does eventually return, but response times increase heavily, to around 1 minute.

I searched the net for what this thread dump could mean. Here they had a similar problem (http://stackoverflow.com/questions/12740741/embedded-jetty-server-hangs), and the answer was that the underlying connection isn't closed/released. So I went back to our Kryo HTTP exporter and executor and found out that we are not closing the input and output streams in the write and read remote invocation methods. Since the super implementation does close them, I assumed this could be the reason the connection is not released: it may be waiting until the stream is closed.

Anyway, I added try-with-resources and now it seems to be fixed.
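For illustration, the pattern looks roughly like the sketch below. This is not the actual exporter/executor code: standard Java serialization (`ObjectOutputStream` / `ObjectInputStream`) stands in for Kryo, and the class and method names are assumptions. The point is only that both the wrapper and the underlying stream are declared as resources, so they are closed even if reading or writing fails.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

public class StreamClosingSketch {

    /** Writes the invocation and closes both streams, even on failure. */
    public static void writeInvocation(Object invocation, OutputStream os) throws IOException {
        // Resources are closed in reverse order: first the object stream, then the raw stream.
        try (OutputStream out = os; ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(invocation);
        }
    }

    /** Reads one invocation and closes both streams, even on failure. */
    public static Object readInvocation(InputStream is) throws IOException, ClassNotFoundException {
        // Declaring the raw stream first ensures it is closed even if the wrapper constructor throws.
        try (InputStream in = is; ObjectInputStream ois = new ObjectInputStream(in)) {
            return ois.readObject();
        }
    }
}
```

Declaring the raw stream as its own resource is a deliberate choice here: if only the wrapper were declared and its constructor threw, the underlying stream would stay open.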

Stefan Siegl
September 8, 2015, 10:25 AM

Based on your explanation, I increased the priority of this bug.

Assignee

Ivan Senic

Reporter

Ivan Senic

Labels

None

Integrator

Patrice Bouillet

Components

Sprint

None

Fix versions

Affects versions

Priority

Highest