Just had a talk with . He presented a problem where reading data from the CMR leads to non-responsive service calls after some time.
His setup was to do the following every 5 seconds:
get an overview of all agents
get details of each connected agent (in his case only one)
load the invocation sequence overview for the last 5 seconds
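The polling setup above can be sketched roughly as follows (the fetch methods are placeholders, not the actual UI code; the real client issues HTTP requests to the CMR):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingSketch {

    // Placeholder calls; in the real setup these are HTTP requests to the CMR.
    static void fetchAgentOverview() {}
    static void fetchAgentDetails() {}
    static void fetchInvocationSequences(int seconds) {}

    // Runs the three-step refresh every periodMillis ms until at least
    // cyclesToRun cycles have completed; returns the number of cycles run.
    static int runPolling(long periodMillis, int cyclesToRun) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger cycles = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(cyclesToRun);
        scheduler.scheduleAtFixedRate(() -> {
            fetchAgentOverview();        // 1) overview of all agents
            fetchAgentDetails();         // 2) details of each connected agent
            fetchInvocationSequences(5); // 3) invocation sequences of the last 5 s
            cycles.incrementAndGet();
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        done.await();
        scheduler.shutdown();
        return cycles.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // The real setup uses a 5-second period; shortened here so the sketch finishes quickly.
        System.out.println(runPolling(10, 3));
    }
}
```

Note that each cycle fires several HTTP requests, which is why this pattern puts steady pressure on the server-side connection handling.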
We got a thread dump during such non-responsive calls, and it suggests the "problem" most likely lies in threads with the prefix btpool0-. This is the Jetty pool for answering HTTP requests. The odd thing is that the thread numbers keep rising; the dump also shows a thread named btpool0-117. Since it may be a pool with min/max thread settings this can be OK, but still..
Also notice the block on the socketRead0() method. We took another thread dump a few minutes later and thread btpool0-117 was still stuck in this method. This is very strange.
This should be easy to reproduce. We should also keep in mind that this is happening while the CMR is running on Windows.
I was finally able to reproduce this problem. It seems to happen when many HTTP requests are fired from our UI. We most likely did not notice it before because we cannot manually fire that many requests during normal work. But after just a few tens of requests fired one after another, the problem appears. It looks like something hangs on the CMR: the requests do eventually return, but the response times increase heavily, up to ~1 min.
I researched this thread dump on the net; a similar problem was reported here (http://stackoverflow.com/questions/12740741/embedded-jetty-server-hangs), and the answer was that the underlying connection isn't closed/released. So I went back to our Kryo HTTP exporter and executor and found that we are not closing the input and output streams in the write and read remote invocation methods. Since the super implementation does close them, I assume this is the reason the connection is not released: it may wait until the stream is closed.
Anyway, I added try-with-resources and now it seems to be fixed.
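A minimal sketch of that fix pattern (method and class names here are hypothetical, not the actual exporter code): try-with-resources guarantees close() is called even when reading throws, so the underlying connection can be released. The tracking stream below just makes the close observable:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TryWithResourcesSketch {

    // Records whether close() was called, standing in for the
    // connection-backed input stream of the HTTP exporter.
    static class TrackingInputStream extends FilterInputStream {
        boolean closed = false;
        TrackingInputStream(InputStream in) { super(in); }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical read path: the try-with-resources block closes the
    // stream on both normal and exceptional exit, which is what lets
    // the underlying connection be released.
    static byte[] readRemoteInvocation(TrackingInputStream in) throws IOException {
        try (InputStream s = in) {
            return s.readAllBytes(); // stand-in for the Kryo deserialization step
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingInputStream in =
                new TrackingInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        byte[] data = readRemoteInvocation(in);
        System.out.println(data.length); // 3
        System.out.println(in.closed);   // true
    }
}
```

The same pattern applies symmetrically to the output stream on the write path.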
Based on your argumentation, I increased the priority of this bug.