Garbage Collection in inspectIT

At the moment the inspectIT CMR uses the CMS (Concurrent Mark Sweep) garbage collector. This collector was tuned by the inspectIT team to perform as well as possible. The settings were aligned with the buffer the CMR has and the fact that we expect a high amount of long-living data. However, the tuned garbage collection was never deeply tested and there was no confirmation of its performance. Because of this, there was a need to perform heavy testing of the current collector.

In addition, we intended to compare the current collector with the G1 garbage collector, which is available in Java 7 (it is also available in Java 6, but should be improved in newer JVM versions).

Test Setup

Two machines were used for the testing:

  1. QuadCore, 16GB RAM, 160GB disk 
  2. QuadCore, 8GB RAM, 160GB disk

The first machine ran the inspectIT CMR, occupying a total of 8GB of heap space.

The second machine was used to generate load. Since we wanted quite a high load, we could not use the DVD Store: monitoring it does not produce much data, so the buffer on the CMR fills very slowly. Instead, we used the Invoker application. For each test we started three Invoker processes, each generating load with 10 threads.
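For illustration, here is a hypothetical sketch of what one such load-generating process does: 10 threads repeatedly executing an instrumented operation. The class and the doMonitoredWork() placeholder are ours; the real Invoker application may generate load differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch of one load-generating process in the spirit of the
 * Invoker application: 10 threads repeatedly executing an instrumented
 * operation.
 */
public class LoadGenerator {

    private static final int THREADS = 10;

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int i = 0; i < THREADS; i++) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    while (!Thread.currentThread().isInterrupted()) {
                        // every call is picked up by the inspectIT agent and
                        // produces monitoring data that is sent to the CMR
                        doMonitoredWork();
                    }
                }
            });
        }
    }

    private static void doMonitoredWork() {
        // placeholder workload: allocate and discard some objects so that
        // both the monitored JVM and the CMR buffer receive steady traffic
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append(i);
        }
    }
}
```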

CMS

The inspectIT CMR currently runs with the following JVM settings (adapted for an 8GB heap size):

-Xms8192m -Xmx8192m -Xmn2048M -XX:MaxPermSize=192m -XX:PermSize=128m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:SurvivorRatio=5 -XX:TargetSurvivorRatio=90 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+UseBiasedLocking -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -server -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log

From the settings above we can see that there is a total of 8GB of heap space, of which 6GB belongs to the old generation, while 2GB is reserved for the new generation. In the new generation, -XX:SurvivorRatio=5 makes eden five times the size of each survivor space, so the 2GB is split into roughly 1463MB of eden and two survivor spaces of roughly 293MB each.

The interesting and very important options are -XX:CMSInitiatingOccupancyFraction=80 and -XX:+UseCMSInitiatingOccupancyOnly. They define that the concurrent collection should only start when the occupancy of the old generation exceeds 80%. Since the buffer in the CMR is by default sized at 55%-65% of the old generation, we wanted to make sure that once the concurrent collection starts, there is a significant amount of data eligible for garbage collection.
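To observe this threshold being approached at runtime, a minimal sketch like the following can be used (it is not part of the CMR code base; it simply polls the standard java.lang.management API, where HotSpot reports the CMS old generation pool as "CMS Old Gen"):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/**
 * Minimal sketch that prints the occupancy of the old generation, so one can
 * watch the 80% threshold from -XX:CMSInitiatingOccupancyFraction being
 * approached.
 */
public class OldGenMonitor {

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage usage = pool.getUsage();
                if (pool.getName().contains("Old Gen") && usage.getMax() > 0) {
                    double occupancy = 100.0 * usage.getUsed() / usage.getMax();
                    System.out.printf("%s: %.1f%% occupied%n", pool.getName(), occupancy);
                }
            }
            Thread.sleep(5000); // sample every 5 seconds
        }
    }
}
```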

The test with the described settings showed the following results:

| Throughput | Freed MB/Min | Avg Pause | Max Pause | Full GC |
|---|---|---|---|---|
| ~93% | 1601 | 1.12 s | 2.06 s | no |

We were quite surprised by the results; they seemed to be very good. 

G1

The first task with G1 was to find the best configuration. We tested many different ones, starting from the complete default configuration and then changing parameter by parameter. The default configuration showed the worst results, and with the improvement brought by our second configuration we were sure that more tuning would bring G1 performance close to CMS. However, we were not able to do so; the throughput stayed well under 90%. In addition, we noticed that all our tuning brought very little change in the results, from which we concluded that it is quite hard to change the behavior of the G1 collector.

The table below summarizes the settings and results of the G1 test runs:

| Region Size | Reserve Percent | Heap Occupancy Percent | Max GC Pause | Pause Interval | New Ratio | Survivor Ratio | Threshold | GC Threads | Throughput (%) | Freed MB/Min | Avg Pause | Max Pause | Full GC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Default | Default | Default | Default | Default | Default | Default | Default | Default | 64.26 | 1911 | 5.67 s | 29.5 s | yes |
| 16 MB | 20 | Default | Default | Default | Default | Default | Default | Default | 86.32 | 2334 | 0.77 s | 2.61 s | no |
| Default | 25 | Default | 2 s | Default | Default | Default | Default | Default | 88.07 | 2367 | 0.7 s | 4.13 s | no |
| 32 MB | 20 | Default | Default | Default | Default | Default | Default | Default | 84.83 | 2501 | 0.87 s | 19.42 s | yes |
| 32 MB | 20 | Default | 1 s | Default | Default | Default | Default | Default | 86.76 | 2356 | 0.8 s | 4.02 s | no |
| 8 MB | 20 | 0 | 1 s | Default | Default | Default | Default | Default | 86.59 | 2228 | 0.51 s | 3.74 s | no |
| 8 MB | 20 | Default | 1 s | Default | Default | Default | Default | Default | 88.23 | 2334 | 0.76 s | 3.86 s | no |
| 4 MB | 20 | 75 | 1 s | 3 s | Default | Default | Default | 4 | 76.93 | 2023 | 5.14 s | 24.91 s | yes |
| 4 MB | 20 | 4 | 1 s | 3 s | Default | Default | Default | 4 | 85.14 | 2078 | 0.55 s | 3.16 s | no |
| Default | 15 | Default | 2 s | Default | 1 | Default | Default | Default | 76.41 | 2146 | 2.12 s | 32.65 s | yes |
| Default | 15 | Default | 2 s | Default | 3 | 5 | 5 | Default | 88.02 | 2344 | 0.84 s | 3.25 s | no |
| Default | 15 | Default | 2 s | Default | 3 | 5 | Default | Default | 87.53 | 2396 | 0.71 s | 5.14 s | no |
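For readability, the table columns presumably map to the following HotSpot options (this mapping is our reading of the tested parameters, not something stated explicitly in the test notes): Region Size to -XX:G1HeapRegionSize, Reserve Percent to -XX:G1ReservePercent, Heap Occupancy Percent to -XX:InitiatingHeapOccupancyPercent, Max GC Pause to -XX:MaxGCPauseMillis, Pause Interval to -XX:GCPauseIntervalMillis, New Ratio to -XX:NewRatio, Survivor Ratio to -XX:SurvivorRatio, Threshold to -XX:MaxTenuringThreshold, and GC Threads to -XX:ParallelGCThreads. The second row, for example, would then correspond to a command line like:

-Xms8192m -Xmx8192m -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1ReservePercent=20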

We concluded that the following G1 configuration seemed to be the best:

-Xms8192m -Xmx8192m -XX:MaxPermSize=192m -XX:PermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 -XX:G1HeapRegionSize=8m -XX:G1ReservePercent=15 -XX:NewRatio=3 -XX:SurvivorRatio=5 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+UseBiasedLocking -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -server -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log


24h Tests

After fighting to find the best G1 configuration, we decided to take a step forward and test CMS and the selected "best" G1 configuration in 24h tests. The setup was the same, except that we decreased the load to 2 Invoker processes.

After the machines had been working for two complete days, we got the following results:

| Collector | Total Run Time | Throughput | Freed MB/Min | Number of Pauses | Avg Pause | Max Pause | Full GC | Number of Full GCs | Avg Full GC | Max Full GC |
|---|---|---|---|---|---|---|---|---|---|---|
| CMS | 23h30m5s | 95.99% | 1033 | 2831 | 1.19 s | 4.43 s | no | - | - | - |
| G1 | 23h42m36s | 84.04% | 1632 | 16693 | 0.81 s | 23.74 s | yes | 41 | 19.82 s | 23.74 s |

From the above it is clear that CMS heavily defeated G1. Not only was the throughput much higher, but CMS was also fast enough that no full garbage collection was needed. G1, on the other hand, seems to have been too slow: 41 "stop-the-world" full collections occurred, each lasting almost 20 seconds on average.

The difference in the number of pauses is also high, and it is not easy to say why G1 needs almost six times as many pauses.
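Statistics like the ones in the table can be reproduced from the gc.log files written via -Xloggc. Below is a minimal sketch of such an evaluation, assuming the standard -verbose:gc / -XX:+PrintGCDetails line format; it is our simplified parser, not the tooling actually used for these measurements. Throughput then follows as the share of run time not spent in these pauses, which is consistent with the CMS row: 2831 pauses at 1.19 s on average is roughly 56 minutes of pause time out of 23h30m, i.e. about 96% throughput.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified sketch of a gc.log evaluation. It collects the duration of every
 * stop-the-world event, taking only the last ", N secs]" timing on a line
 * because nested timings (e.g. the inner ParNew part) would otherwise be
 * counted twice. Concurrent CMS phases use a different format
 * ("0.52/0.53 secs") and are therefore not matched.
 */
public class GcLogStats {

    private static final Pattern PAUSE = Pattern.compile(", ([0-9.]+) secs\\]");

    public static void main(String[] args) throws IOException {
        List<Double> pauses = new ArrayList<Double>();
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = PAUSE.matcher(line);
                double last = -1;
                while (m.find()) {
                    last = Double.parseDouble(m.group(1));
                }
                if (last >= 0) {
                    pauses.add(last);
                }
            }
        } finally {
            reader.close();
        }

        double total = 0;
        double max = 0;
        for (double p : pauses) {
            total += p;
            max = Math.max(max, p);
        }
        if (!pauses.isEmpty()) {
            System.out.printf("pauses=%d avg=%.2fs max=%.2fs totalPaused=%.0fs%n",
                    pauses.size(), total / pauses.size(), max, total);
        }
    }
}
```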

Here are the graphs showing the difference in the garbage collection activity:

[Graphs: garbage collection activity of CMS (left) and G1 (right)]

A very interesting thing is the anomaly in the CMS run between 5h and 5h30m: as visible in the graph, the GC pauses in that period took much more time than the usual ones. It is still not known why.

Conclusion

One of the conclusions is that the current JVM settings for the inspectIT CMR are absolutely fine. The throughput of about 96% is very high, maybe even the highest possible. Thus, we should continue using these settings.

Regarding G1, we have to say that we were disappointed. We expected much better results, especially because one of the marketing claims for G1 is that it is made for heaps over 6GB. It remained unclear to us why G1 was slow enough that full GCs occurred at some point.