...
Date | Focus | Link |
---|---|---|
13.09.2012 | Initial discussion of the requirements and possible approaches | 2012-09-13 Ringbuffer meeting/wiki/spaces/ARCHIVE/pages/5019325 |
21.09.2012 | Discussion of approaches2012-09-21 Ring-buffer | /wiki/spaces/ARCHIVE/pages/5019319 |
29.09.2012 | Current status, next steps: Create sample realization2012-09-29 Ringbuffer status update | /wiki/spaces/ARCHIVE/pages/5019311 |
25.10.2012 | Current status, next steps: caliper, another approach2012-10-25 Ringbuffer status update | /wiki/spaces/ARCHIVE/pages/5019155 |
Requirements
The ringbuffer will replace the buffering strategy in the agent completely, while the sending strategy will be kept, as it defines when data should be sent.
The ringbuffer uses a FIFO policy:
...
- wait until there is need for eviction
- lock head lock
- go through list until the required eviction size is reached
- change head to new first element
- unlock head lock
- subtract evicted size from total buffer size
- we assume here that eviction will never empty the buffer, so it does not need to acquire the tail lock
Circular Buffer
Another idea involves a preallocated circular linked list that also uses sentinel nodes for the first and last element, namely head and tail. The eviction is done implicitly by overriding old elements when the buffer becomes full. The drawback is that this approach can only be used with the limiting strategy depending on the number of elements. The figure below explains the approach.
Realization
Class diagram for a lock-free RingBuffer with separate eviction and analysis threads (Approach 2):
...
Comparing the number or size of elements added in 1min for the three implementations (see Figures below), it was observed that the lock-based implementation is the most inefficient, while the other two implementation give similar results, with slightly better results for the completely lock-free implementation, as expected. The figures are based on the average values of three runs for each case.
Microbenchmarking with Caliper
As discussed inthe meeting /wiki/spaces/ARCHIVE/pages/5019155, the results for the lock-free enqueue, lock based dequeue and evict are not always explainable, because the lock-based dequeue and evict is faster than the non locking approach in some cases. We decided to go for microbenchmarking [6], which measures the performance of a very small piece of code. The results, however, were not very conclusive. The lock based approach is definitely the most inefficient, but the other two implementations, the lock free and lock based dequeue and evict, usually give the same results. This can be explained by the rare occurence of the dequeueing operation. Even if it uses a lock or not, it does not happen often enough to have an impact on the overall performace.
Below some results of benchmarking with Caliper are presented. We tried different capacities, different sending intervals, different number/size of elements added, but the results do not give more insight into which method is better.
Code Block | ||
---|---|---|
|
Functionality and Integration Testing
...
0% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=10000, numElementsToAdd=400000, numThreads=16} 2502033754.00 ns; s=815687.04 ns @ 3 trials
17% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=10000, numElementsToAdd=400000, numThreads=16} 2501636021.00 ns; s=645694.28 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=10000, numElementsToAdd=400000, numThreads=32} 2503686894.00 ns; s=620891.89 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=10000, numElementsToAdd=400000, numThreads=32} 2502442691.00 ns; s=606238.69 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=10000, numElementsToAdd=400000, numThreads=64} 2512078430.00 ns; s=24169268.96 ns @ 7 trials
83% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=10000, numElementsToAdd=400000, numThreads=64} 2702248773.00 ns; s=114828734.41 ns @ 10 trials
numThreads benchmark s linear runtime
16 CompletelyLockFree 2.50 ===========================
16 LockFreeEnqueue 2.50 ===========================
32 CompletelyLockFree 2.50 ===========================
32 LockFreeEnqueue 2.50 ===========================
64 CompletelyLockFree 2.51 ===========================
64 LockFreeEnqueue 2.70 ==============================
vm: java
trial: 0
bufferCapacity: 10000
numElementsToAdd: 400000
dequeue every 500ms
0% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=20000, numElementsToAdd=800000, numThreads=16} 2503758676.00 ns; s=581886.19 ns @ 3 trials
17% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=20000, numElementsToAdd=800000, numThreads=16} 2504004679.00 ns; s=24777692.23 ns @ 6 trials
33% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=20000, numElementsToAdd=800000, numThreads=32} 2507706517.00 ns; s=5005925.11 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=20000, numElementsToAdd=800000, numThreads=32} 2577092776.50 ns; s=58717140.01 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=20000, numElementsToAdd=800000, numThreads=64} 4825908109.00 ns; s=40336488.36 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=20000, numElementsToAdd=800000, numThreads=64} 4668499674.50 ns; s=457395064.90 ns @ 10 trials
numThreads benchmark s linear runtime
16 CompletelyLockFree 2.50 ===============
16 LockFreeEnqueue 2.50 ===============
32 CompletelyLockFree 2.51 ===============
32 LockFreeEnqueue 2.58 ================
64 CompletelyLockFree 4.83 ==============================
64 LockFreeEnqueue 4.67 =============================
vm: java
trial: 0
bufferCapacity: 20000
numElementsToAdd: 800000 |
Therefore, we opted for the completely lock free implementation to be used in the inspectIT agent. The functionality tests and integration test presented in the next section concern this approach.
Functionality and Integration Testing
The tests were implemented using TestNG [7] and are used to thoroughly test the completely lock free implementation, to ensure that it is ready to be used for the agent implementation.
The functionality tests aim to ensure that all the units in the RingBuffer work correctly. The tests include:
- initialisation
- adding one element
- object size calculation
- trying to add a negative sized element
- maintaining the adding order of elements at dequeue time
- dequeueing
- two consecutive dequeues
- checking if the eviction starts once the buffer is full
Each tests checks if the size and shouldEvict have the expected values.
For the limiting strategy, a mocking framework was used, Mockito [8].
The integration tests check if everything works as expected when multiple threads add elements to a buffer, while one thread takes all the elements and another one deals with the eviction. The tests are run for a longer period of time (e.g. 1min) with variable number of threads, and at the end we check if there are any elements lost in the process, i.e. if the number/size of elements added is the same as the sum of the number/size of elements dequeued, evicted and remaining in the buffer.
Building and Running
The projects can be built using the eclipse compiler. The Ivy dependency management system takes care of all the dependencies, with the exception of TestNG (should be on the build path via the Eclipse plugin).
The functional and integration test configurations can be found in the corresponding xml files, which can simply be run from Eclipse.
Running the caliper benchmarks can de done by launching the class BenchmarkRunner with the input parameters given as example in the file.
References
[1] Maurice Herlihy, Nir Shavit, "The Art of Multiprocessor Programming, Revised Reprint"
[2] Maged M. Michael, Michael L. Scott, "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms"
[3] ConcurrentLinkedQueue implementation in Java
[4] CMR buffer specifications
[5] Calculation of object sizes
[6] Microbenchmarking with Caliper
[7] TestNG
[8] Mockito