...

  • wait until there is need for eviction
  • lock head lock
    • go through the list until the required eviction size is reached
    • change head to new first element
  • unlock head lock
  • subtract evicted size from total buffer size
  • we assume here that eviction will never empty the buffer, so it does not need to acquire the tail lock (a sketch of this loop follows the list)
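
A minimal Java sketch of this eviction loop; the Node and buffer classes and all field names are illustrative assumptions, not the actual inspectIT code:

Code Block
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// hypothetical node type; the payload size is computed when the element is added
class Node {
    Object payload;
    long size;  // approximated object size in bytes
    Node next;
}

class EvictingBuffer {
    private final ReentrantLock headLock = new ReentrantLock();
    private Node head = new Node();  // sentinel: head.next is the first real element
    private final AtomicLong totalSize = new AtomicLong();

    // evicts elements from the front until at least bytesToEvict bytes are freed
    void evict(long bytesToEvict) {
        long evicted = 0;
        headLock.lock();                   // lock head lock
        try {
            Node first = head.next;
            // go through the list until the required eviction size is reached
            while (evicted < bytesToEvict && first != null) {
                evicted += first.size;
                first = first.next;
            }
            head.next = first;             // change head to the new first element
        } finally {
            headLock.unlock();             // unlock head lock
        }
        totalSize.addAndGet(-evicted);     // subtract evicted size from total buffer size
        // per the assumption above, eviction never empties the buffer,
        // so the tail lock is never touched here
    }
}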

Circular Buffer

Another idea involves a preallocated circular linked list that also uses sentinel nodes for the first and last element, namely head and tail. Eviction is done implicitly by overwriting old elements when the buffer becomes full. The drawback is that this approach can only be used with a limiting strategy based on the number of elements. The figure below explains the approach.

[Figure: preallocated circular linked list with head and tail sentinel nodes; old elements are overwritten implicitly when the buffer is full]
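
A minimal sketch of this idea with illustrative names; for brevity it uses plain head/tail references instead of dedicated sentinel nodes:

Code Block
// preallocated circular linked list; adding to a full buffer implicitly
// evicts the oldest element by overwriting it
class CircularList<E> {
    private static final class Node<T> {
        T payload;
        Node<T> next;
    }

    private Node<E> head;        // oldest occupied node
    private Node<E> tail;        // node that the next add() writes to
    private final int capacity;
    private int count;

    CircularList(int capacity) {
        this.capacity = capacity;
        Node<E> first = new Node<E>();
        Node<E> current = first;
        for (int i = 1; i < capacity; i++) {  // preallocate all nodes up front
            current.next = new Node<E>();
            current = current.next;
        }
        current.next = first;                 // close the ring
        head = tail = first;
    }

    synchronized void add(E element) {
        tail.payload = element;
        tail = tail.next;
        if (count == capacity) {
            head = head.next;  // buffer full: the oldest element was just overwritten
        } else {
            count++;
        }
    }
}

Note that the capacity check is purely element-count based, which is exactly the drawback mentioned above.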

Realization

Class diagram for a lock-free RingBuffer with separate eviction and analysis threads (Approach 2):

...
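
As a rough outline of the roles the diagram describes (many producer threads, one analysis thread, one eviction thread), the buffer API can be pictured as follows; all names below are assumptions, not the actual inspectIT API:

Code Block
// hypothetical interface only; the class diagram is the authoritative source
interface IRingBuffer<E> {
    void add(E element);    // producers: instrumented application threads
    E dequeue();            // analysis thread: takes elements for processing
    void evict();           // eviction thread: frees space when the limit is reached
    boolean shouldEvict();  // queried by the eviction thread
    long getSize();         // current number/size of buffered elements
}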

Microbenchmarking with Caliper

As discussed in the meeting 2012-10-25 Ringbuffer status update, the results for the lock-free enqueue with lock-based dequeue and evict are not always explainable, because the lock-based dequeue and evict are faster than the non-locking approach in some cases. We decided to go for microbenchmarking [6], which measures the performance of a very small piece of code. The results, however, were not very conclusive. The lock-based approach is definitely the most inefficient, but the other two implementations, the completely lock-free one and the one with lock-based dequeue and evict, usually give the same results. This can be explained by the rare occurrence of the dequeueing operation: whether it uses a lock or not, it does not happen often enough to have an impact on the overall performance.
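
For illustration, a microbenchmark in the Caliper 0.5-era API style might look as follows; a plain ConcurrentLinkedQueue stands in for the buffer under test, and all other names are assumptions:

Code Block
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import com.google.caliper.Param;
import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

// sketch only: ConcurrentLinkedQueue replaces the actual RingBuffer here
public class RingBufferBenchmark extends SimpleBenchmark {

    // mirrors the scenarios below; unused by the unbounded stand-in queue
    @Param({"10000", "20000"}) int bufferCapacity;

    private Queue<byte[]> buffer;

    @Override protected void setUp() {
        buffer = new ConcurrentLinkedQueue<byte[]>();
    }

    // Caliper invokes each time* method with a repetition count and measures the loop
    public void timeAdd(int reps) {
        for (int i = 0; i < reps; i++) {
            buffer.add(new byte[64]);
        }
    }

    public static void main(String[] args) {
        Runner.main(RingBufferBenchmark.class, args);
    }
}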

Below, some results of benchmarking with Caliper are presented. We tried different capacities, different sending intervals, and different numbers/sizes of elements added, but the results don't give more insight into which method is better.

 

Code Block
0% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=10000, numElementsToAdd=400000, numThreads=16} 2502033754.00 ns; s=815687.04 ns @ 3 trials
17% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=10000, numElementsToAdd=400000, numThreads=16} 2501636021.00 ns; s=645694.28 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=10000, numElementsToAdd=400000, numThreads=32} 2503686894.00 ns; s=620891.89 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=10000, numElementsToAdd=400000, numThreads=32} 2502442691.00 ns; s=606238.69 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=10000, numElementsToAdd=400000, numThreads=64} 2512078430.00 ns; s=24169268.96 ns @ 7 trials
83% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=10000, numElementsToAdd=400000, numThreads=64} 2702248773.00 ns; s=114828734.41 ns @ 10 trials

numThreads          benchmark    s linear runtime
        16 CompletelyLockFree 2.50 ===========================
        16    LockFreeEnqueue 2.50 ===========================
        32 CompletelyLockFree 2.50 ===========================
        32    LockFreeEnqueue 2.50 ===========================
        64 CompletelyLockFree 2.51 ===========================
        64    LockFreeEnqueue 2.70 ==============================

vm: java
trial: 0
bufferCapacity: 10000
numElementsToAdd: 400000

dequeue every 500ms

 

0% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=20000, numElementsToAdd=800000, numThreads=16} 2503758676.00 ns; s=581886.19 ns @ 3 trials
17% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=20000, numElementsToAdd=800000, numThreads=16} 2504004679.00 ns; s=24777692.23 ns @ 6 trials
33% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=20000, numElementsToAdd=800000, numThreads=32} 2507706517.00 ns; s=5005925.11 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=20000, numElementsToAdd=800000, numThreads=32} 2577092776.50 ns; s=58717140.01 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=CompletelyLockFree, bufferCapacity=20000, numElementsToAdd=800000, numThreads=64} 4825908109.00 ns; s=40336488.36 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=LockFreeEnqueue, bufferCapacity=20000, numElementsToAdd=800000, numThreads=64} 4668499674.50 ns; s=457395064.90 ns @ 10 trials

numThreads          benchmark    s linear runtime
        16 CompletelyLockFree 2.50 ===============
        16    LockFreeEnqueue 2.50 ===============
        32 CompletelyLockFree 2.51 ===============
        32    LockFreeEnqueue 2.58 ================
        64 CompletelyLockFree 4.83 ==============================
        64    LockFreeEnqueue 4.67 =============================

vm: java
trial: 0
bufferCapacity: 20000
numElementsToAdd: 800000

Therefore, we opted for the completely lock-free implementation to be used in the inspectIT agent. The functionality tests and integration tests presented in the next section concern this approach.

Functionality and Integration Testing

...

The tests were implemented using TestNG [7] and thoroughly exercise the completely lock-free implementation, to ensure that it is ready to be used for the agent implementation.

The functionality tests aim to ensure that all the units in the RingBuffer work correctly. The tests include:

  • initialisation
  • adding one element
  • object size calculation
  • trying to add a negative-sized element
  • maintaining the adding order of elements at dequeue time
  • dequeueing
  • two consecutive dequeues
  • checking if the eviction starts once the buffer is full

Each test checks whether size and shouldEvict have the expected values (a sketch of this test style follows below).
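
A hedged TestNG sketch of two of these cases; the RingBuffer class and its accessors are placeholders for the real implementation:

Code Block
import static org.testng.Assert.assertEquals;
import static org.testng.Assert.assertFalse;

import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

public class RingBufferTest {

    private RingBuffer<String> buffer;  // placeholder for the real class

    @BeforeMethod
    public void init() {
        buffer = new RingBuffer<String>(100);
    }

    @Test
    public void addingOneElementUpdatesSize() {
        buffer.add("element");
        assertEquals(buffer.getSize(), 1);
        assertFalse(buffer.shouldEvict());
    }

    @Test
    public void dequeuePreservesAddingOrder() {
        buffer.add("first");
        buffer.add("second");
        assertEquals(buffer.dequeue(), "first");
        assertEquals(buffer.dequeue(), "second");
    }
}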

For the limiting strategy, the Mockito mocking framework [8] was used.
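
A hedged Mockito sketch; the IBufferLimit interface and the RingBuffer constructor used here are illustrative assumptions:

Code Block
import static org.mockito.Mockito.anyLong;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import static org.testng.Assert.assertTrue;

import org.testng.annotations.Test;

public class LimitingStrategyTest {

    // hypothetical strategy interface: decides when the buffer must evict
    interface IBufferLimit {
        boolean isLimitReached(long currentSize);
    }

    @Test
    public void evictionIsTriggeredWhenTheLimitIsReached() {
        IBufferLimit limit = mock(IBufferLimit.class);
        when(limit.isLimitReached(anyLong())).thenReturn(true);

        // placeholder constructor: assumes the buffer takes its limiting strategy here
        RingBuffer<String> buffer = new RingBuffer<String>(limit);
        buffer.add("element");

        assertTrue(buffer.shouldEvict());
    }
}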

The integration tests check that everything works as expected when multiple threads add elements to a buffer, while one thread takes all the elements and another one deals with the eviction. The tests are run for a longer period of time (e.g. 1 min) with a variable number of threads, and at the end we check whether any elements were lost in the process, i.e. whether the number/size of elements added is the same as the sum of the number/size of elements dequeued, evicted and remaining in the buffer.
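
A hedged sketch of that final accounting check; the counter names are assumptions, incremented by the producer, analysis and eviction threads respectively:

Code Block
import static org.testng.Assert.assertEquals;

import java.util.concurrent.atomic.AtomicLong;

class AccountingCheck {
    final AtomicLong added = new AtomicLong();     // incremented by producers
    final AtomicLong dequeued = new AtomicLong();  // incremented by the analysis thread
    final AtomicLong evicted = new AtomicLong();   // incremented by the eviction thread

    // nothing may be lost: added == dequeued + evicted + still in the buffer
    void assertNothingLost(long remainingInBuffer) {
        assertEquals(dequeued.get() + evicted.get() + remainingInBuffer,
                added.get(), "elements were lost during the run");
    }
}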

Building and Running

The projects can be built using the Eclipse compiler. The Ivy dependency management system takes care of all the dependencies, with the exception of TestNG, which should be on the build path via the Eclipse plugin.

The functional and integration test configurations can be found in the corresponding XML files, which can simply be run from Eclipse.

The Caliper benchmarks can be run by launching the class BenchmarkRunner with the input parameters given as an example in the file.

References

[1] Maurice Herlihy, Nir Shavit, "The Art of Multiprocessor Programming, Revised Reprint"
[2] Maged M. Michael, Michael L. Scott, "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms"
[3] ConcurrentLinkedQueue implementation in Java
[4] CMR buffer specifications
[5] Calculation of object sizes
[6] Microbenchmarking with Caliper
[7] TestNG
[8] Mockito