Page Comparison

...

threads put data to this buffer "at the end" (measurements that are gathered and being sent to the CMR)
ONE sending thread (s) take takes data from "the beginning" of this buffer and send sends the data to the CMR
the sending thread takes all the elements it finds in the buffer

As we are dealing with a highly multi-threaded environment, the following requirements for the ringbuffer are important:

...

implementing the concurrent access, put enqueue and take dequeue operations.
implementing the eviction of old elements when the buffer has reached its size limit

...

using two locks, one for queing and one for dequeueing
livelock free
but high contention

Both locking and lock-free approaches will be implemented in order to find the best approach.

Approach for eviction of old elements

similar to CMR atomic buffer implementation [4]
use AbstractObjectSizes class for calculating object sizes [5]
separate thread for analyzing and calculating the buffer size
separate thread for eviction operation by FIFO policy; this operation has to be suspended while the size of all objects in the buffer is smaller than the buffer maximum size
in the analysis thread, signal the eviction thread if needed
use conditions for suspending and activating the threads that work on eviction and analysis

The object size can also be computed when enqueueing an element instead of on a different thread. Both approaches introduce some overhead: size calculation on a separate thread means locking and context switching, while synchronous size calculation means slowing down the enqueue operation, which has to be very fast. Nevertheless, it is hard to say which approach is faster, so both of them will be implemented.

Therefore, we have the following approaches that have to be implemented and compared:

Size Calculation / Locking	Lock-based	Lock-free
Asynchronous (separate thread)	1	2
Synchronous (at enqueue)	3	4

Realization

Class diagram for a lock-free RingBuffer with separate eviction and analysis threads (Approach 2):

The following diagram shows how the enqueueing and dequeueing are performed without using locks.

The dequeue operation takes all the elements in the buffer, while enqueue adds one element.

The first node in the buffer is always a dummy node with a null object. The tail is an atomic reference to the last element in the buffer. In this example, there is no need for an atomic reference to the first element in the buffer, as there is only one sending thread that can dequeue elements and thus change the dummy node's next element.

Image Added

Testing

...

Issues that need to be checked:
Performance
measure time for different number of threads
compare the different approaches
Correctness (harder to check)
linearizability
thread safety
no deadlocks
no livelocks
The general structure of a test application for RingBuffer has:
1 sending thread
loop:
sleeps 5s
dequeues all the elements in the buffer
prints them
n threads that enter data:
n = 2, 10, 100 ...
loop that adds elements
no sleeping between enqueues to make it fast
limiting the size by number of elements:
use random Floats as data
object size not important, only the number of elements
limiting the size by memory
use Strings with variable length

References

[1] Maurice Herlihy, Nir Shavit, "The Art of Multiprocessor Programming, Revised Reprint"
[2] Maged M. Michael, Michael L. Scott, "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms"
[3] ConcurrentLinkedQueue implementation in Java
[4] CMR buffer specifications
[5] Calculation of object sizes

...

Versions Compared

Old Version 6

New Version 7

Key

Approach for eviction of old elements

Realization

Testing

References