...
The following diagram shows how the enqueueing and dequeueing are performed without using locks.
The dequeue operation takes all the elements in the buffer, while enqueue adds one element. The first node in the buffer is always a dummy node with a null object. The tail is an atomic reference to the last element in the buffer. In this example there is no need for an atomic reference to the first element in the buffer, as there is only one sending thread that can dequeue elements and thus change the dummy node's next element.
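The structure above can be sketched as follows. This is a minimal illustration, not the actual implementation: class and method names are invented, and only the shape described in the text (dummy head node, atomic tail reference, CAS-based enqueue, single-consumer dequeue of all elements) is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of a buffer with lock-free enqueue and a single dequeueing thread. */
public class LockFreeBuffer<T> {
    private static final class Node<T> {
        final T item;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T item) { this.item = item; }
    }

    // Dummy node: its item is null and is never returned.
    private Node<T> head = new Node<>(null);
    // Atomic reference to the last element in the buffer.
    private final AtomicReference<Node<T>> tail = new AtomicReference<>(head);

    /** Lock-free enqueue: CAS the new node after the current tail. */
    public void enqueue(T item) {
        Node<T> node = new Node<>(item);
        while (true) {
            Node<T> last = tail.get();
            if (last.next.compareAndSet(null, node)) {
                tail.compareAndSet(last, node);        // swing tail to new node
                return;
            }
            tail.compareAndSet(last, last.next.get()); // help a lagging tail
        }
    }

    /** Takes all elements; no atomic head reference is needed because
     *  only one (sending) thread ever calls this. */
    public List<T> dequeueAll() {
        List<T> out = new ArrayList<>();
        Node<T> node = head.next.get();
        while (node != null) {
            out.add(node.item);
            head = node;              // the consumed node becomes the new dummy
            node = node.next.get();
        }
        return out;
    }

    public static void main(String[] args) {
        LockFreeBuffer<Integer> b = new LockFreeBuffer<>();
        b.enqueue(1); b.enqueue(2); b.enqueue(3);
        System.out.println(b.dequeueAll()); // [1, 2, 3]
    }
}
```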
However, it is impossible to implement the eviction functionality in this approach without adding a lock. This lock is held by both dequeue and evict, so that these two operations cannot happen at the same time. Nevertheless, the enqueue remains lock-free and should therefore be very fast.
A completely lock-free approach is described below.
The third approach is lock-based and uses two locks:
- tail lock - used by enqueue and dequeue
- head lock - used by dequeue and evict
Next, the three operations performed on the buffer in this lock-based implementation are described step by step.
Enqueue:
- allocate node, compute and set size
- lock tail lock
- link node to tail
- swing tail to new node
- unlock tail lock
- change buffer size
- signal eviction if buffer is full
Dequeue:
- if buffer is empty, return
- lock tail lock
- lock head lock
- get the reference of the first element
- swing head and tail to dummy node
- change size to 0
- unlock head lock
- unlock tail lock
- go through buffer, add objects to list
- return list of dequeued nodes
Evict:
- wait until there is need for eviction
- lock head lock
- go through list until the required eviction size is reached
- change head to new first element
- unlock head lock
- subtract evicted size from total buffer size
- we assume here that eviction will never empty the buffer, so it does not need to acquire the tail lock
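The three step-by-step descriptions above can be combined into one sketch of the two-lock implementation. This is an illustration only: the class, the capacity threshold, and the monitor used to signal eviction are assumptions, not the original code; only the locking discipline (tail lock for enqueue/dequeue, head lock for dequeue/evict, eviction never emptying the buffer) follows the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch of the lock-based buffer with two locks. */
public class TwoLockBuffer<T> {
    private static final class Node<T> {
        T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final ReentrantLock tailLock = new ReentrantLock(); // enqueue, dequeue
    private final ReentrantLock headLock = new ReentrantLock(); // dequeue, evict
    private final Object evictMonitor = new Object();           // assumed signalling mechanism
    private final AtomicLong size = new AtomicLong();
    private final long capacity;                                // assumed "buffer full" threshold
    private Node<T> head = new Node<>(null);                    // dummy node
    private Node<T> tail = head;

    public TwoLockBuffer(long capacity) { this.capacity = capacity; }

    public void enqueue(T item) {
        Node<T> node = new Node<>(item);            // allocate node
        tailLock.lock();                            // lock tail lock
        try {
            tail.next = node;                       // link node to tail
            tail = node;                            // swing tail to new node
        } finally {
            tailLock.unlock();                      // unlock tail lock
        }
        if (size.incrementAndGet() >= capacity)     // change buffer size
            synchronized (evictMonitor) { evictMonitor.notify(); } // signal eviction
    }

    public List<T> dequeueAll() {
        if (size.get() == 0) return List.of();      // if buffer is empty, return
        Node<T> first;
        tailLock.lock();                            // lock tail lock
        headLock.lock();                            // lock head lock
        try {
            first = head.next;                      // reference of the first element
            head.next = null;                       // swing head and tail
            tail = head;                            // back to the dummy node
            size.set(0);                            // change size to 0
        } finally {
            headLock.unlock();
            tailLock.unlock();
        }
        List<T> out = new ArrayList<>();            // go through buffer outside the locks
        for (Node<T> n = first; n != null; n = n.next)
            out.add(n.item);
        return out;                                 // list of dequeued nodes
    }

    public void evict(long toEvict) throws InterruptedException {
        synchronized (evictMonitor) {               // wait until eviction is needed
            while (size.get() < capacity) evictMonitor.wait();
        }
        long evicted = 0;
        headLock.lock();                            // lock head lock
        try {
            Node<T> n = head;
            while (evicted < toEvict && n.next != null) {
                n = n.next;                         // go through the list
                n.item = null;                      // drop the evicted object
                evicted++;
            }
            head = n;                               // last evicted node is the new dummy
        } finally {
            headLock.unlock();                      // unlock head lock
        }
        size.addAndGet(-evicted);                   // subtract evicted size
    }

    public static void main(String[] args) throws InterruptedException {
        TwoLockBuffer<Integer> b = new TwoLockBuffer<>(3);
        b.enqueue(1); b.enqueue(2); b.enqueue(3);
        b.evict(1);                         // size reached capacity, so no waiting
        System.out.println(b.dequeueAll()); // [2, 3]
    }
}
```

Note that evict takes only the head lock, relying on the assumption stated above that eviction never empties the buffer and therefore never touches the tail.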
Testing
Issues that need to be checked:
...
- 1 sending thread:
  - loop:
    - sleeps 5 s
    - dequeues all the elements in the buffer
    - prints them
- n threads that enter data:
  - n = 2, 4, 108, 100 ..., 256
  - loop that adds elements
  - no sleeping between enqueues to make it fast
- limiting the size by number of elements:
  - use random Floats as data
  - object size not important, only the number of elements
- limiting the size by memory:
  - use Strings with variable length
The threads run for some time (e.g. 1 minute) and are then interrupted; the statistics about the total number of elements added, retrieved and evicted are printed out.
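The harness described above might look roughly like this. Everything here is an illustrative stand-in: a `ConcurrentLinkedQueue` plays the role of the buffer under test, and the run time and thread count are shortened from the values in the text.

```java
import java.util.Queue;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BufferStressTest {
    /** Runs n enqueueing threads and one draining consumer for runMillis,
     *  then interrupts them and returns {added, retrieved}. */
    static long[] run(int n, long runMillis) throws InterruptedException {
        Queue<Float> buffer = new ConcurrentLinkedQueue<>(); // stand-in for the buffer under test
        AtomicLong added = new AtomicLong(), retrieved = new AtomicLong();

        List<Thread> producers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                Random rnd = new Random();
                while (!Thread.currentThread().isInterrupted()) {
                    buffer.add(rnd.nextFloat());   // no sleeping between enqueues
                    added.incrementAndGet();
                }
            });
            t.start();
            producers.add(t);
        }

        Thread consumer = new Thread(() -> {       // the single sending thread
            try {
                while (true) {
                    Thread.sleep(500);             // the text uses 5 s
                    while (buffer.poll() != null)  // dequeue all the elements
                        retrieved.incrementAndGet();
                }
            } catch (InterruptedException e) { /* stop */ }
        });
        consumer.start();

        Thread.sleep(runMillis);                   // let the threads run for some time
        for (Thread t : producers) t.interrupt();
        consumer.interrupt();
        for (Thread t : producers) t.join();
        consumer.join();
        return new long[] { added.get(), retrieved.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] stats = run(4, 2_000);
        System.out.println("added=" + stats[0] + " retrieved=" + stats[1]);
    }
}
```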
Results
Comparing the number of elements added in 1 minute for the three implementations (see Figure below), it was observed that the lock-based implementation is the least efficient, while the other two implementations give similar results, with some variation for smaller numbers of threads, where the implementation that uses a lock only for dequeue and eviction proves to be better. The tests use the "Limit by number" strategy for the buffer size.
References
[1] Maurice Herlihy, Nir Shavit, "The Art of Multiprocessor Programming, Revised Reprint"
[2] Maged M. Michael, Michael L. Scott, "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms"
[3] ConcurrentLinkedQueue implementation in Java
[4] CMR buffer specifications
[5] Calculation of object sizes
...