

Document false sharing
polytypic committed Aug 15, 2023
1 parent 9bce842 commit e6c2542
Showing 1 changed file with 124 additions and 0 deletions.
README.md
@@ -88,6 +88,7 @@ is distributed under the [ISC license](LICENSE.md).
- [Understanding transactions](#understanding-transactions)
- [A three-stack lock-free queue](#a-three-stack-lock-free-queue)
- [A rehashable lock-free hash table](#a-rehashable-lock-free-hash-table)
- [Avoid false sharing](#avoid-false-sharing)
- [Beware of torn reads](#beware-of-torn-reads)

## A quick tour
@@ -1984,6 +1985,129 @@ What we have here is a lock-free hash table with rehashing that should not be
highly prone to starvation. In other respects this is a fairly naive hash table
implementation. You might want to think about various ways to improve upon it.

### Avoid false sharing

False sharing is a form of contention that arises when a location that is being
written to by at least one core happens to reside in memory next to another
location that is accessed (read or written) by other cores, so that both fall
within the same cache-line-aligned block of memory.

Perhaps contrary to how it is often described, false sharing doesn't require the
use of atomic variables or atomic instructions. Consider the following example:

```ocaml
# type state = { mutable counter: int; mutable finished: bool }
type state = { mutable counter : int; mutable finished : bool; }
# let state = { counter = 1_000; finished = false }
val state : state = {counter = 1000; finished = false}
# let reader = Domain.spawn @@ fun () ->
    while not state.finished do
      Domain.cpu_relax ()
    done
val reader : unit Domain.t = <abstr>
# while 0 < state.counter do
    state.counter <- state.counter - 1
  done;
- : unit = ()
# state.finished <- true;
- : unit = ()
# Domain.join reader
- : unit = ()
```

The `state` is a record with two fields, `counter` and `finished`, next to each
other, which makes it likely that they reside in the same cache-line-aligned
block of memory. The main domain repeatedly mutates the `counter` field and the
other domain repeatedly reads the `finished` field. In practice, this means that
the reads of the `finished` field by the other domain will be very expensive:
each update of `counter` by the main domain invalidates the copy of the cache
line held by the other domain's core, so the next read of `finished` has to
fetch the line again even though `finished` itself has not changed. The writes
suffer as well, because the main domain's core must regain exclusive ownership
of the line after each such read.
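The effect can be observed with the following stdlib-only sketch, which is not
part of **kcas**. As an assumption made purely for illustration, it stores
`counter` and `finished` as two slots of a single `int array` placed `distance`
words apart, so that the distance between them can be varied. With
`~distance:1` the slots are adjacent and very likely share a cache line; with
`~distance:16` they are 128 bytes apart on a 64-bit machine and thus cannot
share a cache line of 128 bytes or less. The helpers `run` and `time` are
hypothetical. Note that `Sys.time` reports processor time for the whole process,
including the spinning reader, rather than wall-clock time, so the figures are
only meaningful when comparing the two runs on the same machine.

```ocaml
(* Hypothetical experiment: [counter] and [finished] live in one int array,
   [distance] words apart. Returns how many times the reader polled. *)
let run ~distance ~iterations =
  assert (distance >= 1);
  let cells = Array.make (distance + 1) 0 in
  let counter = 0 and finished = distance in
  cells.(counter) <- iterations;
  let reader =
    Domain.spawn (fun () ->
        (* Spin until the main domain sets [finished], counting the polls. *)
        let polls = ref 0 in
        while cells.(finished) = 0 do
          incr polls;
          Domain.cpu_relax ()
        done;
        !polls)
  in
  (* The main domain repeatedly writes [counter]. *)
  while 0 < cells.(counter) do
    cells.(counter) <- cells.(counter) - 1
  done;
  cells.(finished) <- 1;
  Domain.join reader

(* Processor time (not wall-clock time) consumed while running [f]. *)
let time f =
  let start = Sys.time () in
  ignore (f ());
  Sys.time () -. start

let () =
  let iterations = 10_000_000 in
  (* Adjacent words: almost certainly on the same cache line. *)
  Printf.printf "adjacent:  %.3fs\n"
    (time (fun () -> run ~distance:1 ~iterations));
  (* 16 words apart: on different lines for lines of up to 128 bytes. *)
  Printf.printf "separated: %.3fs\n"
    (time (fun () -> run ~distance:16 ~iterations))
```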

The above example is contrived, of course, but this sort of false sharing can
happen very easily. Cache lines are typically relatively large: 8, 16, or even
32 words wide (64, 128, or 256 bytes on a 64-bit machine). Many, if not most,
heap-allocated objects in OCaml are smaller than a cache line, which means that
false sharing can easily happen even between seemingly unrelated objects.

To completely avoid false sharing, one would need to make sure that mutable
locations (atomic or otherwise) are never allocated next to locations that might
be accessed from other domains. Unfortunately, that is difficult to achieve
without incurring costs of its own, as padding tends to increase memory usage
and the number of initializing stores.
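To make that cost concrete, here is a stdlib-only sketch, not taken from
**kcas**, that pads the earlier two-field record by hand. The `Plain` and
`Padded` modules and the `p1` to `p15` fields are hypothetical. The 15 unused
words place `finished` 16 words (128 bytes on a 64-bit machine) after
`counter`. Measured with `Obj.reachable_words`, which counts heap words including
headers, the record grows from 3 to 18 words, and every padding word still costs
an initializing store. Note that this only separates the two fields from each
other: other objects may still be allocated right before `counter` or right
after `finished`.

```ocaml
(* The two-field record from the example above: 1 header word + 2 fields. *)
module Plain = struct
  type t = { mutable counter : int; mutable finished : bool }

  let make () = { counter = 0; finished = false }
end

(* Hypothetical manual padding: 15 unused words between the mutable fields,
   so [finished] is 16 words (128 bytes on 64-bit) after [counter]. *)
module Padded = struct
  type t = {
    mutable counter : int;
    p1 : int; p2 : int; p3 : int; p4 : int; p5 : int;
    p6 : int; p7 : int; p8 : int; p9 : int; p10 : int;
    p11 : int; p12 : int; p13 : int; p14 : int; p15 : int;
    mutable finished : bool;
  }

  (* Every padding word still has to be initialized by a store. *)
  let make () =
    { counter = 0;
      p1 = 0; p2 = 0; p3 = 0; p4 = 0; p5 = 0;
      p6 = 0; p7 = 0; p8 = 0; p9 = 0; p10 = 0;
      p11 = 0; p12 = 0; p13 = 0; p14 = 0; p15 = 0;
      finished = false }
end

let () =
  (* Obj.reachable_words counts heap words, headers included. *)
  Printf.printf "plain:  %d words\n"
    (Obj.reachable_words (Obj.repr (Plain.make ())));
  Printf.printf "padded: %d words\n"
    (Obj.reachable_words (Obj.repr (Padded.make ())))
```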

The
[`Loc.make`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Loc/index.html#val-make)
function, which creates a shared memory location, takes an optional `padded`
argument. Specifying `~padded:true` explicitly requests that the location be
allocated in a way that avoids false sharing.

Using `~padded:true` on long-lived shared memory locations that are repeatedly
modified can improve performance significantly. Using `~padded:true` on
short-lived shared memory locations is not recommended, as the extra memory
usage and initialization cost of padding are then unlikely to pay off.
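As a sketch of that guideline, assuming the **kcas** library is available
(`open Kcas`), the following contrasts a long-lived counter with a short-lived
reply slot. The names `served` and `request` are hypothetical, and spawning a
domain per request is done only to keep the example self-contained; it is not a
recommended way to structure real code.

```ocaml
open Kcas

(* Long-lived and repeatedly updated from many domains: padding it with
   [~padded:true] is likely to pay off. *)
let served = Loc.make ~padded:true 0

(* Hypothetical helper: runs [work] on another domain and hands the result
   back through a reply slot. *)
let request (work : unit -> 'a) : 'a =
  (* Short-lived: one reply slot per request, dropped as soon as the reply
     has been read, so the default unpadded allocation is used. *)
  let reply = Loc.make None in
  let worker = Domain.spawn (fun () -> Loc.set reply (Some (work ()))) in
  Domain.join worker;
  Loc.incr served;
  match Loc.get reply with
  | Some result -> result
  | None -> assert false (* the worker always fills the slot before exiting *)
```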

Consider the following sketch of a queue data structure:

```ocaml
type 'a queue = {
  head: 'a list Loc.t;
  tail: 'a list Loc.t;
}
```

Even if you allocate the locations with padding

```ocaml
# let queue () =
    { head = Loc.make ~padded:true [];
      tail = Loc.make ~padded:true []; }
val queue : unit -> 'a queue = <fun>
```

the queue is still vulnerable to the same kind of false sharing as in the
earlier example:

```ocaml
# let a_queue : int queue = queue ()
val a_queue : int queue = {head = <abstr>; tail = <abstr>}
# let counter = ref 1_000
val counter : int ref = {contents = 1000}
```

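Above, the reference cell for `counter` might well be allocated right next to
the queue record. The queue record itself is read-only, but every update of
`counter` can then invalidate the cache line holding the record, slowing down
every queue operation that reads the `head` or `tail` field and significantly
degrading the performance of the queue for passing messages between domains.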

To avoid this kind of problem, a strategic approach is to also allocate the
queue record itself in a way that avoids false sharing. Unfortunately, OCaml
does not currently provide a standard way to do so. The
[multicore-magic](https://github.com/ocaml-multicore/multicore-magic) library
provides a function,
[`copy_as_padded`](https://ocaml-multicore.github.io/multicore-magic/doc/multicore-magic/Multicore_magic/index.html#val-copy_as_padded),
for this purpose. Using it, one would write

```ocaml
# let queue () =
    { head = Loc.make ~padded:true [];
      tail = Loc.make ~padded:true []; }
    |> Multicore_magic.copy_as_padded
val queue : unit -> 'a queue = <fun>
```

to allocate the queue record in a way to avoid false sharing.

Note that allocating long-lived data structures used for inter-domain
communication, such as queues, in the way described above does not eliminate
all false sharing, but it is likely to reduce false sharing significantly with
relatively little effort.

### Beware of torn reads

The algorithm underlying **kcas** ensures that it is not possible to read