Performance Bottlenecks in Persistent<T> #171

songweijia · 2020-07-22T18:02:16Z

There is a performance bottleneck in Persistent. I found it while evaluating Persistent bandwidth with large operation sizes on a fast NVMe device (2GB/s peak write throughput). Compared with a slow SSD (~500MB/s peak write throughput), the fast NVMe device improved the Persistent bandwidth test results only marginally, even with large message sizes (~100MB).

I found that Persistent::version() takes a long time to append data to the log entry. The current object of type T is first serialized into a newly allocated memory buffer and then appended to the log. This introduces extra overhead: allocating the new buffer and copying from it into the memory-mapped log data region. This overhead was acceptable with slow persistent devices, but with a fast persistent device (2GB/s is about 1/3 to 1/4 of memory bandwidth) it begins to dominate. Two optimizations can be done here:

Acquire the memory region from the persist log beforehand and fill it directly, as we do with the ordered send buffer.
Use huge pages to manage the memory buffers in PersistLog, to reduce the memory allocation overhead in PersistLog::append().
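
A minimal sketch of the first optimization, for illustration only. It is not the actual Persistent or PersistLog code: `ToyLog`, `Blob`, `reserve()`, `version_with_temp_buffer()` and `version_in_place()` are hypothetical names, and a `std::vector<char>` stands in for the memory-mapped log data region. It contrasts the serialize-then-copy path described above with a reserve-then-fill path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the memory-mapped log data region.
class ToyLog {
public:
    explicit ToyLog(std::size_t capacity) : data_(capacity), tail_(0) {}

    // Copying path: the caller passes an already-serialized buffer,
    // which is copied into the log.
    void append(const char* buf, std::size_t size) {
        assert(tail_ + size <= data_.size());
        std::memcpy(data_.data() + tail_, buf, size);
        tail_ += size;
    }

    // Reserving path: hand out the next `size` bytes of the log so the
    // caller can serialize straight into them.
    char* reserve(std::size_t size) {
        assert(tail_ + size <= data_.size());
        char* region = data_.data() + tail_;
        tail_ += size;
        return region;
    }

    std::size_t size() const { return tail_; }
    const char* data() const { return data_.data(); }

private:
    std::vector<char> data_;
    std::size_t tail_;
};

// Hypothetical serializable object playing the role of T.
struct Blob {
    std::vector<char> payload;
    std::size_t bytes_size() const { return payload.size(); }
    void to_bytes(char* out) const {
        std::memcpy(out, payload.data(), payload.size());
    }
};

// Serialize into a temporary buffer, then append:
// one extra allocation and two copies of the payload.
void version_with_temp_buffer(ToyLog& log, const Blob& obj) {
    std::vector<char> tmp(obj.bytes_size());
    obj.to_bytes(tmp.data());
    log.append(tmp.data(), tmp.size());
}

// Reserve the log region first, then serialize into it:
// no temporary allocation and a single copy of the payload.
void version_in_place(ToyLog& log, const Blob& obj) {
    obj.to_bytes(log.reserve(obj.bytes_size()));
}
```

Both functions leave the same bytes in the log; the second skips the temporary buffer and the second copy. The sketch ignores concurrency, log metadata, and what happens if serialization fails after the region has been reserved, all of which a real change would need to handle.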

songweijia added enhancement persistence labels Jul 22, 2020

songweijia self-assigned this Jul 22, 2020
