asyncio performance

Random notes about tuning asyncio for performance. Performance can mean two different things, and the two may be incompatible:

  • Number of concurrent requests per second
  • Request latency in seconds: min/average/max time to complete a request

Architecture: Worker processes

Because of its GIL, CPython is effectively limited to a single CPU. To increase the number of concurrent requests per second, one solution is to spawn multiple worker processes, each running its own event loop.
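
Below is a minimal sketch of the pre-fork pattern, assuming a Unix "fork" start method so the listening socket is inherited by the children; the address, port and echo handler are illustrative placeholders:

    import asyncio
    import multiprocessing
    import socket

    async def handle_echo(reader, writer):
        # Trivial handler: echo one request back to the client.
        data = await reader.read(1024)
        writer.write(data)
        await writer.drain()
        writer.close()

    def worker(sock):
        # Each worker runs its own event loop on the shared socket;
        # the kernel distributes incoming connections between workers.
        async def serve():
            server = await asyncio.start_server(handle_echo, sock=sock)
            async with server:
                await server.serve_forever()
        asyncio.run(serve())

    if __name__ == "__main__":
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", 8888))
        sock.listen(100)
        # One worker per CPU; each child inherits the socket via fork.
        procs = [multiprocessing.Process(target=worker, args=(sock,))
                 for _ in range(multiprocessing.cpu_count())]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()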

Stream limits

aiohttp calls set_write_buffer_limits(0) for backpressure support and implements its own buffering on top of the transport.
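
A minimal sketch of what such a protocol can look like (the class name and bookkeeping are illustrative assumptions, not aiohttp's actual code):

    import asyncio

    class ManualBufferProtocol(asyncio.Protocol):
        def connection_made(self, transport):
            # A high-water mark of 0 means pause_writing() is called as
            # soon as any data sits in the transport's buffer, so the
            # protocol learns about backpressure immediately.
            transport.set_write_buffer_limits(0)
            self.transport = transport
            self._can_write = True

        def pause_writing(self):
            # Transport buffer above the high-water mark:
            # stop producing data for now.
            self._can_write = False

        def resume_writing(self):
            # Buffer drained below the low-water mark:
            # production may continue.
            self._can_write = True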

TCP_NODELAY

Since Python 3.6, asyncio sets the TCP_NODELAY option on newly created TCP sockets. This disables Nagle's algorithm, which delays small writes in order to coalesce them into larger segments: with the option set, data is sent out to the peer as quickly as possible, which typically reduces latency for protocols that exchange many small messages.

See Nagle's algorithm.
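
The option can be verified on a live connection. A small sketch, with a placeholder host and port:

    import asyncio
    import socket

    async def main():
        reader, writer = await asyncio.open_connection("example.com", 80)
        sock = writer.get_extra_info("socket")
        # Since Python 3.6 asyncio enables TCP_NODELAY itself,
        # so this is expected to print 1.
        print("TCP_NODELAY =",
              sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
        writer.close()
        await writer.wait_closed()

    asyncio.run(main())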

TCP_QUICKACK

(This option is not used by asyncio by default.)

The TCP_QUICKACK option makes TCP send acknowledgements as early as possible instead of delaying them, as normal protocol-level processing would otherwise allow. The option is not stable/permanent: it only toggles quickack mode at the moment it is set, and subsequent TCP transactions (which may happen under the hood) can leave or re-enter the mode depending on internal protocol processing, so it must be set again after each relevant operation.
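
A sketch of how an application could enable it from a protocol; this is an assumed usage pattern, not something asyncio does itself, and it works only on Linux, where socket.TCP_QUICKACK is defined:

    import asyncio
    import socket

    class QuickAckProtocol(asyncio.Protocol):
        def connection_made(self, transport):
            self._sock = transport.get_extra_info("socket")
            self._set_quickack()

        def data_received(self, data):
            # The flag is not permanent, so re-enable it
            # after every receive.
            self._set_quickack()

        def _set_quickack(self):
            # Linux-only socket option: send ACKs immediately.
            self._sock.setsockopt(socket.IPPROTO_TCP,
                                  socket.TCP_QUICKACK, 1)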

Tune the Linux kernel

Linux TCP sysctls:

  • /proc/sys/net/ipv4/tcp_mem: Overall memory limits for TCP, as three values (low, pressure, high) counted in pages
  • /proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_max: The default and maximum size, in bytes, of the receive socket memory
  • /proc/sys/net/core/wmem_default and /proc/sys/net/core/wmem_max: The default and maximum size, in bytes, of the send socket memory
  • /proc/sys/net/core/optmem_max: The maximum amount of option memory buffers
  • net.ipv4.tcp_no_metrics_save: When enabled, TCP does not cache metrics (such as the congestion window) from closed connections
  • net.core.netdev_max_backlog: The maximum number of packets queued on the input side when an interface receives packets faster than the kernel can process them
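
The current values can be read without privileges (writing new ones requires root). A small Python sketch that dumps the knobs listed above:

    from pathlib import Path

    SYSCTLS = [
        "/proc/sys/net/ipv4/tcp_mem",
        "/proc/sys/net/core/rmem_default",
        "/proc/sys/net/core/rmem_max",
        "/proc/sys/net/core/wmem_default",
        "/proc/sys/net/core/wmem_max",
        "/proc/sys/net/core/optmem_max",
        "/proc/sys/net/ipv4/tcp_no_metrics_save",
        "/proc/sys/net/core/netdev_max_backlog",
    ]

    for path in SYSCTLS:
        print(path, "=", Path(path).read_text().strip())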