Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAS-132166 / Address memory consumption related to AIO reads #423

Merged
merged 1 commit into from
Nov 2, 2024

Conversation

anodos325
Copy link
Collaborator

Add backpressure mechanism for AIO reads in io_uring module.

In some edge cases where storage backend can't keep up with client reads, io_uring / samba will allow queueing up reads beyond what is reasonable leading to OOM errors in some extreme edge cases.

This commit keeps a counter of outstanding read requests per-TCON and switches to pread(2) once the queue depth limit is reached. This effectively applies backpressure to client and prevents excessive memory consumption. A total count of synchronous reads performed is logged when TCON is disconnected if log level for the VFS module is sufficiently high (3 or higher).

Fix memory leak related to AIO memory pool for some failure scenarios for AIO reads.

The memory pool for AIO buffers is allocated under a memory context that persists for the duration of the SMB session. When chunks of pool are assigned out for a particular AIO request, a separate small allocation (io_link) is performed with a pointer to the data buffer and a talloc destructor set such that when the io_link is freed the buffer is returned to the pool.

Initially these were performed as two separate steps (assigning out buffer from pool and setting up io_link) which allowed for situations to arise in which the request could error out before the io_link was set resulting in the buffer never being returned to the memory pool.

This commit revises the workflow and API for the memory pool such that the chunk is always assinged with an associated io_link allocated under the memory context of the initial SMB2 read request. Once the request is successful and we're getting ready to respond to client then we reparent the io_link to the context of the response. This ensures that the buffer is always returned when a limited-life talloc chunk is freed.

Add backpressure mechanism for AIO reads in io_uring module.

In some edge cases where storage backend can't keep up with client
reads, io_uring / samba will allow queueing up reads beyond what
is reasonable leading to OOM errors in some extreme edge cases.

This commit keeps a counter of outstanding read requests per-TCON
and switches to pread(2) once the queue depth limit is reached.
This effectively applies backpressure to client and prevents excessive
memory consumption. A total count of synchronous reads performed is
logged when TCON is disconnected if log level for the VFS module is
sufficiently high (3 or higher).

Fix memory leak related to AIO memory pool for some failure scenarios
for AIO reads.

The memory pool for AIO buffers is allocated under a memory context
that persists for the duration of the SMB session. When chunks of
pool are assigned out for a particular AIO request, a separate
small allocation (io_link) is performed with a pointer to the data
buffer and a talloc destructor set such that when the io_link is
freed the buffer is returned to the pool.

Initially these were performed as two separate steps (assigning out
buffer from pool and setting up io_link) which allowed for situations
to arise in which the request could error out before the io_link was
set resulting in the buffer never being returned to the memory pool.

This commit revises the workflow and API for the memory pool such
that the chunk is always assinged with an associated io_link allocated
under the memory context of the initial SMB2 read request. Once the
request is successful and we're getting ready to respond to client
then we reparent the io_link to the context of the response. This
ensures that the buffer is always returned when a limited-life
talloc chunk is freed.
@anodos325 anodos325 changed the title Address memory consumption related to AIO reads NAS-132166 / Address memory consumption related to AIO reads Nov 1, 2024
Copy link
Contributor

@bmeagherix bmeagherix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Comment on lines +559 to +562
// free initial data blob
TALLOC_FREE(state->io_lnk);
state->out_data = data_blob_null;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So upon schedule_smb2_aio_read failure, we free the buffer. However, on line 597 below ("Ok, read into memory. Allocate the out buffer.") we reallocate one.

Is there any need? / Would it be more efficient to just free on all the return paths between here and there?

If this is being done to force schedule_smb2_sendfile_read down a particular path (when it sees data_blob_null) then maybe add a comment to that effect.

@anodos325 anodos325 merged commit d6cf2f3 into SCALE-v4-20-stable Nov 2, 2024
@anodos325 anodos325 deleted the add-backpressure-aio-reads branch November 2, 2024 11:54
anodos325 added a commit that referenced this pull request Nov 2, 2024
Add backpressure mechanism for AIO reads in io_uring module.

In some edge cases where storage backend can't keep up with client
reads, io_uring / samba will allow queueing up reads beyond what
is reasonable leading to OOM errors in some extreme edge cases.

This commit keeps a counter of outstanding read requests per-TCON
and switches to pread(2) once the queue depth limit is reached.
This effectively applies backpressure to client and prevents excessive
memory consumption. A total count of synchronous reads performed is
logged when TCON is disconnected if log level for the VFS module is
sufficiently high (3 or higher).

Fix memory leak related to AIO memory pool for some failure scenarios
for AIO reads.

The memory pool for AIO buffers is allocated under a memory context
that persists for the duration of the SMB session. When chunks of
pool are assigned out for a particular AIO request, a separate
small allocation (io_link) is performed with a pointer to the data
buffer and a talloc destructor set such that when the io_link is
freed the buffer is returned to the pool.

Initially these were performed as two separate steps (assigning out
buffer from pool and setting up io_link) which allowed for situations
to arise in which the request could error out before the io_link was
set resulting in the buffer never being returned to the memory pool.

This commit revises the workflow and API for the memory pool such
that the chunk is always assinged with an associated io_link allocated
under the memory context of the initial SMB2 read request. Once the
request is successful and we're getting ready to respond to client
then we reparent the io_link to the context of the response. This
ensures that the buffer is always returned when a limited-life
talloc chunk is freed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants