Refactoring of Buffers (last step towards unifying COW and Spilling) #13801

Merged
merged 30 commits into rapidsai:branch-24.02 from the buffer_owner branch
Jan 15, 2024

Conversation

madsbk
Member

@madsbk madsbk commented Aug 2, 2023

This PR decouples buffer slices/views from owning buffers. Currently, all buffer classes (ExposureTrackedBuffer, BufferSlice, SpillableBuffer, SpillableBufferSlice) inherit from Buffer; however, they are not Liskov substitutable, as pointed out by @wence- and @vyasr (here and here).

To fix this, we now have a Buffer and a BufferOwner class. We still use Buffer throughout cuDF, but it now points to a BufferOwner.

We have the following class hierarchy:

ExposureTrackedBufferOwner -> BufferOwner 
SpillableBufferOwner -> BufferOwner 
ExposureTrackedBuffer -> Buffer 
SpillableBuffer -> Buffer 

With the following relationship:

Buffer -> BufferOwner 
ExposureTrackedBuffer -> ExposureTrackedBufferOwner 
SpillableBuffer -> SpillableBufferOwner 
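
A minimal sketch of the view/owner split described above, not the actual cuDF implementation. It assumes host `bytes` as a stand-in for device memory, and the constructor signatures, `tobytes`, and the placeholder subclasses are hypothetical names chosen for illustration:

```python
from typing import Optional


class BufferOwner:
    """Owns one contiguous allocation (host bytes stand in for device memory)."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)

    @property
    def size(self) -> int:
        return len(self._data)


class Buffer:
    """A non-owning view (offset and size) into a BufferOwner's memory."""

    def __init__(self, owner: BufferOwner, offset: int = 0,
                 size: Optional[int] = None) -> None:
        if size is None:
            size = owner.size - offset
        if offset < 0 or size < 0 or offset + size > owner.size:
            raise ValueError("view is out of bounds of its owner")
        self._owner = owner
        self._offset = offset
        self._size = size

    @property
    def owner(self) -> BufferOwner:
        return self._owner

    @property
    def size(self) -> int:
        return self._size

    def __getitem__(self, key: slice) -> "Buffer":
        start, stop, step = key.indices(self._size)
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        # Slicing creates another view on the same owner; no data is copied.
        return type(self)(self._owner, self._offset + start, max(stop - start, 0))

    def tobytes(self) -> bytes:
        end = self._offset + self._size
        return bytes(self._owner._data[self._offset:end])


class SpillableBufferOwner(BufferOwner):
    """Placeholder subclass; spilling logic is omitted in this sketch."""


class SpillableBuffer(Buffer):
    """Placeholder view type meant to point at a SpillableBufferOwner."""
```

In this sketch a slice of a `Buffer` is just another `Buffer` of the same type pointing at the same owner, so a separate slice class is not needed and every view supports the same operations.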

Unify COW and Spilling

In a follow-up PR, the spilling buffer classes will inherit from the exposure-tracked buffer classes, giving the following hierarchy:

SpillableBufferOwner -> ExposureTrackedBufferOwner -> BufferOwner 
SpillableBuffer -> ExposureTrackedBuffer -> Buffer 

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 2, 2023
@github-actions github-actions bot added the Python Affects Python cuDF API. label Aug 2, 2023
@madsbk madsbk marked this pull request as ready for review August 3, 2023 11:33
@madsbk madsbk requested a review from a team as a code owner August 3, 2023 11:33
@madsbk madsbk requested review from vyasr and mroeschke August 3, 2023 11:33

copy-pr-bot bot commented Nov 2, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@madsbk madsbk changed the base branch from branch-23.10 to branch-23.12 November 3, 2023 10:19
@madsbk
Member Author

madsbk commented Nov 3, 2023

/ok to test

Contributor

@wence- wence- left a comment

Mostly requesting changes for discussion points

docs/cudf/source/developer_guide/library_design.md (outdated, resolved)
docs/cudf/source/developer_guide/library_design.md (outdated, resolved)
docs/cudf/source/developer_guide/library_design.md (outdated, resolved)
python/cudf/cudf/core/buffer/buffer.py (outdated, resolved)
python/cudf/cudf/core/buffer/buffer.py (outdated, resolved)
python/cudf/cudf/core/buffer/utils.py (outdated, resolved)
python/cudf/cudf/core/buffer/utils.py (outdated, resolved)
python/cudf/cudf/core/buffer/utils.py (resolved)
python/cudf/cudf/core/buffer/utils.py (outdated, resolved)
Co-authored-by: Lawrence Mitchell <[email protected]>
@madsbk madsbk changed the base branch from branch-23.12 to branch-24.02 November 23, 2023 15:20
@madsbk
Member Author

madsbk commented Nov 23, 2023

/ok to test

Contributor

@vyasr vyasr left a comment

First pass was pretty high-level to reacquaint myself with the concepts here. My main open question is around the separation between ExposureTrackedBufferOwner and BufferOwner and whether they should maybe be unified if we have to maintain the exposure properties at the BufferOwner level for the two to satisfy the LSP. Maybe we only want to distinguish at the Buffer, not BufferOwner, level (I think the SpillableBufferOwner should stay separate though).

python/cudf/cudf/core/buffer/buffer.py (resolved)
python/cudf/cudf/core/abc.py (outdated, resolved)
python/cudf/cudf/core/buffer/buffer.py (outdated, resolved)
python/cudf/cudf/core/buffer/exposure_tracked_buffer.py (outdated, resolved)
@madsbk
Member Author

madsbk commented Nov 30, 2023

> First pass was pretty high-level to reacquaint myself with the concepts here. My main open question is around the separation between ExposureTrackedBufferOwner and BufferOwner and whether they should maybe be unified if we have to maintain the exposure properties at the BufferOwner level for the two to satisfy the LSP. Maybe we only want to distinguish at the Buffer, not BufferOwner, level (I think the SpillableBufferOwner should stay separate though).

Agree, I think merging ExposureTrackedBufferOwner and BufferOwner is a good idea.
@wence-, I think you had a similar idea?

@madsbk
Member Author

madsbk commented Dec 14, 2023

/ok to test

@madsbk madsbk requested a review from vyasr January 11, 2024 07:23
Contributor

@vyasr vyasr left a comment

This is very close! Thanks for all the iterations.

python/cudf/cudf/core/buffer/buffer.py (outdated, resolved)
python/cudf/cudf/core/buffer/buffer.py (resolved)
python/cudf/cudf/core/buffer/buffer.py (outdated, resolved)
python/cudf/cudf/core/buffer/buffer.py (outdated, resolved)
python/cudf/cudf/core/buffer/buffer.py (resolved)
base=self._base, offset=offset + self._offset, size=size
)
@property
def exposed(self) -> bool:
Contributor

It's a little odd that all BufferOwner objects know their exposed status, but only a subclass of Buffer does. However, I think I'm OK with that for now. In the next PR when more of these types get unified further we can see exactly what separation of concerns makes the most sense between the final set of classes.
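
To make the asymmetry in this comment concrete, here is a small sketch under stated assumptions: the exposure flag lives on the owner, and only the exposure-tracked view type forwards it. The method name `mark_exposed` and the constructor signatures are hypothetical simplifications, not necessarily what the PR's code uses:

```python
class BufferOwner:
    """Every owner tracks whether its memory has been exposed externally."""

    def __init__(self) -> None:
        self._exposed = False

    @property
    def exposed(self) -> bool:
        return self._exposed

    def mark_exposed(self) -> None:
        # Hypothetical helper: record that a raw pointer has escaped.
        self._exposed = True


class Buffer:
    """Plain view: holds an owner but does not surface exposure status."""

    def __init__(self, owner: BufferOwner) -> None:
        self._owner = owner


class ExposureTrackedBuffer(Buffer):
    """View subclass that forwards the owner's exposure status."""

    @property
    def exposed(self) -> bool:
        return self._owner.exposed
```

Because the flag is stored once on the owner, all views that share an owner agree on its exposure status, while a plain `Buffer` has no `exposed` attribute at all.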

Comment on lines +427 to +432
The sound solution is to modify Dask et al. so that they access the
frames through `.get_ptr()` and hold on to the `spill_lock` until
the frame has been transferred. However, until this adaptation we
use a hack where the frame is a `Buffer` with a `spill_lock` as the
owner, which makes `self` unspillable while the frame is alive but
doesn't expose `self` when `__cuda_array_interface__` is accessed.
Contributor

We should reassess this evaluation after the next PR unifying COW and spilling. I don't think I agree with this statement; it implies leaking knowledge of cudf buffer internals to dask. Once we've finished the unification we should revisit whether there's a more API-friendly way of doing this. If not, we need to think about the appropriate generalization of our exposure semantics to generic CAI usage.

Contributor

FWIW, this approach (as the code in this PR uses) is (morally) the way pytorch ensures the lifetime of a CAI-supporting object being turned into a torch tensor.
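
The lifetime trick discussed in this thread can be sketched in plain Python. This is an illustration only, assuming host memory and hypothetical stand-in classes (`SpillLock`, `SpillableOwner`, `Frame`, `serialize_frame`); it shows a frame keeping a lock alive, which in turn keeps the source unspillable, and is not the code from the PR:

```python
import weakref


class SpillLock:
    """Hypothetical token: while it is alive, its target must not be spilled."""


class SpillableOwner:
    """Stand-in for a spillable owner; host bytes replace device memory."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)
        # Weak references: a lock stops counting once nothing else holds it.
        self._locks = weakref.WeakSet()

    def spill_lock(self, lock: SpillLock) -> None:
        self._locks.add(lock)

    @property
    def spillable(self) -> bool:
        return len(self._locks) == 0


class Frame:
    """A view handed to a consumer; it holds the lock as its keep-alive."""

    def __init__(self, view: memoryview, keepalive: SpillLock) -> None:
        self.view = view
        self._keepalive = keepalive


def serialize_frame(owner: SpillableOwner) -> Frame:
    lock = SpillLock()
    owner.spill_lock(lock)
    # The frame is the only strong reference to the lock after returning.
    return Frame(memoryview(owner._data), keepalive=lock)
```

In this sketch the owner becomes spillable again as soon as the last frame referencing the lock is released, without the consumer needing to know anything about spilling.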

python/cudf/cudf/core/buffer/utils.py (outdated, resolved)
@wence-
Contributor

wence- commented Jan 12, 2024

/ok to test

Contributor

@wence- wence- left a comment

I think I am happy with this! 🎉

@madsbk
Member Author

madsbk commented Jan 12, 2024

/ok to test

@madsbk madsbk requested a review from vyasr January 12, 2024 12:18
@madsbk
Member Author

madsbk commented Jan 12, 2024

/ok to test

@madsbk
Member Author

madsbk commented Jan 15, 2024

Thanks for the reviews. All tests marked spilling pass for me locally, so let's merge this PR:

CUDF_SPILL=on CUDF_SPILL_DEVICE_LIMIT=1 py.test -m spilling python/cudf/cudf/tests/

@madsbk
Member Author

madsbk commented Jan 15, 2024

/merge

@rapids-bot rapids-bot bot merged commit 0710335 into rapidsai:branch-24.02 Jan 15, 2024
67 checks passed
@madsbk madsbk deleted the buffer_owner branch January 15, 2024 08:34
@vyasr vyasr mentioned this pull request Apr 10, 2024
3 tasks
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
4 participants