Towards soundness of PyByteArray::to_vec #4742

Open
wants to merge 3 commits into
base: main

Conversation

robsdedude

In free-threaded Python, to_vec needs to run inside a critical section so that no other Python thread can mutate the bytearray during the copy, which would be UB.
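As a rough illustration of the idea (not PyO3's actual implementation), the pattern is "hold the object's lock for the whole copy". The sketch below uses only the Rust standard library, with `std::sync::Mutex` standing in for Python's per-object critical section. `SharedBytes`, `to_vec_locked` and `extend` are hypothetical names invented for this example.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a Python `bytearray`: a growable byte buffer
/// whose every access goes through one lock. This is NOT the PyO3 API.
#[derive(Clone)]
struct SharedBytes {
    inner: Arc<Mutex<Vec<u8>>>,
}

impl SharedBytes {
    fn new(data: &[u8]) -> Self {
        SharedBytes {
            inner: Arc::new(Mutex::new(data.to_vec())),
        }
    }

    /// Copy the contents while holding the lock, so no writer that uses the
    /// same lock can resize or overwrite the buffer in the middle of the copy.
    fn to_vec_locked(&self) -> Vec<u8> {
        let guard = self.inner.lock().unwrap();
        guard.to_vec()
    }

    /// Writers take the same lock. The copy above is only consistent if
    /// every writer does so.
    fn extend(&self, extra: &[u8]) {
        self.inner.lock().unwrap().extend_from_slice(extra);
    }
}

fn main() {
    let buf = SharedBytes::new(b"abc");
    buf.extend(b"def");
    let snapshot = buf.to_vec_locked();
    println!("{}", String::from_utf8_lossy(&snapshot));
}
```

The lock only helps because both `to_vec_locked` and `extend` acquire it. A reader that locks while writers do not gains nothing.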

See also #4736

Unfortunately, it seems I can't write proper tests for this, as Python 3.13t is not yet part of the test matrix. I'm aware that support for testing with 3.13 and 3.13t is still in its early stages; for instance, virtualenv does not yet support it.

@robsdedude robsdedude changed the title Towards soundness of PyByteArrayMethods::to_vec Towards soundness of PyByteArray::to_vec Nov 29, 2024
@robsdedude robsdedude marked this pull request as ready for review November 29, 2024 09:28
Member

@davidhewitt davidhewitt left a comment

Thanks for the PR! We actually do have tests running for the free-threaded build; I would have been unhappy to declare support without them! Similarly, I have had virtualenv working just fine with 3.13t (haven't tried Windows, though).

I think we could write a test which spawns a thread that attempts to invalidate the data (maybe by writing to it using py.run or PySequenceMethods::set_slice) and confirms that the data read is the original data inserted, not the conflicting data. The conflicting write should hopefully now block on either the GIL or the critical section, depending on the build.
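The shape of such a test can be sketched without any Python at all. The following is a minimal standard-library model, not a PyO3 test: a `Mutex` plays the role of the GIL or critical section, and `read_while_writer_waits` is a hypothetical helper name. The reader takes the lock before the writer is spawned, so the writer cannot touch the buffer until the copy is finished.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns (snapshot taken by the reader, final buffer contents).
fn read_while_writer_waits() -> (Vec<u8>, Vec<u8>) {
    let original = vec![1u8; 1024];
    let shared = Arc::new(Mutex::new(original));

    // The reader takes the lock first ...
    let guard = shared.lock().unwrap();

    // ... then a writer thread tries to overwrite every byte. It cannot
    // acquire the lock until the reader releases it.
    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut data = shared.lock().unwrap();
            data.iter_mut().for_each(|b| *b = 2);
        })
    };

    // Copy while still holding the lock, then release it.
    let snapshot = guard.to_vec();
    drop(guard);

    // Let the writer finish and look at what it left behind.
    writer.join().unwrap();
    let final_state = shared.lock().unwrap().to_vec();
    (snapshot, final_state)
}

fn main() {
    let (snapshot, final_state) = read_while_writer_waits();
    assert!(snapshot.iter().all(|&b| b == 1));
    assert!(final_state.iter().all(|&b| b == 2));
    println!("snapshot was consistent");
}
```

A real test against PyByteArray would replace the `Mutex` with the interpreter's own locking and the writer closure with Python code mutating the bytearray. Whether that write actually blocks depends on the build and on CPython's own locking.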

newsfragments/4742.fixed.md (outdated, resolved)
Co-authored-by: David Hewitt <[email protected]>
@davidhewitt davidhewitt mentioned this pull request Nov 29, 2024
@robsdedude
Author

robsdedude commented Nov 29, 2024

@davidhewitt I tried to write a test running bytearray.extend in one thread while reading the bytearray with to_vec() in another thread, and found that I was able to read inconsistent (more precisely, partially uninitialized) memory regardless of whether the critical section change was in place or not. Digging deeper, I'm not surprised. If you look at the C implementation of bytearray, you'll see that no critical section is used throughout the whole file. All the memcpy and memmove calls are unprotected 😕
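To make the failure mode concrete, here is a single-threaded replay of one possible interleaving, written as a model only. It assumes, purely for illustration, that the writer's extend happens in two steps (grow the buffer, then fill it), and it uses a sentinel byte in place of genuinely uninitialized memory, which safe Rust cannot read. `UNINIT`, `grow`, `fill` and `torn_read` are hypothetical names.

```rust
/// Sentinel standing in for "not yet initialized" bytes (a model only).
const UNINIT: u8 = 0xAA;

/// Assumed step 1 of an extend: grow the buffer, leaving the new tail unset.
/// Returns the old length, i.e. where the new data will start.
fn grow(buf: &mut Vec<u8>, extra_len: usize) -> usize {
    let old_len = buf.len();
    buf.resize(old_len + extra_len, UNINIT);
    old_len
}

/// Assumed step 2 of an extend: copy the new data into the tail.
fn fill(buf: &mut Vec<u8>, start: usize, extra: &[u8]) {
    buf[start..start + extra.len()].copy_from_slice(extra);
}

/// Replays an interleaving in which the reader copies between the writer's
/// two steps. A lock taken only by the reader would not prevent this,
/// because the writer never waits for it.
fn torn_read() -> Vec<u8> {
    let mut buf = b"abc".to_vec();
    let start = grow(&mut buf, 3); // writer: step 1
    let snapshot = buf.clone(); // reader: copies the half-updated buffer
    fill(&mut buf, start, b"def"); // writer: step 2
    snapshot
}

fn main() {
    let snapshot = torn_read();
    println!("{:?}", snapshot);
}
```

In this model the snapshot has the new length but not the new contents, which matches the kind of inconsistent read described above. Closing the gap requires the writer to take the same lock as the reader.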

Not sure where to go from here.

However, no matter how hard I tried, I couldn't get it to segfault. So maybe there's something more to it that I'm not aware of.

2 participants