Clean up async code #631

Open · wants to merge 26 commits into main

Conversation
@fyellin (Contributor) commented Nov 1, 2024

Asynchronous code in _api needed to know too much about the internal workings of the AwaitHandler. Changed it so that there's a little bit more separation.

Also made some asynchronous functions really be asynchronous.

Added on_submitted_work_done.

Commented out the implementation of create_xxx_pipeline_async, because these functions are unimplemented in wgpu-native. The code is still there, but a constant forces the synchronous implementation.

Added tests. Added dependency on pytest-asyncio to run those tests.
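To sketch what such a test looks like (hypothetical names — the awaited function here is a stand-in, not wgpu-py's actual API):

```python
import asyncio
import pytest

async def request_adapter_async():
    # Stand-in for a real async wgpu call; illustrative only.
    await asyncio.sleep(0)
    return "adapter"

@pytest.mark.asyncio
async def test_request_adapter_async():
    # pytest-asyncio runs this coroutine on an asyncio event loop.
    adapter = await request_adapter_async()
    assert adapter is not None
```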

@fyellin fyellin marked this pull request as ready for review November 1, 2024 23:58
@fyellin fyellin requested a review from Korijn as a code owner November 1, 2024 23:58
@Korijn (Collaborator) commented Nov 2, 2024

Very nice improvements!

@fyellin (Contributor, author) commented Nov 3, 2024

While I'm cleaning up: why do both enumerate_adapters_sync and enumerate_adapters_async exist? There seems to be no value added by the async version. Can these just be merged into `enumerate_adapters`?

When I look at the Rust documentation, there seems to be only a single synchronous operation.

tests/test_async.py — outdated review thread (resolved)
@almarklein (Member) left a comment

Nice work, I like the set_error() and set_result(). I think that crossed my mind when I worked on this, but I can't recall why I went with that weird result dict.
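A rough sketch of that pattern (not the actual WgpuAwaitable — names and the polling strategy are illustrative assumptions): a future-like object resolved via set_result()/set_error(), polled with an asyncio-style yield None.

```python
import asyncio

class Awaitable:
    _PENDING = object()

    def __init__(self):
        self._result = self._PENDING
        self._error = None

    def set_result(self, value):
        self._result = value

    def set_error(self, error):
        self._error = error

    def __await__(self):
        # Spin until resolved, handing control back to the event loop
        # on every pass (this "yield None" trick is asyncio-specific).
        while self._result is self._PENDING and self._error is None:
            yield None
        if self._error is not None:
            raise self._error
        return self._result

async def main():
    a = Awaitable()
    # In real code a C callback would resolve the awaitable; simulate it.
    asyncio.get_running_loop().call_later(0.01, a.set_result, 42)
    return await a

print(asyncio.run(main()))  # 42
```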

@almarklein (Member)

Why do both enumerate_adapters_sync and enumerate_adapters_async exist?

Good question. The reason is that, in order to implement this method in JS, we would probably just call request_adapter with different power settings, and request_adapter is async.
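A hedged sketch of what that could look like in a JS backend (the function names and power-preference values are assumptions, not the actual API):

```python
import asyncio

async def enumerate_adapters_async(request_adapter_async):
    # Build the adapter list by requesting one adapter per power
    # preference, since JS only exposes the async request path.
    adapters = []
    for power in ("high-performance", "low-power"):
        adapter = await request_adapter_async(power_preference=power)
        if adapter is not None:
            adapters.append(adapter)
    return adapters

async def _demo():
    # Fake request function standing in for the real async API.
    async def fake_request_adapter(power_preference):
        return f"adapter({power_preference})"
    return await enumerate_adapters_async(fake_request_adapter)

print(asyncio.run(_demo()))
```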

@fyellin (Contributor, author) commented Nov 4, 2024 via email

@almarklein (Member)

I was going to ask about anyio.

For testing it definitely makes sense. For implementation, only if it remains somewhat contained and an implementation detail.

There are a lot of changes coming in how wgpu-native implements futures, and I'm also not sure what we'd need to do to support this running in JS. My point is that this code will evolve quite a bit in the coming months, and I don't want to lock into a dependency.

@fyellin (Contributor, author) commented Nov 5, 2024 via email

@fyellin (Contributor, author) commented Nov 5, 2024

I need some advice on how to finish this. I'm temporarily using anyio, which can deal with both trio and asyncio, and it's mostly working.

The problem I'm running into is on the build. I've included "anyio" in pyproject.toml, and the build and tests run fine. But when the wheel is being built, it doesn't include anyio. How do I tell the wheel-builder to include this?

Just as an aside, I went to a clean environment and did "pip install wgpu". I was surprised to see pycparser included. I can't find where that dependency is coming from.

@Korijn (Collaborator) commented Nov 5, 2024

The problem I'm running into is on the build. I've included "anyio" in pyproject.toml, and the build and tests run fine. But when the wheel is being built, it doesn't include anyio. How do I tell the wheel-builder to include this?

Wheels don't ship their dependencies, they're only listed in their metadata files. When you build a wheel, and then install that wheel into a new environment, you should see anyio being installed.
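For reference — the exact entries here are illustrative, not wgpu's actual pyproject.toml — runtime dependencies live in the [project] table, and a built wheel only records them as Requires-Dist lines in its METADATA file; installers resolve them at install time:

```toml
[project]
name = "wgpu"
dependencies = [
    "cffi>=1.15.0",
    "anyio",
]
```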

Just as an aside, I went to a clean environment and did "pip install wgpu". I was surprised to see pycparser included. I can't find where that dependency is coming from.

I'm guessing it's a transitive dependency of cffi.

Commit: The object returned by WgpuAwaitable is now directly awaitable. Make Almar happy.
@almarklein (Member)

I'm guessing it's a transitive dependency of cffi.

It is: https://github.com/python-cffi/cffi/blob/88f48d22484586d48079fc8780241dfdfa3379c8/pyproject.toml#L12-L14

@almarklein (Member)

It looks like yield None works for trio too. The code below runs two tasks simultaneously: one uses the awaitable that constantly switches back to the event loop using yield None, the other prints the time on an interval.

import time
import trio

class Awaitable:
    def __await__(self):
        while True:
            yield None

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child1)
        nursery.start_soon(child2)

async def child1():
    await Awaitable()

async def child2():
    while True:
        print(time.time())
        await trio.sleep(0.01)

trio.run(main)

@fyellin (Contributor, author) commented Nov 7, 2024

Everything resolved.

@Korijn (Collaborator) commented Nov 8, 2024

I agree with Almar on the dependency-management topic here.

If you switch from anyio to sniffio, you can write something like the following:

import sniffio

def get_async_backend():
    lib = sniffio.current_async_library()
    if lib == "asyncio":
        import asyncio
        return asyncio
    elif lib == "trio":
        import trio
        return trio
    # and so on

That would reduce our dependency footprint quite a lot.

@almarklein I do prefer to await ...sleep() over using yield None. It may work in specific circumstances, but it's probably hinging on implementation details and backward-compatibility artefacts...

@almarklein (Member)

Interesting that CI is green; if I check out this branch and run an example, it panics in _wgpuDeviceCreateRenderPipeline. Note the invalid vertex format for vertex attribute: 0:

thread '<unnamed>' panicked at src/lib.rs:2169:42:
invalid vertex format for vertex attribute: 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at library/core/src/panicking.rs:221:5:
panic in a function that cannot unwind
stack backtrace:
   0:        0x1034a14e8 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h243268f17d714c7f
   1:        0x1034ba19c - core::fmt::write::hb3cfb8a30e72d7ff
   2:        0x10349fb84 - std::io::Write::write_fmt::hfb2314975de9ecf1
   3:        0x1034a2504 - std::panicking::default_hook::{{closure}}::h14c7718ccf39d316
   4:        0x1034a2128 - std::panicking::default_hook::hc62e60da3be2f352
   5:        0x1034a2fc8 - std::panicking::rust_panic_with_hook::h09e8a656f11e82b2
   6:        0x1034a28f0 - std::panicking::begin_panic_handler::{{closure}}::h1230eb3cc91b241c
   7:        0x1034a1974 - std::sys::backtrace::__rust_end_short_backtrace::hc3491307aceda2c2
   8:        0x1034a25e0 - _rust_begin_unwind
   9:        0x1034df108 - core::panicking::panic_nounwind_fmt::h91ee161184879b56
  10:        0x1034df180 - core::panicking::panic_nounwind::heab7ebe7a6cd845c
  11:        0x1034df224 - core::panicking::panic_cannot_unwind::hedc43d82620205bf
  12:        0x1031ec070 - _wgpuDeviceCreateRenderPipeline
  13:        0x1a5051050 - <unknown>
  14:        0x1a5059b04 - <unknown>
  15:        0x101c51188 - _cdata_call
  16:        0x100ed20d0 - __PyObject_Call
  17:        0x100fc4358 - __PyEval_EvalFrameDefault
  18:        0x100eeb4b0 - _gen_send_ex2
  19:        0x100eeb114 - _gen_send_ex
  20:        0x100f20524 - _cfunction_vectorcall_O
  21:        0x100fe3124 - _context_run
  22:        0x100fc3e30 - __PyEval_EvalFrameDefault
  23:        0x100eeb4b0 - _gen_send_ex2
  24:        0x100eeb114 - _gen_send_ex
  25:        0x100fc3cbc - __PyEval_EvalFrameDefault
  26:        0x100fb942c - _PyEval_EvalCode
  27:        0x100fb6d44 - _builtin_exec
  28:        0x100f2024c - _cfunction_vectorcall_FASTCALL_KEYWORDS
  29:        0x100ed1e60 - _PyObject_Vectorcall
  30:        0x100fc2968 - __PyEval_EvalFrameDefault
  31:        0x100fb942c - _PyEval_EvalCode
  32:        0x101019e20 - _run_mod
  33:        0x1010183ec - __PyRun_SimpleFileObject
  34:        0x101017e20 - __PyRun_AnyFileObject
  35:        0x10103bb50 - _Py_RunMain
  36:        0x10103bf10 - _pymain_main
  37:        0x10103bfb0 - _Py_BytesMain
thread caused non-unwinding panic. aborting.

@almarklein (Member) commented Nov 8, 2024

I was wondering how our trio example was able to run, while your new test indicated that trio does not handle the yield None. Turns out that in main, Awaitable.is_done() calls the poll function, and it seems to always resolve on the first try, so the None is never yielded. Conclusion: the yield None is an asyncio-specific trick.

edit: I was hoping that the yield trick would allow us to stay async-lib-independent without much effort. But it looks like this is not possible.

@fyellin (Contributor, author) commented Nov 13, 2024

Is there something holding this up?

@Korijn (Collaborator) commented Nov 13, 2024

Is there something holding this up?

Yes, these two comments:

@fyellin (Contributor, author) commented Nov 14, 2024

I double checked "yield None" with the simplest possible code:

>>> class Foo:
...     def __await__(self):  yield None; return 10
... 
>>> async def foo():
...     await Foo()
... 
>>> import trio
>>> trio.run(foo)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/fy/Pycharm/wgpu-py/.venv/lib/python3.12/site-packages/trio/_core/_run.py", line 2407, in run
    raise runner.main_task_outcome.error
  File "<stdin>", line 2, in foo
  File "<stdin>", line 2, in __await__
TypeError: trio.run received unrecognized yield message None. Are you trying to use a library written for some other framework like asyncio? That won't work without some kind of compatibility shim.

Meanwhile asyncio.run(foo()) worked just fine. So the yield None is definitely asyncio-specific.
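For contrast, a minimal sketch of the asyncio side: a bare yield None from __await__ simply asks the running Task to reschedule the coroutine on a later event-loop pass, so the same awaitable that breaks under trio completes fine under asyncio.

```python
import asyncio

class Foo:
    def __await__(self):
        # asyncio's Task treats a yielded None as "reschedule me soon".
        yield None
        return 10

async def foo():
    return await Foo()

print(asyncio.run(foo()))  # 10
```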

@fyellin (Contributor, author) commented Nov 16, 2024

@almarklein . What are you doing to see your crash? I haven't seen that on either my Mac or on the headless Linux machines I occasionally test on.

@almarklein (Member)

What are you doing to see your crash?

python examples/triangle.py shows a black window. python examples/cube.py crashes.

@fyellin (Contributor, author) commented Nov 19, 2024

@almarklein Hmm. It works just fine on my Mac... Let me see if I can re-create this on a headless Linux device. If not, I'll just close this PR; it's not worth the hassle it's causing.

@fyellin (Contributor, author) commented Nov 19, 2024

@almarklein Is there some secret to being able to run examples/cube.py on a headless machine with a GPU, and just having the results written to a backing buffer rather than the actual screen? I can't seem to figure out how to do that.

What is particularly strange is that although examples/cube.py includes an asynchronous implementation, the code that's actually running is completely synchronous. And the error message you're seeing is utterly bizarre.

@fyellin (Contributor, author) commented Nov 19, 2024

@almarklein Can you give me details on OS, Python version, and GPU? I rewrote cube.py to be windowless, ran it on a Linux machine, and it just worked. The bug you've found sounds serious, but I just can't reproduce it...

@almarklein (Member) commented Nov 20, 2024

I'm on macOS (M1), Python 3.12.4. This is very strange indeed. I'm having a quick look.

edit: I think I know what the problem is. The pipeline descriptor is now built in a separate method, but some of the subfields are no longer referenced and are therefore cleaned up.

@almarklein (Member)

I found and fixed the problem by adding a new_array() function. It makes sure that the substructs are associated with the array so they are not gc'd prematurely. It also returns null when the list of substructs is empty, which simplifies the code in a few other places (and produces a cleaner struct in places where we had not considered the empty-array case). This change should also make further refactoring easier.

@almarklein (Member)

The last remaining issue is to replace anyio with sniffio.

@fyellin (Contributor, author) commented Nov 21, 2024

@almarklein Are all the bugs gone? I'm still surprised you had GC problems when I didn't. I wonder if cffi is implemented slightly differently on different machines.

@almarklein (Member) commented Nov 21, 2024

For the record, I'll try to explain what was happening a bit better. When you create an ffi struct, it does not hold a reference to its fields when these fields are pointers. So if you do:

sub_struct = ffi.new("CSubType *")
struct = ffi.new("CType *")
struct.aField = sub_struct

then struct does not hold a reference to sub_struct. If sub_struct goes out of scope, its memory is freed, and struct is left pointing at reclaimed memory; using it can result in garbage or a segfault.

We solved this problem long ago with the new_struct() and new_struct_p() functions in _api.py. One of the things these functions do is associate the fields with the struct, using a WeakKeyDictionary.

However, for arrays we had code like this in many places:

list_of_sub_structs = [new_struct("CSubType", ...) for ... in ...]
array_of_sub_structs = ffi.new("CSubType[]", list_of_sub_structs)
struct.a_field = array_of_sub_structs

These arrays are contiguous copies of the list of structs. The problem is not that the sub-structs in the list get cleared by the gc (as I thought earlier), but that any sub-sub-structs and/or sub-sub-arrays of these sub-structs get cleared when the sub-struct gets cleared.

The issue with arrays was not hit earlier, because in most cases we create a descriptor and then use it to instantiate a wgpu object. As code is refactored to create the descriptor in a separate method (which is a great change by itself) the lists go out of scope before the descriptor is passed to Rust.
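The keep-alive idea behind new_array() can be illustrated with plain Python objects standing in for cffi cdata (a sketch under that assumption — not wgpu-py's actual implementation):

```python
import gc
import weakref

# Values in a WeakKeyDictionary are held strongly for as long as the
# (weakly referenced) key is alive: exactly the lifetime link needed
# between an array and its element structs.
_keepalive = weakref.WeakKeyDictionary()

class Array:
    """Stands in for the cdata array returned by ffi.new("CSubType[]", ...)."""

class Sub:
    """Stands in for an element struct; counts live instances."""
    alive = 0
    def __init__(self):
        Sub.alive += 1
    def __del__(self):
        Sub.alive -= 1

def new_array(elements):
    if not elements:
        return None  # stands in for ffi.NULL: cleaner than a 0-length array
    array = Array()
    # Associate the elements (and whatever they keep alive) with the array.
    _keepalive[array] = elements
    return array

arr = new_array([Sub(), Sub()])
gc.collect()
print(Sub.alive)  # 2: the array keeps its elements alive
del arr
gc.collect()
print(Sub.alive)  # 0: dropping the array releases them
```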

@almarklein (Member)

Wow, this PR is interesting, in being quite technical/deep on two very different topics 😅

3 participants