Add a timeout for shortfin unit tests. #777

Merged on Jan 7, 2025

@ScottTodd ScottTodd commented Jan 7, 2025

I'm seeing stalls in `test_invoke_mobilenet_multi_fiber_per_fiber` from https://github.com/nod-ai/shark-ai/blob/main/shortfin/tests/invocation/mobilenet_program_test.py when the test program fails numerics checks. The other test cases fail and terminate as expected, without needing to use a timeout mechanism.

Tested locally on Windows and the timeout worked (though it isn't pretty):

```
(.venv) λ pytest tests/ -rA -k test_invoke_mobilenet_multi_fiber_per_fiber --timeout 10
======================================= test session starts =======================================
platform win32 -- Python 3.11.2, pytest-8.3.4, pluggy-1.5.0
rootdir: D:\dev\projects\shark-ai\shortfin
configfile: pyproject.toml
plugins: anyio-4.8.0, timeout-2.3.1
timeout: 10.0s
timeout method: thread
timeout func_only: False
collected 264 items / 263 deselected / 1 selected

tests\invocation\mobilenet_program_test.p
 +++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Captured stdout ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fibers: [Fiber(worker='__init__', devices=[cpu0]), Fiber(worker='__init__', devices=[cpu0]), Fiber(worker='__init__', devices=[cpu0]), Fiber(worker='__init__', devices=[cpu0]), Fiber(worker='__init__', devices=[cpu0])]
Waiting for processes: [Process(pid=1, worker='__init__'), Process(pid=2, worker='__init__'), Process(pid=3, worker='__init__'), Process(pid=4, worker='__init__'), Process(pid=5, worker='__init__')]
Process(pid=1, worker='__init__'): Start
Process(pid=2, worker='__init__'): Start
Process(pid=3, worker='__init__'): Start
Process(pid=4, worker='__init__'): Start
Process(pid=5, worker='__init__'): Start
Process(pid=1, worker='__init__'): Program complete (+116ms)
Process(pid=2, worker='__init__'): Program complete (+111ms)
Process(pid=3, worker='__init__'): Program complete (+107ms)
Process(pid=4, worker='__init__'): Program complete (+101ms)
Process(pid=5, worker='__init__'): Program complete (+97ms)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Captured stderr ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
D:\dev\projects\shark-ai\shortfin\src\shortfin/support/iree_helpers.h:316: UNKNOWN; Unhandled exception: Traceback (most recent call last):
  File "D:\dev\projects\shark-ai\shortfin\tests\invocation\mobilenet_program_test.py", line 77, in assert_mobilenet_ref_output
RuntimeError: Async exception on <Worker '__init__'>): assert 0.8119692911421882 == 5.01964943873882 ± 5.0e-06

  comparison failed
  Obtained: 0.8119692911421882
  Expected: 5.01964943873882 ± 5.0e-06
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-4 () (9816) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  File "C:\Program Files\Python311\Lib\threading.py", line 995, in _bootstrap
    self._bootstrap_inner()
  File "C:\Program Files\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python311\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)

...
```
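
For readers unfamiliar with the `timeout method: thread` line in the output: as far as I understand pytest-timeout, that method starts a timer thread per test and, if the timer fires, prints the stack of every thread to stderr and terminates the process. That would account for the `Stack of Thread-4` section. The sketch below is a simplified stdlib-only illustration of that idea, not pytest-timeout's actual code; `format_thread_stacks` and `start_watchdog` are hypothetical names chosen for this example.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current stack of every live thread."""
    # Map thread ids to names so each stack gets a readable header.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        header = f"~~~ Stack of {names.get(ident, '<unknown>')} ({ident}) ~~~"
        chunks.append(header + "\n" + "".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def start_watchdog(seconds, on_timeout):
    """Start a timer thread that calls on_timeout(stack_dump) after `seconds`.

    Call .cancel() on the returned timer when the guarded work finishes in
    time. The callback is injectable (an assumption of this sketch) so the
    behavior can be exercised without killing the process.
    """
    timer = threading.Timer(seconds, lambda: on_timeout(format_thread_stacks()))
    timer.daemon = True  # never keep the interpreter alive just for the timer
    timer.start()
    return timer
```

In a real watchdog the callback would typically write the dump to stderr and then call `os._exit(1)`, because a stalled thread cannot be interrupted cleanly from another thread. A hard exit of that kind may be why the timeout report above "isn't pretty".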

@ScottTodd ScottTodd merged commit ad236fd into nod-ai:main Jan 7, 2025
25 of 28 checks passed
@ScottTodd ScottTodd deleted the test-timeout branch January 7, 2025 21:56
monorimet pushed a commit that referenced this pull request Jan 8, 2025