NOTE: This is not about a timeout for the test code itself (pytest-timeout works well for that); it is about the need for a timeout in pytest-xdist.
First, let me say a big thank you for pytest and pytest-xdist. We use them to run ~400 Docker containers on ~10 servers on AWS. It works wonders!
There are scenarios where pytest-xdist does not detect a remote session crash or disconnect, and as a result waits for results forever.
Today's xdist code detects a session crash via EOF on the SSH session. When the network connection is torn down, the server marks the worker as dead and re-adds it. All good.
But... consider a scenario where the SSH session is not torn down:
1. Run N tests on multiple remote machines with pytest-xdist.
2. Tests spawn a Python process on the remote machine via SSH.
3. We run in boxed mode, so this process forks to run the actual test code.
In this case, the server-side xdist thinks the session is up and waits for the results for a really, really long time ;-)
And yes, #2 does not crash normally. In our case it was OOM-killed quite persistently. All it takes is one OOM kill among tens of thousands of tests and the entire batch is ruined.
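To illustrate the kind of timeout being asked for, here is a minimal sketch of a bounded wait on a result queue. It is not xdist's actual code: `WorkerTimeout` and `wait_for_result` are hypothetical names, the queue stands in for whatever channel carries worker results, and the sketch assumes Python 3 (the module is `Queue` on Python 2).

```python
import queue


class WorkerTimeout(Exception):
    """Raised when a worker produces no result within the allowed time."""


def wait_for_result(results, timeout):
    """Return the next item from `results`, waiting at most `timeout` seconds.

    `results` is any queue.Queue-like object (a stand-in for the real
    result channel). Raising instead of blocking forever lets the caller
    mark the worker as dead and reschedule its tests.
    """
    try:
        return results.get(timeout=timeout)
    except queue.Empty:
        raise WorkerTimeout("no result within %.1f s" % timeout)
```

A caller would catch `WorkerTimeout` and treat it the same way it treats EOF on the SSH session today.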
Please let me know if I can provide more info on this issue.
[root@nsth-c10 nsth] # python --version
Python 2.7.10
[root@nsth-c10 nsth] # py.test --version
This is pytest version 2.8.0, imported from /usr/local/lib/python2.7/site-packages/pytest-2.8.0-py2.7.egg/pytest.pyc
setuptools registered plugins:
pytest-xdist-1.13.1 at /usr/local/lib/python2.7/site-packages/pytest_xdist-1.13.1-py2.7.egg/xdist/boxed.pyc
pytest-xdist-1.13.1 at /usr/local/lib/python2.7/site-packages/pytest_xdist-1.13.1-py2.7.egg/xdist/looponfail.pyc
pytest-xdist-1.13.1 at /usr/local/lib/python2.7/site-packages/pytest_xdist-1.13.1-py2.7.egg/xdist/plugin.pyc
[root@nsth-c10 nsth] #
I think this one depends on #20. With the current codebase it's really tricky to introduce heartbeats on top of the support for node restarts.
Since we can't detect a dead SSH session with the default behaviour, we need some kind of heartbeat mechanism so we can be aware of sessions in an unresponsive state.
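As a rough sketch of what such a heartbeat mechanism could look like on the server side (an illustration only, not a proposal for xdist's internals): each worker periodically reports in, and the server flags any worker that has been silent for longer than a timeout. `HeartbeatMonitor`, `beat`, `unresponsive`, and the worker IDs are all hypothetical names; the clock is injectable so the logic can be tested without sleeping.

```python
import time


class HeartbeatMonitor:
    """Tracks the last time each worker was heard from (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated
        self.clock = clock          # injectable time source
        self.last_seen = {}         # worker id -> timestamp of last heartbeat

    def beat(self, worker_id):
        """Record a heartbeat (or any other message) from a worker."""
        self.last_seen[worker_id] = self.clock()

    def unresponsive(self):
        """Return the sorted ids of workers silent for longer than `timeout`."""
        now = self.clock()
        return sorted(
            worker_id
            for worker_id, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

The server loop would call `beat()` whenever a worker sends anything and periodically check `unresponsive()`, treating the returned workers as crashed even though their SSH session still looks alive.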
We addressed the underlying root cause by increasing the amount of memory each container can use (the docker --memory option). But, of course, there are other ways it may lock up or crash, so addressing this will help.
Thank you for taking this into account in the future.
P.S.
Moved from pytest-dev/pytest#1550