test(robot-server): create isolation for tests and run them in parallel #11517
Conversation
```diff
 tests ?= tests
 cov_opts ?= --cov=$(SRC_PATH) --cov-report term-missing:skip-covered --cov-report xml:coverage.xml
-test_opts ?=
+test_opts ?= -n auto --dist loadscope
```
In your experimentation, does it seem like `loadscope` has the intended effect? The online docs suggest it doesn't actually follow fixture scope; it's something coarser-grained based on files?

I guess if it misbehaves, the effect would be that expensive fixtures are redundantly set up across processes, which will be slower than expected but should at least be safe if the tests are written properly?
Looking into this more, I think we have performance concerns and safety concerns, but it's not `loadscope`'s fault, per se.

On my dual-core 2018 MacBook Air:

| | Serial (old) | Parallel (`pytest-xdist`) |
|---|---|---|
| Run time (wall clock) | 3m25s | 3m45s |
| CPU utilization | 50% | 100% |
| Worker processes | 1 | 4 |

The parallel test suite is slower despite higher CPU utilization. This suggests to me that the tests are being distributed badly.
Each `pytest-xdist` worker process can initialize fixtures redundantly if test functions are distributed badly to them. For example, say you have 4 test functions that use the same `scope="session"` fixture, and `pytest-xdist` happens to distribute each of those test functions to a different worker process. This will cause the fixture to execute 4 times, doing 4x more work than necessary.
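To illustrate with a self-contained hypothetical (not code from this PR):

```python
import pytest

# With `pytest -n 4 --dist load`, these four tests can land on four different
# worker processes. Each worker builds its own copy of the session-scoped
# fixture, so the "expensive" setup may run up to four times instead of once.
@pytest.fixture(scope="session")
def expensive_resource():
    print("expensive setup")  # printed once per worker that runs one of these tests
    yield object()

def test_one(expensive_resource): ...
def test_two(expensive_resource): ...
def test_three(expensive_resource): ...
def test_four(expensive_resource): ...
```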
My theory is that we're running into this with some of our expensive fixtures, like `run_server`, and the penalty of doing more work is outweighing the benefit of doing that work in parallel.

If this theory is true, any performance improvements or regressions introduced by this PR would be largely luck-based: how many cores does the machine have, and how did `pytest-xdist` decide to distribute across them?
To fix this, we could either:

- Make `run_server` (and other expensive `scope="session"` fixtures) truly global across worker processes, so they're not executed redundantly even if `pytest-xdist` distributes tests badly.
  - This would make `scope="session"` fixtures behave more closely to how we'd all hope and intuitively expect.
  - However, because `run_server` is so stateful, I think this is a dangerous path. Our integration tests can and do leak into each other, and if they do that in parallel or in an unpredictable order, it will cause flakiness and confusing failures.
- Help `pytest-xdist` distribute tests better.
  - Ideally, `pytest-xdist` would schedule things intelligently based on which tests use which fixtures. There's a good and very old proposal for this, but it seems to have stalled.
  - So we might have to write a custom `pytest-xdist` scheduler, which is a quasi-documented part of the `pytest-xdist` API.
Thank you for the insight as always, Max. If Tavern supported parameterized marks other than `skipif`, we could upgrade pytest-xdist to 2.5 and use `--dist loadgroup` with `@pytest.mark.xdist_group(name="group1")`, but it does not.

So I think we can keep the fixture scoping, add marks to the tests that run well under xdist, and then run some tests with xdist and some without.
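For reference, this is roughly how the `xdist_group` mark works on plain pytest tests (illustrative names; as noted, Tavern YAML tests can't carry this mark):

```python
import pytest

# With `pytest --numprocesses auto --dist loadgroup`, all tests that share a
# group name are sent to the same worker process, so they run serially
# relative to each other even in an otherwise parallel run.
@pytest.mark.xdist_group(name="shared_dev_server")
def test_create_run() -> None:
    ...

@pytest.mark.xdist_group(name="shared_dev_server")
def test_list_runs() -> None:
    ...
```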
Nitpick: can we spell out full-length command line options, for readability?

```diff
-test_opts ?= -n auto --dist loadscope
+test_opts ?= --numprocesses auto --dist loadscope
```
```python
def _request_session() -> requests.Session:
    session = requests.Session()
    session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
    return session
```
It looks like this existed before this PR, but a `Session` is a resource that should be `close()`d, since it represents a connection pool (among other things), and we don't want to leak TCP connections.

I think we should rewrite this like:

```python
@contextmanager
def _request_session() -> Generator[requests.Session, None, None]:
    with requests.Session() as session:
        session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
        yield session
```

This ties into my other comments about using regular Python functions more and Pytest fixtures less.
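A caller would then use it like this (hypothetical usage; `base_url` stands in for whatever URL the test already has):

```python
with _request_session() as session:
    response = session.get(f"{base_url}/health")
    response.raise_for_status()
```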
```python
@pytest.fixture(scope="session")
def server_temp_directory() -> Iterator[str]:
    new_dir = tempfile.mkdtemp()
# ...
def request_session() -> requests.Session:
    return _request_session()
# ...
@pytest.fixture(scope="function")
def function_scope_request_session() -> requests.Session:
    return _request_session()
```
These `request_session` and `function_scope_request_session` fixtures are only used in this file, correct? If I'm reading things correctly, they're just helpers to set up `run_server` and `function_scope_run_server`, which is what the tests actually care about?

I think our life gets a bit easier if we don't have fixtures for `request_session` and `function_scope_request_session`. If something in this file needs a `Session`, let it call `_request_session()` directly. No need to use Pytest's dependency injection machinery for it.

Per your review request, this will help deduplicate the code between `run_server` and `function_scope_run_server`. See my other comment.
```python
def _free_port() -> str:
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        sock.bind(("localhost", 0))
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        return str(sock.getsockname()[1])
```
It looks like this came from https://stackoverflow.com/a/45690594/497934, right?
I need to refresh my memory on how port allocation stuff like this works, but does it seem fishy that we close the socket when we return the port number? I worry that we're doing this:
- Open a socket and bind it to an automatically-chosen port
- Get the port that we just chose automatically
- Return the port and close the socket, thereby unintentionally freeing the port?
I'll dig around and see if there's a better way to do this, but I wonder if this whole thing needs to become a context manager.
Here's what I've gathered:

- The race condition described above does exist in this code. The code relies on the operating system not recycling port numbers too quickly. See the comments in this similar fixture by another Internet stranger.
- I think it's possible to fix this race condition by turning the `return` into a `yield`. But naively doing this causes things to fail with "address already in use" errors, at least on my macOS machine. I got it to work by adding `SO_REUSEPORT`, but that threatens to drag us deeper into a hellscape of socket option non-portability.
- Aside: the `setsockopt()` call should probably come before the `bind()` call, as a general rule. This might not matter here, in the midst of everything else.

Given all of that, I propose we ditch this port auto-allocation and just use `random()` port numbers. I figure if things are going to be dodgy no matter what, we can at least keep the code simple.
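A minimal sketch of the random-port idea (the function name is illustrative):

```python
import random

def _random_port() -> str:
    # Pick a port from the dynamic/private range at random. Collisions are
    # possible but unlikely, and the server will fail loudly with
    # "address already in use" if one happens.
    return str(random.randint(49152, 65535))
```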
If random port numbers prove insufficient, we can either:

- Pick a port range that we expect to be free, and implement our own cross-process allocation from that range. `pytest-xdist` gives us the ID of the current worker process and how many worker processes exist in total, which may help.
- Have `robot-server` pick a port for itself via the operating system's allocator, and report which port it ended up with back to the pytest runner via a file or something. Uvicorn already logs this to stdout or stderr, and there is a Uvicorn feature request to expose it programmatically.
Will run with the random port and see how it goes.
```python
@pytest.fixture(scope="function")
def function_scope_run_server(
    function_scope_request_session: requests.Session,
    function_scope_server_temp_directory: str,
    function_scope_free_port: str,
) -> Iterator["subprocess.Popen[Any]"]:
```
Addressing your review question about deduplication, I think this gets a lot easier if we use regular Python functions more and Pytest fixtures less.

For example, we could have a `_run_server()` function that's a regular Python context manager that runs the server and returns its URI or something. And then, if we want to reuse the same server across all tests in a file, we can easily wrap that plain Python function in a 5-line module-scope fixture.

For example, this is what we do in one of our integration tests (opentrons/robot-server/tests/integration/http_api/persistence/test_compatibility.py, lines 18 to 28 in 0da5ec3):

```python
# Module-scope to avoid the overhead of restarting the server between test functions.
# This relies on the test functions only reading, never writing.
@pytest.fixture(scope="module")
def dev_server(module_scope_free_port: str) -> Generator[DevServer, None, None]:
    port = module_scope_free_port
    with DevServer(
        port=port,
        persistence_directory=_OLDER_PERSISTENCE_DIR,
    ) as server:
        server.start()
        yield server
```

In other words, the idea is to keep our reusable building blocks defined as plain Python functions, because they're easy to compose, and only wrap them up in a properly-scoped fixture at the top level.
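For illustration, a plain `_run_server()` context manager might look roughly like this (the command line and readiness check are assumptions, not the actual robot-server invocation):

```python
from contextlib import contextmanager
from typing import Iterator
import subprocess

@contextmanager
def _run_server(port: str) -> Iterator[str]:
    """Start a dev server subprocess and yield its base URL; shut it down on exit."""
    # Hypothetical command; the real invocation would come from the existing fixtures.
    process = subprocess.Popen(["python", "-m", "robot_server", "--port", port])
    try:
        # A real version would also poll the server here until it responds.
        yield f"http://localhost:{port}"
    finally:
        process.terminate()
        process.wait()
```

A module- or session-scope fixture wrapping this would then be a few lines, like the `dev_server` example above.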
```yaml
- free_port
- run_server
```
I would refactor the `free_port` and `run_server` fixtures into a common fixture called something like `server_url_base` or `global_server_url_base`.

The value of this fixture would be a string like `http://localhost:12345`, pointing to a running dev server. The Tavern tests would use it like `url: '{server_url_base}/runs'`.

The Tavern tests would not explicitly use a `run_server` fixture.
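A minimal sketch of that combined fixture, assuming a plain `_run_server()` context manager like the one sketched above (these names come from this suggestion, not from existing code):

```python
import pytest
from typing import Iterator

@pytest.fixture(scope="session")
def server_url_base(free_port: str) -> Iterator[str]:
    """Yield a base URL like "http://localhost:12345" for a shared running dev server."""
    with _run_server(port=free_port) as base_url:
        yield base_url
```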
```python
@pytest.fixture(scope="session")
def run_server(
```
For flakiness reasons, I don't think `run_server` can remain `scope="session"` in a parallel world.

My understanding is that `run_server` is used primarily (or exclusively?) by our Tavern integration tests. Historically, these tests have shared a single `run_server` per test session, for performance reasons.

Our Tavern tests occasionally leak and affect each other. For example, maybe one test leaves behind a run resource, which affects what subsequent tests see when they do `GET /runs`.

So far, the combination of these facts hasn't caused any flakiness, because pytest always runs the tests serially and in an order that's consistent in practice.

But in a `pytest-xdist`-parallelized world, tests will be allocated to dev servers unpredictably. An integration test might flakily succeed or fail depending on what other tests happened to get allocated to the same server before it.

I think we either need to:

- Run an isolated dev server for each integration test.
  - My understanding is that this is just prohibitively slow.
- Make sure the tests that use a shared dev server do so serially, in a consistent order.
  - This might require writing a custom scheduler, but we might have to do that anyway because of #11517 (comment).
Closing; will reopen after #11682 is merged. Will be pushing on that 11/29.
Overview
Changelog
Review Requests
- Is there a better way to share code between `run_server` and `function_scope_run_server`? Since the params are fixtures I couldn't see a way.

Risk assessment

Low, test only. Not sure the port cleanup is great, though, especially on Ctrl-C or on failure.