test(robot-server): create isolation for tests and run them in parallel #11517

y3rsh · 2022-09-28T19:24:22Z

Overview

Massive speedup locally:
- M1 Mac now takes ~1 minute
- WSL takes ~2 minutes

Not much speedup in CI, as the runner only goes up to 2 pools.

Changelog

- Dynamically grab ports
- Explicitly scope fixtures and confine tests as needed
- Use pytest's built-in temporary path

Review Requests

Is there a way not to repeat the code in *run_server? Since the params are fixtures I couldn't see a way.

Risk assessment

Low, test only. Not sure the port cleanup is great? Especially if ctrl-c or on failure?

SyntaxColoring · 2022-09-29T13:45:14Z

robot-server/Makefile

 tests ?= tests
 cov_opts ?= --cov=$(SRC_PATH) --cov-report term-missing:skip-covered --cov-report xml:coverage.xml
-test_opts ?=
+test_opts ?= -n auto --dist loadscope


In your experimentation, does it seem like loadscope has the intended effect? The online docs suggest it doesn't actually follow fixture scope; it's something coarser-grained based on files?

I guess if it misbehaves, the effect would be that expensive fixtures are redundantly set up across processes, which will be slower than expected ~~but should at least be safe if the tests are written properly?~~

SyntaxColoring · 2022-10-03T18:48:43Z (follow-up in the same thread)

Looking into this more, I think we have performance concerns and safety concerns, but it's not loadscope's fault, per se.

On my dual-core 2018 MacBook Air:

|  | Serial (old) | Parallel (pytest-xdist) |
| --- | --- | --- |
| Run time (wall clock) | 3m25s | 3m45s |
| CPU utilization | 50% | 100% |
| Worker processes | 1 | 4 |

The parallel test suite is slower despite higher CPU utilization. This suggests to me that the tests are being distributed badly.

pytest-xdist worker processes can initialize fixtures redundantly if test functions are distributed badly among them. For example, say you have 4 test functions that use the same scope="session" fixture, and pytest-xdist happens to distribute each of those test functions to a different worker process. The fixture will then execute 4 times, doing 4x more work than necessary.

My theory is that we're running into this with some of our expensive fixtures, like run_server, and the penalty of doing more work is outweighing the benefit of doing that work in parallel.

If this theory is true, any performance improvements or regressions introduced by this PR would be largely luck-based: how many cores does the machine have, and how did pytest-xdist decide to distribute across them?
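To make the cost concrete, here is a toy model of the situation described above. It is not pytest-xdist's scheduling algorithm; it only counts how many worker processes would each have to set up a shared scope="session" fixture under two hypothetical assignments of tests to workers. The function name and the test names are made up for illustration.

```python
from typing import Dict


def count_fixture_setups(assignments: Dict[str, str]) -> int:
    """Count how often a session-scoped fixture is set up.

    `assignments` maps test name -> worker ID. Every test is assumed to use
    the same session-scoped fixture, which each worker process sets up once
    for itself, so the answer is the number of distinct workers.
    """
    return len(set(assignments.values()))


tests = ["test_a", "test_b", "test_c", "test_d"]

# Worst case: every test lands on a different worker.
spread = {name: f"gw{index}" for index, name in enumerate(tests)}

# Best case: all tests that share the fixture land on one worker.
grouped = {name: "gw0" for name in tests}

print(count_fixture_setups(spread))   # 4 setups
print(count_fixture_setups(grouped))  # 1 setup
```

If the fixture is as expensive as starting a dev server, the spread case pays that startup cost four times, which can easily cancel out the gain from running in parallel.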

To fix this, we could either:

1. Make run_server (and other expensive scope="session" fixtures) truly global across worker processes, so they're not executed redundantly even if pytest-xdist distributes tests badly.

   This would make scope="session" fixtures behave more closely to how we'd all hope and intuitively expect.

   However, because run_server is so stateful, I think this is a dangerous path. Our integration tests can and do leak into each other, and if they do that in parallel or in an unpredictable order, it will cause flakiness and confusing failures.

2. Help pytest-xdist distribute tests better.

   Ideally, pytest-xdist would schedule things intelligently based on which tests use which fixtures. There's a good and very old proposal for this, but it seems to have stalled.

   So we might have to write a custom pytest-xdist scheduler, which is a quasi-documented part of the pytest-xdist API.

y3rsh · 2022-10-07 (reply)

Thank you for the insight, as always, Max. If Tavern supported parameterized marks other than skipif, we could upgrade pytest-xdist to 2.5 and use --dist loadgroup with @pytest.mark.xdist_group(name="group1"), but it does not.
So I think we can keep the fixture scoping, add marks to the tests that run well under xdist, then run some tests with xdist and some without.

SyntaxColoring · 2022-09-29T13:46:09Z

robot-server/Makefile

 tests ?= tests
 cov_opts ?= --cov=$(SRC_PATH) --cov-report term-missing:skip-covered --cov-report xml:coverage.xml
-test_opts ?=
+test_opts ?= -n auto --dist loadscope


Nitpick: Can we spell out full-length command line options, for readability?

Suggested change

-test_opts ?= -n auto --dist loadscope
+test_opts ?= --numprocesses auto --dist loadscope

SyntaxColoring · 2022-09-29T13:47:40Z

robot-server/tests/conftest.py

+def _request_session() -> requests.Session:
    session = requests.Session()
    session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
    return session


It looks like this existed before this PR, but a Session is a resource that should be close()d, since it represents a connection pool (among other things), and we don't want to leak TCP connections.

I think we should rewrite this like:

    @contextmanager
    def _request_session() -> Generator[requests.Session, None, None]:
        with requests.Session() as session:
            session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
            yield session

This ties into my other comments about using regular Python functions more and Pytest fixtures less.

SyntaxColoring · 2022-09-29T13:55:59Z

robot-server/tests/conftest.py

 @pytest.fixture(scope="session")
-def server_temp_directory() -> Iterator[str]:
-    new_dir = tempfile.mkdtemp()
+def request_session() -> requests.Session:
+    return _request_session()
+
+
+@pytest.fixture(scope="function")
+def function_scope_request_session() -> requests.Session:
+    return _request_session()


These request_session and function_scope_request_session fixtures are only used in this file, correct? If I'm reading things correctly, they're just helpers to set up run_server and function_scope_run_server, which is what the tests actually care about?

I think our life gets a bit easier if we don't have fixtures for request_session and function_scope_request_session. If something in this file needs a Session, let it call _request_session() directly. No need to use Pytest's dependency injection machinery for it.

Per your review request, this will help deduplicate the code between run_server and function_scope_run_server. See my other comment.

SyntaxColoring · 2022-09-30T14:10:34Z

robot-server/tests/conftest.py

+def _free_port() -> str:
+    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
+        sock.bind(("localhost", 0))
+        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+        return str(sock.getsockname()[1])


It looks like this came from https://stackoverflow.com/a/45690594/497934, right?

I need to refresh my memory on how port allocation stuff like this works, but does it seem fishy that we close the socket when we return the port number? I worry that we're doing this:

1. Open a socket and bind it to an automatically-chosen port.
2. Get the port that we just chose automatically.
3. Return the port and close the socket, thereby unintentionally freeing the port?

I'll dig around and see if there's a better way to do this, but I wonder if this whole thing needs to become a context manager.
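The concern can be checked with a small experiment. The sketch below is not the PR's code: the function name is hypothetical, and it binds to 127.0.0.1 rather than "localhost" to sidestep name resolution. It binds a socket to an OS-chosen port, closes it, and then tries to bind a second socket to the same port number.

```python
import socket
from typing import Tuple


def rebind_after_close() -> Tuple[int, bool]:
    """Return (port, whether a second socket could bind to it after close)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first:
        first.bind(("127.0.0.1", 0))  # Port 0 asks the OS to pick a free port.
        port = first.getsockname()[1]
    # `first` is closed here, so nothing is holding `port` any more.

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as second:
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            return port, False
        return port, True


port, rebound = rebind_after_close()
print(f"port {port} could be bound again after close: {rebound}")
```

If the second bind succeeds, the port was not reserved by the first socket once it closed, so any other process could have claimed it in the gap between `_free_port()` returning and the server starting.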

Here's what I've gathered:

- The race condition described above does exist in this code. The code relies on the operating system not recycling port numbers too quickly. See the comments in this similar fixture by another Internet stranger.
- I think it's possible to fix this race condition by turning the return into a yield. But naively doing this causes things to fail with "address already in use" errors, at least on my macOS machine. I got it to work by adding SO_REUSEPORT, but that threatens to drag us deeper into a hellscape of socket option non-portability.
- Aside: the setsockopt() call should probably come before the bind() call, as a general rule. This might not matter here, in the midst of everything else.

Given all of that, I propose we ditch this port auto-allocation and just use random() port numbers. I figure if things are going to be dodgy no matter what, we can at least keep the code simple.

If random port numbers prove insufficient, we can either:

- Pick a port range that we expect to be free, and implement our own cross-process allocation from that range. pytest-xdist gives us the ID of the current worker process and how many worker processes exist in total, which may help.
- Have robot-server pick a port for itself via the operating system's allocator, and report which port it ended up with back to the pytest runner via a file or something. Uvicorn already logs this to stdout or stderr, and there is a Uvicorn feature request to expose it programmatically.

y3rsh · 2022-10-07 (reply)

Will run with the random port and see how it goes.
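For reference, the two simplest options discussed in this thread might look roughly like the sketch below. Everything here is an assumption for illustration: the helper names, the choice of the IANA dynamic port range (49152 to 65535), and the base port are not from the PR. The second helper reads the PYTEST_XDIST_WORKER environment variable, which pytest-xdist sets to names like "gw0" or "gw3" inside worker processes.

```python
import os
import random
from typing import Mapping, Optional

# Assumption: draw from the IANA dynamic/private port range.
PORT_RANGE_LOW = 49152
PORT_RANGE_HIGH = 65535


def random_port(rng: Optional[random.Random] = None) -> int:
    """Pick a port at random; collisions are possible but unlikely."""
    source = rng if rng is not None else random.Random()
    return source.randint(PORT_RANGE_LOW, PORT_RANGE_HIGH)


def worker_port(base_port: int, environ: Optional[Mapping[str, str]] = None) -> int:
    """Derive a per-worker port from the pytest-xdist worker ID.

    Falls back to `base_port` when not running under pytest-xdist.
    """
    env = os.environ if environ is None else environ
    worker_id = env.get("PYTEST_XDIST_WORKER", "gw0")
    index = int(worker_id[2:]) if worker_id.startswith("gw") else 0
    return base_port + index


print(random_port())
print(worker_port(50000, {"PYTEST_XDIST_WORKER": "gw3"}))  # 50003
```

The random approach keeps the code trivial at the cost of occasional collisions. The per-worker approach is deterministic, but only gives one port per worker, so it fits best when each worker runs at most one server at a time.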

SyntaxColoring · 2022-09-30T14:22:29Z

robot-server/tests/conftest.py

+@pytest.fixture(scope="function")
+def function_scope_run_server(
+    function_scope_request_session: requests.Session,
+    function_scope_server_temp_directory: str,
+    function_scope_free_port: str,
+) -> Iterator["subprocess.Popen[Any]"]:


Addressing your review question about deduplication, I think this gets a lot easier if we use regular Python functions more and Pytest fixtures less.

For example, we could have a _run_server() function that's a regular Python context manager that runs the server and returns its URI or something.

And then, if we want to reuse the same server across all tests in a file, we can easily wrap that plain Python function in a 5-line module-scope fixture.

For example, this is what we do in one of our integration tests:

opentrons/robot-server/tests/integration/http_api/persistence/test_compatibility.py

Lines 18 to 28 in 0da5ec3

    # Module-scope to avoid the overhead of restarting the server between test functions.
    # This relies on the test functions only reading, never writing.
    @pytest.fixture(scope="module")
    def dev_server(module_scope_free_port: str) -> Generator[DevServer, None, None]:
        port = module_scope_free_port
        with DevServer(
            port=port,
            persistence_directory=_OLDER_PERSISTENCE_DIR,
        ) as server:
            server.start()
            yield server

In other words, the idea is to keep our reusable building blocks defined as plain Python functions, because they're easy to compose; and only wrap them all up in a properly-scoped fixture at the top level.

SyntaxColoring · 2022-10-03T16:38:11Z

robot-server/tests/integration/http_api/commands/test_load_module_failure.tavern.yaml

+      - free_port
      - run_server


I would refactor the free_port and run_server fixtures into a common fixture called something like server_url_base or global_server_url_base.

The value of this fixture would be a string like http://localhost:12345, pointing to a running dev server. The Tavern tests would use it like url: '{server_url_base}/runs'.

The Tavern tests would not explicitly use a run_server fixture.

SyntaxColoring · 2022-10-03T19:10:30Z

robot-server/tests/conftest.py

 @pytest.fixture(scope="session")
 def run_server(


For flakiness reasons, I don't think run_server can remain scope="session" in a parallel world.

My understanding is that run_server is used primarily (or exclusively?) by our Tavern integration tests. Historically, these tests have shared a single run_server per test session, for performance reasons.

Our Tavern tests occasionally leak and affect each other. For example, maybe one test leaves behind a run resource, which affects what subsequent tests see when they do GET /runs.

So far, the combination of these facts hasn't caused any flakiness, because pytest always runs the tests serially and in an order that's consistent in practice.

But in a pytest-xdist parallelized world, tests will be allocated to dev servers unpredictably. An integration test might flakily succeed or fail depending on what other tests happened to get allocated to the same server before it.

I think we either need to:

1. Run an isolated dev server for each integration test.

   My understanding is that this is just prohibitively slow.

2. Make sure the tests that use a shared dev server do so serially, in a consistent order.

   This might require writing a custom scheduler, but we might have to do that anyway because of the test-distribution problem described in my other comment on this PR (#11517).

y3rsh · 2022-11-22T17:23:10Z

closing, will reopen after #11682 is merged, will be pushing on that 11/29

test(robot-server): create isolation for tests and run them in parallel

7ed3100

y3rsh requested review from a team as code owners September 28, 2022 19:24

fix unused import

48e8f75

Update robot-server/Makefile

0da5ec3

SyntaxColoring self-requested a review September 30, 2022 13:58

SyntaxColoring reviewed Sep 30, 2022

SyntaxColoring reviewed Oct 3, 2022

y3rsh closed this Nov 22, 2022
