Set --no-pre-install-wheels and PEX_MAX_INSTALL_JOBS for faster builds of internal pexes #20670
Conversation
Splitting the local dists change out to #20746, since that seems like it might cause independent problems; it's handy to have it separate, for bisection.
This marks all local dist PEXes as internal-only, removing the ability for them to be anything but internal. This is almost true already, except for PEXes built via `PexFromTargetsRequest`, where the local dists PEX used for building the "real" PEX has the same internal status as that real PEX. In this case, the local dists PEX still isn't surfaced to users, so it's appropriate for that one to be internal too.

This will probably be slightly faster in isolation (building a `pex_binary` that uses in-repo `python_distribution`s will be able to just copy them around with less zip-file manipulation, more often, by creating packed-layout PEXes). However, the real motivation is unblocking #20670, where having this PEX built with `--no-pre-install-wheels` (as internal-only PEXes will be, by default) is required to support downstream PEXes using that argument, at least until pex-tool/pex#2299 is fixed.

NB. there's still a separate consideration of changing how local dists are incorporated, which isn't changed or considered here: pex-tool/pex#2392 (comment)
Nice wins!
```diff
@@ -1157,6 +1163,9 @@ async def setup_pex_process(request: PexProcess, pex_environment: PexEnvironment
     complete_pex_env = pex_environment.in_sandbox(working_directory=request.working_directory)
     argv = complete_pex_env.create_argv(pex.name, *request.argv)
     env = {
+        # Set this in case this PEX was built with --no-pre-install-wheels, and thus parallelising
+        # the install on cold boot is handy.
+        "PEX_MAX_INSTALL_JOBS": str(request.concurrency_available),
```
I think this is what you were getting at in your notes, but as a rough estimate if I have 16 cores this will potentially thrash into 16 different cache entries, right?
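As a rough sketch of that concern (hypothetical code, not Pants' actual cache-key computation; it just assumes the process's env is part of whatever gets hashed into the key, and uses an illustrative argv):

```python
# Hypothetical illustration of the caching concern: if the available concurrency
# is baked into the process's environment, each distinct value yields a distinct
# key, and therefore a distinct cache entry.
import hashlib
import json


def process_cache_key(argv: list[str], env: dict[str, str]) -> str:
    # Stand-in for the real fingerprinting: hash the process definition.
    payload = json.dumps({"argv": argv, "env": env}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


argv = ["./pytest_runner.pex", "tests/example_test.py"]
keys = {process_cache_key(argv, {"PEX_MAX_INSTALL_JOBS": str(n)}) for n in range(1, 17)}
print(len(keys))  # 16 distinct keys, i.e. up to 16 separate cache entries
```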
Yeah, in theory... depending on what else is running at the same time.

However, on closer examination prompted by your comment, I think I did this wrong:

- `concurrency_available` is configuration to tell the command runner how much concurrency would be useful for the process (e.g. "I'm processing 34 files, and could do so in parallel" => concurrency available = 34), not necessarily how much is actually allocated.
- That allocation is communicated to the process by replacing the `{pants_concurrency}` template in its argv (to be used like `["some-process", ..., "--jobs={pants_concurrency}"]` or similar).
- This replacement currently only seems to work on the argv, not env vars.
For myself, the relevant code (and surrounds) showing the `concurrency_available` request flowing through to the replacement is in `pants/src/rust/engine/process_execution/src/bounded.rs`, lines 75 to 117 in b7b0e9c:
```rust
let semaphore_acquisition = self.sema.acquire(process.concurrency_available);
let permit = in_workunit!(
  "acquire_command_runner_slot",
  // TODO: The UI uses the presence of a blocked workunit below a parent as an indication that
  // the parent is blocked. If this workunit is filtered out, parents nodes which are waiting
  // for the semaphore will render, even though they are effectively idle.
  //
  // https://github.com/pantsbuild/pants/issues/14680 will likely allow for a more principled
  // solution to this problem, such as removing the mutable `blocking` flag, and then never
  // filtering blocked workunits at creation time, regardless of level.
  Level::Debug,
  |workunit| async move {
    let _blocking_token = workunit.blocking();
    semaphore_acquisition.await
  }
)
.await;

loop {
  let mut process = process.clone();
  let concurrency_available = permit.concurrency();
  log::debug!(
    "Running {} under semaphore with concurrency id: {}, and concurrency: {}",
    process.description,
    permit.concurrency_slot(),
    concurrency_available,
  );

  // TODO: Both of these templating cases should be implemented at the lowest possible level:
  // they might currently be applied above a cache.
  if let Some(ref execution_slot_env_var) = process.execution_slot_variable {
    process.env.insert(
      execution_slot_env_var.clone(),
      format!("{}", permit.concurrency_slot()),
    );
  }

  if process.concurrency_available > 0 {
    let concurrency = format!("{}", permit.concurrency());
    let mut matched = false;
    process.argv = std::mem::take(&mut process.argv)
      .into_iter()
      .map(
        |arg| match CONCURRENCY_TEMPLATE_RE.replace_all(&arg, &concurrency) {
```
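To make the argv-only substitution concrete, here is a minimal sketch from the rule-author side (illustrative only: `some-tool` and the file list are made up; the `Process` fields used are just the ones discussed above):

```python
# Minimal sketch of requesting concurrency and consuming the allocation via the
# {pants_concurrency} argv template, rather than via an env var.
from pants.engine.process import Process

files = ["a.py", "b.py", "c.py"]  # illustrative inputs

process = Process(
    argv=["some-tool", "--jobs={pants_concurrency}", *files],
    description=f"Run some-tool over {len(files)} files",
    # "This much concurrency would be useful", not "this much is guaranteed":
    concurrency_available=len(files),
)
# The engine later replaces {pants_concurrency} in argv with the concurrency it
# actually granted; there is no equivalent replacement for values in `env`.
```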
So, I'm gonna revert this part of the change. Thanks.
This fixes/removes the use of `PEX_MAX_INSTALL_JOBS` when running a PEX. This was added in #20670 in an attempt to sync with `--no-pre-install-wheels` and do more work in parallel when booting "internal" PEXes... but it was implemented incorrectly. This would need to be set to `PEX_MAX_INSTALL_JOBS={pants_concurrency}` or similar, and:

- that substitution isn't currently supported (only argv substitutions)
- it's somewhat unclear if we even want to do that at all, as it'll result in more cache misses

Noticed in #20670 (comment)
This has all internal PEXes be built with settings to improve performance:

- `--no-pre-install-wheels`, to package `.whl` files directly rather than unpack and install them. (NB. this requires Pex 2.3.0, to pick up "Guard against mismatched `--requirements-pex`", pex-tool/pex#2392.)
- `PEX_MAX_INSTALL_JOBS`, to use more concurrency for the install, when available.

(A rough sketch of what these two settings do at the Pex level appears at the end of this description.)

This is designed to be a performance improvement for any processing where Pants synthesises a PEX internally, like `pants run path/to/script.py` or `pants test ...`. pex-tool/pex#2292 has benchmarks for the Pex tool itself.

For benchmarks, I did some more purposeful ones with TensorFlow (PyTorch seems a bit awkward/hard to set up, and TensorFlow is still huge), using https://gist.github.com/huonw/0560f5aaa34630b68bfb7e0995e99285 . I did 3 runs each of two goals, with 2.21.0.dev4 and with `PANTS_SOURCE` pointing to this PR, and pulled the numbers out by finding the relevant log lines:

- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d) test example_test.py`. This involves building 4 separate PEXes, partially in parallel and partially sequentially: `requirements.pex`, `local_dists.pex`, `pytest.pex`, and then `pytest_runner.pex`. The first and last are the interesting ones for this test.
- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d) run script.py`. This just builds the requirements into `script.pex`.

(NB. these are potentially unrealistic in that they're running with all caching turned off or cleared, so they are truly a worst case. This means they're downloading the TensorFlow wheels and all the others each time, which takes about 30s on my 100Mbit/s connection. Faster connections will thus see a higher ratio of benefit.)
[Benchmark result tables for `run script.py` and `test some_test.py` not reproduced here.]
I also did more adhoc ones on a real-world work repo of mine, which doesn't use any of the big ML libraries, just running some basic goals once:

- `pants export` on the largest resolve
- `pants test path/to/file.py` (1 attempt)

Two explicit questions for review:

- The `--no-pre-install-wheels` flag behaviour isn't explicitly tested... but maybe it should be, i.e. validate we're passing this flag for internal PEXes. Thoughts?
- `PEX_MAX_INSTALL_JOBS` as an env var may result in fewer cache hits, e.g. `pants test path/to/file.py` may not be able to reuse caches from `pants test ::` (and vice versa) because the available concurrency is different, and thus this may be better to do differently. Thoughts?[^1]

Fixes #15062

[^1]: For instance, setting it to `0` to have each PEX process do its own concurrency, e.g. based on number of CPU cores and wheels. Multiple processes in parallel risk overloading the machine since they don't coordinate, though.
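As a rough illustration of what the two settings do at the Pex level, outside of Pants (a sketch only: the `cowsay` requirement and file names are arbitrary examples, and the job-count comments reflect my understanding of Pex's behaviour, roughly: 1 is serial, 0 lets Pex choose, N caps parallelism at N):

```python
# Sketch of the two settings using the Pex CLI directly; assumes `pex` is on PATH.
import os
import subprocess

# Build-time: --no-pre-install-wheels packages the .whl files as-is, deferring
# the install work until the PEX first boots.
subprocess.run(
    ["pex", "cowsay", "--no-pre-install-wheels", "-o", "cowsay.pex"],
    check=True,
)

# Run-time: PEX_MAX_INSTALL_JOBS controls how many wheels are installed in
# parallel during that first boot.
subprocess.run(
    ["./cowsay.pex", "-c", "import cowsay; cowsay.cow('moo')"],
    env={**os.environ, "PEX_MAX_INSTALL_JOBS": "4"},
    check=True,
)
```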