app: Update Forall command to allow multiple concurrent processes #755

pdgendt · 2024-10-16T09:30:22Z

Demonstrate asynchronous behavior for the Forall command and add an argument to select the number of jobs.

The same idea can be applied to the Update command.

pdgendt · 2024-10-16T10:12:40Z

For comparison, timing counting the lines of code using cloc with 20 logical cores:

west forall -c "cloc --quiet ."  103.21s user 5.00s system 70% cpu 2:34.15 total
west forall -j -c "cloc --quiet ."  148.83s user 5.32s system 472% cpu 32.600 total

marc-hb

This looks really cool, thanks for designing and testing this! I'm afraid we still have the issue of concurrent outputs though, see below.

marc-hb · 2024-10-17T21:53:39Z

src/west/app/project.py

+            self.banner(f'running "{args.subcommand}" in {project.name_and_path}:')
+            proc = await asyncio.create_subprocess_shell(args.subcommand,
+                                                         cwd=cwd, env=env, shell=True)
+            return await proc.wait()


I think you need to capture standard outputs from different processes here so they don't interleave concurrently and randomly and become really hard to read. This could even turn into a terminal disaster if they use --color ANSI codes. I think we already touched on this question in #713 and before.

You probably tested this with relatively "quiet" and easy to read output and a reasonable number of threads... can you try again after cranking it all up? This code must be prepared to handle not just the "common" cases but all cases.

One option is the usual (out, err) = proc.communicate(). This avoids concurrent terminal output from different subprocesses, but it assumes process outputs are already line-based.
https://docs.python.org/3/library/asyncio-subprocess.html

So it's probably better to play it safer and use some readline() variation. I found a couple examples that seem relevant: https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/
https://stackoverflow.com/questions/2804543/read-subprocess-stdout-line-by-line
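A minimal sketch of the capture approach discussed above, not the code from this PR: each child's stdout and stderr are collected with `communicate()` and printed only after the child has exited, so output from different projects cannot interleave. The `run_captured` helper and the stand-in Python child commands are hypothetical.

```python
import asyncio
import sys


async def run_captured(name, argv):
    """Run one child and return (name, returncode, combined output)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    out, _ = await proc.communicate()  # waits for exit, buffers everything
    return name, proc.returncode, out.decode(errors="replace")


async def main():
    # Hypothetical stand-ins for the per-project commands.
    jobs = {
        name: [sys.executable, "-c", f"print('hello from {name}')"]
        for name in ("net-tools", "Kconfiglib")
    }
    tasks = [asyncio.create_task(run_captured(n, a)) for n, a in jobs.items()]
    results = []
    for finished in asyncio.as_completed(tasks):
        name, returncode, output = await finished
        # One print per child, always from this single place.
        print(f"=== {name} (exit {returncode}) ===\n{output}", end="")
        results.append((name, returncode, output))
    return results
```

Calling `asyncio.run(main())` prints one block per child in completion order. The trade-off is that nothing is shown until a child exits and its whole output is held in memory.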

The real icing on the cake would be an option that prefixes each line of the outputs with the project name! BTW:

Feature request: Ability for west grep to output relative or absolute paths #714

Eventually, that parallelization and output capture code should be generic enough to be reused by all commands, not just forall! And especially west update where it is ... awaited (pun intended) the most (#713 etc.)
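A sketch of the `readline()` variation with a per-line project prefix, under assumptions: `stream_prefixed`, its `write` callback and the stand-in child command are hypothetical, not part of west. Each complete line is emitted in one call, so lines from different children may alternate but are never split mid-line.

```python
import asyncio
import sys


async def stream_prefixed(name, argv, write=print):
    """Run argv and emit each output line prefixed with the project name."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    while True:
        line = await proc.stdout.readline()
        if not line:  # empty bytes means end of stream
            break
        write(f"{name}: {line.decode(errors='replace').rstrip()}")
    return await proc.wait()


async def main(write=print):
    # Hypothetical child that prints three lines.
    child = "for i in range(3): print('line', i)"
    names = ("net-tools", "Kconfiglib")
    return await asyncio.gather(
        *(stream_prefixed(n, [sys.executable, "-c", child], write) for n in names)
    )
```

Passing a different `write` callable (for example a list's `append`) makes the same code usable from tests. Children that print very long chunks without a newline would need extra handling.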

pdgendt · 2024-10-18

Yes, I didn't want to dive right in here, but it would be nice to get it right from the get-go.

I was dabbling with the idea of having the output behave like ninja where the output line is replaced with the banner and maybe some counter indicating the progress. And when a subprocess is done, print its output as is.

Not sure if this would require something like curses, I think it could be more lightweight.

marc-hb · 2024-10-18

> I was dabbling with the idea of having the output behave like ninja where the output line is replaced with the banner and maybe some counter indicating the progress. And when the subprocesses is done, print its output as is

That would be awesome but I think just 1) making sure the output is readable 2) all commands use the same output "framework" would already be a major milestone and great stepping stone towards something better. And it would give what a lot of users have been waiting for: concurrency at last.

marc-hb

Important process output questions to address.

marc-hb · 2024-10-17T22:53:07Z

src/west/app/project.py

-                continue
-
+    async def run_for_project(self, project, args, sem):
+        async with sem:


I think sem is really too short. Also, it's used only twice so that does not save much.

Suggested change

-        async with sem:
+        async with semaphore:
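For context, a small sketch of how a semaphore bounds the number of jobs running at once. This is an illustration only: `run_all` is hypothetical, and `asyncio.sleep` stands in for awaiting a real subprocess. The `peak` counter is there purely to make the bound observable.

```python
import asyncio


async def run_all(names, jobs):
    """Run one task per name, with at most `jobs` running at a time."""
    semaphore = asyncio.Semaphore(jobs)
    running = 0
    peak = 0

    async def run_one(name):
        nonlocal running, peak
        async with semaphore:  # blocks once `jobs` tasks are inside
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for awaiting a subprocess
            running -= 1
            return name

    # gather() returns results in the order the awaitables were passed in.
    done = await asyncio.gather(*(run_one(n) for n in names))
    return done, peak
```

All tasks are created up front, but only `jobs` of them can be inside the `async with` body at any moment.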

marc-hb · 2024-10-17T22:57:22Z

tests/test_project.py

+
+    cmd('update net-tools Kconfiglib')
+
+    # print order is no longer guaranteed when there are multiple projects


~~Then the banners don't make sense anymore and should not be printed when j > 1~~

It's more complicated...

marc-hb · 2024-10-17T23:04:13Z

src/west/app/project.py

@@ -1670,16 +1671,15 @@ def do_add_parser(self, parser_adder):
        parser.add_argument('projects', metavar='PROJECT', nargs='*',
                            help='''projects (by name or path) to operate on;
                            defaults to active cloned projects''')
+        parser.add_argument('-j', '--jobs', nargs='?', const=-1,


Suggested change

-        parser.add_argument('-j', '--jobs', nargs='?', const=-1,
+        # Default to 1 when `-j` is not given because there is no way to
+        # know whether the user commands can be run at the same time safely.
+        parser.add_argument('-j', '--jobs', nargs='?', const=-1,

Eventually, west grep, west update and others could default to cpu_count() if everything goes well but I think forall should always default to 1.

(such a comment also helps a bit with the peculiar default+const argparse idiom)
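To illustrate the default+const idiom mentioned above, a self-contained sketch. The diff excerpt only shows `nargs='?', const=-1`, so `default=1`, `type=int` and the mapping of -1 to the CPU count are assumptions here, as is the `parse_jobs` helper.

```python
import argparse
import os


def parse_jobs(argv):
    """Return the job count implied by argv (sketch, not west's parser)."""
    parser = argparse.ArgumentParser(prog="forall-sketch")
    # default=1 applies when -j is absent. const=-1 applies when -j is
    # given without a value; it is mapped to the CPU count below.
    parser.add_argument("-j", "--jobs", nargs="?", const=-1, default=1,
                        type=int)
    args = parser.parse_args(argv)
    if args.jobs == -1:
        return os.cpu_count() or 1  # cpu_count() may return None
    return args.jobs
```

With this setup, no flag means 1 job, a bare `-j` means one job per CPU, and `-j 4` means 4 jobs.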

marc-hb

> You probably tested this with relatively "quiet" and easy to read output and a reasonable number of threads... can you try again after cranking it all up? This code must be prepared to handle not just the "common" cases but all cases.

OK, I just had a test idea and this PR was pretty easy to "break" after all. The following one-liner prints readable output with -j 1 and totally jumbled up with -j > 1

west forall -j 1 -c 'i=8; while test $((i--)) -ge 0;
 do printf " $WEST_PROJECT_NAME"; sleep 0.1; done; printf "\n"'

It would be great to "upgrade" a test like this and make it part of the actual test suite. I don't know how we could make it portable to Windows... by converting it to Python maybe? Or maybe we don't need to? I think it would be OK to have some tests skipped on some platforms; there would still be value in that.

Don't get me wrong: we DO need some tests that exercise the terminal on Windows. Windows terminal issues were the reason for the 7223431 revert and that whole saga. But maybe not all tests need to run on all platforms.
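One possible shape for a portable version of that test, sketched in Python under assumptions: the child below mimics the shell loop (name printed in small chunks with pauses, then a newline), and `lines_are_unmixed` is a hypothetical checker. In a real test the text to check would be the captured output of `west forall -j N`, not the output of `combined_output`.

```python
import asyncio
import sys

# Python stand-in for the shell loop: print the name in small chunks with
# pauses, then a newline. This pattern jumbles easily when run in parallel.
NOISY_CHILD = """
import sys, time
for _ in range(5):
    sys.stdout.write(' ' + sys.argv[1])
    sys.stdout.flush()
    time.sleep(0.01)
sys.stdout.write('\\n')
"""


async def run_noisy(name):
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", NOISY_CHILD, name,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode()


async def combined_output(names):
    """Run all children concurrently and join their captured outputs."""
    chunks = await asyncio.gather(*(run_noisy(n) for n in names))
    return "".join(chunks)


def lines_are_unmixed(text):
    """True if every non-empty line mentions exactly one distinct name."""
    return all(len(set(line.split())) == 1
               for line in text.splitlines() if line.strip())
```

The checker only relies on whitespace-separated names, so it does not depend on a particular shell.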

pdgendt · 2024-10-18T17:04:58Z

The updated version is better, as all printing is done from a single thread. I tried to overwrite the "running" line, but it doesn't clear it.
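On the overwrite problem: a carriage return only moves the cursor back to column 0, so leftovers of a longer previous line stay visible. A minimal sketch of one workaround, padding with spaces; the `overwrite` helper is hypothetical and not part of west's banner methods.

```python
import sys


def overwrite(text, previous_len, stream=sys.stdout):
    """Redraw the current terminal line and return the new text length."""
    # '\r' erases nothing by itself. Padding with spaces blanks out
    # whatever is left of a longer previous line.
    padding = " " * max(0, previous_len - len(text))
    stream.write("\r" + text + padding)
    stream.flush()
    return len(text)
```

The caller keeps the returned length and passes it to the next call. On terminals that support ANSI escapes, writing `"\x1b[K"` after the text (erase to end of line) is an alternative to padding.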

marc-hb · 2024-10-18T17:14:31Z

Colorization was surprisingly more robust than I thought (mainly due to a lack of "toggles") but I managed to craft a test that broke your earlier version. But you just fixed it before I shared the test :-) I'm sharing that test code anyway because I think it's still useful:

west forall -j 10 -c 'i=7; while test $((--i)) -ge 0;
 do printf "\e[$((30+i))m"; sleep 0.0${RANDOM}; printf " $WEST_PROJECT_NAME-$i"; done; printf "\e[0m\n"'

I need time to review the new version.

pdgendt · 2024-10-21T06:51:38Z

> Colorization was surprisingly more robust than I thought (mainly due to a lack of "toggles") but I managed to craft a test that broke your earlier version. But you just fixed it before I shared the test :-) I'm sharing that test code anyway because I think it's still useful:
>
> west forall -j 10 -c 'i=7; while test $((--i)) -ge 0;
>  do printf "\e[$((30+i))m"; sleep 0.0${RANDOM}; printf " $WEST_PROJECT_NAME-$i"; done; printf "\e[0m\n"'

Which shell do you use? Doesn't work properly with zsh.

marc-hb · 2024-10-21T17:05:02Z

> Which shell do you use? Doesn't work properly with zsh.

The default/standard: bash. ~~Just type bash to switch temporarily.~~ likely won't affect Python.

Should be compatible with all Bourne shells: ash, ksh, etc.

Strange zsh went its own way...

marc-hb · 2024-10-21T20:15:00Z

> Which shell do you use? Doesn't work properly with zsh.

In zsh, try: emulate ksh, this should get close enough to POSIX.
https://zsh.sourceforge.io/Doc/Release/Invocation.html#Compatibility
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

What does not work?

I just checked and $(( )) is POSIX https://mywiki.wooledge.org/ArithmeticExpression
On the other hand, ${RANDOM} is most likely a bashism: replace it with 2 or 3.

A must have for any shell script: https://www.shellcheck.net/ (apt install shellcheck)

marc-hb · 2024-10-21T20:36:01Z

You just made me realize something important...

west forall -h

Runs a shell (on a Unix OS) or batch (on Windows) command
within the repository of each of the specified PROJECTs.

"A shell" is vague. On Linux this is apparently the default shell. Why not, but this must be more explicit. Can you please update this help text? Also, is this a login shell or not? It matters: https://superuser.com/questions/183870/difference-between-bashrc-and-bash-profile/

Similar problem with Windows: will west forall ever use a "real" Powershell depending on the user's configuration or only the obsolete and horrible .BAT ?

In the longer term, there should be a new west forall -i zsh option to choose what interpreter gets used. Of course, the user should anyway use a separate script / layer of indirection for anything longer than 2-3 lines and then invoke west forall -c myscript.ps1. Quoting becomes impossible otherwise. So that new option is not high priority but it would be nice to have for interactive use - as just seen above.
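A rough sketch of what choosing the interpreter could look like. Everything here is hypothetical, including the `run_command` helper and its `interpreter` parameter; no such west option exists in this PR. According to the Python `subprocess` documentation, `shell=True` uses `/bin/sh` on POSIX and the `COMSPEC` shell on Windows.

```python
import subprocess


def run_command(command, interpreter=None, cwd=None):
    """Run a command string with the platform default shell or a named one."""
    if interpreter is None:
        # shell=True: /bin/sh on POSIX, the COMSPEC shell on Windows.
        return subprocess.run(command, shell=True, cwd=cwd,
                              capture_output=True, text=True)
    # Assumes the interpreter takes its script via '-c', as sh, bash and
    # zsh do. PowerShell, for one, uses a different flag.
    return subprocess.run([interpreter, "-c", command], cwd=cwd,
                          capture_output=True, text=True)
```

Passing the interpreter as an explicit argv list avoids one layer of quoting compared with nesting it inside a shell string.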

marc-hb · 2024-10-21T22:15:38Z

> "A shell" is vague. On Linux this is apparently the default shell. Why not but this must be more explicit. Can you please update this help text?

So the reason this is "vague" is because 1) it's a mess 2) accordingly, the Python documentation is vague too - intentionally! python/cpython#114539

So I think we only need a warning that defers to Python documentation here. But we do need that warning. Something like "it's a wildly non-portable and insecure mess, check the Python documentation and don't use this in automation". Something like that.

pdgendt force-pushed the async-command branch 2 times, most recently from e120eb3 to 9592b45 Compare October 16, 2024 11:27

pdgendt requested a review from marc-hb October 16, 2024 11:31

pdgendt force-pushed the async-command branch from 9592b45 to 06fd709 Compare October 16, 2024 12:09

pdgendt marked this pull request as ready for review October 17, 2024 07:30

marc-hb reviewed Oct 17, 2024

marc-hb previously requested changes Oct 17, 2024

marc-hb reviewed Oct 18, 2024

pdgendt force-pushed the async-command branch from 2634eb5 to 1ef4af1 Compare October 18, 2024 16:59

pdgendt added 3 commits November 25, 2024 14:27

app: Add optional end argument to banner print methods

09d6233

Allow passing a custom end to banner methods. This is useful for example to print a carriage return. Signed-off-by: Pieter De Gendt <[email protected]>

app: Update Forall command to allow multiple concurrent processes

db84746

Demonstrate asynchronous behavior for the Forall command and add an argument to select the number of jobs. Signed-off-by: Pieter De Gendt <[email protected]>

tests: Add test for 'forall' with jobs

2d8ebf2

Add test cases for running the forall command with multiple processes. Signed-off-by: Pieter De Gendt <[email protected]>

pdgendt force-pushed the async-command branch from 1ef4af1 to 2d8ebf2 Compare November 25, 2024 13:29
