Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(upgrade): Add total progress bar and JSON output #921

Merged
merged 1 commit into from
Dec 13, 2024

Conversation

antheas
Copy link
Contributor

@antheas antheas commented Nov 25, 2024

Adds a total progress bar to the progress output of switch and upgrade that also shows time remaining.

Then, uses the total progress bar to allow sending a jsonl output to stderr that can be read by userspace software. Has to be stderr, as bootc has some spurious prints on stdout. Progressbar is hidden as it writes to stderr.

Progress bar styling needs some cleanup. Jsonl output needs a type arg to identify the type of output, so that there can be other jsonl jsons for e.g., deployment.

@github-actions github-actions bot added the area/install Issues related to `bootc install` label Nov 25, 2024
@antheas
Copy link
Contributor Author

antheas commented Nov 25, 2024

Here are previews of the outputs

image

image

@antheas
Copy link
Contributor Author

antheas commented Nov 25, 2024

Well there are spurious prints on both stderr and stdout, so there is that. Deployment spinner and GLib errors are also stderr.

@cgwalters
Copy link
Collaborator

Thanks for working on this! I've been oscillating a lot on API/RPC frameworks in #522

"jsonl" is yet another option there that is more targeted just for progress reporting (which is indeed mainly what we need now).

But - but - if we do that I'd really like to survey the field and see whether there are any other similar progress APIs.

For example, I think this API right now would constrain us to fetching a single layer at a time ; we could add a layer identifier to the metadata for example? Basically what we include in jsonl should be sufficient to render a "multi-progress" bar.

@antheas
Copy link
Contributor Author

antheas commented Nov 26, 2024

This PR has some issues right now with spurious prints. I will try to fix them today and add different stages for deploy steps.

Incl. Fixing the failed test

As far as multilayer reporting goes, the way I did this PR is for overall progress so it does not make a difference how many layers are being downloaded at a time. The syntax could be extended with a layers dictionary in the future though.

@antheas
Copy link
Contributor Author

antheas commented Nov 26, 2024

Ok, tried to fix spurious prints. Seems to be impossible.

Since println! and eprintln! are used liberally, the functions cannot be dual purposed to json output.

Other than piping stdout to stderr with set_print() which is not in the rust stable API.

@cgwalters
Copy link
Collaborator

Right, we should have e.g. --json-progress-fd=<fdnum> which would be the write half of a pipe() e.g.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking requested changes per discussion

(Also other minor bits, you need to run cargo fmt)

lib/src/deploy.rs Outdated Show resolved Hide resolved
@cgwalters
Copy link
Collaborator

Before we go too much farther on this can you give a read and comment on #522 too please? Specifically WRT "jsonl" vs varlink vs gRPC and your use case.

@antheas antheas mentioned this pull request Nov 26, 2024
Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments.

Do you mind if I force push some updates to this PR and we can co-author it?

lib/src/cli.rs Outdated Show resolved Hide resolved
@@ -45,6 +46,17 @@ pub(crate) struct ImageState {
pub(crate) ostree_commit: String,
}

/// Download information
#[derive(Debug, serde::Serialize)]
pub struct JsonProgress {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In combination with my suggestion of having a mod jsonl I think this struct would go there. Just to future proof things a bit more I'd like to try to make this a little bit more abstract so that it could handle slightly more generic tasks.

For example, a fsck like operation wouldn't have "done_bytes" and "download_bytes" at least in that same sense, but it'd be good to have support for a similar progress bar for it.

In rpm-ostree we have such a generic progress API (over DBus), it's a bit clunky but see https://github.com/coreos/rpm-ostree/blob/f4fa2c5fe78774775ef3ce4f27d4199fac48f7a9/rust/src/console_progress.rs#L20

Basically I think we can structure this so clients can view progress data in a progressively more complex fashion; if they don't care about the details for an operation then can just take a simple "n/total" numbers and that could be pretty easily reused for an operation like fsck.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am usually using interpreted languages, so I can distinguish between different JSON objects based on "stage". However, if that is not the case for something like rust, then this struct would need to go into another struct as a union on the bottom.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Rust serde has good support for most JSON enum conventions: https://serde.rs/enum-representations.html

@antheas
Copy link
Contributor Author

antheas commented Nov 26, 2024

Go ahead and push

@cgwalters
Copy link
Collaborator

OK, I pushed a minor cleanup - stepping out for the day and I am in theory out the rest of this week, so releasing my mutex on this one.

Overall...I think my disposition is to just get this jsonl stuff in and we can sit on the larger IPC question for later.

One thing I'd still really love to know though: is anyone else out there doing something like this (specifically jsonl for progress bars?).

Oh hum, there's already a crate for this and it has a fair number of reverse dependencies. That's encouraging. Only briefly skimming the source to one of them, this code looks quite related.

Like ideally we share a "schema" with an existing project here or at least something similar. (Maybe a good way to think of it is: could the schema we invent here be sufficiently generic that one could use it for e.g. dnf or apt too?)

@antheas
Copy link
Contributor Author

antheas commented Nov 26, 2024

Neither the format or the concept behind this are particularly novel. The combination perhaps is.

Jsonlines is widely used in machine learning because it is dead simple, to the point it is not worth using a library for (unless performance is required), and supports streaming and compression due to it having separators very often.

Binding JSON to a UI is pretty standard data-binding too (e.g., look at React, and especially backends such as Redux).

I think there is a bit of a rough edge due to the fact that cli commands are used together with json, so those two words are mixed.

If you want some ideas for IPC, when making hhd, I took the json idea a bit to the extreme, and did server side rendering over json, using a set of pre-defined components (the spec is here).

Essentially, the api/settings endpoint sends a JSON of the currently available settings, one of which may be a progress bar, and then there is an api/state endpoint that contains all of the current values and updates similarly to jsonl. It also does internalization too, so all UIs share the translations. The UI just shows the settings that it was given and updates their values, with hhd acting as the model and controller in MVC.

That document was last updated in Dec. I think one month into making hhd and it is almost a year later now. There have been no breaking changes to that API since then, so I'd say the idea works pretty well. There are around 20x more settings now, and the project is 5x the size.

The downside of that API is that it decides how the UI should be structured, and it makes it hard to interact with it over cli. But since the UI works so well now nobody asks for that. As for the UI, the only updates it has are to add stylized components (here). It is actually quite nice to be able to add features without having to update the UI(s).

I will try to work through this PR this week and I think we will try to deploy this PR soon too. Just the progress bar itself is a nice QOL.

pub downloaded: u64,
pub size: u64,
/// "cached", "waiting", "downloading", "done"
pub status: String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an enum

@antheas
Copy link
Contributor Author

antheas commented Nov 27, 2024

Turns out the mutex was needed after all. Progress reader spins a thread so it consumed the writer

Added deployment stages as well:

> sudo /tmp/bootc switch ghcr.io/ublue-os/bazzite:testing --json-fd=0 --retain 2> /dev/null
{"stage":"fetch","bytes_done":199213658,"bytes_download":205161290,"bytes_total":4816175316,"layers_done":2,"layers_download":3,"layers_total":73,"layers":null}
{"stage":"fetch","bytes_done":201337882,"bytes_download":205161290,"bytes_total":4816175316,"layers_done":2,"layers_download":3,"layers_total":73,"layers":null}
{"stage":"fetch","bytes_done":202688154,"bytes_download":205161290,"bytes_total":4816175316,"layers_done":2,"layers_download":3,"layers_total":73,"layers":null}
{"stage":"fetch","bytes_done":204013466,"bytes_download":205161290,"bytes_total":4816175316,"layers_done":2,"layers_download":3,"layers_total":73,"layers":null}
Fetched layers: 195.66 MiB in 63 seconds (3.12 MiB/s)
{"stage":"deploy","n_steps":3,"step":0,"name":"deploying"}
Deploying...done (13 seconds)
{"stage":"deploy","n_steps":3,"step":1,"name":"pulling_bound_images"}
{"stage":"deploy","n_steps":3,"step":2,"name":"cleaning_up"}
Pruned images: 1 (layers: 6, objsize: 597.9 MB)
Queued for next boot: ghcr.io/ublue-os/bazzite:testing
  Version: testing-41.20241125.1
  Digest: sha256:97e85bdef72416134bde5afc7083a0222baa6fdc12a5d50a14cace1afd431d81

Inbetween these two lines there is a very suspicious 10-40s gap. So something needs to be done there progress wise.

{"stage":"fetch","bytes_done":204013466,"bytes_download":205161290,"bytes_total":4816175316,"layers_done":2,"layers_download":3,"layers_total":73,"layers":null}
Fetched layers: 195.66 MiB in 63 seconds (3.12 MiB/s)

@antheas
Copy link
Contributor Author

antheas commented Nov 27, 2024

$ sudo /tmp/bootc switch ghcr.io/ublue-os/bazzite-deck-gnome:testing --json-fd=0 --quiet
{"stage":"fetch","bytes_done":196820234,"bytes_download":198878861,"bytes_total":5766354157,"layers_done":4,"layers_download":5,"layers_total":73,"layers":null}
{"stage":"fetch","bytes_done":198753546,"bytes_download":198878861,"bytes_total":5766354157,"layers_done":4,"layers_download":5,"layers_total":73,"layers":null}
{"stage":"fetch","bytes_done":198878861,"bytes_download":198878861,"bytes_total":5766354157,"layers_done":5,"layers_download":5,"layers_total":73,"layers":null}
Fetched layers: 189.67 MiB in 7 seconds (26.19 MiB/s)
Committing image to OSTree
{"stage":"deploy","n_steps":5,"step":0,"name":"committing"}
{"stage":"deploy","n_steps":5,"step":1,"name":"sanity_check"}
{"stage":"deploy","n_steps":5,"step":2,"name":"deploying"}
⠙ Deploying                                                                                                                                                           
(bootc:13285): GLib-CRITICAL **: 22:09:10.120: g_atomic_ref_count_dec: assertion 'old_value > 0' failed
⠒ Deploying                                                                                                                                                           
(bootc:13285): GLib-CRITICAL **: 22:09:10.506: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(bootc:13285): GLib-CRITICAL **: 22:09:10.507: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(bootc:13285): GLib-CRITICAL **: 22:09:10.507: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(bootc:13285): GLib-CRITICAL **: 22:09:10.534: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(bootc:13285): GLib-CRITICAL **: 22:09:10.536: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(bootc:13285): GLib-CRITICAL **: 22:09:10.536: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(bootc:13285): GLib-CRITICAL **: 22:09:10.541: g_atomic_ref_count_dec: assertion 'old_value > 0' failed
⠂ Deploying                                                                                                                                                           
(bootc:13285): GLib-CRITICAL **: 22:09:10.557: g_atomic_ref_count_dec: assertion 'old_value > 0' failed
  Deploying: done (12 seconds)                                                                                                                                        {"stage":"deploy","n_steps":5,"step":3,"name":"pulling_bound_images"}
{"stage":"deploy","n_steps":5,"step":4,"name":"cleaning_up"}
Pruned images: 1 (layers: 6, objsize: 996.9 MB)
Queued for next boot: ghcr.io/ublue-os/bazzite-deck-gnome:testing
  Version: testing-41.20241125.1
  Digest: sha256:8265108a96bf3402c377b146a4071215e337d8ef84026a6e886981684bf57381

Looks a lot more granular and nicer now. Corruption bug still there.

@cgwalters
Copy link
Collaborator

Hmm, I think "add update --check json" is scope creeping this; it relates to #472 (comment)

Let's keep this one to what isn't in status --json already; the biggest thing there is just the progress notifications.

@antheas
Copy link
Contributor Author

antheas commented Dec 3, 2024

Hmm, I think "add update --check json" is scope creeping this; it relates to #472 (comment)

Let's keep this one to what isn't in status --json already; the biggest thing there is just the progress notifications.

I guess I was not aware of the following:
#472 (comment)

Sure, we can strip that commit

@cgwalters cgwalters assigned cgwalters and unassigned antheas Dec 3, 2024
@cgwalters
Copy link
Collaborator

Let me take this one back for a bit

@cgwalters
Copy link
Collaborator

(bootc:13285): GLib-CRITICAL **: 22:09:10.120: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

I'm not getting that one here...that's quite worrying. Can you try to get a core dump + stack via e.g. env G_DEBUG=fatal-warnings bootc ... ?

@antheas
Copy link
Contributor Author

antheas commented Dec 5, 2024

Task and id would change. Yes, the caller would assume they completed. Then, the caller could hardcode a ratio between the pull task and the deploy task to show a single progress bar. Or the caller could show two different progress bars.

So it's possible now that an entire "task" could be skipped in theory if it happened fast enough. I guess it's a bit of a corner case: do we care about showing an 100% complete bar if the task happened fast enough? I feel like this would matter a little bit more if we changed the default TTY renderer to consume this stream? But...eh I can't claim this is a serious problem either, I just still find myself leaning towards the previous schema.

I guess so, well, i can partially undo the change and fire the last message as required. This seems like a sane choice.

@cgwalters
Copy link
Collaborator

I guess so, well, i can partially undo the change and fire the last message as required. This seems like a sane choice.

Yeah that'd help. Actually I started to write tests for this and that's where the lossiness could definitely show up as a flake. Always emitting a message when changing tasks seems good enough probably.

@antheas
Copy link
Contributor Author

antheas commented Dec 5, 2024

I still cannot get it to work still. I thought it was inheritability, indeed os.pipe is not inheritable. However, even even with os.set_inheritable(w, True) (or pipe2 which is the same) it still does not work.

fatal runtime error: IO Safety violation: owned file descriptor already closed

import subprocess
import os

proc = None
try:
    r, w = os.pipe2(0)
    proc = subprocess.Popen(
        [
            "sudo",
            "/tmp/bootc",
            "switch",
            "ghcr.io/ublue-os/bazzite:unstable",
            f"--json-fd={w}",
        ],
    )

    with os.fdopen(r) as f:
        while True:
            line = f.readline()
            if not line:
                break
            print(line, end="")

    proc.wait()

except KeyboardInterrupt:
    if proc:
        proc.terminate()

@cgwalters
Copy link
Collaborator

You need to provide the pass_fds to Popen

@antheas
Copy link
Contributor Author

antheas commented Dec 6, 2024

I tried both individually. You are right, it needed both inheritability and pass_fds together

@antheas
Copy link
Contributor Author

antheas commented Dec 6, 2024

I readded staging and import as separate stages.

Then fixed up the script and made it close correctly after the image is merged.

import subprocess
import os

proc = None
try:
    r, w = os.pipe2(0)

    os.set_inheritable(r, True)

    proc = subprocess.Popen(
        [
            "/tmp/bootc",
            "switch",
            "ghcr.io/ublue-os/bazzite:testing",
            f"--json-fd",
            str(w),
        ],
        pass_fds=[w],
        stderr=subprocess.DEVNULL,
    )

    # Close fd in parent process
    os.close(w)

    with os.fdopen(r) as f:
        while proc.poll() is None:
            line = f.readline()
            if not line:
                break
            print(line, end="")

    proc.wait()

except KeyboardInterrupt:
    if proc:
        proc.terminate()

@antheas
Copy link
Contributor Author

antheas commented Dec 6, 2024

I fixed the style to match bootc status --json.

And I think its ready to fork and use for my usecase/it's getting close to be mergeable.

@antheas
Copy link
Contributor Author

antheas commented Dec 7, 2024

And here it is
hhd-dev/hhd@de34916

bootc_update.webm

@cgwalters
Copy link
Collaborator

Juggling a whole lot here but I should be able to circle back to this one by end of the week. One thing that would be helpful (that I was planning to do, to be clear) is add integration testing for this in our tmt-based tests (the current unit test is too synthetic). If you or someone else wants to jump on that it'd help, but again since the current test suite is a bit bespoke and a bit of an investment to learn I was planning to do it myself (also to help get familiar with the code).

The other thing we probably should do is have a basic doc for this in docs/src/bootc-via-api.md or so.

@antheas
Copy link
Contributor Author

antheas commented Dec 10, 2024

No rush from my side, i pinned the current hash on a different branch so that we can deploy that.

I can try to work a bit on the docs later in the week but I am also busy this week so I do not think I can jump on the tests.

@mvo5
Copy link
Contributor

mvo5 commented Dec 12, 2024

Just a quick drive-by/sidenote - in osbuild we had a similar discussion and have a similar json based progress/status format, we went with jsonseq though mostly because there is https://datatracker.ietf.org/doc/html/rfc7464 which defines it. But it seems there is no clear "standard" in the json streaming world so jsonn is a fine choice too.

@antheas
Copy link
Contributor Author

antheas commented Dec 12, 2024

@mvo5 so essentially it is the same here but we prepend with 0x1E?

It is something to consider. I will leave the choice up to Colin.

@mvo5
Copy link
Contributor

mvo5 commented Dec 12, 2024

@mvo5 so essentially it is the same here but we prepend with 0x1E?

Yes, that is my understanding too.

It is something to consider. I will leave the choice up to Colin.

Yeah, I have no strong opinion either way, they are all so similar, we were on the fence flipped a coin (well, picked it because ofthe rfc)

@cgwalters
Copy link
Collaborator

The low level protocol aspect ties into the higher level question of whether there's any way we could share a schema for the actual JSON, I'd love to be able to do that. I mean if we land this as is in bootc, surely at some point we'd end up bridging the bootc json progress into an osbuild progress?

Is the osbuild progress API stable or could we change it?

On the question of jsonl vs jsonseq, well...you know what's kind of maddening honestly is that varlink chose to terminate with a NUL byte instead. Honestly none of these are really difficult to handle...

jsonl seems maybe more widely used because it's just so simple. But I would say IMO varlink's NUL terminator is actually the most obvious here and even though it's not a RFC, it is at least used.

So umm...I dunno, I guess if I had to pick between all 3 of these things I'd say let's go with the varlink style NUL.

But ultimately again it's a less interesting question than trying to share a progress schema right?

@antheas
Copy link
Contributor Author

antheas commented Dec 12, 2024

I'd rather stay with jsonseq or jsonl. Both are separated with \n so they can be readlined

Then jsonseq has the extra delimiter at the start that can be skipped

NUL would make a lot of file handlers close

@cgwalters
Copy link
Collaborator

NUL would make a lot of file handlers close

I don't think that's true, do you have a reference? Go and Rust both have nice high level APIs for this. I am sure there is something in Python but if it exists in the stdlib I couldn't find it in a quick search (or query with a LLM)

Anyways though, yeah I don't think this matters hugely, I am also totally fine with jsonl or jsonseq. Though AFAICS nothing in the serde_json docs actually guarantees it won't emit a literal newline so maybe that's an argument for jsonseq (or trying to add that guarantee to serde_json).

@antheas
Copy link
Contributor Author

antheas commented Dec 12, 2024

NUL would make a lot of file handlers close

I don't think that's true, do you have a reference? Go and Rust both have nice high level APIs for this. I am sure there is something in Python but if it exists in the stdlib I couldn't find it in a quick search (or query with a LLM)

Anyways though, yeah I don't think this matters hugely, I am also totally fine with jsonl or jsonseq. Though AFAICS nothing in the serde_json docs actually guarantees it won't emit a literal newline so maybe that's an argument for jsonseq (or trying to add that guarantee to serde_json).

I got confused, EOL is different. But in any case, sticking \0 in the middle of the output is bound to cause (unnecessary) problems.

jsonseq still appends \n, its the same standard. But it has 0x13 as a backup. In that way it is better because you can e.g., pretty print the output optionally. So I'd support using that instead. The wart with it is that you have to "eat" 0x13 if you dont stop on it but its not a big deal and I can do that when this merges.

@cgwalters
Copy link
Collaborator

@mvo5 I'm looking at osbuild/osbuild@a1eaf3d#diff-0d30cb1d123e195a69ca1c376d6493f22171312db57f78f468d1481b3bfbaa06R93 and thinking about this.

It looks like in that schema there's no explicit separation like we have now between "byte downloads" vs "steps"...that seems desirable here. Something like an optional "disposition": "bytes" maybe?

It also looks like there's no "total number of bytes/steps"...one could hack that by I guess just creating progress entries for them too that stay at zero, and encourage UI renderers to not show progress bars for steps that are still at zero?

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some things I'd like to tweak and get tested more but we just cut a release, and I'd like this to get some max soak time in git main and our continuous releases so let's get this in.

This adds a generic "progress" infrastructure for granular
incremental notifications of downloading in particular, but
we may extend this to other generic tasks in the future too.

Signed-off-by: Antheas Kapenekakis <[email protected]>
Signed-off-by: Colin Walters <[email protected]>
@cgwalters
Copy link
Collaborator

Just rebased 🏄 (to get CI fixes from git main) and squashed ⏬ (because I tend to prefer a relatively lean "git log" even though renovate ruins it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/install Issues related to `bootc install`
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants