Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insert generated image.json into the ostree commit #2811

Merged
merged 3 commits into from
Apr 29, 2022

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Apr 13, 2022

Depends #2806


Serialize grub-script literally into image.json

Prep for injecting image.json into the ostree commit. To make
that meaningful, it has to be entirely independent of coreos-assembler.

This bit is crucial for being able to do image builds in a manner
independent of coreos-assembler's content, using solely the ostree
commit as input.


Insert generated image.json into the ostree commit

This is part of coreos/fedora-coreos-tracker#1151

Our generated disk images are largely just a "shell" around the egg
of an ostree commit. There is almost nothing that lives
in the disk image that isn't in the commit.

(This is especially true now that a preparatory commit previous to
this moved the content of our static grub.cfg into image.json)

In the original coreos-assembler design I'd tried to cleanly
separate builds of the ostree from disk image builds, but also
support linking them together (with matching version numbers, etc.)
The separate image.yaml was part of this. This...mostly worked.

This change furthers that separation by having image builds input from
just the ostree commit. Crucially we would no longer need the config
git repository to perform an image build.

And this in turn unlocks truly better separating ostree builds from
disk image builds in the pipeline and supporting
downstream tooling generating disk images from custom containers.

One neat thing here is we will finally fix a longstanding issue
where coreos-assembler fails when just the image.yaml changes:

Closes: #972


@openshift-ci
Copy link

openshift-ci bot commented Apr 13, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jlebon
Copy link
Member

jlebon commented Apr 13, 2022

Can we hold this until we discuss #2685 with the larger group? I think this would be a good cabal topic.

Prep for injecting `image.json` into the ostree commit.  To make
that meaningful, it has to be entirely independent of coreos-assembler.

This bit is crucial for being able to do image builds in a manner
independent of coreos-assembler's content, using solely the ostree
commit as input.
@cgwalters
Copy link
Member Author

Can we hold this until we discuss #2685 with the larger group? I think this would be a good cabal topic.

Note that this is just an incremental step - it still doesn't change in a fundamental way how the pipelines work. Yet 😄

So...personally, I think unit testing most of the coreos-assembler code
in practice is not very useful.  We have good integration tests
that cover things.
@cgwalters cgwalters force-pushed the imagejson-in-ostree branch 2 times, most recently from cdec04e to 782e94b Compare April 13, 2022 23:45
@cgwalters
Copy link
Member Author

On one hand, it's great that our CI covers all the image variants. About 95% of the changes made here have 0% chance to break them, but this one actually does touch on all the weird corners. Cool to see CI successfully finding problems.

On the other hand, holy cow is the code here a mess of shell script, heavily dynamic object-oriented python, and "script python".

This is part of coreos/fedora-coreos-tracker#1151

Our generated disk images are largely just a "shell" around the egg
of an ostree commit.  There is almost nothing that lives
in the disk image that isn't in the commit.

(This is especially true now that a preparatory commit previous to
 this moved the *content* of our static `grub.cfg` into `image.json`)

In the original coreos-assembler design I'd tried to cleanly
separate builds of the ostree from disk image builds, but also
support linking them together (with matching version numbers, etc.)
The separate `image.yaml` was part of this.  This...mostly worked.

This change furthers that separation by having image builds input from
*just the ostree commit*.  Crucially we would no longer need the config
git repository to perform an image build.

And this in turn unlocks truly better separating ostree builds from
disk image builds in the pipeline *and* supporting
downstream tooling generating disk images from custom containers.

One neat thing here is we will finally fix a longstanding issue
where coreos-assembler fails when just the `image.yaml` changes:

Closes: coreos#972
@cgwalters cgwalters force-pushed the imagejson-in-ostree branch from 782e94b to 089f9cd Compare April 14, 2022 13:52
@cgwalters
Copy link
Member Author

/retest

1 similar comment
@cgwalters
Copy link
Member Author

/retest

@cgwalters
Copy link
Member Author

OK, got a green jenkins run here 🎉 ! The RHCOS failures are unrelated github ratelimits I think, will do another retry later.

@cgwalters
Copy link
Member Author

/retest

@cgwalters cgwalters requested a review from miabbott April 26, 2022 12:58
@cgwalters
Copy link
Member Author

OK so...while this isn't directly blocking anything at the moment, I'd like to make forward progress on this too.

@miabbott
Copy link
Member

miabbott commented Apr 28, 2022

This looks ok to me; I'd like to do some sanity tests that the affected artifacts build successfully.

Specifically the ppc64le build; @ravanelli do you have some cycles to test this PR in a ppc64le setting?

I'll try to get a ppc64le machine from Beaker, too, so that I can try testing it.

@miabbott
Copy link
Member

This looks ok to me; I'd like to do some sanity tests that the affected artifacts build successfully.

Specifically the ppc64le build; @ravanelli do you have some cycles to test this PR in a ppc64le setting?

I'll try to get a ppc64le machine from Beaker, too, so that I can try testing it.

I should have realized that all the affected artifacts are already built as part of the CI jobs, so my local testing is kind of moot.

Still proceeding with some ppc64le testing

@miabbott
Copy link
Member

I grabbed a BM ppc64le system from Beaker and built this PR into a cosa container. I was able to successfully build FCOS, too

But when I booted the FCOS build initially, I didn't see /usr/share/coreos-assembler/image.json present on the system as I expected.

I did see the layer in the tmp/repo location and even found the file in the final ostree commit:

[coreos-assembler]$ ostree --repo=/srv/tmp/repo refs
overlay/16disable-zincati-and-pinger
overlay/05core
cosa-image-json
overlay/09misc
overlay/20platform-chrony
fedora/ppc64le/coreos/testing-devel
overlay/14NetworkManager-plugins
overlay/35coreos-iptables
overlay/15fcos
overlay/08nouveau
35.20220429.dev.0
[coreos-assembler]$ ostree --repo=/srv/tmp/repo rev-parse 35.20220429.dev.0
0dbf6f1efdb716566facaa5f02ddd3c158a0c17dce827b5ee06d3a1eddc21255
[coreos-assembler]$ ostree --repo=/srv/tmp/repo cat 0dbf6f1efdb716566facaa5f02ddd3c158a0c17dce827b5ee06d3a1eddc21255 /usr/share/coreos-assembler/image.json | jq .
{
  "bootfs": "ext4",
  "bootfs_metadata_csum_seed": true,
  "deploy-via-container": false,
  "extra-kargs": [
    "mitigations=auto,nosmt"
  ],
  "grub-script": "set pager=1\n# petitboot doesn't support -e and doesn't support an empty path part\nif [ -d (md/md-boot)/grub2 ]; then\n  # fcct currently creates /boot RAID with superblock 1.0, which allows\n  # component partitions to be read directly as filesystems.  This is\n  # necessary because transposefs d,
  "ignition-network-kcmdline": [],
  "ostree-format": "oci",
  "ostree-remote": "fedora",
  "rootfs": "xfs",
  "rootfs-args": "",
  "size": 10,
  "squashfs-compression": "zstd",
  "vmware-hw-version": 13,
  "vmware-os-type": "fedora64Guest"
}
[coreos-assembler]$ cosa run
KVM: Failed to create TCE64 table for liobn 0x71000003
KVM: Failed to create TCE64 table for liobn 0x80000000
[QEMU guest is booting] Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM                                                                                                                          
Falling back to kernel-irqchip=off
[QEMU guest is booting] [    0.804040] virtio-pci 0000:00:00.0: ibm,query-pe-dma-windows(2026) 0 8000000 20000000 returned 0KVM: Failed to create TCE64 table for liobn 0x80000001
Fedora CoreOS 35.20220429.dev.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos
                                                                               
Last login: Fri Apr 29 14:15:37 2022
[core@cosa-devsh ~]$ ls /usr/shared/coreos-assembler/image.json
ls: cannot access '/usr/shared/coreos-assembler/image.json': No such file or directory

Even more bizarre, if I repeated cosa run, the file did appear eventually?

[coreos-assembler]$ cosa run                                                   
KVM: Failed to create TCE64 table for liobn 0x71000003
KVM: Failed to create TCE64 table for liobn 0x80000000
[QEMU guest is booting] Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM                                                                                                                          
Falling back to kernel-irqchip=off                                            
[QEMU guest is booting] [    0.802855] virtio-pci 0000:00:00.0: ibm,query-pe-dma-windows(2026) 0 8000000 20000000 returned 0KVM: Failed to create TCE64 table for liobn 0x80000001                                                                                                                                            
Fedora CoreOS 35.20220429.dev.0                                      
Tracker: https://github.com/coreos/fedora-coreos-tracker                  
Discuss: https://discussion.fedoraproject.org/tag/coreos            
                                                                               
Last login: Fri Apr 29 14:20:04 2022                                           
[core@cosa-devsh ~]$ ls -l /usr/share/coreos-assembler/image.json        
-rw-r--r--. 2 root root 3308 Jan  1  1970 /usr/share/coreos-assembler/image.json                                                                                           
[core@cosa-devsh ~]$ exit                                                      
logout                                                                      
[SESSION] Clean exit from SSH, terminating instance                   
                                                                               
[coreos-assembler]$ cosa run                                     
KVM: Failed to create TCE64 table for liobn 0x71000003           
KVM: Failed to create TCE64 table for liobn 0x80000000                                                                                                                                                                                                                                                                        
[QEMU guest is booting] Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM                                                                                                                          
Falling back to kernel-irqchip=off         
[QEMU guest is booting] [    0.816020] virtio-pci 0000:00:00.0: ibm,query-pe-dma-windows(2026) 0 8000000 20000000 returned 0KVM: Failed to create TCE64 table for liobn 0x80000001                                                                                                                                            
Fedora CoreOS 35.20220429.dev.0 
Tracker: https://github.com/coreos/fedora-coreos-tracker  
Discuss: https://discussion.fedoraproject.org/tag/coreos
                                                                               
Last login: Fri Apr 29 14:21:36 2022              
[core@cosa-devsh ~]$ ls -l /usr/share/coreos-assembler/image.json
-rw-r--r--. 2 root root 3308 Jan  1  1970 /usr/share/coreos-assembler/image.json            

I did cosa clean and then cosa fetch && cosa build again, but this time didn't encounter the missing file...not sure what I did differently.

@cgwalters
Copy link
Member Author

Isn't it just that you had an extra d on the first attempt?

$ ls /usr/shared/coreos-assembler/image.json

@miabbott
Copy link
Member

Isn't it just that you had an extra d on the first attempt?

$ ls /usr/shared/coreos-assembler/image.json

facepam

@cgwalters
Copy link
Member Author

OK I know there's some hesitancy on this but I think it's going to work and not break anything 😄

If it does, we can revert and/or use the stable cosa without much harm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RHCOS pipeline fails if just image.yaml changes
3 participants