-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/mounting squashfs extract #514
Fix/mounting squashfs extract #514
Conversation
It seems to me that in the original code, if we do 20 ExtractSingleSquash()s, and are root in a non-init user namespace with full UID range, we will only try kernel mounting once, but with your updated code we will try it every time we call ExtractSinglesquash(). Maybe that doesn't matter, as that surely must be a very rare case. |
In the original code it basically tries each option in order every time (kernel-mouint, squashfuse, unsquashfs). It never evicts an option from the list due to failure. The new code tries each option in order only once and then subsequently uses only the option that worked. I tried to explain that inline. I also tried to make it readable, but possibly failed on that. I'm open to suggestions on how to make it better. I also acknowledge that there is a problem where if the first time ExtractSingleSquash is called it is given a squashfs image that does not work with kernel mounting but did work with squashfuse. it would never try kernel mounting again. i dont' think this is a really big deal. The problem really is just that we don't have a good way to determine why something failed. |
0fc8954
to
844c80a
Compare
521ee24
to
2be7900
Compare
I'm happy with the state of this PR now, but I kind of expect it to fail c-i. I'm attaching a test script that shows the failure that i'm hitting. One thing to note... current stacker main branch does not see this error, although it should. The reason it does not is that the kernel mount gets done once, then ExtratSingleSquash gets called again and the kernel mount fails (because it is already mounted) and a squashfuse mount is put over the kernel mount. the end result is that master uses squashfuse and thus doesn't show the problem. The failed run output with this branch is below for easy viewing.
|
this looks like another place where caller needs to accept that squashsfs/overlay can return EOPNOTSUPP for some files while other files support it |
2be7900
to
39984ea
Compare
Codecov Report
@@ Coverage Diff @@
## main #514 +/- ##
==========================================
- Coverage 13.77% 13.34% -0.43%
==========================================
Files 40 40
Lines 5671 5852 +181
==========================================
Hits 781 781
- Misses 4762 4943 +181
Partials 128 128
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
It is fixed on my system by building with opencontainers/umoci#506 |
Some status on where we are right now on this. |
We could keep inside the stacker binary an empty squashfs; every time stacker starts, we try to mount it from and to a tmpdir, and use that to inform whether to use kernel mounting... It may seem wasteful, but apart from not having to do the Once() for the first kernel mount of an image, it also prevents us from misinterpreting an attempted mount of a bad squashfs blob as insufficient privilege. |
I kind of like it.
4096 bytes isn't too wasteful. |
39984ea
to
b8fba13
Compare
unpackOne required the caller to provide it with a boolean 'isSquashfs' which then made each caller have to consider the layer type. Update unpackOne to take a Descriptor and let it do the determination of how to unpack the layer in a single place. The result of calling unpackLayer is either error, or the contents available at the provided path. The caller does not have to check if the content is already present. Also here, drop the 'storage' parameter to ExtractSingleSquash that had become unused. Signed-off-by: Scott Moser <[email protected]>
Overall, there are 3 "good things" done by this change: 1. Fix bug in the current code which tries mounting with each option every time. The problem with doing that is really that the kernel mount option didn't work very well. It would fail with "already mounted", and then squashfuse would end up getting used to mount over the top. 2. Fix race conditions in the current code. overlay.Unpack starts a thread pool and tries to unpack all layers at once. That is fine if there are no duplicate layers. But if there are duplicate layers used by a stacker.yaml file, then there were races on extraction. The end result really was that things would get mounted more than once. Example stacker that shows this: l1: from: type: docker url: docker://busybox:latest run: | echo build layer 1 l2: from: type: docker url: docker://busybox:latest run: | echo build layer 1 There, the busybox layer would get extracted multiple times. The code here has a single lock on ExtractSingleSquash, it would be better to have lock being taken per extractDir. 3. Allow the user to control the list of extractors. If they knew that they could not use kernel mounts (or could, but didn't want to) or wanted to use unsquashfs they can now do that. STACKER_SQUASHFS_EXTRACT_POLICY=kmount stacker build .. or STACKER_SQUASHFS_EXTRACT_POLICY="squashfuse kmount" stacker build ... This adds a SquashExtractor interface, with 3 implementers (KernelExtractor, SquashFuseExtractor, UnsquashfsExtractor). A ExtractPolicy is basically a list of Extractors to try. The first time ExtractPolicy is used it will try each of the Extractors in order. It then stores the result in .Extractor and uses that subsequently. Signed-off-by: Scott Moser <[email protected]>
This seeks to improve some of the existing debug messages and add some additional debug in the overlay storage. Signed-off-by: Scott Moser <[email protected]>
b8fba13
to
fd41a4b
Compare
@hallyn since you reviewed/approved I did a few things, which can be seen here.
- Mount(source, dest string) (bool, error)
+ Mount(source, dest string) error the 'bool' was intended to be "is mounted", but in all cases if err was != nil then mounted was true.
|
What type of PR is this?
Which issue does this PR fix:
What does this PR do / Why do we need it:
If an issue # is not available please add repro steps and logs showing the issue:
Testing done on this change:
Automation added to e2e:
Will this break upgrades or downgrades?
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.