-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental depmod #253
base: master
Are you sure you want to change the base?
Incremental depmod #253
Conversation
@hramrach, @hreinecke for information. |
43ae162
to
6bf3c3d
Compare
Rebased onto master. |
This is a nice feature |
While the CI workflows are waiting for moderator approval here, the results for 6bf3c3d can be examined in my fork. |
My fear with a PR with 28 commits was on the complexity being added. Upon checking the diffstat, it doesn't seem the much for "the part that matters" though:
The idea is cool, but I will need some time to properly review. Right now I only have questions:
The first run with code cache (and without warmup) was ~1.6sec. Can you share the absolute time you are seeing? |
Yes. OpenSUSE Leap is derived from SUSE Linux Enterprise. The kernel base package contains only modules that have official commercial support in SLE. Then there's
Actually, Fedora seems to do something similar:
And Ubuntu has multiple module packages, too:
Here are results of the test script included in the test, on my laptop with Ryzen 7 4750U and M.2 SSD:
So we're talking about 0.2s vs. 1.0s1. That's not much, but still in a range that users will notice. This is with a kernel that uses zstd compression, though. Doing the same test on a kernel with xz compression yields very different results, on the same hardware:
That's 2s vs. 33s with xz. This is an improvement that's very clearly noticeable by users. I used to have this kind of kernel with xz-compressed modules on a Raspi-like ARM device, where kernel updates would take several minutes, partly due to depmod slowness. That was my main incentive to start this work ;-). Also, especially when handling kernel module packages and checking them for Kernel ABI compatibility, depmod needs to be run multiple times during a single package installation. These times add up and contribute significantly to the overall time required for installing or updating a KMP. You may have noticed from the commit time stamps that I've been working on this patch set for more than 3 years (unfortunately it took me so long to clean up the set and finish the tests). Back then, (open)SUSE still used xz module compression. At that time I investigated different options to speed up both kernel updates and KMP installations, and incremental depmod turned out to be an important factor, even if replacing xz by zstd was more effective. If you are interested, have a look at my presentation from 2021, where I investigated several ways to speed up the process of installing or updating packages containing kernel modules2. Footnotes
|
Few notes in random order:
Since v33 we have landed a bunch of perf changes. Can you share some absolute numbers of v33, v33 + this patch and master. At a glance, we can improve zstd side by pre-allocating/reusing the decompression state. Doing it once, instead of allocating/de-allocating it 1000s of times should shave a few us/ms. Changing to dlopen only the required deps will also save a couple of us. |
Alright. I need to include the bug fixes in this PR though, so it will be on top of the small PR. I hope that's acceptable.
It depends on the transaction and on settings. For (open)SUSE kernel updates, it is now possible to set the environment variable
I am open to suggestions. I pondered this a lot, and the solution I went for seemed like the best to me. Note that it is "99%" backward-compatible. As mentioned above, the only problem I saw was For this feature to work reliably, storing the symbol versions somewhere is a must, and the current format doesn't support that, so some sort of change is necessary. I could go for a separate index file I could also refrain from storing symbol versions in Again, let me know you preferences.
I'm open to suggestions here, too. I didn't want to change the semantics of the existing depmod invocations. The biggest semantic clash that I see is between
I can't easily provide numbers for v33 + this patch, because I've had a lot of rebasing to do since v33, and that rebasing was happened modifications in my own code as I added tests and found bugs. Results for full depmod (
In contrast, here are results for
As you can see, the difference is in a different order of magnitude. And that's no surprise, as the number of module files that must be decompressed and read by depmod is far lower.
That sounds promising, too, but I hope you are not expecting me to work on it :) Footnotes
|
6bf3c3d
to
60a6c26
Compare
Rebased onto current master. |
@mwilck thanks for the context, that helps a lot.... for the uncompressed or zstd-compressed it's hard to justify by these numbers, but for xz it indeed is a lot of improvements. I will try reviewing this soon. Just a heads up that we may decide to postpone this for after the next release as this is already the biggest release ever. |
60a6c26
to
e211a58
Compare
Rebased onto #257, as requested. I also added some more detail to the commit messages of the bug fixes. |
Wrt the (in)compatible file format: We are already creating Let me know if this would be preferred. |
Yeah, I realize that the timing of this PR is pretty bad, as there has been a lot of action in this project lately. While I certainly wish that this PR get merged soon, I understand that it feels risky at this point in time. |
e211a58
to
fdadd2d
Compare
Rebased onto updated #257. |
fdadd2d
to
8ad94fa
Compare
rebased onto updated #257 |
... and use them to implement kmod_index_dump(). These helpers will be useful in depmod. Signed-off-by: Martin Wilck <[email protected]>
8ad94fa
to
a5bc60d
Compare
Codecov ReportAttention: Patch coverage is
|
Rebased again. @lucasdemarchi, is the code coverage critically low? I thought I'd added quite a bit of test cases. |
Symbol mappings are unique: depmod_symbol_add() doesn't allow more than one owner of a symbol by storing it in the hash. Thus the regular symbol lookup will only ever return a list with a single entry, the module containing the symbol. Rename kmod_lookup_alias_from_alias_bin to kmod_lookup_alias_from_alias_bin_n, adding an extra parameter indicating the maximum number of values to return. kmod_lookup_alias_from_alias_bin() calls this function setting the maximum to UINT_MAX. When looking up symbols, in modules.symbols.bin, set the maximum to 1. This way the CRC will not be mistaken as a module name. Signed-off-by: Martin Wilck <[email protected]>
Add a the symbol's CRC as second value to every index entry in modules.symbols.bin, using a pseudo "priority" value of UINT_MAX. This will cause modprobe versions that lack the previous patch to complain about invalid modules if modules are looked up by symbol: alias, because they will interpret the CRC as a module name: modprobe: ERROR: could not find module by name='01fe7347' modprobe: ERROR: could not insert '01fe7347': Unknown symbol in module, or unknown parameter (see dmesg) The module owning the symbol and its dependencies will be loaded nonetheless, though. In the unlikely case [*] that the CRC matches a module name, modprobe will try to load that module, too. If the predecessor patch is applied, the CRC will be ignored by modprobe. [*] Afaics, there is currently no module name that matches ^[a-f0-9]{8}$. Signed-off-by: Martin Wilck <[email protected]>
Add the option -I/--incremental. With this option, depmod will add the modules given on the command line to the existing module data base. Don't write files related to built-in modules in this mode. This patch just adds the option and documentation, the implementation follows in subsequent patches. Signed-off-by: Martin Wilck <[email protected]>
For incremental depmod, we need to read in the complete state of the current depmod data base first. We read modules and module dependencies from "modules.dep.bin", symbols and their CRCs from "modules.symbols.bin", using the previously added CRC value. The aliases can't be read from "modules.alias.bin", because in the .bin file, dashes are substituted by underscores in a way that can't be easily undone. That doesn't matter for kmod/modprobe, but for depmod it does. Thus we read the aliases from "modules.alias" instead. Finally, read the softdeps, too. Signed-off-by: Martin Wilck <[email protected]>
Add an additional field in "struct mod" to track the list of users of a module (other modules depending on this module). This allows quicker lookup later on. Signed-off-by: Martin Wilck <[email protected]>
The functionality to determine whether a module replaces another one is neede by the incremental code, too. Signed-off-by: Martin Wilck <[email protected]>
The functionality to load a single module is needed by the incremental code. Signed-off-by: Martin Wilck <[email protected]>
If the same module (same path) occurs twice in the list of modules, depmod_module_is_higher_priority() must return 1 (it returns 1 for same prio). This can't happen during regular runs, but it is possible in incremental mode (a previously added module can be listed on the command line again). Signed-off-by: Martin Wilck <[email protected]>
depmod_module_del() should also remove any references to the module being deleted in other data structures. Signed-off-by: Martin Wilck <[email protected]>
This patch implements the incremental addition of modules from the command line, using the previously added functionality. Signed-off-by: Martin Wilck <[email protected]>
Add a new mode in which depmod searches the directory tree for modules that are not in the current dependency data base, and adds them with their dependencies, without re-calculating the dependencies of modules that had been in the data base previously. The existing -a/--all flag is abused for selecting this mode with "depmod -I -a". Signed-off-by: Martin Wilck <[email protected]>
As "depmod -I -a" is now able to search the tree for added modules, it would be nice to be able to handle module deletion, too. It turns out that this is actually quite easy in the existing framework. Instead of failing when module can't be opened, try to handle the situation gracefully by re-checking the modules that depend on the removed ones. Errors are only raised if a remaining module depends on functionality that had been provided by a deleted one. While this is most useful for automatically searching the module tree for additions and removals with "depmod -I -a", it can also be used with "depmod -I" with modules listed on the command line. Signed-off-by: Martin Wilck <[email protected]>
Add the option -x/--errexit. If this option is given, depmod will exit with error code 1 if either unresolved symbols or (if crc checking is also enabled) symbol CRC mismatch is detected. Signed-off-by: Martin Wilck <[email protected]>
Add a test using the currently existing modules mod-foo and mod-foo-{a,b,c}. mod-foo contains on all 3 other modules. Signed-off-by: Martin Wilck <[email protected]>
This patch adds two more tests. Both add one module to the previous mod-foo test. The added module depends on mod-foo-b. Only the name of the added module differs: it's mod-bar in case 1 and mod-bah in the second case. Because of internals of depmod's sorting algorithm, mod-bar gets sorted before mod-foo, but mod-bah behind it. While the ordering of lines in modules.dep this isn't a big issue, it's quite surprising that this also affects the order of dependecies printed for mod-foo (compare "correct-modules.dep" for the two cases). The dependency list isn't "wrong" in either case, as mod-foo-{a,b,c} don't depend on each other. But the example shows that the ordering of mutually independent modules in modules.dep is hard to predict, making it very difficult to implement automatic verification (using simple "diff") for depmod's incremental mode. Signed-off-by: Martin Wilck <[email protected]>
Add a python script that creates a non-trivial dependency tree of modules. The script works by creating simple modules that can both export and call functions with either "void f(void)" or "void f(int)" prototypes. The kernel build will resolve these dependencies and a dependency tree will be generated. By default, the script generates a set of modules that is is inspired by a real world example from the block and SCSI layer of the Linux kernel. Signed-off-by: Martin Wilck <[email protected]>
Add a test using the script from the previous patch, with expected results. The generated modules need some basic symbols from vmlinux. Create code to generate the symvers file. This makes it possible to check symbol versions in the test suite and fail if they don't match, using the "-x" option. Signed-off-by: Martin Wilck <[email protected]>
…ies() TL;DR: This patch modifies the dependency sort algorithm such that the list of dependencies of any module is deterministically ordered, independent of the order in which modules were detected. This is important for verifying that incremental depmod and "ordinary" depmod lead to the same results. With the changes for dependency ordering, the expected test results change. Note that (as intended) the results for test-dependencies{,-1,-2} are now identical. Details: The main constraint enforced by the module sorting algorithm is that every module is sorted before all modules it depends upon. The list of dependencies for modules.dep is sorted using this ordering. This way, modprobe can simply walk the list of dependencies backward without encounterung undefined symbols. However, this is the only constraint the sorting algorithm enforces. In particular, no ordering is implied between mutually independent modules. Thus if module A depends on B and C, but B and C are independent, both A, B, C and A, C, B are valid orderings. The sort algorithm starts out with those modules on which no other module depends ("roots"). The algorithm walks over these "roots". For every root node N, it walks through the list of its dependencies M, and "forgets" the dependency of N on M, decrementing the number of "users" of M by 1. If N was the last "user" of M, M becomes a new "root" node, which is sorted after all root nodes found so far. The position in which this node is now inserted becomes its sort index (dep_sort_idx). This way it is ensured that the constraint above is always met. One peculiarity of the current algorithm is that the list of "root" nodes is walked backwards, while new nodes are being appended to the end. This works, but it's hard to follow and leads to some surprising results (for example, modules with dependencies A->B, C->D will be ordered A, B, D, C). The final ordering depends on the initial ordering of the modules, and the order of the list of dependencies of every module. The initial module order is determined by the module hash. This ordering is deterministic (the hash function is deterministic, and in every hash bucket, modules are sorted alphabetically), but it depends on the hash implementation and should rather not be relied upon. Also, the determinism is hard to understand because of the hash involved. A module's list of dependencies depends on the order the modules are scanned. This is deterministic for regular depmod (determined by the depth-first search in depmod_modules_search_dir()), but not for incremental depmod, where the order of modules listed on the command line is arbitrary. This patch modifies the algorithm by not incrementing dep_sort_idx for every node. Rather, dep_sort_idx is set to the node's "rank", which is defined as follows: Modules with no users have rank 1. The rank of an arbitrary module M is the maximum of the ranks of all modules depending on M, plus 1. By this definition, modules of rank N can only depend on modules of rank N+1 or higher. A prerequisite for implementing this change is that the list of "roots" is now scanned forward rather than backward. Modules are sorted by rank; this way the main ordering constraint (see above) remains fulfilled. Modules of equal rank are sorted alphabetically by module name, leading to a completely deterministic order for all modules. Example of previous behavior ("->" indicates "depends on"): A->(B, C); B->D; C->D; E->B input: A, B, C, D, E sorted: A, E, C, B, D deps of A: C, B, D input: E, A, B, C, D sorted: E, A, B, C, D deps of A: B, C, D Note that the list of dependencies of A depends on the input order. With this patch, ranks A: 0, E: 0, B: 1, C: 1, D: 2 sorted: A, E, B, C, D (independent of input order) deps of A: B, C, D Signed-off-by: Martin Wilck <[email protected]>
This test uses the same module set as "big-01". But it simulates an incremental addition. The test directory contains modules.dep and other output files from a "previous" depmod run, where some modules (scsi_transport_sas, qla2xxx and their dependencies) had been missing. The test then runs "depmod -I -a" over this directory with the "previously missing" modules now present. The final results should be equal to those of the big-01 test. Signed-off-by: Martin Wilck <[email protected]>
This test also uses the module set from "big-01". It simulates deletion of some modules (again, scsi_transport_sas and qla2xxx plus their dependencies) and subsequent running of "depmod -I -a". The input directory contains the depmod results for the full module set. The results should be equal to the inputs of the big-01-incremental test. Signed-off-by: Martin Wilck <[email protected]>
This alternative playground will be used to build module sets with disagreeing symbol versions. This must be done in separate directories, otherwise the kernel build will mix up the dependencies because some symbols will be defined multiple times with different symbol version. Signed-off-by: Martin Wilck <[email protected]>
This test takes again the module set from "big-01" and replaces two modules such that a symbol version changes. "depmod -I -a" is expected to pick up this change. Signed-off-by: Martin Wilck <[email protected]>
This test is like the "replace" test, but it uses depmods's built-in priority logic by copying the replacement modules into the "updates" subdirectory. "depmod -I -a" is expected to detect this change. Signed-off-by: Martin Wilck <[email protected]>
Add a test script to test incremental depmod on a real kernel, as an addition to the normal kmod CI. This script is meant to be run against the installed kernel on a test system. This test script compares the results (i.e. the modules.* files generated by depmod) of incremental depmod with a full depmod run. The expectation is that the results are always identical. The script takes a list of module basenames as arguments, the "test modules", like this: test-incremental.sh scsi_mod drm_display_helper 1. The script looks for a kernel module tree in the system it's running on and copies this tree to the working directory, and runs a full depmod there. 2. It copies the entire tree again, deletes the test modules and all modules depending on them, and runs full depmod once more. 3. It copies the tree created in 2., re-adds the previously deleted modules, and runs "depmod --incremental" with the added modules as arguments, and compares the results with those from step 1. 4. It copies the tree created in 2., re-adds the previously deleted modules, runs "depmod --incremental --all", and compares the results with those from step 1. 5. It copies the tree created in 2, runs "depmod --incremental --all", and compares the results with those from step 2. If any differences are found in step 3., 4., or 5., the script will report them. By default, all operations are done below a temporary directory under /tmp, so no changes are made to the system. The autodetected parameters should work on most systems. Otherwise, the script has various options to override them. Copying this script into meson's build directory requires meson's "fs.copyfile" feature, which was introduced in meson 0.64. The meson build must therefore be disabled on Ubuntu 22.04, which still ships meson 0.61. Signed-off-by: Martin Wilck <[email protected]>
a5bc60d
to
79d2e48
Compare
I've added commit 1772e9f ("libkmod: only use first value when looking up symbols"). This used to be part of the set, but somehow during repeated rebasing and refactoring, it got lost before I created the PR. Without that commit, modprobe complained when loading modules by
With the added commit, that doesn't happen any more.
|
Let me know if you want me to add more comments to satisfy the github-advanced-security check. |
Massive thanks for the numbers. Seems like the perf polish we did in master is within the noise margin, when xz is used. Hey at least we didn't make things slower 😅
I wouldn't be too worried about the code coverage reports atm. We have some pre-existing gaps which may be miss-counted and the PR adds a very good amount of test IMHO.
If it were me, I would split in 2-3 sub-functions which will also improve legibility. Although it's nothing to be too concerned about.
Lucas opened a PR for the dlopen side. Once it lands I will open another to speed-up zstd a wee-bit. Wrt this feature, I'm a bit split since:
|
What do you mean with "fragile"? I'm aware that it isn't immediately obvious that incremental mode doesn't break the depmod db. That's the reason why I spent much effort on writing test code. The "incremental" code path is mostly separate from the regular code path, which minimizes the risk that people or distributions who don't use incremental mode will suffer from regressions. Wrt performance, I think I've shown that the improvement for the intended use case is noticeable, even with zstd. We're not talking about just a few per cent. As mentioned above, depmod is run multiple times for a typical kernel-module-package installation scenario, and even if a single run is only half a second, the difference with an without incremental mode is well noticeable for the end user. And, my timings have been taken on fairly capable hardware; on smaller systems the differences will be bigger. |
Please note, that I haven't spent great deal of time thinking things over or reading the code. So it's quite possible I'm talking complete BS - apologies in advance, if so 🙇 The "fragile" comment is based on a quick read through the man page changes. More specifically, in random order:
While "no clear suggestion" is a self-directed, since I cannot see a constructive suggestions on how to address ^^ and/or reduce the two toggles. Most importantly, Lucas is the project maintainer so please take his input with much greater weight. |
AFAICS, there's no conflict. symbols and modules are different namespaces. For
If there are several modules with the same module name, the depmod db only contains entries pointing the module with the highest-priority path. The existing code contains the logic that determines the path priority. The incremental code uses this existing logic to determine whether the module that is added or removed is the highest-priority one. Only in that case will changes be applied. According to my testing, this logic works correctly, but of course I may be overlooking something.
If the configuration is changed, the entire database needs to be rebuilt. That's independent of incremental mode.
It wouldn't work, because the data base would still contain entries referencing the removed module. But you might as well ask if it would work if we forgot to run depmod at all – it wouldn't. Again, this is independent on whether or not incremental mode is used.
It depends. distro packagers make mistakes. I would hope that when distro packagers start adopting this feature, they'll think about it and make sure they use it (only) where appropriate. I think that you could actually always use incremental depmod, except in the "configuration change" case. When you install a kernel for the first time, the data base will be empty, and incremental comes down to regular (it isn't recommended, as it will probably be slower than regular in this pathological scenario, but it should work). |
Incremental depmod
Currently, depmod always needs to scan an entire module tree to create a valid module data base. This PR adds "incremental mode" to the depmod tool,
depmod --incremental
ordepmod -I
. With this mode, it is possible to scan just a subset of modules and add their symbols and dependencies to the depmod data base. depmod reads the existing database files and merges them with the dependencies of the additional modules scanned. After applying changes to a module tree, it's sufficient to rundepmod -e -E <symvers-file> -I -a <kernel version>
to update the data base.This mode provides a significant speedup if modules are added to, removed from, or overridden in an already installed kernel module tree. Several distributions (e.g. openSUSE and Fedora) ship multiple packages with kernel modules in order to reduce the size of the core kernel package. Installation of such add-on kernel packages or third-party module packages can be sped up by using incremental mode.
The actual speedup depends on the number of modules added, the storage speed, and the module compression algorithm. For addition of the
kernel-default-optional
on openSUSE Leap (787 / 5933 modules, zstd compression, M.2 SSD), incremental mode is roughly 3x faster than a full depmod run.To make this possible when
CONFIG_MODVERSIONS
is set, depmod needs to be able to retrieve module symbol versions from the data base. depmod can't rely on thesymvers
file for this purpose, because that file only contains symbol information for the modules built together with the kernel itself. If any modules have been added, replaced or overridden in the meantime, this information will be stale. Thereforedepmod
now saves symbol version information inmodules.symbols.bin
by adding a second field for the CRC to every entry in this file 1.A large part of this PR adds new tests to the test suite. This is because I wanted to make sure that my changes don't break symbol or dependency resolution. In particular, I spent much effort on tests to assert hat the result of using incremental mode didn't differ from the results of a full depmod run in any way. I wanted to be able to have this checked by a human being very easily, e.g. by just running
diff
orcmp
over the depmod output. This doesn't work with kmod-33 as-is, because the ordering of the dependencies on a given line inmodules.dep
is non-deterministic. If a module A depends on modules B and C which are mutually independent, the order in which B and C are listed on the list of requirements of A is not predictable. The patch "depmod: deterministic module sort order in depmod_calculate_dependencies()" changes this behavior. The commit message of this patch provides a detailed explanation of the issue and the fix. The PR also contains test cases that demonstrate the issue.Apart from additions to the test suite, I also added an test script (
testsuite/test-incremental.sh
) that can be used on live systems to double-check both accuracy and performance of incremental mode. The script (which can be run as ordinary user, and works completely on temporary storage) removes the modules given on the command line and their dependencies from the kernel module directory and re-adds them using incremental mode. See the commit message of "testsuite: add generic test script for incremental depmod" for details.Note: this patch set causes the CI for Ubuntu 22.04/meson to fail, because it updates the minimum meson version to 0.64 in order to use meson's
fs.copyfile
feature (see meson release notes). I've added an exclude directive to main.yml.Footnotes
Note that I did not
↩INDEX_VERSION_MAJOR
, although this represents a file format change. I can change that of course.The rationale for my decision was that
modprobe
always bails out if it finds an incompatible depmod data base version, whereas according to my testing, modprobe binaries compiled without this patch set work ok with depmod binaries created with this patch set. Only if modules are loaded by usingsymbol:
aliases, modprobe will print an error message, but this message is non-fatal, as the module providing the symbol is loaded nonetheless: