Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental depmod #253

Open
wants to merge 25 commits into
base: master
Choose a base branch
from
Open

Conversation

mwilck
Copy link
Contributor

@mwilck mwilck commented Nov 17, 2024

Incremental depmod

Currently, depmod always needs to scan an entire module tree to create a valid module data base. This PR adds "incremental mode" to the depmod tool, depmod --incremental or depmod -I. With this mode, it is possible to scan just a subset of modules and add their symbols and dependencies to the depmod data base. depmod reads the existing database files and merges them with the dependencies of the additional modules scanned. After applying changes to a module tree, it's sufficient to run depmod -e -E <symvers-file> -I -a <kernel version> to update the data base.

This mode provides a significant speedup if modules are added to, removed from, or overridden in an already installed kernel module tree. Several distributions (e.g. openSUSE and Fedora) ship multiple packages with kernel modules in order to reduce the size of the core kernel package. Installation of such add-on kernel packages or third-party module packages can be sped up by using incremental mode.

The actual speedup depends on the number of modules added, the storage speed, and the module compression algorithm. For addition of the kernel-default-optional on openSUSE Leap (787 / 5933 modules, zstd compression, M.2 SSD), incremental mode is roughly 3x faster than a full depmod run.

To make this possible when CONFIG_MODVERSIONS is set, depmod needs to be able to retrieve module symbol versions from the data base. depmod can't rely on the symvers file for this purpose, because that file only contains symbol information for the modules built together with the kernel itself. If any modules have been added, replaced or overridden in the meantime, this information will be stale. Therefore depmod now saves symbol version information in modules.symbols.bin by adding a second field for the CRC to every entry in this file 1.

A large part of this PR adds new tests to the test suite. This is because I wanted to make sure that my changes don't break symbol or dependency resolution. In particular, I spent much effort on tests to assert hat the result of using incremental mode didn't differ from the results of a full depmod run in any way. I wanted to be able to have this checked by a human being very easily, e.g. by just running diff or cmp over the depmod output. This doesn't work with kmod-33 as-is, because the ordering of the dependencies on a given line in modules.dep is non-deterministic. If a module A depends on modules B and C which are mutually independent, the order in which B and C are listed on the list of requirements of A is not predictable. The patch "depmod: deterministic module sort order in depmod_calculate_dependencies()" changes this behavior. The commit message of this patch provides a detailed explanation of the issue and the fix. The PR also contains test cases that demonstrate the issue.

Apart from additions to the test suite, I also added an test script (testsuite/test-incremental.sh) that can be used on live systems to double-check both accuracy and performance of incremental mode. The script (which can be run as ordinary user, and works completely on temporary storage) removes the modules given on the command line and their dependencies from the kernel module directory and re-adds them using incremental mode. See the commit message of "testsuite: add generic test script for incremental depmod" for details.

Note: this patch set causes the CI for Ubuntu 22.04/meson to fail, because it updates the minimum meson version to 0.64 in order to use meson's fs.copyfile feature (see meson release notes). I've added an exclude directive to main.yml.

Footnotes

  1. Note that I did not INDEX_VERSION_MAJOR, although this represents a file format change. I can change that of course.
    The rationale for my decision was that modprobe always bails out if it finds an incompatible depmod data base version, whereas according to my testing, modprobe binaries compiled without this patch set work ok with depmod binaries created with this patch set. Only if modules are loaded by using symbol: aliases, modprobe will print an error message, but this message is non-fatal, as the module providing the symbol is loaded nonetheless:

        # modprobe -v symbol:qlt_abort_cmd
        insmod /lib/modules/6.4.0-150600.23.25- 
        default/kernel/drivers/scsi/qla2xxx/qla2xxx.ko.zst 
        modprobe: ERROR: could not find module by name='8c2400e3'
        modprobe: ERROR: could not insert '8c2400e3': Unknown symbol in module, or unknown parameter (see dmesg)
    

@mwilck
Copy link
Contributor Author

mwilck commented Nov 17, 2024

@hramrach, @hreinecke for information.

@mwilck mwilck force-pushed the incremental-depmod branch from 43ae162 to 6bf3c3d Compare November 18, 2024 17:39
@mwilck
Copy link
Contributor Author

mwilck commented Nov 18, 2024

Rebased onto master.

@hramrach
Copy link
Contributor

This is a nice feature

@mwilck
Copy link
Contributor Author

mwilck commented Nov 20, 2024

While the CI workflows are waiting for moderator approval here, the results for 6bf3c3d can be examined in my fork.

@lucasdemarchi
Copy link
Contributor

My fear with a PR with 28 commits was on the complexity being added. Upon checking the diffstat, it doesn't seem the much for "the part that matters" though:

 libkmod/libkmod-elf.c |    4 +-
 libkmod/libkmod-file-zstd.c|    1 -
 libkmod/libkmod-file.c|    2 +-
 libkmod/libkmod-index.c |   77 ++++++++++++++++-------
 libkmod/libkmod-index.h |    6 ++
 libkmod/libkmod-internal.h |    3 +-
 libkmod/libkmod-module.c |   15 ++---
 libkmod/libkmod.c |    4 +-
 shared/hash.c |    4 +-
 shared/missing.h |    2 +-
 shared/strbuf.c |    2 +
 shared/util.c |    7 +++
 shared/util.h |    5 --

The idea is cool, but I will need some time to properly review. Right now I only have questions:

This mode provides a significant speedup if modules are added to, removed from, or overridden in an already installed kernel module tree. Several distributions (e.g. openSUSE and Fedora) ship multiple packages with kernel modules in order to reduce the size of the core kernel package. Installation of such add-on kernel packages or third-party module packages can be sped up by using incremental mode.

The actual speedup depends on the number of modules added, the storage speed, and the module compression algorithm. For addition of the kernel-default-optional on openSUSE Leap (787 / 5933 modules, zstd compression, M.2 SSD), incremental mode is roughly 3x faster than a full depmod run.

  1. So, this only helps in case the modules are maintained in separate packages. So far I've only seen this happening with very few modules (often backports from upstream to an enterprise/lts kernel) and for out-of-tree modules. What are the modules we are talking about? Is opensuse splitting the kernel modules (from the same build) into multiple packages?

  2. You mentioned it being 3x faster... but on my aging TGL laptop it already seems fast (to me). Do we need to increase the complexity here? For me:

$ sudo hyperfine --warmup 1 --runs 5 "./build-release/depmod -a 6.11.9-1-MANJARO"
Benchmark 1: ./build-release/depmod -a 6.11.9-1-MANJARO
  Time (mean ± σ):     952.9 ms ±  25.5 ms    [User: 856.3 ms, System: 88.6 ms]
  Range (min … max):   921.4 ms … 974.7 ms    5 runs

The first run with code cache (and without warmup) was ~1.6sec. Can you share the absolute time you are seeing?

@mwilck
Copy link
Contributor Author

mwilck commented Nov 21, 2024

  1. Is opensuse splitting the kernel modules (from the same build) into multiple packages?

Yes. OpenSUSE Leap is derived from SUSE Linux Enterprise. The kernel base package contains only modules that have official commercial support in SLE. Then there's kernel-default-extra, with modules that have limited support in some add-on products such as SUSE Linux Enterprise desktop, and kernel-default-optional with modules that are only available for Leap.

> for x in $(rpm -qa kernel-default*); do echo $x: $(rpm -ql "$x"|grep '\.ko'|wc -l); done
kernel-default-6.4.0-150600.23.25.1.x86_64: 2342
kernel-default-extra-6.4.0-150600.23.25.1.x86_64: 1804
kernel-default-optional-6.4.0-150600.23.25.1.x86_64: 787

Actually, Fedora seems to do something similar:

> for x in $(rpm -qa kernel-modules*); do echo $x: $(rpm -ql "$x"|grep '\.ko'|wc -l); done
kernel-modules-core-6.11.8-300.fc41.x86_64: 2079
kernel-modules-6.11.8-300.fc41.x86_64: 2329
kernel-modules-extra-6.11.8-300.fc41.x86_64: 123
kernel-modules-internal-6.11.8-300.fc41.x86_64: 95

And Ubuntu has multiple module packages, too:

linux-modules-6.8.0-49-generic - Linux kernel extra modules for version 6.8.0 on 64 bit x86 SMP
linux-modules-extra-6.8.0-49-generic - Linux kernel extra modules for version 6.8.0 on 64 bit x86 SMP
linux-modules-ipu6-6.8.0-49-generic - Linux kernel ipu6 modules for version 6.8.0-49
linux-modules-iwlwifi-6.8.0-49-generic - Linux kernel iwlwifi modules for version 6.8.0-49
linux-modules-usbio-6.8.0-49-generic - Linux kernel usbio modules for version 6.8.0-49
linux-modules-nvidia-470-6.8.0-49-generic - Linux kernel nvidia modules for version 6.8.0-49
[... lots more nvidia ...]
  1. Can you share the absolute time you are seeing?

Here are results of the test script included in the test, on my laptop with Ryzen 7 4750U and M.2 SSD:

> ./testsuite/test-incremental.sh scsi_mod
test-incremental.sh: using DEPMOD=./testsuite/../builddir/depmod
test-incremental.sh: found kernel 6.4.0-150600.23.22-default
test-incremental.sh: symbols resolved with -E /tmp/symvers-6.4.0-150600.23.22-default
test-incremental.sh: running depmod over entire tree
elapsed time: 1.32s (1.16 user / 0.16 system)
test-incremental.sh: running full depmod after module removal
elapsed time: 1.19s (1.08 user / 0.10 system)
test-incremental.sh: running depmod --incremental (220 modules added)
elapsed time: 0.26s (0.21 user / 0.05 system)
test-incremental.sh: incremental: OK
test-incremental.sh: running depmod --incremental --all
elapsed time: 0.27s (0.22 user / 0.04 system)
test-incremental.sh: incremental-all: OK
test-incremental.sh: running depmod --incremental --all after module removal
elapsed time: 0.18s (0.12 user / 0.06 system)
test-incremental.sh: deleted: OK

So we're talking about 0.2s vs. 1.0s1. That's not much, but still in a range that users will notice.

This is with a kernel that uses zstd compression, though. Doing the same test on a kernel with xz compression yields very different results, on the same hardware:

> ./testsuite/test-incremental.sh -b /tmp/SLE15-SP3/ scsi_mod
test-incremental.sh: using DEPMOD=./testsuite/../builddir/depmod
test-incremental.sh: found kernel 5.3.18-57-default
test-incremental.sh: symbols resolved with -E /tmp/symvers-5.3.18-57-default
test-incremental.sh: running depmod over entire tree
elapsed time: 35.10s (6.86 user / 28.18 system)
test-incremental.sh: running full depmod after module removal
elapsed time: 33.56s (6.47 user / 27.03 system)
test-incremental.sh: running depmod --incremental (185 modules added)
elapsed time: 1.97s (0.44 user / 1.51 system)
test-incremental.sh: incremental: OK
test-incremental.sh: running depmod --incremental --all
elapsed time: 2.00s (0.56 user / 1.43 system)
test-incremental.sh: incremental-all: OK
test-incremental.sh: running depmod --incremental --all after module removal
elapsed time: 0.14s (0.08 user / 0.05 system)
test-incremental.sh: deleted: OK

That's 2s vs. 33s with xz. This is an improvement that's very clearly noticeable by users. I used to have this kind of kernel with xz-compressed modules on a Raspi-like ARM device, where kernel updates would take several minutes, partly due to depmod slowness. That was my main incentive to start this work ;-).

Also, especially when handling kernel module packages and checking them for Kernel ABI compatibility, depmod needs to be run multiple times during a single package installation. These times add up and contribute significantly to the overall time required for installing or updating a KMP.

You may have noticed from the commit time stamps that I've been working on this patch set for more than 3 years (unfortunately it took me so long to clean up the set and finish the tests). Back then, (open)SUSE still used xz module compression. At that time I investigated different options to speed up both kernel updates and KMP installations, and incremental depmod turned out to be an important factor, even if replacing xz by zstd was more effective. If you are interested, have a look at my presentation from 2021, where I investigated several ways to speed up the process of installing or updating packages containing kernel modules2.

Footnotes

  1. Note that while the first "running depmod over entire tree" may need to access the storage, starting from "running full depmod" all modules should be in the page cache. I have experimented with various warm-up methods but seen little difference. In the cases that matter most, depmod will be run during package installation, when the files have just been copied to storage and are thus usually in the page cache.

  2. Sorry, video only, there are no public slides at this time; but you'll have to watch only ~5 minutes.

@evelikov
Copy link
Collaborator

Few notes in random order:

  • absolutely loving the amount of tests added
  • please split off the cleanups/fixes in a separate PR - one or multiple, don't have a strong preference
  • Arch does this trick where it ensures depmod is called only once per pacman transaction - does openSuse/Fedora do something like that?
  • perf improvements are awesome, but IMHO we should change the format in backward incompatible manner as a last resort
  • the extra complexity - both code and end user is overwhelming - we have -i flag -x flag and subtle functional differences when -a is provided and maybe more that I have missed

Since v33 we have landed a bunch of perf changes. Can you share some absolute numbers of v33, v33 + this patch and master.

At a glance, we can improve zstd side by pre-allocating/reusing the decompression state. Doing it once, instead of allocating/de-allocating it 1000s of times should shave a few us/ms. Changing to dlopen only the required deps will also save a couple of us.

@mwilck
Copy link
Contributor Author

mwilck commented Nov 21, 2024

* please split off the cleanups/fixes in a separate PR - one or multiple, don't have a strong preference

Alright. I need to include the bug fixes in this PR though, so it will be on top of the small PR. I hope that's acceptable.

* Arch does this trick where it ensures depmod is called only once per `pacman` transaction - does openSuse/Fedora do something like that?

It depends on the transaction and on settings. For (open)SUSE kernel updates, it is now possible to set the environment variable INITRD_IN_POSTTRANS=1, which will avoid runnning depmod multiple times if multiple kernel packages are installed or uninstalled in the same transaction. For installation of kernel module packages, running depmod multiple times is unavoidable. This is the case where incremental mode provides most benefit, because we usually just add, remove, or update a handful of modules, often only one.

* perf improvements are awesome, but IMHO we should change the format _in backward incompatible manner_ as a last resort

I am open to suggestions. I pondered this a lot, and the solution I went for seemed like the best to me. Note that it is "99%" backward-compatible. As mentioned above, the only problem I saw was modprobe symbol:xyz, where a non-fatal error message is observed, and which (AFAICT) is a rather rare form of modprobe usage. I could have overlooked something, of course.

For this feature to work reliably, storing the symbol versions somewhere is a must, and the current format doesn't support that, so some sort of change is necessary. I could go for a separate index file modules.symbol-versions.bin of course, if you prefer that.

I could also refrain from storing symbol versions in modules.symbols.bin for kernels that don't set CONFIG_MODVERSIONS.

Again, let me know you preferences.

* the extra complexity - both code and end user is overwhelming - we have `-i` flag `-x` flag and subtle functional differences when `-a` is provided and maybe more that I have missed

I'm open to suggestions here, too. I didn't want to change the semantics of the existing depmod invocations. The biggest semantic clash that I see is between -e and -x. For reasons you'll understand better than me, -e prints warnings but doesn't do anything to the exit status of depmod. That means that tools like openSUSE's weak-modules2 have to parse depmod's output in order to see whether any problems were detected, which is clumsy and error-prone. Admittedly the -x option is independent of the incremental feature and could be handled in a separate PR.

Since v33 we have landed a bunch of perf changes. Can you share some absolute numbers of v33, v33 + this patch and master.

I can't easily provide numbers for v33 + this patch, because I've had a lot of rebasing to do since v33, and that rebasing was happened modifications in my own code as I added tests and found bugs.

Results for full depmod (depmod -e -E $symvers $kver) for the 5.3.18-57 kernel with xz compression above, same hardware as above1:

v29 (SUSE) v33 master master+patch set
35.6 35.1 35.2 35.2

In contrast, here are results for depmod -I -a for addition of different numbers of modules to the same kernel (after removing them, results obtained with test-incremental.sh)2:

# modules added 2 185 784 full
time[s] 0.2 2.0 7.5 35.2

As you can see, the difference is in a different order of magnitude. And that's no surprise, as the number of module files that must be decompressed and read by depmod is far lower.

At a glance, we can improve zstd side by pre-allocating/reusing the decompression state. Doing it once, instead of allocating/de-allocating it 1000s of times should shave a few us/ms. Changing to dlopen only the required deps will also save a couple of us.

That sounds promising, too, but I hope you are not expecting me to work on it :)

Footnotes

  1. Obviously these results are not proper benchmark results, but I'm observing surprisingly little deviations (< 0.1s) between subsequent runs of the same test.

  2. FTR, the set of modules to remove on test-incremental invocations for this table were qla2xxx (2 modules), scsi_mod (185 modules) and usbcore soundcore drm_display_helper scsi_mod (784 modules).

@mwilck mwilck force-pushed the incremental-depmod branch from 6bf3c3d to 60a6c26 Compare November 21, 2024 14:21
@mwilck
Copy link
Contributor Author

mwilck commented Nov 21, 2024

Rebased onto current master.

@lucasdemarchi
Copy link
Contributor

@mwilck thanks for the context, that helps a lot.... for the uncompressed or zstd-compressed it's hard to justify by these numbers, but for xz it indeed is a lot of improvements. I will try reviewing this soon. Just a heads up that we may decide to postpone this for after the next release as this is already the biggest release ever.

@mwilck
Copy link
Contributor Author

mwilck commented Nov 21, 2024

Rebased onto #257, as requested. I also added some more detail to the commit messages of the bug fixes.

@mwilck
Copy link
Contributor Author

mwilck commented Nov 21, 2024

Wrt the (in)compatible file format: We are already creating modules.symbols without using it ourselves any more. What I could do is leave modules.symbols.bin unchanged and create another file modules.sysvers.bin (or whatever) besides it. "New" depmod could then use this new file, which would be ignored by older versions. That would impact the performance somewhat, but not much.

Let me know if this would be preferred.

@mwilck
Copy link
Contributor Author

mwilck commented Nov 21, 2024

Just a heads up that we may decide to postpone this for after the next release as this is already the biggest release ever.

Yeah, I realize that the timing of this PR is pretty bad, as there has been a lot of action in this project lately. While I certainly wish that this PR get merged soon, I understand that it feels risky at this point in time.

@mwilck mwilck force-pushed the incremental-depmod branch from e211a58 to fdadd2d Compare November 21, 2024 23:18
@mwilck
Copy link
Contributor Author

mwilck commented Nov 21, 2024

Rebased onto updated #257.

@mwilck mwilck force-pushed the incremental-depmod branch from fdadd2d to 8ad94fa Compare November 27, 2024 13:31
@mwilck
Copy link
Contributor Author

mwilck commented Nov 27, 2024

rebased onto updated #257

... and use them to implement kmod_index_dump(). These helpers will
be useful in depmod.

Signed-off-by: Martin Wilck <[email protected]>
@mwilck mwilck force-pushed the incremental-depmod branch from 8ad94fa to a5bc60d Compare December 3, 2024 19:10
return err;
}

static int depmod_incremental(struct depmod *depmod, bool all, int argc,

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 102 lines.
Copy link

codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 58.99054% with 260 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
tools/depmod.c 59.56% 154 Missing and 70 partials ⚠️
libkmod/libkmod-index.c 21.87% 25 Missing ⚠️
testsuite/test-depmod.c 80.00% 8 Missing ⚠️
libkmod/libkmod.c 57.14% 2 Missing and 1 partial ⚠️
Files with missing lines Coverage Δ
libkmod/libkmod-module.c 54.03% <100.00%> (ø)
libkmod/libkmod.c 49.88% <57.14%> (ø)
testsuite/test-depmod.c 72.97% <80.00%> (ø)
libkmod/libkmod-index.c 53.08% <21.87%> (ø)
tools/depmod.c 60.61% <59.56%> (ø)

@mwilck
Copy link
Contributor Author

mwilck commented Dec 3, 2024

Rebased again.

@lucasdemarchi, is the code coverage critically low? I thought I'd added quite a bit of test cases.

mwilck and others added 4 commits December 3, 2024 23:28
Symbol mappings are unique: depmod_symbol_add() doesn't allow more than one
owner of a symbol by storing it in the hash.  Thus the regular symbol lookup
will only ever return a list with a single entry, the module containing the
symbol.

Rename kmod_lookup_alias_from_alias_bin to kmod_lookup_alias_from_alias_bin_n,
adding an extra parameter indicating the maximum number of values to return.
kmod_lookup_alias_from_alias_bin() calls this function setting the maximum to
UINT_MAX. When looking up symbols, in modules.symbols.bin, set the maximum to
1. This way the CRC will not be mistaken as a module name.

Signed-off-by: Martin Wilck <[email protected]>
Add a the symbol's CRC as second value to every index entry in
modules.symbols.bin, using a pseudo "priority" value of UINT_MAX.

This will cause modprobe versions that lack the previous patch to complain
about invalid modules if modules are looked up by symbol: alias, because
they will interpret the CRC as a module name:

modprobe: ERROR: could not find module by name='01fe7347'
modprobe: ERROR: could not insert '01fe7347': Unknown symbol in module, or unknown parameter (see dmesg)

The module owning the symbol and its dependencies will be loaded nonetheless,
though. In the unlikely case [*] that the CRC matches a module name, modprobe
will try to load that module, too. If the predecessor patch is applied, the
CRC will be ignored by modprobe.

[*] Afaics, there is currently no module name that matches ^[a-f0-9]{8}$.

Signed-off-by: Martin Wilck <[email protected]>
Add the option -I/--incremental. With this option, depmod will add the
modules given on the command line to the existing module data base.
Don't write files related to built-in modules in this mode.

This patch just adds the option and documentation, the implementation follows
in subsequent patches.

Signed-off-by: Martin Wilck <[email protected]>
For incremental depmod, we need to read in the complete state of the
current depmod data base first. We read modules and module dependencies
from "modules.dep.bin", symbols and their CRCs from "modules.symbols.bin",
using the previously added CRC value.

The aliases can't be read from "modules.alias.bin", because in the .bin file,
dashes are substituted by underscores in a way that can't be easily undone.
That doesn't matter for kmod/modprobe, but for depmod it does. Thus
we read the aliases from "modules.alias" instead.

Finally, read the softdeps, too.

Signed-off-by: Martin Wilck <[email protected]>
mwilck and others added 20 commits December 3, 2024 23:28
Add an additional field in "struct mod" to track the list of
users of a module (other modules depending on this module). This
allows quicker lookup later on.

Signed-off-by: Martin Wilck <[email protected]>
The functionality to determine whether a module replaces another
one is neede by the incremental code, too.

Signed-off-by: Martin Wilck <[email protected]>
The functionality to load a single module is needed by the
incremental code.

Signed-off-by: Martin Wilck <[email protected]>
If the same module (same path) occurs twice in the list of modules,
depmod_module_is_higher_priority() must return 1 (it returns 1 for
same prio). This can't happen during regular runs, but it is possible
in incremental mode (a previously added module can be listed on the
command line again).

Signed-off-by: Martin Wilck <[email protected]>
depmod_module_del() should also remove any references to the
module being deleted in other data structures.

Signed-off-by: Martin Wilck <[email protected]>
This patch implements the incremental addition of modules from
the command line, using the previously added functionality.

Signed-off-by: Martin Wilck <[email protected]>
Add a new mode in which depmod searches the directory tree for modules that
are not in the current dependency data base, and adds them with their
dependencies, without re-calculating the dependencies of modules that had
been in the data base previously. The existing -a/--all flag is abused for
selecting this mode with "depmod -I -a".

Signed-off-by: Martin Wilck <[email protected]>
As "depmod -I -a" is now able to search the tree for added modules, it would
be nice to be able to handle module deletion, too. It turns out that this is
actually quite easy in the existing framework. Instead of failing when module
can't be opened, try to handle the situation gracefully by re-checking the
modules that depend on the removed ones. Errors are only raised if a remaining
module depends on functionality that had been provided by a deleted one.

While this is most useful for automatically searching the module tree for
additions and removals with "depmod -I -a", it can also be used with "depmod
-I" with modules listed on the command line.

Signed-off-by: Martin Wilck <[email protected]>
Add the option -x/--errexit. If this option is given, depmod will
exit with error code 1 if either unresolved symbols or (if crc checking
is also enabled) symbol CRC mismatch is detected.

Signed-off-by: Martin Wilck <[email protected]>
Add a test using the currently existing modules mod-foo and
mod-foo-{a,b,c}. mod-foo contains on all 3 other modules.

Signed-off-by: Martin Wilck <[email protected]>
This patch adds two more tests. Both add one module to the previous
mod-foo test. The added module depends on mod-foo-b. Only the name of
the added module differs: it's mod-bar in case 1 and mod-bah in the second
case. Because of internals of depmod's sorting algorithm, mod-bar gets sorted
before mod-foo, but mod-bah behind it. While the ordering of lines in
modules.dep this isn't a big issue, it's quite surprising that this also
affects the order of dependecies printed for mod-foo (compare
"correct-modules.dep" for the two cases). The dependency list isn't "wrong" in
either case, as mod-foo-{a,b,c} don't depend on each other. But the example
shows that the ordering of mutually independent modules in modules.dep is hard
to predict, making it very difficult to implement automatic verification
(using simple "diff") for depmod's incremental mode.

Signed-off-by: Martin Wilck <[email protected]>
Add a python script that creates a non-trivial dependency tree of modules.
The script works by creating simple modules that can both export and call
functions with either "void f(void)" or "void f(int)" prototypes. The kernel
build will resolve these dependencies and a dependency tree will be generated.

By default, the script generates a set of modules that is is inspired by a
real world example from the block and SCSI layer of the Linux kernel.

Signed-off-by: Martin Wilck <[email protected]>
Add a test using the script from the previous patch, with expected results.
The generated modules need some basic symbols from vmlinux. Create code to
generate the symvers file. This makes it possible to check symbol versions in
the test suite and fail if they don't match, using the "-x" option.

Signed-off-by: Martin Wilck <[email protected]>
…ies()

TL;DR: This patch modifies the dependency sort algorithm such that the
list of dependencies of any module is deterministically ordered, independent
of the order in which modules were detected. This is important for verifying
that incremental depmod and "ordinary" depmod lead to the same results.

With the changes for dependency ordering, the expected test results
change. Note that (as intended) the results for test-dependencies{,-1,-2}
are now identical.

Details:

The main constraint enforced by the module sorting algorithm is that every
module is sorted before all modules it depends upon. The list of dependencies
for modules.dep is sorted using this ordering. This way, modprobe can simply
walk the list of dependencies backward without encounterung undefined symbols.
However, this is the only constraint the sorting algorithm enforces. In
particular, no ordering is implied between mutually independent modules.
Thus if module A depends on B and C, but B and C are independent, both
A, B, C and A, C, B are valid orderings.

The sort algorithm starts out with those modules on which no other module
depends ("roots"). The algorithm walks over these "roots". For every root
node N, it walks through the list of its dependencies M, and "forgets" the
dependency of N on M, decrementing the number of "users" of M by 1. If N was
the last "user" of M, M becomes a new "root" node, which is sorted after all
root nodes found so far. The position in which this node is now inserted
becomes its sort index (dep_sort_idx). This way it is ensured that the
constraint above is always met.

One peculiarity of the current algorithm is that the list of "root" nodes is
walked backwards, while new nodes are being appended to the end. This works,
but it's hard to follow and leads to some surprising results (for example,
modules with dependencies A->B, C->D will be ordered A, B, D, C).

The final ordering depends on the initial ordering of the modules, and the
order of the list of dependencies of every module. The initial module order
is determined by the module hash. This ordering is deterministic
(the hash function is deterministic, and in every hash bucket, modules are
sorted alphabetically), but it depends on the hash implementation and should
rather not be relied upon. Also, the determinism is hard to understand because
of the hash involved. A module's list of dependencies depends on the order
the modules are scanned. This is deterministic for regular depmod (determined
by the depth-first search in depmod_modules_search_dir()), but not for
incremental depmod, where the order of modules listed on the command line is
arbitrary.

This patch modifies the algorithm by not incrementing dep_sort_idx for every
node. Rather, dep_sort_idx is set to the node's "rank", which is defined as
follows: Modules with no users have rank 1. The rank of an arbitrary module M
is the maximum of the ranks of all modules depending on M, plus 1. By this
definition, modules of rank N can only depend on modules of rank N+1 or higher.
A prerequisite for implementing this change is that the list of "roots" is now
scanned forward rather than backward.

Modules are sorted by rank; this way the main ordering constraint (see above)
remains fulfilled. Modules of equal rank are sorted alphabetically by module
name, leading to a completely deterministic order for all modules.

Example of previous behavior ("->" indicates "depends on"):

  A->(B, C); B->D; C->D; E->B

  input: A, B, C, D, E
  sorted: A, E, C, B, D
  deps of A: C, B, D

  input: E, A, B, C, D
  sorted: E, A, B, C, D
  deps of A: B, C, D

Note that the list of dependencies of A depends on the input order.

With this patch, ranks A: 0, E: 0, B: 1, C: 1, D: 2
  sorted: A, E, B, C, D (independent of input order)
  deps of A: B, C, D

Signed-off-by: Martin Wilck <[email protected]>
This test uses the same module set as "big-01". But it simulates
an incremental addition. The test directory contains modules.dep
and other output files from a "previous" depmod run, where some
modules (scsi_transport_sas, qla2xxx and their dependencies)
had been missing. The test then runs "depmod -I -a" over this
directory with the "previously missing" modules now present.
The final results should be equal to those of the big-01 test.

Signed-off-by: Martin Wilck <[email protected]>
This test also uses the module set from "big-01". It simulates
deletion of some modules (again, scsi_transport_sas and qla2xxx
plus their dependencies) and subsequent running of "depmod -I -a".
The input directory contains the depmod results for the full module
set. The results should be equal to the inputs of the
big-01-incremental test.

Signed-off-by: Martin Wilck <[email protected]>
This alternative playground will be used to build module sets
with disagreeing symbol versions. This must be done in separate
directories, otherwise the kernel build will mix up the
dependencies because some symbols will be defined multiple times
with different symbol version.

Signed-off-by: Martin Wilck <[email protected]>
This test takes again the module set from "big-01" and replaces
two modules such that a symbol version changes. "depmod -I -a"
is expected to pick up this change.

Signed-off-by: Martin Wilck <[email protected]>
This test is like the "replace" test, but it uses depmods's
built-in priority logic by copying the replacement modules
into the "updates" subdirectory. "depmod -I -a" is expected
to detect this change.

Signed-off-by: Martin Wilck <[email protected]>
Add a test script to test incremental depmod on a real kernel, as an addition
to the normal kmod CI. This script is meant to be run against the installed
kernel on a test system.

This test script compares the results (i.e. the modules.* files generated by
depmod) of incremental depmod with a full depmod run. The expectation is that
the results are always identical.

The script takes a list of module basenames as arguments, the "test modules",
like this:

   test-incremental.sh scsi_mod drm_display_helper

1. The script looks for a kernel module tree in the system it's running on
and copies this tree to the working directory, and runs a full depmod there.
2. It copies the entire tree again, deletes the test modules and all modules
depending on them, and runs full depmod once more.
3. It copies the tree created in 2., re-adds the previously deleted modules,
and runs "depmod --incremental" with the added modules as arguments, and
compares the results with those from step 1.
4. It copies the tree created in 2., re-adds the previously deleted modules,
runs "depmod --incremental --all", and compares the results with those from
step 1.
5. It copies the tree created in 2, runs "depmod --incremental --all", and
compares the results with those from step 2.

If any differences are found in step 3., 4., or 5., the script will report
them.

By default, all operations are done below a temporary directory under /tmp,
so no changes are made to the system. The autodetected parameters should work
on most systems. Otherwise, the script has various options to override them.

Copying this script into meson's build directory requires meson's
"fs.copyfile" feature, which was introduced in meson 0.64. The meson build
must therefore be disabled on Ubuntu 22.04, which still ships meson 0.61.

Signed-off-by: Martin Wilck <[email protected]>
@mwilck mwilck force-pushed the incremental-depmod branch from a5bc60d to 79d2e48 Compare December 3, 2024 22:46
@mwilck
Copy link
Contributor Author

mwilck commented Dec 3, 2024

I've added commit 1772e9f ("libkmod: only use first value when looking up symbols"). This used to be part of the set, but somehow during repeated rebasing and refactoring, it got lost before I created the PR.

Without that commit, modprobe complained when loading modules by symbol: alias, as described above in the PR description for modprobe without this patch set:

# modprobe symbol:iscsi_boot_create_kset
modprobe: ERROR: could not find module by name='e3a99f3c'
modprobe: ERROR: could not insert 'e3a99f3c': Unknown symbol in module, or unknown parameter (see dmesg)
# echo $?
1
# grep iscsi /proc/modules 
iscsi_boot_sysfs 20480 0 - Live 0xffffffffc2b3b000

With the added commit, that doesn't happen any more.

# ./modprobe symbol:iscsi_boot_create_kset
# echo $?
0

@mwilck
Copy link
Contributor Author

mwilck commented Dec 3, 2024

Let me know if you want me to add more comments to satisfy the github-advanced-security check.

@evelikov
Copy link
Collaborator

evelikov commented Dec 5, 2024

Massive thanks for the numbers. Seems like the perf polish we did in master is within the noise margin, when xz is used. Hey at least we didn't make things slower 😅

is the code coverage critically low? I thought I'd added quite a bit of test cases.

I wouldn't be too worried about the code coverage reports atm. We have some pre-existing gaps which may be miss-counted and the PR adds a very good amount of test IMHO.

Let me know if you want me to add more comments to satisfy the github-advanced-security check.

If it were me, I would split in 2-3 sub-functions which will also improve legibility. Although it's nothing to be too concerned about.

That sounds promising, too, but I hope you are not expecting me to work on it :)

Lucas opened a PR for the dlopen side. Once it lands I will open another to speed-up zstd a wee-bit.

Wrt this feature, I'm a bit split since:

  • most distros I've seen use uncompressed or zstd modules
  • perf is nice, even for the zstd-side
  • the extra toggles/workflow seems fragile and no clear suggestion comes to mind

@mwilck
Copy link
Contributor Author

mwilck commented Dec 5, 2024

What do you mean with "fragile"? I'm aware that it isn't immediately obvious that incremental mode doesn't break the depmod db. That's the reason why I spent much effort on writing test code. The "incremental" code path is mostly separate from the regular code path, which minimizes the risk that people or distributions who don't use incremental mode will suffer from regressions.

Wrt performance, I think I've shown that the improvement for the intended use case is noticeable, even with zstd. We're not talking about just a few per cent. As mentioned above, depmod is run multiple times for a typical kernel-module-package installation scenario, and even if a single run is only half a second, the difference with an without incremental mode is well noticeable for the end user. And, my timings have been taken on fairly capable hardware; on smaller systems the differences will be bigger.

@evelikov
Copy link
Collaborator

evelikov commented Dec 5, 2024

Please note, that I haven't spent great deal of time thinking things over or reading the code. So it's quite possible I'm talking complete BS - apologies in advance, if so 🙇

The "fragile" comment is based on a quick read through the man page changes. More specifically, in random order:

  • what would happen if we get a module name which matches $checksum at a random point in the future
  • how do we handle incremental updates, if the module "x" exists in multiple search locations
  • are things going to be handled correctly when package X ships a module and a depmod.d snippet changing the search ordering, exclude(dir), adding more overrides, etc
  • how would things behave if high(er) priority package/module is removed, yet we've forgotten to update the incremental DB
  • are we confident that distros will (correctly) call incremental depmod on package/module removal

While "no clear suggestion" is a self-directed, since I cannot see a constructive suggestions on how to address ^^ and/or reduce the two toggles.

Most importantly, Lucas is the project maintainer so please take his input with much greater weight.

@mwilck
Copy link
Contributor Author

mwilck commented Dec 5, 2024

what would happen if we get a module name which matches $checksum at a random point in the future

AFAICS, there's no conflict. symbols and modules are different namespaces. For symbol: lookups, the change in the symbol lookup procedure makes sure that a symbol name that happens to match a checksum can't be confused with the checksum.

how do we handle incremental updates, if the module "x" exists in multiple search locations

If there are several modules with the same module name, the depmod db only contains entries pointing the module with the highest-priority path. The existing code contains the logic that determines the path priority. The incremental code uses this existing logic to determine whether the module that is added or removed is the highest-priority one. Only in that case will changes be applied. According to my testing, this logic works correctly, but of course I may be overlooking something.

are things going to be handled correctly when package X ships a module and a depmod.d snippet changing the search ordering, exclude(dir), adding more overrides, etc

If the configuration is changed, the entire database needs to be rebuilt. That's independent of incremental mode.

how would things behave if high(er) priority package/module is removed, yet we've forgotten to update the incremental DB

It wouldn't work, because the data base would still contain entries referencing the removed module. But you might as well ask if it would work if we forgot to run depmod at all – it wouldn't. Again, this is independent on whether or not incremental mode is used.

are we confident that distros will (correctly) call incremental depmod on package/module removal

It depends. distro packagers make mistakes. I would hope that when distro packagers start adopting this feature, they'll think about it and make sure they use it (only) where appropriate.

I think that you could actually always use incremental depmod, except in the "configuration change" case. When you install a kernel for the first time, the data base will be empty, and incremental comes down to regular (it isn't recommended, as it will probably be slower than regular in this pathological scenario, but it should work).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants