Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libkmod: Refactor builtin modinfo parser #136

Closed
wants to merge 2 commits into from

Conversation

stoeckmann
Copy link
Contributor

@stoeckmann stoeckmann commented Sep 17, 2024

I have refactored the builtin modinfo parser to utilize stream functions, i.e. getdelim. This should make it much easier to see what's going on and heavily reduces the amount of system calls. In my tests, it's thrice as fast (you can check this best with one of the last entries in your builtin.modinfo file). Also you might want to use strace to see how many system calls are currently performed.

This removes the arbitrary INTPTR_MAX limit because the size of the file is of no interest, because the parser does not jump back and forth anymore.

Also, the explicit memory management is greatly reduced by using getdelim and kmod's strbuf.

Since the file is parsed from beginning to end, there is no TOCTOU left either. Right now, the file is parsed to retrieve the amount of strings we are interested in, then it's parsed again to retrieve the strings. If the file changes in between, this can lead to memory corruption. I would argue that this does not happen for system files, but with this -b option, we should be a bit more careful.

What this parser does:

  1. Read chunks from builtin.modinfo file, which are NUL separated
  2. If a chunk starts with looked up module name, start adding it to a string buffer
  3. If a chunk does not start with looked up module name anymore, stop parsing
  4. Turn string buffer into a memory chunk which contains a vector with char pointers at the start

Last but not least: I've added a test. Please let me know if there is some kind of "standard module set" you use, because right now I've taken a small sample of my live system.

libkmod/libkmod-builtin.c Fixed Show fixed Hide fixed
@stoeckmann stoeckmann force-pushed the builtin branch 3 times, most recently from d6112c0 to 885821b Compare September 17, 2024 19:07
libkmod/libkmod-builtin.c Dismissed Show dismissed Hide dismissed
Copy link
Collaborator

@evelikov evelikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improves overall performance as well.

Please provide some (any really) non-scientific numbers for posterity sake. Fwiw I'm already loving this from loc POV,

Implementation wise it looks spot-on. Left a few requests for inline comments and some pedantic nits 😅

}

snprintf(path, PATH_MAX, "%s/%s", dirname, MODULES_BUILTIN_MODINFO);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could do snprintf(path, PATH_MAX, MODULES_BUILTIN_MODINFO "/%s", dirname); since the compiler isn't smart enough to do it :-\

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the other way around, i.e. "%s/" MODULES_BUILTIN_MODINFO, but I can do it. Feels like we should refactor this "resolve a path" thing next, after all the snprintf handling is the other pending PR of mine.

return count;
}
count++;
} while (n != -1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always be true afaict. Personally, I'd make this a for loop for (count = 0;....) with the INTPTR_MAX handling just outside of the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike this do-while as well because it's really just a for(;;)-loop without triggering static analyzers, I guess. I didn't think about putting count into the for-loop because it implies that count is somehow relevant for further iterations. But then again, it still looks nicer than this one and for the unlikely overflow-event, it's actually relevant. Applied.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for(;;)-loop without triggering static analyzers

Remember seeing that in the past, didn't know it was still a thing. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not meant that I tried to obscure an endless loop, but trying to say through code that we loop here until no more lines are left (or calling break for optimization). So I've tried to bend my mind around "how to state that we are iterating through lines here?" but the for-loop with count is still nice, since it completely avoids this obscure endless loop and implies that we would eventually find an end if there are WAY too many strings around.

modlen = len;
} else if (modlen != len || strncmp(modname, line, len)) {
if (strncmp(line, modname, modlen) ||
line[modlen] != '.') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can keep this + ERR() on a single line.

The clang-format series will reformat this though, once it lands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

pos = offset;
if (n == -1 && !feof(info->fp)) {
count = -errno;
ERR(info->ctx, "get_string: %s\n", strerror(errno));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feof() does not set errno - is the error here correct? Is it effectively the "ENOMEM" from getdelimit()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check that feof is false, i.e. we did not reach the end. So yes, we effectively print the error of getdelim, which sets errno.

if (strncmp(line, modname, modlen) ||
line[modlen] != '.') {
if (count == 0)
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an inline comment why we continue here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. If we did not get a single string yet, it implies that no string so far matched our modname. This means that we are still looking for the correct block. If we have read at least one string and the modname mismatches again, it means that we are done.

free(iter->buf);
free(iter);
fclose(info->fp);
free(info->buf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: flip the order and add a note that info->buf is allocated in get_string()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flipped, added a note to comment in struct declaration.

ERR(iter->ctx, "kmod_builtin_iter_get_modname: unexpected string without modname prefix\n");
goto fail;
if (SIZE_MAX / sizeof(char *) - 1 < count ||
SIZE_MAX - buf->used < sizeof(char *) * (count + 1)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's throw in a const fooo = sizeof(char *) * (count + 1) and a small comment above this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant a comment just above the if check if vecsz has overflown or it won't cause an overflow in realloc below.

}

len = dot - line;
vector = realloc(buf->bytes, buf->used + sizeof(char *) * (count + 1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been somewhat tempted to change the strbuf helpers to:

  • allocate +1 for the null byte + always set it
  • steal -> disown, which returns a pointer and effectively invalidates the strbuf

Those two will a) remove the explicit NULL handling further up and b) simplify this function.

What do you think about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably won't hurt but this file will have a "special" strbuf handling anyway, because it puts NUL bytes into it on purpose. If at all, I would expect the most negative feedback on this PR for doing exactly that. ;)

fail:
kmod_builtin_iter_free(iter);
kmod_builtin_info_release(&info);
strbuf_release(&buf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's teardown in reverse order of things being setup. strbuf first, then modinfo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one applied as well. :)

@stoeckmann stoeckmann force-pushed the builtin branch 3 times, most recently from 4c0aed5 to a90ea5a Compare September 18, 2024 21:39
Copy link
Collaborator

@evelikov evelikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would replace the Improves overall performance as well. line in the commit message with:

This converts the 1000+ pread64 syscalls, to a dozen read(s). Effectively
reducing the syscalls we make by ~90%

        $ strace before/modinfo drm 2>&1  >/dev/null  | wc -l
        1480
        $ strace -e read before/modinfo drm 2>&1  >/dev/null  | wc -l
        23
        $ strace -e pread64 before/modinfo drm 2>&1  >/dev/null  | wc -l
        1375

        $ strace after/modinfo drm 2>&1  >/dev/null  | wc -l
        119
        $ strace -e read after/modinfo drm 2>&1  >/dev/null  | wc -l
        34
        $ strace -e pread64 after/modinfo drm 2>&1  >/dev/null  | wc -l
        3

Awesome work 👍

@stoeckmann
Copy link
Contributor Author

I have added a comment and adjusted the commit message. I have mentioned the system call reduction but also pointed out that we most likely do allocate memory more often, even though basically the same amount. This means that I won't argue that every program which uses less system calls is automatically faster, but that the reduction of system calls definitely outweights the added memory handling.

Also, this is not the main motivation, but a very nice additional benefit.

@evelikov
Copy link
Collaborator

In case you're wondering why strace numbers. I've seen it used a handful of times in git log, so when in Rome ;-)

As you mentioned memory curiosity got the best of me, so here are some numbers for that.

command: tools/modinfo drm
Valgrind total heap usage:
489 allocs, 489 frees, 68,391 bytes allocated - BEFORE
288 allocs, 288 frees, 71,953 bytes allocated - AFTER

Valgrind/massif peak memory used
15.98K - BEFORE
18.19K - AFTER

command: time { for i in $(seq 1 1000); do modinfo drm &>/dev/null; done }
before:

real    0m3.107s
user    0m2.259s
sys     0m0.875s

after:

real    0m1.262s
user    0m1.006s
sys     0m0.292s

So overall, ignoring the syscall side, instead of repeatedly allocating 256-512 byte chunks it allocates one 4K and keeping them for longer. Resulting in 2-3 times shorter runtime.

But as you said - performance side is a nice benefit. Security, correctness and code clarity are the main goals.

Copy link
Contributor

@lucasdemarchi lucasdemarchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. Overall LGTM. Nice improvement.

}

static bool kmod_builtin_iter_get_modname(struct kmod_builtin_iter *iter,
char modname[static PATH_MAX])
static char **strbuf_to_vector(struct strbuf *buf, size_t count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reminds me of a similar functions systemd has with the name "strv". I think we may eventually migrate this function to strbuf.h, but I'm fine with keep it where it is for now.

}

len = dot - line;
vector = realloc(buf->bytes, vecsz + buf->used);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of forcing a realloc, should we rather grow strbuf? It may be that we still have room and don't actually need the extra alloc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used the same realloc which would occur in strbuf_steal, i.e. even if we have enough memory, don't waste additional unneeded one.

if (count == 0)
*modinfo = NULL;
else if (count > 0) {
*modinfo = strbuf_to_vector(&buf, (size_t)count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider strbuf_to_vector() as a function stealing/disowning the buffer. As such it's odd to still call strbuf_release() below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything works as expected, then yes: strbuf_to_vector would steal/disown the buffer. But if realloc fails, then the strbuf is considered "left alone for caller to clean up who passed reference to us."

Going the Rust way, we could argue that we pass ownership to strbuf_to_vector. But on the other hand, from a C perspective, we construct the strbuf in kmod_builtin_get_modinfo so it should be its responsibility to clean up any resources associated with it when leaving.

So I went with the second (C) approach. Let me know what you think, I don't have a hard opinion either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the cleanup into error handling. I think it actually looks nicer this way, like "I expect this function to convert the buffer and only if it fails, I release the whole thing on my own."

The modinfo command can show information about builtin modules. Make
sure that it functions correctly.

Signed-off-by: Tobias Stoeckmann <[email protected]>
Remove arbitrary limits due to file sizes (INTPR_MAX check). Reduce
amount of system calls by up to 90 % utilizing stream functions.

Also make sure that no TOCTOU could ever happen by not iterating
through the file twice: First to figure out amount of strings, then
parsing them. If the file changes in between, this can lead to
memory corruption.

Even though more memory allocations might occur due to strbuf usage,
performance generally increased by heavy reduction of system calls.

Signed-off-by: Tobias Stoeckmann <[email protected]>
lucasdemarchi pushed a commit that referenced this pull request Sep 23, 2024
The modinfo command can show information about builtin modules. Make
sure that it functions correctly.

Signed-off-by: Tobias Stoeckmann <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Link: #136
Signed-off-by: Lucas De Marchi <[email protected]>
lucasdemarchi pushed a commit that referenced this pull request Sep 23, 2024
Remove arbitrary limits due to file sizes (INTPR_MAX check). Reduce
amount of system calls by up to 90 % utilizing stream functions.

Also make sure that no TOCTOU could ever happen by not iterating
through the file twice: First to figure out amount of strings, then
parsing them. If the file changes in between, this can lead to
memory corruption.

Even though more memory allocations might occur due to strbuf usage,
performance generally increased by heavy reduction of system calls.

Signed-off-by: Tobias Stoeckmann <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Link: #136
Signed-off-by: Lucas De Marchi <[email protected]>
@lucasdemarchi
Copy link
Contributor

Applied, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants