libkmod: Refactor builtin modinfo parser #136

stoeckmann · 2024-09-17T18:09:34Z

I have refactored the builtin modinfo parser to utilize stream functions, i.e. getdelim. This should make it much easier to see what's going on and heavily reduces the amount of system calls. In my tests, it's thrice as fast (you can check this best with one of the last entries in your builtin.modinfo file). Also you might want to use strace to see how many system calls are currently performed.

This removes the arbitrary INTPTR_MAX limit because the size of the file is of no interest, because the parser does not jump back and forth anymore.

Also, the explicit memory management is greatly reduced by using getdelim and kmod's strbuf.

Since the file is parsed from beginning to end, there is no TOCTOU left either. Right now, the file is parsed to retrieve the amount of strings we are interested in, then it's parsed again to retrieve the strings. If the file changes in between, this can lead to memory corruption. I would argue that this does not happen for system files, but with this -b option, we should be a bit more careful.

What this parser does:

Read chunks from builtin.modinfo file, which are NUL separated
If a chunk starts with looked up module name, start adding it to a string buffer
If a chunk does not start with looked up module name anymore, stop parsing
Turn string buffer into a memory chunk which contains a vector with char pointers at the start

Last but not least: I've added a test. Please let me know if there is some kind of "standard module set" you use, because right now I've taken a small sample of my live system.

libkmod/libkmod-builtin.c

evelikov

Improves overall performance as well.

Please provide some (any really) non-scientific numbers for posterity sake. Fwiw I'm already loving this from loc POV,

Implementation wise it looks spot-on. Left a few requests for inline comments and some pedantic nits 😅

evelikov · 2024-09-18T17:02:48Z

libkmod/libkmod-builtin.c

 	}
-
 	snprintf(path, PATH_MAX, "%s/%s", dirname, MODULES_BUILTIN_MODINFO);


Nit: we could do snprintf(path, PATH_MAX, MODULES_BUILTIN_MODINFO "/%s", dirname); since the compiler isn't smart enough to do it :-\

It's the other way around, i.e. "%s/" MODULES_BUILTIN_MODINFO, but I can do it. Feels like we should refactor this "resolve a path" thing next, after all the snprintf handling is the other pending PR of mine.

evelikov · 2024-09-18T17:05:10Z

libkmod/libkmod-builtin.c

+			return count;
+		}
+		count++;
+	} while (n != -1);


This will always be true afaict. Personally, I'd make this a for loop for (count = 0;....) with the INTPTR_MAX handling just outside of the loop.

I dislike this do-while as well because it's really just a for(;;)-loop without triggering static analyzers, I guess. I didn't think about putting count into the for-loop because it implies that count is somehow relevant for further iterations. But then again, it still looks nicer than this one and for the unlikely overflow-event, it's actually relevant. Applied.

for(;;)-loop without triggering static analyzers

Remember seeing that in the past, didn't know it was still a thing. Thanks

It was not meant that I tried to obscure an endless loop, but trying to say through code that we loop here until no more lines are left (or calling break for optimization). So I've tried to bend my mind around "how to state that we are iterating through lines here?" but the for-loop with count is still nice, since it completely avoids this obscure endless loop and implies that we would eventually find an end if there are WAY too many strings around.

evelikov · 2024-09-18T17:08:09Z

libkmod/libkmod-builtin.c

-			modlen = len;
-		} else if (modlen != len || strncmp(modname, line, len)) {
+		if (strncmp(line, modname, modlen) ||
+		    line[modlen] != '.') {


Nit: you can keep this + ERR() on a single line.

The clang-format series will reformat this though, once it lands.

Nice, thanks!

evelikov · 2024-09-18T17:10:27Z

libkmod/libkmod-builtin.c

-		pos = offset;
+	if (n == -1 && !feof(info->fp)) {
+		count = -errno;
+		ERR(info->ctx, "get_string: %s\n", strerror(errno));


feof() does not set errno - is the error here correct? Is it effectively the "ENOMEM" from getdelimit()?

We check that feof is false, i.e. we did not reach the end. So yes, we effectively print the error of getdelim, which sets errno.

evelikov · 2024-09-18T17:12:04Z

libkmod/libkmod-builtin.c

+		if (strncmp(line, modname, modlen) ||
+		    line[modlen] != '.') {
+			if (count == 0)
+				continue;


Can you add an inline comment why we continue here?

Will do. If we did not get a single string yet, it implies that no string so far matched our modname. This means that we are still looking for the correct block. If we have read at least one string and the modname mismatches again, it means that we are done.

evelikov · 2024-09-18T17:21:21Z

libkmod/libkmod-builtin.c

-	free(iter->buf);
-	free(iter);
+	fclose(info->fp);
+	free(info->buf);


Nit: flip the order and add a note that info->buf is allocated in get_string()?

Flipped, added a note to comment in struct declaration.

evelikov · 2024-09-18T17:29:12Z

libkmod/libkmod-builtin.c

-		ERR(iter->ctx, "kmod_builtin_iter_get_modname: unexpected string without modname prefix\n");
-		goto fail;
+	if (SIZE_MAX / sizeof(char *) - 1 < count ||
+	    SIZE_MAX - buf->used < sizeof(char *) * (count + 1)) {


Let's throw in a const fooo = sizeof(char *) * (count + 1) and a small comment above this.

Meant a comment just above the if check if vecsz has overflown or it won't cause an overflow in realloc below.

evelikov · 2024-09-18T17:34:53Z

libkmod/libkmod-builtin.c

 	}

-	len = dot - line;
+	vector = realloc(buf->bytes, buf->used + sizeof(char *) * (count + 1));


I have been somewhat tempted to change the strbuf helpers to:

allocate +1 for the null byte + always set it

steal -> disown, which returns a pointer and effectively invalidates the strbuf

Those two will a) remove the explicit NULL handling further up and b) simplify this function.

What do you think about it?

It probably won't hurt but this file will have a "special" strbuf handling anyway, because it puts NUL bytes into it on purpose. If at all, I would expect the most negative feedback on this PR for doing exactly that. ;)

evelikov · 2024-09-18T17:36:23Z

libkmod/libkmod-builtin.c

-fail:
-	kmod_builtin_iter_free(iter);
+	kmod_builtin_info_release(&info);
+	strbuf_release(&buf);


Nit: let's teardown in reverse order of things being setup. strbuf first, then modinfo.

And this one applied as well. :)

evelikov

I would replace the Improves overall performance as well. line in the commit message with:

This converts the 1000+ pread64 syscalls, to a dozen read(s). Effectively
reducing the syscalls we make by ~90%

        $ strace before/modinfo drm 2>&1  >/dev/null  | wc -l
        1480
        $ strace -e read before/modinfo drm 2>&1  >/dev/null  | wc -l
        23
        $ strace -e pread64 before/modinfo drm 2>&1  >/dev/null  | wc -l
        1375

        $ strace after/modinfo drm 2>&1  >/dev/null  | wc -l
        119
        $ strace -e read after/modinfo drm 2>&1  >/dev/null  | wc -l
        34
        $ strace -e pread64 after/modinfo drm 2>&1  >/dev/null  | wc -l
        3

Awesome work 👍

stoeckmann · 2024-09-19T17:32:16Z

I have added a comment and adjusted the commit message. I have mentioned the system call reduction but also pointed out that we most likely do allocate memory more often, even though basically the same amount. This means that I won't argue that every program which uses less system calls is automatically faster, but that the reduction of system calls definitely outweights the added memory handling.

Also, this is not the main motivation, but a very nice additional benefit.

evelikov · 2024-09-20T12:20:59Z

In case you're wondering why strace numbers. I've seen it used a handful of times in git log, so when in Rome ;-)

As you mentioned memory curiosity got the best of me, so here are some numbers for that.

command: tools/modinfo drm
Valgrind total heap usage:
489 allocs, 489 frees, 68,391 bytes allocated - BEFORE
288 allocs, 288 frees, 71,953 bytes allocated - AFTER

Valgrind/massif peak memory used
15.98K - BEFORE
18.19K - AFTER

command: time { for i in $(seq 1 1000); do modinfo drm &>/dev/null; done }
before:

real    0m3.107s
user    0m2.259s
sys     0m0.875s

after:

real    0m1.262s
user    0m1.006s
sys     0m0.292s

So overall, ignoring the syscall side, instead of repeatedly allocating 256-512 byte chunks it allocates one 4K and keeping them for longer. Resulting in 2-3 times shorter runtime.

But as you said - performance side is a nice benefit. Security, correctness and code clarity are the main goals.

lucasdemarchi

I left some comments. Overall LGTM. Nice improvement.

lucasdemarchi · 2024-09-21T16:24:46Z

libkmod/libkmod-builtin.c

 }

-static bool kmod_builtin_iter_get_modname(struct kmod_builtin_iter *iter,
-				char modname[static PATH_MAX])
+static char **strbuf_to_vector(struct strbuf *buf, size_t count)


this reminds me of a similar functions systemd has with the name "strv". I think we may eventually migrate this function to strbuf.h, but I'm fine with keep it where it is for now.

lucasdemarchi · 2024-09-21T16:26:08Z

libkmod/libkmod-builtin.c

-	}
-
-	len = dot - line;
+	vector = realloc(buf->bytes, vecsz + buf->used);


instead of forcing a realloc, should we rather grow strbuf? It may be that we still have room and don't actually need the extra alloc.

I've used the same realloc which would occur in strbuf_steal, i.e. even if we have enough memory, don't waste additional unneeded one.

lucasdemarchi · 2024-09-21T16:30:54Z

libkmod/libkmod-builtin.c

+	if (count == 0)
+		*modinfo = NULL;
+	else if (count > 0) {
+		*modinfo = strbuf_to_vector(&buf, (size_t)count);


I'd consider strbuf_to_vector() as a function stealing/disowning the buffer. As such it's odd to still call strbuf_release() below.

If everything works as expected, then yes: strbuf_to_vector would steal/disown the buffer. But if realloc fails, then the strbuf is considered "left alone for caller to clean up who passed reference to us."

Going the Rust way, we could argue that we pass ownership to strbuf_to_vector. But on the other hand, from a C perspective, we construct the strbuf in kmod_builtin_get_modinfo so it should be its responsibility to clean up any resources associated with it when leaving.

So I went with the second (C) approach. Let me know what you think, I don't have a hard opinion either way.

I've moved the cleanup into error handling. I think it actually looks nicer this way, like "I expect this function to convert the buffer and only if it fails, I release the whole thing on my own."

The modinfo command can show information about builtin modules. Make sure that it functions correctly. Signed-off-by: Tobias Stoeckmann <[email protected]>

Remove arbitrary limits due to file sizes (INTPR_MAX check). Reduce amount of system calls by up to 90 % utilizing stream functions. Also make sure that no TOCTOU could ever happen by not iterating through the file twice: First to figure out amount of strings, then parsing them. If the file changes in between, this can lead to memory corruption. Even though more memory allocations might occur due to strbuf usage, performance generally increased by heavy reduction of system calls. Signed-off-by: Tobias Stoeckmann <[email protected]>

The modinfo command can show information about builtin modules. Make sure that it functions correctly. Signed-off-by: Tobias Stoeckmann <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Link: #136 Signed-off-by: Lucas De Marchi <[email protected]>

Remove arbitrary limits due to file sizes (INTPR_MAX check). Reduce amount of system calls by up to 90 % utilizing stream functions. Also make sure that no TOCTOU could ever happen by not iterating through the file twice: First to figure out amount of strings, then parsing them. If the file changes in between, this can lead to memory corruption. Even though more memory allocations might occur due to strbuf usage, performance generally increased by heavy reduction of system calls. Signed-off-by: Tobias Stoeckmann <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Link: #136 Signed-off-by: Lucas De Marchi <[email protected]>

lucasdemarchi · 2024-09-23T13:30:42Z

Applied, thanks.

github-advanced-security bot found potential problems Sep 17, 2024

View reviewed changes

libkmod/libkmod-builtin.c Fixed Show fixed Hide fixed

stoeckmann force-pushed the builtin branch 3 times, most recently from d6112c0 to 885821b Compare September 17, 2024 19:07

github-advanced-security bot found potential problems Sep 17, 2024

View reviewed changes

libkmod/libkmod-builtin.c Dismissed Show dismissed Hide dismissed

evelikov requested changes Sep 18, 2024

View reviewed changes

stoeckmann force-pushed the builtin branch 3 times, most recently from 4c0aed5 to a90ea5a Compare September 18, 2024 21:39

evelikov approved these changes Sep 19, 2024

View reviewed changes

stoeckmann force-pushed the builtin branch from a90ea5a to 8ab9144 Compare September 19, 2024 17:30

lucasdemarchi reviewed Sep 21, 2024

View reviewed changes

stoeckmann force-pushed the builtin branch from 8ab9144 to 2863400 Compare September 22, 2024 19:11

testsuite: add builtin test for modinfo

dfdfd21

The modinfo command can show information about builtin modules. Make sure that it functions correctly. Signed-off-by: Tobias Stoeckmann <[email protected]>

stoeckmann force-pushed the builtin branch from 2863400 to 156a227 Compare September 22, 2024 19:12

stoeckmann force-pushed the builtin branch from 156a227 to 0cf7650 Compare September 22, 2024 19:13

lucasdemarchi closed this Sep 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

libkmod: Refactor builtin modinfo parser #136

libkmod: Refactor builtin modinfo parser #136

stoeckmann commented Sep 17, 2024 •

edited

Loading

evelikov left a comment

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 19, 2024

stoeckmann Sep 19, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 19, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov Sep 18, 2024

stoeckmann Sep 18, 2024

evelikov left a comment

stoeckmann commented Sep 19, 2024

evelikov commented Sep 20, 2024

lucasdemarchi left a comment

lucasdemarchi Sep 21, 2024

lucasdemarchi Sep 21, 2024

stoeckmann Sep 22, 2024

lucasdemarchi Sep 21, 2024

stoeckmann Sep 22, 2024

stoeckmann Sep 22, 2024

lucasdemarchi commented Sep 23, 2024

		}

		snprintf(path, PATH_MAX, "%s/%s", dirname, MODULES_BUILTIN_MODINFO);

libkmod: Refactor builtin modinfo parser #136

libkmod: Refactor builtin modinfo parser #136

Conversation

stoeckmann commented Sep 17, 2024 • edited Loading

evelikov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evelikov left a comment

Choose a reason for hiding this comment

stoeckmann commented Sep 19, 2024

evelikov commented Sep 20, 2024

lucasdemarchi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lucasdemarchi commented Sep 23, 2024

stoeckmann commented Sep 17, 2024 •

edited

Loading