libkmod: Improve index dump performance #250

Closed

stoeckmann
Contributor

@stoeckmann stoeckmann commented Nov 16, 2024

Use a FILE stream for output to reduce the number of system calls. This removes the only use of write_str_safe, so that helper can be removed as well.

This has another advantage besides faster execution: having a FILE in kmod_dump_index makes error detection easier in the future (by calling ferror there once, instead of duplicating the checks in both the FILE-based and the memory-mapped index functions).
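To illustrate the error-detection point, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `struct entry` and hypothetical helpers `dump_entries` and `dump_index_sketch`. Because the error indicator of a stream is sticky, the individual output calls can go unchecked and a single `ferror` after the final flush reports any failure:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical key/value pair standing in for one index entry. */
struct entry {
	const char *key;
	const char *value;
};

/* Emit "key value\n" per entry; results of the single calls are not checked. */
static void dump_entries(const struct entry *entries, size_t n, FILE *fp)
{
	for (size_t i = 0; i < n; i++) {
		fputs(entries[i].key, fp);
		fputc(' ', fp);
		fputs(entries[i].value, fp);
		fputc('\n', fp);
	}
}

/*
 * Dump everything, then check for errors in one place. Returning -EIO is an
 * assumption of this sketch; a real implementation may prefer another code.
 */
static int dump_index_sketch(const struct entry *entries, size_t n, FILE *fp)
{
	assert(fp != NULL);

	dump_entries(entries, n, fp);
	if (fflush(fp) != 0 || ferror(fp))
		return -EIO;
	return 0;
}
```

The dump helpers stay free of error plumbing, and only the caller that owns the stream has to decide how to report a failure.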

Talking about execution times, these are collected from modules of an Arch Linux installation:

current master

Uses 198497 write calls.
Takes around 224 ms on this system.

$ strace -c modprobe -c > output.txt
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.98    0.224581           1    198497           write
  0.02    0.000045           7         6           munmap
  0.00    0.000000           0        13           read
  0.00    0.000000           0        17           close
  0.00    0.000000           0        18           fstat
  0.00    0.000000           0        30           mmap
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         3           fcntl
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         4           getdents64
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0        20         3 openat
  0.00    0.000000           0         9         5 newfstatat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.224626           1    198637         9 total
$ size tools/kmod libkmod/.libs/libkmod.so.2.5.0
   text    data     bss     dec     hex filename
 158343    5852     160  164355   28203 tools/kmod
  96173    2248       8   98429   1807d libkmod/.libs/libkmod.so.2.5.0

new

Uses 573 write calls.
Takes around 7 ms.

Binary size shrinks by around 200 bytes, library size increases by around 60 bytes.
Speedup based on strace: around 30x
Speedup based on time: 4x to 10x (depending on whether you redirect to /dev/null or a real file)

$ strace -c modprobe -c > output.txt
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 84.74    0.005751          10       573           write
  2.95    0.000200           6        30           mmap
  2.08    0.000141           7        20         3 openat
  1.93    0.000131           6        19           close
  1.66    0.000113           5        20           fstat
  1.31    0.000089          14         6           munmap
  1.21    0.000082           6        13           read
  0.96    0.000065           9         7           mprotect
  0.80    0.000054           6         9         5 newfstatat
  0.52    0.000035           5         6           fcntl
  0.50    0.000034           8         4           getdents64
  0.24    0.000016           5         3           brk
  0.22    0.000015           7         2           pread64
  0.18    0.000012           6         2           dup
  0.13    0.000009           9         1           set_tid_address
  0.12    0.000008           8         1           set_robust_list
  0.10    0.000007           7         1           arch_prctl
  0.10    0.000007           7         1           prlimit64
  0.10    0.000007           7         1           getrandom
  0.09    0.000006           6         1           rseq
  0.07    0.000005           5         1           lseek
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.006787           9       723         9 total
$ size tools/kmod libkmod/.libs/libkmod.so.2.5.0
   text    data     bss     dec     hex filename
 158143    5852     160  164155   2813b tools/kmod
  96208    2272       8   98488   180b8 libkmod/.libs/libkmod.so.2.5.0


codecov bot commented Nov 16, 2024

Codecov Report

Attention: Patch coverage is 24.32432% with 28 lines in your changes missing coverage. Please review.

Files with missing lines    Patch %    Lines
libkmod/libkmod-index.c     0.00%      16 Missing ⚠️
libkmod/libkmod.c           42.85%     8 Missing and 4 partials ⚠️

Files with missing lines    Coverage Δ
shared/util.c               72.72% <ø> (ø)
shared/util.h               61.53% <ø> (ø)
testsuite/test-util.c       85.58% <ø> (ø)
libkmod/libkmod.c           49.76% <42.85%> (ø)
libkmod/libkmod-index.c     50.82% <0.00%> (ø)


Use FILE for output to reduce the amount of system calls. Removes the
only use case of write_str_safe, which can be removed as well.

Signed-off-by: Tobias Stoeckmann <[email protected]>
Collaborator

@evelikov evelikov left a comment

The numbers look amazing - nicely spotted. The benefit might shrink a bit when (if) we reintroduce fwrite_str_safe(), but on the whole it should still be a win.

assert_cc(EAGAIN == EWOULDBLOCK);

do {
	ssize_t r = write(fd, buf + done, todo);
Collaborator

Why did we remove the helper instead of adapting it to use fwrite()? AFAICT all the safety handling it does is still applicable.

Contributor Author

The problem with FILE output is that fclose itself may still have to write buffered data, and it is a one-shot call: we cannot call it repeatedly until the remaining data has been written successfully.

I mistakenly believed that FILE itself handles short writes, but I was wrong. So if we want to keep interrupt-safe writes, we cannot use FILE here. At best, we would adjust the API, but ... Not in a simple PR. :)
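For context, an interrupt-safe write on a plain file descriptor usually means a retry loop like the one below. This is a minimal sketch under my own assumptions, not kmod's write_str_safe: the name `write_all_sketch` and its return convention (0 on success, negative errno on failure) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes to fd. Short writes are continued and EINTR/EAGAIN
 * are retried. Note that retrying EAGAIN busy-loops on a non-blocking fd.
 */
static int write_all_sketch(int fd, const char *buf, size_t len)
{
	size_t done = 0;

	assert(buf != NULL || len == 0);

	while (done < len) {
		ssize_t r = write(fd, buf + done, len - done);

		if (r > 0) {
			done += (size_t)r; /* short write: continue with the rest */
			continue;
		}
		if (r < 0 && (errno == EINTR || errno == EAGAIN))
			continue; /* interrupted or would block: try again */
		/* r == 0 is treated as an I/O error in this sketch. */
		return r < 0 ? -errno : -EIO;
	}
	return 0;
}
```

The caller keeps track of how much has been written and can resume after an interruption, which is what a single fclose call cannot offer.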

if (ctx == NULL)
	return -ENOSYS;

fd2 = dup(fd);
Collaborator

AFAICT libraries should really be using CLOEXEC, i.e. fcntl(fd, F_DUPFD_CLOEXEC, ...) or the like.

Contributor Author

The fdopen would set O_CLOEXEC. Unfortunately I cannot simply use dup3, which would do it atomically for us, because then I would have to figure out a free fd first (or open one just to reserve it).
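A minimal sketch of the reviewer's suggestion, assuming a hypothetical helper `fdopen_dup_cloexec` (this is not the code from the PR): `fcntl` with `F_DUPFD_CLOEXEC` duplicates the descriptor to the lowest free number and sets close-on-exec in the same call, so no target fd has to be picked in advance as with `dup3`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Duplicate fd with FD_CLOEXEC set atomically and wrap the copy in a FILE.
 * Returns NULL on failure and stores a negative errno value in *err.
 */
static FILE *fdopen_dup_cloexec(int fd, int *err)
{
	FILE *fp;
	int fd2;

	assert(err != NULL);

	/* Third argument 0: use the lowest available descriptor number. */
	fd2 = fcntl(fd, F_DUPFD_CLOEXEC, 0);
	if (fd2 < 0) {
		*err = -errno;
		return NULL;
	}

	fp = fdopen(fd2, "w");
	if (fp == NULL) {
		*err = -errno;
		close(fd2);
		return NULL;
	}

	*err = 0;
	return fp;
}
```

Closing the returned stream with fclose closes only the duplicate, so the caller's original descriptor stays open.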

if (fp == NULL) {
	err = -errno;
	close(fd2);
	return err;
Collaborator

Let's move the whole dup/fdopen hunk after all the input validation, i.e. after the type check below.

Contributor Author

I would adjust it once we decide whether it's even worth it:

  • FILE loses EINTR handling (modprobe -c shouldn't really care, since it first uses FILE and then passes a file descriptor to the library, but let's not break the API and whatever other users might rely on)
  • wbuf speeds up processing, but introduces custom code without improving the error handling

So ... let's see. I wouldn't mind keeping everything as it is right now and, if we ever adjust the API in a release, reconsider turning it FILE-based.

Collaborator

FILE loses EINTR handling

My slight inclination towards a wbuf-like approach got a whole lot stronger.

Contributor

I second that. What we could have is one additional API that receives a FILE* as argument... we are not really going to change the API and break users because of this.

However, if we add the new API, we still can't drop the old code, since converting it to use FILE* would mean losing EINTR handling. So... I'm leaning towards the wbuf approach.
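A rough sketch of what a wbuf-like approach could look like. This is an assumption on my part and not the implementation from #252: `struct wbuf_sketch`, `wbuf_append`, `wbuf_flush` and the 4096-byte size are all hypothetical. Output is collected in a user-space buffer and flushed with a retrying write, so the number of system calls drops while EINTR handling is kept.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical write buffer bound to a file descriptor. */
struct wbuf_sketch {
	int fd;
	size_t used;
	char bytes[4096];
};

/* Write out the buffered bytes, continuing short writes and retrying EINTR. */
static int wbuf_flush(struct wbuf_sketch *wb)
{
	size_t done = 0;

	assert(wb != NULL);

	while (done < wb->used) {
		ssize_t r = write(wb->fd, wb->bytes + done, wb->used - done);

		if (r > 0)
			done += (size_t)r;
		else if (r == 0)
			return -EIO; /* treated as an error in this sketch */
		else if (errno != EINTR && errno != EAGAIN)
			return -errno;
	}
	wb->used = 0;
	return 0;
}

/* Copy data into the buffer, flushing whenever it becomes full. */
static int wbuf_append(struct wbuf_sketch *wb, const char *data, size_t len)
{
	assert(wb != NULL);

	while (len > 0) {
		size_t room = sizeof(wb->bytes) - wb->used;
		size_t n = len < room ? len : room;

		memcpy(wb->bytes + wb->used, data, n);
		wb->used += n;
		data += n;
		len -= n;

		if (wb->used == sizeof(wb->bytes)) {
			int err = wbuf_flush(wb);

			if (err < 0)
				return err;
		}
	}
	return 0;
}
```

A caller would append every key, separator and value, then call `wbuf_flush` once at the end and check its return value.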

fwrite(buf->bytes, 1, buf->used, fp);
fputc(' ', fp);
fwrite(v.value, 1, v.len, fp);
fputc('\n', fp);
Collaborator

Out of curiosity: have you tried placing the contents into a buffer (on the stack) and doing a single write for a given value instead of 4? Does it make a difference?

Contributor Author

This already helps by almost 50 %, which is nice. At first I tried scratchbuf (and soon strbuf), but we could also keep a buffer around for longer. See #252 as an example.

@stoeckmann
Contributor Author

Will close this one since we prefer #252.

@stoeckmann stoeckmann closed this Nov 23, 2024