Unlimited Spin of Glibc's Read-write-lock Implementation #739

Open
PengZheng opened this issue Mar 12, 2024 · 4 comments

@PengZheng
Contributor

PengZheng commented Mar 12, 2024

Last week, I investigated a read-write lock implementation issue affecting ALL versions of glibc since 2.25, which is detailed in the following mailing-list thread: https://sourceware.org/pipermail/libc-alpha/2024-March/155278.html

In summary, a reader with a high real-time (RT) priority that cannot acquire the lock can spin without limit (consuming all available CPUs), while the writer that holds the lock cannot stop the reader from spinning because the writer never gets a chance to run.

Considering that rwlock is used in the central piece of our framework and glibc is the most extensively used C library, we should pay close attention to the progress of this issue.

Note that musl does not suffer from this issue, since it only spins a limited number of times (up to 100; check the following email for an example). uclibc is not affected either.
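To make the difference concrete, here is a minimal sketch of a bounded spin. It is not musl's or glibc's actual code: `demo_lock_t`, `demo_spin_bounded`, and `SPIN_LIMIT` are hypothetical names, and the limit of 100 is simply the figure quoted above. The point is that once the budget is spent, the reader stops spinning and goes to sleep, which is what gives a lower-priority writer a chance to run.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin budget; 100 is the figure quoted for musl above. */
#define SPIN_LIMIT 100

/* Hypothetical lock word: true while a writer holds the lock. */
typedef struct {
    atomic_bool writer_active;
} demo_lock_t;

/*
 * Wait for the writer to leave, but poll at most SPIN_LIMIT times.
 * Returns true if the writer left within the budget.
 * Returns false if the budget ran out; the caller should then block
 * (for example on a futex or a condition variable) instead of spinning on.
 */
static bool demo_spin_bounded(demo_lock_t *lock)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (!atomic_load_explicit(&lock->writer_active, memory_order_acquire)) {
            return true;
        }
    }
    return false;
}
```

An unbounded variant would replace the `for` loop with `while (writer_active)`. With a SCHED_FIFO reader and a lower-priority writer competing for the same CPUs, that loop may never exit.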

Even if glibc addresses this issue quickly, we should warn our users about it. If it is ignored upstream, then in the worst case we may need to implement our own rdlock. @pnoltes @xuzhenbao
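As a rough idea of what a home-grown replacement could look like, here is a minimal sketch of a read-write lock built only on a mutex and a condition variable, so that waiters sleep instead of spinning. All `demo_rwlock_*` names are hypothetical; this is not Celix code and not the fix referenced in this issue. It assumes the default mutex type, gives writers no preference (so writers can starve under a steady stream of readers), and does nothing about priority inversion on the internal mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type for illustration only. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int  readers;  /* number of readers currently holding the lock */
    bool writer;   /* true while a writer holds the lock */
} demo_rwlock_t;

static int demo_rwlock_init(demo_rwlock_t *l)
{
    l->readers = 0;
    l->writer = false;
    int rc = pthread_mutex_init(&l->mutex, NULL);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_cond_init(&l->cond, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&l->mutex);
    }
    return rc;
}

static void demo_rwlock_rdlock(demo_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    while (l->writer) {
        /* Sleeps until signalled, so the writer can be scheduled. */
        pthread_cond_wait(&l->cond, &l->mutex);
    }
    l->readers++;
    pthread_mutex_unlock(&l->mutex);
}

static void demo_rwlock_wrlock(demo_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    while (l->writer || l->readers > 0) {
        pthread_cond_wait(&l->cond, &l->mutex);
    }
    l->writer = true;
    pthread_mutex_unlock(&l->mutex);
}

/* Releases either a read hold or the write hold. */
static void demo_rwlock_unlock(demo_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    if (l->writer) {
        l->writer = false;
    } else {
        l->readers--;
    }
    pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->mutex);
}
```

This trades throughput for predictability: every acquisition goes through the mutex, but a blocked reader yields the CPU regardless of its priority.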

Bug Report: https://sourceware.org/bugzilla/show_bug.cgi?id=31477

@PengZheng
Contributor Author

The unlimited spin was introduced by this commit: https://sourceware.org/git/?p=glibc.git;a=commit;h=cc25c8b4c1196a8c29e9a45b1e096b99a87b7f8c

The current glibc rwlock is completely unusable with real-time priority tasks, though it is OK to use with SCHED_OTHER.
Considering that the current design is very complex, I don't expect an upstream fix to be available within a year or two.
The workaround at my day job is to revert to Ulrich Drepper's original design and implementation.
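Since the problem only shows up under real-time scheduling, one way to warn users would be to check the calling thread's policy at runtime. The following is a minimal sketch under that assumption: the `demo_*` helpers are hypothetical, and only SCHED_FIFO and SCHED_RR are treated as real-time here.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: true for the POSIX real-time policies. */
static bool demo_policy_is_realtime(int policy)
{
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/*
 * Hypothetical helper: reports whether the calling thread runs under a
 * real-time policy. Returns false if the policy cannot be queried.
 */
static bool demo_current_thread_is_realtime(void)
{
    int policy;
    struct sched_param param;
    if (pthread_getschedparam(pthread_self(), &policy, &param) != 0) {
        return false;
    }
    return demo_policy_is_realtime(policy);
}
```

A framework could call `demo_current_thread_is_realtime()` before taking a glibc rwlock and log a warning when it returns true.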

@PengZheng
Contributor Author

PengZheng commented May 26, 2024

It seems that glibc upstream is not interested in fixing it, so here is my fix for glibc 2.29:
https://github.com/PengZheng/glibc/commits/release/2.29/rw_fix/
03a1fca315a07800639acc5b333d5c08cc00fba9

Here is the fix for glibc 2.25:
https://github.com/PengZheng/glibc/commits/release/2.25/rw_fix

@pnoltes
Contributor

pnoltes commented May 31, 2024

Interesting issue.

It could be good to warn users, maybe in CHANGES.md (known issues), but this can of course also occur in already released Apache Celix versions.

But this also makes me realize that we are currently not building and testing with musl or uclibc. Is that something we should also consider?

@PengZheng
Contributor Author

PengZheng commented Jun 1, 2024

> But this also makes me realize that we are currently not building and testing with musl or uclibc. Is that something we should also consider?

Yes. As for uclibc, we may use uclibc-ng instead, which is still actively maintained. IIRC, toolchains using uclibc do not have complete support for C++14.

I am also considering RTOS support, but before that we need to support both static bundles and a fully static build.
