Backports for 2.0.27 #2678

Merged
merged 14 commits into from
Sep 15, 2024

xrmx
Collaborator

@xrmx xrmx commented Sep 15, 2024

No description provided.

asottile and others added 14 commits September 8, 2024 15:53
In all versions of pyuwsgi at the moment, the first fork has a NULL threadstate
because uwsgi_python_master_fixup calls UWSGI_RELEASE_GIL (which expands to
PyEval_SaveThread, which drops the GIL and sets the threadstate to NULL).
This is called during uwsgi_setup.
After uwsgi_setup returned, PyThreadState_Swap restored the pyuwsgi
threadstate (in both the original and worker processes).

Future forks would have the pyuwsgi threadstate active (from the
restoration at PyThreadState_Swap). In python versions < 3.12 this wasn't an issue.
In 3.12+, PyEval_RestoreThread would attempt to take_gil and then block forever
on the GIL mutex (despite it actually holding it? due to the fork state from the
parent process).

Bisecting cpython showed that python/cpython@92d8bff slightly changed behaviour
of PyThreadState_Swap (it now additionally manages GIL state: unlocking the
previous threadstate and locking the new threadstate).
Putting a log line in PyThreadState_Swap showed a suspicious swap from
oldts=123123 to newts=123123 (swapping from its own threadstate to itself?).
This is because, after forking, control would be given back to the original
threadstate (which mostly worked but was in UB territory given the GIL state).

In 3.11 the threadstate that was restored after the PyThreadState_Swap did not
have the GIL locked (technically this could have allowed a data race if threads
existed before starting uwsgi via pyuwsgi).

In 3.12, since PyThreadState_Swap was changed to release the old threadstate's GIL
and acquire the GIL in the new threadstate, the saved threadstate
had ->locked = 1 (which is sort of an invalid state?).
As far as I can tell there aren't any public APIs to undo this and "restore" the
3.11 behaviour precisely.
Later it would try to lock again (despite already being ->locked = 1) and
deadlock against itself. This is actually called out in the docs:
  If the lock has been created, the current thread must not have acquired it,
  otherwise deadlock ensues.

To fix this, once we call uwsgi_setup we never give control back to the original
pyuwsgi threadstate, avoiding the Swap dance entirely.
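
The self-deadlock described above can be illustrated with a loose analogy in
pure Python. This is only a sketch: it uses a plain `threading.Lock` as a
stand-in for a non-reentrant lock, not the GIL or the CPython threadstate API,
and `try_relock` is a hypothetical helper name. A timeout is used so that the
second acquire fails instead of blocking forever.

```python
import threading


def try_relock(timeout=0.1):
    """Try to acquire a non-reentrant lock that this thread already holds."""
    lock = threading.Lock()  # non-reentrant, unlike threading.RLock
    lock.acquire()           # first acquire succeeds: the lock is now held
    try:
        # Second acquire from the same thread. Without a timeout this call
        # would block forever, because nothing else can release the lock.
        acquired_again = lock.acquire(timeout=timeout)
        if acquired_again:
            lock.release()
        return acquired_again
    finally:
        lock.release()


if __name__ == "__main__":
    print("second acquire succeeded:", try_relock())
```

Avoiding the second acquire altogether, as the fix does by never swapping back
to the original threadstate, sidesteps the problem rather than trying to undo
the locked state.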

With the following uwsgi configuration:

```
wsgi = app:app
http = :8000
master = true
processes = 2
harakiri = 15
harakiri-verbose = true
harakiri-graceful-timeout = 15
harakiri-graceful-signal = 15
max-requests = 100000
memory-report = true
enable-threads = true
threads = 4
enable-thread = true
showconfig = true
listen = 1024
post-buffering = 8192
buffer-size = 32768
lazy = true
http-keepalive = 1
add-header = Connection: Keep-Alive
http-timeout = 70
socket-timeout = 75
hook-master-start = unix_signal:15 gracefully_kill_them_all
vacuum = true
hook-master-start = unix_signal:15 gracefully_kill_them_all
```

Run kill -s 15 master-pid while a request has not yet completed. The following is the uwsgi log:

```
running "unix_signal:15 gracefully_kill_them_all" (master-start)...
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x55de52cb68f0 pid: 143521 (default app)
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x55de52cb68f0 pid: 143520 (default app)
graceful shutdown triggered...
Gracefully killing worker 1 (pid: 143520)...
gateway "uWSGI http 1" has been buried (pid: 143522)
Gracefully killing worker 2 (pid: 143521)...
worker 1 buried after 1 seconds
{address space usage: 277147648 bytes/264MB} {rss usage: 34459648 bytes/32MB} [pid: 143521|app: 0|req: 1/1] 127.0.0.1 () {28 vars in 291 bytes
} [Mon Jul 22 09:13:34 2024] GET / => generated 11 bytes in 6036 msecs (HTTP/1.1 200) 3 headers in 103 bytes (1 switches on core 0)
worker 2 buried after 4 seconds
goodbye to uWSGI.
```

The gateway process (pid=143522) is closed prematurely, so the client is unable
to receive the response to its in-flight request.

I think the gateway process should only be shut down after the worker processes have shut down.
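
The ordering suggested above can be modelled with a toy sketch. It uses Python
threads in place of uwsgi's worker and gateway processes, and every name in it
(`graceful_shutdown`, `worker`, `gateway`) is hypothetical; it is not how uwsgi
implements shutdown, only an illustration of draining workers before closing
the gateway.

```python
import threading
import time


def graceful_shutdown(num_workers=2, request_seconds=0.05):
    """Return the order of events when workers are drained before the gateway."""
    events = []
    events_lock = threading.Lock()
    gateway_stop = threading.Event()

    def log(message):
        with events_lock:
            events.append(message)

    def worker(index):
        time.sleep(request_seconds)  # simulate an in-flight request
        log(f"worker {index} finished request")

    def gateway():
        gateway_stop.wait()          # keep relaying until told to stop
        log("gateway closed")

    gateway_thread = threading.Thread(target=gateway)
    workers = [
        threading.Thread(target=worker, args=(i,))
        for i in range(1, num_workers + 1)
    ]
    gateway_thread.start()
    for w in workers:
        w.start()

    # Shutdown: wait for every worker to finish, and only then stop the gateway.
    for w in workers:
        w.join()
    gateway_stop.set()
    gateway_thread.join()
    return events


if __name__ == "__main__":
    for event in graceful_shutdown():
        print(event)
```

In this model the gateway is always the last thing to close, so any response
produced by a worker still has a path back to the client.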

Fix unbit#2656

This bug arose because the traceback module was overhauled
in python 3.5: https://hg.python.org/cpython/rev/73afda5a4e4c
The crash was due to PyTuple_GetItem returning NULL, as a traceback is
no longer a list of tuples but rather a list of objects (which support indexing).
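
The change in the shape of traceback entries can be seen from Python itself.
The sketch below assumes Python 3.5 or later; `failing` and
`last_frame_summary` are hypothetical helper names. It shows that an extracted
traceback entry is a `traceback.FrameSummary` object that supports indexing and
unpacking but is not a tuple, which is why a tuple-only C API call on it fails.

```python
import traceback


def failing():
    raise ValueError("boom")


def last_frame_summary():
    """Return the innermost entry of the traceback raised by failing()."""
    try:
        failing()
    except ValueError as exc:
        return traceback.extract_tb(exc.__traceback__)[-1]


entry = last_frame_summary()

# Not a tuple on Python 3.5+, so tuple-specific C functions reject it.
print("is tuple:", isinstance(entry, tuple))

# Indexing and unpacking still work through the generic sequence protocol.
filename, lineno, name, line = entry
print("function name via index:", entry[2])
print("function name via attribute:", entry.name)
```

Code that only relies on indexing or attribute access works with both the old
tuples and the newer objects.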

To avoid calling functions made private from 3.13+.
And probably fixing issues with C extensions.
As it should have been for years :_)
That is still in beta but removed private C APIs we were using.

As mentioned in unbit#2655, this changes
the way support for python 3.13 is handled. Instead of handling python
3.13 as a minor change from 3.12, and handling support for it under the
3.12 `#ifdef` blocks, this breaks out 3.13 into its own block, apart
from 3.12. This makes the code a bit more verbose, but makes it easier
to see what the structures look like for different python versions.
[xrmx: add 36e045c to reduce delta with master]
@xrmx xrmx merged commit 64502f6 into unbit:uwsgi-2.0 Sep 15, 2024
27 checks passed

7 participants