I'm thoroughly pleased to be able to say that I finally understand the issue that's been bothering me on Report URI for a few weeks now, and this is the blog post that's going to explain everything!
If you haven't read the first two blog posts related to this issue, it might help you to get some background on what I'm going to be talking about here:
Unravelling The Mystery Of Truncated POST Requests On Report URI
Processing Truncated Requests? A PHP Debugging Deep Dive
This weird behaviour of our application receiving and then trying to process truncated POST requests has led me on quite a journey of learning, and I can finally explain what's happening!
The Story So Far
In the previous blog posts, I'd narrowed it down to what was happening, but not why it was happening, and I was still stumped. After patching PHP, I could see it was calling read() on the socket and reading until 0 was returned, but 0 was being returned earlier than expected; the log of that behaviour is in the previous blog post.
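Remember that read() returning 0 is just how "end of stream" is signalled, so a reader can't tell the difference between "the peer finished sending" and "the peer gave up and closed the socket early". As a minimal sketch (this is not PHP's code, just the general shape of a read-until-zero loop):

import socket

def read_body(conn: socket.socket) -> bytes:
    # Keep reading until recv() returns b"", i.e. the underlying read() returned 0.
    chunks = []
    while True:
        data = conn.recv(65536)
        if not data:       # 0 bytes: the peer closed its end...
            break          # ...whether or not it sent everything it promised
        chunks.append(data)
    return b"".join(chunks)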
I knew that Nginx was receiving the full request, so the problem was happening in the fastcgi_pass to PHP-FPM, but I couldn't wrap my head around the circumstances. I wanted to understand more on the underlying mechanics of how Nginx was talking to PHP and, more specifically, what was failing in that process.
bpftrace
In the last blog I had to get familiar with strace for the first time, and this time around I'd asked for some assistance from Ignat Korchagin, who kindly whipped up several bpftrace commands and could also reliably reproduce the issue. Hopefully, by monitoring things a little more closely between Nginx and PHP, I could gain an understanding of precisely what was happening.
After monitoring my test server when the issue was happening, I could see Nginx hitting EAGAIN when trying to write to the socket, and I could see PHP getting EPIPE when trying to read from the socket, but it still wasn't clear to me how that was occurring. The EAGAIN error is likely because the socket buffer is full when Nginx is sending, so it can't send any more data, and the EPIPE comes when PHP tries to read from the socket that has since been closed by Nginx. Here's the command.
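$ bpftrace -e 'kretfunc:unix_stream_sendmsg /retval < 0/ { printf("%d\n", retval); }'
Attaching 1 probe...
-11
-11
[-11 repeated 20x]
-11
-11
-32
-32
[-32 repeated 12x]
-32
-32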
The log reads like Nginx is trying to write to the socket which has a full buffer, and PHP is trying to read from a socket that's closed, but if PHP has called accept(), why would it not be able to read()? The accept() call is followed by read() calls in the PHP source almost immediately so it didn't make sense. I also saw 12x EPIPE (-32) errors coming from PHP and I had exactly 12x truncated payload errors in syslog using the same logging from the previous blog post.
Digging a little deeper, I wanted to see when the calls to connect() and accept() were being made.
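$ bpftrace -e 'tracepoint:syscalls:sys_enter_connect /comm == "nginx"/ { printf("connect: %s\n", strftime("%H:%M:%S:%f", nsecs)); } tracepoint:syscalls:sys_enter_accept /comm == "php-fpm8.2"/ { printf("accept: %s\n", strftime("%H:%M:%S:%f", nsecs)); }'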
After firing another batch of 20 requests, I could see all of the connect() calls from Nginx happen right away, but the accept() calls from PHP were much more spaced out.
I still wasn't clear on how this was working because PHP is calling accept() and read() very close together, so if PHP can accept, why is the read failing with a partial payload?!
The truncated payload errors showing in syslog matched up to the late accept() calls from PHP! It was at this point I was prompted to look at the backlog parameter of listen(), and it became clear that this behaviour can be explained by backlog, something I wasn't familiar with.
int listen(int sockfd, int backlog);
When calling listen(), the second parameter provided is backlog, which is described as follows in the docs:
The backlog argument defines the maximum length to which the
queue of pending connections for sockfd may grow.
So now we can have an application, like Nginx, call connect() and succeed without the application on the other side, PHP, ever calling accept()! This is finally starting to make a little more sense with the behaviour I'm seeing. It also turns out that Nginx can write() until the receive buffer on the socket is full, and that is when it's going to hit EAGAIN. I confirmed this using strace.
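To make that concrete, here's a minimal, self-contained sketch using plain UNIX sockets (nothing to do with Nginx or PHP-FPM, and the socket path is just an example) showing a client connecting and writing happily even though the listener never calls accept():

import os
import socket

PATH = "/tmp/backlog-demo.sock"   # illustrative path

try:
    os.unlink(PATH)
except FileNotFoundError:
    pass

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(PATH)
server.listen(4)          # a small backlog; note we never call server.accept()

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(PATH)      # succeeds: the kernel queues the connection in the backlog
client.setblocking(False)

sent = 0
try:
    while True:
        sent += client.send(b"x" * 65536)   # fills the in-kernel socket buffer
except BlockingIOError:                      # EAGAIN / EWOULDBLOCK
    print(f"send() hit EAGAIN after {sent} bytes, without accept() ever being called")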
I believe that Nginx is using writev() to send the fastcgi headers, and using sendfile() to send chunks of the body, which comes from the sendfile chain function in the Nginx source (linked at the end of this post). It certainly seems like this is starting to make a little sense. Looking at the PHP config, I wanted to see what the backlog value was, because it's not something I ever recall setting. I found the following in our config:
; Set listen(2) backlog.
; Default Value: 511 (-1 on Linux, FreeBSD and OpenBSD)
;listen.backlog = 511
With a default value of -1, I was guessing that backlog might be cast to an unsigned int by the kernel, which it is:
if ((unsigned int)backlog > somaxconn)
        backlog = somaxconn;
You can also see that, as stated in the manual, if the defined backlog value is greater than somaxconn, it will be silently truncated to that value. Because -1 will actually be 4294967295, by default, PHP will always set the backlog to the value of somaxconn, effectively letting the system decide.
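A quick way to convince yourself of that reinterpretation (a Python stand-in for the C cast, not anything from the kernel):

print((-1) & 0xFFFFFFFF)   # 4294967295, far larger than any sensible somaxconn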
# cat /proc/sys/net/core/somaxconn
4096
This means we can have up to 4,096 connections pending in the backlog queue, which is quite a few!
The Theory
At this point, I believe something like the following is happening during the lifecycle of the request.
1. Nginx makes the connect() syscall and begins writing to the socket.
2. While PHP is able to, it processes the initial requests; we see a small number of requests succeeding.
3. PHP becomes overloaded and stops processing new, inbound requests.
4. Nginx continues to make new connections and fill the socket buffer with data.
5. Once the receive buffer is full, Nginx will receive EAGAIN and likely wait for something like fastcgi_send_timeout.
6. The send will time out and Nginx will close the connection.
7. PHP becomes available and accepts the connection, starting to read until the receive buffer has been consumed.
8. Once the buffer has been read, PHP will get EPIPE and process the content read from the buffer.
Sounds logical, right? It should also be easy enough to test this now that I know what I'm aiming for, so I tinkered with some of the config parameters in Nginx, and it did indeed seem that fastcgi_send_timeout was the main factor. By reducing the send timeout, I would see more requests failing to parse in general, and I assume that's because Nginx is timing out sooner, so PHP has less time to get to the backlog.
Testing The Theory!
Now for the fun part, where you actually find out if you were right, or you were completely off the mark! I was looking into ways that I could reliably reproduce this and my first effort was to take a few steps. I reduced the number of child processes PHP can spawn to one, so it would be easy to keep it busy, and I reduced the send timeout on Nginx to three seconds so it would time out very quickly. With this basic setup I could send two requests sequentially and the first one would succeed and the second one would reliably fail, 100% of the time. I wanted something better than that though, something that would allow me to trigger this on a single request, and that's when João Poupino dropped this little gem in Slack!
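For reference, the settings involved were along these lines; treat the snippet below as a sketch of the idea rather than my exact configuration (the values simply match what I've described above):

; PHP-FPM pool config: a single worker so it's easy to keep busy
pm = static
pm.max_children = 1

# Nginx: give up on the send to PHP-FPM very quickly
fastcgi_send_timeout 3s;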
This gives me the ability to talk directly to the PHP socket, using the FastCGI protocol, and completely cut Nginx out of the equation. The script also has a feature that allows you to send only a partial request, where it will randomly close() the socket somewhere during the transmission of the payload. With this, it should now be possible to reproduce this problem with only a single request. The moment of truth....
python3 send-fastcgi-request.py unix:/run/php/php8.2-fpm.sock post.data --partial
[102935] Sending request 1/1
[102936] Connecting to unix:/run/php/php8.2-fpm.sock...
[102936] Sending 37 FastCGI records...
[102936] * Sent record 1/37 (8 + 8 bytes)
[102936] * Sent record 2/37 (8 + 512 bytes)
[102936] * Sent record 3/37 (8 + 0 bytes)
[102936] * Sent record 4/37 (8 + 32768 bytes)
[102936] * Sent record 5/37 (8 + 32768 bytes)
[102936] * Sent record 6/37 (8 + 32768 bytes)
[102936] * Sent record 7/37 (8 + 32768 bytes)
[102936] * Sent record 8/37 (8 + 32768 bytes)
[102936] * Sent record 9/37 (8 + 32768 bytes)
[102936] * Sent record 10/37 (8 + 32768 bytes)
[102936] * Sent record 11/37 (8 + 32768 bytes)
[102936] * Sent record 12/37 (8 + 32768 bytes)
[102936] * Sent record 13/37 (8 + 32768 bytes)
[102936] * Sent record 14/37 (8 + 32768 bytes)
[102936] * Sent record 15/37 (8 + 32768 bytes)
[102936] * Sent record 16/37 (8 + 32768 bytes)
[102936] * Sent record 17/37 (8 + 32768 bytes)
[102936] * Sent record 18/37 (8 + 32768 bytes)
[102936] * Sent record 19/37 (8 + 32768 bytes)
[102936] * Sent record 20/37 (8 + 32768 bytes)
[102936] * Sent record 21/37 (8 + 32768 bytes)
[102936] * Sent record 22/37 (8 + 32768 bytes)
[102936] * Sent record 23/37 (8 + 32768 bytes)
[102936] * Sent record 24/37 (8 + 32768 bytes)
[102936] Closing socket after record 24!
The script started to send my large POST payload and decided to close the socket after sending only 24 out of the 37 chunks of the request. Time to check syslog and see if we can find what I'm hoping for and expecting.
Oct 13 13:33:57 test-server php: POST data reading complete. Size: 688128 bytes (out of 1069923 bytes)
Oct 13 13:34:07 test-server php: JSON decode failed: 1069923 688128
Finally! 🍾🥂
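If you're wondering what "talking FastCGI directly to the socket" looks like, here's a rough sketch of the idea. To be clear, this is not João's script: the record building is simplified, the socket path is just the one from the output above, and SCRIPT_FILENAME is a placeholder.

import socket
import struct

FCGI_BEGIN_REQUEST, FCGI_PARAMS, FCGI_STDIN = 1, 4, 5
FCGI_RESPONDER = 1

def record(rec_type, content, request_id=1):
    # A FastCGI record: 8-byte header (version, type, request id, content length,
    # padding length, reserved) followed by the content itself.
    return struct.pack(">BBHHBx", 1, rec_type, request_id, len(content), 0) + content

def nv_pair(name, value):
    # Short-form FastCGI name-value pair (both lengths < 128 bytes).
    return bytes([len(name), len(value)]) + name + value

body = b'{"payload":"' + b"A" * 1_000_000 + b'"}'   # a large JSON-ish POST body

params = b"".join([
    nv_pair(b"REQUEST_METHOD", b"POST"),
    nv_pair(b"CONTENT_TYPE", b"application/json"),
    nv_pair(b"CONTENT_LENGTH", str(len(body)).encode()),
    nv_pair(b"SCRIPT_FILENAME", b"/var/www/html/index.php"),   # placeholder
])

records = [
    record(FCGI_BEGIN_REQUEST, struct.pack(">HB5x", FCGI_RESPONDER, 0)),
    record(FCGI_PARAMS, params),
    record(FCGI_PARAMS, b""),                                   # end of params
]
records += [record(FCGI_STDIN, body[i:i + 32768]) for i in range(0, len(body), 32768)]
# Deliberately no empty FCGI_STDIN terminator, and we stop sending partway through.

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect("/run/php/php8.2-fpm.sock")
for rec in records[:len(records) // 2]:   # send only some of the records...
    sock.sendall(rec)
sock.close()                              # ...then hang up mid-payload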
It feels awesome to get to the bottom of something that's been bugging you so much! I fixed the issue a whole two blog posts ago, and I could have moved on, but sometimes, I really just need to understand why something is happening, and honestly, it was quite a fun journey getting here. I owe a big thanks to João and Ignat, who collectively must have saved me countless days of effort that I might not even have had time for.
I've also skipped over a few of the debugging steps above because this was becoming another lengthy blog post, but I did, during and after debugging, spend a lot of time trawling through the Nginx source to understand fully what's happening and also just to confirm my understanding.
The Nginx behaviour of using writev() and sendfile() can be seen here: https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/os/unix/ngx_linux_sendfile_chain.c#L50
Nginx adds a timeout when receiving EAGAIN here, using fastcgi_send_timeout. I also noticed in the Nginx source that it makes use of epoll(), which explains why earlier I was only seeing a single EAGAIN per request: https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/http/ngx_http_upstream.c#L2093
The event timer expires in Nginx here: https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/event/ngx_event_timer.c#L54
I also spotted that PHP was respawning the child processes when I was testing and that they'd always have new PIDs after a test run. It seems that if the child is idle for too long, PHP will kill and respawn it, possibly based on this timeout value: https://github.com/php/php-src/blob/47c6b3bd452d2932af5c2f021b10ab2aaed01fb1/sapi/fpm/fpm/fpm_conf.c#L622
All in all, I can now say I'm happy that I understand everything that was happening with this issue and can finally put it to bed!
As a final closing note, just to nicely close things off, I got a response on the PHP bug that I filed in the previous blog post and it seems that this finding is indeed a bug and will be patched!