-
-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Bug Report: Website and other other uploads hang while an upload is finalizing #660
Comments
Are there any errors in the logs on the server? |
Hmm, sorry, but where's the backend log file? Not seeing it in the file system nor in the db. Edit - oh, within Docker. |
The only thing I see in the docker logs are premature closes, which appear to be from clients navigating away without finishing downloads. |
OK, thanks. Are you sure that this happens only if you upload files simultaneously, like you described? Could you share the response body of the errored request? You can get it from the network tab of the dev tools. |
Yes, I'm sure. Although if you were uploading a single file to two separate shares, either from two tabs or two computers, the second upload to finish would error while the first finalizes. I didn't capture the 400 error that happens after the 403 in the first screenshot. They seem to always come in pairs. {"message":"Cannot GET /api/shares/cyMTgzM/files?id=66133106-c592-446e-b1cf-ad2d50312b11&name=Tiffs.zip&chunkIndex=13&totalChunks=137","error":"Not Found","statusCode":404} {"message":"Cannot GET /api/shares/cyMTgzM/files?id=66133106-c592-446e-b1cf-ad2d50312b11&name=Tiffs.zip&chunkIndex=0&totalChunks=137","error":"Not Found","statusCode":404} |
Just to confirm, am I understanding correctly: if you upload the same file to two shares simultaneously, the second upload fails once the first one completes? Because I'm a bit confused from your reproduction steps:
If I'm wrong please explain the reproduction steps step-by-step or create a screen recording. |
It doesn't need to be the same file. The point about two shares is that whenever the server is finalizing a download of sufficient size from any source, any other uploads block/error. There may be some more relevant errors in the logs than I realized. Seems like deadlock on the sqlite db called through Prisma, though I don't really understand Node's threading model, or lack thereof. I will try to set up an environment and step through. Reproduction: Upload 1 1GB file and 1 5GB file. The 1GB file finishes first, and while finalizing produces an error in the second upload/hangs the site. |
Small update - The error isn't presenting with my mac as the host with the backend running in the VS Code debugger. It also doesn't present when I upload to my production host (docker, Ubuntu, HDDs) over wifi. That's leading me to think there is a synchronous IO operation happening when the first file upload completes that blocks long enough on a slow HDD for a fast network connection to overflow a buffer with the second file. Renaming from tmp_chunk causing a physical move? Calling .size requires a re-count? I'll keep experimenting and trying to repro on a different host (especially in case it has something to do with ZFS). Thanks for reading. Please let me know if you have any guidance as I'm highly unfamiliar with node.js and the rest of the stack. |
Thank you for sharing this detailed information. I just tested the upload on my Mac using an external HDD (formatted as APFS, not ZFS), and I didn’t encounter the error. Are you able to mount another HDD or SSD to your server? If so, you could try setting the Pingvin Share data folder to the mounted drive. This way, we could test if the hard drive might be causing the issue. You also mentioned an error related to an SQLite deadlock. Could you share more details about that error? I don’t have much experience with disks, so I’m not sure if a specific operation could cause issues on slower HDDs. But if we narrow down the problem and find that it only happens on slow HDDs, I’ll read into this topic. |
Ok, I'm back after an embarrassing number of hours figuring out how to get docker to run local code and change DATA_DIRECTORY and DATABASE_URL. Has the same idea and tested a slow USB hard drive on the Mac and it didn't reproduce the bug. What I could see on the Ubuntu host was prisma throwing timeout errors (is its default 5s?) while performing the zip, which I believe caused the first 403 error. So, the frontend tried sending chunk 0 again, which resulted in the 500 error, but an "Unexpected chunk received" error to be returned, and the expectedChunkIndex payload got it to send the right one next, which successfully resumed the upload. I tried changing docker-compose.yml and moving the pingvin-share.db to a separate drive from the share data, and that resolved the issue. I think the issue was ZFS batching writes and not allowing reads from pingvin-share.db until after prisma times out. It also seems like createZip could be problematic since the archiver calls are synchronous. I forgot to check whether the UI loading issue is also fixed while finalizing, as in is the server event loop blocked, or are IO's to pingvin-share.db just left awaiting until writes are finished? Again, don't really understand node that well, but something seems funny about createZip. I guess the question to ask is if you want pingvin share to support all IO behavior. ZFS isn't super common, but it's a first class citizen in Ubuntu. I'm sure there are others with strange behaviors. Plus whatever is used on cloud. Interested to hear your thoughts. By the way, another thing that was confusing was after updating the production repo to the newest release, I wasn't able to login over LAN, but still was over the internet. Turns out it's the secure cookies feature blocking access because the internet URL is HTTPS, but lan IPs are not. To workaround, on my fork I added an option to the config menu to require https connections (maybe I should just say require https for a cookie?). I guess not technically an issue since if I were more competent I could get LAN addresses with certificates. |
Apologies for the delayed response, your message got lost somehow. I don’t have much experience with different file systems, so I’m not sure I can provide a solid answer about the issue. That said, your theory seems plausible. If Like I said, I’m not very familiar with file systems, so this suggestion might not make sense. Let me know what you think. As for the secure cookies issue, this should be fixed in the latest version. |
👟 Reproduction steps
To trigger in one session, upload two files. Say 1 500MB zip and 1 5gb zip. When the 500MB zip is finished uploading and "finalizing", the other download errors, and resumes after the first is finished finalizing. Same result across a variety of chunks sizes.
👍 Expected behavior
Finalizing an upload should not block the web client nor interfere with ongoing uploads (I presume it blocks and overflows a buffer). Not a web programmer, but seems like something isn't async that should be.
👎 Actual Behavior
While an upload is finalizing, the site hangs for other clients, and when it finishes, you get an internal server error. It does not appear to interrupt ongoing downloads, fortunately.
📜 Logs
No response
The text was updated successfully, but these errors were encountered: