blog and wiki keep going down #163
Comments
Might help with whatwg/misc-server#163.
Could it be a StatusCake issue? Is it easy to catch them being down? I just got an email about wiki being down and tested it immediately, but it loaded for me. Maybe the response time is too long or something?
I've caught them down a few times, but it's possible the problem is less serious than StatusCake makes it appear, hmm.
When you've seen them down, has trying to reload fixed the problem, or has it been down for minutes at a time? I'm thinking that maybe this is a warmup problem. Maybe instances are killed or somehow frozen when there hasn't been traffic for a while. This is how AppEngine behaves at least, although it's not the same kind of architecture so it's not exactly the same for sure.
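One rough way to test the warmup idea would be to request the site after a quiet period, retry immediately a few times, and check whether only the first request is bad. The sketch below only does the classification step. The `(status, seconds)` sample shape and the 5-second `slow_after` threshold are assumptions for illustration, not anything StatusCake or DigitalOcean is known to use.

```python
def looks_like_warmup(samples, slow_after=5.0):
    """Guess whether a burst of requests matches a cold-start pattern.

    samples: list of (status, seconds) tuples from back-to-back requests
    made after an idle period; status is None if the request failed outright.
    slow_after: hypothetical threshold (seconds) beyond which a response
    counts as "down" for monitoring purposes.
    """
    def bad(sample):
        status, seconds = sample
        return status != 200 or seconds > slow_after

    if len(samples) < 2:
        # One sample cannot distinguish a cold start from a real outage.
        return False
    # Cold-start pattern: first request bad, every retry fine.
    return bad(samples[0]) and not any(bad(s) for s in samples[1:])
```

If the first request fails and the retries keep failing for minutes, this returns `False`, which would point away from warmup and toward a real outage.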
Hmm, it was reported down in whatwg/blog.whatwg.org#12, but works for me now. @domenic when we last met you said the blog and wiki have stopped going down, but I wonder if really it's just the monitoring behavior that has changed...?
https://blog.whatwg.org/ has been down since (at least) yesterday, FWIW.
I've kicked the control panel again :( I don't think this is a warmup or monitoring problem. I think this is either:
I think the next step here would be to try setting up the same simple Docker image on another service provider (e.g. AWS), and pointing blog.whatwg.org to that deployment for a few months, and seeing if it's better. That would narrow down whether it's (1) or (2).
It’s down again.
FYI, it's down again. (Let me know if this is not the best place to post alerts.)
So I narrowed down this problem to something about the CDN fronting the blog and wiki. Right now the blog at https://blog.whatwg.org/ is down. However the deployment URL https://blog-6tqz3.ondigitalocean.app/ is up. And all the logs show 200 requests to the internal URL. I am going to try contacting DigitalOcean support since this seems like it is not our problem. (I.e., it is not the server being overloaded because we don't have enough caching, or something like that.)
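For anyone who wants to repeat that comparison during the next outage, here is a minimal sketch that probes a public URL and a deployment URL and reports which side looks broken. The function names `probe` and `diagnose` and the wording of the verdicts are made up for illustration, and this is not necessarily how the check above was actually done.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (status, seconds); status is None if no HTTP response arrived."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 502, 503).
        status = err.code
    except OSError:
        # Covers URLError, timeouts, DNS failures, and connection resets.
        status = None
    return status, time.monotonic() - start


def diagnose(public_status, origin_status):
    """Turn the two status codes into a rough, hedged verdict."""
    public_ok = public_status == 200
    origin_ok = origin_status == 200
    if public_ok and origin_ok:
        return "both up"
    if origin_ok:
        return "origin up, public URL down: suspect the layer in front of the app"
    if public_ok:
        return "public URL up, origin down: possibly a cached response"
    return "both down: suspect the app or its database"
```

Calling `diagnose(probe("https://blog.whatwg.org/")[0], probe("https://blog-6tqz3.ondigitalocean.app/")[0])` while the blog is down would reproduce the comparison; a single run is only a snapshot, so it is worth repeating a few times.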
DigitalOcean support has been unhelpful both times I tried pointing them at live outages. Today I was pointed to https://fly.io/ which seems really promising?? Maybe we should try switching to that. We can switch just the sites at first, having them connect to the existing DigitalOcean database, and if that works then switch the database too.
According to StatusCake these are constantly going down. Blog seems a bit worse than wiki.
According to the DigitalOcean logs everything is fine. They're getting a bit more traffic than normal, maybe 10-20 requests per minute, but all responses are 200s supposedly. Blog is at about 40% RAM usage and 20% CPU usage; wiki is at about 30% RAM and 60% CPU usage; and the shared database server is at about 68% RAM and 12% CPU usage.
There was a major spike in incoming connections and CPU/RAM usage last night around 19:11 Eastern Time, but the outages started getting bad around 17:26 Eastern Time so I'm not sure if it's related.
My best hypothesis is that either DigitalOcean sucks, or something about our setup sucks, and can't handle this much traffic.
Potential ideas: