
Hosting: Move from github pages #57

Closed
cben opened this issue Oct 3, 2014 · 20 comments


cben commented Oct 3, 2014

It's becoming untenable to deploy via GitHub Pages: juggling submodules with no build step, and no way to run server logic.

My ideal would be something that can run any commit from anybody's fork.
Ideally, a user should be able to switch code versions via a cookie without changing document URLs.
Think Rawgit + build transforms (like https://github.com/jesusabdullah/browserify-cdn) + containerized execution of server code.
The last part is the hardest. I think the practical approach is layered:

  1. Host the server-side code(s) on Heroku - easy to deploy via a Heroku Button.
  2. A generic firepad->HTML gateway, with parametrized firepad location and output template - so you can use it on anybody's Firebase, and switch the surrounding design & javascript without re-launching the gateway. As long as the firepad format is stable, you don't really need to fork the gateway. (famous non-last words?)
  3. Generic hosted pandoc, allowing client-specified markdown+foo-bar dialects and filters (either server->json->client javascript->json->server, or Lua filters).
  4. Hosted Javascript build, preferably on-demand a la browserify-cdn, but it could also be via commit hook->CI->storage/CDN. Perhaps "deploy" builds to a derived Git repo that can be consumed as a submodule (whatever people say, I like submodules).

cben commented Oct 3, 2014

However, it's possible to just move to Heroku now, then improve piecewise.

The main mental block is that I'm loath to lose Rawgit's ability to link to any version, but Rawgit doesn't (yet) work on mathdown anyway, due to submodules.
(Also, I've been too lazy to use any deploy beyond git push, but my push-without-testing license should be revoked anyway, and git push heroku is just as easy.)

cben mentioned this issue Jan 25, 2015

cben commented Jan 26, 2015

In progress: now works on https://mathdown.herokuapp.com and https://mathdown-cben.rhcloud.com.
Going to use Heroku for now (but keep ability to run off GH Pages)


cben commented Jan 29, 2015

Problem: Heroku's free 1-dyno app idles when it gets 0 traffic for an hour.
It wakes up automatically, but that takes about 10 seconds:

320 ms First Byte Time http://www.webpagetest.org/performance_optimization.php?test=150129_3E_R06&run=1&cached=0 (running)
vs
12621 ms First Byte Time http://www.webpagetest.org/performance_optimization.php?test=150128_R0_Q9E&run=1&cached=0 (sleeping)

  • Optimizing startup time is out of my hands.
    npm start to first byte takes 500~600ms on my machine, according to:

    $ (env PORT=8001 npm start & while ! curl 'http://localhost:8001/?doc=demo'; do :; done) |& ts -s '%H:%M:%S%.S'
    

    (using ts from moreutils. Ctrl+C after it's done.)

    I suppose a smaller slug might help, but the ~10sec Heroku wakeup is quoted widely, so I don't expect there is much to gain.

  • Paying: $35/mo for a dyno would give me 2 dynos and neither would idle.
    But together with $20/mo for custom-domain SSL endpoint it's quite pricey...

=> I'm thinking of moving off Heroku to a Docker setup on a Digital Ocean box. $5~10/mo should be enough, including SSL.
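
For comparing warm vs. sleeping response times without a full WebPageTest run, curl's built-in timing variables give a quick first-byte number. A minimal sketch (the `ttfb` name and the example URL usage are mine, not from any tool):

```shell
# Print time-to-first-byte (in seconds) for a URL, using curl's timing variables.
ttfb() {
  curl -s -o /dev/null -w '%{time_starttransfer}\n' "$1"
}

# Example: probe the Heroku app; a sleeping dyno should show ~10s here.
# ttfb 'https://mathdown.herokuapp.com/?doc=demo'
```

Running it a couple of minutes apart shows the warm-vs-cold gap directly from the command line.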


cben commented Feb 9, 2015

Turns out the "free" Openshift plan also idles, and waking up felt way slower than 10sec.
The "bronze" plan (requires a credit card, but starts from $0) never idles [https://www.openshift.com/products/pricing].
Too bad their UI refuses to let me upgrade to bronze...


cben commented Feb 9, 2015

Yay, bronze plan worked now.


cben commented Feb 10, 2015


cben commented Feb 13, 2015

Uh-oh: warning in Openshift console:
Warning: Gear 546a6d7e5973cac907000028 is using 95.6% of inodes allowed


cben commented Feb 13, 2015

Heroku broke: lots of 404s - apparently for everything in submodules.
I don't remember explicitly pushing to Heroku since it worked, so I suspect Heroku's auto-pull from gh-pages...


cben commented Feb 14, 2015

=> Yes, git push to Heroku works; auto sync from GitHub breaks submodules.
=> Pushed manually and https://mathdown.herokuapp.com works now; disabled auto sync.

Heroku doesn't exactly document this, but https://devcenter.heroku.com/articles/git-submodules says:

Using submodules for builds on Heroku is only supported for builds triggered with git pushes. Builds created with the API do not resolve submodules.

I just made an unrelated commit to github, which triggered this in heroku logs:

2015-02-14T22:19:58.003351+00:00 heroku[api]: Deploy ebfefd5 by [email protected]
2015-02-14T22:19:58.003351+00:00 heroku[api]: Release v19 created by [email protected]

so it's plausible that "Builds created with the API do not resolve submodules." applies...
Although then I did a git push to Heroku, which also looks like "heroku[api]" in the logs.

A direct manual push prints this to my terminal:

remote: Git submodules detected, installing:
remote: Submodule path 'CodeMirror': checked out 'f65d50622518b9248aabaff96b3846fc7f32cd89'
remote: Submodule path 'CodeMirror-MathJax': checked out 'c56e40c3bc37df538db90db367e3ea86b1965c4b'
remote: Submodule path 'CodeMirror-MathJax/CodeMirror': checked out '50cf44c40c339db5084d79b24b973771535e0cd8'
remote: Submodule path 'MathJax': checked out 'a4b82c521efc0f1845533bb85c7e41477fd67c42'
remote: Submodule path 'firebase': checked out 'e7feeb9b49eb9794cc57de423919816259a45d5f'
remote: Submodule path 'firepad': checked out '452841ef3f04c15efd8e4a6c57e364cc6734fb8f'
remote: 
remote: Compressing source files... done.
remote: Building source:
remote: 
remote: -----> Node.js app detected
...

Sadly, neither heroku logs nor the logs in Heroku's dashboard have any trace of resolving submodules.
The "Build log" in the dashboard begins from -----> Node.js app detected.
The only difference between the Build log of an auto sync from GitHub vs a git push to Heroku is:

-----> Compressing... done, 5.7MB
-----> Launching... done, v19

vs

-----> Compressing... done, 29.8MB
-----> Launching... done, v20

P.S. This wouldn't happen if testing used immutable infrastructure in the Right way:
create a new Heroku app, run tests against it, and if successful "deploy" by pointing to the new app.
Actually Heroku supports "forking" apps, and I suspect there are solutions to do exactly this.


cben commented Feb 15, 2015

Back to Openshift quota issue.
Following https://help.openshift.com/hc/en-us/articles/202399760-Disk-Quota-Exceeded-Now-What- :

[mathdown-cben.rhcloud.com 546a6d7e5973cac907000028]\> quota -s
Disk quotas for user 546a6d7e5973cac907000028 (uid 4040): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/EBSStore01-user_home01
                   617M       0   1024M           76453       0   80000        

So inodes are tight, but overall space is also 60% used.

Notable, partially overlapping, parts of storage used:
360M app-root
257M app-deployments/2015-01-25_18-20-02.081
212M app-root/runtime/repo
212M app-deployments/2015-01-25_18-20-02.081/repo
170M app-deployments/2015-01-25_18-20-02.081/repo/MathJax
169M app-root/runtime/repo/MathJax
104M app-root/logs
45M app-root/runtime/dependencies/.npm
45M app-deployments/2015-01-25_18-20-02.081/dependencies/.npm

AFAICT app-root vs app-deployments/2015... are copies, not symlinks nor hardlinks.
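
Since the quota counts files rather than bytes, a per-directory file count points at the inode hogs more directly than du does. A minimal sketch (the `inode_usage` helper name is mine):

```shell
# Count files+dirs under each top-level subdirectory of $1; the biggest
# numbers point at the inode hogs (e.g. MathJax's thousands of font files).
inode_usage() {
  for d in "$1"/*/ ; do
    [ -d "$d" ] && printf '%8d %s\n' "$(find "$d" | wc -l)" "$d"
  done | sort -rn
}

# e.g. inode_usage "$HOME"
```

(Newer GNU coreutils also have `du --inodes`, which does the same in one flag.)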

app-root/logs contains minuscule haproxy logs and 10x10M nodejs.log* consisting of stdout/err from my server.js.
For unclear reasons, my logs consist predominantly of GET /: (abridged sort | uniq -c | sort -n):

      1  GET /administrator/index.php
      1  GET /blog/
      1  GET /templates/cn/template.xml
      1  GET /user/
      1  GET /xmlrpc.php?rsd=1
      2  GET /wp-login.php
...
      2  GET /MathJax/jax/output/HTML-CSS/fonts/TeX/AMS/Regular/BBBold.js?rev=2.4-beta-2
      2  GET /?doc=demo
...
     14  GET /robots.txt
     15  GET /favicon.ico
...
     21  GET /MathJax/extensions/Safe.js?rev=2.4-beta-2
     27  GET /CodeMirror/addon/dialog/dialog.css
     27  GET /?doc=about
     28  GET /MathJax/config/TeX-AMS_HTML-full.js?rev=2.4-beta-2
    513  HEAD /
 446038  GET /

[Interesting. That's a LOT of hits for /, especially for a server not linked to almost anywhere on the net. Though the domain has existed for a long time.]
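
The abridged tally above comes from a sort | uniq -c | sort -n pipeline over the request lines; roughly like this (a sketch — the `tally_requests` name is mine, and the grep pattern assumes log lines contain `GET /path` or `HEAD /path`):

```shell
# Tally method+path occurrences across the rotated node logs,
# least frequent first.
tally_requests() {
  grep -hoE '(GET|HEAD) [^ ]*' "$@" | sort | uniq -c | sort -n
}

# e.g. tally_requests app-root/logs/nodejs.log*
```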

After running rhc app-tidy mathdown, went from 95.6% to 95.4% inodes :-(

> quota -s
Disk quotas for user 546a6d7e5973cac907000028 (uid 4040): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/EBSStore01-user_home01
                   513M       0   1024M           76292       0   80000        


cben commented Feb 15, 2015

inodes issue split to #73


cben commented Feb 15, 2015

RHcloud* Bronze plan lets me auto-scale to 16 gears. The first 3 small gears are free, but 13 more would cost me a pretty $190/month.
=> I'm setting auto-scaling to max 5 gears. It's unlikely I'll need more than 1 anyway.

*I've been saying "Openshift" interchangeably till now, but to be precise, Openshift is the (open source) software and RHcloud is the hosted instance run by Red Hat.


cben commented Feb 15, 2015

Fixed inodes quota blocker, ready to flip mathdown.net to RHcloud! :-)
Will document & commit some things first.


cben commented Feb 17, 2015

For the record: Heroku has a handy "Production Check" button at top right.
=> It basically says:

  • Running with a single web dyno will cause it to sleep after a period of inactivity. Increase to multiple web dynos to keep them running and avoid waiting for them to warm back up.
  • You're only running on 1 web dyno. A second dyno will provide instant fallback if one dyno fails for any reason.
  • No logging add-on found. Install a logging add-on such as Papertrail or Logentries to monitor and search your logs.

cben added a commit that referenced this issue Feb 17, 2015
…d -some related files:

- DNS config downloaded from Cloudflare

- TLS certificates, the public parts
  (keeping to myself the StartSSL.com browser-identifying cert, and the private key used for mathdown.{com,net} certs)

  ("Best practices" example for publicly committing certs:
  https://github.com/18F/tls-standards/tree/master/sites/)

cben commented Feb 17, 2015

Load test (using the Blitz addon via Heroku) shows that Heroku can sustain more load:
Heroku sustains up to 250 hits/sec with latency <30ms and zero errors: https://www.blitz.io/report/3f3d541ec0f1d2df0e0299fa0e580e4c
(though in other runs I've seen a few errors — closer to the start, actually)

while RHcloud goes into high latency and up to 50% errors around 170 users:
https://www.blitz.io/report/3f3d541ec0f1d2df0e0299fa0ed6aa18
For some reason the sharable links from Blitz only include a subset of the metrics, so here is a manually built plot from their .csv:
https://plot.ly/~cben/30

I don't expect to see that many users soon (though IIUC that's 250 HTTP requests, not 250 full page loads), but what bothers me:

  • No timeouts (used the Blitz default of 1sec = timeout), only errors. And Blitz won't tell me which errors. It's actually good that the server starts refusing requests, but I'm curious how exactly it behaves, and what is returning these failures. IIUC server.js does nothing to refuse requests.
    • During the test I ssh'd into the gear and saw cpu usage by node process go up to 25%.
      (And while it has 4 cores, it's not like 25% = one core — trying while(1) {} went to 96% cpu.)
      • I should get monitoring in place!
  • It's configured to scale 1–5 — did it try? Probably couldn't so fast?

=> OK, let's force scaling by configuring 2–5 gears?
=>> Yes, sustained 250 "users"! Latency up to 270ms but zero errors :-)

I'm going to keep 2–5 as it allows fast failover if one crashes. And because why not...
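
Between Blitz runs, a crude concurrent-curl probe can serve as a quick local sanity check (a rough sketch, nothing like Blitz's ramp-up; the `probe` helper name is mine):

```shell
# Fire COUNT sequential requests from each of CONC parallel workers,
# printing one HTTP status code per request.
probe() {
  local url=$1 conc=$2 count=$3 i j
  for i in $(seq "$conc"); do
    ( for j in $(seq "$count"); do
        curl -s -o /dev/null -w '%{http_code}\n' "$url"
      done ) &
  done
  wait
}

# e.g. probe 'https://mathdown-cben.rhcloud.com/?doc=demo' 10 25 | sort | uniq -c
```

Piping through `sort | uniq -c` summarizes how many requests got each status code.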


cben commented Feb 18, 2015

Correction: not zero errors: TCP Connection timeout 3
And with 1 gear, there were: TCP Connection timeout 495
Heroku also had: TCP Connection timeout 1.

So now I know what kind of errors they were.
Actually, when rare, they're not really errors, just high latency arbitrarily going over Blitz's threshold.
[With 1 gear, when there were hundreds of them, the latency would have been so high that to users it would have felt like the page never loads.]
Retried RHcloud (2 gears) with a 5sec timeout => 5,978 hits with 0 errors & 0 timeouts :-)
https://www.blitz.io/report/9a569554c9bfc4668288240c3554550d
But now the latencies are not so fun: Fastest: 15 ms, Slowest: 1,059 ms, Average: 238 ms.
Too bad Blitz doesn't show the full latency histogram.

The good news is that this explains why I was not seeing any kind of errors in the Haproxy status screen (which I'm not sure how to read, but all "denied", "errors" etc. were 0).


cben commented Feb 18, 2015

Pushed a new version (minor server.coffee change) to Heroku and RHcloud.
Heroku took 44s: 6s git(?), 10s checking out submodules, 5s ?, 10s build, 6s compressing the slug, 5s launch.
RHcloud took 06m17s :-( :

00:00:06 remote: Stopping NodeJS cartridge        
00:00:07 remote: Wed Feb 18 2015 14:28:03 GMT-0500 (EST): Stopping application 'mathdown' ...        
00:00:08 remote: Wed Feb 18 2015 14:28:04 GMT-0500 (EST): Stopped Node application 'mathdown'        
00:01:01 remote: Syncing git content to other proxy gears        
00:01:01 remote: Saving away previously installed Node modules        
00:02:40 remote: Building git ref 'master', commit 74322aa        
00:02:40 remote: Building NodeJS cartridge        
00:02:44 remote: npm info it worked if it ends with ok        
00:02:44 remote: npm info using [email protected]        
00:02:44 remote: npm info using [email protected]        
00:02:44 remote: npm info preinstall [email protected]        
00:02:44 remote: npm info build /var/lib/openshift/546a6d7e5973cac907000028/app-root/runtime/repo        
00:02:44 remote: npm info linkStuff [email protected]        
00:02:44 remote: npm info install [email protected]        
00:02:44 remote: npm info postinstall [email protected]        
00:02:44 remote: npm info prepublish [email protected]        
00:02:46 remote: npm info ok         
00:02:46 remote: Preparing build for deployment        
00:05:12 remote: Deployment id is 1dc0dc02        
00:05:12 remote: Distributing deployment to child gears        
00:05:45 remote: Activating deployment        
00:05:53 remote: HAProxy already running        
00:05:53 remote: HAProxy instance is started        
00:05:53 remote: Starting NodeJS cartridge        
00:05:53 remote: Wed Feb 18 2015 14:33:49 GMT-0500 (EST): Starting application 'mathdown' ...        
00:06:16 remote: -------------------------        
00:06:16 remote: Git Post-Receive Result: success        
00:06:16 remote: Distribution status: success        
00:06:16 remote: Activation status: success        
00:06:16 remote: Deployment completed with status: success        
00:06:17 To ssh://[email protected]/~/git/mathdown.git/
00:06:17    2ba0200..74322aa  gh-pages -> master
00:06:17 0.02user 0.02system 6:16.99elapsed 0%CPU (0avgtext+0avgdata 6492maxresident)k
00:06:17 0inputs+128outputs (0major+1037minor)pagefaults 0swaps

It's not very clear why it's so slow, it just is... (e.g. what happened between 0:08–1:01?)
There are details, though maybe out of date, at http://www.openshift.org/documentation/openshift-pep-006-deploy.html#git-deployments-current-openshift-workflow
I suspect copying ~300M takes a lot of the time. Oh well. I hope the future docker-based Openshift will be faster.

What's most striking is that it first stops the app, then builds, then starts — that's ~6min of downtime?!
But in reality I have 2 gears, and I see the app responding during that time — Haproxy shows that the first local gear goes down (the one on the same node as haproxy), and only after it's up again is the 2nd gear restarted (much faster).
There is a hot_deploy option which doesn't stop the server. Alas, it doesn't restart it at all, which is not what I want.

Ah, there is an option to move the build outside the 1st gear by offloading it to a Jenkins cartridge. Don't think I want it.
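
For the record, the relative timestamps in the push log above don't need moreutils; a few lines of bash can prefix elapsed time onto any command's output (a sketch; `stamp` is a made-up name):

```shell
# Prefix each line read on stdin with mm:ss elapsed since the pipe started,
# using bash's built-in SECONDS counter.
stamp() {
  local t0=$SECONDS line
  while IFS= read -r line; do
    printf '%02d:%02d %s\n' $(( (SECONDS - t0) / 60 )) $(( (SECONDS - t0) % 60 )) "$line"
  done
}

# e.g. git push openshift master 2>&1 | stamp
```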


cben commented Feb 18, 2015

OK, I just flipped mathdown.net to RHcloud!


cben commented Feb 20, 2015

Closing this issue as essentially done.
Opened new issues with the deploy.rhcloud and deploy.heroku tags for the needed improvements.

cben added a commit that referenced this issue Feb 20, 2015
Closes #6 and #57.  Opening separate issues for some followup work.
cben removed the 4 - Working label Feb 20, 2015
cben closed this as completed Feb 20, 2015

cben commented Aug 3, 2015

Heroku has new pricing:
https://blog.heroku.com/archives/2015/6/15/dynos-pricing-ga
TL;DR: free apps must sleep 6h per day (they sleep after 30min of inactivity);
the new $7/mo "hobby" single dyno never sleeps;
more dynos from $25/mo.

I've switched to the new free plan and reactivated the uptimerobot once-in-2h ping to see how it works.
(I'd have been migrated to this anyway in August.)
I'm likely to go $7/mo.
