chown in /start -- is it necessary? #135
Take a look at the Herokuish branch and project. I'd like to have this conversation based on how it's doing things, since it'll be merged in and replace a lot of buildstep soon.
There's also a relation to #133: chowning the dir helps with the mounting of data volumes and random users, as the permissions of the mounted directories get "fixed" before the app starts as an unprivileged user, at least for volumes mounted below /app.

@tilgovi I've tried to replicate the impact on the checks plugin but haven't been able to produce a failure caused by a large app directory. Any indication of what is needed to make the chown so slow that it has a noticeable impact? I tried with a dummy directory containing 131072 files in a 2-level hashed directory, and even on my slowest drives it took less than a second.
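For reference, a rough way to reproduce that benchmark: the sketch below builds a two-level hashed tree of 131072 empty files and times a recursive chown over it (the path, user, and group are made up for illustration; run as root):

```bash
#!/bin/bash
# Build a 2-level hashed directory tree: 256 x 256 buckets, 2 files each,
# for 256 * 256 * 2 = 131072 files in total.
set -e
base=/tmp/chown-bench
for i in $(seq 0 255); do
  for j in $(seq 0 255); do
    d=$(printf '%s/%02x/%02x' "$base" "$i" "$j")
    mkdir -p "$d"
    touch "$d/a" "$d/b"
  done
done

# Time the same kind of recursive ownership change /start performs.
time chown -R nobody:nogroup "$base"
```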
@yabawock I've seen the issue almost solely under high resource utilization.
I can't seem to trigger a situation where the chown takes long enough to fail the checks. I'm pretty sure there is a usage mix that can trigger a failed check, but if that is "normal" utilization of your host I'd consider removing the chown.

@progrium herokuish doesn't do anything different; the chown happens there as well.
Wouldn't your underlying filesystem have a significant role in this?
I've tried with both ext4 and btrfs. Maybe XFS (the Red Hat default?) is a possible candidate; there are contradictory reports, with some saying XFS is slow for this kind of filesystem operation while others report it's faster than ext4.
The box where I have this issue is actually very memory constrained. I don't know if that's the cause, but I suspect so. There's a fair bit of swapping during this time.

If it's not reasonable to change this behavior then that's totally fine. I just thought it would be worth discussing.

So, reasons for slowness aside, what are the reasons we couldn't run the chown and checkpoint the result so that it's done already at deploy?
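One way a container manager could do that checkpointing, sketched with plain docker commands (the image tags and app user are hypothetical; this is not how buildstep or dokku actually deploy):

```bash
# Run the recursive chown once, in a throwaway container based on the
# freshly built image...
cid=$(docker run -d myapp:build chown -R app:app /app)
docker wait "$cid"

# ...then checkpoint the result as a new image layer, so that /start
# no longer needs to touch /app's ownership on every boot.
docker commit "$cid" myapp:deploy
docker rm "$cid"
```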
Given the current way things work, the chown could indeed be done once at deploy time. Were the user to change on every start of the application (as it does on Heroku), the chown wouldn't be optional anymore, since the ownership of /app would need to be adjusted for the current user on each start. At the moment this isn't feasible due to bugs in docker, but that might change in the future. Weighing all the pros and cons, I would argue to keep the chown.
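A possible middle ground, sketched below (my illustration, not buildstep's actual /start code; the unprivileged user name app is assumed), is to skip the recursive pass entirely when /app is already owned by the right user, so the common restart path stays cheap:

```bash
# Only pay for the recursive chown when ownership is actually wrong,
# e.g. because a plugin or a mounted volume changed it after the build.
if [ "$(stat -c '%U' /app)" != "app" ]; then
  chown -R app:app /app
fi
```

The check only inspects the top-level directory, so it trades completeness for speed.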
Any reason not to make that a distinct phase before deploy, though?
The distinct phase before deploy would be something a container manager does, wouldn't it?
Yes. I should maybe refile this against dokku. But first we'd need to
Please take into consideration: I know that the same situation can occur even now if you are using a dokku version with zero-downtime deploys, but at least it will be fixed on application restart if a deployment error occurs.

When I originally implemented it, it was meant as a safeguard against "careless" post-processing of the slug and as a courtesy to docker volumes below /app. Currently it could be removed, in master as well as the herokuish branch, as both operate with the same user during the build and run phases.

All in all it seems like a band-aid for a problem with a different root cause to me. I am not opposed to delegating the responsibility for correct permissions to the container manager if any kind of modification happens to the slug after the build phase, even mounting a volume. But I fear it will do more harm than good, especially since full user randomization was originally in the plans.

@progrium What's your take on the situation?
If it can be removed, we can take it out in herokuish (which should be landing here soon) and revisit it as issues around it come up again. I'm all for experiments like that, especially if they end up simplifying or removing code.
I am deploying a node.js app with dokku and running into the problem that this chown step runs forever, uses a lot of system resources/IO, and somehow gets stuck. I have 2 containers deploying the same code, one running as web and one running as worker. When they start up they are sometimes (strangely, not always) stuck on the following:

[vmstat output omitted]

iotop shows these chown processes being the reason why the system is locked up:

[iotop output omitted]

Looking at what files these chown processes are accessing, it seems they are busy going through node_modules files.

dokku is running on a VPS and this consumes all available IO. Sometimes after 20 minutes or so it's finally finished and resources are released. Can I somehow disable this chown step if it's not required? I'm using this .buildpacks:

[.buildpacks contents omitted]
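For anyone debugging the same symptom, a hedged sketch of how to confirm what a long-running chown is actually walking through, using standard Linux proc tooling:

```bash
# Grab the oldest running chown and look at what it has open.
pid=$(pgrep -o chown)
sudo ls -l "/proc/$pid/cwd" "/proc/$pid/fd"

# Confirm it is the process responsible for the IO load
# (batch mode, only processes doing IO, three samples).
sudo iotop -b -o -n 3
```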
Note: dokku no longer uses buildstep, so you are almost certainly running on an unsupported version of dokku.
No, I'm running the current version of dokku, but from your comment I realize it now uses herokuish, which is doing the same chown step that takes forever in my case. Is that an issue that should be filed over at herokuish, or over at dokku, or is it something else altogether that I'm just out of luck with?
Related herokuish issue: gliderlabs/herokuish#114, which you may wish to comment upon.
The chown in /start can take time and messes with the consistency of things like the dokku checks timeouts. My understanding is that it's necessary in case something else (like a dokku plugin) modifies the filesystem after the compile step.
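If the step has to stay, one cheaper variant (a sketch, assuming the runtime user is named app) is to restrict the chown to entries whose ownership is actually wrong, so an untouched slug costs only a metadata scan rather than 100k+ writes:

```bash
# Fix up only the files that are not already owned by the app user.
find /app ! -user app -exec chown app:app {} +
```

Whether that is actually cheaper in practice depends on the underlying filesystem, as discussed above.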