This repository has been archived by the owner on Nov 14, 2018. It is now read-only.

RFC: Prevent data loss due to disk quotas #390

Open
tanmaykm opened this issue May 20, 2016 · 3 comments

@tanmaykm
Member

There's a subtle bug in JuliaBox that can cause data loss when the user's disk storage is near the allotted quota.

JuliaBox today needs to create and/or update certain files in the user's home folder:

  • .bashrc to set up certain paths and aliases for Julia
  • .gitconfig, auto-generated for the user
  • .ssh, auto-generated for the user
  • .ipython to set up IPython kernels and certain options that make it work correctly on JuliaBox. Can also contain IPython log files.
  • .juliabox for JuliaBox configuration files. Can also contain JuliaBox log files.
  • the Julia tutorial link and associated notebooks

To restore the user's data:

  1. a blank disk is first primed with the above files
  2. data from user's backup are applied on it
  3. some files are updated (I think only .bashrc)

Steps 2 and 3 above can fail if the files written in step 1 take up more space than they did when the user last backed up their data, because the restored data then no longer fits within the quota. That can happen between JuliaBox releases. When the incompletely restored data is later backed up, it overwrites the last good backup.
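To make the failure mode concrete, here is a minimal simulation, assuming a disk that simply rejects writes past its quota. `Disk`, `restore()` and `backup_if_complete()` are hypothetical names, not JuliaBox code, and sizes are arbitrary units. It shows a newer release's extra files in step 1 pushing the restored backup over the quota, and one possible guard: only snapshot after a complete restore, and keep older snapshots rather than overwriting them.

```python
# Hypothetical simulation of the restore sequence described above.
# Disk, restore() and backup_if_complete() are illustrative names,
# not JuliaBox code; sizes are arbitrary units.


class QuotaExceeded(Exception):
    """Raised when a write would push usage past the disk quota."""


class Disk:
    def __init__(self, quota):
        self.quota = quota
        self.files = {}  # path -> size

    def used(self):
        return sum(self.files.values())

    def write(self, path, size):
        # Overwriting a file frees its old size before adding the new one.
        new_used = self.used() - self.files.get(path, 0) + size
        if new_used > self.quota:
            raise QuotaExceeded(path)
        self.files[path] = size


def restore(disk, juliabox_files, user_backup):
    """Step 1: prime the blank disk; step 2: apply the user's backup.

    Returns True only if every file was written.
    """
    try:
        for path, size in juliabox_files.items():
            disk.write(path, size)
        for path, size in user_backup.items():
            disk.write(path, size)
    except QuotaExceeded:
        return False
    return True


def backup_if_complete(disk, restored_ok, backups):
    """Append a snapshot only after a complete restore.

    Keeping earlier snapshots in `backups` means a partial restore can
    never overwrite the last good copy.
    """
    if restored_ok:
        backups.append(dict(disk.files))
    return backups


if __name__ == "__main__":
    # The last backup filled a 100-unit quota exactly.
    last_backup = {".bashrc": 10, "notebooks/a.ipynb": 90}
    # A newer release also writes a 5-unit log file in step 1.
    new_release = {".bashrc": 10, ".juliabox/jbox.log": 5}
    disk = Disk(quota=100)
    ok = restore(disk, new_release, last_backup)
    backups = backup_if_complete(disk, ok, [last_backup])
    print(ok, len(backups))  # False 1
```

Here the restore stops at `notebooks/a.ipynb` (15 + 90 > 100), so the earlier snapshot is the only one kept.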

So, I think the primary reason for this issue is that JuliaBox has to share disk space with user data. Keeping more than one backup will help restore data, but that's an additional safety feature. Below are a few thoughts to address this:

  1. Separate mounts for user data and user home (larger data volume at /data, smaller volume at /home/juser)
    • user home is not backed up
    • data disk gets backed up
    • link some essentials from user home to mounted volume (e.g. .bashrc, .ssh, ...)
    • 👍 Faster boot time, as /data can be restored asynchronously
    • 👍 User (with appropriate privilege) can keep multiple disks and choose one to mount at run time
    • 👍 By the same analogy, EBS volumes can be mounted at /data/disk-1 and so on
    • 👎 Inconvenience of having a small home folder
    • 👎 Users can leave files in the home folder and forget that it is ephemeral
  2. Separate mounts for user data and user home (larger data volume at /data, docker container filesystem for /home/juser)
    • similar to above, except that /home/juser is not a separate mount
    • 👍 simpler to operate, with one less mount point
    • 👎 can't enforce a limit on home folder size
    • 👎 container filesystem is slower, so writing to home folder will be slower
  3. Allocate a single disk of size (quota + reserved space for JuliaBox use), and enforce the quota some other way
    • reserve enough additional space for JuliaBox use
    • enforce by alerting the user after periodic checks and during backup
    • 👎 no easy way to have the OS enforce limits
    • 👎 not clear how often JuliaBox should monitor space usage
    • 👎 not clear what to do if the user ignores quota messages or the session gets disconnected
  4. Use Linux quotas with Docker user namespaces
    • use a virtual uid on the host system for each Docker container, and map it to juser in the corresponding container
    • set up quotas on the host machine for the virtual uid to control allocation
    • 👍 simpler, no need to use loopback volumes
    • 👎 the real uid can differ on each login, making file permissions difficult to manage
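For approach 1, the "link some essentials from user home to mounted volume" step might look roughly like this. It is a minimal sketch, assuming the essentials live on the data volume and are symlinked into the ephemeral home folder; `link_essentials()` and the list of names are hypothetical, and temporary directories stand in for `/home/juser` and `/data`.

```python
import os
import tempfile


def link_essentials(home, data, names):
    """Symlink selected entries from the ephemeral home folder to the data volume.

    Hypothetical helper for approach 1: `home`, `data` and `names` are
    assumptions. Entries missing on the data volume, or already present
    in home, are left alone. Returns the names that were linked.
    """
    linked = []
    for name in names:
        target = os.path.join(data, name)
        link = os.path.join(home, name)
        if not os.path.lexists(target) or os.path.lexists(link):
            continue
        os.symlink(target, link)
        linked.append(name)
    return linked


def demo():
    # Stand-ins for /home/juser and /data, created in a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        home = os.path.join(root, "home")
        data = os.path.join(root, "data")
        os.makedirs(home)
        os.makedirs(os.path.join(data, ".ssh"))
        open(os.path.join(data, ".bashrc"), "w").close()
        linked = link_essentials(home, data, [".bashrc", ".ssh", ".gitconfig"])
        ssh_target = os.readlink(os.path.join(home, ".ssh"))
        return linked, ssh_target == os.path.join(data, ".ssh")


if __name__ == "__main__":
    print(demo())  # (['.bashrc', '.ssh'], True)
```

Skipping names that already exist in home avoids clobbering anything the user created there, at the cost of not repairing a stale link.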

Approach 1 or 2 looks the best to me. Any other ideas are welcome.
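For comparison, the periodic check that approach 3 relies on could be sketched as below, assuming JuliaBox's own files sit in known top-level entries of the home folder that are excluded from the user's quota. `user_usage_bytes()`, `over_quota()` and the excluded names are hypothetical; a real check would also have to decide how often to run, which is one of the open questions listed above.

```python
import os
import tempfile


def user_usage_bytes(home, exclude=()):
    """Total size of regular files under `home`, skipping top-level `exclude` entries.

    `exclude` lists JuliaBox-owned names (e.g. ".juliabox"); the names
    are assumptions. Symlinks are neither followed nor counted.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(home):
        if dirpath == home:
            # Prune excluded directories so os.walk does not descend into them.
            dirnames[:] = [d for d in dirnames if d not in exclude]
            filenames = [f for f in filenames if f not in exclude]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def over_quota(home, quota_bytes, exclude=()):
    """True if user data (excluding JuliaBox's reserved files) exceeds the quota."""
    return user_usage_bytes(home, exclude) > quota_bytes


def demo():
    with tempfile.TemporaryDirectory() as home:
        os.makedirs(os.path.join(home, ".juliabox"))
        with open(os.path.join(home, ".juliabox", "jbox.log"), "wb") as f:
            f.write(b"x" * 50)
        with open(os.path.join(home, "notes.ipynb"), "wb") as f:
            f.write(b"x" * 30)
        return (
            user_usage_bytes(home),
            user_usage_bytes(home, exclude=(".juliabox",)),
            over_quota(home, 40, exclude=(".juliabox",)),
        )


if __name__ == "__main__":
    print(demo())  # (80, 30, False)
```

This counts apparent file sizes, not allocated blocks, so it can differ from what the filesystem itself reports.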

tanmaykm added a commit to tanmaykm/JuliaBox that referenced this issue Jul 16, 2016
- move configuration and log files internal to JuliaBox out of user home directory
    - `/home/juser/.juliabox` is now relocated to `/opt/juliabox`
    - `/opt/juliabox` is mounted as a separate volume managed by JuliaBox
    - new volume plugin for configuration files handles the mount point
- do not update `.bashrc` anymore
    - this was required for Julia v0.3 package precompilation, which is now excluded.

The following files are still created in user home, only for a new user (or when they are missing):
- file link to Julia tutorials
- git config
- ssh keys

With this, JuliaBox will not need to touch the user home after restoring the backup unless there's a real need.

ref JuliaCloud#390
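The "only for a new user (or when they are missing)" rule in this commit can be illustrated with a small create-if-missing helper. This is a sketch under stated assumptions, not the commit's code: `ensure_user_files()` and the default contents are hypothetical placeholders, real SSH keys would come from a key generator, and the tutorial entry would be a link rather than a plain file.

```python
import os
import tempfile


def ensure_user_files(home, defaults):
    """Create each default file under `home` only if it is missing.

    `defaults` maps a path relative to home to its initial text; the
    names and contents are hypothetical placeholders (real SSH keys
    would come from a key generator, and the tutorial entry is a link).
    Files already present, e.g. restored from the user's backup, are
    never modified. Returns the relative paths that were created.
    """
    created = []
    for rel, text in defaults.items():
        path = os.path.join(home, rel)
        if os.path.lexists(path):
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(text)
        created.append(rel)
    return created


def demo():
    with tempfile.TemporaryDirectory() as home:
        # Simulate a .gitconfig that came back from the user's backup.
        with open(os.path.join(home, ".gitconfig"), "w") as f:
            f.write("[user]\n\tname = restored\n")
        created = ensure_user_files(
            home, {".gitconfig": "[user]\n", ".ssh/config": ""}
        )
        with open(os.path.join(home, ".gitconfig")) as f:
            return created, f.read()


if __name__ == "__main__":
    print(demo())
```

Because existing files are skipped, a restored home is left as the backup had it, and only genuinely missing entries consume extra space.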
tanmaykm added a commit to tanmaykm/JuliaBox that referenced this issue Jul 17, 2016
@tanmaykm
Member Author

#421 fixes this to a large extent. However, the loopback disk plugin with object store backup still has a practical size limit.

Using GlusterFS with the JuliaBox hostdisk plugin could be a good solution for large amounts of reliable and fairly fast data storage. It performed well and was responsive when I tried it on a small test setup. GlusterFS also supports folder-level quotas and user-serviceable snapshots.
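As an illustration of the folder-level quotas mentioned above, a per-user limit could be applied with the `gluster volume quota` CLI. The sketch below only builds the argument lists (and prints them by default); the volume name `jbox` and the `/users/<name>` layout are hypothetical choices, and actually running the commands requires the gluster CLI on a node of the trusted storage pool.

```python
import subprocess


def enable_quota_command(volume):
    """argv that turns on directory quotas for a GlusterFS volume (run once per volume)."""
    return ["gluster", "volume", "quota", volume, "enable"]


def limit_usage_command(volume, user_dir, limit):
    """argv that caps one directory on the volume.

    `user_dir` is relative to the volume root; a per-user layout such as
    "/users/alice" is a hypothetical choice. `limit` is a size like "10GB".
    """
    return ["gluster", "volume", "quota", volume, "limit-usage", user_dir, limit]


def run(argv, dry_run=True):
    # Print instead of executing by default; executing needs the gluster CLI
    # on a node of the trusted storage pool, with sufficient privileges.
    if dry_run:
        print(" ".join(argv))
    else:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(limit_usage_command("jbox", "/users/alice", "10GB"))
```

With a limit per user directory, the OS-side enforcement that approach 3 lacks would come from the storage layer instead.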

The AWS equivalent, EFS, is easier to provision and manage. However, it does not have snapshots or folder-level quotas, and access is restricted to within an AWS VPC.

It would be great to hear about the experiences of anyone who has used GlusterFS/EFS.

@tanmaykm
Member Author

Summarizing the possible storage types:

| disk type | plugin | attach speed | I/O speed | cost | size |
|---|---|---|---|---|---|
| local disk | hostdisk | fast | fastest | low | large |
| object store (S3, GCS) | loopback | slow | fastest | low | small |
| block store (EBS) | vol_ebs | slow | fast | high | very large |
| network disk (GlusterFS, EFS) | hostdisk | fast | fast | high | unlimited |

GlusterFS/EFS can be mounted on multiple instances/containers, which is useful for sharing data or running distributed applications.

@ViralBShah
Contributor

cc @aviks
