Provide more information about home directory size #136

Open
rabernat opened this issue Apr 7, 2022 · 1 comment


@rabernat
Contributor

rabernat commented Apr 7, 2022

Context

Users coming from laptops are used to knowing how much space is on their hard disk. Users coming from HPC environments are used to having quotas on their home directory size. Users on cloud JupyterHubs are often unsure how much storage space they have in their home directories. This question is not currently addressed by the 2i2c docs on data management.

In a similar vein, hub owners may not understand the technical implementation and economic implications of storing data in home directories. For example, the LEAP executive committee is very concerned about data stored in user home directories becoming a cost liability. I have been trying to argue that this is not a major concern, but I don't have much evidence or documentation to back up this claim. My understanding, which is likely incorrect or incomplete, is that 2i2c home directories are stored on a shared NFS volume backed by a persistent disk. I do not know the following:

  • whether the persistent disk is fixed-size or elastic
  • what happens if the disk fills up (this caused Pangeo hubs to crash in the past)
  • how much this costs relative to the total hub cost

Proposal

We should augment the documentation at https://docs.2i2c.org/en/latest/admin/howto/data.html to address these issues. We should seek to provide answers to the following questions:

  • How much data can users keep in their home directories?
  • When should users use home directories vs. object storage? (See also #135, "Document object storage and scratch bucket".)
  • What tools exist for moving data in and out of the hub?
  • Do quotas exist at a user level?
  • Is there a cap on total size of home directories on each hub?
  • If quotas are not enforced, how should communities manage this shared resource? How can we prevent a single user from abusing the shared NFS volume by storing a large amount of data?
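Until the docs answer these questions, a user or hub admin could gather some evidence empirically. The following is a minimal Python sketch, not part of any 2i2c tooling, that totals the size of a directory tree, lists the largest immediate subdirectories of a root, and reports the capacity of the volume containing a path. It assumes that each user's home directory is a subdirectory of a shared root on the NFS mount; the root path to pass in is hub-specific and hypothetical here. Note that the capacity reported for an NFS mount generally describes the whole shared volume, not any per-user allowance.

```python
from __future__ import annotations

import os
import shutil


def directory_size_bytes(path: str) -> int:
    """Sum the apparent sizes of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the scan.
                continue
    return total


def largest_subdirectories(root: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return (name, size_bytes) for the largest immediate subdirectories of `root`.

    Assumption: on a shared home volume, each subdirectory is one user's home.
    """
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, directory_size_bytes(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top_n]


def volume_usage(path: str):
    """Total/used/free bytes of the filesystem holding `path` (the whole volume on NFS)."""
    return shutil.disk_usage(path)


def human(n: float) -> str:
    """Format a byte count using binary units."""
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"
```

For example, `human(directory_size_bytes(os.path.expanduser("~")))` reports a user's own footprint, while `largest_subdirectories("/path/to/shared/home/root")` (a placeholder path) would show which homes dominate the shared volume. Walking a large NFS tree can be slow, so an admin would likely run this occasionally rather than continuously.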

Updates and actions

No response

@choldgraf
Member

Definitely agreed: this is a missing piece of our "estimating costs, and considerations for costs" docs. We started with some improvements on that in #132, but that PR is missing much detail on user data (and data cost considerations in general).

@yuvipanda does the DataHub document this anywhere? If so maybe we could start with that as a base and modify as needed for the 2i2c docs? Then we can continue to improve it over time.

Or, if somebody wants to throw down a few quick bullet points, I am happy to turn them into docs.
