Context

Users coming from laptops are used to knowing how much space is on their hard disk. Users coming from HPC environments are used to having quotas on their home directory size. Users on cloud JupyterHubs are confused about how much storage space they have in their home directories. This issue is not addressed by the 2i2c docs on data management.
In a similar vein, hub owners may not understand the technical implementation and economic implications of data storage in home directories. For example, the LEAP executive committee is very concerned about data stored in user home directories becoming a cost liability. I have been trying to argue that this is not a major concern, but I don't have much evidence or documentation to back up this claim. My understanding, likely incorrect / incomplete, is that 2i2c home directories are stored on an NFS shared volume which is backed by a persistent disk. I do not know the following:
whether the persistent disk is fixed-size or elastic
what happens if the disk fills up (this caused Pangeo hubs to crash in the past)
how much this costs relative to the total hub cost
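As one concrete thing the docs could cover: a user (or hub admin) can get a rough picture of the shared volume's capacity and of their own usage from a terminal or notebook on the hub. This is only a minimal sketch, assuming the home directory sits on a mounted shared volume; the actual mount path, capacity, and any quota behavior depend on how each hub is configured:

```python
import shutil
from pathlib import Path

home = Path.home()

# Capacity and free space of the filesystem that backs the home directory.
# On a shared NFS volume this describes the whole volume, not a per-user quota.
usage = shutil.disk_usage(home)
print(f"volume total: {usage.total / 1e9:.1f} GB")
print(f"volume free:  {usage.free / 1e9:.1f} GB")

# Rough size of this user's own files (can be slow if there are many files).
my_bytes = sum(p.stat().st_size for p in home.rglob("*") if p.is_file())
print(f"my usage:     {my_bytes / 1e9:.1f} GB")
```

Note that on a shared volume the total/free numbers describe the space shared by all users, not a per-user allocation, which is exactly the distinction the docs should spell out.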
Proposal

We should augment the documentation at https://docs.2i2c.org/en/latest/admin/howto/data.html to address these issues. We should seek to provide answers to the following questions:

What tools exist for moving data in and out of the hub?
Do quotas exist at a user level?
Is there a cap on total size of home directories on each hub?
If quotas are not enforced, how should communities manage this shared resource? How can we prevent a single user from abusing the shared NFS volume by storing a large amount of data?
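If quotas turn out not to be enforced, one interim answer to the last question above is a periodic per-user usage report that the community can act on socially. This is a minimal sketch, assuming an admin can see all user directories under a single mount such as /home; the real layout on 2i2c hubs, and whatever tooling may already exist for this, would need to be confirmed:

```python
import subprocess
from pathlib import Path

# Hypothetical layout: one directory per user under a shared NFS mount.
# The actual mount point and layout on a 2i2c hub may differ.
HOME_ROOT = Path("/home")
SOFT_LIMIT_GB = 50  # example community-agreed soft limit, not enforced anywhere

def per_user_usage_gb(root: Path) -> dict[str, float]:
    """Return approximate usage in GB for each user directory under root."""
    usage = {}
    for user_dir in sorted(root.iterdir()):
        if not user_dir.is_dir():
            continue
        # `du -s --block-size=1` prints the directory's total size in bytes.
        result = subprocess.run(
            ["du", "-s", "--block-size=1", str(user_dir)],
            capture_output=True, text=True, check=True,
        )
        usage[user_dir.name] = int(result.stdout.split()[0]) / 1e9
    return usage

if __name__ == "__main__":
    report = sorted(per_user_usage_gb(HOME_ROOT).items(), key=lambda kv: -kv[1])
    for user, gb in report:
        flag = "  <-- over soft limit" if gb > SOFT_LIMIT_GB else ""
        print(f"{user:20s}{gb:8.1f} GB{flag}")
```

A report like this could run as a scheduled job and be posted somewhere the community can see it; that is weaker than hard quotas, but it at least makes heavy usage visible.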
Updates and actions
No response
Definitely agreed - this is a missing piece of our "estimating costs, and considerations for costs" docs. We started with some improvements on that in #132 but that PR is missing much detail on user data (and data cost considerations in general).
@yuvipanda does the DataHub document this anywhere? If so, maybe we could start with that as a base and modify as needed for the 2i2c docs? Then we can continue to improve it over time.
Or, if somebody wants to throw down a few quick bullet points, I am happy to turn them into docs.