From 9d09298f640251853638bbe0aafd3853bf9a766a Mon Sep 17 00:00:00 2001
From: Elizabeth DuPre
Date: Wed, 12 Jun 2024 15:37:26 -0400
Subject: [PATCH] Address @effigies review comments

---
 _quarto.yml                                   |  1 -
 .../sherlock/access-and-resources.md          | 64 +++++++------------
 .../computing/sherlock/data-management.md     | 36 +++++++++--
 labguide/computing/tacc.md                    |  0
 4 files changed, 53 insertions(+), 48 deletions(-)
 delete mode 100644 labguide/computing/tacc.md

diff --git a/_quarto.yml b/_quarto.yml
index c9bc9f3..f7ad0d5 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -46,7 +46,6 @@ website:
           - labguide/computing/sherlock/access-and-resources.md
           - labguide/computing/sherlock/job-submission.md
           - labguide/computing/sherlock/data-management.md
-          #- labguide/computing/tacc.md
       - section: "Research practices"
         contents:
           - labguide/research/human_subjects.md
diff --git a/labguide/computing/sherlock/access-and-resources.md b/labguide/computing/sherlock/access-and-resources.md
index 04b1532..6ed9aee 100644
--- a/labguide/computing/sherlock/access-and-resources.md
+++ b/labguide/computing/sherlock/access-and-resources.md
@@ -1,12 +1,17 @@
 ## Access and resources
 
+This section describes getting initial access to Sherlock and monitoring available resources.
+
 ### Compute basics
 
 #### Acquiring an account and logging in
 
-If you are a new member of a lab at stanford, you will need to have your PI email Sherlock's support to get your SUNet account configured for use with computing resources. See the [Sherlock getting started guide](https://www.sherlock.stanford.edu/docs/getting-started/#prerequisites) for details.
+If you are a new member of a lab at stanford, you will need to have your PI email Sherlock's support to get your SUNet account configured for use with computing resources.
+See the [Sherlock getting started guide](https://www.sherlock.stanford.edu/docs/getting-started/#prerequisites) for details.
 
-Once you have an account set up with your SUNet ID ``, you can access Sherlock via any SSH client client. If you are using a UNIX-like system (e.g., MacOS) and you are using terminal to connect to sherlock, a useful resource is to set up an ssh config file. You can do this by editing or creating the file `~/.ssh/config`, and adding the following lines:
+Once you have an account set up with your SUNet ID ``, you can access Sherlock via any SSH client client.
+If you are using a UNIX-like system (e.g., MacOS) and you are using terminal to connect to sherlock, a useful resource is to set up an ssh config file.
+You can do this by editing or creating the file `~/.ssh/config`, and adding the following lines:
 
 ```{.default filename="~/.ssh/config"}
 Host sherlock
@@ -25,11 +30,15 @@ and then follow the remainder of the instructions [Sherlock connection guide](ht
 
 #### Storage Monitoring
 
-The Stanford filesystems have fixed allocations for individuals and groups. As such, it will be useful for you to be able to determine how much space you/the group have, so that you can optimally manage your resources. For extended details on storage with Sherlock, check out [Sherlock storage guide](https://www.sherlock.stanford.edu/docs/storage/#quotas-and-limits).
+The Stanford filesystems have fixed allocations for individuals and groups.
+As such, it will be useful for you to be able to determine how much space you/the group have, so that you can optimally manage your resources.
+For extended details on storage with Sherlock, check out [Sherlock storage guide](https://www.sherlock.stanford.edu/docs/storage/#quotas-and-limits).
 
-There are several commands that we find extremely useful for working on Sherlock. We will go over several of them.
+There are several commands that we find extremely useful for working on Sherlock.
+We will go over several of them.
 
-Sherlock has fixed allocations for the storage of individuals and groups. As such, you will be required to properly manage your storage allocations, re-allocating data to group-level directories as necessary.
+Sherlock has fixed allocations for the storage of individuals and groups.
+As such, you will be required to properly manage your storage allocations, re-allocating data to group-level directories as necessary.
 
 To check your quotas for your group ``, you can use the `sh_quota` command:
 
@@ -48,50 +57,25 @@ $ sh_quota
 +---------------------------------------------------------------------------+
 ```
 
-When your home directory begins to get filled, it may be valuable to consider moving files to scratch directories, or group directories. `HOME`, `GROUP_HOME`, and `OAK` are persistent storage; `*SCRATCH` directories are subject to purging.
+`sh_quota` is a Sherlock-specific command that provides a general overview for all partitions a user has access to.
+[Documentation is provided on their wiki](https://www.sherlock.stanford.edu/docs/storage/?h=sh_quota#checking-quotas).
+When your home directory begins to get filled, it may be valuable to consider moving files to scratch directories, or group directories.
+`HOME`, `GROUP_HOME`, and `OAK` are persistent storage; `*SCRATCH` directories are subject to purging.
 
-Another useful tool is the disk usage command `du`. A useful and more interacrtive version of this command is `ncdu`. To use `ncdu`, add the following line to the bottom of your `~/.bash_profile`, which will load the `ncdu` module each time you log in to Sherlock:
+Another useful tool is the disk usage command `du`.
+A useful and more interacrtive version of this command is `ncdu`.
+To use `ncdu`, add the following line to the bottom of your `~/.bash_profile`, which will load the `ncdu` module each time you log in to Sherlock:
 
 ```bash
 $ ml system ncdu
 ```
 
-Next, re log-in and access the `ncdu` command:
+In future login session, you can access the `ncdu` command via
 
 ```bash
 $ ncdu
 ```
 
 which will launch an interactive window for monitoring directory sizes from the folder specified by ``.
-
-### Data access
-
-#### Restricting access
-
-Some data resources cannot be shared across the lab and instead need to be restricted to lab members with Data Usage Agreement (DUA) access.
-The following can be adapted to restrict ACLs (access control list) to only the appropriate subset of lab members:
-
-```{.bash filename="protect_access.sh"}
-#!/bin/bash
-
-echo "Using ACLs to restrict folder access on oak for russpold folders"
-echo -e "\t https://www.sherlock.stanford.edu/docs/storage/data-sharing/#posix-acls "
-sleep 1
-echo
-# get user input for directory + user
-read -p "Enter the folder path: " dir_name
-if [ ! -d "$dir_name" ]; then
-    echo "Error: ${dir_name} doesn't exist"
-    exit 1
-fi
-
-read -p "Enter the username: " user_name
-
-# set restrictions
-echo -e "Setting restrictions for ${user_name} as rxw for folder: /n ${dir_name}"
-setfacl -R -m u:$user_name:rwx $dir_name
-setfacl -R -d -m u:$user_name:rwx $dir_name
-
-# rm default permissions for the group -- oak_russpold
-setfacl -m d::group:oak_russpold:--- $dir_name
-```
+Sherlock [recommends running it in an interactive job](https://www.sherlock.stanford.edu/docs/storage/?h=ncdu#locating-large-directories).
+This can be useful when identifying where quota usage is being allocated.
diff --git a/labguide/computing/sherlock/data-management.md b/labguide/computing/sherlock/data-management.md
index 66e79aa..ea7520e 100644
--- a/labguide/computing/sherlock/data-management.md
+++ b/labguide/computing/sherlock/data-management.md
@@ -1,5 +1,7 @@
 ## Data management on Sherlock
 
+This section describes how the lab manages datasets on Sherlock, including setting permissions (i.e., who else in the lab can access the dataset).
+
 Datasets that are considered to be common lab assets (which includes any new studies within the lab and any openly shared datasets) should be placed into the primary data directory on the relevant filesystem.
 Datasets that are in process of acquisition should go into the “inprocess” directory.
 Once the dataset is finalized, it should be moved into the “data” directory.
@@ -13,12 +15,32 @@ find -type f -exec chmod 440
 
 Datasets that are temporary, or files generated for analyses that are not intended to be reused or shared, should be placed within the user directory.
 
-### Checking current quota limits
+#### Restricting access
+
+Some data resources cannot be shared across the lab and instead need to be restricted to lab members with Data Usage Agreement (DUA) access.
+The following can be adapted to restrict ACLs (access control list) to only the appropriate subset of lab members:
+
+```{.bash filename="protect_access.sh"}
+#!/bin/bash
+
+echo "Using ACLs to restrict folder access on oak for russpold folders"
+echo -e "\t https://www.sherlock.stanford.edu/docs/storage/data-sharing/#posix-acls "
+sleep 1
+echo
+# get user input for directory + user
+read -p "Enter the folder path: " dir_name
+if [ ! -d "$dir_name" ]; then
+    echo "Error: ${dir_name} doesn't exist"
+    exit 1
+fi
+
+read -p "Enter the username: " user_name
 
-There are several useful commands for assessing current storage usage on Sherlock:
+# set restrictions
+echo -e "Setting restrictions for ${user_name} as rxw for folder: /n ${dir_name}"
+setfacl -R -m u:$user_name:rwx $dir_name
+setfacl -R -d -m u:$user_name:rwx $dir_name
 
-- `sh_quota` is a Sherlock-specific command that provides a general overview for all partitions a user has access to.
-  [Documentation is provided on their wiki](https://www.sherlock.stanford.edu/docs/storage/?h=sh_quota#checking-quotas).
-- `ncdu` provides an interactive explorer for current storage usage within a given directory.
-  This can be useful when identifying where quota usage is being allocated.
-  Sherlock [recommends running it in an interactive job](https://www.sherlock.stanford.edu/docs/storage/?h=ncdu#locating-large-directories).
+# rm default permissions for the group -- oak_russpold
+setfacl -m d::group:oak_russpold:--- $dir_name
+```
\ No newline at end of file
diff --git a/labguide/computing/tacc.md b/labguide/computing/tacc.md
deleted file mode 100644
index e69de29..0000000