Doc updates #800

Draft: wants to merge 8 commits into master
11 changes: 5 additions & 6 deletions ocfweb/docs/docs/services/hpc.md
@@ -39,16 +39,15 @@ where you can ask questions and talk to us about anything HPC.

## The Cluster

As of Fall 2018, the OCF HPC cluster is composed of one server, with the
As of Spring 2023, the OCF HPC cluster is composed of one server, with the
following specifications:

* 2 Intel Xeon [E5-2640v4][corruption-cpu] CPUs (10c/20t @ 2.4GHz)
* 4 NVIDIA 1080Ti GPUs
* 256GB ECC DDR4-2400 RAM
* 2 Intel Xeon Platinum [8352Y][corruption-cpu] CPUs (32c/64t @ 2.2GHz)
* 4 NVIDIA RTX A6000 GPUs
* 256GB ECC DDR4-3200 RAM

We have plans to expand the cluster with additional nodes of comparable
specifications as funding becomes available. The current hardware was
generously funded by a series of grants from the [Student Tech Fund][stf].
specifications as funding becomes available.

## Slurm

2 changes: 1 addition & 1 deletion ocfweb/docs/docs/services/web/php.md
@@ -1,6 +1,6 @@
[[!meta title="PHP"]]

`death`, the OCF webserver, currently runs PHP 7.0 with the following
`death`, the OCF webserver, currently runs PHP 7.4 with the following
non-standard packages installed:

* [APCu](https://www.php.net/manual/en/book.apcu.php) (opcode caching)
134 changes: 53 additions & 81 deletions ocfweb/docs/docs/staff/backend/backups.md
@@ -1,89 +1,60 @@
[[!meta title="Backups"]]
## Backup Storage

We currently store our on-site backups across a couple drives on `hal`:
We currently store our on-site backups on a ZFS mirror (RAID 1) on `hal`:

* `hal:/opt/backups` (6 TiB usable; 2x 6-TiB Seagate drives in RAID 1 in an LVM
volume group)
* `hal:/backup` (16 TB usable; 2x 16 TB WD drives in ZFS mirror)

This volume group provides `/dev/vg-backups/backups-live` which contains
recent daily, weekly, and monthly backups, and
`/dev/vg-backups/backups-scratch`, which is scratch space for holding
compressed and encrypted backups which we then upload to off-site storage.
Backups are stored as ZFS snapshots. ZFS snapshots have the advantage of being
immutable and browsable, and they can be sent to other ZFS pools for off-site
backups.
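
For a quick look at the health of the mirror and the datasets it holds, a
couple of standard ZFS commands are enough. This is a sketch that assumes the
pool is named `backup`, as the dataset paths elsewhere on this page suggest:

```
# Check the state of the mirror and list the backup datasets.
zpool status backup
zfs list -r backup/encrypted
```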

## Off-Site Backups

Our main off-site backup location is [Box][box]. Students automatically get an
"unlimited" plan, so it provides a nice and free location to store encrypted
backups. We currently have a weekly cronjob that [makes an encrypted
backup][create-encrypted-backup] using GPG keys and then [uploads it to
Box.com][upload-to-box]. This takes about 20 hours combined to make and upload,
and will probably take even longer in the future as backups grow. An email is
sent out once the backup files are uploaded, and the link provided is shared
with only OCF officers to make sure the backups are kept as secure as possible,
since they contain all of the OCF's important data. The backups are already
encrypted, but it doesn't hurt to add a little extra security to that.

### Retention

Off-site backups older than six months (180 days) are permanently deleted by a
[daily cronjob][prune-old-backups].
Todo: new off-site backup documentation.

## Restoring Backups

The easiest way to restore from a backup is to look at how it is made and
reverse it. If it is a directory specified in rsnapshot, then likely all that
needs to be done is to take that directory from the backup and put it onto the
server to restore onto. Some backups, such as mysql, ldap, and kerberos are
more complicated, and need to be restored using `mysqlimport` or `ldapadd` for
server to restore onto. Some backups, such as MySQL, LDAP, and Kerberos, are more
complicated and need to be restored using `mysqlimport` or `ldapadd`, for
instance.

### Onsite

Onsite backups are pretty simple, all that needs to be done is to go to `hal`
and find the backup to restore from in `/opt/backups/live`. All backups of
recent data are found in either `rsnapshot` (for daily backups) or `misc` (for
any incidents or one-off backups). Within `rsnapshot`, the backups are
organized into directories dependings on how long ago the backup was made. To
see when each backup was created just use `ls -l` to show the last modified
time of each directory.
Compared to the old setup, onsite backups are a little harder to find. They are
located at `/backup/encrypted/rsnapshot` on `hal`. In addition, we have a
dataset for each top-level user directory, such as `/home/a/`, which is stored
as the `backup/encrypted/rsnapshot/.sync/nfs/opt/homes/home/a` dataset.

The ZFS snapshots are stored in the `.zfs/snapshot` directory of each dataset.
The `.zfs` folder is hidden and will not show up even with the `ls -a` command,
so you will need to manually `cd` into the directory. The snapshots are
timestamped, so you can find the snapshot you want to restore from by looking at
the date string in the snapshot name. For example, if you wanted to restore the
`public_html` directory of user `foo` with the backup from 2023-05-01, you
should enter the
```
/backup/encrypted/rsnapshot/.sync/nfs/opt/homes/services/http/users/f/.zfs
```
directory, and then go inside the `snapshot/` folder. From there, you enter the
`zfs-auto-snap_after-backup-2023-05-01-1133/` directory (note that the time is
UTC), and then you can copy the `foo/` directory to the user's home directory.
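
Putting that together, a restore of this kind might look roughly like the
following. The snapshot name and destination are illustrative, not a real
backup:

```
# Illustrative sketch of the walkthrough above; adjust names and dates.
cd /backup/encrypted/rsnapshot/.sync/nfs/opt/homes/services/http/users/f/.zfs
cd snapshot/zfs-auto-snap_after-backup-2023-05-01-1133/
# Copy the user's web directory out of the read-only snapshot.
cp -a foo /backup/encrypted/scratch/
```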

For large directories, please use `/backup/encrypted/scratch` as a temporary
working area for compressing the files and other operations. Please note that
this dataset will not be automatically snapshotted.
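
For example, compressing a restored directory in the scratch area might look
like this (paths are illustrative):

```
# Illustrative: compress a restored directory in scratch before moving it.
cd /backup/encrypted/scratch
tar -czf foo-public_html-restore.tar.gz foo
```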

MySQL backups are stored in the `/backup/encrypted/rsnapshot/mysql/` directory,
and the snapshots can be accessed at
`/backup/encrypted/rsnapshot/mysql/.zfs/snapshot/`. Inside a snapshot, the
individual databases are stored as `.sql` files inside the `.sync/` directory.
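
As a sketch, restoring a single database from one of these dumps could look
like this (the snapshot and database names are placeholders):

```
# Hypothetical example: restore one database from a snapshotted dump.
cd /backup/encrypted/rsnapshot/mysql/.zfs/snapshot/<snapshot-name>/.sync/
# Create the database first if the dump does not include CREATE DATABASE.
mysql -u root -p <database-name> < <database-name>.sql
```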

### Offsite

Offsite backups are more complicated because the backup files first need to be
downloaded, stuck together into a single file, decrypted, extracted, and then
put into LVM to get back the whole backup archive that would normally be found
onsite. This essentially just means that the
[create-encrypted-backup][create-encrypted-backup] script needs to be reversed
to restore once the backup files are downloaded. Here are the general steps to
take to restore from an offsite backup:

1. Download all the backup pieces from Box.com. This is generally easiest with
a command line tool like `cadaver`, which can just use a `mget *` to download
all the files (albeit sequentially). If more speed is needed, open multiple
`cadaver` connections and download multiple groups of files at once.

2. Put together all the backup pieces into a single file. This can be done by
running `cat <backup>.img.gz.gpg.part* > <backup>.img.gz.gpg`.

3. Decrypt the backup using `gpg`. This requires your key pair to be imported
into `gpg` first using `gpg --import public_key.gpg` and
`gpg --allow-secret-key-import --import private_key.gpg`, then you can
decrypt the backup with
`gpg --output <backup>.img.gz --decrypt <backup>.img.gz.gpg`. Be careful to
keep your private key secure by setting good permissions on it so that nobody
else can read it, and delete it after the backup is imported. The keys can be
deleted with `gpg --delete-secret-keys "<Name>"` and
`gpg --delete-key "<Name>"`, where your name is whatever name it shows when
you run `gpg --list-keys`.

4. Extract the backup with `gunzip <backup>.img.gz`.

5. Put the backup image into a LVM logical volume. First find the size that the
volume should be by running `ls -l <backup>.img`, and copy the number of
bytes that outputs. Then create the LV with
`sudo lvcreate -L <bytes>B -n <name> /dev/<volume group>` where the volume
group has enough space to store the entire backup (2+ TiB).
Todo: add instructions for restoring offsite backups using zfs send/receive.
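
In the meantime, a minimal sketch of what a `zfs send`/`receive` restore could
look like follows. The host, pool, and snapshot names here are all placeholders,
not our actual setup:

```
# Hypothetical: pull one dataset's snapshot back from an off-site ZFS pool.
ssh offsite-host "zfs send -v offsite-pool/rsnapshot@2023-05-01" \
  | sudo zfs receive -u backup/encrypted/restored-rsnapshot
```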

## Backup Contents

@@ -93,33 +64,34 @@ Backups currently include:
* User home and web directories
* Cronjobs on supported servers (tsunami, supernova, biohazard, etc.)
* MySQL databases (including user databases, stats, RT, print quotas, IRC data)
* Everything on GitHub (probably very unnecessary)
* A few OCF repositories on GitHub (probably very unnecessary)
* LDAP and Kerberos data
* A [smattering of random files on random servers][backed-up-files]

## Backup Procedures

Backups are currently made daily via a cronjob on `hal` which calls `rsnapshot`.
The current settings are to retain 7 daily backups, 4 weekly backups, and 6
monthly backups, but we might adjust this as it takes more space or we get
larger backup drives.
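
The corresponding retention settings in `rsnapshot.conf` look roughly like this
(a sketch only; the authoritative config lives in our puppet repo):

```
# rsnapshot.conf excerpt (sketch); fields must be tab-separated.
retain	daily	7
retain	weekly	4
retain	monthly	6
```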

We use `rsnapshot` to make incremental backups. Typically, each new backup
takes an additional ~3GiB of space (but this will vary based on how many files
actually changed). A full backup is about ~2TiB of space and growing.
We use `rsnapshot` and ZFS snapshots to make incremental backups. Typically,
each new backup takes an additional ~20 GiB of space (but this will vary based
on how many files actually changed). A full backup is about 4 TiB and growing.

(The incremental file backups are only about ~300 MiB, but since mysqldump
files can't be incrementally backed up, those take a whole ~2 GiB each time, so
the total backup grows by ~3GiB each time. However, an old backup is discarded
each time too, so it approximately breaks even.)
(The incremental file backups are only ~1-5 GiB, but since MySQL and Postgres
dumps can't be incrementally backed up, those take a whole ~15 GiB each time, so
the total backup grows by ~20 GiB each time.)
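
To see how much space recent snapshots actually consume, something like the
following works (the dataset name follows the layout described above):

```
# List snapshots of the rsnapshot dataset with their space usage.
zfs list -d 1 -t snapshot -o name,used,creation -s creation \
  backup/encrypted/rsnapshot
```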

## Ideas for backup improvements

1. Automate backup testing: set up some system for periodically checking that
backups can be restored, whether they are offsite or onsite.

[box]: https://www.box.com
[create-encrypted-backup]: https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/create-encrypted-backup
[upload-to-box]: https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/upload-to-box
[backed-up-files]: https://github.com/ocf/puppet/blob/17bc94b395e254529d97c84fb044f76931439fd7/modules/ocf_backups/files/rsnapshot.conf#L53
[prune-old-backups]: https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/prune-old-backups
[rsyncnet]: https://www.rsync.net
[create-encrypted-backup]:
https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/create-encrypted-backup
[upload-to-box]:
https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/upload-to-box
[backed-up-files]:
https://github.com/ocf/puppet/blob/17bc94b395e254529d97c84fb044f76931439fd7/modules/ocf_backups/files/rsnapshot.conf#L53
[prune-old-backups]:
https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/prune-old-backups
61 changes: 30 additions & 31 deletions ocfweb/docs/docs/staff/backend/firewall.md
@@ -3,50 +3,49 @@
We use a Palo Alto Networks (PAN) firewall provided by IST. We have one network
port in the server room which is activated and behind the firewall; we have
another network port activated in the lab behind the television which is also
behind the firewall. All the ports the desktops use are also behind the
firewall since they are routed through the switch in the server room.
behind the firewall. All the ports the desktops use are also behind the firewall
since they are routed through the switch in the server room.

## Administering the firewall

### Accessing the interface

Administration of the firewall is done through the [web interface][panorama],
and must be done from an on-campus IP address (for instance through the
[library VPN][library-vpn] or SOCKS proxying through an OCF host). **Remember
to specify https when loading the firewall admin page**, as it does not have a
redirect from http to https. If you are having connection issues with the
firewall admin page loading indefinitely, it is likely because you are trying
to use http or trying to access it from an off-campus IP. To quickly set up a
SOCKS proxy, run `ssh -D 8000 -N supernova` from any off-campus host and then
set up the SOCKS proxy (through your OS or through your browser's settings) to
use the proxy on `localhost` and port `8000`.
and must be done from an on-campus IP address (for instance through the [library
VPN][library-vpn] or SOCKS proxying through an OCF host). **Remember to specify
https when loading the firewall admin page**, as it does not have a redirect
from http to https. If you are having connection issues with the firewall admin
page loading indefinitely, it is likely because you are trying to use http or
trying to access it from an off-campus IP. To quickly set up a SOCKS proxy, run
`ssh -D 8000 -N supernova` from any off-campus host and then set up the SOCKS
proxy (through your OS or through your browser's settings) to use the proxy on
`localhost` and port `8000`.

[panorama]: https://panorama.net.berkeley.edu
[library-vpn]: https://www.lib.berkeley.edu/using-the-libraries/vpn
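
For a quick end-to-end check that the proxy is working, something like the
following (using `curl` instead of a browser) can confirm the tunnel reaches the
admin page:

```
# Terminal 1: open the SOCKS proxy (leave this running).
ssh -D 8000 -N supernova

# Terminal 2: fetch the admin page headers through the proxy.
curl --socks5-hostname localhost:8000 -I https://panorama.net.berkeley.edu
```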

To sign in to administer the firewall, make sure to use the single sign-on
(SSO) option, and it will ask for CalNet authentication.
To sign in to administer the firewall, make sure to use the single sign-on (SSO)
option, and it will ask for CalNet authentication.

### Policies

All our current policies are located in the "Pre Rules" section under
"Security" in the policies tab. This option should be right at the top in the
box on the left side of the page. It contains all our rules since we are only
blocking traffic (either outgoing or incoming) before it goes through the
firewall, so all we need are pre rules.

In general the interface is pretty self-explanatory. Each rule has a custom
name and a description that describes what kind of traffic it should be
blocking or letting through, as well as the source and destination addresses
(or groups of addresses), application (identified by the firewall), service
(port), and whether it is allowed or blocked. Each rule has a dropdown next to
the rule name if you hover over it that leads to the log viewer, where you can
see what kind of traffic matched each rule and when the traffic was
allowed/blocked.

Any changes made to the firewall policies need to be committed and pushed to
the firewall using the commit button and then the push button (or the commit
and push button to do it in one step) located in the top right.
All our current policies are located in the "Pre Rules" section under "Security"
in the policies tab. This option should be right at the top in the box on the
left side of the page. It contains all our rules since we are only blocking
traffic (either outgoing or incoming) before it goes through the firewall, so
all we need are pre rules.

In general, the interface is pretty self-explanatory. Each rule has a custom
name and a description of what kind of traffic it should be blocking or letting
through, as well as the source and destination addresses (or groups of
addresses), application (identified by the firewall), service (port), and
whether it is allowed or blocked. Hovering over a rule name reveals a dropdown
that leads to the log viewer, where you can see what kind of traffic matched
each rule and when it was allowed or blocked.

Any changes made to the firewall policies need to be committed and pushed to the
firewall using the commit button and then the push button (or the commit and
push button to do it in one step) located in the top right.

### Syslog

38 changes: 19 additions & 19 deletions ocfweb/docs/docs/staff/backend/git.md
@@ -8,29 +8,28 @@ distributed).
## Workflow

Although Git is a great tool for large-scale distributed development, for us a
Subversion-like workflow with a "central repository" (where you clone/fetch
from and push to) and linear history makes more sense. The instructions below
assume that development is happening in a single branch.
Subversion-like workflow with a "central repository" (where you clone/fetch from
and push to) and linear history makes more sense. The instructions below assume
that development is happening in a single branch.

**Only commit your own, original work**. You may commit another staff member's
work if you have permission and change the author appropriately (e.g.,
`--author="Guest User <[email protected]>"`). When committing, `git config
user.name` should be your name and `git config user.email` should be your OCF
email address -- this should be taken care of by [[LDAP|doc
staff/backend/ldap]] and `/etc/mailname` on OCF machines.
email address -- this should be taken care of by [[LDAP|doc staff/backend/ldap]]
and `/etc/mailname` on OCF machines.

### To "update"

Get the latest commits from the central repository and update your working
tree.
Get the latest commits from the central repository and update your working tree.

git pull --rebase

This will `git fetch` (update your local copy of the remote repository) and
`git rebase` (rewrite current branch in terms of tracked branch). The rebase
prevents unnecessary merge commits by moving your local commits on top of the
latest remote commit (`FETCH_HEAD`). This is a good thing if you have any local
commits which have not yet been pushed to the remote repository.
This will `git fetch` (update your local copy of the remote repository) and `git
rebase` (rewrite current branch in terms of tracked branch). The rebase prevents
unnecessary merge commits by moving your local commits on top of the latest
remote commit (`FETCH_HEAD`). This is a good thing if you have any local commits
which have not yet been pushed to the remote repository.

If you have "dirty" uncommitted changes, you'll need to commit them or stash
them before rebasing (`git stash`).
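
For example, a typical sequence when you have local uncommitted changes is:

    git stash            # set local changes aside
    git pull --rebase    # update from the central repository
    git stash pop        # reapply your changes on top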
@@ -104,8 +103,8 @@ Advanced:
* line of changes in a repository, default branch is `master`
* fast-forward
* advance branch forward in a linear sequence
* this is usually what we want: the new commit builds directly on the
previous commit
* this is usually what we want: the new commit builds directly on the previous
commit
* hooks
* optional scripts that can be executed during git operations
* for example, validate syntax before accepting a commit or deploy code to a
@@ -114,20 +113,21 @@ Advanced:
* files that are ready to be stored in your next commit
* references (aka refs)
* SHA-1 hashes that identify commits
* `HEAD` points to the latest commit ref in the current branch (`HEAD^` to
the one before it)
* `HEAD` points to the latest commit ref in the current branch (`HEAD^` to the
one before it)
* remote
* upstream repository that you can `git fetch` from or `git push` to, default
is `origin`
* local branches can "track" remote branches (e.g., `master` tracking
`origin/master`)
* working tree (aka workspace or working directory)
* directory that checked out files reside
* this includes the current branch and any "dirty" uncommitted changes
(staged or not)
* this includes the current branch and any "dirty" uncommitted changes (staged
or not)

## Recommended reading

* [A Visual Git Reference](https://marklodato.github.io/visual-git-guide/)
* [Git Immersion](http://www.gitimmersion.com/)
* [The Case for Git Rebase](http://darwinweb.net/articles/the-case-for-git-rebase)
* [The Case for Git
Rebase](http://darwinweb.net/articles/the-case-for-git-rebase)