Doc updates #800

Draft: wants to merge 8 commits into master
11 changes: 5 additions & 6 deletions ocfweb/docs/docs/services/hpc.md
@@ -39,16 +39,15 @@ where you can ask questions and talk to us about anything HPC.

## The Cluster

As of Fall 2018, the OCF HPC cluster is composed of one server, with the
As of Spring 2023, the OCF HPC cluster is composed of one server, with the
following specifications:

* 2 Intel Xeon [E5-2640v4][corruption-cpu] CPUs (10c/20t @ 2.4GHz)
* 4 NVIDIA 1080Ti GPUs
* 256GB ECC DDR4-2400 RAM
* 2 Intel Xeon Platinum [8352Y][corruption-cpu] CPUs (32c/64t @ 2.2GHz)
* 4 NVIDIA RTX A6000 GPUs
* 256GB ECC DDR4-3200 RAM

We have plans to expand the cluster with additional nodes of comparable
specifications as funding becomes available. The current hardware was
generously funded by a series of grants from the [Student Tech Fund][stf].
specifications as funding becomes available.

## Slurm

2 changes: 1 addition & 1 deletion ocfweb/docs/docs/services/web/php.md
@@ -1,6 +1,6 @@
[[!meta title="PHP"]]

`death`, the OCF webserver, currently runs PHP 7.0 with the following
`death`, the OCF webserver, currently runs PHP 7.4 with the following
non-standard packages installed:

* [APCu](https://www.php.net/manual/en/book.apcu.php) (opcode caching)
134 changes: 53 additions & 81 deletions ocfweb/docs/docs/staff/backend/backups.md
@@ -1,89 +1,60 @@
[[!meta title="Backups"]]
## Backup Storage

We currently store our on-site backups across a couple drives on `hal`:
We currently store our on-site backups on a ZFS mirror (RAID 1) on `hal`:

* `hal:/opt/backups` (6 TiB usable; 2x 6-TiB Seagate drives in RAID 1 in an LVM
volume group)
* `hal:/backup` (16 TB usable; 2x 16 TB WD drives in ZFS mirror)

This volume group provides `/dev/vg-backups/backups-live` which contains
recent daily, weekly, and monthly backups, and
`/dev/vg-backups/backups-scratch`, which is scratch space for holding
compressed and encrypted backups which we then upload to off-site storage.
Backups are stored as ZFS snapshots. ZFS snapshots have the advantage of being
immutable and browsable, and they can be sent to other ZFS pools for off-site
backups.
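
For a quick look at the health of the mirror and the datasets it holds, a
couple of standard ZFS commands are enough. This is a sketch that assumes the
pool is named `backup`, as the dataset paths elsewhere on this page suggest:

```
# Check the state of the mirror and list the backup datasets.
zpool status backup
zfs list -r backup/encrypted
```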

## Off-Site Backups

Our main off-site backup location is [Box][box]. Students automatically get an
"unlimited" plan, so it provides a nice and free location to store encrypted
backups. We currently have a weekly cronjob that [makes an encrypted
backup][create-encrypted-backup] using GPG keys and then [uploads it to
Box.com][upload-to-box]. This takes about 20 hours combined to make and upload,
and will probably take even longer in the future as backups grow. An email is
sent out once the backup files are uploaded, and the link provided is shared
with only OCF officers to make sure the backups are kept as secure as possible,
since they contain all of the OCF's important data. The backups are already
encrypted, but it doesn't hurt to add a little extra security to that.

### Retention

Off-site backups older than six months (180 days) are permanently deleted by a
[daily cronjob][prune-old-backups].
Todo: new off-site backup documentation.

## Restoring Backups

The easiest way to restore from a backup is to look at how it is made and
reverse it. If it is a directory specified in rsnapshot, then likely all that
needs to be done is to take that directory from the backup and put it onto the
server to restore onto. Some backups, such as mysql, ldap, and kerberos are
more complicated, and need to be restored using `mysqlimport` or `ldapadd` for
server to restore onto. Some backups, such as MySQL, LDAP, and Kerberos, are more
complicated and need to be restored using `mysqlimport` or `ldapadd`, for
instance.

### Onsite

Onsite backups are pretty simple, all that needs to be done is to go to `hal`
and find the backup to restore from in `/opt/backups/live`. All backups of
recent data are found in either `rsnapshot` (for daily backups) or `misc` (for
any incidents or one-off backups). Within `rsnapshot`, the backups are
organized into directories dependings on how long ago the backup was made. To
see when each backup was created just use `ls -l` to show the last modified
time of each directory.
Compared to the old setup, onsite backups are a little harder to find. They are
located at `/backup/encrypted/rsnapshot` on `hal`. In addition, we have a
dataset for each top-level user directory, such as `/home/a/`, which is stored
as the `backup/encrypted/rsnapshot/.sync/nfs/opt/homes/home/a` dataset.

The ZFS snapshots are stored in the `.zfs/snapshot` directory of each dataset.
The `.zfs` folder is hidden and will not show up even with the `ls -a` command,
so you will need to manually `cd` into the directory. The snapshots are
timestamped, so you can find the snapshot you want to restore from by looking at
the date string in the snapshot name. For example, if you wanted to restore the
`public_html` directory of user `foo` with the backup from 2023-05-01, you
should enter the
```
/backup/encrypted/rsnapshot/.sync/nfs/opt/homes/services/http/users/f/.zfs
```
directory, and then go inside the `snapshot/` folder. From there, you enter the
`zfs-auto-snap_after-backup-2023-05-01-1133/` directory (note that the time is
UTC), and then you can copy the `foo/` directory to the user's home directory.
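
Putting that together, a restore of this kind might look roughly like the
following. The snapshot name and destination are illustrative, not a real
backup:

```
# Illustrative sketch of the walkthrough above; adjust names and dates.
cd /backup/encrypted/rsnapshot/.sync/nfs/opt/homes/services/http/users/f/.zfs
cd snapshot/zfs-auto-snap_after-backup-2023-05-01-1133/
# Copy the user's web directory out of the read-only snapshot.
cp -a foo /backup/encrypted/scratch/
```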

For large directories, please use `/backup/encrypted/scratch` as a temporary
working area for compressing the files and other operations. Please note that
this dataset will not be automatically snapshotted.
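
For example, compressing a restored directory in the scratch area might look
like this (paths are illustrative):

```
# Illustrative: compress a restored directory in scratch before moving it.
cd /backup/encrypted/scratch
tar -czf foo-public_html-restore.tar.gz foo
```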

MySQL backups are stored in the `/backup/encrypted/rsnapshot/mysql/` directory,
and the snapshots can be accessed at
`/backup/encrypted/rsnapshot/mysql/.zfs/snapshot/`. Inside a snapshot, the
individual databases are stored as `.sql` files inside the `.sync/` directory.
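
As a sketch, restoring a single database from one of these dumps could look
like this (the snapshot and database names are placeholders):

```
# Hypothetical example: restore one database from a snapshotted dump.
cd /backup/encrypted/rsnapshot/mysql/.zfs/snapshot/<snapshot-name>/.sync/
# Create the database first if the dump does not include CREATE DATABASE.
mysql -u root -p <database-name> < <database-name>.sql
```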

### Offsite

Offsite backups are more complicated because the backup files first need to be
downloaded, stuck together into a single file, decrypted, extracted, and then
put into LVM to get back the whole backup archive that would normally be found
onsite. This essentially just means that the
[create-encrypted-backup][create-encrypted-backup] script needs to be reversed
to restore once the backup files are downloaded. Here are the general steps to
take to restore from an offsite backup:

1. Download all the backup pieces from Box.com. This is generally easiest with
a command line tool like `cadaver`, which can just use a `mget *` to download
all the files (albeit sequentially). If more speed is needed, open multiple
`cadaver` connections and download multiple groups of files at once.

2. Put together all the backup pieces into a single file. This can be done by
running `cat <backup>.img.gz.gpg.part* > <backup>.img.gz.gpg`.

3. Decrypt the backup using `gpg`. This requires your key pair to be imported
into `gpg` first using `gpg --import public_key.gpg` and
`gpg --allow-secret-key-import --import private_key.gpg`, then you can
decrypt the backup with
`gpg --output <backup>.img.gz --decrypt <backup>.img.gz.gpg`. Be careful to
keep your private key secure by setting good permissions on it so that nobody
else can read it, and delete it after the backup is imported. The keys can be
deleted with `gpg --delete-secret-keys "<Name>"` and
`gpg --delete-key "<Name>"`, where your name is whatever name it shows when
you run `gpg --list-keys`.

4. Extract the backup with `gunzip <backup>.img.gz`.

5. Put the backup image into a LVM logical volume. First find the size that the
volume should be by running `ls -l <backup>.img`, and copy the number of
bytes that outputs. Then create the LV with
`sudo lvcreate -L <bytes>B -n <name> /dev/<volume group>` where the volume
group has enough space to store the entire backup (2+ TiB).
Todo: add instructions for restoring offsite backups using zfs send/receive.
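
In the meantime, a minimal sketch of what a `zfs send`/`receive` restore could
look like follows. The host, pool, and snapshot names here are all placeholders,
not our actual setup:

```
# Hypothetical: pull one dataset's snapshot back from an off-site ZFS pool.
ssh offsite-host "zfs send -v offsite-pool/rsnapshot@2023-05-01" \
  | sudo zfs receive -u backup/encrypted/restored-rsnapshot
```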

## Backup Contents

@@ -93,33 +64,34 @@ Backups currently include:
* User home and web directories
* Cronjobs on supported servers (tsunami, supernova, biohazard, etc.)
* MySQL databases (including user databases, stats, RT, print quotas, IRC data)
* Everything on GitHub (probably very unnecessary)
* A few OCF repositories on GitHub (probably very unnecessary)
* LDAP and Kerberos data
* A [smattering of random files on random servers][backed-up-files]

## Backup Procedures

Backups are currently made daily via a cronjob on `hal` which calls `rsnapshot`.
The current settings are to retain 7 daily backups, 4 weekly backups, and 6
monthly backups, but we might adjust this as it takes more space or we get
larger backup drives.
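
The corresponding retention settings in `rsnapshot.conf` look roughly like this
(a sketch only; the authoritative config lives in our puppet repo):

```
# rsnapshot.conf excerpt (sketch); fields must be tab-separated.
retain	daily	7
retain	weekly	4
retain	monthly	6
```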

We use `rsnapshot` to make incremental backups. Typically, each new backup
takes an additional ~3GiB of space (but this will vary based on how many files
actually changed). A full backup is about ~2TiB of space and growing.
We use `rsnapshot` and ZFS snapshots to make incremental backups. Typically,
each new backup takes an additional ~20 GiB of space (but this will vary based
on how many files actually changed). A full backup is about 4 TiB and growing.

(The incremental file backups are only about ~300 MiB, but since mysqldump
files can't be incrementally backed up, those take a whole ~2 GiB each time, so
the total backup grows by ~3GiB each time. However, an old backup is discarded
each time too, so it approximately breaks even.)
(The incremental file backups are only ~1-5 GiB, but since MySQL and Postgres
dumps can't be incrementally backed up, those take a whole ~15 GiB each time, so
the total backup grows by ~20 GiB each time.)
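
To see how much space recent snapshots actually consume, something like the
following works (the dataset name follows the layout described above):

```
# List snapshots of the rsnapshot dataset with their space usage.
zfs list -d 1 -t snapshot -o name,used,creation -s creation \
  backup/encrypted/rsnapshot
```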

## Ideas for backup improvements

1. Automate backup testing: set up some system for periodically checking that
backups can be restored, whether they are offsite or onsite.

[box]: https://www.box.com
[create-encrypted-backup]: https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/create-encrypted-backup
[upload-to-box]: https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/upload-to-box
[backed-up-files]: https://github.com/ocf/puppet/blob/17bc94b395e254529d97c84fb044f76931439fd7/modules/ocf_backups/files/rsnapshot.conf#L53
[prune-old-backups]: https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/prune-old-backups
[rsyncnet]: https://www.rsync.net
[create-encrypted-backup]:
https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/create-encrypted-backup
[upload-to-box]:
https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/upload-to-box
[backed-up-files]:
https://github.com/ocf/puppet/blob/17bc94b395e254529d97c84fb044f76931439fd7/modules/ocf_backups/files/rsnapshot.conf#L53
[prune-old-backups]:
https://github.com/ocf/puppet/blob/master/modules/ocf_backups/files/prune-old-backups
61 changes: 30 additions & 31 deletions ocfweb/docs/docs/staff/backend/firewall.md
@@ -3,50 +3,49 @@
We use a Palo Alto Networks (PAN) firewall provided by IST. We have one network
port in the server room which is activated and behind the firewall; we have
another network port activated in the lab behind the television which is also
behind the firewall. All the ports the desktops use are also behind the
firewall since they are routed through the switch in the server room.
behind the firewall. All the ports the desktops use are also behind the firewall
since they are routed through the switch in the server room.

## Administering the firewall

### Accessing the interface

Administration of the firewall is done through the [web interface][panorama],
and must be done from an on-campus IP address (for instance through the
[library VPN][library-vpn] or SOCKS proxying through an OCF host). **Remember
to specify https when loading the firewall admin page**, as it does not have a
redirect from http to https. If you are having connection issues with the
firewall admin page loading indefinitely, it is likely because you are trying
to use http or trying to access it from an off-campus IP. To quickly set up a
SOCKS proxy, run `ssh -D 8000 -N supernova` from any off-campus host and then
set up the SOCKS proxy (through your OS or through your browser's settings) to
use the proxy on `localhost` and port `8000`.
and must be done from an on-campus IP address (for instance through the [library
VPN][library-vpn] or SOCKS proxying through an OCF host). **Remember to specify
https when loading the firewall admin page**, as it does not have a redirect
from http to https. If you are having connection issues with the firewall admin
page loading indefinitely, it is likely because you are trying to use http or
trying to access it from an off-campus IP. To quickly set up a SOCKS proxy, run
`ssh -D 8000 -N supernova` from any off-campus host and then set up the SOCKS
proxy (through your OS or through your browser's settings) to use the proxy on
`localhost` and port `8000`.

[panorama]: https://panorama.net.berkeley.edu
[library-vpn]: https://www.lib.berkeley.edu/using-the-libraries/vpn
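
For a quick end-to-end check that the proxy is working, something like the
following (using `curl` instead of a browser) can confirm the tunnel reaches the
admin page:

```
# Terminal 1: open the SOCKS proxy (leave this running).
ssh -D 8000 -N supernova

# Terminal 2: fetch the admin page headers through the proxy.
curl --socks5-hostname localhost:8000 -I https://panorama.net.berkeley.edu
```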

To sign in to administer the firewall, make sure to use the single sign-on
(SSO) option, and it will ask for CalNet authentication.
To sign in to administer the firewall, make sure to use the single sign-on (SSO)
option, and it will ask for CalNet authentication.

### Policies

All our current policies are located in the "Pre Rules" section under
"Security" in the policies tab. This option should be right at the top in the
box on the left side of the page. It contains all our rules since we are only
blocking traffic (either outgoing or incoming) before it goes through the
firewall, so all we need are pre rules.

In general the interface is pretty self-explanatory. Each rule has a custom
name and a description that describes what kind of traffic it should be
blocking or letting through, as well as the source and destination addresses
(or groups of addresses), application (identified by the firewall), service
(port), and whether it is allowed or blocked. Each rule has a dropdown next to
the rule name if you hover over it that leads to the log viewer, where you can
see what kind of traffic matched each rule and when the traffic was
allowed/blocked.

Any changes made to the firewall policies need to be committed and pushed to
the firewall using the commit button and then the push button (or the commit
and push button to do it in one step) located in the top right.
All our current policies are located in the "Pre Rules" section under "Security"
in the policies tab. This option should be right at the top in the box on the
left side of the page. It contains all our rules since we are only blocking
traffic (either outgoing or incoming) before it goes through the firewall, so
all we need are pre rules.

In general, the interface is pretty self-explanatory. Each rule has a custom
name and a description of what kind of traffic it should be blocking or letting
through, as well as the source and destination addresses (or groups of
addresses), application (identified by the firewall), service (port), and
whether it is allowed or blocked. Hovering over a rule name reveals a dropdown
that leads to the log viewer, where you can see what kind of traffic matched
each rule and when it was allowed or blocked.

Any changes made to the firewall policies need to be committed and pushed to the
firewall using the commit button and then the push button (or the commit and
push button to do it in one step) located in the top right.

### Syslog

38 changes: 19 additions & 19 deletions ocfweb/docs/docs/staff/backend/git.md
@@ -8,29 +8,28 @@ distributed).
## Workflow

Although Git is a great tool for large-scale distributed development, for us a
Subversion-like workflow with a "central repository" (where you clone/fetch
from and push to) and linear history makes more sense. The instructions below
assume that development is happening in a single branch.
Subversion-like workflow with a "central repository" (where you clone/fetch from
and push to) and linear history makes more sense. The instructions below assume
that development is happening in a single branch.

**Only commit your own, original work**. You may commit another staff member's
work if you have permission and change the author appropriately (e.g.,
`--author="Guest User <[email protected]>"`). When committing, `git config
user.name` should be your name and `git config user.email` should be your OCF
email address -- this should be taken care of by [[LDAP|doc
staff/backend/ldap]] and `/etc/mailname` on OCF machines.
email address -- this should be taken care of by [[LDAP|doc staff/backend/ldap]]
and `/etc/mailname` on OCF machines.

### To "update"

Get the latest commits from the central repository and update your working
tree.
Get the latest commits from the central repository and update your working tree.

git pull --rebase

This will `git fetch` (update your local copy of the remote repository) and
`git rebase` (rewrite current branch in terms of tracked branch). The rebase
prevents unnecessary merge commits by moving your local commits on top of the
latest remote commit (`FETCH_HEAD`). This is a good thing if you have any local
commits which have not yet been pushed to the remote repository.
This will `git fetch` (update your local copy of the remote repository) and `git
rebase` (rewrite current branch in terms of tracked branch). The rebase prevents
unnecessary merge commits by moving your local commits on top of the latest
remote commit (`FETCH_HEAD`). This is a good thing if you have any local commits
which have not yet been pushed to the remote repository.

If you have "dirty" uncommitted changes, you'll need to commit them or stash
them before rebasing (`git stash`).
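
For example, a typical sequence when you have local uncommitted changes is:

    git stash            # set local changes aside
    git pull --rebase    # update from the central repository
    git stash pop        # reapply your changes on top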
@@ -104,8 +103,8 @@ Advanced:
* line of changes in a repository, default branch is `master`
* fast-forward
* advance branch forward in a linear sequence
* this is usually what we want: the new commit builds directly on the
previous commit
* this is usually what we want: the new commit builds directly on the previous
commit
* hooks
* optional scripts that can be executed during git operations
* for example, validate syntax before accepting a commit or deploy code to a
@@ -114,20 +113,21 @@ Advanced:
* files that are ready to be stored in your next commit
* references (aka refs)
* SHA-1 hashes that identify commits
* `HEAD` points to the latest commit ref in the current branch (`HEAD^` to
the one before it)
* `HEAD` points to the latest commit ref in the current branch (`HEAD^` to the
one before it)
* remote
* upstream repository that you can `git fetch` from or `git push` to, default
is `origin`
* local branches can "track" remote branches (e.g., `master` tracking
`origin/master`)
* working tree (aka workspace or working directory)
* directory that checked out files reside
* this includes the current branch and any "dirty" uncommitted changes
(staged or not)
* this includes the current branch and any "dirty" uncommitted changes (staged
or not)

## Recommended reading

* [A Visual Git Reference](https://marklodato.github.io/visual-git-guide/)
* [Git Immersion](http://www.gitimmersion.com/)
* [The Case for Git Rebase](http://darwinweb.net/articles/the-case-for-git-rebase)
* [The Case for Git
Rebase](http://darwinweb.net/articles/the-case-for-git-rebase)