Fix links
felker committed Jul 28, 2023
1 parent b85b106 commit a8436b2
Showing 21 changed files with 54 additions and 153 deletions.
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@
Allocations require management – balance checks, resource allocation, requesting more time, etc.

## Checking for an Active Allocation
-To determine if there is an active allocation, check [Job Submision](../../../theta/queueing-and-running-jobs/job-and-queue-scheduling/#submit-a-job).
+To determine if there is an active allocation, check [Job Submission](../../theta/queueing-and-running-jobs/job-and-queue-scheduling.md#submit-a-job).

For information on how to run the query, look at our documentation on our [sbank Allocations Accounting System](sbank-allocation-accounting-system.md) or email [[email protected]](mailto:[email protected]) and ask for all active allocations.

@@ -106,7 +106,7 @@ The ALCF will send you a report template at the end of each quarter. Please comp
Please be aware that we will periodically monitor, and could potentially adjust, your project allocation if a large portion of it goes unused. You may view: [Pullback Policy](../../policies/queue-scheduling/pullback-policy.md)

### Allocation Overburn Policy
-Please see this page for overburn/overuse eligibility for INCITE projects that have exhausted their allocation in the first 11 months of its allocation year: [Allocation Overburn](../../../policies/queue-scheduling/queue-and-scheduling-policy/#incitealcc-overburn-policy)
+Please see this page for overburn/overuse eligibility for INCITE projects that have exhausted their allocation in the first 11 months of its allocation year: [Allocation Overburn](../../policies/queue-scheduling/queue-and-scheduling-policy.md#incitealcc-overburn-policy)

### Acknowledgment In Publications
Please follow the guidelines provided on the [ALCF Acknowledgement Policy page](../../policies/alcf-acknowledgement-policy.md) to properly acknowledge the use of ALCF resources in all of your publications, both online and print.
8 changes: 4 additions & 4 deletions docs/ai-testbed/graphcore/unused/Scaling-ResNet50.md
@@ -1,6 +1,6 @@
# Scaling ResNet50

-Follow all the instructions in [Getting Started](/docs/graphcore/Getting-Started) to log into a Graphcore node.
+Follow all the instructions in [Getting Started](../getting-started.md) to log into a Graphcore node.

## Examples Repo

@@ -131,12 +131,12 @@ You should see:
# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
```

-## Benchmarks.yml
+## `benchmarks.yml`

Update **${HOME}/graphcore/examples/vision/cnns/pytorch/train/benchmarks.yml**
-with your favorite editor to match [benchmarks.yml](/docs/graphcore/benchmarks.yml).
+with your favorite editor to match [benchmarks.yml](./files/benchmarks.yml).

-## Configs.yml
+## `configs.yml`

Update **${HOME}/graphcore/examples/vision/cnns/pytorch/train/configs.yml**
with your favorite editor. At about line 30, change **use_bbox_info: true** to
8 changes: 4 additions & 4 deletions docs/ai-testbed/graphcore/unused/profiling-mnist.md
@@ -1,10 +1,10 @@
# Profiling MNIST

-Follow all the instructions in [Getting Started](/docs/graphcore/Getting-Started) to log into a Graphcore node.
+Follow all the instructions in [Getting Started](../getting-started.md) to log into a Graphcore node.

-Follow the instructions in [Virtual Environments](/docs/graphcore/Virtual-Environments) up to and including **PopART Environment Setup**.
+Follow the instructions in [Virtual Environments](../virtual-environments.md) up to and including **PopART Environment Setup**.

-Following the instructions in [Example Programs](/docs/graphcore/Example-Programs) up to and including
+Following the instructions in [Example Programs](../example-programs.md) up to and including
**MNIST, Install Requirements**.

## Change Directory
@@ -33,4 +33,4 @@ Do so by running the following command:
python mnist_poptorch.py
```

-When MNIST has finished running, see [Profiling](/docs/graphcore/Profiling) to use **Graph Analyser**.
+When MNIST has finished running, see [Profiling](./profiling.md) to use **Graph Analyser**.
6 changes: 3 additions & 3 deletions docs/ai-testbed/graphcore/unused/profiling-resnet50.md
@@ -1,8 +1,8 @@
# Profiling ResNet50

-Follow all the instructions in [Getting Started](/docs/graphcore/Getting-Started) to log into a Graphcore node.
+Follow all the instructions in [Getting Started](../getting-started.md) to log into a Graphcore node.

-Follow the instructions in [Virtual Environments](/docs/graphcore/Virtual-Environments) up to and including **PopART Environment Setup**.
+Follow the instructions in [Virtual Environments](../virtual-environments.md) up to and including **PopART Environment Setup**.

## Examples Repo

@@ -58,4 +58,4 @@ python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_re

## Profile Results

-When ResNet50 has finished running, see [Profiling](/docs/graphcore/Profiling) to use **Graph Analyser**.
+When ResNet50 has finished running, see [Profiling](./profiling.md) to use **Graph Analyser**.
2 changes: 1 addition & 1 deletion docs/ai-testbed/sambanova_gen1/unused/sambanova.md
@@ -36,7 +36,7 @@ broken (503 errors).

## Further Information

-[Human Decisions Files notes](/display/AI/Human+Decisions+Files+notes)
+<!-- [Human Decisions Files notes](/display/AI/Human+Decisions+Files+notes) -->

## Creating a SambaNova Portal Account to access the documentation portal

@@ -1,8 +1,6 @@
# Example Multi-Node Programs

-In this section we will learn how to extend the UNet2d and Gpt1.5B applications scripts that we introduced in the [Example Programs](/docs/ai-testbed/sambanova_gen2/example-programs.md) to compile and run multiple instances of the model in a data parallel fashion across multiple tiles or across multiple nodes.
-
-<!--- In this section we will learn how to extend the Gpt1.5B application that we instroduced in the [Example Programs](/docs/ai-testbed/sambanova_gen2/example-programs.md) to compile and run multiple instances of the model in a data parallel fashion across multiple tiles or across multiple nodes. --->
+In this section we will learn how to extend the UNet2d and Gpt1.5B applications scripts that we introduced in the [Example Programs](./example-programs.md) to compile and run multiple instances of the model in a data parallel fashion across multiple tiles or across multiple nodes.

## UNet2d

110 changes: 0 additions & 110 deletions docs/ai-testbed/sambanova_gen2/unused/sambanova.md

This file was deleted.

@@ -2,7 +2,7 @@

## Compiling on Polaris Login and Compute Nodes

-If your build system does not require GPUs for the build process, as is usually the case, compilation of GPU-accelerated codes is generally expected to work well on the Polaris login nodes. If your build system _does_ require GPUs, you cannot yet compile on the Polaris login nodes, as they do not currently have GPUs installed. You may in this case compile your applications on the Polaris compute nodes. Do this by submitting an [interactive single-node job](/polaris/running-jobs#Interactive-Jobs-on-Compute-Nodes), or running your build system in a batch job.
+If your build system does not require GPUs for the build process, as is usually the case, compilation of GPU-accelerated codes is generally expected to work well on the Polaris login nodes. If your build system _does_ require GPUs, you cannot yet compile on the Polaris login nodes, as they do not currently have GPUs installed. You may in this case compile your applications on the Polaris compute nodes. Do this by submitting an [interactive single-node job](../running-jobs.md#Interactive-Jobs-on-Compute-Nodes), or running your build system in a batch job.
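As an illustrative sketch of the interactive route described above (the project, queue, and filesystem names below are placeholders, not taken from this commit):

```shell
# Request a single compute node interactively for a GPU-dependent build.
# <project>, the queue name, and the filesystems list are placeholders;
# substitute your own allocation and the filesystems your build touches.
qsub -I -A <project> -q debug -l select=1 -l walltime=01:00:00 -l filesystems=home:eagle

# Once the job starts and you land on the compute node, run the build as usual:
#   cd /path/to/source && make
```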

<!-- The following section on home file system would be more useful somewhere else --Tim W.: -->

2 changes: 1 addition & 1 deletion docs/running-jobs/example-job-scripts.md
@@ -7,7 +7,7 @@ A simple example using a similar script on Polaris is available in the

## CPU MPI-OpenMP Examples

-The following `submit.sh` example submits a 1-node job to Polaris with 16 MPI ranks per node and 2 OpenMP threads per rank. See [Queues](./job-and-queue-scheduling/#queues) for details on practical limits to node counts and job times for different sizes of jobs.
+The following `submit.sh` example submits a 1-node job to Polaris with 16 MPI ranks per node and 2 OpenMP threads per rank. See [Queues](./job-and-queue-scheduling.md#queues) for details on practical limits to node counts and job times for different sizes of jobs.

The [`hello_affinity`](https://github.com/argonne-lcf/GettingStarted/tree/master/Examples/Polaris/affinity_gpu) program is a compiled C++ code, which is built via `make -f Makefile.nvhpc` in the linked directory after cloning the [Getting Started](https://github.com/argonne-lcf/GettingStarted) repository.
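As a sketch of what such a `submit.sh` might contain (the project, queue, and filesystem names are placeholder assumptions; the authoritative script lives in the linked repository):

```shell
#!/bin/bash -l
# Illustrative 1-node PBS job: 16 MPI ranks per node, 2 OpenMP threads per rank.
# <project>, the queue, and the filesystems list are placeholders.
#PBS -A <project>
#PBS -q debug
#PBS -l select=1
#PBS -l walltime=0:30:00
#PBS -l filesystems=home:eagle
#PBS -l place=scatter

NNODES=$(wc -l < "$PBS_NODEFILE")   # nodes assigned by PBS
NRANKS_PER_NODE=16                  # MPI ranks per node
NDEPTH=2                            # hardware threads per rank
NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
export OMP_NUM_THREADS=2            # OpenMP threads per rank

mpiexec -n "${NTOTRANKS}" --ppn "${NRANKS_PER_NODE}" --depth="${NDEPTH}" --cpu-bind depth ./hello_affinity
```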

2 changes: 1 addition & 1 deletion docs/running-jobs/job-and-queue-scheduling.md
@@ -89,7 +89,7 @@ Where:
* `walltime=HH:MM:SS` specifying a wall time is mandatory at the ALCF. Valid wall times depend on the queue you are using. There is a table with the queues for each machine at the end of this section and in the machine specific documentation.
* `filesystems=fs1:fs2:...` Specifying which filesystems your application uses is mandatory at ALCF. The reason for this is if a filesystem goes down, we have a way of making PBS aware of that and it won't run jobs that need that filesystem. If you don't specify filesystems you will receive the following error: `qsub: Resource: filesystems is required to be set.`
* `place=scatter` is telling PBS you want each of your chunks on a separate vnode. By default, PBS will pack your chunks to get maximum utilization. If you requested `ncpus=1` and `chunks=64` **without** `place=scatter` on a system with `ncpus=64`, all your chunks would end up on one node.
-* Your job script: See [Example Job Scripts](../example-job-scripts) for more information about how to build your job script. For options that wont change, you do have the option of taking things off the command line and putting them in your job script. For instance the above command line could be simplified to `qsub -l select=<#> <your job script>` if you added the following to the top (the PBS directives have to be before any executable line) of your job script:
+* Your job script: See [Example Job Scripts](./example-job-scripts.md) for more information about how to build your job script. For options that wont change, you do have the option of taking things off the command line and putting them in your job script. For instance the above command line could be simplified to `qsub -l select=<#> <your job script>` if you added the following to the top (the PBS directives have to be before any executable line) of your job script:

```bash
#PBS -A <project>
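# Illustrative continuation, not part of this commit: the other fixed
# options discussed above can be moved into the script the same way.
# The values below are placeholders; only `-l select=<#>` need stay on
# the command line.
#PBS -q <queue>
#PBS -l walltime=01:00:00
#PBS -l filesystems=home:eagle
#PBS -l place=scatter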
2 changes: 1 addition & 1 deletion docs/services/jenkins.md
@@ -1,7 +1,7 @@
# Jenkins on Theta

## Jenkins to be decommissioned
-New projects should request access to use our GitLab-CI-based service. You can learn how to request access in our documentation found [here](/services/gitlab-ci/#quickstart).
+New projects should request access to use our GitLab-CI-based service. You can learn how to request access in our documentation found [here](./gitlab-ci.md#quickstart).

Existing projects can continue to use Jenkins. We will notify projects when we have the date it will be retired. Projects will have ample notice to migrate their work to our GitLab-CI service.

9 changes: 9 additions & 0 deletions docs/stylesheets/alcf-extra.css
@@ -913,3 +913,12 @@ footer a:hover {
.js-dropdown-hidden {
display: none;
}

+table {
+table-layout: fixed;
+max-width: 100%;
+}
+
+.md-typeset code {
+overflow-wrap: break-word;
+}
@@ -4,7 +4,7 @@ To build Python packages for ThetaGPU, there are two options: build on top of a
## Build on ThetaGPU compute using Conda
To build on ThetaGPU compute and install your own packages, login to theta and then submit an interactive job to log on to ThetaGPU compute node.

-Please see [Running PyTorch with Conda](/dl-frameworks/running-pytorch-conda.md) or [Running TensorFlow with Conda](/dl-frameworks/running-tensorflow-conda/index.html) for more information.
+Please see [Running PyTorch with Conda](./dl-frameworks/running-pytorch-conda.md) or [Running TensorFlow with Conda](./dl-frameworks/running-tensorflow-conda.md) for more information.

## Building on top of a container
At the moment, you will need two shells to do this: have one open on a login node (for example, ```thetaloginN```, and one open on a compute node (```thetagpuN```). First, start the container in interactive mode:
@@ -13,7 +13,7 @@ As with all Argonne Leadership Computing Facility production systems, job priori
* job duration - shorter duration jobs will accumulate priority more quickly, so it is best to specify the job run time as accurately as possible

### Reservations and Scheduling Policy
-Some work will require use of Theta that requires deviation from regular policy. On such occasions, normal reservation policy applies. Please send the [regular form](/docs/theta/queueing-and-running-jobs/machine-reservations.md) no fewer than five (5) business days in advance.
+Some work will require use of Theta that requires deviation from regular policy. On such occasions, normal reservation policy applies. Please send the [regular form](../../theta/queueing-and-running-jobs/machine-reservations.md) no fewer than five (5) business days in advance.

### Monday Maintenance
When the ALCF is on a regular business schedule, preventitive maintenance is typically scheduled on alternate Mondays. The showres command may be used to view pending and active maintenance reservations.