
Downloading

Scott Kirkland edited this page Mar 10, 2021 · 2 revisions

Downloading files from Box (or any FTP-enabled site) onto the Farm cluster

We need to be able to download a lot of data (mostly pixel files) from Box so that we can run the cec-dataprep algorithm and convert them into processed cluster files.

To download on Farm, use the following two scripts. The runner submits the download script to Farm's "high" priority partition so that the job is not interrupted.

download-pixels-from-box.sh

#!/bin/bash

export DIRECTORY="results_for_processing/"
export FTPPASS="pass"
export FTPUSER="username"

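# Work in a per-job scratch directory; stop if it cannot be entered.
mkdir -p "/scratch/$USER/$SLURM_JOBID"

cd "/scratch/$USER/$SLURM_JOBID" || exit 1

# -m mirrors the remote folder; -nd saves every file into the current directory.
wget -m -nd --ftp-user="$FTPUSER" --ftp-password="$FTPPASS" "ftp://ftp.box.com/CEC DSS Project/Project Tasks/Task 2 - Spatial Analysis to Locate the Woody Feedstock/$DIRECTORY"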

echo "Done downloading files, moving them to home directory"

# Make sure the destination directory exists before copying into it.
mkdir -p ~/boxftp
cp *.* ~/boxftp/

echo "Done, cleaning up files in /scratch"

# ${SLURM_JOBID:?} aborts instead of deleting all of /scratch/$USER if the job ID is unset.
rm -rf "/scratch/$USER/${SLURM_JOBID:?}"

echo "All done, exiting"
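The script above removes its scratch directory only when it runs to the last line. As a possible hardening, the cleanup could be tied to an exit trap so that it also happens when a step fails partway. The following is a minimal sketch of that idea, not part of the original script; `run_in_scratch` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: run a command inside a scratch directory that is removed
# when the command finishes, whether it succeeds or fails.
# run_in_scratch is a hypothetical helper, not part of the original script.
run_in_scratch() {
    local workdir="$1"
    shift
    (
        mkdir -p "$workdir" || exit 1
        # The trap fires when this subshell exits, for any reason.
        trap 'rm -rf "$workdir"' EXIT
        cd "$workdir" || exit 1
        # Run the remaining arguments as the command.
        "$@"
    )
}
```

Under this approach, the command passed in would have to both download the files and copy them to ~/boxftp, because the scratch directory is gone as soon as the command returns. For example, the wget and cp steps could be wrapped in a shell function and run as `run_in_scratch "/scratch/$USER/$SLURM_JOBID" that_function`.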

download-pixels-from-box-runner.sh

#!/bin/bash -l
# NOTE the -l flag!

# If you need any help, please email [email protected]

# Name of the job - You'll probably want to customize this.
#SBATCH -J download

# Standard out and Standard Error output files with the job number in the name.
#SBATCH -o slurm-download-%j.output
#SBATCH -e slurm-download-%j.output

#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --time=08:01:00          # total run time limit (HH:MM:SS)
#SBATCH --partition=high         # high priority so it won't be interrupted

# The useful part of your job goes below

# hostname is just for debugging
hostname

# The main job executable to run. The script has a bash shebang, so invoke it with bash.
bash ./download-pixels-from-box.sh

Running

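Now we can kick off the download by running sbatch download-pixels-from-box-runner.sh from the directory that contains both scripts. The runner calls the download script by relative path, and Slurm starts the job in the directory it was submitted from.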
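To keep track of the job after submitting it, the job ID that sbatch prints can be captured and used to find the log file named in the runner. This is a minimal sketch; `parse_job_id` and `submit_download` are hypothetical helper names, and it assumes sbatch reports the job as "Submitted batch job <id>".

```shell
#!/bin/bash
# Sketch: submit the runner and report the job ID and log file.
# parse_job_id and submit_download are hypothetical helpers.

# Extract the job ID from sbatch's "Submitted batch job <id>" line on stdin.
parse_job_id() {
    awk '/^Submitted batch job/ {print $4}'
}

# Submit the runner, print where the log will be written, and show the queue entry.
submit_download() {
    local job_id
    job_id=$(sbatch download-pixels-from-box-runner.sh | parse_job_id)
    if [ -z "$job_id" ]; then
        echo "Submission failed" >&2
        return 1
    fi
    # The log name matches the -o/-e pattern in the runner script.
    echo "Job $job_id submitted; log: slurm-download-${job_id}.output"
    squeue -j "$job_id"
}
```

After sourcing these functions, running `submit_download` from the directory that holds both scripts submits the job, and `tail -f` on the printed log file shows the echo messages from the download script as it progresses.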

All of the files in the given Box directory will be downloaded in a flat (no-directory) structure to ~/boxftp.
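Once the job finishes, a quick check of the destination can confirm that the expected number of files arrived. The following is a small sketch under the assumption that the files were copied to ~/boxftp as described; `count_files` and `summarize_downloads` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: summarize what landed in the download directory.
# count_files and summarize_downloads are hypothetical helpers.

# Count regular files directly inside a directory (non-recursive).
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Print the file count and total size; defaults to ~/boxftp.
summarize_downloads() {
    local dir="${1:-$HOME/boxftp}"
    echo "Files: $(count_files "$dir")"
    du -sh "$dir"
}
```

Calling `summarize_downloads` with no argument reports on ~/boxftp. The file count can then be compared with the number of files shown in the Box folder.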

For reference, downloading 21 pixel files (approximately 50-75 GB) took about 40 minutes.