Downloading
We need to be able to download a lot of data (mostly pixel files) from Box so that we can run the cec-dataprep algorithm and convert them into processed cluster files.
To download the data on Farm, you can use the following scripts, which run on Farm's "high" priority partition so the job is not interrupted. The first script, download-pixels-from-box.sh, mirrors the Box directory over FTP into scratch space and then copies the files to your home directory:
```bash
#!/bin/bash
# download-pixels-from-box.sh
export DIRECTORY="results_for_processing/"
export FTPPASS="pass"       # replace with your Box FTP password
export FTPUSER="username"   # replace with your Box FTP username

# Work in scratch space so the download doesn't fill up the home directory
mkdir -p /scratch/$USER/$SLURM_JOBID
cd /scratch/$USER/$SLURM_JOBID

# Mirror the Box folder over FTP into a flat (no-directory) structure
wget -m -nd --ftp-user=$FTPUSER --ftp-password=$FTPPASS "ftp://ftp.box.com/CEC DSS Project/Project Tasks/Task 2 - Spatial Analysis to Locate the Woody Feedstock/$DIRECTORY"

echo "Done downloading files, moving them to home directory"
mkdir -p ~/boxftp
cp *.* ~/boxftp

echo "Done, cleaning up files in /scratch"
rm -rf /scratch/$USER/$SLURM_JOBID
echo "All done, exiting"
```
```bash
#!/bin/bash -l
# download-pixels-from-box-runner.sh
# NOTE the -l flag!
# If you need any help, please email [email protected]

# Name of the job - You'll probably want to customize this.
#SBATCH -J download

# Standard out and Standard Error output files with the job number in the name.
#SBATCH -o slurm-download-%j.output
#SBATCH -e slurm-download-%j.output

#SBATCH --nodes=1          # node count
#SBATCH --ntasks=1         # total number of tasks across all nodes
#SBATCH --time=08:01:00    # total run time limit (HH:MM:SS)
#SBATCH --partition=high   # high priority so it won't be interrupted

# The useful part of your job goes below

# hostname is just for debugging
hostname

# The main job executable to run: note the use of srun before it
srun sh ./download-pixels-from-box.sh
```
Now we can kick off the download by running `sbatch download-pixels-from-box-runner.sh`. All of the files in the given Box directory will be downloaded in a flat (no-subdirectory) structure to `~/boxftp`.
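Once submitted, the job can be checked with standard SLURM tools. A minimal sketch, assuming the job name and output file pattern from the runner script above:

```bash
# Show the queued/running jobs for the current user
squeue -u $USER

# Follow the wget output as it downloads
# (replace <jobid> with the id printed by sbatch)
tail -f slurm-download-<jobid>.output
```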
For reference, downloading 21 pixel files of approximately 50-75 GB took about 40 minutes.
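To sanity-check the transfer afterwards, a quick sketch (assuming the default `~/boxftp` destination used above) is to count the downloaded files and report their total size:

```bash
# Number of files copied to the home directory
ls ~/boxftp | wc -l

# Total size of the downloaded data
du -sh ~/boxftp
```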