Home
Welcome to the cec-dataprep wiki!
While git is already installed, you need to install node in a conda environment.
First, load the conda module on farm: module load conda3
One time only, create the environment: conda create -yn cec nodejs
From then on, every time you log in, load the conda3 module and run source activate cec.
It is also a good idea to put these startup commands in a bash script and run source startup.sh after logging in.
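The per-login steps above could be collected into a startup script along these lines. This is a minimal sketch: the file name startup.sh comes from the text, and its contents are just the two commands already described.

```shell
# Write the per-login commands into startup.sh (one-time step).
cat > startup.sh <<'EOF'
# Load conda and activate the cec environment.
# Use "source startup.sh" so the changes apply to your current shell.
module load conda3
source activate cec
EOF
```

After each login, run source startup.sh rather than executing the file, since activation has to happen in the current shell.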
You need to download the OSRM data onto farm and put it somewhere in your home directory before the project will run. Use wget to download.
If you haven't already, run git clone https://github.com/ucdavis/cec-dataprep to get this project, then run npm install inside the cloned directory to install your dependencies. Run npm run build to build the JavaScript files from the TypeScript files in the repo. You'll have to rebuild every time you update from source.
Now create a batch file to run the main job over NUM_CLUSTERS clusters.
dataprep.sh
#!/bin/bash
# Standard out and Standard Error output files with the job number in the name.
#SBATCH -o slurm-dataprep-%j.output
#SBATCH -e slurm-dataprep-%j.output
# Ask for enough memory to hold our CSV contents
#SBATCH --mem 8000
# Print the hostname for debugging purposes
hostname
# Set your variables
export OSRM_FILE="/home/postit/osrm-data/california-latest.osrm"
export PIXEL_FILE="/home/postit/pixels/sierra-pixels.csv"
export HGT_FILES="/home/postit/hgt/"
export HGT_USER="postit"
export HGT_PASS="PASSWORD"
export TREATED_OUT_FILE="/home/postit/results.csv"
# export DEBUG=knex:tx
export NODE_OPTIONS="--max-old-space-size=8192" # Node.js heap limit in MB (8 GB); note that --mem above requests only 8000 MB
# Run the actual work you want to do
srun node ./cec-dataprep/index.js
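Since the job will not run without its input data, a pre-flight check placed before the srun line can fail fast with a readable message. This is a sketch only: require and check_inputs are hypothetical helpers, and it assumes OSRM_FILE and PIXEL_FILE are regular files while HGT_FILES is a directory, as the paths in the script suggest.

```shell
# Hypothetical helpers, not part of cec-dataprep itself.
# Usage: require -f|-d LABEL PATH
require() {
  if ! test "$1" "$3"; then
    echo "missing $2: ${3:-unset}" >&2
    return 1
  fi
}

# Check every input the batch script exports; returns 1 if any is missing.
check_inputs() {
  local status=0
  require -f OSRM_FILE "$OSRM_FILE" || status=1
  require -f PIXEL_FILE "$PIXEL_FILE" || status=1
  require -d HGT_FILES "$HGT_FILES" || status=1
  return "$status"
}
```

In the batch script this could be used as check_inputs || exit 1 just before the srun line.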
Once you have a script, the real work happens when you submit it as a batch job to the farm Slurm scheduler.
Example runs:
sbatch -t 30 dataprep.sh
-- submit the job to be run on one node for up to 30 minutes
sbatch -N 1 -n 2 -t 30 dataprep.sh
-- use one node with 2 processes
sbatch --array=[1-5] -t 30 dataprep.sh
-- run it 5 times, once for each array value
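For array jobs, Slurm sets SLURM_ARRAY_TASK_ID in each task's environment. Whether cec-dataprep itself divides work by this index is not stated here, so the sketch below only shows one way to keep each task's output file separate. The results-N.csv naming is an assumption.

```shell
# SLURM_ARRAY_TASK_ID is set by Slurm for array jobs; fall back to 0 otherwise.
task_id="${SLURM_ARRAY_TASK_ID:-0}"

# Assumed naming scheme: one results file per array task.
export TREATED_OUT_FILE="$HOME/results-${task_id}.csv"
echo "task ${task_id} writes to ${TREATED_OUT_FILE}"
```

These lines would replace the fixed TREATED_OUT_FILE export in dataprep.sh so that the five tasks do not overwrite each other's output.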
You can monitor your submitted job with squeue -u $USER
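Submission and monitoring can be tied together by capturing the job ID at submit time. The sketch below uses sbatch --parsable, which prints only the job ID (and the cluster name, if any). The helper log_file_for_job is hypothetical and simply mirrors the slurm-dataprep-%j.output pattern from the batch file.

```shell
# Hypothetical helper: log file name produced by the #SBATCH -o/-e lines.
log_file_for_job() {
  printf 'slurm-dataprep-%s.output\n' "$1"
}

# Only attempt submission where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
  jobid=$(sbatch --parsable -t 30 dataprep.sh)
  jobid=${jobid%%;*}   # drop the ";cluster" suffix if one is present
  squeue -j "$jobid"   # show just this job
  echo "log: $(log_file_for_job "$jobid")"
fi
```

By default the log file appears in the directory from which sbatch was run.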
For bulk downloading, see https://github.com/ucdavis/cec-dataprep/wiki/Downloading.
For individual downloading, make the file accessible via a link, possibly using Box. If you use Box, you must copy the "direct download link", not the preview share link. Download it into a subfolder of your home directory: wget -O file.csv https://url.com/
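A preview link typically returns a web page rather than the data, so it can be worth checking what wget actually saved. The sketch below is an assumption-laden sanity check, not a Box feature: looks_like_html and fetch_csv are hypothetical helpers that inspect the first bytes of the downloaded file.

```shell
# Hypothetical check: does the file start like an HTML page?
looks_like_html() {
  head -c 512 "$1" | grep -Eqi '<!doctype html|<html'
}

# Usage: fetch_csv URL DEST
# Downloads URL to DEST (creating the folder) and rejects HTML responses.
fetch_csv() {
  mkdir -p "$(dirname "$2")" && wget -O "$2" "$1" || return 1
  if looks_like_html "$2"; then
    echo "$2 looks like a web page, not data; check the link type" >&2
    return 1
  fi
}
```

For example, fetch_csv https://url.com/ "$HOME/pixels/file.csv" would download into a pixels subfolder and warn if the link was a preview link.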