The doAzureParallel package allows you to manage long-running jobs easily. There are two ways to run a job:
- Synchronous
- Asynchronous
Long-running jobs should be run in non-interactive, asynchronous mode.
doAzureParallel also helps you manage your jobs, so you can run many jobs at once while managing them through a few simple functions.
# List your jobs:
getJobList()
# Get your job by job id:
getJob(jobId = 'unique_job_id', verbose = TRUE)
These job management functions also make long-running jobs easier to run.
With long-running jobs, you need to keep track of your jobs and set each job to a non-blocking state. You can do both with the .options.azure option:
# set the .options.azure option in the foreach loop
opt <- list(job = 'unique_job_id', wait = FALSE)
# NOTE - if the option wait = FALSE, foreach will return your unique job id
job_id <- foreach(i = 1:number_of_iterations, .options.azure = opt) %dopar% { ... }
# get back your job results with your unique job id
results <- getJobResult(job_id)
Finally, you may also want to track the status of jobs by state (active, completed, etc.):
# List jobs in the active or completed state:
filter <- list()
filter$state <- c("active", "completed")
jobList <- getJobList(filter)
View(jobList)
You can learn more about how to execute long-running jobs here.
With long-running jobs, you can take advantage of Azure's autoscaling capabilities to save time and/or money. Learn more about autoscale here.
You can configure a job to run asynchronously by specifying wait = FALSE in job options:
options <- list(wait = FALSE)
jobId <- foreach(i = 1:number_of_iterations, .options.azure = options) %dopar% { ... }
The returned value is the job ID associated with the foreach loop. You can use this value to get the job status and the job result.
You can optionally specify the job ID in the options, as shown below:
options <- list(wait = FALSE, job = 'myjob')
foreach(i = 1:number_of_iterations, .options.azure = options) %dopar% { ... }
You can list all jobs in your account, along with their current state, by running:
getJobList()
Example output:
getJobList()
sample output:
--------------
                 Id     State              Status FailedTasks TotalTasks
1             job11    active No tasks in the job           0          0
2 job20170714215517    active                 0 %           0          6
3 job20170714220129    active                 0 %           0          6
4 job20170714221557    active                84 %           4          6
5 job20170803210552    active                 0 %           0          6
6 job20170803212205    active                 0 %           0          6
7 job20170803212558    active                 0 %           0          6
8 job20170714211502 completed               100 %           5          6
9 job20170714223236 completed               100 %           0          6
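The listing can also be narrowed in R after it has been retrieved. The sketch below assumes that getJobList() returns a data frame with the column names shown in the sample output. To stay runnable without an Azure account, it rebuilds a few rows of that output by hand in a stand-in variable (jobList) instead of calling the service.

```r
# Stand-in for the value returned by getJobList(); assumed to be a data frame
# with the columns shown in the sample output above.
jobList <- data.frame(
  Id          = c("job20170714215517", "job20170714221557",
                  "job20170714211502", "job20170714223236"),
  State       = c("active", "active", "completed", "completed"),
  Status      = c("0 %", "84 %", "100 %", "100 %"),
  FailedTasks = c(0, 4, 5, 0),
  TotalTasks  = c(6, 6, 6, 6),
  stringsAsFactors = FALSE
)

# Keep only the jobs that report at least one failed task.
jobsWithFailures <- jobList[jobList$FailedTasks > 0, ]
print(jobsWithFailures$Id)
```

With a real account, replacing the stand-in with `jobList <- getJobList()` should give the same kind of subset, provided the returned columns match this assumption.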
You can also filter the job list by job state, such as active or completed:
filter <- list()
filter$state <- c("active", "completed")
getJobList(filter)
getJob returns job metadata, such as the chunk size, whether cloud combine is enabled, and the packages specified for the job. It also returns the number of tasks in each state.
getJob(jobId)
getJob(jobId, verbose = TRUE)
sample output:
--------------
job metadata:
    chunkSize: 1
    enableCloudCombine: TRUE
    packages: httr
tasks:
    active: 1
    running: 0
    completed: 5
    succeeded: 0
    failed: 5
    total: 6
job state: completed
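Because an asynchronous job keeps running after foreach returns, a script usually has to wait for the job state to change before asking for results. The following is a minimal polling sketch, not part of doAzureParallel: the helper name waitForJob, its parameters, and the getState callback are all assumptions made for illustration. How the state string is read from the value returned by getJob is left to the caller-supplied callback, since that depends on the package version.

```r
# Poll until a job reaches the "completed" state or the attempts run out.
# getState is a caller-supplied (hypothetical) function that maps a job id
# to its state string, for example by reading the value returned by getJob().
waitForJob <- function(jobId, getState, pollSeconds = 30, maxPolls = 120) {
  for (attempt in seq_len(maxPolls)) {
    if (identical(getState(jobId), "completed")) {
      return(TRUE)
    }
    # Pause between polls, but not after the final attempt.
    if (attempt < maxPolls) {
      Sys.sleep(pollSeconds)
    }
  }
  FALSE
}

# Usage with a stub that reports "completed" on the third poll.
polls <- 0
stubState <- function(jobId) {
  polls <<- polls + 1
  if (polls >= 3) "completed" else "active"
}
finished <- waitForJob("myjob", stubState, pollSeconds = 0)
```

Note that a completed state does not imply success: the sample output above shows a completed job with five failed tasks, so the task counts are worth checking as well.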
Once the job has completed successfully, you can call getJobResult to retrieve the job result:
jobResult <- getJobResult(jobId)
Once you have the job result, you can delete the job and its result. Note that deleteJob deletes both the job in the Batch service and the storage container that holds the job result.
deleteJob(jobId)
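Since deleteJob also removes the stored result, it is safer to delete only after the result has actually been retrieved. The sketch below shows one way to enforce that order. The helper fetchThenDelete and its fetch and cleanup parameters are illustrative assumptions, not part of doAzureParallel. The defaults point at the package's getJobResult and deleteJob, and they are parameters so that the flow can be exercised with stubs and no Azure account.

```r
# Retrieve a job's result and delete the job only if retrieval succeeded.
# fetch and cleanup default to the package functions named in this document;
# the defaults are evaluated lazily, so stubs can be passed in their place.
fetchThenDelete <- function(jobId,
                            fetch = doAzureParallel::getJobResult,
                            cleanup = doAzureParallel::deleteJob) {
  result <- tryCatch(fetch(jobId), error = function(e) e)
  if (inherits(result, "error")) {
    # Leave the job and its storage container in place for a later retry.
    message("Could not retrieve result for ", jobId, ": ",
            conditionMessage(result))
    return(invisible(NULL))
  }
  cleanup(jobId)
  result
}

# Usage with stubs standing in for the Azure calls.
deleted <- character(0)
result <- fetchThenDelete(
  "myjob",
  fetch   = function(id) list(1, 2, 3),
  cleanup = function(id) deleted <<- c(deleted, id)
)
```

With a registered cluster and valid credentials, calling `fetchThenDelete(jobId)` without the stub arguments would use the real package functions.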
A working sample can be found in the samples directory.