Shreya's Notes InnerEye (Issue 148)

1/31/21 - Met with Alisha and Jacopo to discuss Alisha's progress with the issue this past fall

Her first step was to email someone (contact info pending) to gain access to a JHU Azure account
After this, Alisha followed the Getting Started portion of the InnerEye wiki and set up the HelloWorld Module
For more context on the above, she ran through (1) Setting up your environment, (2) Training a HelloWorld segmentation model, and (3) How to set up Azure Machine Learning for InnerEye
She then followed the Lung Segmentation Task

Based on her progress, Jacopo and I were able to plan some immediate next steps:

We need to gain access to a JHU Azure account
We need to run through all the above setup steps and be able to follow the sample classification and sample segmentation tutorials
Then, we will manually upload the raw benchmarking data to Azure and attempt to run the Lung Segmentation or Glaucoma Classification models on that data.
Note that this will involve converting our data from .tif format to NIfTI format, which InnerEye appears to expect. We could use an approach like the one linked here.
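A minimal sketch of the .tif to NIfTI conversion step, in case it helps while we pick an approach. This is not the linked approach, just one possible version. It assumes each .tif is a single-channel multi-page stack that loads as (z, y, x), and that the `tifffile`, `nibabel`, and `numpy` packages are installed. The function names, output layout, and default voxel size are placeholders I made up; the real voxel spacing for the benchmarking data still needs to be filled in.

```python
from pathlib import Path


def voxel_affine(voxel_size):
    """Build a 4x4 scaling affine (nested lists) from (x, y, z) voxel sizes."""
    sx, sy, sz = (float(v) for v in voxel_size)
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def nifti_path_for(tif_path, out_dir):
    """Map e.g. data/vol_01.tif to <out_dir>/vol_01.nii.gz (hypothetical layout)."""
    return Path(out_dir) / (Path(tif_path).stem + ".nii.gz")


def tif_to_nifti(tif_path, out_dir, voxel_size=(1.0, 1.0, 1.0)):
    """Convert one 3D .tif stack to a compressed NIfTI file and return its path."""
    # Third-party imports stay local so the helpers above work without them.
    import nibabel as nib
    import numpy as np
    import tifffile

    # Assumption: a multi-page TIFF is read as a (z, y, x) array.
    volume = tifffile.imread(str(tif_path))
    # Reorder to (x, y, z) so the array axes line up with the affine.
    volume = np.transpose(volume, (2, 1, 0))

    # voxel_size is a placeholder; an identity-like affine ignores orientation.
    image = nib.Nifti1Image(volume, np.array(voxel_affine(voxel_size)))
    out_path = nifti_path_for(tif_path, out_dir)
    nib.save(image, str(out_path))
    return out_path
```

Calling `tif_to_nifti("data/vol_01.tif", "nifti_out")` would write `nifti_out/vol_01.nii.gz`, provided the output directory already exists. Whether InnerEye also needs a particular orientation or intensity range is something we would have to check against its dataset documentation.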

For our longer term goals:

We want to be able to use Azure computation credits without storing the data on Azure. We need to write a script that can port data over from AWS and feed it into InnerEye.
If segmentation/classification performance on the brainlit benchmarking data is poor because it is a difficult transfer learning task, we can test InnerEye on other datasets. (This could include brain1/2/3 and data that isn't MouseLight data)
Lastly, if manually uploading the benchmarking data and/or porting the data from AWS does not seem feasible, we can shift gears and work on making the CloudVolume package compatible with Azure, as detailed by this issue. We could then use CloudVolume to preprocess and upload the MouseLight data directly to Azure and run InnerEye from there.

Method 1: Copying/mirroring data from S3 to Azure using rclone

Pros:

Suggested by WeiWei Yang from MSR - it seems she got something similar to work successfully
Wouldn't require moving the data from S3
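A rough sketch of how the rclone step in Method 1 could be driven from a Python script. It assumes rclone is installed and that two remotes have already been set up with `rclone config`; the remote names and paths in the usage note (`s3remote`, `azremote`, and so on) are hypothetical. The helper names are my own. `rclone copy` only adds or updates files at the destination, while `rclone sync` makes the destination match the source and can delete files there, so the sketch defaults to `copy` with `--dry-run`.

```python
import subprocess


def build_rclone_command(src, dst, mirror=False, dry_run=True):
    """Return the rclone argument list for copying (or mirroring) src to dst."""
    # "sync" mirrors src onto dst (may delete at dst); "copy" never deletes.
    cmd = ["rclone", "sync" if mirror else "copy", src, dst, "--progress"]
    if dry_run:
        # --dry-run only reports what would be transferred.
        cmd.append("--dry-run")
    return cmd


def run_rclone(src, dst, mirror=False, dry_run=True):
    """Run rclone and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_rclone_command(src, dst, mirror, dry_run), check=True)
```

For example, `run_rclone("s3remote:some-bucket/benchmarking", "azremote:some-container/benchmarking")` would do a dry run of the copy; passing `dry_run=False` performs the actual transfer. The source data in S3 is left untouched in either mode.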

Cons:

Method 2: Running scripts on Azure that reference the data in S3

Pros:

Cons:

Method 3: Directly uploading data to Azure

Pros:

Cons:

How to create a Linux VM in Azure and run local scripts on it with limited overhead. HERE

How to create and run experiments in the Azure ML environment. More seamless but with more overhead. HERE

Using AzureML Compute Clusters through local script. HERE