unirep_analysis is a ChRIS app that is wrapped around the UniRep project (https://github.com/churchlab/UniRep).
This plugin is GPU-capable. The 64-unit model should be OK to run on any machine. The full-sized model will require a machine with more than 8GB of GPU RAM.
For full information about the underlying method, consult the UniRep publication:
Paper: https://www.nature.com/articles/s41592-019-0598-1
The source code of UniRep is available on GitHub: https://github.com/churchlab/UniRep.
unirep_analysis                                     \
    [--dimension <modelDimension>]                  \
    [--batch_size <batchSize>]                      \
    [--learning_rate <learningRate>]                \
    [--inputFile <inputFileToProcess>]              \
    [--inputGlob <inputGlobPattern>]                \
    [--modelWeightPath <pathToWeights>]             \
    [--outputFile <resultOutputFile>]               \
    [--topModelTraining]                            \
    [--jointModelTraining]                          \
    [--json]                                        \
    <inputDir> <outputDir>
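The synopsis above can be mirrored with a plain argparse parser, which is handy for validating arguments in a wrapper script before launching the container. This is a minimal sketch, not the plugin's own argument handling: the `build_parser` helper is hypothetical, the defaults are the ones listed in the argument descriptions of this README, and the `None` default for `--modelWeightPath` is an assumption because no default is documented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Plain-argparse mirror of the documented synopsis (illustrative only)."""
    parser = argparse.ArgumentParser(prog="unirep_analysis")
    # Only the three documented model sizes are accepted.
    parser.add_argument("--dimension", type=int, choices=(64, 256, 1900), default=64)
    parser.add_argument("--batch_size", type=int, default=12)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--inputFile", default="")
    parser.add_argument("--inputGlob", default="**/*txt")
    # No default is documented for --modelWeightPath; None is an assumption.
    parser.add_argument("--modelWeightPath", default=None)
    parser.add_argument("--outputFile", default="format.txt")
    parser.add_argument("--topModelTraining", action="store_true")
    parser.add_argument("--jointModelTraining", action="store_true")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("inputDir")
    parser.add_argument("outputDir")
    return parser


# Example: request the 256-unit model and keep every other default.
args = build_parser().parse_args(["--dimension", "256", "/incoming", "/outgoing"])
print(args.dimension, args.batch_size, args.outputFile)
```

Passing an unsupported value such as `--dimension 128` makes argparse exit with a usage error, which catches typos before a container is started.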
unirep_analysis is a ChRIS-based "plugin" application that can infer protein sequence representations and perform generative modelling, also known as "babbling".
Simply pull the docker image,
docker pull fnndsc/pl-unirep_analysis
and go straight to the examples section.
[--dimension <modelDimension>]
    By default, the <modelDimension> is 64. The value can be changed to 256 or 1900 (full), and the corresponding weights files (present inside the container) will be used.

[--batch_size <batchSize>]
    The batch size of the babbler. Default is 12.

[--learning_rate <learningRate>]
    Needed to build the model. Default is 0.001.

[--inputFile <inputFileToProcess>]
    The name of the input text file that contains your amino acid sequences. The default file name is an empty string. The full path to the file is constructed by joining <inputDir> and the file name: <inputDir>/<inputFileToProcess>

[--inputGlob <inputGlobPattern>]
    A glob pattern string, default '**/*txt', that specifies the file containing an amino acid sequence. This parameter allows the input space to be searched dynamically for a sequence file; the first "hit" is used.

[--modelWeightPath <pathToWeights>]
    A path to a directory containing model weight files to use for inference.

[--outputFile <resultOutputFile>]
    The name of the output (formatted 'txt') file. Default name is 'format.txt'.

[--topModelTraining]
    If specified, run training that optimizes only the top model.

[--jointModelTraining]
    If specified, jointly train the top model and the mLSTM.

[-h]
    Display inline help.

[--json]
    If specified, print a JSON representation of the app.
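The interplay of --inputFile and --inputGlob described above can be sketched as follows. This is an illustration of the documented behaviour, not the plugin's actual code: the `resolve_input` helper is hypothetical, the rule that a non-empty --inputFile takes precedence over the glob is an assumption, and sorting the glob hits (so that "first" is deterministic) is also an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_input(input_dir: str, input_file: str = "",
                  input_glob: str = "**/*txt") -> Optional[Path]:
    """Return the sequence file to process, or None if nothing matches."""
    root = Path(input_dir)
    if input_file:
        # Explicit file name: <inputDir>/<inputFileToProcess>
        return root / input_file
    # Otherwise search the input space and take the first glob hit.
    # Sorting is an assumption made here so that "first" is deterministic.
    hits = sorted(p for p in root.glob(input_glob) if p.is_file())
    return hits[0] if hits else None


if __name__ == "__main__":
    print(resolve_input("/incoming", input_file="sequence.txt"))
```

With the default pattern '**/*txt', any file whose name ends in "txt" anywhere under the input directory is a candidate, so keeping a single sequence file in the input directory avoids surprises.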
This plugin is executed via docker.
To run using docker, be sure to assign an "input" directory to /incoming and an output directory to /outgoing. Make sure that the $(pwd)/out directory is world writable!
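For scripted setups, the directory preparation can also be done from Python. The sketch below creates the two host directories and makes the output directory world writable, then checks the permission bit. The `prepare_dirs` and `is_world_writable` helpers are hypothetical names, and the `in`/`out` layout simply follows the examples in this README.

```python
import os
import stat
from pathlib import Path
from typing import Tuple


def prepare_dirs(base: str) -> Tuple[Path, Path]:
    """Create <base>/in and <base>/out and make out world writable."""
    in_dir = Path(base) / "in"
    out_dir = Path(base) / "out"
    in_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    # os.chmod sets the mode directly; unlike the mode argument of mkdir,
    # it is not masked by the process umask.
    os.chmod(out_dir, 0o777)
    return in_dir, out_dir


def is_world_writable(path: Path) -> bool:
    """True if the 'other' write bit is set on path."""
    return bool(path.stat().st_mode & stat.S_IWOTH)
```

A world-writable output directory matters because the process inside the container may run under a different user ID than the one that owns the host directory.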
Now, prefix all calls with
docker run --rm -v $(pwd)/out:/outgoing \
fnndsc/pl-unirep_analysis \
unirep_analysis \
Thus, getting inline help is:
mkdir in out && chmod 777 out
docker run --rm -v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing \
fnndsc/pl-unirep_analysis \
unirep_analysis \
-h \
/incoming /outgoing
Assuming that the <inputDir> layout conforms to
<inputDir>
│
└──█ sequence.txt
to process this (by default on a GPU) do
docker run --rm --gpus all \
-v $(pwd)/in:/incoming -v $(pwd)/out:/outgoing \
fnndsc/pl-unirep_analysis unirep_analysis \
--inputFile sequence.txt --outputFile formatted.txt \
/incoming /outgoing
(note that --gpus all is not strictly required). This will create in the <outputDir>:
<outputDir>
│
└──█ formatted.txt
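When the run is driven from a script, the same docker invocation can be assembled as an argument list, with the GPU flag made optional. This is a minimal sketch: the `docker_command` helper is hypothetical, and it only builds and prints the command rather than executing it.

```python
import shlex
from typing import List


def docker_command(in_dir: str, out_dir: str, input_file: str,
                   output_file: str, use_gpu: bool = True) -> List[str]:
    """Assemble the docker run call shown above as an argv list."""
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        # Optional: the 64-unit model should also run without a GPU.
        cmd += ["--gpus", "all"]
    cmd += [
        "-v", f"{in_dir}:/incoming",
        "-v", f"{out_dir}:/outgoing",
        "fnndsc/pl-unirep_analysis", "unirep_analysis",
        "--inputFile", input_file,
        "--outputFile", output_file,
        "/incoming", "/outgoing",
    ]
    return cmd


cmd = docker_command("/tmp/in", "/tmp/out", "sequence.txt", "formatted.txt",
                     use_gpu=False)
print(shlex.join(cmd))
```

To actually launch the container, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.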
To perform in-line debugging of the container, do
docker run --rm -it --userns=host -u $(id -u):$(id -g) \
-v $PWD/unirep_analysis.py:/usr/local/lib/python3.5/dist-packages/unirep_analysis.py:ro \
-v $PWD/src:/usr/local/lib/python3.5/dist-packages/src \
-v $PWD/in:/incoming:ro -v $PWD/out:/outgoing:rw -w /outgoing \
local/pl-unirep_analysis2 unirep_analysis /incoming /outgoing
Note, if you want to use pudb for debugging, then omit the -u $(id -u):$(id -g):
docker run --rm -it --userns=host \
-v $PWD/unirep_analysis.py:/usr/local/lib/python3.5/dist-packages/unirep_analysis.py:ro \
-v $PWD/src:/usr/local/lib/python3.5/dist-packages/src \
-v $PWD/in:/incoming:ro -v $PWD/out:/outgoing:rw -w /outgoing \
local/pl-unirep_analysis2 unirep_analysis /incoming /outgoing
Of course, in both cases above, use appropriate CLI args if required.
_-30-_