Machine learning of low and high temperature proteins
Create and activate the environment specified in environment.yml
conda env create --file environment.yml
conda activate learn2thermML
Ensure that the following environment variables are set for pipeline execution:
- LOGLEVEL (optional): logging level used when running the package, e.g. 'INFO' or 'DEBUG'
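As an illustration of how an optional LOGLEVEL variable is typically consumed, here is a minimal sketch. The function name `configure_logging` and the 'INFO' fallback are assumptions made for this example; they are not taken from the package itself.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Set the root logger level from the optional LOGLEVEL environment variable.

    Hypothetical helper: the package may configure logging differently.
    """
    # Read the variable, falling back to the default when it is unset.
    name = os.environ.get("LOGLEVEL", default).upper()

    # Level constants such as logging.INFO are integers on the logging module.
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        # Unrecognized name: fall back to INFO rather than failing.
        level = logging.INFO

    # basicConfig is a no-op if handlers already exist, so also set the
    # root logger level explicitly.
    logging.basicConfig(level=level)
    logging.getLogger().setLevel(level)
    return level
```

With this sketch, exporting LOGLEVEL=DEBUG before running a script would enable debug output, while leaving it unset would keep the assumed 'INFO' default.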
Data Version Control (DVC) is used to track data, parameters, metrics, and execution pipelines.
To use a DVC remote, see the DVC documentation.
DVC tracked data, metrics, and models are found in ./data while scripts and parameters can be found in ./pipeline. To execute pipeline steps, run dvc exp run <stage-name> where stages are listed below:
- ogt_protein_classifier_data_prep
- ogt_protein_classifier_train_evaluate
Note that scripts are expected to run with the repository top level as the current working directory, and all paths are specified relative to the repository top level.
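The stage invocation and the working-directory requirement can be sketched in Python as follows. The helper names `build_stage_command` and `run_stage` are hypothetical, and checking for the data/ and pipeline/ directories is only an assumed heuristic for recognizing the repository top level.

```python
import subprocess
from pathlib import Path
from typing import List

# Stage names as listed in this README.
STAGES = (
    "ogt_protein_classifier_data_prep",
    "ogt_protein_classifier_train_evaluate",
)


def build_stage_command(stage: str) -> List[str]:
    """Return the `dvc exp run <stage-name>` command for a known stage."""
    if stage not in STAGES:
        raise ValueError(f"Unknown stage {stage!r}; expected one of {STAGES}")
    return ["dvc", "exp", "run", stage]


def run_stage(stage: str, repo_root: Path) -> None:
    """Run a stage with the repository top level as the working directory."""
    # Assumption: the top level is recognizable by its data/ and pipeline/ folders.
    for required in ("data", "pipeline"):
        if not (repo_root / required).is_dir():
            raise FileNotFoundError(
                f"{repo_root} does not look like the repo top level: "
                f"missing {required}/"
            )
    # check=True raises CalledProcessError if dvc exits with a non-zero status.
    subprocess.run(build_stage_command(stage), cwd=repo_root, check=True)
```

For example, `run_stage("ogt_protein_classifier_data_prep", Path("."))` would be equivalent to running the corresponding dvc command from the repository top level, provided dvc is installed in the active environment.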
Installable, importable code is found in l2tml_utils and is installed by the steps in the Environment section above.
-data/ # Contains DVC tracked data, models, and metrics
-pipeline/ # Contains DVC tracked executable pipeline steps and parameters
-notebooks/ # notebooks for testing and decision making
-environment.yml # Conda dependencies
-docs/ # repository documentation
-l2tml_utils/ # python package