Merge branch 'main' of https://github.com/BeckResearchLab/nomelt

BeckResearchLab · Mar 7, 2024 · 619d1bc
2 parents 38e089b + 740ebe1

Showing 2 changed files with 11 additions and 30 deletions.
diff --git a/README.md b/README.md
@@ -100,6 +100,16 @@ This can be extremely expensive and requires multiple GPUs. As of Jan 2024, only
 
 __Enable Step 4__. Configure the estimator to use, the number of trials for exploring the library, the type of sampler for choosing mutations to test, etc. This outputs a file "optimize_results.json", which contains the sequence, score, and predicted structure file of the best sequence found. It also outputs "trials.csv", a dataframe of all the trials executed. 
 
+## Manuscript plots
+Some of the figures in the manuscript were created during the main pipeline steps, while others were created in notebooks.
+- Figure 1: located at `./analysis/figures/data_redundancy.png` created in notebook `./analysis/dataset_stats.ipynb`
+- Figure 2: located at `./analysis/figures/AA_propensities.png` created in notebook `./analysis/probe_model.ipynb`
+- Figure 3: located at `./analysis/figures/disulfide_logits.png` created in notebook `./analysis/probe_model.ipynb`
+- Figure 4: located at `./analysis/figures/estimated_shift_thermo_gen.png` created in notebook `./analysis/dataset_stats.ipynb`
+- Figure 5: located at `./analysis/figures/mAF_scores.png` created in notebook `./analysis/compare_estimated_stability.ipynb`
+- Figure 6: see the Zenodo record https://zenodo.org/records/10625583
+- Figure 7: located at `./data/plots/exp_tm_scores.png` created in script `./scripts/zero_shot_experiment.py`
+
 ## License
 This project is licensed under the MIT License - see the [LICENSE.md](LICENSE) file for details
 

diff --git a/scripts/README.md b/scripts/README.md
@@ -4,35 +4,6 @@ Scripts should be executed by running DVC stages from the root of the NOMELT rep
 
 These are the scripts used to prepare data, train the model, evaluate the model, use it to engineer ENH1, and run it zero-shot on experimental data.
 
-## Script Name: [Script_Name]
-
-### Description
-- Brief description of what the script does.
-
-### Input Files
-- [Input_File_1]: Description of input file 1.
-- [Input_File_2]: Description of input file 2.
-- ...
-
-### Output Files
-- [Output_File_1]: Description of output file 1.
-- [Output_File_2]: Description of output file 2.
-- ...
-
-### Plots Generated
-- [Plot_1]: Description of plot 1.
-- [Plot_2]: Description of plot 2.
-- ...
-
-### Parameters
-- [Parameter_1]: Current value - [Value], Description.
-- [Parameter_2]: Current value - [Value], Description.
-- ...
-
-### Additional Notes
-- Any other relevant information about the script.
-
-
 ## Script Name: prepare_data.py
 
 ### Description
@@ -405,4 +376,4 @@ These are not part of the linear pipeline to train, evaluated, and use the model
 - `./scripts/proof_of_principle/check_training_set_for_case_studies.py`: Script to check for specific proteins in the training dataset.
 
 ### Output Files
-- `./data/enh/training_data_homologs.json`: JSON file containing the e-values of any hits found in the training set, relevant to the case study proteins.
+- `./data/enh/training_data_homologs.json`: JSON file containing the e-values of any hits found in the training set, relevant to the case study proteins.
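The following Python sketch can help when reproducing the manuscript figures listed in the README change above. It checks which figure images are present in a local checkout.

- The figure-to-file mapping is transcribed from the "Manuscript plots" list.
- The name `missing_figures` is hypothetical and is not part of the NOMELT repository.
- Figure 6 is omitted because it is hosted on Zenodo rather than generated in the repo.

```python
from pathlib import Path

# Hypothetical helper, not part of NOMELT. Mapping transcribed from the
# README "Manuscript plots" list: figure -> (image path, generating notebook/script).
# Figure 6 lives on Zenodo, so it is not checked here.
MANUSCRIPT_FIGURES = {
    "Figure 1": ("analysis/figures/data_redundancy.png", "analysis/dataset_stats.ipynb"),
    "Figure 2": ("analysis/figures/AA_propensities.png", "analysis/probe_model.ipynb"),
    "Figure 3": ("analysis/figures/disulfide_logits.png", "analysis/probe_model.ipynb"),
    "Figure 4": ("analysis/figures/estimated_shift_thermo_gen.png", "analysis/dataset_stats.ipynb"),
    "Figure 5": ("analysis/figures/mAF_scores.png", "analysis/compare_estimated_stability.ipynb"),
    "Figure 7": ("data/plots/exp_tm_scores.png", "scripts/zero_shot_experiment.py"),
}


def missing_figures(repo_root="."):
    """Return {figure: (image, source)} for each figure image not found under repo_root."""
    root = Path(repo_root)
    return {
        fig: (image, source)
        for fig, (image, source) in MANUSCRIPT_FIGURES.items()
        if not (root / image).is_file()
    }
```

Run from the repository root, `missing_figures(".")` returns an empty dict once every locally generated figure exists. Each entry it does return names the notebook or script expected to produce that image.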