EvanKomp committed Mar 7, 2024
2 parents 38e089b + 740ebe1 commit 619d1bc
Showing 2 changed files with 11 additions and 30 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -100,6 +100,16 @@ This can be extremely expensive and requires multiple GPUs. As of Jan 2024, only

__Enable Step 4__. Configure the estimator to use, the number of trials for exploring the library, the type of sampler for choosing mutations to test, etc. This outputs a file "optimize_results.json", which contains the sequence, score, and predicted structure file of the best sequence found. It also outputs "trials.csv", a dataframe of all of the trials executed.
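
As a minimal sketch of how the two Step 4 outputs might be inspected afterwards: the helper name `load_optimize_outputs` and the directory argument are hypothetical and not part of this repository, and no key or column names are assumed, since the text above only says what the files contain.

```python
import csv
import json
from pathlib import Path


def load_optimize_outputs(results_dir):
    """Hypothetical helper: load the Step 4 outputs from one directory."""
    results_dir = Path(results_dir)

    # Best sequence found (sequence, score, predicted structure file).
    with open(results_dir / "optimize_results.json") as f:
        best = json.load(f)

    # One row per executed trial; columns are whatever the pipeline wrote.
    with open(results_dir / "trials.csv", newline="") as f:
        trials = list(csv.DictReader(f))

    return best, trials
```

Calling `best, trials = load_optimize_outputs("path/to/outputs")` and printing `best` and `len(trials)` is one way to check which fields were actually written. The output directory depends on how the stage is configured, so the path here is only a placeholder.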

## Manuscript plots
Some of the figures in the manuscript were created during the main pipeline steps, while others were created in notebooks.
- Figure 1: located at `./analysis/figures/data_redundancy.png` created in notebook `./analysis/dataset_stats.ipynb`
- Figure 2: located at `./analysis/figures/AA_propensities.png` created in notebook `./analysis/probe_model.ipynb`
- Figure 3: located at `./analysis/figures/disulfide_logits.png` created in notebook `./analysis/probe_model.ipynb`
- Figure 4: located at `./analysis/figures/estimated_shift_thermo_gen.png` created in notebook `./analysis/dataset_stats.ipynb`
- Figure 5: located at `./analysis/figures/mAF_scores.png` created in notebook `./analysis/compare_estimated_stability.ipynb`
- Figure 6: see repo https://zenodo.org/records/10625583
- Figure 7: located at `./data/plots/exp_tm_scores.png` created in script `./scripts/zero_shot_experiment.py`

## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE) file for details

31 changes: 1 addition & 30 deletions scripts/README.md
@@ -4,35 +4,6 @@ Scripts should be executed by running DVC stages from the root of the NOMELT rep

These are the scripts used to prepare data, train the model, evaluate the model, use it to engineer ENH1, and run it zero-shot on experimental data.

## Script Name: [Script_Name]

### Description
- Brief description of what the script does.

### Input Files
- [Input_File_1]: Description of input file 1.
- [Input_File_2]: Description of input file 2.
- ...

### Output Files
- [Output_File_1]: Description of output file 1.
- [Output_File_2]: Description of output file 2.
- ...

### Plots Generated
- [Plot_1]: Description of plot 1.
- [Plot_2]: Description of plot 2.
- ...

### Parameters
- [Parameter_1]: Current value - [Value], Description.
- [Parameter_2]: Current value - [Value], Description.
- ...

### Additional Notes
- Any other relevant information about the script.


## Script Name: prepare_data.py

### Description
@@ -405,4 +376,4 @@ These are not part of the linear pipeline to train, evaluate, and use the model
- `./scripts/proof_of_principle/check_training_set_for_case_studies.py`: Script to check for specific proteins in the training dataset.

### Output Files
- `./data/enh/training_data_homologs.json`: JSON file containing the e-values of any hits found in the training set, relevant to the case study proteins.
- `./data/enh/training_data_homologs.json`: JSON file containing the e-values of any hits found in the training set, relevant to the case study proteins.
