From 71048c1d5259c177620797b3a135a06acf0564d9 Mon Sep 17 00:00:00 2001 From: Goosang Yu Date: Thu, 25 Jan 2024 09:40:08 +0900 Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20Update=20Docs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- README.md | 69 +++++-------------- docs/en/1_Predict/4_predict_pe.md | 58 +++++++++++----- docs/en/_README.md | 0 docs/en/_index.md | 9 --- .../contents/en_0_1_2_CRISPR_machanism.svg | 1 + .../contents/ko_0_1_2_CRISPR_machanism.svg | 1 - docs/en/getting_started.md | 25 +------ docs/en/introduction.md | 13 +--- mkdocs.yml | 10 ++- 9 files changed, 65 insertions(+), 121 deletions(-) delete mode 100644 docs/en/_README.md delete mode 100644 docs/en/_index.md create mode 100644 docs/en/assets/contents/en_0_1_2_CRISPR_machanism.svg delete mode 100644 docs/en/assets/contents/ko_0_1_2_CRISPR_machanism.svg diff --git a/README.md b/README.md index 1c06e1a..ffe7c3a 100644 --- a/README.md +++ b/README.md @@ -61,7 +61,7 @@ GenET was developed for anyone interested in the field of genome editing. Especi ## Example: Prediction of prime editing efficiency by DeepPrime ![](docs/en/assets/contents/en_1_4_1_DeepPrime_architecture.svg) -DeepPrime is a prediction model for evaluating prime editing guideRNAs (pegRNAs) that target specific target sites for prime editing ([Yu et al. Cell 2023](https://doi.org/10.1016/j.cell.2023.03.034)). DeepSpCas9 prediction score is calculated simultaneously and requires tensorflow (version >=2.6). DeepPrime was developed on pytorch. +DeepPrime is a prediction model for evaluating prime editing guideRNAs (pegRNAs) that target specific target sites for prime editing ([Yu et al. Cell 2023](https://doi.org/10.1016/j.cell.2023.03.034)). DeepSpCas9 prediction score is calculated simultaneously and requires tensorflow (version >=2.6). DeepPrime was developed on pytorch. For more details, please see the [documentation](https://goosang-yu.github.io/genet/). ```python from genet.predict import DeepPrime @@ -69,20 +69,20 @@ from genet.predict import DeepPrime seq_wt = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT' seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT' -pegrna = DeepPrime('Test', seq_wt, seq_ed, edit_type='sub', edit_len=1) +pegrna = DeepPrime('SampleName', seq_wt, seq_ed, edit_type='sub', edit_len=1) # check designed pegRNAs ->>> pegrna.features +>>> pegrna.features.head() ``` -| | ID | WT74_On | Edited74_On | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | type_sub | type_ins | type_del | Tm1 | Tm2 | Tm2new | Tm3 | Tm4 | TmD | nGCcnt1 | nGCcnt2 | nGCcnt3 | fGCcont1 | fGCcont2 | fGCcont3 | MFE3 | MFE4 | DeepSpCas9_score | -| - | ---- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | -------- | -------- | -------- | -------- | ------- | ------- | --------- | -------- | --------- | ------- | ------- | ------- | -------- | -------- | -------- | ------ | ----- | ---------------- | -| 0 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxxCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 7 | 35 | 42 | 34 | 1 | 1 | 1 | 0 | 0 | 16.19097 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | -| 1 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxCCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 8 | 35 | 43 | 34 | 1 | 1 | 1 | 0 | 0 | 30.19954 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 | -| 2 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 9 | 35 | 44 | 34 | 1 | 1 | 1 | 0 | 0 | 33.78395 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | -| 3 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxCACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 10 | 35 | 45 | 34 | 1 | 1 | 1 | 0 | 0 | 38.51415 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 | -| 4 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 11 | 35 | 46 | 34 | 1 | 1 | 1 | 0 | 0 | 40.87411 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | -| 5 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 12 | 35 | 47 | 34 | 1 | 1 | 1 | 0 | 0 | 40.07098 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 58.33333 | 45.71429 | 48.93617 | \-10.4 | \-0.6 | 45.96754 | +| | ID | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target | ... | deltaTm_Tm4-Tm2 | GC_count_PBS | GC_count_RTT | GC_count_RT-PBS | GC_contents_PBS | GC_contents_RTT | GC_contents_RT-PBS | MFE_RT-PBS-polyT | MFE_Spacer | DeepSpCas9_score | +| --- | ---- | -------------------- | ------------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- | --- | --------------- | ------------ | ------------ | --------------- | --------------- | --------------- | ------------------ | ---------------- | ---------- | ---------------- | +| 0 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | +| 1 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 | +| 2 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | +| 3 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 | +| 4 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | + Next, select model PE system and run DeepPrime ```python @@ -90,46 +90,13 @@ pe2max_output = pegrna.predict(pe_system='PE2max', cell_type='HEK293T') >>> pe2max_output.head() ``` -| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score | -| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ | -| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 | -| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 | -| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 | -| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 | -| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 | - - -The previous function, ```pe_score()```, is still available for use. However, please note that this function will be deprecated in the near future. -```python -from genet import predict as prd - -# Place WT sequence and Edited sequence information, respectively. -# And select the edit type you want to make and put it in. -#Input seq: 60bp 5' context + 1bp center + 60bp 3' context (total 121bp) - -seq_wt = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT' -seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT' -alt_type = 'sub1' - -df_pe = prd.pe_score(seq_wt, seq_ed, alt_type) -df_pe.head() -``` -| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score | -| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ | -| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 | -| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 | -| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 | -| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 | -| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 | - - - -It is also possible to predict other cell lines (A549, DLD1...) and PE systems (PE2max, PE4max...). - -```python -df_pe = prd.pe_score(seq_wt, seq_ed, alt_type, sID='MyGene', pe_system='PE4max', cell_type='A549') -``` - +| | ID | PE2max_score | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target | +| - | ---- | ------------ | -------------------- | ---------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- | +| 0 | SampleName | 0.904387 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 1 | SampleName | 2.375938 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 2 | SampleName | 2.61238 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 3 | SampleName | 3.641537 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 4 | SampleName | 3.768321 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | Please send all comments and questions to gsyu93@gmail.com \ No newline at end of file diff --git a/docs/en/1_Predict/4_predict_pe.md b/docs/en/1_Predict/4_predict_pe.md index d6356fe..7ee3ab6 100644 --- a/docs/en/1_Predict/4_predict_pe.md +++ b/docs/en/1_Predict/4_predict_pe.md @@ -12,17 +12,17 @@ seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTAT pegrna = DeepPrime('Test', seq_wt, seq_ed, edit_type='sub', edit_len=1) # check designed pegRNAs ->>> pegrna.features +>>> pegrna.features.head() ``` -| | ID | WT74_On | Edited74_On | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | type_sub | type_ins | type_del | Tm1 | Tm2 | Tm2new | Tm3 | Tm4 | TmD | nGCcnt1 | nGCcnt2 | nGCcnt3 | fGCcont1 | fGCcont2 | fGCcont3 | MFE3 | MFE4 | DeepSpCas9_score | -| - | ---- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | -------- | -------- | -------- | -------- | ------- | ------- | --------- | -------- | --------- | ------- | ------- | ------- | -------- | -------- | -------- | ------ | ----- | ---------------- | -| 0 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxxCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 7 | 35 | 42 | 34 | 1 | 1 | 1 | 0 | 0 | 16.19097 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | -| 1 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxCCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 8 | 35 | 43 | 34 | 1 | 1 | 1 | 0 | 0 | 30.19954 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 | -| 2 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 9 | 35 | 44 | 34 | 1 | 1 | 1 | 0 | 0 | 33.78395 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | -| 3 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxCACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 10 | 35 | 45 | 34 | 1 | 1 | 1 | 0 | 0 | 38.51415 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 | -| 4 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 11 | 35 | 46 | 34 | 1 | 1 | 1 | 0 | 0 | 40.87411 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | -| 5 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 12 | 35 | 47 | 34 | 1 | 1 | 1 | 0 | 0 | 40.07098 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 58.33333 | 45.71429 | 48.93617 | \-10.4 | \-0.6 | 45.96754 | +| | ID | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target | ... | deltaTm_Tm4-Tm2 | GC_count_PBS | GC_count_RTT | GC_count_RT-PBS | GC_contents_PBS | GC_contents_RTT | GC_contents_RT-PBS | MFE_RT-PBS-polyT | MFE_Spacer | DeepSpCas9_score | +| --- | ---- | -------------------- | ------------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- | --- | --------------- | ------------ | ------------ | --------------- | --------------- | --------------- | ------------------ | ---------------- | ---------- | ---------------- | +| 0 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | +| 1 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 | +| 2 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | +| 3 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 | +| 4 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 | + Next, select model PE system and run DeepPrime ```python @@ -30,15 +30,37 @@ pe2max_output = pegrna.predict(pe_system='PE2max', cell_type='HEK293T') >>> pe2max_output.head() ``` - -| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score | -| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ | -| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 | -| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 | -| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 | -| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 | -| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 | - +| | ID | PE2max_score | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target | +| - | ---- | ------------ | -------------------- | ---------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- | +| 0 | SampleName | 0.904387 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 1 | SampleName | 2.375938 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 2 | SampleName | 2.61238 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 3 | SampleName | 3.641537 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | +| 4 | SampleName | 3.768321 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | + + +### Current available DeepPrime models: +| Cell type | PE system | Model | +| ---------- | ----------- | ----------------------------------------------------------------- | +| HEK293T | PE2 | DeepPrime_base | +| HEK293T | NRCH_PE2 | DeepPrime-FT: HEK293T, NRCH-PE2 with Optimized scaffold | +| HEK293T | NRCH_PE2max | DeepPrime-FT: HEK293T, NRCH-PE2max with Optimized scaffold | +| HEK293T | PE2 | DeepPrime-FT: HEK293T, PE2 with Conventional scaffold | +| HEK293T | PE2max-e | DeepPrime-FT: HEK293T, PE2max with Optimized scaffold and epegRNA | +| HEK293T | PE2max | DeepPrime-FT: HEK293T, PE2max with Optimized scaffold | +| HEK293T | PE4max-e | DeepPrime-FT: HEK293T, PE4max with Optimized scaffold and epegRNA | +| HEK293T | PE4max | DeepPrime-FT: HEK293T, PE4max with Optimized scaffold | +| A549 | PE2max-e | DeepPrime-FT: A549, PE2max with Optimized scaffold and epegRNA | +| A549 | PE2max | DeepPrime-FT: A549, PE2max with Optimized scaffold | +| A549 | PE4max-e | DeepPrime-FT: A549, PE4max with Optimized scaffold and epegRNA | +| A549 | PE4max | DeepPrime-FT: A549, PE4max with Optimized scaffold | +| DLD1 | NRCH_PE4max | DeepPrime-FT: DLD1, NRCH-PE4max with Optimized scaffold | +| DLD1 | PE2max | DeepPrime-FT: DLD1, PE2max with Optimized scaffold | +| DLD1 | PE4max | DeepPrime-FT: DLD1, PE4max with Optimized scaffold | +| HCT116 | PE2 | DeepPrime-FT: HCT116, PE2 with Optimized scaffold | +| HeLa | PE2max | DeepPrime-FT: HeLa, PE2max with Optimized scaffold | +| MDA-MB-231 | PE2 | DeepPrime-FT: MDA-MB-231, PE2 with Optimized scaffold | +| NIH3T3 | NRCH_PE4max | DeepPrime-FT: NIH3T3, NRCH-PE4max with Optimized scaffold | ### Get ClinVar record and DeepPrime score using GenET diff --git a/docs/en/_README.md b/docs/en/_README.md deleted file mode 100644 index e69de29..0000000 diff --git a/docs/en/_index.md b/docs/en/_index.md deleted file mode 100644 index efce46c..0000000 --- a/docs/en/_index.md +++ /dev/null @@ -1,9 +0,0 @@ - - -Welcome to GenET test page. \ No newline at end of file diff --git a/docs/en/assets/contents/en_0_1_2_CRISPR_machanism.svg b/docs/en/assets/contents/en_0_1_2_CRISPR_machanism.svg new file mode 100644 index 0000000..80e1e13 --- /dev/null +++ b/docs/en/assets/contents/en_0_1_2_CRISPR_machanism.svg @@ -0,0 +1 @@ +Cas9 protein binds to the target DNA with the guide RNA complex.NGGThe target genome is completely cleaved by the Cas9.NGGNGGNGGNHEJHDRRandom insertion or deletion (InDel).Precise genome editing \ No newline at end of file diff --git a/docs/en/assets/contents/ko_0_1_2_CRISPR_machanism.svg b/docs/en/assets/contents/ko_0_1_2_CRISPR_machanism.svg deleted file mode 100644 index 233633a..0000000 --- a/docs/en/assets/contents/ko_0_1_2_CRISPR_machanism.svg +++ /dev/null @@ -1 +0,0 @@ -카스9단백질가이드RNA복합체가표적유전자에결합NGG표적유전체가카스9유전자가위에의해완전절단NGGNGGNGGNHEJHDR무작위삽입또는삭제(InDel)정확한유전자교정 \ No newline at end of file diff --git a/docs/en/getting_started.md b/docs/en/getting_started.md index baba540..2e08152 100644 --- a/docs/en/getting_started.md +++ b/docs/en/getting_started.md @@ -25,29 +25,6 @@ import genet.utils ``` -## GenET에서 제공하는 기능들 -GenET에서 제공 (예정 포함)하는 기능들을 아래와 같다. - -| Module | Functions | Descriptions | Status | -| -------- | -------------- | --------------------------------------------------------------------- | ------ | -| Predict | SpCas9 | DeepSpCas9 모델 사용 | 사용가능 | -| Predict | SpCas9variants | DeepSpCas9variants 모델 사용 | 사용가능 | -| Predict | Base editor | DeepBE 모델 사용 | 개발예정 | -| Predict | Prime editor | DeepPrime 모델 사용 | 사용가능 | -| Design | KOLiD | Genome-wide KO library design | 개발예정 | -| Design | ReLiD | Gene regulation library design | 개발예정 | -| Design | CRISPRStop | Design gRNA for inducing premature stop codon using CBE | 개발예정 | -| Design | SynonymousPE | Design pegRNA containing additional synonymousmutation in RT template | 사용가능 | -| Database | GetGenome | NCBI database에서 genome data를 가져오는 기능 | 사용가능 | -| Database | GetGene | NCBI database에서 특정 gene의 정보를 가져오는 기능 | 개발예정 | -| Database | GenBankParser | GenBank file에서 원하는 정보들을 찾아내는 기능 | 개발예정 | -| Database | DFConverter | NCBI genbank file의 형태를 DataFrame으로 변환하는 기능 | 사용가능 | -| Analysis | SGE | Saturation genome editing 데이터를 분석하기 위한 기능 | 개발예정 | -| Analysis | UMItools | UMI 분석을 위한 함수 (from UMI-tools) | 사용가능 | -| Utils | request_file | HTTP protocol을 이용해 서버에서 원하는 파일을 다운로드 하는 | 사용가능 | -| Utils | SplitFastq | FASTQ 파일을 작은 크기들로 나눠주는 기능 | 사용가능 | - - ## Need help? Look at the issues section to find out about specific cases and others. @@ -57,7 +34,7 @@ If you still have doubts or cannot solve the problem, please consider opening an Please send all comments and questions to gsyu93@gmail.com -## GenET 인용하기 +## GenET Citation ``` @Manual {GenET, diff --git a/docs/en/introduction.md b/docs/en/introduction.md index 3d52e6b..56dbfe9 100644 --- a/docs/en/introduction.md +++ b/docs/en/introduction.md @@ -15,7 +15,7 @@ Gene editing involves the technology to modify specific genetic information at d CRISPR is a unique sequence structure discovered by scientists specializing in the study of bacterial genes. It consists of repeated sequences with specific intervals of spacer sequences. While many gene sequences were previously unknown, the regular repetition of sequences was uncommon. This structure, found not only in specific bacterial strains but also in numerous species, was later identified as the guide RNA (gRNA) that specifies the location for the action of a gene-editing protein called Cas9. -![CRISPR_machanism](assets/contents/ko_0_1_2_CRISPR_machanism.svg) +![CRISPR_machanism](assets/contents/en_0_1_2_CRISPR_machanism.svg) ## Various Types of CRISPR Systems @@ -44,15 +44,4 @@ Through GenET, various functionalities are available (or planned) for research o | Utils | SplitFastq | Function to split FASTQ files into smaller sizes | Available | -## GenET 인용하기 - -``` -@Manual {GenET, - title = {GenET: Python package for genome editing research}, - author = {Goosang Yu}, - year = {2024}, - month = {January}, - note = {GenET version 0.13.1}, - url = {https://github.com/Goosang-Yu/genet} - } ``` diff --git a/mkdocs.yml b/mkdocs.yml index 7612d3d..935b4d9 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -96,9 +96,7 @@ markdown_extensions: - pymdownx.mark - attr_list - pymdownx.emoji: - # emoji_index: !!python/name:materialx.emoji.twemoji emoji_index: !!python/name:material.extensions.emoji.twemoji - # emoji_generator: !!python/name:materialx.emoji.to_svg emoji_generator: !!python/name:material.extensions.emoji.to_svg @@ -119,20 +117,20 @@ nav: - Prime editor: 1_Predict/4_predict_pe.md - Design: - - Introduction: 2_Design/1_Design_intro.md + - Genet design module: 2_Design/1_Design_intro.md - Synonymous PE: 2_Design/2_SynonymousPE.md - Database: - - Introduction: 3_Database/1_database_intro.md + - Genet database module: 3_Database/1_database_intro.md - Background : 3_Database/2_Genome_resource_background.md - Metadata : 3_Database/3_Metadata from databases.md - Download : 3_Database/4_Download_files.md - Analysis: - - Introduction: 4_Analysis/1_analysis_intro.md + - Genet Analysis module: 4_Analysis/1_analysis_intro.md - Utils: - - Introduction: 5_Utils/1_utils_intro.md + - Genet Utils module: 5_Utils/1_utils_intro.md - Download from server: 5_Utils/2_download_files.md - Application note: