diff --git a/README.md b/README.md
index 1c06e1a..ffe7c3a 100644
--- a/README.md
+++ b/README.md
@@ -61,7 +61,7 @@ GenET was developed for anyone interested in the field of genome editing. Especi
## Example: Prediction of prime editing efficiency by DeepPrime
![](docs/en/assets/contents/en_1_4_1_DeepPrime_architecture.svg)
-DeepPrime is a prediction model for evaluating prime editing guideRNAs (pegRNAs) that target specific target sites for prime editing ([Yu et al. Cell 2023](https://doi.org/10.1016/j.cell.2023.03.034)). DeepSpCas9 prediction score is calculated simultaneously and requires tensorflow (version >=2.6). DeepPrime was developed on pytorch.
+DeepPrime is a prediction model for evaluating prime editing guide RNAs (pegRNAs) that target specific sites for prime editing ([Yu et al. Cell 2023](https://doi.org/10.1016/j.cell.2023.03.034)). The DeepSpCas9 prediction score is calculated at the same time and requires TensorFlow (version >=2.6). DeepPrime itself was developed with PyTorch. For more details, please see the [documentation](https://goosang-yu.github.io/genet/).
```python
from genet.predict import DeepPrime
@@ -69,20 +69,20 @@ from genet.predict import DeepPrime
seq_wt = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'
seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'
-pegrna = DeepPrime('Test', seq_wt, seq_ed, edit_type='sub', edit_len=1)
+pegrna = DeepPrime('SampleName', seq_wt, seq_ed, edit_type='sub', edit_len=1)
# check designed pegRNAs
->>> pegrna.features
+>>> pegrna.features.head()
```
-| | ID | WT74_On | Edited74_On | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | type_sub | type_ins | type_del | Tm1 | Tm2 | Tm2new | Tm3 | Tm4 | TmD | nGCcnt1 | nGCcnt2 | nGCcnt3 | fGCcont1 | fGCcont2 | fGCcont3 | MFE3 | MFE4 | DeepSpCas9_score |
-| - | ---- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | -------- | -------- | -------- | -------- | ------- | ------- | --------- | -------- | --------- | ------- | ------- | ------- | -------- | -------- | -------- | ------ | ----- | ---------------- |
-| 0 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxxCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 7 | 35 | 42 | 34 | 1 | 1 | 1 | 0 | 0 | 16.19097 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
-| 1 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxCCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 8 | 35 | 43 | 34 | 1 | 1 | 1 | 0 | 0 | 30.19954 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 |
-| 2 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 9 | 35 | 44 | 34 | 1 | 1 | 1 | 0 | 0 | 33.78395 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
-| 3 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxCACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 10 | 35 | 45 | 34 | 1 | 1 | 1 | 0 | 0 | 38.51415 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 |
-| 4 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 11 | 35 | 46 | 34 | 1 | 1 | 1 | 0 | 0 | 40.87411 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
-| 5 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 12 | 35 | 47 | 34 | 1 | 1 | 1 | 0 | 0 | 40.07098 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 58.33333 | 45.71429 | 48.93617 | \-10.4 | \-0.6 | 45.96754 |
+| | ID | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target | ... | deltaTm_Tm4-Tm2 | GC_count_PBS | GC_count_RTT | GC_count_RT-PBS | GC_contents_PBS | GC_contents_RTT | GC_contents_RT-PBS | MFE_RT-PBS-polyT | MFE_Spacer | DeepSpCas9_score |
+| --- | ---- | -------------------- | ------------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- | --- | --------------- | ------------ | ------------ | --------------- | --------------- | --------------- | ------------------ | ---------------- | ---------- | ---------------- |
+| 0 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
+| 1 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 |
+| 2 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
+| 3 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 |
+| 4 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
+
Next, select model PE system and run DeepPrime
```python
@@ -90,46 +90,13 @@ pe2max_output = pegrna.predict(pe_system='PE2max', cell_type='HEK293T')
>>> pe2max_output.head()
```
-| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score |
-| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ |
-| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 |
-| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 |
-| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 |
-| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 |
-| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 |
-
-
-The previous function, ```pe_score()```, is still available for use. However, please note that this function will be deprecated in the near future.
-```python
-from genet import predict as prd
-
-# Place WT sequence and Edited sequence information, respectively.
-# And select the edit type you want to make and put it in.
-#Input seq: 60bp 5' context + 1bp center + 60bp 3' context (total 121bp)
-
-seq_wt = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'
-seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'
-alt_type = 'sub1'
-
-df_pe = prd.pe_score(seq_wt, seq_ed, alt_type)
-df_pe.head()
-```
-| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score |
-| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ |
-| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 |
-| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 |
-| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 |
-| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 |
-| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 |
-
-
-
-It is also possible to predict other cell lines (A549, DLD1...) and PE systems (PE2max, PE4max...).
-
-```python
-df_pe = prd.pe_score(seq_wt, seq_ed, alt_type, sID='MyGene', pe_system='PE4max', cell_type='A549')
-```
-
+| | ID | PE2max_score | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target |
+| - | ---- | ------------ | -------------------- | ---------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- |
+| 0 | SampleName | 0.904387 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 1 | SampleName | 2.375938 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 2 | SampleName | 2.61238 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 3 | SampleName | 3.641537 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 4 | SampleName | 3.768321 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
Please send all comments and questions to gsyu93@gmail.com
\ No newline at end of file
diff --git a/docs/en/1_Predict/4_predict_pe.md b/docs/en/1_Predict/4_predict_pe.md
index d6356fe..7ee3ab6 100644
--- a/docs/en/1_Predict/4_predict_pe.md
+++ b/docs/en/1_Predict/4_predict_pe.md
@@ -12,17 +12,17 @@ seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTAT
pegrna = DeepPrime('Test', seq_wt, seq_ed, edit_type='sub', edit_len=1)
# check designed pegRNAs
->>> pegrna.features
+>>> pegrna.features.head()
```
-| | ID | WT74_On | Edited74_On | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | type_sub | type_ins | type_del | Tm1 | Tm2 | Tm2new | Tm3 | Tm4 | TmD | nGCcnt1 | nGCcnt2 | nGCcnt3 | fGCcont1 | fGCcont2 | fGCcont3 | MFE3 | MFE4 | DeepSpCas9_score |
-| - | ---- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | -------- | -------- | -------- | -------- | ------- | ------- | --------- | -------- | --------- | ------- | ------- | ------- | -------- | -------- | -------- | ------ | ----- | ---------------- |
-| 0 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxxCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 7 | 35 | 42 | 34 | 1 | 1 | 1 | 0 | 0 | 16.19097 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
-| 1 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxCCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 8 | 35 | 43 | 34 | 1 | 1 | 1 | 0 | 0 | 30.19954 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 |
-| 2 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 9 | 35 | 44 | 34 | 1 | 1 | 1 | 0 | 0 | 33.78395 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
-| 3 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxCACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 10 | 35 | 45 | 34 | 1 | 1 | 1 | 0 | 0 | 38.51415 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 |
-| 4 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 11 | 35 | 46 | 34 | 1 | 1 | 1 | 0 | 0 | 40.87411 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
-| 5 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 12 | 35 | 47 | 34 | 1 | 1 | 1 | 0 | 0 | 40.07098 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 58.33333 | 45.71429 | 48.93617 | \-10.4 | \-0.6 | 45.96754 |
+| | ID | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target | ... | deltaTm_Tm4-Tm2 | GC_count_PBS | GC_count_RTT | GC_count_RT-PBS | GC_contents_PBS | GC_contents_RTT | GC_contents_RT-PBS | MFE_RT-PBS-polyT | MFE_Spacer | DeepSpCas9_score |
+| --- | ---- | -------------------- | ------------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- | --- | --------------- | ------------ | ------------ | --------------- | --------------- | --------------- | ------------------ | ---------------- | ---------- | ---------------- |
+| 0 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
+| 1 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 |
+| 2 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
+| 3 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 |
+| 4 | SampleName | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ... | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
+
Next, select model PE system and run DeepPrime
```python
@@ -30,15 +30,37 @@ pe2max_output = pegrna.predict(pe_system='PE2max', cell_type='HEK293T')
>>> pe2max_output.head()
```
-
-| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score |
-| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ |
-| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 |
-| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 |
-| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 |
-| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 |
-| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 |
-
+| | ID | PE2max_score | Spacer | RT-PBS | PBS_len | RTT_len | RT-PBS_len | Edit_pos | Edit_len | RHA_len | Target |
+| - | ---- | ------------ | -------------------- | ---------------------------------------------- | ------- | ------- | ---------- | -------- | -------- | ------- | ------------------------------------------------- |
+| 0 | SampleName | 0.904387 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 1 | SampleName | 2.375938 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 2 | SampleName | 2.61238 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 3 | SampleName | 3.641537 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+| 4 | SampleName | 3.768321 | AAGACAACACCCTTGCCTTG | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... |
+
+
+### Currently available DeepPrime models
+| Cell type | PE system | Model |
+| ---------- | ----------- | ----------------------------------------------------------------- |
+| HEK293T | PE2 | DeepPrime_base |
+| HEK293T | NRCH_PE2 | DeepPrime-FT: HEK293T, NRCH-PE2 with Optimized scaffold |
+| HEK293T | NRCH_PE2max | DeepPrime-FT: HEK293T, NRCH-PE2max with Optimized scaffold |
+| HEK293T | PE2 | DeepPrime-FT: HEK293T, PE2 with Conventional scaffold |
+| HEK293T | PE2max-e | DeepPrime-FT: HEK293T, PE2max with Optimized scaffold and epegRNA |
+| HEK293T | PE2max | DeepPrime-FT: HEK293T, PE2max with Optimized scaffold |
+| HEK293T | PE4max-e | DeepPrime-FT: HEK293T, PE4max with Optimized scaffold and epegRNA |
+| HEK293T | PE4max | DeepPrime-FT: HEK293T, PE4max with Optimized scaffold |
+| A549 | PE2max-e | DeepPrime-FT: A549, PE2max with Optimized scaffold and epegRNA |
+| A549 | PE2max | DeepPrime-FT: A549, PE2max with Optimized scaffold |
+| A549 | PE4max-e | DeepPrime-FT: A549, PE4max with Optimized scaffold and epegRNA |
+| A549 | PE4max | DeepPrime-FT: A549, PE4max with Optimized scaffold |
+| DLD1 | NRCH_PE4max | DeepPrime-FT: DLD1, NRCH-PE4max with Optimized scaffold |
+| DLD1 | PE2max | DeepPrime-FT: DLD1, PE2max with Optimized scaffold |
+| DLD1 | PE4max | DeepPrime-FT: DLD1, PE4max with Optimized scaffold |
+| HCT116 | PE2 | DeepPrime-FT: HCT116, PE2 with Optimized scaffold |
+| HeLa | PE2max | DeepPrime-FT: HeLa, PE2max with Optimized scaffold |
+| MDA-MB-231 | PE2 | DeepPrime-FT: MDA-MB-231, PE2 with Optimized scaffold |
+| NIH3T3 | NRCH_PE4max | DeepPrime-FT: NIH3T3, NRCH-PE4max with Optimized scaffold |
### Get ClinVar record and DeepPrime score using GenET
diff --git a/docs/en/_README.md b/docs/en/_README.md
deleted file mode 100644
index e69de29..0000000
diff --git a/docs/en/_index.md b/docs/en/_index.md
deleted file mode 100644
index efce46c..0000000
--- a/docs/en/_index.md
+++ /dev/null
@@ -1,9 +0,0 @@
-
-
-Welcome to GenET test page.
\ No newline at end of file
diff --git a/docs/en/assets/contents/en_0_1_2_CRISPR_machanism.svg b/docs/en/assets/contents/en_0_1_2_CRISPR_machanism.svg
new file mode 100644
index 0000000..80e1e13
--- /dev/null
+++ b/docs/en/assets/contents/en_0_1_2_CRISPR_machanism.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/en/assets/contents/ko_0_1_2_CRISPR_machanism.svg b/docs/en/assets/contents/ko_0_1_2_CRISPR_machanism.svg
deleted file mode 100644
index 233633a..0000000
--- a/docs/en/assets/contents/ko_0_1_2_CRISPR_machanism.svg
+++ /dev/null
@@ -1 +0,0 @@
-
\ No newline at end of file
diff --git a/docs/en/getting_started.md b/docs/en/getting_started.md
index baba540..2e08152 100644
--- a/docs/en/getting_started.md
+++ b/docs/en/getting_started.md
@@ -25,29 +25,6 @@ import genet.utils
```
-## GenET에서 제공하는 기능들
-GenET에서 제공 (예정 포함)하는 기능들을 아래와 같다.
-
-| Module | Functions | Descriptions | Status |
-| -------- | -------------- | --------------------------------------------------------------------- | ------ |
-| Predict | SpCas9 | DeepSpCas9 모델 사용 | 사용가능 |
-| Predict | SpCas9variants | DeepSpCas9variants 모델 사용 | 사용가능 |
-| Predict | Base editor | DeepBE 모델 사용 | 개발예정 |
-| Predict | Prime editor | DeepPrime 모델 사용 | 사용가능 |
-| Design | KOLiD | Genome-wide KO library design | 개발예정 |
-| Design | ReLiD | Gene regulation library design | 개발예정 |
-| Design | CRISPRStop | Design gRNA for inducing premature stop codon using CBE | 개발예정 |
-| Design | SynonymousPE | Design pegRNA containing additional synonymousmutation in RT template | 사용가능 |
-| Database | GetGenome | NCBI database에서 genome data를 가져오는 기능 | 사용가능 |
-| Database | GetGene | NCBI database에서 특정 gene의 정보를 가져오는 기능 | 개발예정 |
-| Database | GenBankParser | GenBank file에서 원하는 정보들을 찾아내는 기능 | 개발예정 |
-| Database | DFConverter | NCBI genbank file의 형태를 DataFrame으로 변환하는 기능 | 사용가능 |
-| Analysis | SGE | Saturation genome editing 데이터를 분석하기 위한 기능 | 개발예정 |
-| Analysis | UMItools | UMI 분석을 위한 함수 (from UMI-tools) | 사용가능 |
-| Utils | request_file | HTTP protocol을 이용해 서버에서 원하는 파일을 다운로드 하는 | 사용가능 |
-| Utils | SplitFastq | FASTQ 파일을 작은 크기들로 나눠주는 기능 | 사용가능 |
-
-
## Need help?
Look at the issues section to find out about specific cases and others.
@@ -57,7 +34,7 @@ If you still have doubts or cannot solve the problem, please consider opening an
Please send all comments and questions to gsyu93@gmail.com
-## GenET 인용하기
+## GenET Citation
```
@Manual {GenET,
diff --git a/docs/en/introduction.md b/docs/en/introduction.md
index 3d52e6b..56dbfe9 100644
--- a/docs/en/introduction.md
+++ b/docs/en/introduction.md
@@ -15,7 +15,7 @@ Gene editing involves the technology to modify specific genetic information at d
CRISPR is a unique sequence structure discovered by scientists specializing in the study of bacterial genes. It consists of repeated sequences with specific intervals of spacer sequences. While many gene sequences were previously unknown, the regular repetition of sequences was uncommon. This structure, found not only in specific bacterial strains but also in numerous species, was later identified as the guide RNA (gRNA) that specifies the location for the action of a gene-editing protein called Cas9.
-![CRISPR_machanism](assets/contents/ko_0_1_2_CRISPR_machanism.svg)
+![CRISPR_mechanism](assets/contents/en_0_1_2_CRISPR_machanism.svg)
## Various Types of CRISPR Systems
@@ -44,15 +44,4 @@ Through GenET, various functionalities are available (or planned) for research o
| Utils | SplitFastq | Function to split FASTQ files into smaller sizes | Available |
-## GenET 인용하기
-
-```
-@Manual {GenET,
- title = {GenET: Python package for genome editing research},
- author = {Goosang Yu},
- year = {2024},
- month = {January},
- note = {GenET version 0.13.1},
- url = {https://github.com/Goosang-Yu/genet}
- }
```
diff --git a/mkdocs.yml b/mkdocs.yml
index 7612d3d..935b4d9 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -96,9 +96,7 @@ markdown_extensions:
- pymdownx.mark
- attr_list
- pymdownx.emoji:
- # emoji_index: !!python/name:materialx.emoji.twemoji
emoji_index: !!python/name:material.extensions.emoji.twemoji
- # emoji_generator: !!python/name:materialx.emoji.to_svg
emoji_generator: !!python/name:material.extensions.emoji.to_svg
@@ -119,20 +117,20 @@ nav:
- Prime editor: 1_Predict/4_predict_pe.md
- Design:
- - Introduction: 2_Design/1_Design_intro.md
+ - GenET Design module: 2_Design/1_Design_intro.md
- Synonymous PE: 2_Design/2_SynonymousPE.md
- Database:
- - Introduction: 3_Database/1_database_intro.md
+ - GenET Database module: 3_Database/1_database_intro.md
- Background : 3_Database/2_Genome_resource_background.md
- Metadata : 3_Database/3_Metadata from databases.md
- Download : 3_Database/4_Download_files.md
- Analysis:
- - Introduction: 4_Analysis/1_analysis_intro.md
+ - GenET Analysis module: 4_Analysis/1_analysis_intro.md
- Utils:
- - Introduction: 5_Utils/1_utils_intro.md
+ - GenET Utils module: 5_Utils/1_utils_intro.md
- Download from server: 5_Utils/2_download_files.md
- Application note: