Augustus Training (without optimisation)
Gemy George Kaithakottil edited this page Apr 30, 2024
cd /home/train/Annotation_workshop/Augustus/Training
rm -rf Output && mkdir -p Output
gff2gbSmallDNA.pl Inputs/gold.gff Inputs/Athaliana_447_TAIR10.Chr3.fa 1000 Output/good_transcripts.full_models.gb --overlap
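Before converting, it can help to check which feature types the gold-standard GFF contains, and how many of each, so that an unexpected input is caught early. The following is a minimal sketch. It assumes a standard tab-separated, 9-column GFF, and `gff_feature_counts` is a hypothetical helper, not part of Augustus:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count features of each type (column 3) in a GFF file,
# skipping comment lines. Assumes a tab-separated, 9-column GFF.
gff_feature_counts() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Run it on the workshop input only if the file is present.
if [ -f Inputs/gold.gff ]; then
    gff_feature_counts Inputs/gold.gff
fi
```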
grep -c '^LOCUS' Output/good_transcripts.full_models.gb
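If there are more than 1000 models, keep a random subset of 1000. Otherwise, use all of them:
n_models=$(grep -c '^LOCUS' Output/good_transcripts.full_models.gb)
if [ "$n_models" -gt 1000 ]; then
    randomSplit.pl Output/good_transcripts.full_models.gb 1000
    cd Output && ln -s good_transcripts.full_models.gb.test good_transcripts.gb && cd ..
else
    cd Output && ln -s good_transcripts.full_models.gb good_transcripts.gb && cd ..
fi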
randomSplit.pl Output/good_transcripts.gb 200
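You can confirm the split by counting the LOCUS records in each subset. With the command above, the .test file should hold 200 gene models and the .train file the remainder. The following is a small sketch, where `count_models` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of gene models (LOCUS records) in a GenBank file.
count_models() {
    grep -c '^LOCUS' "$1"
}

# Print the size of each subset written by randomSplit.pl, skipping missing files.
for f in Output/good_transcripts.gb.train Output/good_transcripts.gb.test; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_models "$f")"
    fi
done
```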
5. Copy the Augustus config directory into the working directory (as Output/config) to prevent accidental modification of the pre-trained Augustus species.
cp -a /opt/data/config Output/config
new_species.pl --AUGUSTUS_CONFIG_PATH=$PWD/Output/config --species=ath_wksp
vim Output/config/species/ath_wksp/ath_wksp_parameters.cfg
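The parameters file is plain text with one `key value` pair per line and `#` comments, so you can also check individual settings without opening an editor. The following is a sketch that assumes this layout. `get_param` is a hypothetical helper. `stopCodonExcludedFromCDS` is used only as an example because it is the setting passed to etraining in the next step, and the helper prints nothing if the key is absent from the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a parameter from an Augustus .cfg file.
# Assumes lines of the form "key value   # optional comment".
get_param() {
    local cfg=$1 key=$2
    awk -v k="$key" '$1 == k { print $2; exit }' "$cfg"
}

cfg=Output/config/species/ath_wksp/ath_wksp_parameters.cfg
if [ -f "$cfg" ]; then
    # Prints nothing if the key is not present in the file.
    get_param "$cfg" stopCodonExcludedFromCDS
fi
```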
8. Run an initial Augustus training with etraining. This creates parameter files for exon, intron and intergenic regions in $AUGUSTUS_CONFIG_PATH/species/ath_wksp (here, Output/config/species/ath_wksp).
etraining Output/good_transcripts.gb.train --AUGUSTUS_CONFIG_PATH=$PWD/Output/config --species=ath_wksp --stopCodonExcludedFromCDS=false
...
...
start codon frequencies: ATG(800)
# admissible start codons and their probabilities: ATG(1), CTG(0), TTG(0)
...
...
/usr/bin/time -v augustus --AUGUSTUS_CONFIG_PATH=$PWD/Output/config --species=ath_wksp Output/good_transcripts.gb.test > Output/augustus.initial_test.txt
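GNU `/usr/bin/time -v` writes its resource report to standard error, which the command above does not redirect. You can save that stream by appending, for example, `2> Output/augustus.initial_test.time` (a hypothetical file name). The wall-clock time and peak memory can then be extracted with a short sketch like the following, where `time_summary` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract wall-clock time and peak memory from a GNU time -v report.
time_summary() {
    grep -E 'Elapsed \(wall clock\) time|Maximum resident set size' "$1" | sed 's/^[[:space:]]*//'
}

report=Output/augustus.initial_test.time
if [ -f "$report" ]; then
    time_summary "$report"
fi
```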
grep -A 22 Evaluation Output/augustus.initial_test.txt
...
...
******* Evaluation of gene prediction *******
---------------------------------------------\
                 | sensitivity | specificity |
---------------------------------------------|
nucleotide level |       0.978 |       0.799 |
---------------------------------------------/

----------------------------------------------------------------------------------------------------------\
           |  #pred |  #anno |      |  FP = false pos.   |  FN = false neg.   |             |             |
           | total/ | total/ |  TP  |--------------------|--------------------| sensitivity | specificity |
           | unique | unique |      | part | ovlp | wrng | part | ovlp | wrng |             |             |
----------------------------------------------------------------------------------------------------------|
           |        |        |      |        395         |        175         |             |             |
exon level |   1590 |   1370 | 1195 | ------------------ | ------------------ |       0.872 |       0.752 |
           |   1590 |   1370 |      |  115 |    6 |  274 |  118 |    6 |   51 |             |             |
----------------------------------------------------------------------------------------------------------/

----------------------------------------------------------------------------\
transcript | #pred | #anno |   TP |   FP |   FN | sensitivity | specificity |
----------------------------------------------------------------------------|
gene level |   307 |   200 |  104 |  203 |   96 |        0.52 |       0.339 |
----------------------------------------------------------------------------/
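In other words:
Of the 200 genes in the test set, 104 (52%) were predicted exactly.
Only 104 of the 307 predicted genes (33.9%) exactly matched a gene in the test set.
87.2% of the exons in the test set were predicted exactly.
75.2% of the predicted exons exactly matched an exon in the test set.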
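The sensitivity and specificity values in the report follow directly from the counts: sensitivity = TP / #anno and specificity = TP / #pred. For example, at exon level 1195 / 1370 = 0.872 and 1195 / 1590 = 0.752. The following sketch re-derives the gene-level values from the `gene level` row of the Augustus report. It assumes the `|`-separated layout shown above, and `gene_level_accuracy` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute gene-level sensitivity (TP/#anno) and
# specificity (TP/#pred) from the "gene level" row of an Augustus evaluation.
gene_level_accuracy() {
    awk -F'|' '/^gene level/ {
        pred = $2 + 0; anno = $3 + 0; tp = $4 + 0
        printf "sensitivity %.3f\nspecificity %.3f\n", tp / anno, tp / pred
    }' "$1"
}

report=Output/augustus.initial_test.txt
if [ -f "$report" ]; then
    gene_level_accuracy "$report"
fi
```

For the run above, this prints 0.520 and 0.339, which match the reported 0.52 and 0.339.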
cp -a Output/config Output/ath_wksp_config