Augustus Training (without optimisation)

Gemy George Kaithakottil edited this page Apr 30, 2024 · 5 revisions

1. Set up the output directory (this deletes any existing Output directory):

cd /home/train/Annotation_workshop/Augustus/Training
rm -rf Output && mkdir -p Output

2. Convert the GFF3 annotation into GenBank format, keeping up to 1000 bp of flanking sequence around each gene:

gff2gbSmallDNA.pl Inputs/gold.gff Inputs/Athaliana_447_TAIR10.Chr3.fa 1000 Output/good_transcripts.full_models.gb --overlap

3. Count the number of gene models:

grep -c '^LOCUS' Output/good_transcripts.full_models.gb

If there are more than 1000 models:

3.1 Randomly select 1000 models from the above set (randomSplit.pl writes the selected models to a .test file and the remainder to a .train file)

randomSplit.pl Output/good_transcripts.full_models.gb 1000

3.2 Link the 1000-model subset as good_transcripts.gb

cd Output && ln -s good_transcripts.full_models.gb.test good_transcripts.gb && cd ..

Otherwise:

3.1 Link the full set as good_transcripts.gb

cd Output && ln -s good_transcripts.full_models.gb good_transcripts.gb && cd ..
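The branch in step 3 can be scripted rather than decided by hand. The following is a minimal sketch, not part of the original workflow: the function name choose_link_target is hypothetical, and it assumes one LOCUS line per gene model, as in the count above.

```shell
# Print the file that good_transcripts.gb should link to.
# $1: GenBank file, $2: maximum number of models (default 1000).
choose_link_target() {
    gb="$1"
    max="${2:-1000}"
    n=$(grep -c '^LOCUS' "$gb")   # one LOCUS line per gene model
    if [ "$n" -gt "$max" ]; then
        echo "$(basename "$gb").test"   # subset written by randomSplit.pl
    else
        echo "$(basename "$gb")"        # small enough: use the full set
    fi
}

# Hypothetical usage, run from the Training directory:
#   gb=Output/good_transcripts.full_models.gb
#   target=$(choose_link_target "$gb" 1000)
#   [ "$target" != "$(basename "$gb")" ] && randomSplit.pl "$gb" 1000
#   (cd Output && ln -s "$target" good_transcripts.gb)
```

The function only prints a file name, so the decision can be checked before any link is created.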

4. Randomly split the above set into a test set of 200 models (.test) and a training set holding the remainder (.train)

randomSplit.pl Output/good_transcripts.gb 200
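It can be worth confirming that the split produced the expected sizes before training. The sketch below is an optional sanity check, not part of the original workflow: check_split is a hypothetical helper, and it assumes the .test and .train files sit next to the input file.

```shell
# Report model counts for a GenBank file and its .test/.train split,
# and fail if the two parts do not add up to the total.
# $1: GenBank file that was passed to randomSplit.pl.
check_split() {
    base="$1"
    total=$(grep -c '^LOCUS' "$base")
    test_n=$(grep -c '^LOCUS' "$base.test")
    train_n=$(grep -c '^LOCUS' "$base.train")
    echo "total=$total test=$test_n train=$train_n"
    [ $((test_n + train_n)) -eq "$total" ]
}

# Hypothetical usage:
#   check_split Output/good_transcripts.gb
```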

5. Copy the Augustus species config directory into the working directory to prevent accidental modification of the pre-trained Augustus species.

cp -a /opt/data/config Output/config

6. Create the Augustus parameter files for your species.

new_species.pl --AUGUSTUS_CONFIG_PATH=$PWD/Output/config --species=ath_wksp

7. Turn on the prediction of untranslated regions (UTRs) by changing the UTR setting from off to on in the species parameter file

vim Output/config/species/ath_wksp/ath_wksp_parameters.cfg
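For scripted runs, the same edit can be made without opening an editor. This is a minimal sketch under an assumption: that the setting appears on a line beginning with UTR, followed by whitespace and the value off. Check the file first if unsure. The function name enable_utr is hypothetical.

```shell
# Switch the UTR setting from off to on in an Augustus parameter file.
# Only lines that start with "UTR" are touched; other settings such as
# print_utr are left unchanged.
# $1: path to the species parameters.cfg file.
enable_utr() {
    cfg="$1"
    tmp=$(mktemp)
    sed -E 's/^(UTR[[:space:]]+)off/\1on/' "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"   # overwrite in place, keeping file permissions
    rm -f "$tmp"
}

# Hypothetical usage:
#   enable_utr Output/config/species/ath_wksp/ath_wksp_parameters.cfg
#   grep '^UTR' Output/config/species/ath_wksp/ath_wksp_parameters.cfg
```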

8. Run an initial Augustus training. This creates parameter files for exons, introns and the intergenic region in the directory $AUGUSTUS_CONFIG_PATH/species/ath_wksp (here, Output/config/species/ath_wksp)

etraining Output/good_transcripts.gb.train --AUGUSTUS_CONFIG_PATH=$PWD/Output/config --species=ath_wksp --stopCodonExcludedFromCDS=false
...
...
start codon frequencies: ATG(800)
# admissible start codons and their probabilities: ATG(1), CTG(0), TTG(0)
...
...

9. Check the initial Augustus prediction accuracy (without optimisation) by predicting genes on the test set [< 2 minutes]

/usr/bin/time -v augustus --AUGUSTUS_CONFIG_PATH=$PWD/Output/config --species=ath_wksp Output/good_transcripts.gb.test > Output/augustus.initial_test.txt

10. Check the accuracy report

grep -A 22 Evaluation Output/augustus.initial_test.txt
...
...
*******      Evaluation of gene prediction     *******

---------------------------------------------\
                 | sensitivity | specificity |
---------------------------------------------|
nucleotide level |       0.978 |       0.799 |
---------------------------------------------/

----------------------------------------------------------------------------------------------------------\
           |  #pred |  #anno |      |    FP = false pos. |    FN = false neg. |             |             |
           | total/ | total/ |   TP |--------------------|--------------------| sensitivity | specificity |
           | unique | unique |      | part | ovlp | wrng | part | ovlp | wrng |             |             |
----------------------------------------------------------------------------------------------------------|
           |        |        |      |                395 |                175 |             |             |
exon level |   1590 |   1370 | 1195 | ------------------ | ------------------ |       0.872 |       0.752 |
           |   1590 |   1370 |      |  115 |    6 |  274 |  118 |    6 |   51 |             |             |
----------------------------------------------------------------------------------------------------------/

----------------------------------------------------------------------------\
transcript | #pred | #anno |   TP |   FP |   FN | sensitivity | specificity |
----------------------------------------------------------------------------|
gene level |   307 |   200 |  104 |  203 |   96 |        0.52 |       0.339 |
----------------------------------------------------------------------------/

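The report above shows that:

Of the 200 annotated genes in the test set, 104 were predicted exactly (gene-level sensitivity 0.52)  
  87.2% of the annotated exons were predicted exactly (exon-level sensitivity)  
  75.2% of the predicted exons exactly match an exon in the test set (exon-level specificity)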
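The sensitivity and specificity columns can be recomputed from the counts in the report: sensitivity is TP divided by the number of annotated features (#anno), and specificity, as Augustus reports it, is TP divided by the number of predicted features (#pred). The sketch below reproduces the values shown above; the helper name ratio is hypothetical.

```shell
# Print tp / total rounded to three decimal places.
ratio() {
    awk -v tp="$1" -v total="$2" 'BEGIN { printf "%.3f\n", tp / total }'
}

ratio 104 200     # gene-level sensitivity: TP / #anno  -> 0.520
ratio 104 307     # gene-level specificity: TP / #pred  -> 0.339
ratio 1195 1370   # exon-level sensitivity: TP / #anno  -> 0.872
ratio 1195 1590   # exon-level specificity: TP / #pred  -> 0.752
```

The FP and FN columns follow from the same counts: at gene level, FP = 307 - 104 = 203 and FN = 200 - 104 = 96.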

11. Back up the initial Augustus config directory

cp -a Output/config Output/ath_wksp_config