We introduce DeepGene, a model that leverages pan-genome and Minigraph representations to capture the broad diversity of genetic language. DeepGene employs rotary position embedding to improve length extrapolation across genetic analysis tasks. On the 28 tasks of the Genome Understanding Evaluation, DeepGene ranks first on 9 tasks and second on 5, and achieves the best overall score. Compared with other cutting-edge models, DeepGene stands out for its compact model size and its efficiency in processing sequences of varying lengths.
Preprint available at bioRxiv.
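For readers unfamiliar with rotary position embedding, here is a minimal pure-Python sketch of the general technique. It is not taken from the DeepGene code: the function names, the interleaved pairing of dimensions, and the base of 10000 are assumptions chosen for illustration.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Illustrative only: pairs (vec[0], vec[1]), (vec[2], vec[3]), ... are
    rotated; other implementations pair dimensions differently.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Angle for this pair: position * base^(-i/d), with i the even index.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Made-up query/key vectors, for demonstration only.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# The score depends only on the offset between positions (here 2 in both cases).
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 12), rope(k, 10))
```

Because each pair is rotated by an angle proportional to its position, the dot product between a rotated query and a rotated key depends only on their relative offset, which is the property usually cited for length extrapolation.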
Please see PanGeneGraphTrans/requirements.txt.
Download Minigraph file (.rgfa) and place it in the dataPretreatment folder.
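To check a downloaded .rgfa file before preprocessing, it can help to read it directly. The following is a minimal sketch, assuming the standard tab-separated rGFA layout (S lines for segments with SN, SO and SR tags, and L lines for links). The function names and sample records are hypothetical and are not part of this repository's dataPretreatment code.

```python
def parse_tags(fields):
    """Parse optional TAG:TYPE:VALUE fields into a dict (integers for type 'i')."""
    tags = {}
    for field in fields:
        parts = field.split(":", 2)
        if len(parts) == 3:
            name, typ, value = parts
            tags[name] = int(value) if typ == "i" else value
    return tags


def parse_rgfa(lines):
    """Collect segments and links from an iterable of rGFA text lines."""
    segments = {}
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # S <name> <sequence> [tags...]
            segments[fields[1]] = {"seq": fields[2], **parse_tags(fields[3:])}
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> [overlap] [tags...]
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


# Made-up records for demonstration; a real file would be opened and iterated.
sample_lines = [
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tGG\tSN:Z:chr1\tSO:i:4\tSR:i:0",
    "L\ts1\t+\ts2\t+\t0M\tSR:i:0",
]
segments, links = parse_rgfa(sample_lines)
```

For a real file, pass an open text file handle to parse_rgfa in place of the sample list. Pan-genome graphs can be large, so holding every sequence in memory as shown here may not be practical.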
Please see dataPretreatment and PanGeneGraphTrans/dataset.py.
Please see PanGeneGraphTrans/pretrain.py.
Download the pretrained model.
Please see PanGeneGraphTrans/finetune.py.
Download prom_5000 data and place it in the \data\LPD\promoter_prediction\prom_5000 folder.