DeePhafier: A Phage Lifestyle Classifier using a Multilayer self-attention neural network combining protein information

Version 1.1
Authors: Yan Miao, Zhenyuan Sun, Chen Lin, et.al.
Maintainer: Yan Miao [email protected]

Description

Bacteriophages is known as viruses that infect bacterial cells. They are the most entities on earth and play important roles in microbiome. According to the phage lifestyle, phages can be divided into the virulent phages and the temperate phages. Classifying virulent and temperate phages is crucial for further understanding of phage-host interactions. Although there are several methods designed for phage lifestyle classification, they either consider sequence features or gene features merely, leading to low accuracy. A new computational method, DeePhafier, is proposed to improve classification performance on phage lifestyle. Built by several multilayer self-attention neural network, a global self-attention neural network and being combined by protein features of PSSM matrix, DeePhafier improves the classification accuracy and outperforms two benchmark methods.

Dependencies

To utilize DeePhafier, numpy, h5py, TensorFlow, Keras are needed to be previously installed. Some other packages are also needed to make sure the code running correctly such as "os", "ast", etc.

In convenience, you can download Anaconda from https://repo.anaconda.com/archive/, where contains most of needed packages. If there are still some special packages that are missed when running, you can use "pip install" to install the specific packages.

To install tensorflow, start "cmd.exe" and enter

pip install tensorflow

To insatll Keras, start "cmd.exe" and enter

pip install Keras

To insatll h5py, start "cmd.exe" and enter

pip install h5py

Our codes are all edited by Python 3.6.7 with TensorFlow 1.4.0 and Keras 2.1.3.

Usage

It is simple to use DeePhafier for users' database.
Before testing, users' own query contigs should be preprocessed to an available format using preprocessing.py and be condon embedded by embedding.py as Seq2Vec(https://github.com/crazyinter/Seq2Vec). The embedding matrix is the same as Virtifier from https://github.com/crazyinter/Seq2Vec/blob/master/supplementary_files/embedding_matrix.csv. The query sequences < 9000bp in the test file should be zero-padded to 9000bp. The query sequences > 9000bp in the test file should be cut-off to 9000bp.

To make a prediction by "test.py", users' own query contigs should be edited into a ".csv" file. Then run "test.py" or copy the code into jupyter notebook and run seperately.

Citation

The manuscript is under review.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
README.md		README.md
model.h5		model.h5
test.fasta		test.fasta
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DeePhafier: A Phage Lifestyle Classifier using a Multilayer self-attention neural network combining protein information

Description

Dependencies

Usage

Citation

About

Releases

Packages

Languages

crazyinter/DeePhafier

Folders and files

Latest commit

History

Repository files navigation

DeePhafier: A Phage Lifestyle Classifier using a Multilayer self-attention neural network combining protein information

Description

Dependencies

Usage

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages