Pipeline written in Python to compute Jensen-Shannon divergence (JSD) conservation scores for the residues of a given protein.
The pipeline works as follows:
- First, the user gives a protein name, and the code searches UniProt for an ID corresponding to this protein.
- Then a BLASTP search is run on the sequence of this protein ID against the UniProtKB/Swiss-Prot database.
- A multiple sequence alignment is then performed using MAFFT.
- Finally, the residue conservation scores are calculated using the JS divergence method.
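
To make the last step concrete, here is a minimal sketch of a per-column JS divergence score. It is not the code in Res_Conserv_Score.py: the uniform background distribution, the small pseudocount, and the choice to ignore gap characters are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Amino-acid frequencies for one alignment column (gaps are ignored)."""
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    def kl(a, m):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    m = {k: 0.5 * (p[k] + q[k]) for k in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def column_scores(alignment, background=None):
    """Score every column of an alignment (list of equal-length strings)."""
    if background is None:
        # Assumption: uniform background over the 20 standard amino acids.
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(alignment[0])
    return [
        js_divergence(
            column_distribution("".join(seq[i] for seq in alignment)),
            background,
        )
        for i in range(length)
    ]


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACF"]
    for position, score in enumerate(column_scores(toy_alignment), start=1):
        print(position, round(score, 3))
```

With this scoring, a fully conserved column gets a higher score than a variable one, and with base-2 logarithms every score lies between 0 and 1.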
- Linux: the code has only been tested on Ubuntu.
The code has only been tested with Firefox.
To run this code you need Python 3 and the following Python packages:
PyPI installation:
$pip install selenium
$pip install argparse
$pip install pandas
$pip install biopython
Conda installation:
$conda create -n resconsscore python
$source activate resconsscore
$conda install -c conda-forge selenium
$conda install -c conda-forge argparse
$conda install pandas
$conda install -c conda-forge biopython
Selenium requires geckodriver to drive Firefox; check this link for the other browsers.
Conservation.py
Auto_Uniprot.py
Blast_Align.py
Auto_Mafft.py
Res_Conserv_Score.py
- First, clone this repository:
$git clone https://github.com/hocinebib/ResConsScorePipeline.git
or download it.
$cd ResConsScorePipeline/
$python src/Conservation.py "MexA MexB OprM"
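
The command above passes several protein names as a single quoted, space-separated string. As a rough sketch of how such an argument could be read with argparse (an assumption about the interface, not the actual code of Conservation.py; `parse_protein_names` is a hypothetical helper):

```python
import argparse


def parse_protein_names(argv=None):
    """Return the protein names given as one quoted, space-separated string."""
    parser = argparse.ArgumentParser(
        description="Compute residue conservation scores for proteins."
    )
    parser.add_argument(
        "proteins",
        help='protein names in one quoted string, e.g. "MexA MexB OprM"',
    )
    args = parser.parse_args(argv)
    # Split on whitespace so each protein can be processed in turn.
    return args.proteins.split()


if __name__ == "__main__":
    for name in parse_protein_names():
        print(name)
```

Quoting the names keeps them in one shell argument, so the script receives them together and can run the pipeline once per protein.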