git clone git@github.com:iliapopov17/phyloki.git && cd phyloki
pip install -r requirements.txt
Input
from phyloki import *
To use Phyloki, you need a text file with accession numbers (accession_numbers.txt in the demo data), one per line. It must look like this:
Input
! head -5 demo_data/accession_numbers.txt
Output
NC_034519
NC_055636
NC_005225
NC_038939
NC_038529
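If you want to check the list before sending it to NCBI, it can be read in plain Python. A minimal sketch, assuming one accession number per line; read_accessions is a hypothetical helper, not part of Phyloki:

```python
from pathlib import Path


def read_accessions(path):
    """Return accession numbers from a one-per-line text file (hypothetical helper).

    Blank lines and surrounding whitespace are ignored, and duplicates are
    dropped while keeping the original order, so each record is listed once.
    """
    lines = Path(path).read_text().splitlines()
    accessions = [line.strip() for line in lines if line.strip()]
    return list(dict.fromkeys(accessions))  # dicts preserve insertion order
```

Run on demo_data/accession_numbers.txt, the first five entries should be the same IDs that head prints above.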
Input
- your e-mail address (used for the NCBI queries)
- input txt file with the list of accession numbers
- output file
get_sequences('your_email@example.com', 'demo_data/accession_numbers.txt', 'genbank_sequences')
Output
Downloaded: NC_034519
Downloaded: NC_055636
Downloaded: NC_005225
Downloaded: NC_038939
Downloaded: NC_038529
Downloaded: NC_034467
Downloaded: NC_034553
Downloaded: NC_003468
Downloaded: NC_034515
Downloaded: NC_038299
Downloaded: NC_077671
Downloaded: NC_034403
Downloaded: NC_038695
Downloaded: NC_034556
Downloaded: LC553715
Downloaded: NC_005238
Downloaded: NC_005235
Downloaded: NC_034517
Downloaded: NC_006435
Downloaded: NC_005222
Downloaded: NC_055147
Downloaded: NC_034560
Downloaded: NC_034485
Downloaded: NC_034407
Downloaded: NC_034399
Downloaded: NC_034402
Downloaded: MG663536
Downloaded: NC_038515
Downloaded: NC_078262
Downloaded: NC_055632
Downloaded: NC_034401
Downloaded: OR684449
Downloaded: FJ593498
Downloaded: KX512433
Downloaded: NC_078485
Downloaded: KX779126
Downloaded: NC_034564
Downloaded: NC_010707
Downloaded: NC_055170
All downloads completed.
The tree used below is built from the demo data. Tree construction is done in the tree_construction.ipynb notebook.
The most important file is demo_data/tree_ufb.treefile. Upload it to iTOL for visualization.
Figure 1. Reference tree from the original paper
Figure 2. Naked phylogenetic tree
This tree is naked. It lacks:
- Organism names. The leaves carry only accession numbers, which say nothing on their own.
- Host information. The tree shows the phylogenetic relationships between the viruses, but nothing about their host organisms.
It is worth mentioning that the two trees are literally identical (the bootstrap values are even higher in my version).
Input
- your e-mail address (used for the NCBI queries)
- input txt file with the list of accession numbers
- output file
get_organisms('your_email@example.com', 'demo_data/accession_numbers.txt', 'demo_data/accession_organism.txt')
Output
The request has been fulfilled.
File saved to demo_data/accession_organism.txt
Input
! head -5 demo_data/accession_organism.txt
Output
NC_034519.1 Orthohantavirus khabarovskense
NC_055636.1 Orthohantavirus tatenalense
NC_005225.1 Orthohantavirus puumalaense
NC_038939.1 Orthohantavirus prospectense
NC_038529.1 Eothenomys miletus hantavirus LX309
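The two-column files in this walkthrough (accession_organism.txt, accession_host.txt, accession_order.txt) share the same layout, so one reader covers all of them. A minimal sketch, assuming the accession is separated from the value by a tab or a space (the exact separator is not visible above); read_two_column is a hypothetical helper:

```python
def read_two_column(path):
    """Map accession -> value from a two-column text file (hypothetical helper).

    Each line is assumed to start with an accession number (which contains no
    spaces), followed by whitespace and a value that may itself contain
    spaces, e.g. "Eothenomys miletus hantavirus LX309".
    Blank lines and lines without a value are skipped.
    """
    mapping = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.strip().split(maxsplit=1)  # split only at the first gap
            if len(parts) == 2:
                accession, value = parts
                mapping[accession] = value
    return mapping
```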
Input
- input txt file with the list of accession numbers and organism names
- input tree file
- output modified tree
update_tree('demo_data/accession_organism.txt', 'demo_data/tree_ufb.treefile', 'demo_data/modified_tree.treefile')
Output
The request has been fulfilled.
File saved to demo_data/accession_organism.txt
Input
! head demo_data/tree_ufb.treefile
Output
(FJ593498.1:0.1240225441,KX512433.1:0.1580233515,((((((KX779126.1:0.1801341369,NC_034564.1:0.1518834757)100:0.2690724126,NC_010707.1:0.4026159852)100:0.5357342048,NC_055170.1:3.2821188681)96:0.1993926731,(((((((LC553715.1:0.2424396410,NC_034556.1:0.2425091493)100:0.1540926638,NC_005238.1:0.2987153355)100:0.0815585291,((NC_005222.1:0.1745023294,NC_006435.1:0.1329576555)100:0.2619343862,(NC_005235.1:0.3091225291,NC_034517.1:0.3426538757)100:0.0890215926)59:0.0409665566)100:0.2464858691,NC_055147.1:0.5009202874)69:0.0579848758,((NC_034399.1:0.4574934455,NC_034407.1:0.4201827666)100:0.2133110745,(NC_034485.1:0.3554924125,NC_034560.1:0.3958031671)100:0.1134597575)100:0.0983251229)88:0.0539862361,NC_034402.1:0.6194957047)100:0.2179091508,(((((NC_003468.2:0.3220309131,NC_034553.1:0.3217768427)100:0.0967750566,(NC_034515.1:0.3420020277,NC_038299.1:0.3578938480)78:0.0604905717)100:0.0681808060,(NC_034403.1:0.4057149461,NC_077671.1:0.3295415521)96:0.0808310506)100:0.1515347499,NC_038695.1:0.6146030315)75:0.0529123693,(((NC_005225.1:0.3178994625,(NC_034519.1:0.2903408237,NC_055636.1:0.2951103060)96:0.0707049689)100:0.1162566928,NC_038939.1:0.4860686808)100:0.0974816090,(NC_034467.1:0.3408088379,NC_038529.1:0.3214413064)100:0.1876647016)100:0.0906674433)100:0.3112288106)100:0.2995543026)97:0.1136359007,NC_078485.1:1.2137610889)49:0.0697049196,(((MG663536.1:0.4927348232,NC_038515.1:0.3837609395)94:0.0895431598,NC_078262.1:0.4767046102)100:0.2182159381,((NC_034401.1:0.5482148765,NC_055632.1:0.5333969980)100:0.2727779310,OR684449.1:0.6549294470)90:0.1135643862)55:0.0661132415)100:0.9075896851);
A usual treefile contains only accession numbers, which say nothing on their own.
Input
! head demo_data/modified_tree.treefile
Output
(FJ593498.1 Nova virus:0.1240225441,KX512433.1 Nova virus:0.1580233515,((((((KX779126.1 Imjin virus:0.1801341369,NC_034564.1 Imjin virus:0.1518834757)100:0.2690724126,NC_010707.1 Thottapalayam virus:0.4026159852)100:0.5357342048,NC_055170.1 Hainan oriental leaf-toed gecko hantavirus:3.2821188681)96:0.1993926731,(((((((LC553715.1 Orthohantavirus thailandense:0.2424396410,NC_034556.1 Anjozorobe virus:0.2425091493)100:0.1540926638,NC_005238.1 Orthohantavirus seoulense:0.2987153355)100:0.0815585291,((NC_005222.1 Orthohantavirus hantanense:0.1745023294,NC_006435.1 Hantavirus Z10:0.1329576555)100:0.2619343862,(NC_005235.1 Orthohantavirus dobravaense:0.3091225291,NC_034517.1 Orthohantavirus sangassouense:0.3426538757)100:0.0890215926)59:0.0409665566)100:0.2464858691,NC_055147.1 Tigray virus:0.5009202874)69:0.0579848758,((NC_034399.1 Jeju virus:0.4574934455,NC_034407.1 Bowe virus:0.4201827666)100:0.2133110745,(NC_034485.1 Orthohantavirus caobangense:0.3554924125,NC_034560.1 Kenkeme virus:0.3958031671)100:0.1134597575)100:0.0983251229)88:0.0539862361,NC_034402.1 Bruges virus:0.6194957047)100:0.2179091508,(((((NC_003468.2 Orthohantavirus andesense:0.3220309131,NC_034553.1 Maporal virus:0.3217768427)100:0.0967750566,(NC_034515.1 Orthohantavirus delgaditoense:0.3420020277,NC_038299.1 Orthohantavirus bayoui:0.3578938480)78:0.0604905717)100:0.0681808060,(NC_034403.1 Orthohantavirus montanoense:0.4057149461,NC_077671.1 Orthohantavirus sinnombreense:0.3295415521)96:0.0808310506)100:0.1515347499,NC_038695.1 Rockport virus:0.6146030315)75:0.0529123693,(((NC_005225.1 Orthohantavirus puumalaense:0.3178994625,(NC_034519.1 Orthohantavirus khabarovskense:0.2903408237,NC_055636.1 Orthohantavirus tatenalense:0.2951103060)96:0.0707049689)100:0.1162566928,NC_038939.1 Orthohantavirus prospectense:0.4860686808)100:0.0974816090,(NC_034467.1 Fugong virus:0.3408088379,NC_038529.1 Eothenomys miletus hantavirus LX309:0.3214413064)100:0.1876647016)100:0.0906674433)100:0.3112288106)100:0.2995543026)97:0.1136359007,NC_078485.1 Lena virus:1.2137610889)49:0.0697049196,(((MG663536.1 Dakrong virus:0.4927348232,NC_038515.1 Laibin virus:0.3837609395)94:0.0895431598,NC_078262.1 Xuan son virus:0.4767046102)100:0.2182159381,((NC_034401.1 Quezon virus:0.5482148765,NC_055632.1 Orthohantavirus robinaense:0.5333969980)100:0.2727779310,OR684449.1 Buritiense virus:0.6549294470)90:0.1135643862)55:0.0661132415)100:0.9075896851);
The modified treefile contains both accession numbers and organism names, which makes it much more informative.
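The relabelling can be sketched with a regular expression. This is a hypothetical re-implementation, not Phyloki's actual update_tree code. It assumes unquoted Newick labels like the ones above, where a leaf label follows "(" or "," and ends at ":", while the bootstrap values after ")" are internal-node labels and stay untouched. It also reports leaves that have no name in the mapping:

```python
import re

# A leaf label follows "(" or "," and runs up to ":", ",", ")" or ";".
# Labels after ")" are internal-node support values and are not matched.
LEAF_LABEL = re.compile(r"([(,])([^(),:;]+)")


def relabel_leaves(newick, names):
    """Append organism names to leaf labels (hypothetical sketch).

    names maps accession -> organism name. Returns the relabelled Newick
    string and a list of leaf labels that had no entry in names.
    """
    missing = []

    def replace(match):
        delimiter, label = match.group(1), match.group(2).strip()
        if not label:
            return match.group(0)  # whitespace only, nothing to rename
        if label in names:
            return f"{delimiter}{label} {names[label]}"
        missing.append(label)
        return match.group(0)  # keep unknown leaves as they are

    return LEAF_LABEL.sub(replace, newick), missing
```

As in the modified treefile above, the new labels contain spaces. Some strict Newick parsers expect such labels to be quoted, which is worth keeping in mind if the file is used outside iTOL.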
Input
- your e-mail address (used for the NCBI queries)
- input txt file with the list of accession numbers
- output file
get_hosts('your_email@example.com', 'demo_data/accession_numbers.txt', 'demo_data/accession_host.txt')
Output
The request has been fulfilled.
File saved to demo_data/accession_host.txt
Input
! head -5 demo_data/accession_host.txt
Output
NC_034519.1 Microtus maximowiczii
NC_055636.1 Microtus agrestis
NC_005225.1 ND
NC_038939.1 Microtus pennsylvanicus
NC_038529.1 Eothenomys miletus
Input
- your e-mail address (used for the NCBI queries)
- input txt file with the list of accession numbers and hosts
- output file
get_hosts_orders('your_email@example.com', 'demo_data/accession_host.txt', 'demo_data/accession_order.txt')
Output
The request has been fulfilled.
Please do not forget to edit the file manually: the NCBI query made by this function is fairly complex.
Sometimes the function prints:
- "Error - HTTP Error 400: Bad Request" when the connection is bad, or
- "Note - False record" when there is no record of the host organism.
Input
! head -5 demo_data/accession_order.txt
Output
NC_034519.1 Rodentia
NC_055636.1 Rodentia
NC_005225.1 ND
NC_038939.1 Rodentia
NC_038529.1 Rodentia
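Since the order file has to be checked by hand, it helps to list first the entries most likely to need attention: those marked ND or with nothing after the accession. A hypothetical helper, assuming the same accession-then-value layout; it takes any iterable of lines, so an open file works directly:

```python
def accessions_to_check(lines, missing="ND"):
    """Return accessions whose value is the placeholder or absent (hypothetical helper)."""
    flagged = []
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        if len(parts) == 1 or parts[1] == missing:
            flagged.append(parts[0])
    return flagged
```

For example, with open("demo_data/accession_order.txt") as fh: print(accessions_to_check(fh)). Going by the head output above, NC_005225.1 should be among the flagged accessions. The same helper works on accession_host.txt.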
Input
unique_orders = get_unique_orders("demo_data/accession_order.txt")
print(unique_orders)
Output
['Rodentia', 'ND', 'Eulipotyphla', 'Chiroptera', 'Squamata']
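The printed list is consistent with the orders being kept in the order they first appear in the file (Rodentia, then ND, matching the head output above). A sketch of that behaviour under this assumption; unique_orders is hypothetical and is not Phyloki's get_unique_orders:

```python
def unique_orders(lines):
    """Distinct second-column values, in order of first appearance (hypothetical)."""
    seen = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:
            seen.setdefault(parts[1], None)  # dicts keep insertion order
    return list(seen)
```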
Input
color_map = set_color_map("demo_data/accession_order.txt")
print(color_map)
An interactive window opens and asks you to set a HEX code for each unique order.
Output
{'Rodentia': '#0ca20c', 'ND': '#ffffff', 'Eulipotyphla': '#0078ff', 'Chiroptera': '#000000', 'Squamata': '#ffa500'}
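If you prefer not to type the codes at the interactive prompt, a plain dictionary with the same shape as the printed color_map can be built in code. A sketch that also checks each HEX code; build_color_map is a hypothetical helper, and passing its result to get_itol_dataset assumes the function only needs an order-to-HEX dict like the one printed above:

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def build_color_map(orders, colors):
    """Pair each order with a HEX code, checking the codes (hypothetical helper).

    colors is a dict order -> "#rrggbb". Raises ValueError if an order has
    no code or the code is not a six-digit HEX color.
    """
    color_map = {}
    for order in orders:
        code = colors.get(order)
        if code is None or not HEX_COLOR.fullmatch(code):
            raise ValueError(f"invalid or missing HEX code for {order!r}: {code!r}")
        color_map[order] = code
    return color_map
```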
Input
- input txt file with the list of accession numbers and organism names
- input txt file with the list of accession numbers and the taxonomic order of each microorganism's host
- output file
- manually created color map
get_itol_dataset("demo_data/accession_organism.txt", "demo_data/accession_order.txt", "demo_data/dataset_for_iTOL.txt", color_map)
Output
Colors were set by the user.
The request has been fulfilled.
Input
! head -5 demo_data/dataset_for_iTOL.txt
Output
DATASET_COLORSTRIP
SEPARATOR TAB
DATASET_LABEL Host Group Colors
DATA
NC_034519.1 Orthohantavirus khabarovskense #0ca20c Rodentia
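The header and the first data row above suggest the file layout: four header lines, then tab-separated rows of node ID, color and host order, where the node ID is the full leaf label from modified_tree.treefile. A hypothetical sketch that writes the same layout, assuming tab separators and skipping leaves that have no color:

```python
def write_colorstrip(path, names, orders, color_map, label="Host Group Colors"):
    """Write an iTOL DATASET_COLORSTRIP file (hypothetical sketch).

    names:     accession -> organism name (IDs must match the tree's leaf labels)
    orders:    accession -> host order
    color_map: host order -> HEX color
    """
    lines = ["DATASET_COLORSTRIP", "SEPARATOR TAB", f"DATASET_LABEL\t{label}", "DATA"]
    for accession, organism in names.items():
        order = orders.get(accession)
        if order is None or order not in color_map:
            continue  # no host order or color: leave this leaf uncolored
        node_id = f"{accession} {organism}"  # same label as in modified_tree.treefile
        lines.append(f"{node_id}\t{color_map[order]}\t{order}")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```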
- Visit iTOL
- Upload demo_data/modified_tree.treefile as the tree
- Upload demo_data/dataset_for_iTOL.txt as the annotation dataset
Fig 3. Second tree, annotated with organism names and manually adjusted colors indicating the taxonomic order of the hosts
This is the best tree, and it was made easily with DSTU software.
Let's take a look at the original tree again.
It shows that the authors of the original paper annotated their tree manually and made some mistakes in the host annotation. DSTU software did not make these mistakes.
Input
- input txt file with the list of accession numbers and organism names
- input txt file with the list of accession numbers and the taxonomic order of each microorganism's host
- output file
get_itol_dataset("demo_data/accession_organism.txt", "demo_data/accession_order.txt", "demo_data/dataset_for_iTOL_2.txt")
Output
Colors were not set, they were generated randomly.
The request has been fulfilled.
Input
! head -5 demo_data/dataset_for_iTOL_2.txt
Output
DATASET_COLORSTRIP
SEPARATOR TAB
DATASET_LABEL Host Group Colors
DATA
NC_034519.1 Orthohantavirus khabarovskense #e31342 Rodentia
- Visit iTOL
- Upload demo_data/modified_tree.treefile as the tree
- Upload demo_data/dataset_for_iTOL_2.txt as the annotation dataset
Fig 4. Third tree, annotated with organism names and randomly generated colors indicating the taxonomic order of the hosts
In this case random generation played a bad joke: almost every color is the same. Adjusting the color map manually is much more convenient.
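If you still want automatic colors, one way to avoid near-identical ones is to spread the hues evenly around the color wheel instead of drawing them at random. This is a swapped-in technique, not what Phyloki does; distinct_colors and auto_color_map are hypothetical helpers:

```python
import colorsys


def distinct_colors(n, saturation=0.75, value=0.9):
    """Return n HEX colors with evenly spaced hues around the color wheel."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)  # floats in [0, 1]
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


def auto_color_map(orders):
    """Assign each order its own evenly spaced hue (alternative to random colors)."""
    return dict(zip(orders, distinct_colors(len(orders))))
```

With many orders, neighbouring hues get close again, and you may still want to set ND to white by hand, as in the manual color map above.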