Releases: ZhijianZhou01/virusrecom
virusrecom v1.3.6
- Data preprocessing: positions (columns) that consist entirely of “-” are now removed from the input sequence file. This handles pipelines in which the input sequence file is a subset extracted from a larger alignment file.
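
The column filter described above can be sketched as follows. This is a minimal illustration, not virusrecom's own code: the function name `drop_all_gap_columns` and the toy sequences are assumptions made for the example.

```python
def drop_all_gap_columns(seqs):
    """Remove alignment columns in which every sequence has '-'.

    Hypothetical helper for illustration; `seqs` is a list of aligned
    sequences of equal length.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length) if any(s[i] != "-" for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


# A subset extracted from a larger alignment often carries all-gap columns.
subset = ["AC--T", "A---T", "GC--T"]
cleaned = drop_all_gap_columns(subset)
print(cleaned)  # ['ACT', 'A-T', 'GCT']
```

Columns that contain a gap in only some sequences are kept at this stage; only columns made up entirely of “-” are dropped.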
virusrecom v1.3.5
- Of the output files `*_site_WIC_from_lineages.xlsx` and `*_site_WIC.csv` in the output directory `WICs_of_sites`, only `*_site_WIC.csv` is retained, because the two differ only in file format.
- The file `*_mWIC_from_lineages.xlsx` in the output directory `WICs_of_slide_window` is replaced by `*_mWIC_from_lineages.csv`.
- Further optimized to reduce unnecessary computation time.
virusrecom v1.3.2
Compared to virusrecom v1.2.1
1. Optimize memory usage
- Fixed a bug that caused high memory usage when plotting WIC or mWIC figures in batches.
- Sites in the sequence alignment can now be read iteratively and loaded into memory in sub-blocks. The parameter `--block` (default: 40000) sets the maximum number of sites per sub-block, and the sub-blocks are loaded sequentially to calculate the WIC values. For example, `--block 20000` means that no more than 20,000 sites are loaded per iteration. This optimization allows large numbers of sequences to be processed with less memory.

Here is the run log from example 3.1 (1000 sequences from 10 lineages; the alignment contains 29,172 sites):
>>> Treat query_recombinant as a potential recombination lineage...
>>> VirusRecom starts calculating weighted information content from each lineage...
VirusRecom is importing data blocks 1
Load sites: 1 - 20000
VirusRecom is removing sites (columns) containing gap (-)...
VirusRecom is extracting polymorphic sites...
WIC for data_blocks 1 have been completed.
VirusRecom is importing data blocks 2
Load sites: 20001 - 29172
VirusRecom is removing sites (columns) containing gap (-)...
VirusRecom is extracting polymorphic sites...
WIC for data_blocks 2 have been completed.
>>> The WIC calculations of 1015 sites have been completed.
>>> VirusRecom starts scanning using sliding window ...
Possible major parent: reference_lineage_1 (global mWIC: 1.8976186779157704)
Other possible parents and recombination region (map at the alignment):
reference_lineage_2 [['7237 to 11539(mWIC: 1.9553354371515168)', 'p_value: 7.831109305531836e-06']]
>>> Take 0:00:18.073764 seconds in total.
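
The block boundaries shown in the log follow from simple range arithmetic. The sketch below reproduces them under the assumption that blocks are contiguous, 1-based, inclusive site ranges. The helper name `iter_blocks` is hypothetical and is not taken from the virusrecom source.

```python
def iter_blocks(n_sites, block_size=40000):
    """Yield 1-based, inclusive (start, end) site ranges of at most block_size sites.

    Hypothetical helper for illustration; the default mirrors the documented
    default of --block.
    """
    if block_size < 1:
        raise ValueError("block_size must be a positive integer")
    for start in range(1, n_sites + 1, block_size):
        yield start, min(start + block_size - 1, n_sites)


# 29,172 sites with a block size of 20000 gives two blocks.
blocks = list(iter_blocks(29172, block_size=20000))
for number, (start, end) in enumerate(blocks, start=1):
    print(f"data block {number}: load sites {start} - {end}")
```

With the default block size of 40000, the same alignment would fit into a single block, so a smaller value only matters when memory is the limiting factor.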
2. Streamline the output
- Reduce the output of logs on the screen.
- Rename the file `Possible_recombination_event_detailed.txt` to `identify_logs_detailed.txt`, because it is not the final identification of recombination.
virusrecom v1.2.1
Compared to virusrecom v1.1.5
- Multi-process parallelization is applied to compute WIC and mWIC for different lineages, so that each lineage's data is computed simultaneously in a separate process. The number of threads (cores) used is specified by the parameter `-t`, for example `-t 6`. This optimization takes full advantage of multi-core CPUs and significantly reduces computation time. It is especially suitable for recombination identification with a large number of lineage types in `lineage data` (or a large number of sequences in `non-lineage data`).
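
The general pattern of one task per lineage, spread over a fixed number of worker processes, can be sketched with Python's standard `multiprocessing.Pool`. This is an illustration under stated assumptions, not virusrecom's implementation: the per-lineage function below only counts polymorphic columns as a stand-in for the real WIC/mWIC calculation, and the names `count_polymorphic_sites` and `run_per_lineage` are hypothetical.

```python
from multiprocessing import Pool


def count_polymorphic_sites(item):
    """Stand-in for the per-lineage work (NOT the WIC formula).

    Counts alignment columns that contain more than one distinct character.
    """
    name, seqs = item
    return name, sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def run_per_lineage(lineages, threads):
    """Process every lineage in a worker process, at most `threads` at a time."""
    with Pool(processes=threads) as pool:
        return dict(pool.map(count_polymorphic_sites, list(lineages.items())))


if __name__ == "__main__":
    # Toy data; lineage names follow the style used in the run log.
    lineages = {
        "reference_lineage_1": ["ACGT", "ACGA", "ACGT"],
        "reference_lineage_2": ["TTGT", "TTGT"],
    }
    print(run_per_lineage(lineages, threads=2))
```

Because each lineage is an independent task, the speed-up grows with the number of lineages until it reaches the number of available cores.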
virusrecom v1.1.5
- Fixed a bug where the figure size could not adapt automatically when drawing a WIC dot plot.
virusrecom v1.1.4
The legend font size now adjusts automatically, which fixes incomplete legend display when the number of lineages is large.
virusrecom v1.1.3
- Add an output file that records the input parameters given on the command line.
- Add a parameter `--no_wic_fig`, whose function is "Do not draw the image of WICs".
- Add a parameter `--no_mwic_fig`, whose function is "Do not draw the image of mWICs".
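
As a rough illustration of how such switches behave, the sketch below declares the two flags with Python's standard `argparse` and formats the parsed parameters as text that could be written to a record file. How virusrecom actually declares its options and names or formats its parameter record is not stated here, so the parser setup and the record layout are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser that declares only the two flags named above."""
    parser = argparse.ArgumentParser(prog="virusrecom-sketch")
    parser.add_argument("--no_wic_fig", action="store_true",
                        help="Do not draw the image of WICs")
    parser.add_argument("--no_mwic_fig", action="store_true",
                        help="Do not draw the image of mWICs")
    return parser


# Both flags are off by default; passing one switches that figure off.
args = build_parser().parse_args(["--no_wic_fig"])

# One "name: value" line per parameter (assumed layout for a record file).
record = "\n".join(f"{key}: {value}" for key, value in sorted(vars(args).items()))
print(record)
```

Skipping the figures is mainly useful in batch runs, where only the tabular output is needed.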
virusrecom v1.1.2
Compatible with later versions of pandas. This contribution comes from Wang Haoyang (https://github.com/Wesady).
virusrecom v1.1
- Optimize the input logic of file to make virusrecom easier to use.
- Add a function to mark the lineage for data preparation.
virusrecom v1.0
Updated on September 20, 2022