
VEBA_v2.1.0

@jolespin jolespin released this 17 May 14:13
· 61 commits to main since this release
b67f0ed

Official release of VEBA v2.1.0 with updates that address peer-reviewer feedback. Most changes are to documentation, but the release also includes the following:

  • [2024.4.30] - Added concatenate_files.py, which concatenates files (including a mix of compressed and uncompressed files) given as arguments, a list file, or a glob. This is needed because Unix limits the size of a command's argument list (e.g., cat *.fasta > output.fasta fails when *.fasta expands to 50k files).
  • [2024.4.29] - Added /volumes/workspace/ directory to Docker containers for situations when your input and output directories are the same.
  • [2024.4.29] - featureCounts can only handle 64 threads at a time, so added min(64, opts.n_jobs) to all the modules/scripts that run featureCounts commands.
  • [2024.4.23] - Added uniprot_to_enzymes.py which reformats tables and fasta from https://www.uniprot.org/uniprotkb?query=ec%3A*
  • [2024.4.18] - Developed a faster CLI implementation of KofamScan called PyKofamSearch, which leverages PyHmmer. This will be used in future versions of VEBA.
  • [2024.4.18] - Developed a faster CLI implementation of HMMSearch called PyHMMSearch, which leverages PyHmmer. This will be used in future versions of VEBA.
  • [2024.3.26] - Added --metaeuk_split_memory_limit to metaeuk_wrapper.py.
  • [2024.3.26] - Added -d/--genome_identifier_directory_index to scaffolds_to_bins.py for directories structured like path/to/genomes/bin_a/reference.fasta, in which case you would use -d -2.
  • [2024.3.26] - Added --minimum_af to edgelist_to_clusters.py with an option to accept 4-column inputs [id_1]<tab>[id_2]<tab>[weight]<tab>[alignment_fraction]. global_clustering.py, local_clustering.py, and cluster.py now apply this filter by default with --af_threshold 30.0. To retain the previous behavior, use --af_threshold 0.0.
  • [2024.3.18] - edgelist_to_clusters.py now only includes edges where both nodes are in the identifiers set. If --identifiers is provided, only those identifiers are used; otherwise, all nodes are included.
  • [2024.3.18] - Added --export_representatives argument to edgelist_to_clusters.py to output a table with [id_node]<tab>[id_cluster]<tab>[intra-cluster_connectivity]<tab>[representative]. This information is also included in the nx.Graph objects.
  • [2024.3.18] - Changed singleton weight to np.nan instead of np.inf for edgelist_to_clusters.py to allow for representative calculations.
  • YouTube channel (https://www.youtube.com/@VEBA-Multiomics)
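
The [2024.4.30] entry describes concatenating a mix of compressed and uncompressed files selected by a glob. A minimal sketch of that idea follows. It is not the code of concatenate_files.py: the function names are hypothetical, only gzip is handled, and the glob is expanded inside Python so the shell never has to build a huge argument list.

```python
import glob
import gzip
import shutil


def open_maybe_gzip(path):
    """Open a file for binary reading; detect gzip by its two magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rb")
    return open(path, "rb")


def concatenate(pattern, output_path):
    """Write every file matching `pattern` into `output_path`, uncompressed.

    Hypothetical helper for illustration. The output path should not
    match `pattern`, or the output would be read as an input.
    """
    paths = sorted(glob.glob(pattern))  # expanded in Python, not by the shell
    with open(output_path, "wb") as out:
        for path in paths:
            with open_maybe_gzip(path) as src:
                shutil.copyfileobj(src, out)  # streams in chunks
    return len(paths)
```

Because the files are streamed one at a time, memory use stays flat regardless of how many files the pattern matches.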
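
For the [2024.3.26] entry on -d/--genome_identifier_directory_index, one plausible reading is that the index selects a path component to use as the genome identifier, in the same way as Python list indexing. The sketch below illustrates that reading only; the function name is hypothetical and the actual logic in scaffolds_to_bins.py may differ.

```python
def genome_identifier_from_path(path, directory_index=-2):
    """Return the path component at `directory_index` (assumed semantics).

    With the default of -2, the identifier is the directory that
    directly contains the FASTA file.
    """
    parts = [part for part in path.split("/") if part]
    return parts[directory_index]


identifier = genome_identifier_from_path("path/to/genomes/bin_a/reference.fasta")
print(identifier)  # bin_a
```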
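
The alignment-fraction filter and the identifier restriction described in the [2024.3.26] and [2024.3.18] entries can be sketched as below. This is an illustration under stated assumptions, not the code of edgelist_to_clusters.py: the function name is hypothetical, an edge is kept when its alignment fraction is greater than or equal to the threshold, and clusters are taken to be connected components (computed here with a small union-find instead of networkx).

```python
def cluster_edgelist(lines, minimum_af=30.0, identifiers=None):
    """Cluster a 4-column edgelist: id_1, id_2, weight, alignment_fraction.

    Assumptions: tab-separated input, edges kept when af >= minimum_af,
    clusters defined as connected components.
    """
    nodes = set(identifiers) if identifiers is not None else set()
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        id_1, id_2, _weight, af = line.split("\t")
        if identifiers is None:
            nodes.update((id_1, id_2))  # no restriction: keep every node
        elif id_1 not in nodes or id_2 not in nodes:
            continue  # both nodes must be in the identifiers set
        if float(af) >= minimum_af:
            edges.append((id_1, id_2))

    parent = {node: node for node in nodes}

    def find(node):
        # Follow parents to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = {}
    for node in nodes:
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


example = ["g1\tg2\t98.5\t80.0", "g2\tg3\t96.0\t10.0", "g4\tg5\t99.0\t45.0"]
print(cluster_edgelist(example))  # [['g1', 'g2'], ['g3'], ['g4', 'g5']]
```

Setting minimum_af=0.0 keeps every edge, which mirrors the note about --af_threshold 0.0 restoring the previous behavior.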