Skip to content

Simple Python interface to CoverM's fast coverage estimation functions

License

Notifications You must be signed in to change notification settings

apcamargo/pycoverm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

68 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pyCoverM

pyCoverM is a Python library that provides bindings to CoverM, enabling fast coverage estimation.

Installation

pyCoverM is available via PyPI or Conda.

PyPI installation

pip install pycoverm

Conda installation

The Conda package can be installed though Pixi or Mamba/Conda.

# Pixi
pixi init pycoverm_project
cd pycoverm_project
pixi project channel add bioconda
pixi add pycoverm

# Mamba (just replace 'mamba' with 'conda' if you have Conda installed)
mamba create -n pycoverm_env -c conda-forge -c bioconda pycoverm
mamba activate pycoverm_env

Quick start

pyCoverM provides two functions:

  1. is_bam_sorted: Checks if a BAM file is sorted by coordinate and returns True if sorted, or False otherwise.
  2. get_coverages_from_bam: Computes the average contig coverage from sorted BAM files. It returns a tuple where the first element is a list of contig names, and the second is a NumPy array of coverage values.

Example usage

>>> import pycoverm
>>> TEST_BAM = "tests/test_data.bam"
>>> pycoverm.is_bam_sorted(TEST_BAM)
True
>>> coverages = pycoverm.get_coverages_from_bam([TEST_BAM])
>>> coverages[0]
['contig_7847997', 'contig_11746202', 'contig_9129108', …, 'contig_2917594']
>>> coverages[1]
array([[0.        ],
       [0.526652  ],
       [0.08541025],
       …           ,
       [0.00907206]], dtype=float32)

Note

If multiple BAM files are provided, the resulting NumPy array will contain one column for each BAM file, with each column corresponding to the coverage values from a specific BAM file.

API

/// is_bam_sorted(bam_file)
/// --
///
/// Checks whether a BAM file is sorted by coordinate.
///
/// Parameters
/// ----------
/// bam_file : str
///     Path to a BAM file.
///
/// Returns
/// -------
/// bool
///     Returns `True` if the BAM file is sorted by coordinate and `False`
///     otherwise.
/// get_coverages_from_bam(bam_list, contig_end_exclusion=75, min_identity=0.97,
/// trim_lower=0.0, trim_upper=0.0, contig_list=None, threads=1)
/// --
///
/// Computes contig mean coverages from sorted BAM files. All BAM files must be
/// mapped to the same reference.
/// Trimmed means will be computed if `trim_min` and/or `trim_max` are set to
/// values greater than 0.
///
/// Parameters
/// ----------
/// bam_list : list
///     A list of paths to input BAM files.
/// contig_end_exclusion : int, optional
///     Exclude bases at the ends of reference sequences from calculation.
///     Default is 75.
/// min_identity : float, optional
///     Exclude reads by overall identity to the reference sequences.
///     Default is 0.97.
/// trim_lower : float, optional
///     Fraction to trim from the lower tail of the coverage distribution.
///     Default is 0.0.
/// trim_upper : float, optional
///     Fraction to trim from the upper tail of the coverage distribution.
///     Default is 0.0.
/// contig_set : set, optional
///     If provided, only the coverages of the contigs within `contig_set` will
///     returned.
///     Default is None (return the coverages of all contigs).
/// threads : int, optional
///     Number of threads to use for coverage computation. Default is 1.
///
/// Returns
/// -------
/// tuple
///     A tuple whose fist element is a list of the contig names and the second
///     one is a numpy matrix of contig coverages in the input BAM files.

About

Simple Python interface to CoverM's fast coverage estimation functions

Resources

License

Stars

Watchers

Forks

Packages

No packages published