Jill V. Hagey, PhD edited this page Oct 1, 2024 · 26 revisions

Welcome to the PHoeNIx 🔥🐦🔥 wiki!

PHoeNIx was built and is maintained by the CDC's Division of Healthcare Quality Promotion (DHQP) to standardize pathogen surveillance and support public health laboratories in their genomic analysis. Next-generation sequencing (NGS) is a powerful tool to aid in the characterization, prevention, and control of bacteria that cause healthcare-associated infections (HAIs). Its applications include surveillance, outbreak investigations, molecular epidemiology, and the study of transmission patterns and virulence mechanisms, including antimicrobial resistance (AR) mechanisms.

The mission of CDC’s Antimicrobial Resistance Laboratory Network (AR Lab Network) is to rapidly detect emerging and novel antimicrobial resistance threats. PHoeNIx (Portable Healthcare Nextgen Informatics pipeline) was developed to advance these objectives. Specifically, it includes quality control, assembly, taxonomic identification, sequence typing (MLST), plasmid replicon detection, antimicrobial resistance gene identification, and hypervirulence gene identification. PHoeNIx does not currently include phylogenetic analyses, but these are a planned future addition.

PHoeNIx is a bioinformatics analysis pipeline built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. PHoeNIx provides a standardized approach for identifying and characterizing healthcare-associated bacterial pathogens, specifically for public health partners. The Nextflow DSL 2 implementation of this pipeline uses one container per process, which makes it easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

PHoeNIx takes in Illumina paired-end reads and was designed for use with pathogens causing healthcare-associated bacterial infections. This comprehensive pipeline:

  • Performs quality control
  • Checks for contamination
  • Confirms taxa ID
  • Performs sequence typing
  • Assembles reads into scaffolds
  • Detects antimicrobial resistance and hypervirulence genes
  • Searches for plasmid markers
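
Because the input is paired-end data, each sample needs both a forward and a reverse read file before any of the steps above can run. The sketch below shows one way to group FASTQ files into complete pairs and write them as a sample sheet. It is illustrative only: the `_R1`/`_R2` file-naming pattern, the `sample,fastq_1,fastq_2` column names (borrowed from the common nf-core convention), and the helper names `pair_reads` and `samplesheet_csv` are assumptions, not PHoeNIx's documented input format. Check the pipeline's own usage documentation for the exact requirements.

```python
import csv
import io
import re
from pathlib import PurePosixPath

# Assumed Illumina-style naming: <sample>_R1[_NNN].fastq.gz and <sample>_R2[_NNN].fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.fastq\.gz$")


def pair_reads(paths):
    """Group FASTQ paths by sample name and keep only complete R1/R2 pairs."""
    grouped = {}
    for path in paths:
        match = READ_PATTERN.match(PurePosixPath(path).name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        grouped.setdefault(match["sample"], {})[match["read"]] = path
    # Drop samples that are missing either the forward or the reverse read file
    return {s: r for s, r in sorted(grouped.items()) if set(r) == {"1", "2"}}


def samplesheet_csv(pairs):
    """Render the pairs as CSV text (column names are an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    for sample, reads in pairs.items():
        writer.writerow([sample, reads["1"], reads["2"]])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    example = [
        "/data/isolateA_R1_001.fastq.gz",
        "/data/isolateA_R2_001.fastq.gz",
        "/data/isolateB_R1_001.fastq.gz",  # no matching R2, so it is excluded
    ]
    print(samplesheet_csv(pair_reads(example)), end="")
```

Dropping incomplete pairs up front surfaces missing files before a run starts rather than partway through it.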

PHoeNIx generates several files that are compatible with downstream analytic tools, such as those used for phylogenetic tree-building. PHoeNIx was developed to support bioinformatics capacity in public health laboratories. This pipeline can be run on Terra, on Nextflow Tower, or from the command line (CLI), and is incorporated into the StaPH-B toolkit. Broad distribution of this tool will enhance both local public health capacity and national efficiency in utilizing whole-genome sequencing (WGS) data for healthcare-associated infection (HAI) surveillance and investigation.

This pipeline was developed in the Division of Healthcare Quality Promotion (DHQP) at the CDC for use with pathogens commonly encountered in healthcare settings (e.g., Acinetobacter baumannii, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus). Although the pipeline can theoretically be run on any bacterial species, more caution is required for non-HAI organisms, as we have little or no experience with how PHoeNIx might handle them. A list of organisms DHQP has sequenced and run through PHoeNIx is found here. If you require assistance, please contact the DHQP Clinical and Environmental Microbiology Branch (CEMB) at [email protected]. You can also contact us in the Slack #phoenix-dev channel.

We highly encourage collaboration and welcome feedback on CDC’s PHoeNIx pipeline. However, if you choose to fork the PHoeNIx pipeline, you are assuming responsibility for that new fork. The DHQP Clinical and Environmental Microbiology Branch does not guarantee the performance or results of any pipelines or scripts that have been forked or copied from this repo.

Credits: