The nearest neighbor suspect spectral library is a spectral library that was created in a data-driven fashion by propagating annotations from hundreds of millions of public mass spectra to molecules that are structurally related to known reference molecules, using MS/MS-based spectral alignment. It is a freely available resource provided through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in their untargeted metabolomics data.
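To illustrate what "MS/MS-based spectral alignment" means: GNPS molecular networking scores pairs of spectra with a modified cosine similarity, in which fragment peaks may match either directly or shifted by the precursor mass difference between the two spectra. This lets a spectrum of a structurally related molecule (for example, one differing by a methylation) still align with a reference spectrum. The sketch below is a simplified, greedy version written for illustration only; it is not the code used to build the library, and the function name and the `tol` parameter (in m/z units) are assumptions.

```python
import math


def _normalize(peaks):
    """Scale intensities so the spectrum vector has unit length."""
    norm = math.sqrt(sum(intensity * intensity for _, intensity in peaks))
    if norm == 0:
        return []
    return [(mz, intensity / norm) for mz, intensity in peaks]


def modified_cosine(peaks_a, precursor_a, peaks_b, precursor_b, tol=0.02):
    """Simplified greedy modified cosine between two (m/z, intensity) peak lists.

    Illustrative sketch only (hypothetical helper): peaks match if their m/z
    values agree within `tol`, either directly or after shifting by the
    precursor m/z difference between the two spectra.
    """
    a = _normalize(peaks_a)
    b = _normalize(peaks_b)
    shift = precursor_b - precursor_a
    candidates = []
    for i, (mz_a, int_a) in enumerate(a):
        for j, (mz_b, int_b) in enumerate(b):
            unshifted = abs(mz_a - mz_b) <= tol
            shifted = abs(mz_a + shift - mz_b) <= tol
            if unshifted or shifted:
                candidates.append((int_a * int_b, i, j))
    # Greedily accept the highest-scoring pairs, using each peak at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    score = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += product
    return score
```

With this scoring, a spectrum whose fragments are all offset by the precursor mass difference (here roughly +14.016, a CH2 unit) still scores close to 1.0 against the original, which is what makes propagating annotations to structural analogs possible.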
All code is available as open-source under the BSD-3-Clause license.
If you use the nearest neighbor suspect spectral library in your work, please cite the following publication:
- Bittremieux, W. et al. Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics. bioRxiv (2022) doi:10.1101/2022.05.15.490691.
The nearest neighbor suspect spectral library can be directly included in any analysis on GNPS that uses spectral library searching. To select the nearest neighbor suspect spectral library as one of the spectral libraries included in your data analysis:
- Browse to the "GNPS-SUSPECTLIST.mgf" file under "CCMS_SpectralLibraries" > "GNPS_Propogated_Libraries" > "GNPS-SUSPECTLIST" in the file selector.
- Click on "Library Files" to add the nearest neighbor suspect spectral library.
- Verify that the library now appears in the "Selected Library Files" category in the selection panel.
Alternatively, you can download the nearest neighbor suspect spectral library as an MGF file from GNPS or from its Zenodo archive and include it in any external MS/MS data analysis tool.
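When using the downloaded MGF file outside GNPS, dedicated parsers such as `pyteomics.mgf.read` are the most robust option. For a quick look at the file without extra dependencies, a minimal reader like the one below can be used; it is a hypothetical helper that only handles the common `BEGIN IONS`/`END IONS` blocks, `KEY=value` header lines, and whitespace-separated peak lines.

```python
def read_mgf(lines):
    """Yield spectra from MGF text as dicts with 'params' and 'peaks'.

    Minimal sketch (hypothetical helper): header keys are upper-cased,
    header values are kept as strings, and peaks become (m/z, intensity) floats.
    """
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and comments.
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line:
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                parts = line.split()
                mz = float(parts[0])
                # Some MGF files omit intensities; default to 0.0 in that case.
                intensity = float(parts[1]) if len(parts) > 1 else 0.0
                spectrum["peaks"].append((mz, intensity))
```

For example, `with open("GNPS-SUSPECTLIST.mgf") as f: spectra = list(read_mgf(f))` would load all suspect spectra into memory.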
All of the data used to compile the nearest neighbor suspect spectral library are publicly available through GNPS/MassIVE and archived on Zenodo.
- GNPS living data (version November 17, 2020)
- Living data global molecular network
- Spectral library searching using the default GNPS libraries only: part 1, part 2, part 3, part 4, part 5, part 6, part 7, part 8
- Spectral library searching using the default GNPS spectral libraries and the nearest neighbor suspect spectral library: part 1, part 2, part 3, part 4, part 5, part 6, part 7, part 8
- Molecular networking of apratoxin suspects
- Molecular networking of azithromycin suspects
- Molecular networking of flavonoid suspects
- Molecular networking of home environment personal care products
- Spectral library searching of Alzheimer's disease data
You can use the code in this repository to compile the nearest neighbor suspect spectral library (or a similar spectral library) from the GNPS living data results yourself. This requires Python 3.8 or above. You can create a suitable code environment and install all dependencies using conda:
conda env create -f https://raw.githubusercontent.com/bittremieux/gnps_suspect_library/master/environment.yml && conda activate suspect_library
See the environment.yml file for full details on the software dependencies.
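To confirm that the activated `suspect_library` environment meets the Python 3.8 requirement, a small check such as the following can be run from within the environment. The helper name is hypothetical and not part of the repository.

```python
import sys


def check_python(minimum=(3, 8)):
    """Raise if the running interpreter is older than `minimum` (hypothetical helper)."""
    current = tuple(sys.version_info[:2])
    if current < minimum:
        raise RuntimeError(
            f"Python {minimum[0]}.{minimum[1]}+ is required, found {current[0]}.{current[1]}"
        )
    return current
```

Calling `check_python()` returns the `(major, minor)` version tuple when the requirement is met.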
You can generate the nearest neighbor suspect spectral library from the GNPS living data results by cloning the repository and running the main Python script:
git clone https://github.com/bittremieux/gnps_suspect_library.git && cd gnps_suspect_library/src
python suspects.py
This will create Parquet files that include tabular information and provenance for all the suspect MS/MS spectra.
Compiling an MS/MS spectral library MGF file from the Parquet metadata file can be done using the export_mgf.ipynb Jupyter notebook in the notebooks directory.
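The export step ultimately writes each suspect as an MGF entry. As a rough illustration of that output format only, the sketch below serializes records that have already been reduced to a dict of metadata fields plus a peak list; such records could, for instance, come from a pandas DataFrame via `to_dict("records")`. The record layout and the field names in the example are assumptions, not the columns used by export_mgf.ipynb.

```python
def spectrum_to_mgf(params, peaks):
    """Serialize one spectrum as an MGF block (illustrative sketch).

    `params` maps metadata keys to values (keys are upper-cased on output);
    `peaks` is a sequence of (m/z, intensity) pairs.
    """
    lines = ["BEGIN IONS"]
    for key, value in params.items():
        lines.append(f"{key.upper()}={value}")
    for mz, intensity in peaks:
        lines.append(f"{mz:.4f} {intensity:.4f}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


def records_to_mgf(records):
    """Join several records (dicts with 'params' and 'peaks') into MGF text."""
    # A blank line separates consecutive spectra for readability.
    return "\n".join(spectrum_to_mgf(r["params"], r["peaks"]) for r in records)
```

Writing the result with `open(path, "w").write(records_to_mgf(records))` produces a file that standard MGF readers can load.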
The Jupyter notebooks in the notebooks directory can also be used to fully recreate all analyses reported in the manuscript.
For more information, you can visit the official code website or contact the authors by email.