Unlocking a Greater Perspective: Mapping the Chemical Space of Biomes Using SIRIUS

Untargeted mass spectrometry is a powerful tool for analyzing the immense chemical complexity of natural environments. However, interpreting such large datasets remains a significant challenge. To overcome this, researchers have developed an innovative approach using SIRIUS that prioritizes chemical profiling over exhaustive identification. This method allows for more effective comparisons of (micro-)biomes, providing deeper insights into biochemical diversity across different environments.

Figure: Two hands full of soil. SIRIUS allows for effective untargeted metabolomic comparison of biomes, such as soil, water, animal gut regions, fungal bodies, and many more.

Untargeted Power

Mass spectrometry is a powerful tool for exploring the vast chemical diversity of natural environments. Targeted analysis relies on prior knowledge, such as retention time and fragmentation pattern, and requires chemical standards that may be expensive or difficult to obtain. This limits its scope to dozens or hundreds of compounds [1]. In contrast, untargeted mass spectrometry allows researchers to analyze complex biological samples without predefined targets, generating thousands of peaks from a single sample. The challenge is to interpret and compare these massive, highly complex datasets effectively.

The Big Data Problem: Challenges in Untargeted Metabolomics

Traditional methods for comparing untargeted biological samples primarily focus on intensity-based quantification of individual compounds. However, these workflows require full annotation of detected compounds, a process hindered by low annotation rates. In fact, fewer than 15% of peaks are typically identified using standard spectral libraries [2], leaving the majority of compounds uncharacterized. This bottleneck complicates meaningful comparisons and limits the ability to derive broad chemical insights.

A more effective approach prioritizes chemical characterization over full structural identification. Instead of relying solely on annotated features, analyzing broader chemical characteristics provides more comprehensive comparisons across samples.

Beyond Identification: A Smarter Way to Compare Samples

To address these limitations, researchers at Friedrich Schiller University Jena, Université Paris-Saclay and Wageningen University developed a novel approach that uses SIRIUS to generate chemical characteristics vectors from untargeted datasets [3]. These vectors describe the chemical properties of the compounds in a sample by estimating the ratio of compounds that have each specific chemical property. The features are SIRIUS-predicted molecular fingerprints and compound classes, which are converted into binary values. Averaging across all compounds in a sample produces ratios reflecting the frequency of each chemical property within the sample. The resulting chemical characteristics vectors offer a nuanced understanding of the chemical composition of a complex sample without requiring exhaustive and often elusive final identification. In addition, machine learning models can readily be used to compare these vectors, enabling the classification of samples by biome.

Method Details

SIRIUS-predicted molecular fingerprints and compound classes are the features of the chemical characteristics vectors. Molecular fingerprints (MFPs) were generated with CSI:FingerID and compound classes (CCs) with CANOPUS. The molecular fingerprint vector consisted of 5,899 bits, while each compound class vector contained 2,723 bits. All probabilistic MFPs and CCs obtained from SIRIUS calculations were converted into binary values using a threshold of 0.5, indicating the presence or absence of specific chemical characteristics within the compound structure. To summarize the chemical features within a sample, each feature (bit) in the vector was averaged across compounds. This was done either by averaging over all compounds in the entire sample or by grouping compounds based on their precursor mass (100-250, 250-300, 300-350, 350-400, 400-450, 450-550, and 550-900 m/z), averaging within each group, and then concatenating the averaged vectors. Grouping the compounds into mass categories adds resolution and improves performance.
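
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (binarize, average_bits, sample_vector) are hypothetical, the input is assumed to be a list of (precursor m/z, probability list) pairs already exported from SIRIUS, a probability of exactly 0.5 is treated as "present", bin edges are treated as lower-inclusive, and empty mass groups are filled with zeros. The toy example uses 3 bits per compound instead of the thousands described above.

```python
from bisect import bisect_right

# Precursor-mass bin edges in m/z, taken from the ranges listed in the text.
MASS_BIN_EDGES = [100, 250, 300, 350, 400, 450, 550, 900]


def binarize(probabilities, threshold=0.5):
    """Convert predicted probabilities into presence (1) / absence (0) bits.

    Assumption: a value exactly at the threshold counts as present.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def average_bits(bit_vectors, n_bits):
    """Per-bit mean over compounds: the fraction of compounds with each property."""
    if not bit_vectors:
        return [0.0] * n_bits  # assumption: an empty group contributes zeros
    return [sum(column) / len(bit_vectors) for column in zip(*bit_vectors)]


def sample_vector(compounds, n_bits, by_mass=False):
    """Build one chemical characteristics vector for a sample.

    compounds: list of (precursor_mz, probabilities) pairs for one sample.
    by_mass:   if True, average within each mass bin and concatenate the results.
    """
    if not by_mass:
        return average_bits([binarize(p) for _, p in compounds], n_bits)

    groups = [[] for _ in range(len(MASS_BIN_EDGES) - 1)]
    for mz, probabilities in compounds:
        index = bisect_right(MASS_BIN_EDGES, mz) - 1
        # Assumption: compounds outside the listed 100-900 m/z range are skipped.
        if 0 <= index < len(groups):
            groups[index].append(binarize(probabilities))

    vector = []
    for group in groups:
        vector.extend(average_bits(group, n_bits))
    return vector


# Toy sample: three compounds with made-up probabilities for three features.
compounds = [
    (180.06, [0.9, 0.2, 0.6]),
    (210.10, [0.7, 0.4, 0.1]),
    (320.20, [0.3, 0.8, 0.55]),
]
whole_sample = sample_vector(compounds, n_bits=3)               # 3 values
mass_binned = sample_vector(compounds, n_bits=3, by_mass=True)  # 7 bins x 3 = 21 values
```

In the whole-sample vector each value is simply the share of compounds carrying that bit. The mass-binned variant is seven times longer because it keeps one averaged block per precursor-mass range, which is where the added resolution comes from.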

Comparing Biomes with Chemical Signatures

To demonstrate the power of chemical characteristics vectors, the researchers applied this method to over 500 samples from eleven distinct biomes from the Earth Microbiome Project [4,5]. These biomes included environments such as soil, water, sediment, animal gut regions, plant surfaces, and fungal bodies. The researchers computed chemical characteristics vectors with SIRIUS for ~62% of the compounds, allowing for large-scale comparison of the chemical compositions. They found that:

  • ethers were significantly enriched in environmental biomes, such as water, subsurface, soil, and sediment,
  • monosaccharide phosphates and glycerone phosphates, which are crucial for carbon and phosphorus cycling, were more abundant in plant and environmental biomes but nearly absent in animal-associated biomes, and
  • prenol lipids, steroids, bile acids, and amino acids were more dominant in animal-associated biomes, distinguishing them from environmental samples.

These findings underscore the effectiveness of chemical characteristics vectors in distinguishing biomes and identifying key chemical features driving these differences.
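
To make the idea of comparing vectors concrete, here is a small Python sketch. The article does not say which machine learning models or statistical tests were used, so a nearest-centroid classifier and a ranking by difference in mean feature ratio stand in for them here. All function names are hypothetical, and the numbers are invented toy values chosen only to show the mechanics; they are not results from the study.

```python
from math import dist


def centroid(vectors):
    """Per-feature mean of a group of chemical characteristics vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fit_centroids(labeled_vectors):
    """labeled_vectors: dict mapping a biome label to its list of sample vectors."""
    return {label: centroid(vectors) for label, vectors in labeled_vectors.items()}


def predict(centroids, vector):
    """Assign the biome whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


def top_differences(centroids, biome_a, biome_b, feature_names, k=3):
    """Rank features by the absolute difference in mean ratio between two biomes."""
    diffs = [
        (name, centroids[biome_a][i] - centroids[biome_b][i])
        for i, name in enumerate(feature_names)
    ]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)[:k]


# Invented toy data: three features, two samples per biome.
feature_names = ["ethers", "bile acids", "monosaccharide phosphates"]
training = {
    "soil": [[0.30, 0.01, 0.08], [0.26, 0.03, 0.10]],
    "animal gut": [[0.10, 0.20, 0.00], [0.12, 0.24, 0.02]],
}

centroids = fit_centroids(training)
predicted = predict(centroids, [0.25, 0.05, 0.07])
ranked = top_differences(centroids, "soil", "animal gut", feature_names)
```

Because every entry in a vector is a ratio between 0 and 1 tied to a named fingerprint bit or compound class, the features that separate two groups remain directly interpretable, which is what allows statements like "ethers are enriched in environmental biomes".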

Beyond Biome Comparisons

Beyond simple biome comparison, this approach holds significant potential for broader metabolomics applications, including:

  • Environmental monitoring: Identifying chemical differences across ecosystems over time.
  • Time-series data analysis: Tracking metabolic changes in response to environmental shifts.
  • Biomarker discovery: Comparing chemical compositions in sick versus healthy organisms.
  • Multi-omics integration: Correlating metabolomic features with microbial traits, gene expression, and metabolic pathways.

Conclusion

Chemical characteristics vectors using SIRIUS can move your untargeted analysis beyond traditional annotation-based comparisons by shifting the focus to broader chemical profiling. This method enables large-scale comparisons across diverse biological and environmental samples, allowing researchers to classify biomes, track metabolic changes, and uncover biomarkers. Its potential to identify hidden patterns in complex systems makes it a powerful tool for advancing metabolomics research.

References
  1. Zhou, J. & Yin, Y. Strategies for large-scale targeted metabolomics quantification by liquid chromatography-mass spectrometry. The Analyst 141, 6362–6373 (2016). https://doi.org/10.1039/C6AN01753C
  2. Bittremieux, W., Wang, M. & Dorrestein, P. C. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 18, 94 (2022). https://doi.org/10.1007/s11306-022-01947-y
  3. Peets, P. et al. Chemistry-based vectors map the chemical space of natural biomes from untargeted mass spectrometry data. bioRxiv 2025.01.22.634253 (2025). https://doi.org/10.1101/2025.01.22.634253
  4. Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–463 (2017). https://doi.org/10.1038/nature24621
  5. Shaffer, J. P. et al. Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nat. Microbiol. 7, 2128–2150 (2022). https://doi.org/10.1038/s41564-022-01266-x

