Going Barcode-Free: Screening Massive Small Molecule Libraries for Early Drug Discovery

November 4, 2025
Franziska Hufsky

Our recent study, co-authored by researchers at Bright Giant, FSU Jena, Leiden University, and Oncode Institute, introduces a major leap forward in affinity selection screening for early drug discovery: Self-Encoded Libraries. Our approach uses advanced mass spectrometry to screen hundreds of thousands of small molecules in a single experiment, bypassing the significant limitations of traditional high-throughput screening as well as affinity selection with barcoded libraries. It allows drug discovery teams to identify high-affinity drug candidates faster, more affordably, and against targets previously inaccessible to common screening methods.

The identification of high-affinity ligands is a foundational step in early drug discovery. Traditional high-throughput screening typically requires screening immense compound libraries, testing every compound separately using biochemical and cellular assays¹. Although effective, this is a resource-intensive process, often requiring months of work and complex and costly infrastructure. The cost and complexity limit the availability of these massive platforms primarily to large pharmaceutical companies.

Affinity selection technologies offer a powerful alternative, enabling the screening of large libraries in a single experiment². This approach typically involves exposing an immobilized target protein to a compound pool to separate binders from non-binders, then isolating the target-ligand complexes to identify the bound compounds. A crucial step in this process is the decoding of each hit, which is most commonly achieved using DNA or RNA barcodes attached to each ligand³. This large nucleic acid tag is the primary limitation of the approach: it complicates synthesis⁴ and can bias results. The barcode is typically more than 50 times larger than the small molecule and can interfere with binding⁵, especially for targets with nucleic acid binding sites⁶.

Going Barcode-Free: Introducing Self-Encoded Libraries

In our recent study, carried out together with researchers from Leiden University, FSU Jena, and Oncode Institute, we introduce Self-Encoded Libraries, which directly screen over half a million small molecules in a single experiment without using encoding tags⁷. Instead, we use the molecule’s own mass signature for decoding and tandem mass spectrometry (MS/MS) fragmentation to accurately reconstruct the molecular structure of the selected ligands.

Our goal was to screen small molecules in their most native form, eliminating the potential for any structural bias from a large, attached tag. Decoding by the molecule’s own mass signature not only provides an unbiased screen but also greatly widens the range of chemical reactions we can use, accelerating library creation.

This barcode-free approach offers two critical advantages:
(1) The molecule is screened in its completely unmodified form, eliminating any potential bias resulting from a large encoding tag.
(2) Self-Encoded Libraries can undergo any reaction condition compatible with the small molecule itself, enabling a broader range of chemical transformations and allowing highly diverse libraries to be synthesized rapidly (e.g., in under a week) using standard, cost-effective organic synthesis techniques.
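
The idea of decoding by mass signature can be illustrated with a minimal sketch. It is not the pipeline used in the study: the toy library, the compound IDs, the observed m/z value, and the 5 ppm tolerance are all assumptions made for illustration, and molecular formulas stand in for full structures. The sketch looks up which library members could explain a measured [M+H]+ precursor mass, and shows that compounds with the same formula cannot be told apart by mass alone, which is why MS/MS fragmentation is needed.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443,
        "O": 15.99491461957, "S": 31.9720711744}
PROTON = 1.007276466621  # proton mass, used for [M+H]+ ions

def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C7H7NO4S'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass

def candidates_for_mz(library, mz, ppm=5.0):
    """Return IDs of library members whose [M+H]+ m/z lies within `ppm` of `mz`."""
    hits = []
    for compound_id, formula in library.items():
        theoretical = monoisotopic_mass(formula) + PROTON
        if abs(theoretical - mz) / theoretical * 1e6 <= ppm:
            hits.append(compound_id)
    return sorted(hits)

# Hypothetical toy library: IDs and membership are illustrative only.
library = {
    "cmpd_A": "C7H7NO4S",  # formula of 4-sulfamoylbenzoic acid
    "cmpd_B": "C7H7NO4S",  # an isomer: same formula, hence same mass
    "cmpd_C": "C8H9NO3S",
}

observed_mz = 202.0169  # hypothetical measured precursor m/z
print(candidates_for_mz(library, observed_mz))  # ['cmpd_A', 'cmpd_B']
```

Two candidates remain after the mass lookup, so the precursor mass narrows the search but does not by itself identify the ligand.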

SIRIUS-COMET for Structural Annotation of Ligands

To manage the complexity of decoding vast, untagged chemical mixtures, we introduce SIRIUS-COMET, which reconstructs the molecular structures of selected ligands from the mass spectrometry data. Unequivocal structural annotation is essential for a successful screen, but it is challenged by the number of isobars in large libraries, i.e., compounds with the same mass but different structures.

While the complete space of potential structures is known (the Self-Encoded Library), there are no reference MS/MS measurements for the complex mixture of substances. Instead of relying on reference spectra, the library itself is imported into SIRIUS as a structure database (a list of SMILES). The COMET filter handles the high volume of MS/MS scans: it uses fragmentation patterns predicted from prominent, recurring fragmentation rules for each library scaffold, drastically reducing the number of scans that require full annotation. This combined approach achieved a rate of correct recall and annotation of 66–74% on the tested libraries.
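
The general idea of pre-filtering scans by predicted fragments can be sketched as follows. This is not the COMET implementation, whose details are not given here: the function names, the fragment m/z values, the scan data, the 0.01 Da tolerance, and the rule of requiring at least two matching fragments are all hypothetical choices made for illustration. The sketch keeps only those MS/MS scans whose peaks match enough of the fragments predicted for a library scaffold, so that only these scans go on to full annotation.

```python
def count_matches(peaks, predicted, tol=0.01):
    """Count predicted fragment m/z values found among the scan's peaks (within tol Da)."""
    return sum(any(abs(p - f) <= tol for p in peaks) for f in predicted)

def prefilter_scans(scans, predicted_fragments, min_matches=2, tol=0.01):
    """Return IDs of scans matching at least `min_matches` predicted fragments."""
    return [scan_id for scan_id, peaks in scans.items()
            if count_matches(peaks, predicted_fragments, tol) >= min_matches]

# Hypothetical fragment m/z values predicted for one scaffold.
predicted_fragments = [120.0444, 184.0063]

# Hypothetical MS/MS scans: scan ID -> list of observed fragment m/z values.
scans = {
    "scan_001": [77.0386, 120.0441, 184.0060, 202.0168],
    "scan_002": [91.0542, 133.0648, 250.1100],
    "scan_003": [120.0450, 199.0301],
}

print(prefilter_scans(scans, predicted_fragments))  # ['scan_001']
```

Only `scan_001` contains both predicted fragments. `scan_003` matches one and `scan_002` none, so in this toy setting both would be skipped before the more expensive structure annotation step.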

Evaluation and Validation

Self-Encoded Libraries were validated in two distinct case studies, demonstrating their capability to screen libraries of up to 500,000 members.

Validating Scale: We validated the method’s large-scale capability by screening a nearly 500,000-member library for CAIX (Carbonic Anhydrase IX) ligands. CAIX is a known oncology target with established binders. The identification and validation of several nanomolar binders, together with the expected enrichment of 4-sulfamoylbenzoic acid, demonstrates that Self-Encoded Libraries are suitable for high-throughput applications at the scale required for early drug discovery. Moreover, the system allows for screening even higher diversity by pooling libraries.

Unlocking Novel Targets: Flap Endonuclease 1 (FEN1), a DNA-processing enzyme, was previously considered inaccessible to traditional barcode-based screening due to its inherent DNA-binding site, which the large DNA tag would compromise or obscure. Targeting FEN1 with a focused 4,000-member library, our approach identified and confirmed two compounds that inhibit its activity. This result highlights the method’s potential to unlock challenging drug targets that were intractable with existing DNA-Encoded Library technology.

The successful application of Self-Encoded Libraries to large compound sets establishes a viable, barcode-free alternative for high-throughput ligand discovery, offering a new pathway to accelerate the identification of therapeutic starting points for numerous diseases.

Edith van der Nol, Nils Alexander Haupt, Qing Qing Gao, Benthe A. M. Smit, Martin Andre Hoffmann, Martin Engler-Lukajewski, Marcus Ludwig, Sean McKenna, J. Miguel Mata, Olivier J. M. Béquignon, Gerard van Westen, Tiemen J. Wendel, Sylvie M. Noordermeer, Sebastian Böcker & Sebastian Pomplun.
Barcode-free hit discovery from massive libraries enabled by automated small molecule structure annotation.
Nat Commun 16, 9479 (2025). doi: 10.1038/s41467-025-65282-1

References

1. Macarron R, Banks MN, Bojanic D, Burns DJ, Cirovic DA, Garyantes T, Green DV, Hertzberg RP, Janzen WP, Paslay JW, Schopfer U, Sittampalam GS. Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov. 2011 Mar;10(3):188-95. doi: 10.1038/nrd3368
2. Mata JM, van der Nol E, Pomplun SJ. Advances in Ultrahigh Throughput Hit Discovery with Tandem Mass Spectrometry Encoded Libraries. J Am Chem Soc. 2023 Aug 30;145(34):19129-19139. doi: 10.1021/jacs.3c04899
3. Favalli N, Bassi G, Pellegrino C, Millul J, De Luca R, Cazzamalli S, Yang S, Trenner A, Mozaffari NL, Myburgh R, Moroglu M, Conway SJ, Sartori AA, Manz MG, Lerner RA, Vogt PK, Scheuermann J, Neri D. Stereo- and regiodefined DNA-encoded chemical libraries enable efficient tumour-targeting applications. Nat Chem. 2021 Jun;13(6):540-548. doi: 10.1038/s41557-021-00660-y
4. Götte K, Chines S, Brunschweiger A. Reaction development for DNA-encoded library technology: From evolution to revolution? Tetrahedron Lett. 2020;61:151889.
5. Montoya AL, Hogendorf AS, Tingey S, Kuberan A, Yuen LH, Schüler H, Franzini RM. Widespread false negatives in DNA-encoded library data: how linker effects impair machine learning-based lead prediction. Chem Sci. 2025 May 9;16(24):10918-10927. doi: 10.1039/d5sc00844a
6. Henley MJ, Koehler AN. Advances in targeting ‘undruggable’ transcription factors with small molecules. Nat Rev Drug Discov. 2021 Sep;20(9):669-688. doi: 10.1038/s41573-021-00199-0
7. van der Nol E, Haupt NA, Gao QQ, Smit BAM, Hoffmann MA, Engler-Lukajewski M, Ludwig M, McKenna S, Mata JM, Béquignon OJM, van Westen G, Wendel TJ, Noordermeer SM, Böcker S, Pomplun S. Barcode-free hit discovery from massive libraries enabled by automated small molecule structure annotation. Nat Commun. 2025 Oct 27;16(1):9479. doi: 10.1038/s41467-025-65282-1

