The identification of high-affinity ligands is a foundational step in early drug discovery. Traditional high-throughput screening requires immense compound libraries in which every compound is tested separately in biochemical and cellular assays [1]. Although effective, this process is resource-intensive, often requiring months of work and complex, costly infrastructure. As a result, these platforms are available mainly to large pharmaceutical companies.
Affinity selection technologies offer a powerful alternative, enabling the screening of large libraries in a single experiment [2]. The approach typically involves exposing an immobilized target protein to a compound pool, separating binders from non-binders, and then isolating the target-ligand complexes to identify the bound compounds. A crucial step is decoding each hit, most commonly by means of a DNA or RNA barcode attached to each ligand [3]. This tag is the primary limitation of the approach: it complicates synthesis [4] and can bias results. The barcode is typically more than 50 times larger than the small molecule and can interfere with binding [5], especially for targets with nucleic acid binding sites [6].
Going Barcode-Free: Introducing Self-Encoded Libraries
In our recent study, conducted together with researchers from Leiden University, FSU Jena, and Oncode Institute, we introduce Self-Encoded Libraries, which allow over half a million small molecules to be screened directly in a single experiment without encoding tags [7]. Instead, we use the molecule’s own mass signature for decoding and tandem mass spectrometry (MS/MS) fragmentation to accurately reconstruct the molecular structure of the selected ligands.
Our goal was to screen small molecules in their most native form, free of any structural bias from a large attached tag. Decoding by mass signature makes this possible. It also greatly widens the range of chemical reactions available for library synthesis, which accelerates library creation.
This barcode-free approach offers two critical advantages:
(1) The molecule is screened in its completely unmodified form, eliminating any potential bias resulting from a large encoding tag.
(2) Self-Encoded Libraries can undergo any reaction condition compatible with the small molecule itself, enabling a broader range of chemical transformations and allowing highly diverse libraries to be synthesized rapidly (e.g., in under a week) using standard, cost-effective organic synthesis techniques.
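The idea of decoding by mass signature can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the compound names and formulas are hypothetical, the [M+H]+ adduct and the 5 ppm tolerance are assumptions chosen for illustration, and only a handful of elements are supported. The sketch looks up which library members are consistent with an observed precursor m/z, and shows why a mass alone can be ambiguous when several library members share it.

```python
import re

# Monoisotopic masses (Da) of a few common elements.
MONO = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "S": 31.9720711744,
}
PROTON = 1.007276466812  # mass of a proton, used for the [M+H]+ adduct


def mono_mass(formula: str) -> float:
    """Monoisotopic mass of a simple elemental formula such as 'C7H7NO4S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula: str) -> float:
    """Theoretical m/z of the singly protonated molecule (assumed adduct)."""
    return mono_mass(formula) + PROTON


def candidates(observed_mz: float, library: dict, ppm: float = 5.0) -> list:
    """Names of library members whose [M+H]+ lies within `ppm` of observed_mz."""
    hits = []
    for name, formula in library.items():
        theoretical = mz_protonated(formula)
        if abs(observed_mz - theoretical) / theoretical * 1e6 <= ppm:
            hits.append(name)
    return sorted(hits)


# Hypothetical three-member library: cmpd_A and cmpd_B are isomers.
library = {
    "cmpd_A": "C20H24N4O3",
    "cmpd_B": "C20H24N4O3",
    "cmpd_C": "C21H28N4O2",
}
print(candidates(mz_protonated("C20H24N4O3"), library))
```

In this toy library, accurate mass separates cmpd_C from the other two, but cmpd_A and cmpd_B remain indistinguishable by precursor mass. Resolving such cases is the job of MS/MS fragmentation.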
SIRIUS-COMET for Structural Annotation of Ligands
To manage the complexity of decoding vast, untagged chemical mixtures, we introduce SIRIUS-COMET, which reconstructs the molecular structure of selected ligands from the mass spectrometry data. Unequivocal structural annotation is essential for a successful screen, but it is challenged by the number of isobars in large libraries, i.e. compounds with the same mass but different structures. The complete space of potential structures is known (it is the Self-Encoded Library itself), but no reference MS/MS measurements exist for the substances in the mixture. The library is therefore imported into SIRIUS as a structure database (a list of SMILES), and annotation is restricted to these candidates. The COMET filter handles the high volume of MS/MS scans: it relies on predicted fragmentation patterns, derived from prominent recurring fragmentation rules for each library scaffold, to drastically reduce the number of scans that require full annotation. This combined approach achieved a correct recall and annotation rate of 66–74% on the tested libraries.
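The general idea of filtering scans by predicted fragments can be sketched in a few lines of Python. This is a toy illustration, not the COMET implementation: the `Scan` class, the diagnostic fragment values, the 0.01 Da tolerance, and the rule of requiring at least two matches are all hypothetical choices made for this example. A scan is kept for full annotation only if it contains enough of the fragments predicted for a scaffold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scan:
    """A simplified MS/MS scan (hypothetical structure for this sketch)."""
    scan_id: int
    precursor_mz: float
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def has_fragment(peaks, fragment_mz: float, tol_da: float) -> bool:
    """True if any peak lies within tol_da of the predicted fragment m/z."""
    return any(abs(mz - fragment_mz) <= tol_da for mz, _ in peaks)


def prefilter(scans, diagnostic_fragments, min_matches: int = 2,
              tol_da: float = 0.01):
    """Keep scans that show at least min_matches predicted fragments."""
    kept = []
    for scan in scans:
        n_matches = sum(
            has_fragment(scan.peaks, fragment, tol_da)
            for fragment in diagnostic_fragments
        )
        if n_matches >= min_matches:
            kept.append(scan)
    return kept


# Hypothetical diagnostic fragments predicted for one library scaffold.
scaffold_fragments = [105.0335, 150.0550]

scans = [
    Scan(1, 369.19, [(77.04, 120.0), (105.03, 900.0), (150.05, 400.0)]),
    Scan(2, 412.21, [(105.034, 300.0), (200.10, 850.0)]),
    Scan(3, 298.15, [(91.05, 500.0), (119.05, 200.0)]),
]

for scan in prefilter(scans, scaffold_fragments):
    print("keep scan", scan.scan_id)
```

Only scan 1 passes here, because it is the only one containing both predicted fragments. In a real screen, a cheap check of this kind means the more expensive structure annotation only has to run on the scans that plausibly originate from library members.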
Evaluation and Validation
Self-Encoded Libraries were validated in two distinct case studies, which demonstrated that the approach can screen libraries of up to 500,000 members.
Validating Scale: We validated the method’s large-scale capability by screening a nearly 500,000-member library for ligands of CAIX (Carbonic Anhydrase IX). CAIX is a known oncology target with established binders, which makes it a useful benchmark. The screen identified several nanomolar binders, which were subsequently validated, and showed the expected enrichment of 4-sulfamoylbenzoic acid. This demonstrates that Self-Encoded Libraries are suitable for high-throughput applications at the scale required for early drug discovery. Moreover, pooling libraries allows even higher diversity to be screened.
Unlocking Novel Targets: Flap Endonuclease 1 (FEN1), a DNA-processing enzyme, was previously considered inaccessible to traditional barcode-based screening because of its inherent DNA-binding site, with which a large DNA tag would interfere. Targeting FEN1 with a focused 4,000-member library, our approach identified and confirmed two compounds that inhibit its activity. This result highlights the method’s potential to unlock challenging drug targets that have been intractable with existing DNA-Encoded Library technology.
The successful application of Self-Encoded Libraries to large compound sets establishes a viable, barcode-free alternative for high-throughput ligand discovery, offering a new pathway to accelerate the identification of therapeutic starting points for numerous diseases.
Edith van der Nol, Nils Alexander Haupt, Qing Qing Gao, Benthe A. M. Smit, Martin Andre Hoffmann, Martin Engler-Lukajewski, Marcus Ludwig, Sean McKenna, J. Miguel Mata, Olivier J. M. Béquignon, Gerard van Westen, Tiemen J. Wendel, Sylvie M. Noordermeer, Sebastian Böcker & Sebastian Pomplun.
Barcode-free hit discovery from massive libraries enabled by automated small molecule structure annotation.
Nat Commun 16, 9479 (2025). doi: 10.1038/s41467-025-65282-1
References
1. Macarron R, Banks MN, Bojanic D, Burns DJ, Cirovic DA, Garyantes T, Green DV, Hertzberg RP, Janzen WP, Paslay JW, Schopfer U, Sittampalam GS. Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov. 2011 Mar;10(3):188-95. doi: 10.1038/nrd3368
2. Mata JM, van der Nol E, Pomplun SJ. Advances in Ultrahigh Throughput Hit Discovery with Tandem Mass Spectrometry Encoded Libraries. J Am Chem Soc. 2023 Aug 30;145(34):19129-19139. doi: 10.1021/jacs.3c04899
3. Favalli N, Bassi G, Pellegrino C, Millul J, De Luca R, Cazzamalli S, Yang S, Trenner A, Mozaffari NL, Myburgh R, Moroglu M, Conway SJ, Sartori AA, Manz MG, Lerner RA, Vogt PK, Scheuermann J, Neri D. Stereo- and regiodefined DNA-encoded chemical libraries enable efficient tumour-targeting applications. Nat Chem. 2021 Jun;13(6):540-548. doi: 10.1038/s41557-021-00660-y
4. Götte K, Chines S, Brunschweiger A. Reaction development for DNA-encoded library technology: From evolution to revolution? Tetrahedron Lett. 2020;61:151889
5. Montoya AL, Hogendorf AS, Tingey S, Kuberan A, Yuen LH, Schüler H, Franzini RM. Widespread false negatives in DNA-encoded library data: how linker effects impair machine learning-based lead prediction. Chem Sci. 2025 May 9;16(24):10918-10927. doi: 10.1039/d5sc00844a
6. Henley MJ, Koehler AN. Advances in targeting ‘undruggable’ transcription factors with small molecules. Nat Rev Drug Discov. 2021 Sep;20(9):669-688. doi: 10.1038/s41573-021-00199-0
7. van der Nol E, Haupt NA, Gao QQ, Smit BAM, Hoffmann MA, Engler-Lukajewski M, Ludwig M, McKenna S, Mata JM, Béquignon OJM, van Westen G, Wendel TJ, Noordermeer SM, Böcker S, Pomplun S. Barcode-free hit discovery from massive libraries enabled by automated small molecule structure annotation. Nat Commun. 2025 Oct 27;16(1):9479. doi: 10.1038/s41467-025-65282-1


