AI in natural products drug discovery

Artificial intelligence facilitating natural product drug discovery

Finding the needle in the haystack of molecules produced by nature

Saarbrücken, September 11, 2023 – Drug discovery is slow and tedious, especially the search for natural products that could serve as scaffolds for potential drug candidates. How can artificial intelligence speed up this process? A consortium of scientists, among them Prof Olga Kalinina, Prof Andrea Volkamer, Prof Rolf Müller and Prof Anna Hirsch from the Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), has written a review of the current applications of artificial intelligence (AI) in natural product research and of how AI can assist drug discovery projects in the future. The review paper is a direct result of the "Lorentz Workshop on AI and Natural Product Drug Discovery and Design", which was co-organised by Prof Hirsch and took place in September 2021. The HIPS is a site of the Helmholtz Centre for Infection Research (HZI) in cooperation with Saarland University.

Thanks to the availability of large-scale omics data and enhanced computing capabilities, natural product research is witnessing a renaissance in both academia and industry. Natural products, also known as secondary metabolites, are a promising source of active compounds because of their relatively high degree of three-dimensionality compared with mostly “flat” synthetic structures. Since these substances are of natural origin, they are very likely to be substrates for transporter systems, which ultimately helps a drug reach its target. Predicting the molecular targets of drug candidates, their biological activities and their possible side effects is among the most important application areas for AI in natural product research.

Currently, AI-based approaches focus on using DNA sequences to predict the chemical structures of the natural products made by biosynthetic gene clusters (BGCs), which are encoded in the genomes of the producing organisms. Fortunately, over 2,500 BGCs and their products have already been experimentally characterized. This paves the way for computational genomic analysis, also known as genome mining, to identify a multitude of biosynthetic pathways for novel molecules. A key challenge for genome mining is the identification of novel types of BGCs and of unclustered biosynthetic pathways, where the genes are not in close proximity but more or less “scattered” across the genome. Machine learning algorithms can be especially helpful in discovering such pathways. However, an expert is still needed to manually update the BGC boundaries used as input for machine learning. Moreover, the prediction methods for bioactive compounds require further improvement and research.

In their review, the researchers name the lack of high-quality standardized data as one of the bottlenecks that keep current AI technologies from reaching their full potential. These algorithms generally rely on training datasets that must be sufficient to support the model’s complexity; only then can their performance be improved. Consistency across datasets and annotation methods is crucial for advancing fields such as natural product discovery. The authors encourage researchers to include a standardized machine-readable file for all of the compounds they describe in their papers. This file should contain the necessary information, such as the annotated chemical structure, the compound name, the producing organism and the BGC. These compounds can then be imported into the databases of natural-product-centric resources.
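As an illustration only, such a machine-readable record could look like the following minimal Python sketch. The review does not prescribe a schema here, so the field names (`compound_name`, `structure_smiles`, `producing_organism`, `bgc_accession`), the use of JSON and SMILES, and the example values are all assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical field names: the review lists the kinds of information to
# include (structure, name, producing organism, BGC) but not a schema.
REQUIRED_FIELDS = (
    "compound_name",
    "structure_smiles",
    "producing_organism",
    "bgc_accession",
)


@dataclass
class NaturalProductRecord:
    compound_name: str
    structure_smiles: str      # chemical structure, here assumed to be SMILES
    producing_organism: str
    bgc_accession: str         # identifier of the biosynthetic gene cluster


def missing_fields(record: dict) -> List[str]:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def to_json(record: NaturalProductRecord) -> str:
    """Serialize a record to JSON so that databases can import it."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False)


# Placeholder values, not a real compound or cluster.
example = NaturalProductRecord(
    compound_name="example compound",
    structure_smiles="CCO",
    producing_organism="Example organism",
    bgc_accession="EXAMPLE-0001",
)

if __name__ == "__main__":
    print(to_json(example))
```

A database curator could run a check such as `missing_fields` on each submitted file before import, so that incomplete records are flagged rather than silently stored.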

"AI-based approaches offer the chance to make the lengthy and expensive process of drug development much more efficient and thus to arrive at effective and safe drugs more quickly," says Anna Hirsch, one of the corresponding authors of the review article and head of the Drug Design and Optimisation Department at HIPS. Furthermore, the collective resources of the global scientific community far exceed the capacity of any single lab. If general guidelines were available for the generation of community and curated datasets, this would advance the application of AI in natural product drug discovery.

Original publication:
Marnix H. Medema and colleagues. Artificial intelligence for natural product drug discovery. Nature Reviews Drug Discovery, 2023. DOI: 10.1038/s41573-023-00774-7
