Making Data Work: How AI and Multi-Omics Integration Are Reshaping Drug Discovery

AI alone isn't enough. Here's how smarter data integration is accelerating drug discovery.
By Maria Giovanna Trovato
July 9, 2025

Across drug discovery today, one thing keeps coming up: we’re moving fast—but not always together.

Whether it's high-throughput screening, single-cell metabolomics, or using AI to surface new targets, the same issue shows up again and again: we have the data, but it’s scattered. Fragmented systems, missing metadata, and disconnected insights make it hard to translate findings into action.

Files sit in different systems. Metadata is missing or unstructured. Even strong results are hard to trace, replicate, or scale. The problem isn’t just technical—it’s that we’re trying to drive innovation without full context.

AI can help—but not alone. We need models built with scientists in the loop, grounded in real workflows, and trained to capture complexity, not erase it.

At the Institute for Experiential AI, I’ve spent the past year working side-by-side with research teams, biotech leaders, and technical experts to address this exact gap. Not with more buzzwords—but by building systems that actually work across teams. That align data to what matters. That support long-term progress, not just quick wins.

This topic came up again and again at the Bio-IT World Conference earlier this year—and it’s still central to nearly every conversation I’m having across the field.

The Scientific Case for Multi-Omics Integration

Multi-omics data (genomics, transcriptomics, proteomics, metabolomics, epigenomics) offers a layered view of biology that no single data type can provide alone. When you connect these layers, you don’t just get more data; you get a clearer view of what’s really going on inside the system [1].

Benefits of multi-omics integration include:

  • More precise target identification and validation, as overlapping signals across omics increase confidence in causal mechanisms [2].
  • Stronger patient stratification, by capturing both genetic predispositions and phenotypic manifestations [3].
  • Reduced false positives in biomarker discovery, through cross-validation of molecular signatures [4].
  • Improved modeling of drug resistance, particularly in oncology and immunotherapy, where single-omics data often misses adaptive cellular responses [5].
  • Better understanding of disease heterogeneity, especially in neurodegenerative and autoimmune disorders where etiology spans multiple biological scales [6].

Still, every data type has its blind spots.
Transcriptomics can mislead if you don’t check downstream protein activity.
Metabolomics shifts with handling or timing—context matters.
And in single-cell RNA-seq, dropout or amplification bias can distort the picture unless anchored with proteomic or epigenetic data.
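One way to make the transcript-versus-protein check concrete is to compare the two layers gene by gene across the same samples. The following is a minimal sketch, not a validated method: the gene names, the data, and the 0.5 threshold are hypothetical, and it assumes both layers are already matched to the same sample order.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))

def discordant_genes(rna, protein, threshold=0.5):
    """Genes measured in both layers whose rank correlation is below threshold."""
    shared = sorted(set(rna) & set(protein))
    return [g for g in shared if spearman(rna[g], protein[g]) < threshold]

# Hypothetical expression values for four matched samples.
rna = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
protein = {"GENE_A": [10.0, 20.0, 35.0, 50.0], "GENE_B": [9.0, 7.0, 4.0, 1.0]}
flagged = discordant_genes(rna, protein)
print(flagged)
```

Genes flagged this way are not necessarily artifacts. They are simply the ones where a transcript-only reading deserves a second look before it drives a decision.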

This becomes especially clear in oncology, where only by integrating metabolic flux with immune profiling have we uncovered how tumors change their environment to survive therapy—signals completely missed in genomic-only views.

However, these gains depend on interoperability—standardized formats, shared ontologies, and robust metadata pipelines that can support federated learning across studies. AI amplifies this integration by learning latent relationships between data layers, but without careful preprocessing and contextualization, even the best models risk amplifying noise rather than insight.
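As a rough illustration of what "careful preprocessing and contextualization" can mean in practice, the sketch below keeps only samples that have complete metadata and appear in every omics layer, then standardizes each feature before the layers are joined. The metadata fields, layer names, and values are all hypothetical, and real pipelines would add ontology mapping and batch handling on top of this.

```python
from statistics import mean, pstdev

# Hypothetical minimal metadata schema; a real one would follow a shared ontology.
REQUIRED_FIELDS = ("tissue", "batch", "collected_at")

def complete_samples(metadata):
    """Sample IDs whose metadata record fills every required field."""
    return {
        sid for sid, rec in metadata.items()
        if all(rec.get(f) not in (None, "") for f in REQUIRED_FIELDS)
    }

def shared_samples(layers, metadata):
    """Samples with complete metadata that are present in every layer."""
    ids = complete_samples(metadata)
    for table in layers.values():
        ids &= set(table)
    return sorted(ids)

def zscore_layer(table, sample_ids):
    """Standardize each feature across the given samples (constant features become 0)."""
    n_features = len(table[sample_ids[0]])
    scaled = {s: [] for s in sample_ids}
    for j in range(n_features):
        col = [table[s][j] for s in sample_ids]
        mu, sd = mean(col), pstdev(col)
        for s, v in zip(sample_ids, col):
            scaled[s].append(0.0 if sd == 0 else (v - mu) / sd)
    return scaled

def integrate(layers, metadata):
    """Concatenate standardized features per sample, layers in alphabetical order."""
    ids = shared_samples(layers, metadata)
    if not ids:
        return {}
    scaled = {name: zscore_layer(t, ids) for name, t in layers.items()}
    return {s: [x for name in sorted(layers) for x in scaled[name][s]] for s in ids}

metadata = {
    "S1": {"tissue": "liver", "batch": "b1", "collected_at": "2025-01-10"},
    "S2": {"tissue": "liver", "batch": "b1", "collected_at": "2025-01-11"},
    "S3": {"tissue": "liver", "batch": "", "collected_at": "2025-01-12"},  # incomplete
}
layers = {
    "rna": {"S1": [1.0, 5.0], "S2": [3.0, 5.0], "S3": [2.0, 5.0]},
    "protein": {"S1": [10.0], "S2": [30.0], "S4": [20.0]},
}
matrix = integrate(layers, metadata)
print(matrix)
```

Dropping S3 for its missing batch label is a deliberate choice in this sketch: a sample without context is treated as unusable rather than silently included.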

Three Real Challenges We’re Tackling

  1. Optimizing Formulation Pipelines with AI: Formulation optimization is often manual and fragmented.
    We’re building AI models to streamline rheology, dissolution, and performance evaluation across the formulation lifecycle—aiming to speed up iteration while reducing cost and variability.
  2. Unifying Unstructured Data Across Organizations: Critical pipeline decisions are based on incomplete or siloed data.
    We’re developing multi-objective optimization tools that can ingest and interpret fragmented, unstructured datasets to support more informed strategic planning.
  3. Scaling Experimental Metabolomics and Failure Prediction: High heterogeneity and unannotated signals in mass spectrometry limit insight.
    Working alongside omics teams, we’re exploring AI for machine optimization, experimental design, and proactive failure detection—especially in single-cell studies and cloud-based workflows.
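To give a flavor of what multi-objective optimization means for something like formulation screening, here is a small sketch that keeps only the candidates not beaten on every objective by another candidate (the Pareto front). It is an illustration of the general idea, not the tooling described above: the formulation names, the three objectives, and the numbers are hypothetical, and all objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Names of candidates that no other candidate dominates, sorted by name."""
    return sorted(
        name for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )

# Hypothetical scores: (dissolution time in minutes, relative cost, variability).
candidates = {
    "F1": (30.0, 1.0, 0.20),
    "F2": (25.0, 1.5, 0.10),
    "F3": (35.0, 1.2, 0.25),  # worse than F1 on all three objectives
    "F4": (25.0, 1.5, 0.30),  # ties F2 on time and cost, worse on variability
}
front = pareto_front(candidates)
print(front)
```

The front does not pick a winner. It narrows the field to the trade-offs that scientists still need to weigh, which is where domain judgment comes back in.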

Why Data Integration Matters in Real-World Applications

Data silos aren’t just frustrating—they’re a real barrier to progress. They slow down trials. Hide important patterns. And make it hard to track what worked, when, and why.

Real-world failures have highlighted this risk. In one multi-omics initiative, predictive models built on siloed datasets showed high internal accuracy but failed to generalize across external cohorts due to metadata inconsistencies and batch effects [7]. In another case, downstream validation stalled when computational insights weren’t aligned with wet-lab priorities, underscoring that successful integration isn’t just about data fusion, but also human collaboration and biological grounding [8].
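Reference 7 in the list at the end of this post concerns ratio-based scaling against a reference material as a way to correct batch effects. The sketch below shows only the general idea in simplified form and is not the authors' implementation: it assumes every batch includes one shared reference sample (here given the hypothetical ID "REF"), that all intensities are positive, and that features are listed in the same order in every sample.

```python
from math import log2

def ratio_scale(batches, reference_id="REF"):
    """Express each study sample as log2 ratios to its own batch's reference sample."""
    corrected = {}
    for batch_name, samples in batches.items():
        if reference_id not in samples:
            raise ValueError(f"batch {batch_name!r} has no reference sample")
        ref = samples[reference_id]
        for sid, values in samples.items():
            if sid == reference_id:
                continue  # the reference itself is not a study sample
            corrected[sid] = [log2(v / r) for v, r in zip(values, ref)]
    return corrected

# Hypothetical intensities: batch_2 reads about four times higher than batch_1.
batches = {
    "batch_1": {"REF": [100.0, 50.0], "P1": [200.0, 50.0]},
    "batch_2": {"REF": [400.0, 200.0], "P2": [800.0, 200.0]},
}
corrected = ratio_scale(batches)
print(corrected)
```

In this toy data the raw values for P1 and P2 differ fourfold purely because of the batch, yet their ratios to the reference agree. That is the property that lets a model trained on one cohort carry over to another.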

When we bring together structured and unstructured data—from lab notebooks and LIMS, to clinical data and omics pipelines—we unlock:

  • Faster, more confident decision-making
  • Workflows that can recover from failures because there’s context
  • Transparent, reproducible pipelines ready for regulatory review

Harmonizing data across multiple domains significantly boosts translational potential and shortens the path from biomarker discovery to clinical implementation [9]. Similarly, multimodal integration of molecular diagnostics, radiological and histological imaging, and coded clinical data enables next-generation biomarkers that better predict resistance mechanisms and support personalized cancer care [10].

In short, AI alone cannot transform R&D, but AI trained on well-integrated, context-rich data can. The success of AI in biomedicine will hinge not on model sophistication alone, but on how well we structure, link, and curate the data feeding those models [11].

This reinforces a growing consensus: the future of drug discovery won’t be driven by algorithms in isolation—but by intelligent systems embedded in real workflows, trained on cohesive, interoperable datasets, and guided by scientists who understand the biology behind the signal.

References:

  1. Hasin, Y., Seldin, M., & Lusis, A. (2017). Multi-omics approaches to disease. Genome Biology, 18(1), 83. https://doi.org/10.1186/s13059-017-1215-1
  2. Dugourd, A., Kuppe, C., Sciacovelli, M., Gjerga, E., Gabor, A., Emdal, K. B., Vieira, V., Bekker‑Jensen, D. B., Kranz, J., Bindels, E. M. J., Costa, A. S. H., Sousa, A., Beltrao, P., Rocha, M., Olsen, J. V., Frezza, C., Kramann, R., & Saez‑Rodriguez, J. (2021). Causal integration of multi‑omics data with prior knowledge to generate mechanistic hypotheses. Molecular Systems Biology, 17(1), e9730. https://doi.org/10.15252/msb.20209730
  3. Kioroglou, D., Gil‑Redondo, R., Marigorta, U. M. et al. (2025). Multi‑omic integration sets the path for early prevention strategies on healthy individuals. npj Genomic Medicine, 10(1), 35. https://doi.org/10.1038/s41525-025-00491-7
  4. Du, P., Fan, R., Zhang, N., Wu, C., & Zhang, Y. (2024). Advances in integrated multi-omics analysis for drug-target discovery. BioData Mining, 17, 26. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11201992/
  5. Ma, A., Xin, G. & Ma, Q. (2022). The use of single-cell multi‑omics in immuno‑oncology. Nature Communications, 13, 2728. https://www.nature.com/articles/s41467-022-30549-4
  6. Eteleeb, A. M., Novotny, B. C., Tarraga, C. S., Sohn, C., Dhungel, E., Brase, L., et al. (2024). Brain high-throughput multi-omics data reveal molecular heterogeneity in Alzheimer’s disease. PLOS Biology, 22(4), e3002607. https://doi.org/10.1371/journal.pbio.3002607
  7. Zheng, Y., Liu, Y., Yang, J., Dong, L., Zhang, R., Tian, S., Yu, Y., Ren, L., Hou, W., Zhu, F., Mai, Y., & Han, J. (2023). Correcting batch effects in large-scale multiomics studies using ratio-based scaling of reference material. Genome Biology, 25(1), 254. https://doi.org/10.1186/s13059-023-03047-z
  8. Robinson, M. D., Cai, P., Emons, M., Gerber, R., Germain, P.‑L., Gunz, S., Luo, S., Moro, G., Sonder, E., Sonrel, A., Wang, J., Wissel, D., & Mallona, I. (2024). Ten simple rules for collaborating with wet lab researchers for computational researchers. arXiv. https://arxiv.org/abs/2402.18348
  9. Subramanian, I., Verma, S., Kumar, S., Jere, A., & Anamika, K. (2020). Multi‑omics data integration, interpretation, and its application. Bioinformatics and Biology Insights, 14, 1177932219899051. https://pubmed.ncbi.nlm.nih.gov/32076369/
  10. Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J., & Shah, S. P. (2022). Harnessing multimodal data integration to advance precision oncology. Nature Reviews Cancer, 22(2), 114–126. https://pubmed.ncbi.nlm.nih.gov/34663944/
  11. Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., Goldenberg, A. & Hoffman, M. M. (2018). Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. PNAS, 115(17), E3904–E3911. https://pmc.ncbi.nlm.nih.gov/articles/PMC6242341/