“AI will transform biology.” As far as anything can be predicted, I know this is true.
But it is also a statement that risks being useless, or even counterproductive, for two reasons.
The first reason is that AI is massive. It can be applied in a myriad of ways across every aspect of the value chain.
Saying we need to use "AI" is like saying "we need to use electricity": Obvious and useless unless you talk specifics.
Much more meaningful is, “we need to apply large language models to improve the user interfaces for our complex equipment and methodologies,” or “we should use active learning to optimize the development of assays for early discovery.”
The second reason? It’s useless. It’s an empty call to arms, with no acknowledgment of all the change that will be needed to make the touted revolution come about.
In the second industrial revolution, electricity was insufficient by itself to increase productivity. People needed to first realize that it offered a way of changing the way they worked.
Factories no longer had to be arranged around massive drive-shafts powered by steam engines. Instead, they could be arranged into production lines. It was the combination of new technology (electrification) and new ways of working (production lines and division of labor) that enabled the step-change in productivity.
AI and biology: Why our approach to data is the missing link
My underlying belief is that AI and biological research don’t fit together properly yet. AI is a technology that fundamentally demands change from the people who want to use it. So for AI to have a fundamental impact on biology, we must change the way we approach the process of science in the first place.
Organizations and teams will have to adopt new mindsets, which demand new scientific processes and must be supported by an updated ecosystem of tooling. In effect, the biggest way AI will change our industry and the study of biology is the change it prompts us to make to how we work.
A conversation about AI is, by this point, really a conversation about data. So what data would an AI system need to untangle biological complexity?
Data quality and context are as important as volume
Biology's complexity emerges from the interactions of its simpler components, giving rise to unique properties and behaviors. These emergent features can't be reliably predicted from individual components, necessitating a comprehensive and interconnected dataset for a deeper understanding of biological systems.
Much of the big data produced in biology comes from multi-omic studies: highly detailed molecular snapshots of a system. But apart from genomic data, all of these readouts are highly dynamic: they change over time and in response to a multitude of stimuli.
To truly understand a biological system, we must understand its dynamics as any number of factors change.
So we can’t just measure a lot of things. We have to measure them in the context of this multifactorial landscape, systematically running experiments that map the space and allow AI to “see” what is going on.
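As a rough illustration of what "mapping the space" can mean in practice, here is a minimal sketch of a full-factorial design: every combination of a few factor levels becomes one planned condition. The factor names and levels are hypothetical, chosen only for the example, and a real campaign would likely use a more economical design than the full grid.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "temperature_c": [30, 37],
    "glucose_g_per_l": [1.0, 5.0, 10.0],
    "timepoint_h": [2, 8, 24],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(factors)
print(len(design))  # 2 * 3 * 3 = 18 conditions
print(design[0])    # the first planned condition
```

Even three modest factors yield 18 conditions, which is part of why this kind of systematic exploration leans so heavily on automation.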
Just sequencing something isn't enough; we must also look at how it works, interacts, and reacts to different stimuli. In our pursuit of understanding the intricacies of biological processes, it's clear that one-dimensional data alone won't lead us far along this investigative path.
Where experimental data records miss the mark
When recording experimental data, we often lose vital context regarding its production. A thorough grasp of the array of experimental inputs and their systematic variations is crucial for understanding response complexity.
Yet the information we hold about our experiments (why we selected them, our methods, lab conditions, the liquid classes used with automated pipettors) is extensive but challenging to document effectively with current processes and tools.
This not only spotlights the need to enhance data recording methods: It also emphasizes how important it is to conduct experiments that yield higher quality data from the outset.
Gaining a profound understanding of this data therefore calls for comprehensive insight into how it was created. Experiment metadata should be a pivotal component of any future AI strategy, prompting a shift in our work methodologies.
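To make the idea of experiment metadata a little more concrete, here is one possible sketch of a record that keeps the context of an experiment alongside its readouts. The class name, field names, and example values are all assumptions made for illustration, not an established schema or any particular tool's format.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ExperimentRecord:
    """Hypothetical record pairing experimental readouts with their context."""
    experiment_id: str
    rationale: str          # why this experiment was selected
    protocol: str           # the method that was followed
    lab_conditions: dict    # e.g. ambient temperature, humidity
    liquid_class: str       # pipetting settings used by the liquid handler
    readouts: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize data and metadata together so the context travels with the data.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; none of these come from a real experiment.
record = ExperimentRecord(
    experiment_id="exp-001",
    rationale="Test whether a higher glucose level changes the response",
    protocol="assay-protocol-v2",
    lab_conditions={"ambient_temp_c": 21.5, "humidity_pct": 40},
    liquid_class="aqueous-low-volume",
    readouts={"signal_au": [0.12, 0.15, 0.11]},
)
print(record.to_json())
```

The design choice worth noting is that the rationale and conditions live in the same object as the measurements, so they cannot be silently separated from the data they describe.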
When AI can help a scientist identify the best possible experiment, run it, help analyze the full breadth and depth of experimental data and metadata, and then use that data to decide on the next experiment, then the transformation of our industry will truly have happened.
What the future of AI and data-centric biology could look like
Some companies today are already great examples of a future-oriented approach to biological data, especially in regard to AI. Consider companies like Recursion and Insitro, with comprehensive automated platforms designed for systematic, fully digitized exploration of biological systems.
They offer a glimpse into the future: Routine generation of high-quality, multidimensional data with rich metadata. This data forms the AI foundation, revolutionizing our understanding of, and interaction with, biological systems.
Beyond data quantity, it's the depth of context that truly matters. Some companies already lead by building from the ground up to generate nuanced, rich data for discovery. As this approach evolves, it holds promise for the industry and for biology more widely.