The Virtual Cell Comeback
An introduction to simulating biology
Biology is both overwhelming and mesmerizing. Thanks to the growing availability of massive compute and fit-for-purpose datasets, we might be on the verge of a breakthrough that helps us decode that complexity: Virtual Cells are making a comeback and there’s reason to be excited.
Welcome to Inside Valence, a new, regular blog series where you’ll get a behind-the-scenes look at our research, exploring new ways to predict, explain, and ultimately decode biology. While we aren’t the first to explore the topic of Virtual Cells (e.g., here and here), we are excited to join the conversation. Throughout this series, we’ll specifically focus on the application we’re most passionate about: drug discovery.
Whether you’re looking to enter the field, or are simply curious about the frontier of AI-based drug discovery, this first post will give you a high-level introduction to what Virtual Cells could unlock, what has changed since earlier attempts, and how different groups are building them today. We plan to dive much deeper into these themes in the coming weeks. In the meantime, we’ve linked plenty of blogs and papers for those ready to dig in.
Failing Faster with Virtual Patients
Anyone exploring AI in drug discovery has likely come across the claim (e.g., here or here) that it costs north of a billion dollars and takes over a decade to bring a single drug to market (and that these numbers are going up). Without context, these statistics can be a bit misleading. The actual out-of-pocket costs for a single drug usually sit in the hundreds of millions. It is a large sum of money, but a lot less than the billion(s) we started with. The gap is primarily explained by capitalization (the opportunity cost of not investing that money elsewhere for over a decade) and the price of failure. For every successful drug, we also pay the R&D costs for several others that fail along the way.
This leads to another figure you’ll come across often: about 90% of the drugs that enter clinical trials fail. Some of the smartest and most passionate people in the world, backed by millions in investment, spend years painstakingly collecting evidence that a drug is effective and safe for humans. Yet only 1 in every 10 candidates actually is.
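To make the gap between out-of-pocket and headline costs concrete, here is a back-of-the-envelope sketch in Python. Apart from the roughly 1-in-10 success rate mentioned above, every input is a hypothetical placeholder rather than a sourced figure. The function name, the flat cost per failed candidate, and the assumption that spending is spread evenly over the development period are ours, for illustration only.

```python
def cost_per_approved_drug(success_cost, failure_cost, success_rate,
                           years, annual_return):
    """Toy model of the expected cost per approved drug (all inputs hypothetical)."""
    # Price of failure: with success rate p, each approval is accompanied by
    # (1 - p) / p failed candidates on average.
    failures_per_success = (1 - success_rate) / success_rate
    out_of_pocket = success_cost + failures_per_success * failure_cost

    # Capitalization: assume the money is spent in equal yearly installments,
    # each of which could have earned `annual_return` until the approval date.
    installment = out_of_pocket / years
    capitalized = sum(
        installment * (1 + annual_return) ** (years - year)
        for year in range(years)
    )
    return out_of_pocket, capitalized


# Hypothetical inputs, in millions of dollars; not taken from any study.
out_of_pocket, capitalized = cost_per_approved_drug(
    success_cost=300, failure_cost=50, success_rate=0.1,
    years=10, annual_return=0.08,
)
print(f"Out-of-pocket incl. failures: ${out_of_pocket:,.0f}M")
print(f"Capitalized:                  ${capitalized:,.0f}M")
```

With these made-up inputs, failed candidates account for more than half of the out-of-pocket total, and capitalization adds further on top. In reality, failing candidates stop at different stages and cost very different amounts, so treat this as an intuition pump rather than an estimate.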
What if we could simulate the clinical trial first with Virtual Patients? Instead of failing at the end of a decade-long process, we could front-load risk, close the translatability gap (where early experiments fail to predict human outcomes) and pursue only those drugs with the highest chance of success. In theory, this could drastically cut both time and costs, leading to cheaper and better treatments for the patients who need them most. It may sound ambitious—unrealistic even—but it wouldn’t be a completely novel idea. Simulating first is the gold standard in several high-stakes industries, like semiconductors and aerospace, where complex simulations guide investment decisions prior to real-world manufacturing.
From Virtual Patients to Virtual Cells
Simulating the interactions within and between the trillions of cells that make up our bodies is a massive leap, but we can start by trying to simulate just one. While simulating a single cell won’t tell us much about patients, the research could serve as a basis for the simulations we would like to build at higher levels of biological organization. Furthermore, simulating single cells would already have therapeutic value. Many diseases are caused by cells behaving abnormally, even if their progression depends on complex interactions across tissues and systems. The ability to virtually experiment on (or perturb) a single cell—like understanding how the effects of an intervention, such as a candidate drug, propagate throughout the cell—could completely change how we do drug discovery. At its core, this is the vision for Virtual Cells.
It’s not a new idea. Understanding the rules that govern cells could even be seen as the grand challenge of all of biology. Around the turn of the century, early pioneers (like E-CELL and VCell) tried to codify these rules in computational models. To this day, it’s a research direction that captures the imagination of many scientists (see this recent work, which comes with cool animations!), but these efforts have been limited in the scale and complexity of cells they can model, and the use of these methods to simulate complete human cells has remained out of reach.
The renewed excitement for Virtual Cells stems from a fundamentally different approach: rather than hard-coding the rules, we learn them from data. It’s the “bitter lesson” of AI and a strategy we’ve seen play out in several domains already, from the ImageNet moment in computer vision to the rise of LLMs in natural language processing and the AlphaFold breakthrough in protein folding. In every one of these instances, fit-for-purpose datasets were the primary driver of rapid innovation; with growing data and compute, the time might finally be right to apply this same strategy to cellular behavior.
The Data-Driven Virtual Cell
Over the last decade, a new generation of biotech companies has been building automated labs, designed specifically to generate the massive, fit-for-purpose datasets that drive AI breakthroughs.
With the growing availability of such public and private datasets, a number of groups (like Arc Institute, Noetik, and Xaira) are focused on simulating the behavior of the cell in terms of its observable traits (like cell imaging or transcriptomics). If we can learn to simulate these high-level traits well, we could discover perturbations that return a diseased cell to its healthy state and infer the mechanics that drive these changes through controlled experiments. While these efforts have so far focused on single modalities, they have laid the groundwork for today’s AI breakthroughs.
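As a minimal sketch of what "returning a diseased cell to its healthy state" could look like computationally: given a healthy reference profile and a model's predicted profile for each perturbation, we can rank perturbations by how close they bring the cell to healthy. Everything here is hypothetical. The three-number "profiles" stand in for real readouts like transcriptomics, the hard-coded predictions stand in for the output of a Virtual Cell model, and Euclidean distance is just one simple choice of similarity measure.

```python
import math


def distance(profile_a, profile_b):
    """Euclidean distance between two equally long feature profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def rank_perturbations(healthy, predicted_profiles):
    """Return perturbation names, closest-to-healthy first."""
    return sorted(
        predicted_profiles,
        key=lambda name: distance(predicted_profiles[name], healthy),
    )


# Hypothetical toy data: three features per cell state.
healthy = [1.0, 0.0, 2.0]
predicted_profiles = {
    # In practice these would be model predictions for the diseased cell
    # after each perturbation; here they are hard-coded placeholders.
    "compound_A": [1.2, 0.1, 1.8],
    "compound_B": [2.9, 1.0, 0.6],
    "gene_knockout_C": [0.5, 1.5, 2.0],
}

ranking = rank_perturbations(healthy, predicted_profiles)
print(ranking)
```

In this toy setup, `compound_A` ranks first because its predicted profile sits closest to the healthy reference. A real pipeline would also need to account for prediction uncertainty and for side effects that a single distance hides.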
Rather than starting top-down, companies like DeepMind and Isomorphic seem to be beginning with proteins and aiming to simulate increasingly large and complex systems: from proteins to complexes, from complexes to pathways, and from pathways to whole cells. Building on the momentum of AlphaFold’s success (if you haven’t yet, The Thinking Game is worth a watch), it’s a way to reduce biology to chemistry (or even physics) and to simulate the pieces that make up the whole.
Recent attempts at Virtual Cells, like TxPert, State, and X-Cell, successfully learned from individual data layers. This is meaningful progress, but for drug discovery we will need a more holistic and rich representation of cellular behavior. The effective integration of different data layers in multimodal models is the next frontier, with groups like ours at Recursion, and CZI’s BioHub, already pushing in that direction.
Competitions to Define Progress
The field is picking up pace, and with more and more methods coming out, it’s becoming increasingly important that we turn these abstract ideas into standardized, robust benchmarks.
In protein folding, the Critical Assessment of protein Structure Prediction (or CASP) is a competition that runs every two years. It’s where DeepMind demonstrated their AlphaFold breakthrough, but its true impact is frequently understated. Regular competitions like CASP don’t just track progress; they define it. Across multiple decades, its organizers have continuously refined the competition’s design (its tasks, data splits, and performance metrics) through rigorous evaluation and community engagement. By aligning the field around a standardized measure of success, CASP allowed new research ideas to be systematically compared, providing a clear basis to justify further investments or a pivot.
Lacking an equally strong foundation, Virtual Cell research risks wasting effort just as we enter a phase of accelerated progress. For example, early benchmarking initiatives (like PerturbBench and here) have already started challenging claims about the perceived state-of-the-art.
Luckily, we’re also seeing early attempts to host competitions. Competitions differ from benchmarks in that methods are evaluated on a newly generated, blinded dataset in a centralized manner. Myllia organized the Echoes of Fallen Genes competition on Kaggle and the Arc Institute launched the Virtual Cell Challenge. Like CASP, the Virtual Cell Challenge is a regular (in this case yearly) competition. The ambition is there, but its inaugural edition—as is to be expected—surfaced some key challenges (like here and here). With its next edition scheduled for later this year, it will be interesting to see how the competition evolves in response to this feedback.
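The difference between a benchmark and a competition can be sketched in a few lines of Python. In the sketch below, the organizer alone holds the ground truth for a newly generated dataset, and participants only ever see sample identifiers and their own score. The class name, the sample IDs, and the use of mean absolute error are all assumptions for illustration; this is not the scoring scheme of any of the competitions mentioned above.

```python
class BlindedEvaluator:
    """Toy centralized scorer: ground truth stays with the organizer."""

    def __init__(self, hidden_targets):
        # Held privately; participants never receive these values.
        self._hidden_targets = dict(hidden_targets)
        self.leaderboard = {}

    def sample_ids(self):
        """The only information about the test set that is shared."""
        return sorted(self._hidden_targets)

    def submit(self, team, predictions):
        """Score one submission with mean absolute error (lower is better)."""
        missing = set(self._hidden_targets) - set(predictions)
        if missing:
            raise ValueError(f"missing predictions for: {sorted(missing)}")
        errors = [
            abs(predictions[sample] - target)
            for sample, target in self._hidden_targets.items()
        ]
        score = sum(errors) / len(errors)
        self.leaderboard[team] = score
        return score

    def ranking(self):
        """Team names ordered from best (lowest error) to worst."""
        return sorted(self.leaderboard, key=self.leaderboard.get)


# Hypothetical blinded measurements for two held-out perturbations.
evaluator = BlindedEvaluator({"pert_1": 1.0, "pert_2": -0.5})
score_a = evaluator.submit("team_a", {"pert_1": 1.0, "pert_2": 0.0})
score_b = evaluator.submit("team_b", {"pert_1": 0.0, "pert_2": 0.0})
print(evaluator.ranking())
```

Because every method is scored by the same code on data nobody could tune against, the results are directly comparable, which is harder to guarantee when each group evaluates itself on a public benchmark.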
Our Vision: Predict, Explain, Discover
Last year, we presented our own vision for Virtual Cells in “Virtual Cells: Predict, Explain, Discover”.
As part of Recursion, a clinical-stage biotech, our vision for Virtual Cells is shaped by the challenges faced in drug discovery. Through this lens, we believe a Virtual Cell requires three core capabilities to be useful:
Predict: Virtual Cells need to predict a cell’s functional response, as measured through a number of different modalities, to a wide range of conditions and therapeutic interventions, such as how a disease impacts the cell and whether a potential drug can reverse those effects.
Explain: Virtual Cells need to explain the cascade of biochemical interactions that cause the cell’s behavior by formulating falsifiable hypotheses.
Discover: Jointly, and by integrating them with autonomous agents and automated labs that validate AI-generated predictions and hypotheses, Virtual Cells will discover novel disease biology and treatments.
At Recursion, we have embraced the thesis of large-scale, multimodal, and consistently generated datasets from day one. These massive datasets, as well as the infrastructure needed to scale to new modalities and our very own supercomputer, have shaped our vision, and we’re excited to share more about our progress soon.
The Virtual Cell Comeback
Virtual Cells are not a new idea, but they may finally be tractable. Rapidly growing compute and data are cause for justified excitement. We have all the ingredients we need to make real progress on long-standing challenges in biology; now the real work starts.
This was the inaugural blog post for Inside Valence, a new, regular blog series where you’ll get a behind-the-scenes look at our research, exploring new ways to predict, explain, and ultimately decode biology. If you enjoyed this blog post, consider subscribing!







