Building the Virtual Protein
Escaping the Molecular Dynamics Bottleneck
tl;dr: MarS-FM is a flow-based generative model that rethinks protein simulation from the ground up, learning to move across biologically meaningful states rather than crawling through chronological time. Delivering a virtual cell requires first building a virtual protein, and that means efficiently mapping the full conformational landscape, especially the rare transitions that define how proteins actually function. MarS-FM samples roughly 600 times faster than implicit-solvent MD, generalizes to unseen protein domains, and represents our first concrete step in that direction.
The Bottom-Up Promise and Its Fundamental Limits
In our previous post, we explored the concept of the virtual cell. Reaching it means bridging an enormous gap, and there are two primary philosophies for doing so: Top-Down, which models functions and phenotypes directly from data, and Bottom-Up, which attempts to build life by assembling it from its fundamental constituents, the atoms.
At its core, the Bottom-Up promise is simple: build a virtual molecule first, then a virtual protein, eventually a virtual complex, until we reach a virtual cell. While grounded and elegant, such an approach breaks down early due to an insurmountable computational cost. In this post, we present our MarS-FM work, with emphasis on its role within our Bottom-Up take on the virtual cell and the key bottleneck it aims to break.
Virtual Proteins: Function Beyond Structure
The centerpiece of a virtual cell is understanding what proteins look like and how they function. That is, we first need to build a virtual protein that we can probe and simulate to determine folding, binding, conformational changes, and ultimately function. Methods like AlphaFold have dramatically expanded our ability to predict the 3D structure of a protein by learning to generate the most stable conformation it naturally adopts. However, proteins are not static: their function arises precisely from their dynamics, the conformational changes they undergo. How many states can a protein assume, and what is the relative probability of each, before we target it in a drug-discovery campaign? How often does it misfold? A single structural snapshot cannot answer these key questions. A virtual protein needs to account for dynamics.
Over the last decades, the most promising tool developed to study the dynamics of proteins (and beyond) has been Molecular Dynamics (MD). This powerful computational microscope models biomolecular systems and their motion at full atomistic resolution, revealing not only the most stable structure but also the different conformations a system can adopt due to environmental changes, natural fluctuations, or interactions with other systems.
The Bottleneck of the Atomic Microscope
In Molecular Dynamics, each atom’s position is updated by integrating Newton’s equations of motion, with forces derived from a potential energy function associated with the system (in case you’re interested in the technical details, we recommend the Amber Reference Manual). Sufficiently long simulations provide samples from the system’s underlying Boltzmann distribution (defined below), giving access to all conformations the system can adopt at equilibrium. The bottleneck lies in how long these simulations need to be.
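The Boltzmann distribution: at equilibrium, the probability of observing a conformation x is p(x) ∝ exp(-U(x) / k_B T), where U(x) is the potential energy, k_B is the Boltzmann constant, and T is the temperature. Low-energy conformations are exponentially more likely than high-energy ones.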
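To make the distribution concrete, here is a minimal Python sketch (our illustration, not part of the MarS-FM code) that computes equilibrium populations p_i ∝ exp(-E_i / k_B T) for a handful of discrete states. Treating each state as having a single (free) energy is a simplification of the continuous distribution over conformations, and the energies used below are hypothetical.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_populations(energies_kcal, temperature_k=300.0):
    """Equilibrium probability of each discrete state: p_i proportional to exp(-E_i / k_B T)."""
    energies = np.asarray(energies_kcal, dtype=float)
    beta = 1.0 / (K_B * temperature_k)
    # Shift by the minimum energy before exponentiating for numerical stability;
    # the shift cancels out in the normalization.
    weights = np.exp(-beta * (energies - energies.min()))
    return weights / weights.sum()

# Hypothetical example: a folded state and an unfolded state 3 kcal/mol higher.
pops = boltzmann_populations([0.0, 3.0])
print(pops)
```

Even a modest 3 kcal/mol gap leaves well under 1% of the population in the higher-energy state at 300 K, which is why an unbiased simulation rarely visits it.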
Biological events unfold over timescales many orders of magnitude longer than the MD integration timestep. A candidate drug can stay bound to a protein for seconds, while integrating the MD equations requires timesteps on the order of a femtosecond (10^-15 s), so covering a single second takes on the order of 10^15 steps. Worse, potential energy surfaces are often rugged, with minima separated by high free-energy barriers that take many attempts to cross. Yet it is precisely these rare transitions between minima, and how often they occur, that reveal mechanism and function.
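A back-of-the-envelope sketch makes the gap tangible. It counts the integration steps needed to cover one second at a 1 fs timestep and evaluates the Boltzmann factor exp(-ΔG‡ / k_B T) for a few hypothetical barrier heights at 300 K. The Boltzmann factor only captures how strongly a barrier suppresses crossings relative to no barrier; a real rate estimate (for example from Eyring or Kramers theory) also needs a kinetic prefactor, which is omitted here.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def steps_needed(simulated_time_s, timestep_s=1e-15):
    """Number of integration steps required to cover a given physical time."""
    return round(simulated_time_s / timestep_s)

def barrier_suppression(barrier_kcal, temperature_k=300.0):
    """Boltzmann factor exp(-dG / k_B T) for a barrier of height dG (no kinetic prefactor)."""
    return math.exp(-barrier_kcal / (K_B * temperature_k))

# One second of binding at a 1 fs timestep.
print(steps_needed(1.0))

# Hypothetical barrier heights in kcal/mol.
for barrier in (5.0, 10.0, 15.0):
    print(barrier, barrier_suppression(barrier))
```

At 300 K, every additional ~1.4 kcal/mol of barrier height cuts the Boltzmann factor by another order of magnitude, on top of the 10^15 steps needed just to reach one second.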
How We Simulate vs. How Biology Works: Accelerating Boring Dynamics is Not Enough
Using ML to improve the MD computational microscope is an extremely active field of research. On the resolution side, a key application is the development of ML force fields such as Orb or MACE, which make simulations more energetically accurate. However, a more expensive force computation only tightens the bottleneck, and it still relies on the paradigm of simulating long timescales with tiny integration steps.
Recent models like MDGen use generative modeling to accelerate MD sampling by learning the transition density p(x(t + τ) | x(t)) induced by MD trajectories: given a lag time τ and an input frame x(t), they are trained to generate the frame x(t + τ). Microsoft’s BioEmu does not require a lag time, since it learns to sample protein conformations directly from sequence, but it is still constrained by the rare-event sampling problem inherent to MD data. Such models spend most of their capacity on thermal fluctuations, because nothing in training indicates which states are biologically more interesting, and rare transitions are drowned out by uninteresting ones. As a result, these approaches can miss rare events, especially on out-of-distribution proteins.
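To illustrate the imbalance, the hypothetical toy below slices lag-time pairs (x(t), x(t + τ)) from a synthetic two-state trajectory in which the state flips with a small probability at each frame. For brevity it works on state labels instead of coordinates; slicing an array of frames with shape (n_frames, n_atoms, 3) along its first axis works the same way. The toy trajectory and its switching probability are assumptions, not MD data.

```python
import numpy as np

def lagged_pairs(trajectory, lag):
    """MD-Emulator-style training pairs (x(t), x(t + lag)) sliced from one trajectory."""
    if lag <= 0:
        raise ValueError("lag must be a positive number of frames")
    return trajectory[:-lag], trajectory[lag:]

def toy_two_state_labels(n_frames, p_switch, seed=0):
    """Hypothetical state labels (0 = folded, 1 = unfolded) for a trajectory that
    flips state with probability p_switch at each frame."""
    rng = np.random.default_rng(seed)
    flips = rng.random(n_frames) < p_switch
    flips[0] = False  # start in state 0
    return np.cumsum(flips) % 2

labels = toy_two_state_labels(n_frames=100_000, p_switch=1e-4)
src, dst = lagged_pairs(labels, lag=10)
frac_crossing = np.mean(src != dst)
print(f"fraction of training pairs that change state: {frac_crossing:.4f}")
```

With a flip probability of 1e-4 per frame and τ = 10 frames, only on the order of 0.1% of pairs are expected to cross between states; the rest mostly teach the model to stay put.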
A Different Abstraction: Interpolating Across States, Not Time
Our current simulation tools are built around time, iterating step by step to see how a system evolves. Because such trajectories are overwhelmingly uneventful, there is little value in asking what happens in the next 10 or 100 steps. What matters to a cell is not the exact vibrating path an atom took to get from Point A to Point B, but which states exist and how likely the system is to move between them.
This is the logic behind Markov State Models (MSMs), which cluster MD frames into meaningful, discrete biological states S_i, such as Folded, Unfolded, or Intermediate, and estimate the probabilities of transitioning between them. Historically, MSMs were a retrospective post-processing tool, used after running a long, expensive simulation to make sense of the data. We are now shifting this paradigm from retrospective analysis to active generative modeling, using MD data processed through these MSMs.
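As a rough sketch of the MSM step, assuming frames have already been assigned to discrete states (in practice typically by clustering features of the trajectory), the transition matrix T can be estimated by counting state-to-state transitions at a lag time and normalizing each row. This is the simplest maximum-likelihood, non-reversible estimator; established MSM libraries such as PyEMMA or deeptime provide more careful ones. The labels below are hypothetical.

```python
import numpy as np

def estimate_transition_matrix(state_labels, n_states, lag):
    """Row-stochastic MSM transition matrix with T[i, j] ~ P(state j at t + lag | state i at t),
    estimated by counting transitions at the given lag (lag >= 1 frames)."""
    labels = np.asarray(state_labels)
    counts = np.zeros((n_states, n_states))
    # Accumulate one count per observed (state at t, state at t + lag) pair.
    np.add.at(counts, (labels[:-lag], labels[lag:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows of never-visited states as zeros instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical discretized trajectory: 0 = folded, 1 = intermediate, 2 = unfolded.
labels = [0, 0, 0, 1, 1, 0, 0, 1, 2, 2, 2, 1, 0, 0]
T = estimate_transition_matrix(labels, n_states=3, lag=1)
print(T.round(2))
```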

We introduce the class of MSM-Emulators, of which MarS-FM is our primary instantiation. Instead of training models to learn how a frame evolves after a fixed chronological lag time, we train them to learn the probability that a frame in state S_i remains there or transitions to a different state S_j, as given by the transition matrix T of the Markov chain. Through this shift, MarS-FM decouples the generative process from the data imbalance that typically plagues traditional MD-Emulators.
MarS-FM: Markov Space Flow Matching
MarS-FM represents a fundamental shift by using flow-based generative models to sample directly from an underlying Markov Space.
Once trajectories have been clustered into metastable states, we are no longer restricted to training on frame pairs x(t) → x(t + τ) obtained by slicing the trajectory at a fixed resolution. Instead, training batches can draw on a far more diverse set of pairs while still respecting the transition probabilities induced by the MSM.
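The pair sampling described here could look like the following sketch. It is our illustration under explicit assumptions, not the MarS-FM data loader: source frames are drawn uniformly from the trajectory, the target state j is drawn from row i of T for the source frame's state i, and the target frame is drawn uniformly among the frames assigned to state j. The function name, toy labels, and toy matrix are hypothetical.

```python
import numpy as np

def sample_msm_pairs(frames, labels, T, batch_size, rng):
    """Hypothetical MSM-Emulator batch of (source frame, target frame) pairs, where the
    target's state is drawn from the source state's row of the transition matrix T.
    Assumes every row of T sums to 1 and every reachable state has at least one frame."""
    labels = np.asarray(labels)
    n_states = T.shape[0]
    # Indices of the frames belonging to each state.
    members = [np.flatnonzero(labels == s) for s in range(n_states)]
    src_idx = rng.integers(len(frames), size=batch_size)
    dst_idx = np.empty(batch_size, dtype=int)
    for b, i in enumerate(src_idx):
        target_state = rng.choice(n_states, p=T[labels[i]])
        dst_idx[b] = rng.choice(members[target_state])
    return frames[src_idx], frames[dst_idx]

# Toy data: 8 frames (represented here by their indices) assigned to 3 states.
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2])
frames = np.arange(len(labels))
T = np.array([[0.80, 0.15, 0.05],
              [0.20, 0.60, 0.20],
              [0.05, 0.15, 0.80]])
src, dst = sample_msm_pairs(frames, labels, T, batch_size=4, rng=np.random.default_rng(0))
print(list(zip(src, dst)))
```

Because any frame in S_j can serve as a target, a single state-level transition opens up many distinct training pairs, which is one way to read the diversity gain over fixed lag-time slicing.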
Architecture and Framework: We adopt an SE(3)-equivariant transformer in which the protein backbone is represented by rotations (encoded as quaternions) and translations, and the side chains by torsion angles.
Training Logic: We train the model with Flow Matching: given noise and an input state, the target state x_1 is sampled according to the probabilities induced by the Markov chain matrix T rather than by any time slicing.
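A minimal sketch of conditional flow matching under simplifying assumptions: it uses the common straight-line path x_t = (1 - t) x_0 + t x_1 between Gaussian noise x_0 and the target x_1 in plain Euclidean space, whereas MarS-FM works with SE(3) frames and torsion angles. The velocity_model callable, conditioned on the input frame x_src, is a hypothetical stand-in for the network; below we plug in the analytic velocity toward a single known target to check that the pieces fit together.

```python
import numpy as np

def flow_matching_example(x_src, x_target, rng):
    """One training example: noise x0, time t, interpolant x_t, and target velocity.
    x_target plays the role of x_1, the frame drawn via the MSM; x_src is the conditioning frame."""
    x0 = rng.standard_normal(x_target.shape)
    t = rng.uniform()
    x_t = (1.0 - t) * x0 + t * x_target  # point on the straight path from noise to target
    v_target = x_target - x0             # constant velocity along that path
    return x_t, t, x_src, v_target

def flow_matching_loss(velocity_model, x_t, t, x_src, v_target):
    """Mean squared error between the predicted and target velocity."""
    v_pred = velocity_model(x_t, t, x_src)
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(velocity_model, x_src, shape, rng, n_steps=20):
    """Generate a frame by integrating dx/dt = v(x, t, x_src) from noise at t=0 to t=1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_model(x, k * dt, x_src)
    return x

rng = np.random.default_rng(0)
x_src = rng.standard_normal((4, 3))  # hypothetical conditioning frame
x1 = rng.standard_normal((4, 3))     # hypothetical MSM-sampled target frame

def oracle_velocity(x, t, x_src):
    # Exact velocity toward the single target x1 along the straight path.
    return (x1 - x) / (1.0 - t)

example = flow_matching_example(x_src, x1, rng)
print(flow_matching_loss(oracle_velocity, *example))  # close to 0 for the oracle
sample = euler_sample(oracle_velocity, x_src, x1.shape, rng)
```

A trained network replaces oracle_velocity; since the target x_1 comes from the MSM rather than from x(t + τ), the same loss trains the model to jump between states instead of advancing a fixed time.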
Breaking the Bottleneck: Results & Benchmarks
The shift from time transitions to state transitions allows MarS-FM to capture large, rare conformational changes far better than existing approaches.
Capturing the Rare: Even when starting from a folded state, MarS-FM samples unfolded states within 5 samples, whereas MD-Emulators remain trapped around the local minimum. MD simulations can take up to billions of steps to go from folded to unfolded (or vice versa).
Out-of-Distribution Performance: MarS-FM accurately reproduces physical observables like the radius of gyration and RMSD on new domains without introducing additional biases.
Generalization over Dissimilar Domains: We benchmark MarS-FM against MDGen and BioEmu on the mdCATH dataset. We filter out any test protein sharing more than 20% sequence similarity with any training protein, a far stricter criterion than standard benchmarks use. Our test set consists of 495 domains that differ meaningfully from those used during training.
600x Speedup: MarS-FM generates protein samples roughly 600 times faster than implicit-solvent MD simulations. For a 159-residue protein, a task that takes about five hours (18,000 seconds) with implicit-solvent MD takes roughly 30 seconds with MarS-FM (18,000 / 30 = 600).
Hierarchical Tree Sampling: Because MarS-FM decouples generation from strict chronological ordering, the model can generate multiple structural frames in parallel from an initial state and then recursively expand them. This allows far faster exploration and removes the error compounding that occurs when MD-Emulators are rolled out autoregressively.
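The tree expansion can be sketched generically. Here sample_fn is a hypothetical stand-in for one call to the generative model (given a frame, return n new frames), and the toy version just adds Gaussian noise. With branching factor b and depth d, the tree holds b + b^2 + ... + b^d generated frames, yet every frame is at most d model calls away from the initial state; an autoregressive chain producing the same number of frames would be that many sequential calls long.

```python
import numpy as np

def tree_sample(sample_fn, root, branching, depth):
    """Expand a sampling tree level by level: each node of the current level gets
    `branching` children drawn by sample_fn. Calls within a level are independent,
    so in practice they can be batched and run in parallel."""
    levels = [[root]]
    for _ in range(depth):
        next_level = []
        for parent in levels[-1]:
            next_level.extend(sample_fn(parent, branching))
        levels.append(next_level)
    return levels

rng = np.random.default_rng(0)

def toy_sample_fn(x, n):
    # Hypothetical stand-in for the generative model: n noisy copies of x.
    return [x + rng.normal(scale=0.1, size=x.shape) for _ in range(n)]

levels = tree_sample(toy_sample_fn, np.zeros(3), branching=5, depth=3)
print([len(level) for level in levels])  # [1, 5, 25, 125]
```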
Conclusion: A Shift, Not Just a Speedup
To deliver a virtual cell, we first need to build a virtual protein. Because a protein’s function stems from its motion and conformational changes, a virtual protein must be able to efficiently explore the energy landscape and sample rare transitions. Accelerating uninteresting dynamics will not get us closer to that goal; rethinking how we use our computational microscopes through generative models will. MarS-FM is our first successful step in this direction.
Want to learn more?
The code is public: https://github.com/valence-labs/mars-fm
Read the paper: https://arxiv.org/abs/2509.24779
Come talk to us at ICLR: https://iclr.cc/virtual/2026/poster/10007880
This post is part of “Inside Valence”, a series where you’ll get a behind-the-scenes look at our research, exploring new ways to predict, explain, and ultimately decode biology. If this resonates, consider subscribing!