From Crystals to Drugs: What Drug Discovery Can Learn From Materials Science as Revealed at ICLR’s AI4Mat 2026
tl;dr: ICLR’s AI4Mat 2026 showed that the tools built to simulate materials are the same tools that could simulate biology. The problem is that a crystal structure has thousands of atoms whereas a realistic drug target has hundreds of thousands, moving and shifting over timescales that stretch to milliseconds. The methods do not yet scale to drug discovery; this post describes what it would actually take to close that gap, and why that matters for anyone trying to build simulators of biological complexity.
Two fields, one problem, one solution
The boundary between materials science and drug discovery is dissolving faster than most practitioners in either field might have noticed. In many cases, the same tools are being built for different domains within chemical space: generative models to propose novel crystal structures, flow-matching frameworks to optimize electrolyte compositions, and agentic systems to reason about synthesis routes.
AI4Mat 2026, a workshop held at ICLR in Rio de Janeiro, brought together work spanning both worlds. Looking across the accepted posters, it became clear that the materials discovery field is, whether knowingly or not, building tools that drug discovery needs as well. The question is whether it is building them at the right scale.
Several papers at this year’s workshop sit directly at the intersection of materials science and drug discovery. In drug discovery, both the spatial scale of the systems and the temporal scale of the relevant dynamics are usually much larger than in the materials settings on which many of these methods are developed. From that perspective, “Synthesis-constrained molecular design with direct optimization of reaction conditions” addresses one of the most persistent failure modes in computational drug discovery: generating molecules that are chemically interesting but experimentally inaccessible. By jointly optimizing molecular structure and the reaction conditions required to make it, the work highlights that no desirable drug-like property is meaningful if the molecule cannot be synthesized. Similarly, “FragmentFlow: Scalable Transition State Generation for Large Molecules” tackles reaction pathway modeling in a way that explicitly addresses distribution shifts across molecular size, a capability that matters enormously when you need to reason about metabolic stability or covalent warhead reactivity in a drug candidate. And “When Does Context Help? A Systematic Study of Target-Conditional Molecular Property Prediction” asks exactly the right question for structure-based drug design: how much does knowing the target actually improve your property predictions, and under what conditions?
Taken together, these papers point toward a broader issue that matters much more in drug discovery than in most current materials benchmarks. Scale has two dimensions:
Spatial: the size and complexity of the molecular system itself, from small, well-ordered structures to heterogeneous biological assemblies of tens or hundreds of thousands of atoms, whose behavior is shaped by their surrounding medium.
Temporal: the timescale over which the relevant phenomena unfold, from local rearrangements that happen quickly to conformational transitions, binding events, and allosteric effects that emerge only over much longer dynamics.
Most methods still operate comfortably only when at least one of those axes remains limited.
The timescale problem and the two ways to attack it
Pointing at scale in terms of atom count is only half the story. The other axis is time. Even if you could simulate a 100,000-atom protein-ligand system at the right level of theory, classical molecular dynamics would still be a bottleneck: the relevant conformational changes happen on timescales that are simply inaccessible to femtosecond integration steps, no matter how many GPUs you throw at the problem.
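To make the gap concrete, here is a back-of-the-envelope sketch of what reaching one millisecond costs with femtosecond steps. The 2 fs timestep and the throughput of 100 ns/day are illustrative assumptions, not measurements from any of the workshop papers; real throughput depends heavily on system size, hardware, and software.

```python
# Back-of-the-envelope cost of reaching a millisecond with femtosecond steps.
# All numbers below are illustrative assumptions, not measurements.

FEMTOSECOND = 1e-15  # seconds
MILLISECOND = 1e-3   # seconds


def steps_required(target_time_s: float, timestep_s: float) -> float:
    """Number of sequential integration steps needed to cover target_time_s."""
    return target_time_s / timestep_s


def wall_clock_days(target_time_s: float, ns_per_day: float) -> float:
    """Wall-clock days needed at a given simulation throughput (ns/day)."""
    target_ns = target_time_s / 1e-9
    return target_ns / ns_per_day


timestep = 2 * FEMTOSECOND  # assumed 2 fs integration step
throughput = 100.0          # assumed ns/day for a large solvated system

n_steps = steps_required(MILLISECOND, timestep)
days = wall_clock_days(MILLISECOND, throughput)

print(f"steps: {n_steps:.1e}")
print(f"wall clock: {days:.0f} days (~{days / 365:.0f} years)")
```

Under these assumptions a single millisecond trajectory needs on the order of 5 × 10^11 steps and roughly 10,000 days of wall-clock time. Because each step depends on the previous one, the steps cannot simply be spread across more GPUs.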
The field has converged on two serious responses: either learn to sample the thermodynamic ensemble directly, bypassing dynamics altogether, or learn to take much larger steps through time without accumulating errors. Both strategies are represented at AI4Mat 2026, and together they paint a picture of how the field is beginning to engage with the timescale problem.
Sampling the ensemble and skipping frames
The first strategy is exemplified by “Boltzmann Generators for Condensed Matter via Riemannian Flow Matching”. The original Boltzmann Generator framework was conceived precisely for biomolecular systems: the goal was to train a generative model that could sample thermodynamic ensembles directly, bypassing the crippling timescale problem of classical molecular dynamics entirely. The core capability being developed here, learning to generate configurations that correctly represent a Boltzmann distribution over a complex energy landscape, is exactly what is required to model molecular properties that depend on ensembles rather than single structures. Such properties, whether equilibrium or non-equilibrium, are inherently dynamic: they emerge not from a single structure, but from the distribution of states a system explores and the transitions between them across time. A model that cannot sample that landscape is not modeling the physical property itself but instead just a snapshot of it.
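The idea of sampling an ensemble without running dynamics can be illustrated on a toy problem. The sketch below is not the method of the paper above: it assumes a one-dimensional double-well energy in units of kT and uses a plain Gaussian as a stand-in for a trained generative model. Samples are drawn independently from that proposal and reweighted by exp(-U(x)) / q(x) so that averages approximate Boltzmann averages. All names and parameters are hypothetical.

```python
import numpy as np


def energy(x):
    """Double-well potential with minima at x = -1 and x = +1 (units of kT)."""
    return 4.0 * (x**2 - 1.0) ** 2


def log_proposal(x, sigma):
    """Log-density of a zero-mean Gaussian standing in for a trained generator."""
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(0)
sigma = 1.5  # assumed proposal width, broad enough to cover both wells
samples = rng.normal(0.0, sigma, size=200_000)

# Importance weights w ∝ exp(-U(x)) / q(x), computed in log space for stability.
log_w = -energy(samples) - log_proposal(samples, sigma)
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()

# Ensemble averages under the Boltzmann distribution, estimated by reweighting.
right_well_population = weights[samples > 0].sum()
mean_x_squared = np.sum(weights * samples**2)
effective_sample_size = 1.0 / np.sum(weights**2)

print(f"population of right well: {right_well_population:.3f}")
print(f"<x^2>: {mean_x_squared:.3f}")
print(f"effective sample size: {effective_sample_size:.0f}")
```

Both wells are populated without any trajectory ever crossing the barrier, which is the point of the approach. The effective sample size indicates how well the proposal matches the target; in high-dimensional molecular systems a naive proposal like this one would collapse, which is why a learned generator is needed.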
The second strategy is the one that has received less attention but may be more immediately practical: instead of bypassing dynamics, learn to take much larger timesteps through them. “Learning Hamiltonian Flow Maps: Mean Flow Consistency for Large-Timestep Molecular Dynamics” is the clearest expression of this idea. The key insight is that you do not need every femtosecond frame to understand what a molecule is doing. If you can learn a flow map that propagates the system accurately over timescales orders of magnitude larger than what classical integrators can handle, you collapse the computational cost of reaching biologically relevant timescales from intractable to feasible. This is not a new idea (coarse-graining and enhanced sampling have been around for decades), but framing it as a learned flow map with proper consistency guarantees is a meaningful advancement in the context of this field. For drug discovery, the implication is clear: conformational transitions that would require milliseconds of simulation to observe classically might become accessible in minutes.
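A toy example shows why a flow map can take steps that a classical integrator cannot. The sketch below assumes a unit-mass harmonic oscillator, for which the exact Hamiltonian flow over any time interval is known in closed form. That closed form plays the role a learned flow map would play for a real molecule; it is not the model from the paper above, and the step size is chosen only to sit past the velocity-Verlet stability limit of 2 / OMEGA.

```python
import numpy as np

OMEGA = 1.0  # angular frequency of a unit-mass harmonic oscillator


def velocity_verlet_step(q, p, dt):
    """One velocity-Verlet step for H = p^2/2 + OMEGA^2 q^2/2."""
    p_half = p - 0.5 * dt * OMEGA**2 * q
    q_new = q + dt * p_half
    p_new = p_half - 0.5 * dt * OMEGA**2 * q_new
    return q_new, p_new


def flow_map(q, p, dt):
    """Exact Hamiltonian flow over an arbitrary dt (a rotation in phase space).

    For a real molecule this map has no closed form; it is the object a
    learned flow-map model would be trained to approximate.
    """
    c, s = np.cos(OMEGA * dt), np.sin(OMEGA * dt)
    return c * q + (s / OMEGA) * p, -OMEGA * s * q + c * p


def energy(q, p):
    """Total energy of the oscillator."""
    return 0.5 * p**2 + 0.5 * OMEGA**2 * q**2


def propagate(step, q, p, dt, n_steps):
    """Apply a one-step propagator n_steps times."""
    for _ in range(n_steps):
        q, p = step(q, p, dt)
    return q, p


big_dt = 2.5  # beyond the velocity-Verlet stability limit of 2 / OMEGA
e0 = energy(1.0, 0.0)
e_verlet = energy(*propagate(velocity_verlet_step, 1.0, 0.0, big_dt, 20))
e_flow = energy(*propagate(flow_map, 1.0, 0.0, big_dt, 20))

print(f"initial energy:    {e0:.3f}")
print(f"Verlet, large dt:  {e_verlet:.3e}")
print(f"flow map, same dt: {e_flow:.3f}")
```

With this step size the Verlet trajectory blows up within a few steps, while the flow map conserves energy to floating-point precision. The open question for drug discovery is whether a learned approximation to such a map stays accurate on systems with hundreds of thousands of coupled degrees of freedom.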
1,000 atoms is not 100,000 atoms: from materials to drug discovery
These two strategies share a common limitation that the field has not yet fully confronted. The systems featured across AI4Mat 2026, including the most technically sophisticated ones, are typically validated on structures of hundreds to a few thousand atoms, in clean, periodic, or otherwise idealized conditions. That is an appropriate regime for many materials problems. It is the wrong regime for drug discovery.
A realistic protein-ligand system does not look like a small organic crystal. It looks like a membrane-embedded GPCR with a ligand in its orthosteric pocket, surrounded by a lipid bilayer, explicit water, and physiological concentrations of ions, a system that routinely exceeds 100,000 atoms when set up for serious molecular dynamics. Until both the ensemble-sampling and the large-timestep approaches are shown to work reliably at that scale, they remain promising ideas rather than mature drug discovery tools.
But this scale gap also points to a deeper opportunity. The answer to the timescale problem is not simply to gather better data on slow biological dynamics. It is to use machine learning to build effective representations of those dynamics in the first place: models that can either sample the relevant thermodynamic ensemble directly or propagate a system across long stretches of time without resolving every microscopic step. In that sense, ML is not just an analysis layer placed on top of a simulation. It is a way of extending the simulation regime itself.
The benchmarks don’t match the problem
This is not a hardware problem that will quietly resolve itself as GPUs get faster. It is a distributional shift that runs through every layer of the stack: the training data, the model architecture, the evaluation protocol, and the scientific question being asked. A model trained on the Cambridge Structural Database and benchmarked on held-out crystals is not being tested on anything that resembles a flexible macromolecule in a heterogeneous biological environment. The community has become very good at building systems that score well on the benchmarks it has constructed for itself. Those benchmarks do not yet correspond to the problems that matter most.
What it would actually take
What would it mean to solve this? It would mean treating machine learning as part of the dynamical stack itself. We need training data that includes not just structures but thermodynamic ensembles, and conformational distributions rather than single geometries. But we also need models that can extend those data by learning effective samplers, coarse-grained dynamics, and large-timestep flow maps that reach regimes which brute-force simulation cannot. It would mean evaluation protocols that measure free energy accuracy, not just RMSD to a crystal structure. It would mean test systems where the ground truth is defined not only by a held-out split of synthetic data but also by experiment: isothermal titration calorimetry, surface plasmon resonance, or cryo-EM ensembles. Several threads at AI4Mat point toward this: the Boltzmann Generator work on thermodynamic sampling, the target-conditional property prediction study’s interrogation of when structural context actually helps, and the synthesis-constrained design work’s insistence that experimental realizability is part of the objective. These are the right instincts. The field needs to turn these ideas into a shared research objective.
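As a minimal sketch of what measuring free energy accuracy against experiment could look like, the snippet below converts dissociation constants, the kind of quantity an ITC or SPR experiment reports, into standard binding free energies via ΔG = RT ln(Kd / 1 M) and scores predictions by RMSE. The Kd values and the predictions are invented for illustration, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal / (mol K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy dG = RT ln(Kd / 1 M), in kcal/mol."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical dissociation constants (molar) and model predictions (kcal/mol).
measured_kd = [1e-9, 5e-8, 2e-6]
predicted_dg = [-11.5, -10.4, -7.0]  # invented for illustration

experimental_dg = [binding_free_energy(kd) for kd in measured_kd]
error = rmse(predicted_dg, experimental_dg)

print(f"free energy RMSE: {error:.2f} kcal/mol")
```

At room temperature a tenfold change in Kd corresponds to about 1.4 kcal/mol, so an error of that size already means the predicted affinity is off by an order of magnitude. That sensitivity is what a structural metric such as RMSD to a crystal pose does not capture.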
Build for where it counts
AI4Mat 2026 demonstrates, convincingly, that machine learning for molecular and materials discovery is no longer a speculative endeavor. The methods are real, the benchmarks are improving, and the community is asking increasingly sharp scientific questions. What AI4Mat 2026 makes visible, however, is not only progress within materials discovery itself, but the emergence of a methodological toolkit that could matter far beyond it. The next step is to direct that toolkit toward the central bottleneck in drug discovery: accessing the relevant thermodynamic and dynamical regimes of biologically realistic systems. That will require better benchmarks and better experimental grounding, and also something more ambitious: using ML not only to score molecules or interpolate known data, but to learn effective dynamics and ensemble structure, and to scale to systems that brute-force simulation cannot reach on its own.
Drug discovery is one of the hardest test cases available because it demands exactly that combination of scale, physics, and experimental accountability. The Boltzmann Generator lineage shows what becomes possible when you insist on getting the physics right. That same insistence, extended to scale, biological realism, and experimental accountability, should become the field’s north star.
This post is part of “Inside Valence”, a series where you’ll get a behind-the-scenes look at our research, exploring new ways to predict, explain, and ultimately decode biology. If this resonates, consider subscribing!



