Demystifying Assembly Theory
“It doesn’t make any difference how beautiful your guess is, it doesn’t matter how smart you are, who made the guess, or what his name is … If it disagrees with experiment, it’s wrong. That’s all there is to it.” —Richard Feynman.
Introduction
Science advances not only by discovering new facts but also by revising the lenses through which facts become intelligible. Those lenses have a natural ordering. Epistemology (how we form justified beliefs) grounds ontology (what we take to exist), and only then can we responsibly venture into metaphysics, i.e., what we claim reality is like at its most fine-grained level. In practice, the order often gets flipped: metaphysical convictions about “what the world must be” are smuggled in first, and only afterward are epistemic and ontological accounts retrofitted to match. That reversal breeds confusion. Scientists should be better versed in philosophy.
Only by scrutinizing our current epistemologies can we create alternative narratives that go beyond our contemporary paradigms. For example, modern physics is based on a particularly successful epistemology: the Newtonian state-space program. In essence, one specifies a space of states and lawful transitions (often differential equations), then unfolds trajectories. This paradigm has successfully scaled from celestial mechanics to quantum field theory. Yet success is not universality. When we turn to phenomena that are historically contingent and path-dependent (e.g., life, cognition, technology), trajectories are not merely solutions to equations; they are stories written in constraints, reuse, and memory.
Assembly Theory (AT) enters here as an explicit attempt to make “history” measurable in the structure of objects. It proposes a way to quantify how much directed construction is implicit in an object’s form and to connect that quantity to how often such objects appear. In short, it uses complexity plus recurrence as a signature of selection. The framework has galvanized attention, praise, and sharp criticism. In this essay I reconstruct AT’s epistemology, ontology, and implied metaphysics, assemble a chronological map of critiques and counter-critiques, and render a clear verdict on what the theory genuinely contributes, where it overreaches, and how it might be improved.
What is Assembly Theory?
AT did not arrive overnight as a fully formed metaphysic of “time bound to matter.” It emerged by accretion: from pragmatic attempts to quantify molecular complexity, through formal work on assembly spaces, to more ambitious claims about selection and evolution. In any case, AT’s central intuition is disarmingly simple: some objects bear evidence of nontrivial construction histories. Complexity, in this view, is not a property of form alone but of history—the shortest constructive story that gets you there. However, AT’s earliest steps were methodological. In 2017, Marshall and colleagues proposed “pathway complexity” as a probabilistic framework for biosignatures. The idea is straightforward: start with basic units, specify joining operations, and ask how many steps are minimally required to build a target object along some assembly pathway. Thus, if we fix (i) a basis of allowable building blocks and (ii) rules for joining them, we can define the assembly index (AI) of an object as the minimal number of joining operations along any valid construction pathway from the basis.
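To make the definition concrete, here is a minimal sketch in Python that brute-forces an assembly index for short strings under two assumed rules: single characters are the basic units and concatenation is the only join operation. The function name, pruning strategy, and example are my own illustration of the definition above, not the molecular MA algorithm of the AT papers.

```python
from itertools import product

def assembly_index(target: str) -> int:
    """Minimum number of join (concatenation) steps needed to build `target`
    from its individual characters, with free reuse of intermediate products.
    Exhaustive search with simple pruning -- suitable for toy strings only."""
    basis = frozenset(target)                 # basic building blocks: the characters
    if target in basis:
        return 0
    best = [len(target) - 1]                  # trivial upper bound: one character per join

    def search(available: frozenset, steps: int) -> None:
        if steps >= best[0]:                  # cannot beat the best pathway found so far
            return
        for a, b in product(available, repeat=2):
            new = a + b
            if new not in target:             # on a minimal pathway every intermediate
                continue                      # is a contiguous substring of the target
            if new == target:
                best[0] = min(best[0], steps + 1)
            elif new not in available:
                search(available | {new}, steps + 1)

    search(basis, 0)
    return best[0]

# "BANANA" can be built in 4 joins because the fragment "NA" is reused:
# N+A -> NA, NA+NA -> NANA, B+A -> BA, BA+NANA -> BANANA.
print(assembly_index("BANANA"))               # -> 4
```

The point of the toy is the reuse clause: the minimal pathway is shorter than character-by-character construction precisely because an intermediate product, once made, can be joined again at no extra cost.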
As expected, the first mature implementation of AT came in chemistry. In Identifying molecules as biosignatures with assembly theory and mass spectrometry, the authors introduced the Molecular Assembly index (MA), an integer defined as the length of the shortest sequence of joins (with interim reuse allowed) that constructs a molecule when bonds are treated as elementary building blocks. The epistemic move here is crucial: complexity becomes an experimentally tractable quantity because MA can be estimated, not by searching astronomical assembly trees in silico, but by correlating it with the richness of tandem mass spectrometry peak patterns. That correlation (reported as ~0.89 for their calibration set) enables a practical detection protocol that infers high-MA molecules from their fragmentation signatures. The 2021 paper couples this measure with a copy-number criterion, arguing that molecules requiring many assembly steps yet occurring in high abundance (e.g., >10,000 identical copies for Taxol) should be taken as robust biosignatures. This establishes the AT epistemology in its original, modest form: infer historical contingency (non-random assembly) from a physically measurable proxy (spectral fragmentation patterns) plus abundance.
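The estimation step can be pictured as an ordinary calibration exercise. The sketch below, with invented numbers, fits a linear map from tandem-MS peak counts to known MA values and then reads off MA for a new spectrum; the data, variable names, and single-predictor model are illustrative assumptions on my part, not the authors’ pipeline.

```python
import numpy as np

# Hypothetical calibration set: (MS/MS peak count, computed MA) pairs -- illustrative only.
peaks = np.array([12, 25, 40, 58, 75, 90], dtype=float)
ma    = np.array([ 5,  9, 13, 17, 21, 24], dtype=float)

# Least-squares fit of MA as a linear function of spectral peak richness.
slope, intercept = np.polyfit(peaks, ma, deg=1)
r = np.corrcoef(peaks, ma)[0, 1]              # calibration correlation

def estimate_ma(n_peaks: float) -> float:
    """Predict MA for a new molecule from its fragmentation-peak count."""
    return slope * n_peaks + intercept

print(f"calibration r = {r:.2f}")
print(f"estimated MA at 65 peaks: {estimate_ma(65):.1f}")
```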
That same paper also formalizes the “assembly pathway” idea as a directed multigraph and derives probabilistic bounds: as MA increases, the most likely chance-formation probability plummets; for plausible parameterizations, MA in the range 15–20 yields chance frequencies lower than a molecule per mole—below detection thresholds, hence “unlikely” to arise abiotically in detectable abundance. This is not a metaphysical claim about life; it is a practical, model-based claim about detectability under unconstrained assembly. The ontology, at this stage, is minimalist: the world contains objects whose generative histories can be compressed into a shortest-join narrative; the epistemology is that such histories leave telltale fingerprints in spectra.
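A back-of-the-envelope version of that bound can be written down explicitly. Under a deliberately crude null model, my simplification rather than the paper’s derivation, each join chooses uniformly among b equally likely alternatives, so the chance of assembling one specific object with index a scales like b to the minus a, and the expected copy number per mole collapses accordingly:

```python
AVOGADRO = 6.022e23

def expected_copies_per_mole(branching: float, assembly_index: int) -> float:
    """Expected copies of one specific object in a mole of material under a toy
    null model: each of `assembly_index` joins picks uniformly among `branching`
    equally likely alternatives, so one specific object has chance branching**(-index)."""
    return AVOGADRO * branching ** (-assembly_index)

# The exact crossover depends on the assumed branching factor; the qualitative point
# is the exponential collapse of chance frequency with the assembly index.
for b in (10, 20, 40):
    for a in (10, 15, 20):
        print(f"branching={b:2d}  index={a:2d}  copies/mole ~ {expected_copies_per_mole(b, a):.1e}")
```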
The 2023 expansion of these ideas is where AT’s ambitions widened. In Assembly theory explains and quantifies selection and evolution, AT is presented as a general explanatory framework for complex systems, extending beyond molecules. The paper emphasizes “historical contingency” as an empirical axis, proposes that selection leaves distinctive statistical signatures in assembly space, and introduces ensemble-level observables intended to quantify selection’s role in producing repeated, high-assembly objects across populations. The rhetoric moves from “an experimental measure for molecules” to “a generalizable inference framework” in which selection biases the exploration of assembly spaces in time, making history (repeat production of high-assembly objects) the core diagnostic. The metaphysical gesture creeps in here: time is not merely a parameter; it is understood as the extent along which selection accumulates structure, recorded in the multiplicity of objects and their assembly numbers.
Summarizing, AT’s epistemology begins with a computable/estimable index (MA) and empirically tied observables (mass spectrometry peaks; copy numbers); its ontology posits objects embedded in an “assembly space” structured by allowable joins and reuse; and its metaphysics asserts that history (selection acting through time) is as constitutive of complex objects as their material composition. That metaphysical escalation is precisely what later drew intense criticism; but as an account of origins and motivations, AT evolved from a chemometric tool for life detection into a more sweeping thesis about the arrow of time in complex systems. The remainder of this essay evaluates whether the data and mathematics warrant that expansion.
Criticisms and Counter-criticisms
Because AT straddles measurement, modeling, and metaphysics, its reception has unfolded as a dialogue across those layers. The chronology matters. As the following survey shows, early friction centered on methodology and claims of exclusivity; later rounds concerned formal equivalence to known complexity measures; and the most recent critiques take aim at AT’s evolutionary interpretation.
Following the 2021 demonstration of MA as an experimental handle on complexity, critics pressed on two fronts: the evidential status of the “MA ≥ 15 in high abundance implies life” rule of thumb, and whether AT’s performance is unique. Hazen et al. directly tested the threshold by computing MA for mineral heteropolyanions (abiotic molecules) and reported values in the teens and beyond, concluding that MA ≥ 15 cannot serve as an unambiguous biosignature and identifying further issues (ambiguities about what counts as “high abundance,” disconnections between idealized assembly paths and real chemical kinetics). This struck at the strongest reading of the 2021 claim: high MA with abundance may often coincide with biology, but it is not unique to it. In 2024, the assembly theorists replied to Hazen et al., and Hazen et al. soon reaffirmed their criticism.
Parallel to this, a line of critique argued that AT overstates novelty. In On the salient limitations of the methods of assembly theory and their classification of molecular biosignatures, Zenil and colleagues contended that the AT pathway method is “an encoding scheme widely used by popular statistical compression algorithms.” They reported that, on the same datasets, simple compression-based or algorithmic-probability-inspired metrics could match or outperform AT-derived separations, and that AT’s distinction between living and nonliving molecules had been anticipated in earlier work using related ideas. The critique’s epistemic thrust is not merely “AT is wrong,” but “AT is not privileged”: it does not add discriminative power beyond a family of established information-theoretic approaches.
On the other hand, Johannes Jaeger’s Assembly Theory: What It Does and What It Does Not Do took a clarifying line: AT, he argues, is best seen as detecting and quantifying biases induced by higher-level constraints in rule-based worlds; that is compatible with selection, but does not, by itself, identify selection or apportion causal weight among competing mechanisms. If true, the scope of AT narrows; it is a descriptive lens on bias in assembly rather than a theory of evolution’s dynamics. That observation especially targets the 2023 ambition to “quantify selection,” suggesting instead that AT measures a pattern (bias), not a process (selection).
As we will see later, assembly theorists have replied to both Zenil’s and Jaeger’s criticisms, but before turning to those replies we should consider Assembly theory is an approximation to algorithmic complexity based on LZ compression that does not explain selection or evolution, in which Zenil’s team argued that the assembly index is effectively the size of a compressing context-free grammar (that is, an LZ-family compression) and thus bounded by Shannon entropy in stochastic settings, with limited capacity to capture generative (non-stochastic) structure. Zenil et al. further claimed that classic toy examples used to illustrate AT are canonical LZ teaching cases, reinforcing the point that AT parsing mirrors dictionary-based compression.
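For readers unfamiliar with that comparator class, the following sketch shows what an LZ78-style dictionary parse does: it scans a string, grows a dictionary of previously seen phrases, and counts how many new phrases are needed. The phrase count is the kind of naive compression proxy invoked in these critiques; the implementation details here are my own illustration, not the specific metric used in any of the cited papers.

```python
def lz78_phrase_count(s: str) -> int:
    """Number of phrases in an LZ78-style dictionary parse of `s`: a standard
    dictionary-compression proxy for complexity, used here only as an
    illustrative baseline, not as the AT algorithm."""
    dictionary, phrase, count = set(), "", 0
    for ch in s:
        phrase += ch
        if phrase not in dictionary:      # a new phrase: record it and start over
            dictionary.add(phrase)
            count += 1
            phrase = ""
    if phrase:                            # trailing phrase already in the dictionary
        count += 1
    return count

# A maximally repetitive string parses into few phrases; a non-repeating one
# needs roughly one phrase per character.
print(lz78_phrase_count("AAAAAAAAAAAA"), lz78_phrase_count("QWERTYUIOPAS"))
```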
As anticipated, the AT founders replied with Assembly Theory and its Relationship with Computational Complexity, which situates AT as a theory of observables tied to historical constraints and explicitly argues that MA and the ensemble measures built on it are not equivalent to Shannon entropy, Huffman coding, or Lempel–Ziv–Welch compression, as Zenil et al. claim. The preprint includes constructive counterexamples and complexity-theoretic observations (e.g., NP-completeness of certain assembly-related decision problems) to buttress the claim that AT sits outside simple compression formalisms. It reframes AT’s core commitment: not to compressibility per se, but to repeatable production of high-assembly objects in time under selection-like constraints. Even so, the main basis for distinguishing AT from compression algorithms is the difference between computation and construction, and I don’t think the AT founders have yet provided a formal distinction between the two.
A counter-reply followed: Assembly Theory Reduced to Shannon Entropy and Rendered Redundant by Naive Statistical Algorithms reanalyzed the 2021 datasets, reporting that simple string-length or LZ-style metrics separate classes comparably and arguing that any copy-number-based measure will reduce to statistical compression under mild assumptions. Methodologically, they also contend that specific quantile choices in the 2021 regression were not justified, and that the “MA ≥ 15 + abundance” rule has analogues in arbitrary thresholds on correlated statistics. Again, the rhetorical center here is redundancy: if naive metrics do as well, AT’s claimed explanatory leverage diminishes.
Two additional lines of critique deepen the picture. Hazen et al. already showed that abiotic molecules can register MA in the “biosignature” regime, undercutting uniqueness claims and calling for a sharper theory of abundance and pathways; they also noted uncertainties about how “pathway complexity” maps onto real kinetics and thermodynamics. At a broader conceptual level, the review ‘Assembly Theory’ in life-origin models by David Lynn Abel surveys AT’s relevance to abiogenesis and catalogs practical limitations (from physicochemical controls to measurement issues), effectively urging a more conservative reading of what AT can presently establish about life’s onset. The point is not that AT is wrong, but that its scope is narrower than its most expansive slogans suggest.
On the other hand, Michael Lynch’s Complexity myths and the misappropriation of evolutionary theory shifts the debate to population genetics. He argues that the ensemble-level “assembly measure” promoted in the 2023 paper lacks clear biological grounding, conflates mutation with selection, and treats complexity as if selection relentlessly ratchets it upward, contrary to well-established theory and evidence. Where the earlier disputes centered on whether AT is just compression, Lynch questions whether the AT observables, even if distinct, track relevant evolutionary quantities at all. In his view, the indices are numerically and conceptually ill-posed for separating mutational input from selective filtering; thus, inferences about selection’s “magnitude” from the proposed assembly measure are suspect.
Taken together, the dialectic now looks like this. AT’s early chemometric claim (that MA can be estimated from spectra and used as part of an agnostic life-detection protocol) has real empirical bite but is not unique, and its strongest biosignature threshold needs revision. The theoretical claim (that AT is not just compression) remains actively contested: the founders provide constructive distinctions and problem-class separations; second-order counter-critics provide reductions and equivalence proofs in stylized settings. Finally, the evolutionary claim (that AT “quantifies selection and evolution”) faces its own hurdle: even granting distinctiveness, do AT observables map onto the causal structure of selection in populations? On that last point, the burden of proof has shifted back to AT’s authors to provide analyses that explicitly separate generation (mutations, developmental constraints, chemical reachability) from sorting (selection, drift) in ways that standard evolutionary theory recognizes. Until then, AT’s most secure footing remains what it began as: a historically minded way to detect biased assembly. Sometimes biological, sometimes not.
Final Verdict
A fair verdict on AT must separate three questions that its reception has too often entangled: What, exactly, does the framework measure? What explanatory work can that measurement support (especially in the context of evolution)? And under what conditions does AT do something that existing information-theoretic or graph-theoretic tools do not?
Let’s recap. In Identifying molecules as biosignatures with assembly theory and mass spectrometry, the central quantity (MA) was introduced as a lower bound on the directed “making” required to construct a molecule from an explicit alphabet and join rules. Crucially, MA was tied to an experimental observable (mass spectra fragmentation patterns), which is what first made AT compelling: it promised a bridge from an abstract notion of historical burden to something a laboratory could actually read off. The subsequent theoretical enlargement in Assembly theory explains and quantifies selection and evolution aimed to elevate this bridge into a general principle: if objects with large assembly burden occur in high copy number, then selection must be at work. This is an evocative claim. It makes history empirically legible in matter and, by coupling burden to abundance, seeks a signature of selection that does not presuppose a Darwinian substrate. The promise, at least at this level, is real.
But promise is not uniqueness. Critics have shown that much of AT’s empirical leverage can be replicated with simpler tools. The strongest biosignature slogan implied by the early molecular paper—that above a certain assembly threshold, abundance signals life—was undercut by counterexamples in geochemistry: in Molecular assembly indices of mineral heteropolyanions: some abiotic molecules are as complex as large biomolecules, high MA values appear in unambiguously abiotic contexts. This does not refute the measurement; it refutes the exclusivity. It tells us that “large MA + abundance” is not, by itself, a decisive signature of life. A different family of critiques focused on formal identity: Assembly theory is an approximation to algorithmic complexity based on LZ compression that does not explain selection or evolution and On the salient limitations of the methods of assembly theory and their classification of molecular biosignatures argue that the assembly index effectively behaves like dictionary-based compression or Shannon-bounded statistics on relevant encodings, while reanalyses of the original datasets suggested that naive proxies (even string length) can reproduce separations attributed to MA.
Against this, the founders’ preprint Assembly Theory and its Relationship with Computational Complexity offers explicit counterexamples and complexity-class arguments to deny a strict reduction of assembly index to Huffman/LZ/Shannon; a subsequent rejoinder, Assembly Theory Reduced to Shannon Entropy and Rendered Redundant by Naive Statistical Algorithms, reiterates the empirical redundancy claim. Meanwhile, Assembly Theory: What It Does and What It Does Not Do reframes AT more modestly: as a detector of bias induced by constraints in rule-based worlds, not as a measure that, by itself, identifies Darwinian selection. Complexity myths and the misappropriation of evolutionary theory pushes further, arguing that ensemble-level “selection indices” promoted in the highly cited 2023 paper conflate mutation supply, drift, and selection and thus lack clear population-genetic meaning. These exchanges delineate the space of a responsible verdict: AT’s measurement is useful at the molecular level, but its strongest inferences require careful qualification. What follows from that?
First, it is critical—conceptually and practically—to incorporate historical contingency explicitly into the physics of complex systems. AT’s great service is to force researchers to declare an ontology (what counts as a part, what joins are allowed) and then to read history as a constrained walk in that declared space. This is not a rebranding of information theory; it is a distinct epistemic stance: the goal is not to compress a description but to lower-bound the directed causation necessary to make an object. The theoretical preprint defending AT is persuasive on this point at the level of formal separations. The assembly index, as defined, is not simply Shannon entropy or LZ complexity in disguise. Yet the critics have a point in practice: many estimators of the assembly index presently used on real data behave similarly to statistical compressors because they piggyback on the same signals (e.g., repetition, reuse) and are read out from observables (e.g., mass spectrometry peak patterns) that strongly correlate with simple statistics. The right lesson is not that AT is “just compression,” but that AT’s empirical distinctiveness must be earned by experimental designs and analyses where ontological structure matters and symbol statistics mislead. That is a high bar, but it is reachable. Still, the assembly theorists have yet to provide a formal distinction between computation and construction.
Second, AT must confront an evolutionary world in which the alphabet itself evolves. In living systems, the library of parts is not fixed: new catalytic motifs appear, old ones are co-opted, modules fuse and split. If the basis remains static, assembly indices can make difficult things look easy (once a new primitive is admitted) or easy things look difficult (when a primitive that emerged historically is disallowed by fiat). The cure is also what would most deepen AT’s physics: promote the basis to a time-indexed object and the assembly space to an expanding directed (hyper)graph. With that move, one can distinguish construction length (joins required given the current alphabet) from innovation depth (the number and magnitude of basis expansions required). This is where AT can genuinely advance beyond graph-theory routines. By explicitly coding alphabet growth as part of the state of the world, it can assign physical meaning to “stepping stones” as innovations that shorten vast swaths of future assembly. It also aligns the measurement with what evolution actually does: change the repertoire of available parts.
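A minimal sketch of what such a time-indexed basis might look like is given below; the class, attribute names, and bookkeeping are hypothetical illustrations of the distinction between construction length and innovation depth, not an existing AT formalism.

```python
from dataclasses import dataclass, field

@dataclass
class DynamicAssemblySpace:
    """Toy assembly space with a time-indexed alphabet. `construction_length`
    counts ordinary joins; `innovation_depth` counts basis expansions.
    All names and semantics are illustrative assumptions."""
    basis: set = field(default_factory=set)
    built: set = field(default_factory=set)
    construction_length: int = 0
    innovation_depth: int = 0

    def admit_primitive(self, unit: str) -> None:
        """Basis expansion: a genuinely new building block becomes available."""
        if unit not in self.basis:
            self.basis.add(unit)
            self.innovation_depth += 1

    def join(self, a: str, b: str) -> str:
        """Ordinary assembly step: concatenate two already-available objects."""
        available = self.basis | self.built
        if a not in available or b not in available:
            raise ValueError("both parts must already exist in the space")
        new = a + b
        self.built.add(new)
        self.construction_length += 1
        return new

# The same target becomes cheap to build once a new primitive has been admitted.
space = DynamicAssemblySpace(basis={"A", "B"})
ab = space.join("A", "B")        # one ordinary join
space.admit_primitive("XYZ")     # an innovation that reshapes future possibility
space.join(ab, "XYZ")            # the composite now costs a single extra join
print(space.construction_length, space.innovation_depth)   # -> 2 1
```

The design point is that the two counters answer different questions: how much routine joining a history required, and how many times the repertoire of parts itself had to grow.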
Third, probability cannot remain implicit. Copy number is a frequency, not a generative model. If we want to talk about selection rather than mere bias, we must articulate a probabilistic null over assembly paths. In the spirit of the 2021 and 2023 papers, the natural extension is to assign transition kernels (rates or propensities) to join rules and to define the likelihood of an object as the sum of the probabilities of its paths. From there, one can define a selective lift (the ratio of observed frequency to model-predicted frequency) and a path entropy that quantifies how many distinct histories plausibly lead to the same object. This converts AT’s rhetoric about “history and abundance” into a testable inference: one compares ensembles (neutral reachability, catalysis-biased kinetics, autocatalytic closure, Darwinian replication with heredity) and asks which explains the joint pattern of frequencies and path-statistics best. In that picture, AT does not declare selection; it helps to diagnose it by making the consequences of selection (and of its nulls) concrete in assembly space. It also resolves, rather than sidesteps, the concerns in Assembly Theory: What It Does and What It Does Not Do and Complexity myths and the misappropriation of evolutionary theory; selection becomes a model-comparison claim, not a scalar readout.
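The quantities named in this paragraph are easy to write down once a path-level null has been declared. The sketch below assumes, purely for illustration, that each object’s pathway probabilities have already been computed from some transition kernel; the function names and numbers are my own, not quantities defined in the AT papers.

```python
import math

def predicted_frequency(path_probs):
    """Null-model frequency of an object: the sum of the probabilities of all
    assembly pathways that produce it (each a product of transition-kernel terms)."""
    return sum(path_probs)

def selective_lift(observed_freq: float, path_probs) -> float:
    """Ratio of observed to null-predicted frequency; values far above 1 flag
    objects much more common than unconstrained assembly would make them."""
    return observed_freq / predicted_frequency(path_probs)

def path_entropy(path_probs) -> float:
    """Entropy (bits) of the normalized path distribution: roughly, how many
    distinct histories plausibly lead to the same object."""
    z = sum(path_probs)
    return -sum((p / z) * math.log2(p / z) for p in path_probs if p > 0)

# Illustrative numbers only (assumptions, not data from the AT papers):
complex_obj_paths = [1e-18, 4e-19]          # high-assembly object, few improbable routes
simple_obj_paths  = [0.30, 0.20, 0.10]      # low-assembly object, many easy routes

print(f"lift (complex object): {selective_lift(1e-6, complex_obj_paths):.1e}")
print(f"path entropy (simple object): {path_entropy(simple_obj_paths):.2f} bits")
```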
Fourth, AT sometimes presents itself as more “objective” than observer-centered information measures because it grounds numbers in physical reuse of parts rather than in descriptive encodings. There is something right here: careful ontological declaration does discipline interpretation. But the objectivity is conditional. The choice of alphabet and join rules is theory-laden; different, equally defensible alphabets can change indices and their empirical behavior. The proper conclusion is not that AT is subjective, but that its objectivity is relational: given a declared ontology motivated by chemistry, materials science, or technology (and justified by independent evidence), AT yields numbers that are intersubjectively stable and experimentally testable. That is exactly the sense in which any physical theory is objective. The critics’ insistence on comparing AT to naive baselines is, from this angle, salutary: if AT’s ontology is doing real work, it should outperform baselines on tasks designed to expose that work.
Fifth, the MA threshold story has already been tempered by the mineral counterexamples. This should not be treated as a failure of the program but as a guide to better practice. What is needed are preregistered, adversarial tests that include “hard abiotic” cases alongside biological mixtures, with clear, physically motivated alphabets and transparent path-probability nulls. The 2021 paper’s experimental cleverness, tying MA to spectral richness, remains a model for how to operationalize an abstract index; the next step is to demonstrate that dynamic alphabets and path likelihoods can differentiate cases where compression-style baselines are expected to fail. The rejoinders on computational complexity show that strict reductions to Shannon/LZ can be blocked in principle; the experimental literature must now show that AT’s additional structure matters in practice. Until then, the critics’ claims of redundancy, even if overstated, retain empirical bite.
Finally, what does AT offer that traditional evolutionary biology does not? Classical theory tells us how allele frequencies change under selection, drift, mutation, and migration; it excels at lineage-level dynamics given heredity. AT, at its best, complements that story at the materials level. It is a calculus for the historical burden etched into objects (not lineages) under constraints of composition, energetics, and reusability. That vantage is particularly valuable where the units of interest are composites with rich substructure (molecules, machines, code), where “evolutionary stepping stones” are not merely metaphorical but explicit innovations that reshape future possibility. In that domain, AT can make precise the intuition that history is stored in the architecture of things. The prize worth aiming for is not a grand unification of physics and biology, but a robust measurement theory of constructive history that interfaces cleanly with kinetics, thermodynamics, and population processes.
The verdict is mixed but constructive. AT is not a revolution that renders computation or information theory obsolete, nor is it a mere restatement of compression in a new idiom. It is a promising measurement-first framework that will earn its keep when it (i) treats alphabets as evolving, (ii) embeds explicit probabilistic nulls over assembly paths, and (iii) demonstrates empirical wins where ontological structure (not symbol statistics) does the explanatory lifting. The early molecular paper showed how to turn a philosophical idea into a lab protocol. The 2023 theoretical expansion showed the ambition—perhaps too quickly—of making selection legible in that protocol’s numbers. The best path forward is neither to canonize nor to discard AT, but to discipline it: retire universal thresholds, adopt dynamic assembly spaces, formalize stepping-stone probabilities, and prove superiority against strong baselines on hard cases. Do that, and AT will stand as what it plausibly always was: not a metaphysics of life, but a durable physics of how history becomes matter.
Conclusion
Assembly Theory is best understood as an epistemic proposal: make the past measurable in the present by quantifying directed construction under explicit ontological commitments. That epistemology is valuable. As an ontology, AT models a world of assembly spaces in which constraints, reuse, and symmetry shape what becomes abundant. As metaphysics, it suggests that time is etched into matter, not as dynamics alone, but as historical burden.
The controversy arises when this epistemology is mistaken for a universal metaphysics, as the authors have so far tended to do. Threshold biosignatures fail on counterexamples; compression-based baselines often match or exceed performance; and scalar indices risk reifying selection rather than inferring it. The founders’ formal counterexamples and complexity-class arguments successfully block strict reductions to Shannon/LZ measures and clarify that their assembly index is not a Kolmogorov-complexity surrogate. But the critics are right that much of AT’s empirical practice can mirror compression unless its unique ontology is doing identifiable work.
Seen through the proper order (epistemology → ontology → metaphysics), AT is neither a revolution that unifies physics and biology nor a redundant rebranding of entropy. It is a promising measurement theory of constructive history. Strengthened with dynamic alphabets, explicit path probabilities, and rigorous model comparisons, AT could become a robust tool for detecting constraint-driven assembly (sometimes selection) in chemistry and beyond. Absent those upgrades, its grandest claims will continue to outpace what its indices can soundly show.