The Question Before Correctness

Reading note: this essay uses the framework's technical category names throughout (Mathematical physics exploration, Mathematical physics theory-construction, Effective model, Physics proper). The public-facing labels used on the survey and about pages map as: Mathematical exploration = Mathematical physics exploration; Theory proposal = Mathematical physics theory-construction; the other two carry the same name in both registers. The names below are the load-bearing technical ones.

Most arguments about whether a paper is good are arguments about whether it's right. There's a prior question, older than AI but easier to ignore when production was slow: is this paper a candidate solution at all?

This isn't the same as whether it's correct. A claim that fails admissibility isn't wrong — it isn't yet a candidate. Admissibility is the entry condition to the category of "things worth evaluating on their merits." Most of what comes out of generative models, and a non-trivial fraction of what came out of human authors before them, fails it. Peer review never tested for admissibility because production cost used to make it implausible to write something inadmissible at length. That assumption is dead, and it was probably always weaker than it looked.

What follows is a framework that approaches the question ontology-first. Before describing how to evaluate a specific work, we have to be clear about what kinds of theoretical work exist and how they relate to each other. The audit follows from the ontology, not the other way around.

What admissibility is actually asking

The question is not "is this true?" It is "is this what it claims to be?" Every piece of theoretical work claims a kind: it presents itself as physics-proper unification, or as effective phenomenology, or as mathematical physics exploration. The admissibility question is whether the work's actual structure supports the kind-claim.

The framework catches misrepresentation of kind: work that occupies one category honestly while claiming another for institutional reasons. The cure is not to demote the work — it's to ask the work to be honest about which category it actually occupies. The same content may be perfectly admissible at a lower category; the failure is the category claim, not the underlying work.

The ontology — five kinds of theoretical work

Theoretical work occupies one of five categories. These are kinds of work distinguished by what they claim, what counts as success for them, and what evaluation they have to pass.

Pure mathematics is evaluated on internal consistency alone, with no anchor to reality required. Its claim is internal, so no deeper layer is required.

Mathematical physics exploration develops structures motivated by physics questions, evaluated primarily as mathematics. Its claim is conditional — if a certain structure describes reality, the mathematical consequences are these. Empirical anchoring is deferred until the work makes an unconditional physical claim.

Mathematical physics theory-construction presents structures as candidate descriptions of reality with predictions in principle derivable. The claim is unconditional but not yet anchored — the structures are claimed to describe reality, but empirical demonstration is in progress or open.

Effective models fit observed phenomena with parameters anchored in measurement, operating within stated regimes and explicitly not claiming to derive primitives from a deeper layer. Their claim is bounded: within this regime, with these parameters, these phenomena are described.

Physics proper derives from named anchors and recovers established physics at boundaries. Its claim is structural — the primitives are derived, the regime is bounded by physics rather than convenience, the established results are recovered.

These are not a hierarchy of worth. A great pure-mathematics paper is great. A great effective model is great. A great physics-proper theory is great. The error the framework catches is misrepresentation of category: claiming a higher category than the work supports, because higher categories carry more institutional reward. Most of string theory operates as mathematical physics theory-construction while positioning itself as physics-proper quantum gravity (QG). Loop quantum gravity does the same. The category collapse does real damage to evaluation because the institutional category "theoretical physics" treats these as one bucket and funds them on physics-proper success metrics they don't meet, because they aren't doing that kind of work.

The lower-anchor categories are essential to physics. GR's discovery required Riemannian geometry to already exist as pure mathematics. QM's discovery required Hilbert space theory. Gauge theory required fiber bundles. The pure-mathematics work that enabled these discoveries was not being done as physics. The framework doesn't exclude lower-anchor work. It requires honesty about which category the work occupies.

Theories aren't monolithic

A nominal theory typically contains components occupying different categories in different regimes. The framework runs on components-in-regimes, not on theories as wholes.

The Dirac equation is the cleanest example. In flat Minkowski spacetime for free fermions, it is physics-proper: derived from relativistic covariance plus first-order time evolution, it yields spin-½ and g = 2 as consequences rather than inputs, and it predicted antimatter before its observation. In the Standard Model (SM) matter sector, the same equation inherits effective-model parameters: Yukawa couplings fitted to mass spectra, mixing matrices tabulated. In condensed matter, the same mathematical form emerges as a derived low-energy effective theory of band structure. Three different categories, same equation, different regimes and roles.

This is the structural fact the framework's ontology has to honor. Theories aren't atoms. Components-in-regimes are. The unit of admissibility evaluation is the component-in-regime, not the theory.
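One way to picture the component-in-regime unit is as a small record type. The following is a minimal sketch, not part of the framework itself: the names `Category`, `ComponentInRegime`, `DIRAC_ROLES`, and `categories_of` are hypothetical, and reading the SM matter-sector role as "Effective model" is an assumption based on the description above. The condensed-matter role is left out because the text does not assign it one of the five labels.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five kinds of theoretical work named in the ontology section."""
    PURE_MATHEMATICS = "Pure mathematics"
    EXPLORATION = "Mathematical physics exploration"
    THEORY_CONSTRUCTION = "Mathematical physics theory-construction"
    EFFECTIVE_MODEL = "Effective model"
    PHYSICS_PROPER = "Physics proper"


@dataclass(frozen=True)
class ComponentInRegime:
    """The unit of evaluation: one component, in one regime, at one category."""
    component: str
    regime: str
    category: Category


# Two of the Dirac-equation roles described above (category labels for the
# second entry are this sketch's assumption, not a quoted assignment).
DIRAC_ROLES = [
    ComponentInRegime(
        "Dirac equation",
        "flat Minkowski spacetime, free fermions",
        Category.PHYSICS_PROPER,
    ),
    ComponentInRegime(
        "Dirac equation",
        "SM matter sector",
        Category.EFFECTIVE_MODEL,
    ),
]


def categories_of(component, records):
    """Collect every category a nominally single component occupies."""
    return {r.category for r in records if r.component == component}


print(sorted(c.value for c in categories_of("Dirac equation", DIRAC_ROLES)))
```

The point of the structure is that asking "which category is the Dirac equation?" has no single answer; the lookup returns a set, keyed by regime.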

Under this decomposition, QM and the SM are effective models with physics-proper components embedded — not the monolithic "established physics" the category-collapse view treats them as. QM's foundational scaffolding (Born rule, commutators, measurement) is operationally defined. Its derived components (Dirac equation in flat space, spin-statistics, identical-particle statistics) are physics-proper. The SM's content (gauge group choice, generations, Yukawa structure) is fitted. Its derived components (gauge invariance, anomaly cancellation, the structural form of the Higgs mechanism) are physics-proper. The empirical reach of both comes from the combination — derived nuclei doing the structural work, operational scaffolding making the frameworks usable.

This decomposition is not a downgrade. Effective models that work this well are extraordinary intellectual achievements. The decomposition clarifies what kind of achievement they are. The category-collapse view sets an impossible bar for any successor work, which has to improve on a standard that was never actually being met, and it obscures the structural project that would represent genuine progress: finding the physics-proper layer beneath the effective models.

What closure is

A derivation chain closes when it reaches ground rather than trailing off into posits. Closure comes in three modes.

A chain reaches ground when it terminates at named anchors — empirical, structural, or established theory in its domain of validity. A chain defers honestly when it explicitly hands off to a layer the work doesn't yet reach, naming the handoff at that layer; the deferral is the work, not a hidden gap. A chain dissolves the question when it derives that the question presupposed structure that doesn't exist past a certain regime, so the question stops applying.

The third mode is the one most often misread. A theory that derives why the question its critics are asking doesn't survive its own framework is closing the question, not ducking it. Wave Relativity's treatment of QM in curved spacetime below the Planck scale is the worked example: the framework derives that the matter-wave and metric are aspect-projections of a single field Ψ, available in the regime where the WKB factorisation is controlled, and that both projections fail together at the Planck scale because they're projections of the same underlying structure. The question "how do we reconcile QM and curved spacetime at the Planck scale" presupposes that QM and curved spacetime are two separate things needing reconciliation. The framework derives that this presupposition fails past a specific regime, so the question is dissolved rather than deferred.

The three modes are what closure is. The audit (next section) is how we test whether closure has actually occurred.
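The three modes can be read as a classification of how a derivation chain ends. The sketch below is illustrative only: `Terminus`, `ClosureMode`, and `closure_mode` are hypothetical names, and the `UNNAMED_POSIT` case is this sketch's label for a chain that trails off into posits. Deciding which terminus a real chain has is the reader's judgment and is not automated here.

```python
from enum import Enum
from typing import Optional


class Terminus(Enum):
    """How a derivation chain ends (hypothetical labels for this sketch)."""
    NAMED_ANCHOR = "empirical, structural, or established-theory anchor"
    NAMED_HANDOFF = "explicit handoff to a layer the work does not reach"
    DERIVED_DISSOLUTION = "derivation that the question's presupposition fails"
    UNNAMED_POSIT = "assertion with no anchor, handoff, or dissolution"


class ClosureMode(Enum):
    """The three modes of closure described above."""
    REACHES_GROUND = "reaches ground"
    DEFERS_HONESTLY = "defers honestly"
    DISSOLVES_QUESTION = "dissolves the question"


_MODE_BY_TERMINUS = {
    Terminus.NAMED_ANCHOR: ClosureMode.REACHES_GROUND,
    Terminus.NAMED_HANDOFF: ClosureMode.DEFERS_HONESTLY,
    Terminus.DERIVED_DISSOLUTION: ClosureMode.DISSOLVES_QUESTION,
}


def closure_mode(terminus: Terminus) -> Optional[ClosureMode]:
    """Return the closure mode, or None when the chain ends in a posit."""
    return _MODE_BY_TERMINUS.get(terminus)


for t in Terminus:
    mode = closure_mode(t)
    print(t.name, "->", mode.value if mode else "does not close")
```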

The audit — three lenses on the category claim

Now the legs. They are not three separate audits. They are three lenses on one question: is the category claim supported by the work's actual structure? Each lens illuminates a different way the claim could be unsupported.

Leg A — Are the primitives at the right level? Each category implies a particular relationship between the work's primitives and a deeper layer. Effective models have primitives that are operationally defined within a stated regime, with the deeper layer honestly deferred. Physics-proper has primitives derived from a substrate the work commits to. Leg A interrogates the move from "asserted" to "derived" that the claimed category requires. A theory that introduces five undefined densities, operators, gates, or thresholds is not derived — it is asserted, and the category-claim of physics-proper is unsupported. The same work as honest theory-construction or effective model is admissible at the lower category.

Leg B1 — Does the work operate at the level its claimed category implies? Each category implies a level of description. A work claiming to address quantum-mechanical problems has to carry quantum machinery — operator algebra, probability or flux structure, measurement-theoretic content. A work claiming to address QM+GR problems has to carry both, in the regime where they interact. Leg B1 interrogates whether the work has earned operating at the level its claimed category requires. B1 cannot be evaded by self-categorisation. A claim addressing intrinsically QM+GR phenomena (Big Bang dynamics, Hubble tension, Hawking radiation, inflation) inherits B1 from what it explains, not from how it labels itself.

Leg B2 — Is the work correctly positioned relative to established physics? Each category implies a position in the stack of theories. A work claiming to derive what lies beneath the Standard Model cannot treat the Standard Model as primitive in its derivation. A work claiming to be the deeper layer of QM cannot use QM's results as fitted inputs. Leg B2 interrogates whether the work has earned its claimed position-in-the-stack.

The legs are conjunctive at the component level — failure on one leg fails that component at the claimed category. The diagnostic value is in seeing which component fails which lens. The verdict is the decomposition — not a single pass/fail but a structure of which components close at what category.
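The conjunctive rule and the decomposition-as-verdict idea can be sketched as follows. This is a minimal illustration under stated assumptions: `LegJudgment` and `decomposition` are hypothetical names, the boolean fields stand in for a competent reader's judgments on each lens, and the two example components are invented purely to exercise the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegJudgment:
    """A reader's judgments for one component-in-regime at its claimed category."""
    component: str
    claimed_category: str
    leg_a: bool   # primitives at the level the claimed category requires
    leg_b1: bool  # work operates at the level the claimed category implies
    leg_b2: bool  # work correctly positioned relative to established physics

    def failed_legs(self):
        checks = (("A", self.leg_a), ("B1", self.leg_b1), ("B2", self.leg_b2))
        return [name for name, passed in checks if not passed]

    def admissible_as_claimed(self):
        # Conjunctive: a single failed leg fails the component at this category.
        return not self.failed_legs()


def decomposition(judgments):
    """Map each component to its failed legs instead of one pass/fail bit."""
    return {j.component: j.failed_legs() for j in judgments}


# Hypothetical inputs, not drawn from any real theory.
example = [
    LegJudgment("component X", "Physics proper", leg_a=False, leg_b1=True, leg_b2=True),
    LegJudgment("component Y", "Effective model", leg_a=True, leg_b1=True, leg_b2=True),
]
print(decomposition(example))
```

A component that fails at its claimed category may still be admissible when re-audited at a lower one; that would be a second `LegJudgment` with a different `claimed_category`, not a change to the first.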

What the framework isn't

It is not a consensus test. Theories aligned with the mainstream that fail admissibility fail. Loop quantum gravity has institutional infrastructure and fails as physics-proper QG. Sociological position is independent of structural admissibility.

It is not a complexity test. Mathematical density is not derivation closure. A 600-page document of dense formal manipulation can have all its load-bearing content posited. A short clean derivation can close.

It is not a falsification test. Falsification asks whether the theory predicts something specific enough to be checked. Admissibility asks whether the theory has the structure to be predicting at all rather than fitting. A theory can be falsifiable and inadmissible — making specific predictions while having assumed all the structure it claims to derive.

It is not a mathematical-consistency test. A theory can be perfectly consistent as mathematics and structurally inadmissible as physics. The framework runs on the physics claim, including how the mathematics is deployed as a description of reality.

It is not a verdict-producer. Passing means the work is a candidate worth further evaluation in the category it claims. Failing means it isn't yet in the candidate category at the claimed level. The framework tells you what kind of failure is occurring.

The QG project, respecified

The framework reshapes what "quantum gravity" is asking for.

QG is usually framed as "unify QM and GR." Under the decomposition, this framing assumes QM and GR are physics-proper objects to be unified. Most of QM is effective at its foundations; the cosmological-scale and high-curvature regimes of GR are where it breaks down. The unification framing has produced eighty years of work and no closed result, which suggests the framing is itself part of the obstacle.

The component-level specification: QG is the project of identifying the physics-proper layer that produces, as derived consequences,

(a) the currently-derived components of QM: the Dirac equation, spin-statistics, gauge invariance, anomaly cancellation

(b) the currently-derived components of GR: the vacuum field equations from equivalence principle and self-consistency

(c) the currently-effective components, with named regimes of validity: Born rule, commutators, measurement, the cosmological constant, ΛCDM parameters

(d) the currently-open structural items, either as derived consequences or as honest deferrals: back-reaction, the SM gauge group choice, generation count, strong CP problem, hierarchy problem

This specification is testable in a way the unification framing isn't. A candidate QG framework can be evaluated on whether it produces specific items from this enumeration. The structural admissibility of the candidate is the question of whether the components close at the candidate's claimed primitives. Candidates that posit geometry rather than deriving it fail (b). Candidates that posit matter content rather than deriving it fail (a). Candidates that operate only in restricted unphysical regimes don't address the question at all.
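Because the specification is an enumeration, it can be written down as a checklist. The sketch below is one possible reading, not a rule stated in this form: `Outcome`, `SPEC`, and `unmet_items` are hypothetical names, the item strings are copied from the enumeration, and the set of acceptable outcomes per group is this sketch's interpretation of the wording of (a) through (d).

```python
from enum import Enum


class Outcome(Enum):
    DERIVED = "derived consequence"
    DERIVED_WITH_REGIME = "derived, with a named regime of validity"
    HONEST_DEFERRAL = "honest deferral"
    POSITED = "posited"


# group -> (items, outcomes this sketch treats as meeting the specification)
SPEC = {
    "a": (["Dirac equation", "spin-statistics", "gauge invariance",
           "anomaly cancellation"], {Outcome.DERIVED}),
    "b": (["vacuum field equations"], {Outcome.DERIVED}),
    "c": (["Born rule", "commutators", "measurement", "cosmological constant",
           "ΛCDM parameters"], {Outcome.DERIVED_WITH_REGIME}),
    "d": (["back-reaction", "SM gauge group choice", "generation count",
           "strong CP problem", "hierarchy problem"],
          {Outcome.DERIVED, Outcome.HONEST_DEFERRAL}),
}


def unmet_items(candidate):
    """List (group, item) pairs that a candidate omits or closes unacceptably.

    `candidate` maps item name -> Outcome; missing items count as unmet.
    """
    unmet = []
    for group, (items, accepted) in SPEC.items():
        for item in items:
            if candidate.get(item) not in accepted:
                unmet.append((group, item))
    return unmet


# A hypothetical candidate that addresses nothing fails every item.
print(len(unmet_items({})))
```

The output of `unmet_items` is a list, which matches the component-level spirit of the specification: a candidate is described by which enumerated items it fails to produce, not by a single verdict.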

Why this matters now

AI lowers the production cost of work that looks like a candidate solution without being one. The supply of structurally-inadmissible work was always positive; what's changed is that the volume now exceeds what informal community filters can absorb. Reputation against volume, legibility against AI-generated polish — both scale poorly. The implicit filters that previously did structural work as a side effect were always weaker than they looked, and they don't survive being saturated.

The right intervention is not to harden peer review with another step. Peer review is built for adjudicating between admissible candidates and does that imperfectly but genuinely. The right intervention is to make the admissibility framework run cheaply and publicly at a layer before peer review: fast triage on whether a paper has earned the right to be evaluated at all.

The framework will be misused. Any sufficiently sharp tool gets used by people who haven't earned the right to use it. "Fails leg A" can become a pejorative dropped without examination, the way "not falsifiable" became one after Popper. The honest position is that the current system — implicit sociological filters dressed as structural rigor — is already maximally bad. The misused explicit framework at least has failure modes that can be named and contested. The implicit filter is unfalsifiable because it doesn't acknowledge itself as a filter.

The limit on what the framework can do: it needs a reader who can evaluate the answers to its lenses. Leg A requires knowing what counts as a substrate in the domain. Leg B1 requires knowing the level the problems live at. Leg B2 requires knowing the established theory and its regime of validity. In domains where the reader is competent enough to evaluate but doesn't have time to read everything that crosses their desk — which is roughly every working researcher with adjacent inflow — the framework does useful work. In domains where the reader is being asked to evaluate work precisely because they're not yet expert, it doesn't help, and that's the harder problem this doesn't solve.

It will catch the cases where the structural failure is in the text, available to anyone who reads carefully. That is most cases.

What's at the end of this

A field where the categories are explicit, where theories are decomposed component-by-component rather than judged as wholes, and where the QG project has enumerated items rather than a vague unification target. Effective models are evaluated as effective models — extraordinary achievements when they predict well within their regimes, structurally honest when they name their fitted parameters. Mathematical physics exploration is evaluated on whether the math is rich and whether it eventually connects to empirical regimes. Physics-proper claims are evaluated on whether the components close at named anchors with the closure mode appropriate to the chain.

This isn't a smaller field. It's a sharper one, with each piece of work doing the work its category requires and being judged on that. The current state has work in the wrong categories getting evaluated by the wrong metrics, with institutional cover protecting the misclassification. The framework's value is in making the categories visible enough that the misclassification is harder to sustain.

That's the move. Ontology first — what kinds exist, how they decompose, what closure is — then the audit, as three lenses on whether a category claim is honest. Admissibility before correctness; ontology before audit; categories before legs. One framework, one question: has this work earned the right to be evaluated as what it claims to be?
