Archeglyph
· method · citation · epistemology

The citable-claim test

A simple test for whether a research tool produces output you can defend in a footnote: can you, in one click, see the page the claim came from? If not, the tool is for exploration, not for scholarship.

By Maitrayee Roychoudhury

On this page
  1. What the test actually checks
  2. Why this is a structural property, not a quality knob
  3. What the test looks like in Archeglyph
  4. Where this fits

There’s a quiet test I run on every new “AI for research” tool that crosses my desk, and it cuts through the marketing copy in about thirty seconds. It’s this:

Pick a single sentence in the tool’s output. Can you, in one click, see the page in the original source the sentence came from?

If yes, the tool is a candidate for citable scholarship. If no — if you have to manually re-search to verify it, or if the “source link” points at a paragraph that doesn’t quite say what the tool’s sentence claims — the tool is for exploration. Both are useful. They are not interchangeable.

This is the test we built Archeglyph to pass.

What the test actually checks

A footnote in academic writing makes a quiet but specific promise: if you go look at the cited source on the cited page, you will find the thing I am attributing to it. That’s the contract scholarship runs on. Peer review enforces it; tenure committees notice when it breaks; entire careers turn on whether the contract holds.

The citable-claim test asks: does this tool let me make that promise about anything I quote or summarise from its output?

Three failure modes are common in current “AI research” tooling:

Failure 1 — The source exists but doesn’t say it. The tool produces a sentence and footnotes it to a real document. You click through. The document is real. The page exists. But the sentence the tool composed isn’t quite what the source says. It might have shifted a tense, dropped a hedge, conflated two adjacent claims, or attributed a quotation to the wrong speaker. You only catch it if you read the source carefully — at which point the tool has saved you no time.

Failure 2 — The source link is to the document, not the page. The citation reads “Smith 2018”, but the document is 280 pages long. To verify, you read the whole thing or full-text search for the claim. Often the search misses because the tool paraphrased.

Failure 3 — There is no source. The tool produced a confident assertion with no citation at all. This is the cleanest failure because it’s obvious; it’s also the most common in casual chatbot output.

A tool that passes the citable-claim test avoids all three by construction: it shows you fragments from the source instead of generating new prose about them.

Why this is a structural property, not a quality knob

You can’t get a generative system to pass this test by being more careful. You can lower its failure rate with better retrieval, better prompting, better grounding — and the best deep-research agents have lowered it considerably. But the structure is still:

sources → model → new prose → footnote → sources

There’s a loss-of-fidelity step in the middle. The new prose is about the sources; it isn’t from them. Whether that loss is 5% or 35% depends on the day, the question, and the model. It’s never zero, because the model’s job is to write something new.

The structural alternative is:

sources → indexed fragments → fragments shown → click → source

No new prose in the middle. The tool’s job is to make the existing text findable, not to compose new text on top of it. The fragments the researcher reads are the sources, sliced, indexed, surfaced with context. You can quote them word-for-word; the audit trail is trivially short.
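The fragments-not-prose structure can be sketched in a few lines. This is an illustrative toy, not Archeglyph’s actual schema: the table layout, column names, and sample row are all assumptions, chosen only to show that a full-text hit can carry its own (document, page) coordinates back to the reader.

```python
import sqlite3

# Hypothetical fragment index: a full-text table whose rows ARE the
# source text, each carrying the coordinates needed for a footnote.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE VIRTUAL TABLE chunks USING fts5(
        text,            -- the fragment, verbatim from the source
        doc UNINDEXED,   -- document identifier
        page UNINDEXED   -- page number for the one-click jump back
    )
""")
con.execute(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    ("The minister denied the allegation on record.", "figaro-1924-01-11", 3),
)

# The search returns stored text, never generated prose; snippet()
# only wraps the matched terms, it does not compose anything new.
hits = con.execute(
    "SELECT snippet(chunks, 0, '<em>', '</em>', '…', 12), doc, page "
    "FROM chunks WHERE chunks MATCH ?",
    ("denied",),
).fetchall()
for snip, doc, page in hits:
    print(f"{snip}  →  {doc}, p. {page}")
```

The point of the sketch is that provenance is a column, not an afterthought: there is no code path that can surface a fragment without its (doc, page) pair attached.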

This is why the citable-claim test discriminates so cleanly: it’s asking which structure the tool uses, and the structure cannot be faked.

What the test looks like in Archeglyph

Open a search result. The snippet you see is the actual chunk text from the bundle’s SQLite database, with the matched terms wrapped in <em> tags. Click it. You land on the review page for the document the chunk came from, scrolled to the region. The bbox is highlighted on the source image. A ProvenanceBadge shows the OCR engine that read it and the date. If you don’t trust the OCR, run it again with a different engine on that one region; the system keeps the audit trail.
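The provenance record behind that badge can be imagined as a small value type. Again, every name here is hypothetical — Archeglyph’s real internals will differ — but the shape illustrates the key design choice the text describes: re-running OCR appends, it never overwrites.

```python
from dataclasses import dataclass

# Illustrative sketch of a per-region provenance record; field names
# are assumptions, not Archeglyph's actual schema.
@dataclass(frozen=True)
class RegionProvenance:
    doc_id: str
    page: int
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) on the source image
    engine: str                      # OCR engine that read this region
    read_at: str                     # ISO date of the OCR run

# A second OCR pass adds a new record; the old one stays, so the
# audit trail records every engine that ever read this region.
history = [
    RegionProvenance("figaro-1924-01-11", 3, (120, 840, 560, 910),
                     "tesseract-5", "2024-03-01"),
    RegionProvenance("figaro-1924-01-11", 3, (120, 840, 560, 910),
                     "kraken-4", "2024-06-14"),
]
current = history[-1]  # the badge shows the latest engine and date
print(current.engine, current.read_at)
```

An append-only history is what makes “I don’t trust this OCR run” a recoverable state rather than a destructive one: you can always see what the earlier engine read and when.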

Open a cluster. The exemplar quotations on the card are real chunks from the corpus. Click one — same trip back to the source page.

Now write your footnote: “Le Figaro, 11 January 1924, p. 3” — exactly what the chunk’s source-link gave you. The page is on disk; you (and your reader) can return to it forever.

That’s all the test asks. The tool either does this or it doesn’t.

Where this fits

Tools that pass the citable-claim test are good for the writing end of research — the parts where your name goes on the claim. Tools that fail are good for the exploring end — getting up to speed on a topic, finding sources you didn’t know existed, surfacing unexpected angles.

A serious research workflow probably uses both. The mistake is assuming they’re substitutes. They’re not. They’re complementary tools at different stages of the same workflow, and a tool that’s honest about which end it serves is a tool that respects how scholarship actually works.

We built Archeglyph for the writing end on purpose. Every choice in the pipeline — preserving the source image, recording the engine on every region, keeping the cluster card text-first, refusing to add a “summarise this corpus” button — is downstream of one commitment: everything you read in this tool, you can defend in a footnote.

If that’s the test you care about, you’ll find Archeglyph passes it. If you need a tool that synthesises, that’s a different tool, and we won’t pretend to be it.