Archeglyph
· positioning · method · llm · research-workflow

Orthogonal to LLM 'deep research'

Deep-research agents synthesise. Archeglyph indexes. They are different products solving different problems for different research workflows. Knowing which you need keeps your citations defensible.

By Dipankar Sarkar

On this page
  1. What a deep-research agent does
  2. What Archeglyph does
  3. The orthogonal axis
  4. Where they actually meet (and where they don’t)
  5. Why this matters now

There is a category of products getting a lot of attention right now — “deep research” agents. They take a question, search the web (or a private collection), read a few dozen sources, and produce a written synthesis with footnote-style citations. Perplexity Research, ChatGPT Search, Claude’s research mode, Gemini Deep Research, and a long tail of vertical clones all sit in this category.

Researchers often ask us how Archeglyph compares. The honest answer is we don’t, because we’re not in that category. Deep-research agents and Archeglyph aren’t on the same axis. They solve different problems for different stages of research. This article tries to draw the orthogonal-axis distinction clearly, because choosing the wrong tool for the work in front of you is expensive.

What a deep-research agent does

A deep-research agent compresses many sources into one written answer. Workflow:

  1. Researcher asks a question.
  2. Agent reads several dozen sources it selected.
  3. Agent writes a synthesis — usually 800–3,000 words — with inline citations.
  4. The footnotes link back to the sources.

This is genuinely useful. For an early-stage scoping pass on an unfamiliar topic, or when you need a credible briefing in fifteen minutes, the workflow saves real time. The footnotes are a meaningful improvement over the previous generation of chatbots that gave you no sources at all.

But the output is a written paragraph that the agent composed. The sources fed it; it digested them; it produced new prose. The chain of custody between any specific sentence in the output and any specific sentence in a source is fragile in ways that are subtle and adversarial:

  • The cited source frequently doesn’t say quite what the synthesis claims. (Multiple recent audits put this rate at 15–40% on real research questions.)
  • A claim with no clear source in the cited material gets attributed anyway, often plausibly enough that nobody checks.
  • The selection of sources is itself opaque — why these forty, and not the other forty?

For exploratory work, this is acceptable. For citable work, it isn’t. A researcher writing a footnote needs to point at a specific page on a specific date. A philologist needs the passage as it actually appears in the manuscript, not a paraphrase that may have introduced a tense or a hedge.

What Archeglyph does

Archeglyph compresses one large corpus into a navigable index. Workflow:

  1. Researcher uploads a corpus they already chose. (We don’t pick sources for you. This is the scholar’s job, not ours.)
  2. Archeglyph reads each page (OCR or VLM, your choice, always disclosed). The text we extract is the text on the page; we show you the bbox.
  3. We index it for full-text search and group it semantically into clusters of related fragments.
  4. The output is a browsable view of the corpus: search returns real chunks, clusters surface real exemplar quotations, every fragment links back to the source page.

There is no “synthesis” anywhere in that pipeline. The deepest generative thing we do is name a cluster — “Migrations across the Bosphorus” — and even that comes with a “this is an LLM-generated label” badge, with the actual quotations beneath it.
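The cluster-labelling step described above can be sketched the same way. Again, the class and field names here are hypothetical, not the real schema; what the sketch shows is that the generated label is the only non-verbatim field in the structure, and it is flagged so the UI can badge it, with the actual quotations kept verbatim beneath it.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    label: str                # LLM-generated name, e.g. "Migrations across the Bosphorus"
    label_is_generated: bool  # True for LLM labels; rendered as a badge in the UI
    exemplars: list           # verbatim quotations from the source pages (strings)

def render(cluster):
    """Badge the generated label, then quote the real text beneath it."""
    badge = " [LLM-generated label]" if cluster.label_is_generated else ""
    lines = [cluster.label + badge]
    lines += [f'  "{text}"' for text in cluster.exemplars]
    return "\n".join(lines)

c = Cluster("Migrations across the Bosphorus", True,
            ["Families crossed in the spring of that year."])
print(render(c))
```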

The orthogonal axis

Axis                           Deep-research agents          Archeglyph
Input                          A question                    A corpus
Output                         A written synthesis           A navigable index
Source selection               The agent picks               You picked when you uploaded
What the user reads            Generated paragraphs          Real fragments, in context
Provenance granularity         Footnote per paragraph        Region + page + bbox per fragment
Confidence in any single line  Probabilistic                 Deterministic (it’s the OCR’d text)
Audit cost                     Manual re-reading per claim   One-time per region, then reusable
Best for                       Exploration, briefings        Citable scholarship, longitudinal analysis

These are not competing rows. They’re different jobs.

Where they actually meet (and where they don’t)

In a complete research workflow, both can have a place:

  • Scoping (early) — a deep-research agent is a fast way to find out what conversation you’re walking into. Read its synthesis knowing it might mislead, then go read the actual sources.
  • Working with primary corpora (the middle) — a tool that lets you read your archive without reading it cover-to-cover. This is the hole Archeglyph fills.
  • Writing (later) — your own synthesis, with citations to the primary sources you found via Archeglyph’s search and clusters. Footnotes that point at page numbers, not at AI outputs.

What deep-research agents are not good at is being trusted at the writing end of that pipeline. The hallucination rate is high enough that any claim drawn from one needs to be re-verified against the original, at which point the labour-saving has been transferred rather than removed: you’re now reading the original anyway.

What Archeglyph is not trying to do is the synthesis. We have no plans to add a chat interface, no plans to write summaries on your behalf, no plans to “answer the research question”. A different product can do that. We want to be the tool you cite from.

Why this matters now

The noise floor of “AI research tools” is rising fast. A new product launches every week claiming to do “everything for the researcher”. Most of them are minor variations on the same generative pattern: an LLM, a vector store, a chat interface, a hand-wave at hallucination. The pattern isn’t bad — it’s just one workflow. The danger is that researchers begin to assume this is what AI does to research: it synthesises, it asserts, it occasionally invents.

We think there’s an entire other axis to build along, and it’s the one humanities scholarship has always run on: read the source, ground the claim, cite the page. Make that easier — much easier — without changing what counts as a claim. Don’t synthesise on the researcher’s behalf. Don’t replace judgment. Amplify the corpus, keep the interpretation human.

That’s the orthogonal axis. We’re betting it matters.