Archeglyph
· transparency · ux · design

What a good provenance badge looks like

UX writing about the transparency contract: what goes inside the badge, what gets omitted, and why the re-run affordance lives next to it. With ASCII mockups of the patterns we use in the review screen, search results, and cluster cards.

By Dipankar

On this page
  1. Anatomy
  2. Where badges appear
  3. In the review pane
  4. In search results
  5. In cluster cards
  6. What we decided not to do
  7. The re-run affordance
  8. The transparency contract, stated plainly

If a provenance badge is a promise that “this specific output was produced by this specific engine,” the badge has to be readable without training. It has to answer three questions at a glance: what model, what version, and can I try another? It has to do that in a row of search results without ballooning the row. And it has to mean the same thing whether it sits beside an extracted paragraph, a cluster title, or a vector search hit.

We have iterated on the badge several times during M0. These notes describe where it landed and why.

Anatomy

A badge is three fields rendered as one pill:

┌─────────────────────────────────────────┐
│ qwen3-vl:235b-cloud · v2025.03 · 02:14 │
└─────────────────────────────────────────┘
  • Engine id. The left-most field is the stable identifier used across the catalogue. Tesseract reads as tesseract, a VLM reads as its full Ollama tag. We never shorten qwen3-vl:235b-cloud to qwen — abbreviation was one of the first temptations and one of the first rejections, because “qwen” alone is not a citable reference.
  • Version. For binary engines (Tesseract) this is the upstream semver. For cloud-backed VLMs this is a date tag that we reconcile nightly against the provider. If the provider does not expose a version, we surface the date we first observed that model id in our engine catalogue.
  • Timestamp. The HH:MM at which this specific block was produced. Not the full ISO-8601 timestamp (which would clutter the pill), but enough to disambiguate the pre-review output from a re-run.

A badge never carries confidence scores. Confidence is useful in the review pane and on the advanced panel of a cluster card, but folding it into the badge would pressure readers to treat it as the headline number, and the headline of a provenance badge is who produced this, not how sure they were.
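
The anatomy above can be sketched as a tiny value type. This is an illustrative sketch, not our actual implementation; the names `Badge` and `render` are assumptions, but the three fields and the deliberate absence of a confidence field match the rules described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Badge:
    """The three fields of a provenance badge, rendered as one pill."""
    engine_id: str   # full catalogue id, never abbreviated
    version: str     # upstream semver, or a date tag for cloud VLMs
    hhmm: str        # HH:MM of production, enough to tell a re-run apart

    def render(self) -> str:
        # Confidence is deliberately not a field here: it belongs to the
        # region tint and the review pane, not to the attribution pill.
        return f"[ {self.engine_id} · {self.version} · {self.hhmm} ]"


badge = Badge("qwen3-vl:235b-cloud", "v2025.03", "02:14")
print(badge.render())  # → [ qwen3-vl:235b-cloud · v2025.03 · 02:14 ]
```

Making the type frozen mirrors the contract: a badge describes what happened, so nothing should mutate it after the fact.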

Where badges appear

In the review pane

┌─ Region 14 ──────────────────────────────────────────────────────┐
│ "reported from the wharves of Galata that the Russian           │
│  steamer..."                                                    │
│                                                                 │
│ [ tesseract · 5.3.0 · 02:14 ]  [ accept ]  [ re-run with ⌄ ]    │
└──────────────────────────────────────────────────────────────────┘

The badge sits on the same row as the accept and re-run controls because those three things compose one decision: I have seen what produced this, I know my options, I choose to accept or rework. If the badge were in a tooltip, the action would lose the attribution that justifies it.

In search results

 #42  p=0.812   Document 117, p.3                                 
 "…reported from the wharves of Galata that the Russian steamer…" 
 [ tesseract · 5.3.0 ]  [ embed: bge-small-en-v1.5 ]              

Search results have two badges: the engine that extracted the text, and the model that embedded the chunk. We show both because a user comparing two search results can form a legitimate hypothesis like “the MiniLM rows rank differently from the BGE rows” only if both badges are visible side by side.
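
The two-badge rule means a result row carries both attributions as data, not as an afterthought of the template. A minimal sketch, with hypothetical names (`SearchHit`, `badges`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """One search result row: the text plus both models that touched it."""
    snippet: str
    extract_engine: str   # engine that produced the text, e.g. tesseract
    embed_model: str      # model that embedded the chunk for retrieval

    def badges(self) -> str:
        # Both badges render side by side, so rows retrieved via
        # different embedding models can be compared at a glance.
        return f"[ {self.extract_engine} ]  [ embed: {self.embed_model} ]"


hit = SearchHit("…the wharves of Galata…", "tesseract", "bge-small-en-v1.5")
print(hit.badges())  # → [ tesseract ]  [ embed: bge-small-en-v1.5 ]
```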

In cluster cards

┌─ Migrations across the Bosphorus ────────────────────────────────┐
│ Fourteen fragments, mostly port reporting from 1897–1901.       │
│ — theme_llm: gemma3:27b-cloud                                   │
│                                                                 │
│ "the wharves of Galata..."         — Doc 117, p.3  (tesseract)  │
│ "steamers inward bound..."         — Doc 204, p.1  (tesseract)  │
│ "lo riferiva il console..."        — Doc 91,  p.2  (qwen3-vl)   │
└──────────────────────────────────────────────────────────────────┘

On a cluster card, the theme-writing model is badged at the top of the card and each exemplar carries its extraction engine. The rule is that every human-readable string the product did not author with a keyboard has a badge somewhere within a one-glance radius.

What we decided not to do

  • No “AI generated” disclaimer. A badge that says qwen3-vl:235b-cloud is a piece of scholarly apparatus. A banner that says “generated by AI” is a legal posture. In an early prototype we made the mistake of bolting both on; readers ignored the banner entirely and dismissed the badge as redundant. We kept the badge.
  • No colour-coded risk. We tried a green/amber/red scheme where high-confidence extractions got a muted badge and low-confidence ones got a warn tint. Reviewers read the colour as a judgement on the engine rather than the region, and argued with it. We moved confidence to the region tint instead, where it belongs.
  • No vendor logos. A badge is text. Logos turn provenance into branding, and the moment a researcher sees a logo they stop treating the badge as information and start treating it as an endorsement.

The re-run affordance

The badge is paired with a re-run with… trigger that opens a popover. The popover is split into two tabs (OCR, VLM) with the current engine pre-selected and greyed out. Re-running produces a new row in the region’s history; the badge swaps to the new engine id but the previous row is still available from the row-history disclosure on the left edge of the region card.
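
The append-only history behind that behaviour can be sketched as follows. This is a simplified model, assuming Python and invented names (`Row`, `Region`, `rerun`); the 02:31 timestamp on the second row is illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Row:
    engine_id: str
    version: str
    hhmm: str
    text: str


@dataclass
class Region:
    """A region keeps every extraction it has seen. The badge always
    reflects the newest row; older rows stay reachable in history."""
    history: List[Row] = field(default_factory=list)

    @property
    def current(self) -> Row:
        return self.history[-1]

    def rerun(self, row: Row) -> None:
        # A re-run appends; it never overwrites. The previous row stays
        # available from the row-history disclosure on the region card.
        self.history.append(row)


region = Region([Row("tesseract", "5.3.0", "02:14", "reported from the wharves...")])
region.rerun(Row("qwen3-vl:235b-cloud", "v2025.03", "02:31", "reported from the wharves of Galata..."))
assert region.current.engine_id == "qwen3-vl:235b-cloud"
assert len(region.history) == 2  # the tesseract row is still there
```

Keeping history append-only is what lets the badge swap to the new engine id without the old attribution ever being lost.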

The re-run button is never the default. In the review pane, Accept is the large button; re-run with… is a secondary. In search results and cluster cards, the badge is purely informational and the re-run affordance is gated behind clicking through to the review pane. We resisted every design iteration where a researcher could re-run a region from a search result, because the cost of mis-clicking a re-run in a scanning view is two minutes of compute and a brief jitter in their own mental model of the dataset.

The transparency contract, stated plainly

What the badge promises:

  1. Every textual output in the product carries, on the same screen, an attribution to the engine that produced it.
  2. Engine ids are stable: what appears in one snapshot resolves to the same model identity in every future snapshot.
  3. Every output paired with a badge has a re-run path that is one or two clicks away.

What the badge does not promise:

  1. That the engine is correct.
  2. That the engine’s weights will remain available upstream.
  3. That we have any editorial opinion about the engine’s output.

The badge is a pointer, not an endorsement. That is the whole shape of the transparency contract, and it is the reason we obsess about the pixels.