About Archēglyph

Archēglyph is a research platform for working with scanned documents (newspapers, archival prints, correspondence, manuscripts) at the scale working researchers actually face. It covers the whole arc from raw page image to searchable, clusterable corpus, and it refuses to hide the machinery that gets you there.

Mission

We build tools that treat digital humanities researchers as the experts they are. That means three commitments that show up everywhere in the product:

Transparency. Every extracted text block, every search result, every cluster card names the model that produced it. If an output looks wrong, you can see which model to blame.
Reading-first interfaces. Clusters are presented as quotations, not scatterplots. The goal is to make a researcher's intuition faster, not to teach them UMAP.
One-file datasets. A dataset — its lexical index, its vector index, its metadata — snapshots to a single archive you can download, back up, or share.
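
For readers who think in code, here is a minimal sketch of what a one-file dataset could look like. It is an illustration, not Archēglyph's actual snapshot format: the function names, the member file names (lexical.idx, vectors.idx, metadata.json), and the choice of a gzip-compressed tar archive are all assumptions made for the example.

```python
import io
import json
import tarfile


def snapshot_dataset(lexical_index: bytes, vector_index: bytes, metadata: dict) -> bytes:
    """Pack the three parts of a dataset into one gzip-compressed tar archive.

    Hypothetical layout: the member names below are illustrative only.
    """
    members = {
        "lexical.idx": lexical_index,
        "vectors.idx": vector_index,
        "metadata.json": json.dumps(metadata, sort_keys=True).encode("utf-8"),
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # tar needs the size before the data is written
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def restore_dataset(snapshot: bytes) -> dict:
    """Read every member of a snapshot back into memory, keyed by member name."""
    with tarfile.open(fileobj=io.BytesIO(snapshot), mode="r:gz") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
        }
```

Calling snapshot_dataset(b"...", b"...", {"name": "letters", "ocr_model": "hypothetical-ocr-v1"}) returns a single blob of bytes that can be written to disk, backed up, or shared, and restore_dataset recovers the three parts from it. Recording the producing model in the metadata, as in that example, is one way a snapshot could carry its provenance with it.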

Who runs it

Archēglyph is a small project led by Dipankar, developed in conversation with working researchers: historians, philologists, intellectual historians, archivists, and librarians. See the team page for current members.

Contact

Email [email protected] for research partnerships, technical questions, or access to the private beta.

Where to next

The pipeline — what happens from upload to cluster.
Your first dataset — the hands-on path in.
Transparency is a feature — why provenance shows up on every output.