# Archeglyph

> Reliable, interpretable tools for researchers working with image and PDF corpora.

## Start here

- https://www.archeglyph.com/about
- https://www.archeglyph.com/guides/pipeline
- https://www.archeglyph.com/articles/transparency-is-a-feature

## Recent public datasets

- https://www.archeglyph.com/datasets — the public directory (no datasets yet).

## Articles

- https://www.archeglyph.com/articles/choosing-an-embedding-model-for-dh — Choosing an embedding model for digital humanities
- https://www.archeglyph.com/articles/downstream-of-trove — Downstream of Trove: where analysis fits in the corpus stack
- https://www.archeglyph.com/articles/orthogonal-to-llm-deep-research — Orthogonal to LLM 'deep research'
- https://www.archeglyph.com/articles/reading-clusters-as-a-researcher — Reading clusters as a researcher
- https://www.archeglyph.com/articles/the-citable-claim-test — The citable-claim test
- https://www.archeglyph.com/articles/transparency-is-a-feature — Transparency is a feature
- https://www.archeglyph.com/articles/vlm-vs-ocr-when-to-pick-what — VLM vs OCR: when to pick what
- https://www.archeglyph.com/articles/what-a-good-provenance-badge-looks-like — What a good provenance badge looks like
- https://www.archeglyph.com/articles/why-archeglyph-cannot-hallucinate — Why Archeglyph cannot hallucinate
- https://www.archeglyph.com/articles/why-we-snapshot-per-dataset — Why we snapshot per dataset

## Guides

- https://www.archeglyph.com/guides/exporting-and-archiving-a-dataset — Exporting and archiving a dataset
- https://www.archeglyph.com/guides/ocr-vs-vlm — OCR vs VLM: a practical chooser
- https://www.archeglyph.com/guides/reviewing-a-noisy-scan — Reviewing a noisy scan
- https://www.archeglyph.com/guides/pipeline — The pipeline
- https://www.archeglyph.com/guides/first-dataset — Your first dataset

## Feeds

- https://www.archeglyph.com/rss.xml
- https://www.archeglyph.com/sitemap-index.xml