# Archeglyph

> Reliable, interpretable tools for researchers working with image and PDF corpora.

## Start here

- https://www.archeglyph.com/about
- https://www.archeglyph.com/guides/pipeline
- https://www.archeglyph.com/articles/transparency-is-a-feature

## Recent public datasets

- https://www.archeglyph.com/datasets — the public directory (no datasets yet).

## Articles

- https://www.archeglyph.com/articles/choosing-an-embedding-model-for-dh — Choosing an embedding model for digital humanities
- https://www.archeglyph.com/articles/downstream-of-trove — Downstream of Trove: where analysis fits in the corpus stack
- https://www.archeglyph.com/articles/orthogonal-to-llm-deep-research — Orthogonal to LLM 'deep research'
- https://www.archeglyph.com/articles/reading-clusters-as-a-researcher — Reading clusters as a researcher
- https://www.archeglyph.com/articles/the-citable-claim-test — The citable-claim test
- https://www.archeglyph.com/articles/transparency-is-a-feature — Transparency is a feature
- https://www.archeglyph.com/articles/vlm-vs-ocr-when-to-pick-what — VLM vs OCR: when to pick what
- https://www.archeglyph.com/articles/what-a-good-provenance-badge-looks-like — What a good provenance badge looks like
- https://www.archeglyph.com/articles/why-archeglyph-cannot-hallucinate — Why Archeglyph cannot hallucinate
- https://www.archeglyph.com/articles/why-we-snapshot-per-dataset — Why we snapshot per dataset

## Guides

- https://www.archeglyph.com/guides/exporting-and-archiving-a-dataset — Exporting and archiving a dataset
- https://www.archeglyph.com/guides/ocr-vs-vlm — OCR vs VLM: a practical chooser
- https://www.archeglyph.com/guides/reviewing-a-noisy-scan — Reviewing a noisy scan
- https://www.archeglyph.com/guides/pipeline — The pipeline
- https://www.archeglyph.com/guides/first-dataset — Your first dataset

## Feeds

- https://www.archeglyph.com/rss.xml
- https://www.archeglyph.com/sitemap-index.xml