Archēglyph

Chunk

The unit of analysis inside Archēglyph — a passage of several sentences, small enough for an embedding model to handle in one pass.


The unit of analysis inside Archēglyph. A chunk is a passage of text, usually several sentences long, sized to fit comfortably within an embedding model's input limit while still carrying enough context to be meaningful on its own. Documents are split into chunks; chunks are then embedded, clustered, and indexed.
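As a rough illustration of how a document might become chunks, the sketch below greedily packs whole sentences into chunks under a size budget. It is not Archēglyph's actual chunker. The naive regex sentence splitter, the `chunk_document` name, and the `max_words` budget (counting words as a stand-in for model tokens) are all assumptions.

```python
import re


def chunk_document(text: str, max_words: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Hypothetical sketch: a sentence longer than max_words becomes its own
    (oversized) chunk rather than being split mid-sentence.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would push it over budget.
        if current and current_words + n > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n

    # Flush whatever is left as the final chunk.
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = "The ship sailed at dawn. The crew was tired. A storm rose by noon. Nobody slept."
for chunk in chunk_document(doc, max_words=10):
    print(chunk)
```

Because this sketch never splits mid-sentence, a single very long sentence can exceed the budget; a real chunker would need a fallback for that case.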

Why it matters for your research. Much of how Archēglyph “thinks” about your corpus (clusters, search results, neighbour passages) operates at the level of chunks. Chunks that are too small lose context; chunks that are too large lose precision. The default sizing is tuned for prose, and researchers rarely need to change it.

In Archēglyph. Chunks are stored in the bundle’s metadata SQLite database, alongside their parent document and region. Every chunk also carries its own model provenance.
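One plausible shape for that storage, sketched with Python's built-in `sqlite3` module. The table names, the column names, the `open_bundle_db` and `add_chunk` helpers, and the split of provenance into `model_name` and `model_version` are all hypothetical; this is not Archēglyph's real schema.

```python
import sqlite3

# Hypothetical schema: every name here is illustrative only.
SCHEMA = """
CREATE TABLE documents (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE chunks (
    id            INTEGER PRIMARY KEY,
    document_id   INTEGER NOT NULL REFERENCES documents(id),
    region        TEXT,          -- e.g. a page or section label
    text          TEXT NOT NULL,
    model_name    TEXT NOT NULL, -- provenance: which model processed the chunk
    model_version TEXT NOT NULL
);
"""


def open_bundle_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite only enforces REFERENCES constraints when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_chunk(conn: sqlite3.Connection, document_id: int, region: str,
              text: str, model_name: str, model_version: str) -> int:
    """Insert one chunk with its parent document, region and provenance."""
    cur = conn.execute(
        "INSERT INTO chunks (document_id, region, text, model_name, model_version) "
        "VALUES (?, ?, ?, ?, ?)",
        (document_id, region, text, model_name, model_version),
    )
    return cur.lastrowid
```

Keeping provenance on each chunk row, rather than once per bundle, means chunks embedded by different models can coexist and still be told apart.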

Not to be confused with. A document is the source artefact (one PDF, one issue, one letter); a chunk is a slice of that document. Search returns chunks, not documents, each with a breadcrumb back to its parent document.
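A toy sketch of what chunk-level search results with a breadcrumb could look like. The `Document`, `Chunk`, `SearchHit` and `search` names and the breadcrumb format are hypothetical, and plain case-insensitive substring matching stands in for Archēglyph's real retrieval.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    title: str  # e.g. "Letter to J. Smith, 1843"


@dataclass
class Chunk:
    chunk_id: int
    doc_id: int  # link back to the parent document
    text: str


@dataclass
class SearchHit:
    chunk: Chunk
    breadcrumb: str  # human-readable path back to the source document


def search(query: str, chunks: list[Chunk],
           documents: dict[int, Document]) -> list[SearchHit]:
    """Return matching chunks (not documents), each with a breadcrumb."""
    q = query.lower()
    return [
        SearchHit(chunk=c,
                  breadcrumb=f"{documents[c.doc_id].title} > chunk {c.chunk_id}")
        for c in chunks
        if q in c.text.lower()
    ]
```

One document can contribute several hits, which is the practical difference between searching chunks and searching documents.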

