Archēglyph

Embedding model

The specific neural network that turns a chunk of text into its embedding vector. Different embedding models produce different geometries from the same text.

Last updated

The specific neural network that turns a chunk of text into an embedding vector. Different embedding models trained on different corpora produce subtly different geometries — and therefore different nearest neighbours, different clusters, and different search results from the same source text.

Why it matters for your research. Picking an embedding model is a research choice, not a default. A model trained mostly on modern web text may handle 19th-century diction oddly; a multilingual model may be worse at subtle English than a monolingual one. It’s worth running a small pilot on your actual corpus before committing.

In Archēglyph. Selectable per dataset, and the choice is recorded in the dataset note. See the article Choosing an embedding model for DH.

Not to be confused with. Generative models that produce text — embedding models produce vectors, never prose.

Related terms

References

← Back to the glossary