Glossary

Generative AI

What people usually mean by "AI" in 2026 headlines — and how it differs from what Archēglyph does.

Fine-tuning

Taking a pre-trained model and continuing to train it on a narrower dataset so its behaviour shifts towards that domain.

Generative AI

Any model whose output is newly produced content — text, image, audio — rather than a classification, ranking, or extracted span.

Hallucination

A generative model's output that is fluent, confident, and wrong.

Large language model (LLM)

A very large neural network trained to predict the next token of text — the thing meant by 'AI' in most 2020s headlines.
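
To make "predict the next token" concrete, here is a deliberately tiny stand-in: a bigram counter that predicts the word most often seen after the current one. It is an illustrative sketch, not how an LLM is built. A real model replaces the count table with a neural network over subword tokens and looks at far more than one preceding token. The function names and the sample sentence are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each token, which tokens follow it in the text."""
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    counts = follows.get(token.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up sample text, chosen so that no two followers tie.
corpus = "the scribe copied the charter and the scribe copied the seal"
model = train_bigram(corpus)

print(predict_next(model, "the"))        # most common word after "the"
print(predict_next(model, "parchment"))  # unseen word, so no prediction
```

Generating text is this prediction step repeated: append the predicted token, then predict again from the longer sequence.
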

Prompt

The text you hand to a generative model to shape its output — instructions, context, worked examples, and the user's question.
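
The four ingredients named above can be assembled with plain string handling. The sketch below is one possible layout, not a required format: the `Context:`, `Q:` and `A:` labels and the `build_prompt` helper are assumptions made for illustration, and different models respond better to different conventions.

```python
def build_prompt(instructions, context, examples, question):
    """Join instructions, context, worked examples and the question into one string."""
    parts = [instructions.strip()]
    if context:
        parts.append("Context:\n" + context.strip())
    # Each worked example is a (question, answer) pair shown before the real question.
    for example_question, example_answer in examples:
        parts.append(f"Q: {example_question}\nA: {example_answer}")
    # The final answer slot is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    instructions="Answer in one sentence, using only the context.",
    context="A corpus is a collection of texts assembled for study.",
    examples=[("What is a token?", "A unit of text the model reads.")],
    question="What is a corpus?",
)
print(prompt)
```

Everything in the returned string reaches the model as a single text, so the order and wording of each part shape the output.
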

Retrieval-augmented generation (RAG)

A pattern where a generative model is first fed the results of a search over your documents, then asked to write an answer grounded in what it retrieved.
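
The two halves of the pattern, retrieve then ground, can be sketched without any model at all. The example below ranks documents by shared words and builds the prompt a generative model would receive. It is a minimal sketch under stated assumptions: word overlap stands in for a real search method such as BM25 or vector search, the documents are invented sample data, the helper names are hypothetical, and the final call to a model is left out.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by how many query words they share and keep the top k."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]   # drop documents with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [doc for _, doc in scored[:k]]

def build_grounded_prompt(query, passages):
    """Number the retrieved passages and ask for an answer based only on them."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"{numbered}\n\nQuestion: {query}\nAnswer:"
    )

# Invented sample documents, for illustration only.
documents = [
    "The charter was sealed at Winchester in 1141.",
    "Topic modelling groups documents by shared vocabulary.",
    "The scribe who copied the charter is unnamed.",
]
query = "Where was the charter sealed?"
passages = retrieve(query, documents)
prompt = build_grounded_prompt(query, passages)
print(prompt)
```

The generative model only sees what retrieval returns, so a weak search step limits the answer no matter how capable the model is.
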

Training vs inference

Training is how a model learns: its weights are adjusted against data, usually once and at high cost. Inference is how the trained model is used: each prediction is a forward pass over frozen weights.
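
The split is easiest to see in a model with a single weight. In the sketch below, `train` is the only place the weight changes, and `infer` only reads it. This is a toy illustration with made-up data and hypothetical function names; real models have millions or billions of weights, but the division of labour is the same.

```python
def train(pairs, epochs=200, learning_rate=0.01):
    """Fit y = w * x by gradient descent on squared error. The weight changes here."""
    w = 0.0
    for _ in range(epochs):
        for x, y in pairs:
            error = w * x - y
            w -= learning_rate * 2 * error * x  # gradient of (w*x - y)**2 with respect to w
    return w

def infer(w, x):
    """One forward pass: the weight is read and never updated."""
    return w * x

# Made-up training data in which y is always twice x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = train(data)                   # the expensive step, run once
print(round(w, 3))                # the learned weight, close to 2.0
print(round(infer(w, 10.0), 2))   # a cheap prediction using the frozen weight
```

Calling `infer` any number of times leaves `w` untouched, which is why a trained model can be shared and reused without repeating the training run.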

Machine learning basics

The vocabulary that shows up once per article: model, embedding, transformer, token.

Language techniques

Specific NLP methods you will see referenced in DH work — NER, stylometry, topic modelling.

Search and organisation

How a corpus gets indexed, ranked, clustered, and visualised.

Digital humanities

Terms that predate the computational turn and still carry the argument — corpus, provenance, diachronic.

Archēglyph vocabulary

Names for things that are specific to this platform — bundle, versioning, analysis plugin.

Machine learning basics

Embedding

Embedding model

Layout analysis

Model

Tokenization

Transformer

Vision-language model

Language techniques

Language detection

MinHash + LSH

Named-entity recognition

Sentence segmentation

Stylometry

TF-IDF

Topic modelling

Search and organisation

BM25

Clustering

Cosine similarity

Extractive question answering

HDBSCAN

OCR

Semantic search

UMAP

Vector search

Digital humanities

Citation extraction

Co-occurrence

Corpus

Diachronic analysis

Provenance

Archēglyph vocabulary

Analysis plugin

Artefact

Bundle

Chunk

Dataset note

Public dataset

Versioning