Clustering

Grouping items by similarity without pre-specifying what the groups should be. An unsupervised view of the corpus.

Last updated 20 April 2026

Grouping items by similarity without pre-specifying what the groups should be. Unlike classification — which learns known labels from examples — clustering is an unsupervised view of the corpus: the algorithm surfaces structure that was not asked for.

Why it matters for your research. Clustering is the right tool for the “what is even in this corpus?” question. It surfaces themes, subgenres, and noise. Because it is unsupervised, every cluster needs a human read — the algorithm doesn’t know what the clusters mean.

In Archēglyph. Clustering runs HDBSCAN on embeddings, and the results are presented text-first rather than as a scatterplot. See Reading clusters as a researcher.
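To make "text-first" concrete, here is a minimal sketch of how cluster labels might be turned into a readable listing instead of a plot. It is not Archēglyph's implementation. It assumes the labels have already been produced by an HDBSCAN run over the embeddings, and it relies on the common HDBSCAN convention that the label -1 marks noise (items assigned to no cluster). The function names, the `max_items` parameter, and the sample texts are hypothetical.

```python
from collections import defaultdict

# Assumption: HDBSCAN implementations conventionally label unclustered points -1.
NOISE = -1


def group_by_cluster(texts, labels):
    """Group texts by cluster label, keeping noise items in a separate list."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    clusters = defaultdict(list)
    noise = []
    for text, label in zip(texts, labels):
        if label == NOISE:
            noise.append(text)
        else:
            clusters[label].append(text)
    return dict(clusters), noise


def render_text_first(texts, labels, max_items=3):
    """Return a plain-text listing: largest clusters first, noise last."""
    clusters, noise = group_by_cluster(texts, labels)
    lines = []
    # Sort by descending size, then by label so ties are ordered predictably.
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for label, members in ordered:
        lines.append(f"Cluster {label} ({len(members)} items)")
        lines.extend(f"  - {m}" for m in members[:max_items])
    lines.append(f"Noise ({len(noise)} items)")
    lines.extend(f"  - {m}" for m in noise[:max_items])
    return "\n".join(lines)


# Hypothetical corpus and labels, for illustration only.
texts = [
    "letter about the harvest",
    "letter about grain prices",
    "shipping manifest",
    "illegible fragment",
    "cargo list",
    "letter about the drought",
]
labels = [0, 0, 1, -1, 1, 0]
print(render_text_first(texts, labels))
```

The listing deliberately shows sample members rather than a computed label for each cluster, since naming a cluster is the part that needs a human read. Noise is kept visible at the end rather than dropped, because what the algorithm could not place is also information about the corpus.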

Not to be confused with. Topic modelling is a form of clustering that specifically produces topics labelled with word lists; plain clustering may or may not produce labels of that kind.
