Archēglyph

Topic modelling

Also: Topic modeling

An unsupervised technique that groups documents or passages by the themes they share, surfaced as lists of related words per topic.

Classical methods such as LDA (fitted on raw word counts) and NMF (typically fitted on TF-IDF weights) treat each document as a bag of words and ignore word order. Modern methods like BERTopic embed the documents, cluster the embeddings, and then describe each cluster by its most characteristic words.
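As a rough illustration of the classical route, the sketch below builds bag-of-words TF-IDF weights for a toy corpus and factorises them with NMF using multiplicative updates. It is a from-scratch teaching example, not how Archēglyph or any particular library implements it: the function names, the whitespace tokeniser, the smoothed IDF variant, the four sample sentences, and the choice of two topics are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Bag-of-words TF-IDF: word order is discarded, only counts survive."""
    tokens = [d.lower().split() for d in docs]  # naive whitespace tokeniser (assumption)
    vocab = sorted({w for doc in tokens for w in doc})
    n_docs = len(docs)
    doc_freq = Counter(w for doc in tokens for w in set(doc))
    matrix = []
    for doc in tokens:
        counts = Counter(doc)
        # Smoothed IDF, one common variant among several.
        matrix.append([counts[w] * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0)
                       for w in vocab])
    return vocab, matrix


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate v (docs x terms) as w @ h with non-negative factors,
    using multiplicative updates for the squared reconstruction error."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(n_topics)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def top_words(h, vocab, n=3):
    """Describe each topic (a row of h) by its n highest-weighted words."""
    topics = []
    for row in h:
        ranked = sorted(range(len(vocab)), key=lambda j: -row[j])
        topics.append([vocab[j] for j in ranked[:n]])
    return topics


docs = [
    "the cat chased the mouse",
    "a cat and a kitten slept",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
vocab, v = tfidf(docs)
w, h = nmf(v, n_topics=2)   # w: document-topic weights, h: topic-word weights
topics = top_words(h, vocab)
for i, words in enumerate(topics):
    print(f"topic {i}: {', '.join(words)}")
```

Each row of h is one topic expressed as word weights, and each row of w says how strongly a document loads on each topic. On a real corpus one would normally remove stop words and use a library implementation rather than these pure-Python loops.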

Why it matters for your research. Topic modelling and embedding-based clustering answer overlapping but not identical questions. Classical LDA groups words that tend to co-occur, so it highlights distinctive vocabulary; embedding methods group passages by meaning, even when they share few words. Running both on the same corpus often surfaces structure that either alone would miss.

In Archēglyph. On the roadmap as an alternative cluster view that sits beside the HDBSCAN output.

Not to be confused with. Clustering groups items by similarity in any feature space; topic modelling specifically yields topics that are each described by a word list.
