BM25
The standard ranking function for lexical search — what Lucene, Tantivy, and Elasticsearch use under the hood.
BM25 is a ranking function for lexical search: given a query and a document, it scores the document using term frequency (how often the query words appear in it), inverse document frequency (how rare those words are across the corpus), and document length relative to the corpus average. It refines TF-IDF in two ways: repeated occurrences of a term count for progressively less, and long documents are not rewarded merely for being long. Search libraries such as Lucene and Tantivy use it for scoring.
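The three ingredients above can be sketched in a few lines of Python. This is a minimal illustration, not the code that Lucene or Tantivy actually run: the function names `bm25_scores` and `tokenize` are hypothetical, the tokenizer is a naive lowercase-and-split, and the parameters `k1=1.2` and `b=0.75` are commonly used defaults that are assumed here rather than taken from this page.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 controls how quickly repeated terms stop adding to the score;
    b controls how strongly scores are normalized by document length.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [tokenize(t) for t in [
    "Trove holds digitised newspapers",
    "the archive holds letters and maps",
    "Trove Trove Trove newspapers and more newspapers from Trove",
]]
scores = bm25_scores(tokenize("trove"), docs)
```

In this toy corpus the second document scores zero because it never mentions the query word. The third document ranks first, but its score is well short of four times the first document's even though the word appears four times as often: that is the term-frequency saturation and length normalization at work.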
Why it matters for your research. Lexical search remains the right tool when the researcher wants literal matches (a named place, an exact phrase, an archaic spelling), which semantic search tends to blur. “Which documents actually contain the word Trove, and how prominently?” is a question for BM25.
In Archēglyph. Powers the full-text search half of every dataset, via Tantivy. Semantic search sits alongside it, and both result lists are visible to the researcher.
Not to be confused with. Semantic search scores by meaning, not by word overlap. Different questions, complementary tools.