Archēglyph

Cosine similarity

A number between -1 and 1 that measures the angle between two vectors. 1 is identical direction, 0 is unrelated.

Last updated

A number between -1 and 1 that measures the angle between two vectors: 1 is identical direction, 0 is unrelated, -1 is opposite. For embeddings, cosine similarity is the standard way to say “how close in meaning are these two chunks?”.

Why it matters for your research. Most reported embedding “similarity scores” are cosine. Knowing that it’s an angle — not a distance, not a probability — helps calibrate expectations: cosine of 0.8 is close, cosine of 0.3 is weak, and there is no universal threshold across models. A score that means “near-duplicate” for one embedding model means “vaguely related” for another.

In Archēglyph. Used throughout cluster and neighbour computations on top of the zvec index.

Not to be confused with. Euclidean distance is also used on embeddings, but on length-normalised vectors it’s equivalent to cosine similarity up to a monotonic transform.

Related terms

References

← Back to the glossary