Training vs inference

Training is how a model learns — done once, expensively. Inference is how the model is used — each prediction is a forward pass over frozen weights.

Last updated 20 April 2026

Training is how a model learns — it happens once (or a handful of times), costs a lot of compute, and produces the set of weights that define the model. Inference is how the model is used — each prediction is a forward pass over those frozen weights. Inference is much cheaper per call, but a busy service pays inference costs continuously.
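
A toy sketch of the split, not how any real LLM is trained: a two-weight linear model fitted by plain gradient descent in standard-library Python. The names `train` and `predict` and the toy data are illustrative only. Training loops over the data many times and rewrites `w` and `b`; inference is a single call to `predict` that only reads them.

```python
import random


def predict(w: float, b: float, x: float) -> float:
    """Inference: one forward pass. Reads the weights, never changes them."""
    return w * x + b


def train(data, lr=0.01, epochs=500, seed=0):
    """Training: repeated forward passes plus gradient updates that change w and b."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random starting weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b  # the finished weights; from here on they are frozen


# Toy data drawn from y = 3x + 1.
data = [(x, 3 * x + 1) for x in range(-5, 6)]

w, b = train(data)                    # done once: 500 passes over the data
print(round(predict(w, b, 10.0), 2))  # each call: one multiply and one add
```

Real models have billions of weights rather than two, which is why the training loop is the costly phase. In frameworks such as PyTorch, inference code is typically wrapped in `torch.no_grad()` so that no gradients are tracked at all.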

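Why it matters for your research. When you read “AI is expensive”, check which phase is meant. Training a frontier LLM costs tens of millions of dollars or more and is done by only a handful of labs; using one typically costs from a fraction of a cent to a few cents per query. Privacy questions also arise mostly at inference time: “your data went to OpenAI” means their inference servers processed it, not that the model was retrained on it. Whether a provider keeps such data and uses it for later training is a separate matter, set by that provider's data-use policy.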
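
The arithmetic behind “which phase is meant” can be sketched with placeholder figures. None of the numbers below are real prices for any model or provider; they are hypothetical and only show the shape of the calculation: training is paid once, inference spend grows with every query, so a busy enough service eventually spends more on inference than the one-off training cost.

```python
def inference_spend(per_query_cost: float, queries_per_day: int, days: int) -> float:
    """Cumulative inference cost: paid on every call, for as long as the service runs."""
    return per_query_cost * queries_per_day * days


def break_even_days(training_cost: float, per_query_cost: float, queries_per_day: int) -> float:
    """Days of serving after which cumulative inference spend equals the one-off training cost."""
    return training_cost / (per_query_cost * queries_per_day)


# Hypothetical placeholder figures, not real prices for any model or provider.
TRAINING_COST = 50_000_000.0  # dollars, paid once
PER_QUERY_COST = 0.01         # dollars per query
QUERIES_PER_DAY = 10_000_000

days = break_even_days(TRAINING_COST, PER_QUERY_COST, QUERIES_PER_DAY)
print(f"Inference spend matches training cost after about {days:.0f} days")
```

With these placeholders the two totals meet after 500 days of serving; change any one figure and the break-even point moves proportionally.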

In Archēglyph. Inference only. We never retrain models on your corpus; the corpus stays in your bundle. Inference happens either locally (sentence-transformers for embeddings) or via a cloud provider with our API key, and the model id is recorded either way.
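
The pattern described in that paragraph, where either a local or a cloud backend can serve a request but the model id is always stored with the result, can be sketched as follows. This is a hypothetical illustration, not Archēglyph's code: `EmbeddingResult`, `embed` and `toy_model` are invented names, and `toy_model` stands in for a real embedding model so the example runs without downloads.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass(frozen=True)
class EmbeddingResult:
    vectors: List[Vector]
    backend: str   # "local" or "cloud"
    model_id: str  # recorded regardless of where inference ran


def embed(texts: List[str], backend: str, model_id: str,
          run_model: Callable[[List[str]], List[Vector]]) -> EmbeddingResult:
    """Inference only: call the model, never update it, and keep the model id."""
    if backend not in ("local", "cloud"):
        raise ValueError(f"unknown backend: {backend}")
    vectors = run_model(texts)
    return EmbeddingResult(vectors=vectors, backend=backend, model_id=model_id)


def toy_model(texts: List[str]) -> List[Vector]:
    """Hypothetical stand-in model: a fixed function of each input string."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


result = embed(["the corpus stays local"], "local", "toy-model-v1", toy_model)
print(result.model_id, result.vectors)
```

For a real local backend, `run_model` could be a function that calls `model.encode(texts).tolist()` on a `SentenceTransformer` loaded once; a cloud backend would wrap the provider's client instead. In both cases nothing in `embed` writes to the model, which is what “inference only” means here.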

Not to be confused with. Fine-tuning: further training of an already-trained model on a smaller, targeted dataset. Because it updates the weights, it is a form of training, not inference.
