A to Z of language model terms

A glossary of common language model terms.

Attention: it’s all you need.
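
Attention is the mechanism that lets a transformer weigh every token in the input when producing each output token. As a minimal sketch (NumPy only, single head, no masking) of the scaled dot-product attention at the heart of this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                         # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
    return weights @ V                                    # weighted average of the value vectors

# Toy example: 3 tokens, 4-dimensional head
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)        # (3, 4): one output vector per token
```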

Backpropagation is a fundamental algorithm in neural networks, used for training by adjusting weights based on the error rate obtained in the previous epoch (cycle).
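
For instance, here is a minimal training step sketched with PyTorch: the forward pass computes a loss, `loss.backward()` backpropagates the error to compute a gradient for every weight, and the optimizer then adjusts the weights.

```python
import torch

# Tiny model: one linear layer mapping 3 input features to 1 output
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3)   # a batch of 8 examples
y = torch.randn(8, 1)   # target values

prediction = model(x)                                   # forward pass
loss = torch.nn.functional.mse_loss(prediction, y)      # how wrong the prediction is
loss.backward()                                         # backpropagation: gradients of the loss w.r.t. every weight
optimizer.step()                                        # adjust the weights using those gradients
optimizer.zero_grad()                                   # reset gradients before the next step
```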

Base models are large language models that have been trained on raw text. The model takes a prompt and predicts a token, then uses the prompt plus that token to predict the next token, and so on.
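
As a sketch of that predict-and-append loop, assuming the Hugging Face `transformers` library and the small `gpt2` base model as an example (here picking the most likely token each time; sampling settings like temperature add randomness):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

for _ in range(10):                                       # generate 10 tokens
    with torch.no_grad():
        logits = model(input_ids).logits                  # a score for every token in the vocabulary
    next_token = logits[0, -1].argmax()                   # pick the most likely next token
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)  # prompt + token becomes the new prompt

print(tokenizer.decode(input_ids[0]))
```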

Read more about the different types of language model

Batch size is the number of things computed at once. In training, it refers to the number of examples processed in one iteration.
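
For example, a plain-Python sketch of a training loop with a batch size of 32 over a toy dataset (no particular framework assumed):

```python
# Toy dataset of 100 training examples
dataset = list(range(100))
batch_size = 32

for start in range(0, len(dataset), batch_size):
    batch = dataset[start:start + batch_size]   # up to 32 examples processed together
    print(f"iteration with {len(batch)} examples")
# Prints 32, 32, 32, 4: four iterations per pass over the data
```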

Code Llama is a collection of Llama 2 language models fine-tuned for code completion. There are base model and instruct-tuned variants.

View our collection of Code Llama models

The context window in language models refers to the maximum amount of text the model can consider at once, impacting its ability to reference earlier parts in a conversation or document.

View our collection of embedding models

Embeddings are numerical representations of content, formatted as a large list of floating point numbers.

OpenAI paper
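
As a sketch, using the `sentence-transformers` library with `all-MiniLM-L6-v2` as an example choice of embedding model (any embedding model works the same way): each sentence becomes a list of floats, and sentences with similar meanings end up with similar vectors.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # small open-source embedding model, used here as an example

sentences = [
    "The cat sat on the mat",
    "A kitten rests on a rug",
    "Quarterly revenue grew 4%",
]
embeddings = model.encode(sentences)              # one vector of floats per sentence

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings[0], embeddings[1]))  # high: the sentences mean similar things
print(cosine_similarity(embeddings[0], embeddings[2]))  # low: unrelated topics
```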

Fine-tuning is the process of adjusting a pre-trained model for a specific task.

Generalization is the model’s ability to perform well on unseen data.

Ghost attention is an attention technique used by Llama 2 that distributes attention across the entire input. It helps to prevent a language model from forgetting its initial instructions.

Gradient descent is an optimization algorithm used to minimize the loss function.
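
A worked one-dimensional sketch in plain Python: to minimize the loss L(w) = (w - 3)², repeatedly step against the gradient, scaled by the learning rate (covered a few entries below).

```python
# Minimize L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3)
w = 0.0
learning_rate = 0.1     # step size; see the learning rate entry below

for step in range(25):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient   # move against the gradient to reduce the loss

print(w)   # approaches 3.0, the minimum of the loss
```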

Instruct models are models that can follow instructions, and are easier to prompt. They are base models that have been fine-tuned on instruction-answer pairs.

Read more about the different types of language model

The learning rate is the step size at each iteration while moving toward a minimum of the loss function.

LLaVA is a popular large multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding.

It is similar to the vision capabilities of GPT-4V.

Try out the model

Llama is Meta’s family of popular open source LLMs.

View Meta models on Replicate

The loss function measures how well (or how poorly) the model is performing during training; training aims to minimize it.
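
For example, language models are usually trained with a cross-entropy loss: the negative log of the probability the model assigned to the correct next token. A NumPy sketch with a toy four-token vocabulary:

```python
import numpy as np

def cross_entropy(predicted_probs, true_index):
    """Negative log probability assigned to the correct token; lower is better."""
    return -np.log(predicted_probs[true_index])

# The model assigns probabilities to a toy 4-token vocabulary; token 2 is correct.
confident = np.array([0.05, 0.05, 0.85, 0.05])
unsure    = np.array([0.25, 0.25, 0.25, 0.25])

print(cross_entropy(confident, 2))  # ~0.16: small loss for a confident, correct prediction
print(cross_entropy(unsure, 2))     # ~1.39: larger loss for an unsure prediction
```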

Mistral AI is a company known for its capable open-source language models.

View Mistral AI models on Replicate

Mixtral is a mixture of experts language model from Mistral AI.

Announcement blog post

Mixture of experts is a technique that allows a model to manage many parameters, increasing its capacity and specialization, while maintaining the efficiency and speed of a smaller model.

It does this using groups of parameters known as “experts”. They are selectively activated for processing different parts of the data.

For example, in the case of Mixtral 8x7B, there are 8 experts with a total of 46.7B parameters operating at the speed of a 12.9B model.
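
A simplified NumPy sketch of the routing idea, with made-up dimensions (real mixture of experts models like Mixtral route each token through experts inside every transformer layer): a small gating network scores the experts and only the top two run for each token.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, top_k = 16, 8, 2    # made-up sizes for illustration

# Each "expert" is its own set of parameters (here, just a weight matrix).
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
gate = rng.normal(size=(d, num_experts))    # small router that scores experts per token

def moe_layer(token):
    scores = token @ gate                               # how relevant each expert is to this token
    chosen = np.argsort(scores)[-top_k:]                # activate only the top-2 experts
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()   # softmax over the chosen experts
    # Only 2 of the 8 experts do any work, so compute stays close to a much smaller model.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

print(moe_layer(rng.normal(size=d)).shape)   # (16,)
```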

A multimodal model is one that is designed to understand and generate information across multiple types of data, such as text, images, or sound. A common example is being able to upload an image, and ask a question about what is happening in the image.

The number of parameters is the count of individual weights and biases that a language model has learned through training. Common sizes are 7B, 13B and 70B.

More parameters generally mean a larger, more capable and more accurate model, but they also mean a slower model that is more expensive to run.
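
The count is easy to check in code. A sketch assuming PyTorch and a toy model (the same call works on a real 7B checkpoint, it just takes far more memory):

```python
import torch

# Toy stand-in for a language model; a real 7B model is a much bigger version of the same idea.
model = torch.nn.Sequential(
    torch.nn.Embedding(1000, 64),   # 1000 x 64 weights
    torch.nn.Linear(64, 64),        # 64 x 64 weights + 64 biases
    torch.nn.Linear(64, 1000),      # 64 x 1000 weights + 1000 biases
)

num_parameters = sum(p.numel() for p in model.parameters())
print(f"{num_parameters:,} parameters")   # 133,160: every weight and bias is counted
```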

Overfitting, also called overtraining, happens when a model learns its training data too thoroughly. An overfit model will perform poorly on new, unseen data, because it fails to generalize from the specific examples it was trained on.

If you are fine-tuning, try to use a more diverse training dataset or train with fewer steps.

Quantization is a technique that reduces the precision of a model’s parameters to decrease its overall size. Smaller models are more portable and faster to run, but this comes at the cost of some capability and accuracy.

Common quantizations are:

  • 16-bit
  • 8-bit
  • 4-bit

A guide to quantization in language models

On Hugging Face, TheBloke is well known for producing high-quality quantized versions of models soon after their release.
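
The size savings are simple arithmetic. A rough sketch for a 7B-parameter model (approximate, ignoring overhead from non-quantized layers and metadata):

```python
# Approximate memory needed just to hold the weights of a 7-billion-parameter model.
parameters = 7_000_000_000

for bits in (16, 8, 4):
    gigabytes = parameters * bits / 8 / 1e9   # bits -> bytes -> gigabytes
    print(f"{bits:>2}-bit: ~{gigabytes:.1f} GB")

# Roughly 14 GB at 16-bit, 7 GB at 8-bit, and 3.5 GB at 4-bit.
```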

AWQ (activation-aware weight quantization) is a technique for reducing the size of a model by quantizing its weights. AWQ recognizes that some weights are more important than others, and protects the most important weights to reduce quantization errors.

Retrieval-augmented generation (RAG) is a technique for enriching your language model outputs by retrieving contextual information from an external data source.

Read our blog post about RAG
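
A minimal sketch of the idea, reusing the `sentence-transformers` embedding model from the embeddings entry and a toy in-memory document list: embed the documents, retrieve the one most relevant to the question, and paste it into the prompt as context. The final prompt would then be sent to whichever language model you are using.

```python
from sentence_transformers import SentenceTransformer, util

# Example choices: any embedding model and any LLM endpoint work the same way.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available by email between 9am and 5pm GMT.",
    "The warehouse is closed on public holidays.",
]
doc_embeddings = embedder.encode(documents)

question = "How long do customers have to return an item?"
scores = util.cos_sim(embedder.encode(question), doc_embeddings)[0]
best_doc = documents[int(scores.argmax())]          # retrieve the most relevant document

# Augment the prompt with the retrieved context before sending it to the language model.
prompt = f"Context:\n{best_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```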

Regularization is a technique used to prevent overfitting by adding a penalty to the model for complexity. It helps keep the model simple, ensuring it performs well not just on the training data, but also on new, unseen data.
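
The most common form, L2 regularization (weight decay), adds the squared size of the weights to the loss so that large, over-specific weights are discouraged. A sketch assuming PyTorch:

```python
import torch

model = torch.nn.Linear(3, 1)
x, y = torch.randn(8, 3), torch.randn(8, 1)

lambda_l2 = 0.01                                               # strength of the penalty
data_loss = torch.nn.functional.mse_loss(model(x), y)
penalty = sum((p ** 2).sum() for p in model.parameters())      # bigger weights mean a bigger penalty
loss = data_loss + lambda_l2 * penalty                         # the model is now rewarded for staying simple
loss.backward()

# Equivalently, most optimizers expose this directly as `weight_decay`:
optimizer = torch.optim.AdamW(model.parameters(), weight_decay=0.01)
```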

Streaming delivers a language model’s output incrementally as it is generated, typically as server-sent event (SSE) streams.

See our list of models that support streaming
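
A sketch of consuming such a stream with the `requests` library. The URL is a hypothetical placeholder; the exact endpoint, authentication and event format depend on the API you are calling.

```python
import requests

# Hypothetical streaming endpoint; real APIs differ in URL, auth and payload.
url = "https://example.com/v1/predictions/stream"

with requests.get(url, headers={"Accept": "text/event-stream"}, stream=True) as response:
    for line in response.iter_lines(decode_unicode=True):
        # Server-sent events arrive as lines like: "data: <token>"
        if line and line.startswith("data: "):
            token = line[len("data: "):]
            print(token, end="", flush=True)   # show each token as soon as it arrives
```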

A system prompt is text that is prepended to the prompt. It’s used in a chat context to help guide or constrain model behavior.
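
Exactly how the system prompt is prepended depends on the model’s prompt template. As an example, Llama 2 chat models wrap it in `<<SYS>>` markers inside the first instruction block; other chat models use different templates.

```python
system_prompt = "You are a helpful assistant. Answer in one short sentence."
user_message = "Why is the sky blue?"

# Llama 2 chat template: the system prompt sits inside <<SYS>> tags
# within the first [INST] block.
prompt = (
    "<s>[INST] <<SYS>>\n"
    f"{system_prompt}\n"
    "<</SYS>>\n\n"
    f"{user_message} [/INST]"
)
print(prompt)
```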

Temperature in language models adjusts randomness in text generation, with lower values yielding more predictable output and higher values increasing creativity and variability.

See how to use open source language models for a more detailed explanation.
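
Temperature works by dividing the model’s raw scores (logits) before the softmax. A worked NumPy sketch showing how a low temperature sharpens the distribution and a high temperature flattens it:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, 0.1])   # raw scores for four candidate tokens

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

print(softmax_with_temperature(logits, 0.5))   # sharp: probability piles onto the top token
print(softmax_with_temperature(logits, 1.0))   # the model's unmodified distribution
print(softmax_with_temperature(logits, 1.5))   # flatter: more randomness and variety
```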

A token in natural language processing represents the smallest unit of data used as input for language models, such as a word, part of a word, character, or symbol.

See how to use open source language models for a more detailed explanation.

Tokenization is the process of breaking down text into smaller units (tokens), such as words or phrases, for easier processing in natural language models. When you tokenize text, you convert it into a numerical representation that can be used as input for language models.
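
For example, using the `tiktoken` library (the tokenizer used by several OpenAI models; open models ship their own tokenizers, but they work the same way):

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")   # an example tokenizer

text = "Tokenization breaks text into pieces."
token_ids = encoding.encode(text)                 # the numerical representation fed to the model
print(token_ids)                                  # a list of integers, one per token
print([encoding.decode([t]) for t in token_ids])  # the piece of text each token covers
```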

Top K in language models limits the number of candidate words considered at each step of text generation, focusing on the most likely options for coherence.

See how to use open source language models for a more detailed explanation.
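
A NumPy sketch of top K filtering: keep only the k highest-probability tokens, renormalize, and sample from what is left.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])   # model's probabilities for 6 candidate tokens
k = 3

top_k_indices = np.argsort(probs)[-k:]        # the k most likely tokens
filtered = np.zeros_like(probs)
filtered[top_k_indices] = probs[top_k_indices]
filtered /= filtered.sum()                    # renormalize so the kept options sum to 1

next_token = rng.choice(len(probs), p=filtered)   # only the top-k tokens can ever be picked
print(filtered, next_token)
```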

Top P, or nucleus sampling, in language models selects the next word from the smallest set of candidates whose cumulative probability exceeds a set threshold, enhancing text variety.

See how to use open source language models for a more detailed explanation.
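
A NumPy sketch of nucleus sampling: sort the tokens by probability, keep the smallest set whose cumulative probability reaches the threshold, and sample from that set.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])   # model's probabilities for 6 candidate tokens
top_p = 0.9

order = np.argsort(probs)[::-1]                   # most likely first
cumulative = np.cumsum(probs[order])
cutoff = np.searchsorted(cumulative, top_p) + 1   # smallest prefix whose total probability reaches top_p
nucleus = order[:cutoff]                          # the tokens covering at least 90% of the probability

filtered = np.zeros_like(probs)
filtered[nucleus] = probs[nucleus]
filtered /= filtered.sum()
print(rng.choice(len(probs), p=filtered))         # sample only from the nucleus
```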

The transformer is a type of model architecture used in many modern LLMs.

Transformers is also the name of a Hugging Face library offering a collection of state-of-the-art machine learning models, primarily for natural language processing, with a wide range of pre-trained and customizable options.
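
For example, loading a small pre-trained model through the library’s `pipeline` helper, using `gpt2` here simply because it is small and easy to download:

```python
from transformers import pipeline

# "gpt2" is just a readily available example; thousands of other checkpoints work the same way.
generator = pipeline("text-generation", model="gpt2")
result = generator("Open source language models are", max_new_tokens=20)
print(result[0]["generated_text"])
```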

Underfitting is when a model fails to capture the underlying trend of the data.

A vector is a numerical representation of data, like words or phrases, in a high-dimensional space, capturing semantic and contextual relationships.

A vector database is a type of database where we can store and query embeddings, along with their associated documents.

They are useful when you are searching for things that are “semantically similar”.
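
Under the hood, a semantic search is a nearest-neighbour lookup over stored embeddings. A toy NumPy sketch with random vectors standing in for real embeddings (a real vector database adds persistence, indexing, and metadata filtering on top of this):

```python
import numpy as np

# Toy "database": each stored document has an embedding (random here, purely for illustration)
rng = np.random.default_rng(0)
documents = ["doc about cats", "doc about finance", "doc about dogs"]
stored = rng.normal(size=(3, 384))                    # 3 embeddings of 384 dimensions each
stored /= np.linalg.norm(stored, axis=1, keepdims=True)

query = rng.normal(size=384)                          # embedding of the search query
query /= np.linalg.norm(query)

similarities = stored @ query                         # cosine similarity, since everything is normalized
best = int(similarities.argmax())                     # "semantically similar" = highest similarity
print(documents[best], similarities[best])
```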

Popular vector stores: