Tokens
Also known as: Token, Subword Token, BPE Token
The basic units of text that language models process, typically representing words, subwords, or characters. Token counts determine context window usage and API costs.
Overview
Tokens are the fundamental building blocks of how language models process text. A token can represent a whole word, part of a word, a punctuation mark, or even a single character, depending on the tokenization scheme used. Understanding tokens is essential for effective context management because the context window is measured in tokens, not words or characters.
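As a concrete illustration, the snippet below uses OpenAI's tiktoken library to show where token boundaries fall in a short sentence. The cl100k_base encoding is one of tiktoken's built-in BPE encodings; the exact splits vary by encoding.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one of tiktoken's built-in BPE encodings.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits text into subword units."
token_ids = enc.encode(text)

# Decoding each id individually reveals the token boundaries.
pieces = [enc.decode([tid]) for tid in token_ids]
print(len(token_ids), "tokens")
print(pieces)
```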
How Tokenization Works
Most modern language models use subword tokenization, with Byte Pair Encoding (BPE) being the most common algorithm. BPE starts from a small base vocabulary of bytes or characters and repeatedly merges the most frequent adjacent pair of symbols in the training corpus, so frequent words end up as single tokens while rare words are split into familiar subword pieces.
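To make the merge procedure concrete, here is a minimal toy sketch of the BPE training loop. It assumes a whitespace-split corpus and character-level start symbols; production tokenizers operate on bytes and handle word boundaries with special markers.

```python
from collections import Counter

def learn_bpe_merges(corpus: str, num_merges: int):
    # Each word is a tuple of symbols; start from single characters.
    vocab = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe_merges("low low low lower lowest", 4))
```

On this toy corpus the first merges learned are ('l', 'o') and then ('lo', 'w'), because those pairs occur in every word; the shared stem "low" quickly becomes a single symbol.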
Word-Level vs. Subword Tokenization
Simple word-level tokenization splits on whitespace and punctuation, but it produces enormous vocabularies and maps any unseen word, including misspellings and rare terms, to a single unknown token. Subword tokenization avoids this by learning the most common character sequences and breaking rare words into known subcomponents.
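A quick way to see this in practice: with a subword tokenizer, a common word usually maps to a single token, while a misspelled or rare word is broken into several known pieces rather than failing outright. The comparison below again uses tiktoken; the exact counts depend on the encoding's learned vocabulary.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# A common word, a misspelling of it, and a rare dictionary word.
for word in ["understanding", "understnading", "floccinaucinihilipilification"]:
    ids = enc.encode(word)
    print(f"{word!r} -> {len(ids)} token(s): {[enc.decode([i]) for i in ids]}")
```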
Token Counts in Practice
- 1 token is approximately 4 characters in English
- 1 token is approximately 0.75 words
- 100 tokens is approximately 75 words
- 1 page of text is approximately 300-400 tokens
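These ratios are rough rules of thumb for English prose, not guarantees. When an exact count is unnecessary, a character-based estimate like the sketch below is often good enough; for billing or hard limits, use the model's actual tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rule-of-thumb estimate: roughly 4 characters per token in English.
    # For exact counts, use the model's own tokenizer (e.g. tiktoken).
    return max(1, round(len(text) / 4))

print(estimate_tokens("A page of prose runs roughly 300 to 400 tokens."))
```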
Context Management Implications
- API costs are calculated per token (both input and output)
- Context windows have hard token limits
- Different languages tokenize differently — Chinese text typically uses more tokens per concept than English
- Code and technical content often tokenize less efficiently than prose
- Efficient context management requires balancing information density against token consumption
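In practice, these constraints come together in a pre-flight check before an API call: count the prompt's tokens, reserve room for the response, and estimate cost. The sketch below uses tiktoken for counting; the context limit and per-token price are placeholder values, not real rates for any specific model.

```python
import tiktoken

CONTEXT_LIMIT = 8192          # placeholder; the real limit varies by model
PRICE_PER_1K_INPUT = 0.0005   # placeholder rate, not a real price

enc = tiktoken.get_encoding("cl100k_base")

def check_budget(prompt: str, max_output_tokens: int) -> None:
    input_tokens = len(enc.encode(prompt))
    total = input_tokens + max_output_tokens
    print(f"input: {input_tokens} tokens, reserved for output: {max_output_tokens}")
    print(f"estimated input cost: ${input_tokens / 1000 * PRICE_PER_1K_INPUT:.6f}")
    if total > CONTEXT_LIMIT:
        print(f"over budget by {total - CONTEXT_LIMIT} tokens; trim the prompt")

check_budget("Summarize the following document: ...", max_output_tokens=1024)
```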
Related Terms
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single interaction, determining how much information the model can consider when generating a response.
Large Language Model
A type of AI model trained on vast amounts of text data that can understand, generate, and manipulate human language, typically based on the transformer architecture with billions of parameters.