Attention Mechanism
Also known as: Self-Attention, Scaled Dot-Product Attention, Multi-Head Attention
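A neural network component that allows models to selectively focus on the most relevant parts of their input, dynamically weighting the importance of different elements in a sequence. In the common query-key-value formulation, each output is a weighted sum of value vectors, with the weights obtained by comparing a query against a set of keys and normalizing the resulting scores (typically with a softmax).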
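A minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, may make the weighting concrete. It uses plain Python lists so it runs without dependencies; real implementations use batched tensor operations. The function names `softmax` and `scaled_dot_product_attention` are illustrative, not taken from any particular library.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Q is n x d_k, K is m x d_k, V is m x d_v (lists of lists).

    Returns (outputs, weights): one output vector and one weight row per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # non-negative, sums to 1
        # The output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

out, w = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(w, out)
```

In this toy input the query points in the same direction as the first key, so the first value vector receives the larger weight, but the second still contributes: attention blends inputs rather than selecting exactly one.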
Related Terms
Context Window
The maximum amount of text (measured in tokens) that a large language model can process in a single interaction, encompassing both the input prompt and the generated output. Managing context windows effectively is critical for enterprise AI deployments where complex queries require extensive background information.
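Because the window covers both the prompt and the generated output, a deployment has to reserve room for the response before filling the rest with input. The sketch below shows that arithmetic plus one simple trimming strategy (drop the oldest messages first). It assumes token counts were already produced by the model's own tokenizer; `fits_in_context` and `trim_oldest` are hypothetical helpers, and oldest-first trimming is only one of several possible approaches.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    # The window must hold the prompt AND the tokens the model may generate.
    return prompt_tokens + max_output_tokens <= context_window

def trim_oldest(message_token_counts: list[int], max_output_tokens: int, context_window: int) -> list[int]:
    """Hypothetical helper: drop the oldest messages until prompt + output budget fits."""
    budget = context_window - max_output_tokens
    if budget <= 0:
        raise ValueError("output budget leaves no room for the prompt")
    kept = list(message_token_counts)
    while kept and sum(kept) > budget:
        kept.pop(0)  # discard the oldest message first
    return kept

# Illustrative numbers only; real limits depend on the model.
print(fits_in_context(3000, 1000, 4096))
print(trim_oldest([1500, 1200, 900, 700], 1000, 4096))
```

With a 4,096-token window and 1,000 tokens reserved for output, only 3,096 tokens remain for the prompt, so the oldest 1,500-token message is dropped.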
Large Language Model
A type of AI model trained on vast amounts of text data that can understand, generate, and manipulate human language, typically based on the transformer architecture with billions of parameters.
Neural Network
A computing system inspired by biological neural networks, consisting of interconnected nodes (neurons) organized in layers that process information using learnable weights and activation functions.
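As a minimal sketch of "layers that process information using learnable weights and activation functions", the code below runs a forward pass through fully connected layers with a ReLU activation. The weights are fixed numbers here; in a real network they are the learnable parameters adjusted during training, which this sketch does not show. The helper names and the choice of ReLU are assumptions for illustration.

```python
def relu(x: float) -> float:
    # A common activation function: passes positives, zeroes out negatives.
    return x if x > 0.0 else 0.0

def dense_layer(inputs, weights, biases, activation=relu):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(pre_activation))
    return outputs

def forward(inputs, layers):
    # Layers are applied in sequence; each layer's output feeds the next.
    for weights, biases in layers:
        inputs = dense_layer(inputs, weights, biases)
    return inputs

# Hypothetical two-layer network: 2 inputs -> 2 hidden neurons -> 1 output.
network = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5]),
    ([[1.0, 1.0]], [0.0]),
]
print(forward([1.0, 2.0], network))
```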
Transformer
A neural network architecture based on self-attention mechanisms that processes all positions of an input sequence in parallel rather than one token at a time, forming the foundation of virtually all modern large language models.
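In self-attention, the queries, keys, and values are all projections of the same input sequence, and the scores for every pair of positions come out of a single matrix product. That matrix form is what allows all positions to be handled together on parallel hardware (the plain Python loops below are still sequential; they only show the structure). This is a sketch of one self-attention step with assumed projection matrices `w_q`, `w_k`, `w_v`; a full Transformer layer also adds multiple heads, a feed-forward sublayer, residual connections, normalization, and positional information, none of which are shown.

```python
import math

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m) using nested lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax_rows(m):
    # Row-wise softmax, max-subtracted for numerical stability.
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def self_attention(x, w_q, w_k, w_v):
    """x is n x d_model (one row per token); Q, K, V are all projections of the same x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    # One n x n score matrix covers every (query position, key position) pair at once.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights

# Hypothetical 3-token input with identity projections, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = self_attention(tokens, identity, identity, identity)
print(w)
```

Each row of `w` is one token's attention distribution over all three tokens, including itself.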