Transformer
Also known as: Transformer Architecture, Transformer Model
A neural network architecture based on self-attention that processes all positions of an input sequence in parallel rather than one step at a time, forming the foundation of virtually all modern large language models.
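To make the self-attention idea concrete, here is a minimal sketch in plain Python. It assumes a simplification that is not part of the definition above: queries, keys, and values are the raw token vectors, whereas real transformers first apply learned projections and use multiple attention heads. The names `softmax`, `self_attention`, and `tokens` are illustrative only.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for illustration: queries, keys, and values are all the
    raw inputs (no learned projection matrices).
    """
    d = len(x[0])
    output, weights = [], []
    for query in x:
        # Score this token against every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in x
        ]
        w = softmax(scores)
        weights.append(w)
        # The new representation is a weighted average of all token vectors.
        output.append(
            [sum(w_j * v[i] for w_j, v in zip(w, x)) for i in range(d)]
        )
    return output, weights

# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens)
```

Each row of `weights` says how strongly one token attends to every other token. No row depends on the result of another row, which is why the computation can run for all positions at once; the loop here is only for readability.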
Sources & References
Jay Alammar
Related Terms
Attention Mechanism
A neural network component that allows models to selectively focus on the most relevant parts of their input, dynamically weighting the importance of different elements in a sequence.
Deep Learning
A subset of machine learning based on artificial neural networks with multiple layers (deep architectures) that can learn hierarchical representations of data for complex pattern recognition.
Large Language Model
A type of AI model trained on vast amounts of text data that can understand, generate, and manipulate human language, typically built on the transformer architecture and containing billions of parameters.
Neural Network
A computing system inspired by biological neural networks, consisting of interconnected nodes (neurons) organized in layers that process information using learnable weights and activation functions.
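The neural network definition above (layers of nodes with learnable weights and activation functions) can be sketched as a tiny forward pass. This is a hedged illustration in plain Python: the weights and biases are made-up constants rather than learned values, and the names `dense_layer`, `forward`, `hidden_w`, and `output_w` are hypothetical.

```python
import math

def relu(z):
    """Activation that passes positive values and zeroes out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Activation that squashes any value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """One layer: each neuron computes a weighted sum plus bias, then activates."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Made-up parameters for illustration; in practice these are learned.
hidden_w = [[0.5, -1.0], [1.0, 1.0]]  # 2 inputs -> 2 hidden neurons
hidden_b = [0.0, -0.5]
output_w = [[1.0, -1.0]]              # 2 hidden neurons -> 1 output neuron
output_b = [0.0]

def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = dense_layer(inputs, hidden_w, hidden_b, relu)
    return dense_layer(hidden, output_w, output_b, sigmoid)

prediction = forward([1.0, 2.0])
```

Stacking more `dense_layer` calls gives the "multiple layers" that the Deep Learning entry refers to.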