Architecture 2 min read

Transformer

Also known as: Transformer Architecture, Transformer Model

A neural network architecture based on self-attention mechanisms that processes input sequences in parallel, forming the foundation of virtually all modern large language models.

Overview

The Transformer is arguably the most influential neural network architecture of the modern AI era. Introduced in the landmark 2017 paper "Attention Is All You Need" by researchers at Google, it replaced recurrent and convolutional architectures with a pure attention-based mechanism that could process sequence data in parallel, leading to dramatically faster training and superior performance.

Key Components

Self-Attention

The core innovation of the Transformer is the self-attention mechanism, which allows every element in an input sequence to attend to every other element. This enables the model to capture long-range dependencies that recurrent networks struggled with, and to process all positions simultaneously rather than sequentially.
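The mechanism described above can be sketched in a few lines. This is a minimal NumPy illustration of scaled dot-product self-attention, not a production implementation; the weight matrices `Wq`, `Wk`, `Wv` stand in for learned projections.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n, n): every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                   # each output is a weighted sum of all values
```

Because the `(n, n)` score matrix is computed in one matrix multiply, all positions are processed in parallel, unlike the step-by-step recurrence of an RNN.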

Multi-Head Attention

Rather than performing a single attention computation, Transformers use multiple attention "heads" that learn to focus on different types of relationships in the data — syntactic, semantic, positional, and more.
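A rough sketch of the head-splitting idea, under simplifying assumptions: real Transformers apply learned per-head projections and an output projection, while this toy version just slices the model dimension into heads and attends within each slice.

```python
import numpy as np

def multi_head_attention(X, num_heads):
    """Toy multi-head self-attention: split d_model into independent heads,
    attend within each, then concatenate. (Identity projections for brevity;
    real models learn W_q, W_k, W_v, and W_o per head.)"""
    n, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]   # this head's slice of the features
        scores = Xh @ Xh.T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)       # each head has its own attention pattern
        heads.append(w @ Xh)
    return np.concatenate(heads, axis=-1)        # back to (n, d_model)
```

The point of the split is that each head computes its own attention pattern, so different heads are free to specialize in different relationships.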

Positional Encoding

Since Transformers process all tokens in parallel (unlike sequential models like RNNs), positional encodings are added to the input to provide information about the order of tokens in the sequence.
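The original paper's sinusoidal encodings are one concrete way to do this (many modern models use learned or rotary embeddings instead). A short sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]        # even feature indices
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # sines on even dimensions
    pe[:, 1::2] = np.cos(angles)                 # cosines on odd dimensions
    return pe
```

These vectors are simply added to the token embeddings, giving each position a distinct signature the attention layers can exploit.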

Feed-Forward Networks

Each Transformer layer includes a position-wise feed-forward network that processes each token independently, adding representational capacity beyond what attention alone provides.
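"Position-wise" means the same two-layer network is applied to every token separately, which a single matrix expression captures. A minimal sketch (ReLU activation, as in the original paper; the weight shapes follow the usual d_model → d_ff → d_model expansion):

```python
import numpy as np

def feed_forward(X, W1, b1, W2, b2):
    """Position-wise feed-forward network: expand, apply ReLU, project back.
    The same weights are applied independently at every position in X (n, d_model)."""
    return np.maximum(0.0, X @ W1 + b1) @ W2 + b2
```

Because no position looks at any other here, all cross-token interaction in a Transformer layer happens in the attention sublayer.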

Variants

  • Encoder-only: BERT, used for classification and understanding tasks
  • Decoder-only: GPT, Claude, Llama — used for text generation
  • Encoder-Decoder: T5, BART — used for translation and summarization

Impact on Context Management

The Transformer's self-attention mechanism is directly responsible for modern context management capabilities. The attention mechanism determines how much "attention" each part of the context receives, effectively implementing a form of context prioritization. Innovations like sparse attention, Flash Attention, and sliding window attention continue to improve how Transformers handle context.
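Of the innovations mentioned, sliding window attention is easy to show concretely: each token attends only to a fixed-size window of recent tokens instead of the whole context. A minimal mask-building sketch (the window size and causal restriction here are illustrative choices):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Causal sliding-window attention mask: token i may attend to tokens j
    with i - window < j <= i. True = attention allowed. This reduces the
    attended positions per token from O(n) to O(window)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)
```

Applying such a mask to the attention scores (setting disallowed positions to negative infinity before the softmax) is how sparse and windowed variants trim the quadratic cost of full self-attention.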