
Transformer

Also known as: Transformer Architecture, Transformer Model

A neural network architecture based on self-attention mechanisms that processes input sequences in parallel, forming the foundation of virtually all modern large language models.
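To make the "self-attention" and "in parallel" parts of this definition concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. The helper names (`softmax`, `matmul`, `transpose`, `self_attention`), the toy input, and the identity projection matrices are illustrative assumptions, not taken from any particular library. A real Transformer block would add multiple heads, positional encoding, residual connections, layer normalization, and a feed-forward network.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Nested-list matrix product: (n x k) times (k x m) gives (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def self_attention(x, w_q, w_k, w_v):
    # Project every token into query, key, and value vectors.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Every token is scored against every other token in one matrix product;
    # no step depends on the result for an earlier position.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input: 3 tokens with 2 features each (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection weights
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how strongly one token attends to every token in the sequence, itself included. Because the scores for all positions come from a single matrix product, the sequence can be processed at once rather than token by token.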

Transformer architecture, self-attention, encoder-decoder, BERT, GPT, T5, positional encoding, feed-forward network, layer normalization, residual connections, multi-head attention, transformer blocks, pre-training, decoder-only, encoder-only

Sources & References

1. Attention Is All You Need. Google Research / Google Brain. Research
3 Documentation