
During Training of an LLM

Start of training - randomly initialised

Embedding Layer
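The embedding layer is a table with one row (vector) per vocabulary token. At the start of training every entry is a small random number, so no token "means" anything yet. The sketch below assumes a Gaussian initialisation with standard deviation 0.02, which is a common choice but not the only one. The vocabulary size, `d_model`, and the function names are illustrative assumptions.

```python
import random

def init_embedding(vocab_size, d_model, std=0.02, seed=0):
    """Return a vocab_size x d_model table of small random Gaussian values (assumed init scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(d_model)] for _ in range(vocab_size)]

def embed(table, token_ids):
    """Look up one row (vector) per token id; this is all an embedding layer does in the forward pass."""
    return [table[t] for t in token_ids]

table = init_embedding(vocab_size=10, d_model=4)
vectors = embed(table, [3, 7, 3])
print(len(vectors), len(vectors[0]))  # 3 4
print(vectors[0] == vectors[2])       # True: the same token always maps to the same row
```

During training, only the rows of tokens that actually appear in a batch receive gradient updates.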

Attention Layer
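An attention layer's learnable parameters are its projection matrices for queries, keys, values and the output. A minimal sketch of their random initialisation follows. It assumes a single set of `d_model x d_model` matrices with no bias terms, and the names `W_q`, `W_k`, `W_v`, `W_o` and the 0.02 standard deviation are illustrative assumptions. Multi-head attention usually splits these same matrices across heads, so the total parameter count is unchanged.

```python
import random

def random_matrix(rows, cols, std, rng):
    """A rows x cols matrix of small Gaussian values."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def init_attention(d_model, std=0.02, seed=0):
    """Query, key, value and output projections, each d_model x d_model (no biases assumed)."""
    rng = random.Random(seed)
    return {name: random_matrix(d_model, d_model, std, rng)
            for name in ("W_q", "W_k", "W_v", "W_o")}

def count_params(params):
    return sum(len(m) * len(m[0]) for m in params.values())

attn = init_attention(d_model=8)
print(sorted(attn))        # ['W_k', 'W_o', 'W_q', 'W_v']
print(count_params(attn))  # 256 = 4 * 8 * 8
```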

Feed Forward Network (FFN) Layer
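The FFN layer is two linear layers: one expands each token vector from `d_model` to a wider hidden size `d_ff`, and one projects it back. The sketch below assumes the common 4x expansion, random weights with standard deviation 0.02, and biases that start at zero. All names and sizes are illustrative assumptions.

```python
import random

def init_ffn(d_model, expansion=4, std=0.02, seed=0):
    """Two linear layers d_model -> d_ff -> d_model: weights random, biases zero (assumed scheme)."""
    rng = random.Random(seed)
    d_ff = expansion * d_model
    return {
        "W1": [[rng.gauss(0.0, std) for _ in range(d_ff)] for _ in range(d_model)],
        "b1": [0.0] * d_ff,
        "W2": [[rng.gauss(0.0, std) for _ in range(d_model)] for _ in range(d_ff)],
        "b2": [0.0] * d_model,
    }

def count_params(params):
    total = 0
    for value in params.values():
        # Matrices contribute rows * cols, bias vectors contribute their length
        total += len(value) * len(value[0]) if isinstance(value[0], list) else len(value)
    return total

ffn = init_ffn(d_model=8)
print(count_params(ffn))  # 552 = 2 * (8 * 32) + 32 + 8
```

With a 4x expansion the FFN holds about twice as many weights as the four attention projections, which is why FFNs account for a large share of a transformer's parameters.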

LayerNorm parameters
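Unlike the other layers, LayerNorm's two learnable vectors are usually not random at the start: the scale (gamma) is typically set to ones and the shift (beta) to zeros, so the learnable step initially leaves the normalised values untouched. A minimal sketch under that assumption follows; the function names are hypothetical.

```python
def init_layernorm(d_model):
    # Scale (gamma) starts at 1 and shift (beta) at 0, so the affine step is initially the identity
    return {"gamma": [1.0] * d_model, "beta": [0.0] * d_model}

def affine(params, x_hat):
    # Applied after normalisation: y_i = gamma_i * x_hat_i + beta_i
    return [g * x + b for g, x, b in zip(params["gamma"], x_hat, params["beta"])]

ln = init_layernorm(4)
x_hat = [-1.2, 0.3, 0.5, 0.4]     # an already-normalised vector (illustrative values)
print(affine(ln, x_hat) == x_hat)  # True
```

Each LayerNorm therefore adds only `2 * d_model` parameters, a tiny fraction of the model.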

The training loop - how learning actually happens

Step 1 - Forward pass
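In the forward pass, input tokens flow through the network and come out as a probability distribution over the next token. The sketch below is a deliberately collapsed, hypothetical model: it skips the attention and FFN layers entirely and goes straight from an embedding lookup to a projection onto the vocabulary (`w_out`) followed by a softmax. The weights are made-up illustrative numbers.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(token_id, embedding, w_out):
    """Hypothetical one-layer model: embed the token, project to vocabulary logits, softmax."""
    h = embedding[token_id]  # d_model vector
    vocab_size = len(w_out[0])
    logits = [sum(h[i] * w_out[i][v] for i in range(len(h))) for v in range(vocab_size)]
    return softmax(logits)

embedding = [[0.1, -0.2], [0.4, 0.0], [-0.3, 0.5]]  # 3 tokens, d_model = 2
w_out = [[0.2, -0.1, 0.0], [0.3, 0.1, -0.4]]         # d_model x vocab_size
probs = forward(1, embedding, w_out)
print(probs, sum(probs))  # a distribution over the 3 tokens, summing to 1
```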

Step 2 - Compute the loss
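The standard loss for next-token prediction is cross-entropy: the negative log of the probability the model assigned to the token that actually came next, averaged over positions. A minimal sketch follows. The 50,000-token vocabulary is an assumed, roughly GPT-2-sized figure; it shows why a freshly initialised model, which predicts roughly uniformly, starts with a loss of about ln(vocab_size).

```python
import math

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed via log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def sequence_loss(all_logits, targets):
    """Average next-token loss over every position in the sequence."""
    return sum(cross_entropy(z, t) for z, t in zip(all_logits, targets)) / len(targets)

vocab_size = 50_000
uniform = [0.0] * vocab_size  # an untrained model predicts roughly uniformly
print(cross_entropy(uniform, 123), math.log(vocab_size))  # both about 10.82
print(sequence_loss([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]], [0, 1]))  # confident and correct: near 0
```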

Step 3 - Backpropagation
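Backpropagation starts from the gradient of the loss with respect to the output logits and applies the chain rule backwards through every layer. For softmax followed by cross-entropy, that starting gradient has a simple closed form, `softmax(z) - one_hot(target)`. The sketch below computes it and checks it against finite differences; in practice a framework computes all gradients automatically (for example PyTorch's `loss.backward()`).

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss(z, target):
    return -math.log(softmax(z)[target])

def grad_logits(z, target):
    """Analytic gradient of cross-entropy w.r.t. the logits: softmax(z) - one_hot(target)."""
    p = softmax(z)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

def numeric_grad(z, target, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g = []
    for i in range(len(z)):
        up = z[:]
        up[i] += eps
        down = z[:]
        down[i] -= eps
        g.append((loss(up, target) - loss(down, target)) / (2 * eps))
    return g

z, target = [2.0, -1.0, 0.5], 0
analytic = grad_logits(z, target)
numeric = numeric_grad(z, target)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))  # should be tiny
```

The negative gradient on the correct logit means that increasing it lowers the loss; the positive gradients on the other logits push them down.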

Step 4 - Optimiser step
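The optimiser uses the gradients to nudge every parameter. LLMs are commonly trained with AdamW, a variant of Adam with decoupled weight decay. The sketch below implements plain Adam over a flat list of numbers and omits weight decay; the class and its defaults are illustrative, not a library API. It shows a useful property: on the first step, each weight moves by roughly the learning rate in the direction opposite its gradient's sign, whatever the gradient's magnitude.

```python
import math

class Adam:
    """Minimal Adam optimiser over a flat list of parameters (a sketch, not a library API)."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of gradients
        self.v = [0.0] * n_params  # running mean of squared gradients
        self.t = 0                 # step counter, used for bias correction

    def step(self, params, grads):
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params

opt = Adam(n_params=3, lr=0.01)
params = [0.5, -0.2, 1.0]
grads = [0.3, -4.0, 0.0]
params = opt.step(params, grads)
print(params)  # roughly [0.49, -0.19, 1.0]
```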

Repeat - over a very large number of steps, covering billions to trillions of tokens
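The four steps above form one iteration, and training simply repeats them. The sketch below runs the full loop on a hypothetical toy model: a bigram table `W[prev][next]` of next-token logits, trained with plain gradient descent (not Adam) on a tiny, perfectly predictable sequence. The loss starts near ln(3), the value for a uniform guess over 3 tokens, and falls as the table learns which token follows which. Real training uses large batches of long sequences and a full transformer in place of the table.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_bigram(tokens, vocab_size, steps=200, lr=0.5, seed=0):
    """Hypothetical toy model: W[prev][next] is a table of next-token logits."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.02) for _ in range(vocab_size)] for _ in range(vocab_size)]
    pairs = list(zip(tokens[:-1], tokens[1:]))  # (current token, next token)
    history = []
    for _ in range(steps):
        grads = [[0.0] * vocab_size for _ in range(vocab_size)]
        loss = 0.0
        for prev, nxt in pairs:
            p = softmax(W[prev])                 # 1. forward pass
            loss += -math.log(p[nxt])            # 2. loss
            for v in range(vocab_size):          # 3. backprop: softmax - one_hot
                grads[prev][v] += p[v] - (1.0 if v == nxt else 0.0)
        for a in range(vocab_size):              # 4. optimiser step (plain gradient descent)
            for v in range(vocab_size):
                W[a][v] -= lr * grads[a][v] / len(pairs)
        history.append(loss / len(pairs))
    return W, history

tokens = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]  # a perfectly predictable toy "corpus"
W, history = train_bigram(tokens, vocab_size=3)
print(round(history[0], 3), round(history[-1], 3))  # starts near ln(3) and falls
```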

What each component learns

Token Embeddings - lexical meaning
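After training, tokens with related meanings tend to end up with embedding vectors that point in similar directions, which is usually measured with cosine similarity. The vectors below are hypothetical, hand-written 3-dimensional values chosen purely to illustrate the measurement; they are not taken from any real model, whose embeddings are learned and much wider.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings, hand-written for illustration only
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
print(cosine(emb["cat"], emb["dog"]))  # high: similar meaning
print(cosine(emb["cat"], emb["car"]))  # lower: unrelated
```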

Attention weights - relationships
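The learned attention parameters are the projection matrices; what they learn shows up as the attention pattern they produce, that is, how strongly each token draws on each earlier token. The sketch below computes single-head, causally masked scaled dot-product attention, `softmax(q . k / sqrt(d_k))`, from query, key and value vectors that are assumed to have already been projected. The numbers are illustrative.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention; position i may only attend to positions <= i."""
    d_k = len(K[0])
    weights, outputs = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k) for j in range(i + 1)]
        w = softmax(scores) + [0.0] * (len(K) - i - 1)  # masked (future) positions get weight 0
        weights.append(w)
        # Each output is a weighted mix of the value vectors
        outputs.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return weights, outputs

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights, outputs = causal_attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])  # each row sums to 1; future positions are 0
```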

Feed Forward Networks (FFNs) - transformation/thinking
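Where attention mixes information between tokens, the FFN transforms each token's vector on its own: expand, apply a non-linearity, project back. The sketch below assumes the tanh approximation of GELU (the activation used in GPT-2-style models) and uses small made-up weights. It demonstrates that the same weights are applied at every position independently, so identical input vectors give identical outputs regardless of where they sit in the sequence.

```python
import math

def gelu(x):
    # tanh approximation of GELU (assumed activation)
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def ffn(x, W1, b1, W2, b2):
    """Expand d_model -> d_ff, apply the non-linearity, project back to d_model."""
    hidden = [gelu(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j]) for j in range(len(b1))]
    return [sum(hidden[j] * W2[j][k] for j in range(len(hidden))) + b2[k] for k in range(len(b2))]

def ffn_sequence(seq, *params):
    # The same weights are applied to every position independently: no mixing between tokens
    return [ffn(x, *params) for x in seq]

W1 = [[0.5, -0.3, 0.8, 0.1], [0.2, 0.7, -0.5, 0.4]]  # d_model=2 -> d_ff=4 (hypothetical values)
b1 = [0.0, 0.1, 0.0, -0.1]
W2 = [[0.3, -0.2], [0.6, 0.1], [-0.4, 0.5], [0.2, 0.2]]
b2 = [0.0, 0.0]
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
out = ffn_sequence(seq, W1, b1, W2, b2)
print(out[0] == out[2])  # True: identical tokens give identical outputs, whatever their position
```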

LayerNorm - signal conditioning and stability
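LayerNorm rescales each token's vector to zero mean and unit variance before applying its learned scale (gamma) and shift (beta). This keeps activations in a stable range as they pass through many layers. The sketch below uses gamma = 1 and beta = 0 and an `eps` of 1e-5 (PyTorch's `nn.LayerNorm` default). It shows that two inputs differing only in scale, by a factor of 1000, come out nearly identical.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalise one token's vector to zero mean and unit variance, then rescale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]

gamma, beta = [1.0] * 4, [0.0] * 4
small = layer_norm([0.1, 0.2, 0.3, 0.4], gamma, beta)
large = layer_norm([100.0, 200.0, 300.0, 400.0], gamma, beta)
print([round(v, 3) for v in small])
print([round(v, 3) for v in large])  # nearly the same values despite the 1000x larger input
```

The small remaining difference comes from `eps`, which matters only when the variance is tiny.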

Putting it all together
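The components above fit together as follows: token embeddings go in, each transformer block applies attention and then an FFN (each preceded by LayerNorm and wrapped in a residual connection), and a final projection produces next-token probabilities. The sketch below is a minimal, hypothetical model with a single block. It assumes the pre-LN layout, ReLU in place of GELU, gamma = 1 and beta = 0 in every LayerNorm, and no positional encodings, all for brevity. Real models stack dozens of blocks, each with its own weights.

```python
import math
import random

def matvec(x, W):  # x: length n, W: n x m  ->  length m
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def layer_norm(x, eps=1e-5):  # gamma = 1, beta = 0 (their usual initial values)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def attention(xs, p):  # single head, causal
    q = [matvec(x, p["W_q"]) for x in xs]
    k = [matvec(x, p["W_k"]) for x in xs]
    v = [matvec(x, p["W_v"]) for x in xs]
    out = []
    for i in range(len(xs)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(len(k[j])) for j in range(i + 1)]
        w = softmax(scores)
        mixed = [sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))]
        out.append(matvec(mixed, p["W_o"]))
    return out

def ffn(x, p):  # ReLU used here for brevity
    return matvec([max(0.0, h) for h in matvec(x, p["W_1"])], p["W_2"])

def transformer_block(xs, p):
    # Pre-LN residual layout (assumed): x + Attn(LN(x)), then x + FFN(LN(x))
    a = attention([layer_norm(x) for x in xs], p)
    xs = [[u + w for u, w in zip(x, y)] for x, y in zip(xs, a)]
    return [[u + w for u, w in zip(x, ffn(layer_norm(x), p))] for x in xs]

def init(vocab, d, d_ff, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    return {"emb": mat(vocab, d), "W_q": mat(d, d), "W_k": mat(d, d), "W_v": mat(d, d),
            "W_o": mat(d, d), "W_1": mat(d, d_ff), "W_2": mat(d_ff, d), "W_out": mat(d, vocab)}

def next_token_probs(token_ids, p):
    xs = [p["emb"][t] for t in token_ids]          # 1. token embeddings
    xs = transformer_block(xs, p)                   # 2. one transformer block
    return [softmax(matvec(layer_norm(x), p["W_out"])) for x in xs]  # 3. final LN + vocab projection

params = init(vocab=10, d=8, d_ff=32)
probs = next_token_probs([1, 4, 2], params)
print(len(probs), len(probs[0]))  # 3 positions, each a distribution over 10 tokens
```

Because of the causal mask, the prediction at each position depends only on that token and the ones before it; changing a later token leaves earlier predictions unchanged.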
