Transformer Visualizer
Interactive 3D Architecture Explorer
Explanation Depth
ELI5
Getting into LLMs
Deep Math
Training Mode
Show what happens during training
Learned parameters (updated via backprop)
Gradient flow (backward pass)
Dropout (active during training)
Loss computation (cross-entropy)
Teacher forcing (ground truth input)
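The training-mode toggles above map onto the steps of an ordinary training loop. A minimal NumPy sketch of one such step (the names `W_embed`, `W_out`, and `train_step` are illustrative, not part of the visualizer): ground-truth tokens are fed in (teacher forcing), dropout is applied because we are training, a cross-entropy loss is computed over the softmax output, and gradients flow backward to update the learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab, d = 5, 8
W_embed = rng.normal(0, 0.1, (vocab, d))   # learned parameters
W_out = rng.normal(0, 0.1, (d, vocab))     # learned parameters

def train_step(inputs, targets, lr=0.5, p_drop=0.1):
    # Teacher forcing: inputs are the ground-truth previous tokens,
    # not the model's own predictions.
    h = W_embed[inputs]                                  # (T, d)
    # Dropout: active only during training (inverted dropout).
    mask = (rng.random(h.shape) > p_drop) / (1 - p_drop)
    h = h * mask
    logits = h @ W_out                                   # (T, vocab)
    # Softmax + cross-entropy loss.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(targets)), targets]).mean()
    # Backward pass: gradients flow from the loss to the parameters.
    dlogits = probs.copy()
    dlogits[np.arange(len(targets)), targets] -= 1
    dlogits /= len(targets)
    dW_out = h.T @ dlogits
    dh = (dlogits @ W_out.T) * mask                      # through dropout
    # Gradient-descent update of the learned parameters.
    np.add.at(W_embed, inputs, -lr * dh)
    W_out[:] -= lr * dW_out
    return loss
```

Running `train_step` repeatedly on the same (input, target) pair drives the cross-entropy loss down from about ln(vocab) toward zero, which is the behavior the gradient-flow and loss toggles visualize.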
Your text
6 tokens
← Back to Overview
Overview
Full Architecture
Color Legend
Attention
Feed-Forward
Layer Norm
Embeddings
Residual / Skip
Softmax / Output