Data Configuration
Load and edit your training text
No data loaded
Model Configuration
Transformer hyperparameters
Computational Cost
Context Length: number of characters the model sees at once
Embedding Dim: width of the embedding vectors
Num Layers: 2 layers
stack depth; cost increases linearly with the number of layers
Num Heads: 4 heads
head dim = 16
FFN Multiple: ×2
embDim → embDim×k → embDim, where k is the FFN multiple
Dropout: 0.1
regularisation; 0.1 is a good default
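The hyperparameters above can be collected into a small config object to see how they interact. This is a minimal sketch, not the tool's actual code: `ModelConfig`, `layer_params` and `block_params` are hypothetical names, the embedding width of 64 is inferred from 4 heads × head dim 16, the context length of 64 is a placeholder because the panel shows no value, and the parameter count assumes a standard transformer block (biased Q/K/V/output projections, a two-layer FFN, two layer norms).

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    context_length: int = 64  # placeholder: the panel shows no value
    emb_dim: int = 64         # assumed: 4 heads x head dim 16
    num_layers: int = 2
    num_heads: int = 4
    ffn_multiple: int = 2
    dropout: float = 0.1

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the embedding vector.
        if self.emb_dim % self.num_heads != 0:
            raise ValueError("emb_dim must be divisible by num_heads")
        return self.emb_dim // self.num_heads


def layer_params(cfg: ModelConfig) -> int:
    """Parameter count of one block, assuming a standard layout."""
    d = cfg.emb_dim
    hidden = d * cfg.ffn_multiple          # embDim -> embDim*k
    attn = 4 * (d * d + d)                 # Q, K, V, output: weights + biases
    ffn = (d * hidden + hidden) + (hidden * d + d)  # up and down projections
    norms = 2 * 2 * d                      # two layer norms: scale + shift
    return attn + ffn + norms


def block_params(cfg: ModelConfig) -> int:
    # Identical layers are stacked, so the count scales linearly with depth.
    return cfg.num_layers * layer_params(cfg)


cfg = ModelConfig()
print("head dim:", cfg.head_dim)
print("params per layer:", layer_params(cfg))
print("params in all layers:", block_params(cfg))
```

Under these assumptions, doubling Num Layers doubles the block parameter count, which matches the "cost increases linearly" note. Embedding and output-layer parameters are left out because they depend on the vocabulary size, which is not shown here.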
Architecture Preview
Epoch — / —
0%
Batch
0%
Press Build & Train to start. Samples appear after each epoch.
Inference
Type a prompt and the model autocompletes it. The context window is highlighted.
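The highlighted context window can be pictured as the last Context Length characters of the text so far; anything earlier is outside what the model sees. The sketch below illustrates that idea under stated assumptions: `split_context` and `autocomplete` are hypothetical helpers, and `next_char` stands in for the trained model, which is not part of this page.

```python
from typing import Callable, Tuple


def split_context(text: str, context_length: int) -> Tuple[str, str]:
    """Return (ignored prefix, visible window) for the given context length."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    cut = max(0, len(text) - context_length)
    return text[:cut], text[cut:]


def autocomplete(
    prompt: str,
    next_char: Callable[[str], str],  # stand-in for the model (assumption)
    context_length: int,
    num_chars: int,
) -> str:
    """Append num_chars characters, one at a time, from a sliding window."""
    text = prompt
    for _ in range(num_chars):
        _, window = split_context(text, context_length)
        text += next_char(window)
    return text


ignored, window = split_context("hello world", 5)
print(repr(ignored), repr(window))
```

With a context length of 5, only `world` from the prompt `hello world` would be visible to the model; each generated character shifts the window one position to the right.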