CharacterGPT Trainer
Configure
Train
No model
Data Configuration
Load and edit your training text
Load Harry Potter
Upload file
No data
Loaded Region
0%
–
100%
Start
0%
End
100%
No data loaded
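The Start/End sliders above pick a contiguous slice of the loaded text by percentage. A minimal sketch of that selection logic (the function name `select_region` is illustrative, not the app's actual code):

```python
def select_region(text: str, start_pct: float, end_pct: float) -> str:
    """Return the slice of `text` between start_pct% and end_pct%."""
    n = len(text)
    lo = int(n * start_pct / 100)
    hi = int(n * end_pct / 100)
    return text[lo:hi]
```

With the defaults (Start 0%, End 100%) the whole loaded text is used for training.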
Model Configuration
Transformer hyperparameters
Computational Cost
—
Context Length
characters the model sees
32 tokens
64 tokens
128 tokens
256 tokens
512 tokens
Embedding Dim
width of embedding vectors
16 dims
32 dims
64 dims
128 dims
256 dims
Num Layers
2 layers
stack depth; cost grows linearly with depth
Num Heads
4 heads
head dim = 16
FFN Multiple
×2
embDim → embDim×k → embDim
Dropout
0.1
regularisation; 0.1 is a good default
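The hyperparameters above fix the model's size. A rough parameter count for a decoder-only transformer can be sketched as below; this is an assumption-laden estimate, not the app's code: it posits a character vocabulary of ~96 printable ASCII symbols, learned positional embeddings, ignores bias terms, and assumes the output head is tied to the token embedding.

```python
def approx_param_count(vocab_size: int, emb_dim: int, ctx_len: int,
                       n_layers: int, ffn_mult: int) -> int:
    # Token embedding table plus learned positional embeddings.
    embeddings = vocab_size * emb_dim + ctx_len * emb_dim
    # Per block: Q/K/V/output projections (4 square matrices) ...
    attn = 4 * emb_dim * emb_dim
    # ... two FFN matrices (embDim -> embDim*k -> embDim) ...
    ffn = 2 * emb_dim * (emb_dim * ffn_mult)
    # ... and two LayerNorms (gamma + beta each).
    norms = 2 * 2 * emb_dim
    per_layer = attn + ffn + norms
    # Final LayerNorm; output head assumed tied to the embedding table.
    return embeddings + n_layers * per_layer + 2 * emb_dim
```

Under these assumptions, the defaults shown here (64 dims, 2 layers, 4 heads, ×2 FFN, 128-token context, vocab 96) give 80,512 parameters, which is why such a model can train interactively in the browser.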
Architecture Preview
Open Train Page
Epoch — / —
0%
Batch
0%
Press
Build & Train
to start. Samples appear after each epoch.
Inference
Type a prompt and the model autocompletes it. The context window is highlighted.
Generate:
150 chars
Clear
Generate
More
Stop
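Generation applies the Temperature setting (default 0.9) when sampling each next character: logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. A minimal numpy-based sketch of this standard technique (not the app's actual sampler):

```python
import numpy as np

def sample_char(logits, temperature: float = 0.9) -> int:
    """Sample a character index from logits scaled by temperature."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Numerically stable softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

At very low temperatures this approaches greedy decoding (always the top character); at high temperatures output drifts toward random characters.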
Controls
Build & Train
Train Further
Stop Training
← Back to Configure
Live Stats
—
WMA Loss (50-batch)
Learning Rate
—
Test Loss
—
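The "WMA Loss (50-batch)" stat above smooths the noisy per-batch loss over the most recent 50 batches. Assuming a simple sliding-window mean (the app may instead weight recent batches more heavily), a sketch:

```python
from collections import deque

class MovingAverage:
    """Mean of the most recent `window` values."""
    def __init__(self, window: int = 50):
        self.buf = deque(maxlen=window)  # old values drop off automatically

    def update(self, loss: float) -> float:
        self.buf.append(loss)
        return sum(self.buf) / len(self.buf)
```

Smoothing like this makes the downward trend visible even when individual batch losses jump around.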
Training Parameters
Epochs
5
Batch Size
32
Temperature
0.9
Sample Length
200 chars
LR Start
0.003
LR End
0.0005
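The LR Start and LR End fields imply a learning rate that decays from 0.003 to 0.0005 over the run. A sketch assuming linear interpolation across training steps (the app could equally use cosine or exponential decay):

```python
def lr_at(step: int, total_steps: int,
          lr_start: float = 0.003, lr_end: float = 0.0005) -> float:
    """Linearly interpolate from lr_start at step 0 to lr_end at the last step."""
    t = step / max(total_steps - 1, 1)
    return lr_start + t * (lr_end - lr_start)
```

Starting high and decaying lets early batches make large corrections while late batches fine-tune without overshooting.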
Loss Curve
Epoch History
Epoch
Test Loss
No epochs yet