CharacterGPT — Browser Transformer Trainer

Data Configuration

Load and edit your training text

No data loaded

Model Configuration

Transformer hyperparameters

Computational Cost —

Context Length chars model sees

Embedding Dim width of vectors

Num Layers 2 layers

stack depth — cost increases linearly

Num Heads 4 heads

head dim = 16
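The head dim readout follows from splitting the embedding width evenly across attention heads. Below is a minimal sketch of that arithmetic, assuming the usual multi-head convention. The embedding width of 64 is inferred from 4 heads × 16 and is not stated on the page. The function name is hypothetical.

```python
def head_dim(emb_dim: int, num_heads: int) -> int:
    """Return the per-head width, assuming the embedding splits evenly across heads."""
    if num_heads <= 0:
        raise ValueError("num_heads must be positive")
    if emb_dim % num_heads != 0:
        # Standard multi-head attention needs an even split.
        raise ValueError("emb_dim must be divisible by num_heads")
    return emb_dim // num_heads


# Assumed embedding width of 64 (inferred, not shown on the page).
print(head_dim(64, 4))  # 16
```

Under this convention, changing Num Heads at a fixed Embedding Dim changes only the per-head width, not the total width of the attention layer.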

FFN Multiple ×2

embDim → embDim×k → embDim
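The widening and narrowing described above can be sketched as two linear layers whose weight count grows with the FFN multiple k. This is an illustrative estimate only: the embedding width of 64 is an assumption, bias terms are assumed present, and the helper names are hypothetical rather than taken from the app.

```python
def ffn_shapes(emb_dim: int, multiple: int) -> list[tuple[int, int]]:
    """Weight shapes (fan_in, fan_out) for embDim -> embDim*k -> embDim."""
    hidden = emb_dim * multiple
    return [(emb_dim, hidden), (hidden, emb_dim)]


def ffn_param_count(emb_dim: int, multiple: int, bias: bool = True) -> int:
    """Count weights (and optionally biases) across both linear layers."""
    total = 0
    for fan_in, fan_out in ffn_shapes(emb_dim, multiple):
        total += fan_in * fan_out
        if bias:
            total += fan_out
    return total


# Assumed embedding width of 64 with the ×2 multiple shown above.
print(ffn_shapes(64, 2))       # [(64, 128), (128, 64)]
print(ffn_param_count(64, 2))  # 16576
```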

Dropout 0.1

regularisation — 0.1 is a good default

Architecture Preview

Epoch — / —

Batch

Press Build & Train to start. Samples appear after each epoch.

Inference

Type a prompt and the model autocompletes it. The context window is highlighted.

Generate: 150 chars

Live Stats

—

WMA Loss (50-batch)
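A weighted moving average smooths the noisy per-batch loss over the most recent 50 batches. The page does not say how the weights are chosen, so the sketch below assumes linearly increasing weights (the newest batch counts most). The class name is hypothetical.

```python
from collections import deque


class WeightedMovingAverage:
    """Linearly weighted moving average over a fixed window (assumed weighting)."""

    def __init__(self, window: int = 50):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add a new loss value and return the current weighted average."""
        self.values.append(value)
        # Oldest value gets weight 1, newest gets weight len(values).
        weights = range(1, len(self.values) + 1)
        weighted_sum = sum(w * v for w, v in zip(weights, self.values))
        return weighted_sum / sum(weights)


wma = WeightedMovingAverage(window=50)
for batch_loss in [2.9, 2.7, 2.8, 2.5]:
    smoothed = wma.update(batch_loss)
print(round(smoothed, 4))
```

Until 50 batches have been seen, this version averages over however many values are available, so the readout is defined from the first batch onward.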

Learning Rate —

Test Loss —

Epochs 5

Batch Size 32

Temperature 0.9

Sample Length 200 chars
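Temperature conventionally divides the model's logits before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The sketch below shows that convention for picking one character index. It assumes the app follows the standard formulation, and the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float = 0.9) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits: list[float], temperature: float = 0.9) -> int:
    """Draw one vocabulary index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return random.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # made-up scores for a three-character vocabulary
print(softmax_with_temperature(logits, 0.9))
print(sample_index(logits, 0.9))
```

Repeating `sample_index` once per generated character, feeding each pick back into the context, would produce a sample of the configured length.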

LR Start 0.003

LR End 0.0005
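The LR Start and LR End values imply a learning rate that decays over the course of training. The page does not state the decay shape, so the sketch below assumes simple linear interpolation across all training steps. A cosine or exponential schedule would fit the same two endpoints. The function name is hypothetical.

```python
def lr_at(step: int, total_steps: int,
          lr_start: float = 0.003, lr_end: float = 0.0005) -> float:
    """Linearly interpolate the learning rate from lr_start to lr_end (assumed shape)."""
    if total_steps <= 1:
        return lr_start
    # Clamp progress to [0, 1] so out-of-range steps stay at the endpoints.
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return lr_start + (lr_end - lr_start) * progress


total = 101  # hypothetical number of optimizer steps
for step in (0, 50, 100):
    print(step, lr_at(step, total))
```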

Epoch History

Epoch	Test Loss
No epochs yet