English to French Transformer

A Pre-LN Transformer built from scratch in PyTorch, trained to translate English into French. Implements the full architecture from "Attention Is All You Need" (Vaswani et al., 2017) — including multi-head self-attention, cross-attention, positional encoding, and a shared BPE vocabulary across both languages.
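The positional encoding in the original paper is sinusoidal. As an illustration of that formula (a framework-free sketch, not necessarily this repo's exact implementation):

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from Vaswani et al. (2017).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
```

Each position gets a unique pattern of sines and cosines at geometrically spaced frequencies, which is what lets the otherwise order-blind attention layers distinguish token positions.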
You can find a live demo here.
Trained for 15 epochs on the opus_books dataset (~200k sentence pairs) on an A100 GPU in approximately 1 hour.

| Decoding | BLEU |
|---|---|
| Beam Search (k=5) | 17.76 |
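Beam search keeps the k highest-scoring partial translations at every decoding step instead of committing to a single greedy choice. A minimal, model-agnostic sketch (the `toy_model` scoring function below is a stand-in, not the repo's decoder):

```python
import math

def beam_search(step_log_probs, k=5, eos=None, max_len=10):
    """Generic beam search decoder.

    step_log_probs(prefix) -> {token: log_prob} for the next token.
    Keeps the k highest-scoring prefixes at every step.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if eos is not None and seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished beam carries over
                continue
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
        if eos is not None and all(s and s[-1] == eos for s, _ in beams):
            break
    return beams

# Toy 3-token vocabulary: 0 and 1 are words, 2 is <eos>.
def toy_model(prefix):
    if len(prefix) < 2:
        return {0: math.log(0.6), 1: math.log(0.3), 2: math.log(0.1)}
    return {2: math.log(0.9), 0: math.log(0.05), 1: math.log(0.05)}

best_seq, best_score = beam_search(toy_model, k=5, eos=2)[0]
```

With k=5 the decoder explores several hypotheses in parallel, which is why it typically scores higher on BLEU than greedy decoding.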

Sample translations:

| English | Predicted French |
|---|---|
| The cat sat on the mat. | Le chat était assis sur le tapis. |
| I would like a cup of coffee, please. | Je voudrais une tasse de café, s'il vous plaît. |
| Where is the nearest train station? | Où est la gare la plus proche? |
| She opened the window to let in the fresh air. | Elle ouvrit la fenêtre pour laisser entrer l'air frais. |
| The meeting has been postponed to next Monday. | La réunion a été reportée à lundi prochain. |
The model follows the original Transformer encoder-decoder architecture, with Pre-Layer Normalisation applied before each sub-layer rather than after it, which improves training stability as network depth increases.
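The Pre-LN ordering amounts to `x + sublayer(norm(x))` rather than the paper's original `norm(x + sublayer(x))`. A sketch of one Pre-LN encoder layer (it uses `torch.nn.MultiheadAttention` for brevity, whereas this repo implements attention from scratch; the class name is mine):

```python
import torch
import torch.nn as nn

class PreLNEncoderLayer(nn.Module):
    """One Pre-LN encoder layer: LayerNorm is applied *before* each
    sub-layer, and the residual connection skips around both.
    (Post-LN would instead compute x = norm(x + sublayer(x)).)"""

    def __init__(self, d_model=512, num_heads=8, ffn_hidden=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_hidden),
            nn.ReLU(),
            nn.Linear(ffn_hidden, d_model),
        )

    def forward(self, x):
        h = self.norm1(x)              # Pre-LN: normalise first...
        x = x + self.attn(h, h, h)[0]  # ...then add the residual
        x = x + self.ffn(self.norm2(x))
        return x

layer = PreLNEncoderLayer()
out = layer(torch.randn(2, 10, 512))   # (batch, seq_len, d_model)
```

Because the residual path is never normalised, gradients flow through an identity connection from output to input, which is what makes deep Pre-LN stacks easier to train.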

| Hyperparameter | Value |
|---|---|
| d_model | 512 |
| num_heads | 8 |
| ffn_hidden | 2,048 |
| num_layers | 6 |
| vocab_size | 32,000 (shared BPE) |
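A back-of-envelope parameter count can be derived from the table above (an illustrative estimate only; the exact count depends on bias and weight-tying choices in the repo):

```python
# Hyperparameters from the table above.
d_model, num_heads, ffn_hidden, num_layers, vocab = 512, 8, 2048, 6, 32_000

embedding = vocab * d_model                  # one shared source/target table
attn = 4 * (d_model * d_model + d_model)     # Q, K, V, output projections (+bias)
ffn = (d_model * ffn_hidden + ffn_hidden) + (ffn_hidden * d_model + d_model)
ln = 2 * d_model                             # LayerNorm scale + shift

encoder_layer = attn + ffn + 2 * ln
decoder_layer = 2 * attn + ffn + 3 * ln      # self-attention plus cross-attention

total = embedding + num_layers * (encoder_layer + decoder_layer)
print(f"{total / 1e6:.1f}M parameters")      # prints 60.5M parameters
```

Roughly 60M parameters with tied input/output embeddings; an untied output projection would add another `vocab * d_model` (about 16M) on top.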
The full architecture was implemented from scratch in Python using PyTorch, with no pre-built transformer blocks. Key components include multi-head self-attention, cross-attention, positional encoding, and a shared BPE tokenizer.
Dataset: opus_books — a parallel English/French literary corpus.
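A shared BPE vocabulary means one tokenizer is trained on the concatenated English and French text, so both languages draw subwords from the same 32k table. The core merge rule can be sketched in plain Python (the mini-corpus below is hypothetical; the repo presumably uses a proper tokenizer library):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Tiny mixed English/French corpus (word -> frequency), split into characters.
words = {tuple("chat"): 3, tuple("that"): 2, tuple("chaud"): 1}
merges = []
for _ in range(3):  # learn 3 merges
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)
```

Because "chat" (French) and "that" (English) share the fragment "hat", merges learned on the joint corpus are reused across both languages, which is the point of sharing the vocabulary.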