English to French Transformer

Overview

A Pre-LN Transformer built from scratch in PyTorch, trained to translate English into French. Implements the full architecture from "Attention Is All You Need" (Vaswani et al., 2017) — including multi-head self-attention, cross-attention, positional encoding, and a shared BPE vocabulary across both languages.

You can find a live demo here.

Results

Trained for 15 epochs on the opus_books dataset (~200k sentence pairs), taking approximately 1 hour on an A100 GPU.

Decoding            BLEU
Beam Search (k=5)   17.76
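Beam search keeps the k highest-scoring partial translations at each decoding step instead of committing to the single most likely token. A minimal, model-agnostic sketch of the idea (the `next_log_probs` callable and the toy model below are illustrative stand-ins, not the repository's API):

```python
import math

def beam_search(next_log_probs, bos, eos, k=5, max_len=20):
    """Generic beam search: keep the k highest-scoring partial
    sequences, scored by summed token log-probabilities."""
    beams = [([bos], 0.0)]  # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                finished.append((seq, score))  # retire completed beams
                continue
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    finished.extend(b for b in beams if b[0][-1] == eos)
    return max(finished or beams, key=lambda c: c[1])[0]

# Toy "model": after BOS (0), token 1 is most likely, then EOS (9).
def toy_model(seq):
    if len(seq) == 1:
        return {1: math.log(0.6), 2: math.log(0.4)}
    return {9: math.log(0.9), 1: math.log(0.1)}

print(beam_search(toy_model, bos=0, eos=9, k=5))  # [0, 1, 9]
```

With k=1 this reduces to greedy decoding; a real implementation would also normalise scores by length so short hypotheses are not unfairly favoured.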

Example Translations

English                                             Predicted French
The cat sat on the mat.                             Le chat était assis sur le tapis.
I would like a cup of coffee, please.               Je voudrais une tasse de café, s'il vous plaît.
Where is the nearest train station?                 Où est la gare la plus proche?
She opened the window to let in the fresh air.      Elle ouvrit la fenêtre pour laisser entrer l'air frais.
The meeting has been postponed to next Monday.      La réunion a été reportée à lundi prochain.

Architecture

The model follows the original Transformer encoder-decoder architecture, with Pre-Layer Normalisation applied before each sub-layer rather than after, which improves training stability in deeper networks.

Hyperparameter   Value
d_model          512
num_heads        8
ffn_hidden       2,048
num_layers       6
vocab_size       32,000 (shared BPE)
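The Pre-LN arrangement can be sketched as a single encoder layer using the hyperparameter names from the table above. This is an illustrative sketch only: it uses PyTorch's built-in `nn.MultiheadAttention` for brevity, whereas the repository implements attention from scratch.

```python
import torch
import torch.nn as nn

class PreLNEncoderLayer(nn.Module):
    """Pre-LN Transformer encoder layer: LayerNorm is applied *before*
    each sub-layer, and the residual adds the sub-layer output back."""
    def __init__(self, d_model=512, num_heads=8, ffn_hidden=2048, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(ffn_hidden, d_model),
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Pre-LN: normalise first, run the sub-layer, then add the residual.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, key_padding_mask=key_padding_mask)
        x = x + self.drop(attn_out)
        x = x + self.drop(self.ffn(self.norm2(x)))
        return x

layer = PreLNEncoderLayer().eval()
out = layer(torch.randn(2, 7, 512))  # (batch, seq_len, d_model)
```

A Post-LN layer would instead compute `norm(x + sublayer(x))`; Pre-LN keeps a clean residual path from input to output, which is what stabilises deep stacks.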

Implementation

The full architecture was implemented from scratch in Python using PyTorch — no pre-built transformer blocks. Key components include:

  • Multi-head self-attention and cross-attention with scaled dot-product scoring
  • Positional encoding using sinusoidal functions
  • Shared BPE vocabulary across source and target languages
  • Beam search decoding with k=5 for inference
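Of these components, the sinusoidal positional encoding is the most self-contained: it can be computed directly from the formulas in the paper. A dependency-free sketch (the repository's version would build the same table as a PyTorch tensor):

```python
import math

def sinusoidal_positions(max_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_positions(max_len=50, d_model=8)
# Position 0 alternates sin(0)=0 and cos(0)=1 across the dimension.
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Each dimension pair oscillates at a different wavelength, so relative offsets between positions correspond to fixed linear transformations of the encoding.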

Training

  • Dataset: opus_books — a parallel English/French literary corpus
  • Hardware: A100 GPU
  • Duration: ~1 hour across 15 epochs
  • Objective: Cross-entropy loss with label smoothing
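Label smoothing replaces the one-hot target with a softened distribution: (1 − ε) on the gold token and ε spread over the remaining classes, which discourages over-confident predictions. A minimal per-token sketch (ε = 0.1 is an assumed default, matching the original paper; the repository may use a different value):

```python
import math

def label_smoothed_nll(log_probs, target, eps=0.1):
    """Cross-entropy against a smoothed target: (1 - eps) mass on the
    gold token, eps shared evenly over the other classes."""
    vocab = len(log_probs)
    smooth = eps / (vocab - 1)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = (1.0 - eps) if i == target else smooth
        loss -= q * lp
    return loss

# With a uniform prediction the loss equals -log(1/V) for any eps,
# because the smoothed targets still sum to 1.
uniform = [math.log(0.25)] * 4
print(round(label_smoothed_nll(uniform, target=0), 4))  # 1.3863
```

Setting `eps=0.0` recovers the plain negative log-likelihood of the gold token; in PyTorch the same objective is available via `nn.CrossEntropyLoss(label_smoothing=0.1)`.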

Links

  • Github
  • Live Demo