English to French Transformer

Overview

A Pre-LN Transformer built from scratch in PyTorch, trained to translate English into French. Implements the full architecture from "Attention Is All You Need" (Vaswani et al., 2017) — including multi-head self-attention, cross-attention, positional encoding, and a shared BPE vocabulary across both languages.

You can find a live demo here.

Results

Trained for 15 epochs on the opus_books dataset (~200k sentence pairs), taking approximately 1 hour on an A100 GPU.

Decoding            BLEU
Beam Search (k=5)   17.76
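Beam search keeps the k highest-scoring partial translations at each decoding step instead of committing to the single most likely token. A minimal, model-agnostic sketch of the idea (the `next_log_probs` callable and the toy model below are illustrative stand-ins, not the repository's API):

```python
import math

def beam_search(next_log_probs, bos, eos, k=5, max_len=20):
    """Generic beam search: keep the k highest-scoring partial
    sequences, scored by summed token log-probabilities."""
    beams = [([bos], 0.0)]  # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                finished.append((seq, score))  # retire completed beams
                continue
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    finished.extend(b for b in beams if b[0][-1] == eos)
    return max(finished or beams, key=lambda c: c[1])[0]

# Toy "model": after BOS (0), token 1 is most likely, then EOS (9).
def toy_model(seq):
    if len(seq) == 1:
        return {1: math.log(0.6), 2: math.log(0.4)}
    return {9: math.log(0.9), 1: math.log(0.1)}

print(beam_search(toy_model, bos=0, eos=9, k=5))  # [0, 1, 9]
```

With k=1 this reduces to greedy decoding; a real implementation would also normalise scores by length so short hypotheses are not unfairly favoured.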

Example Translations

English                                             Predicted French
The cat sat on the mat.                             Le chat était assis sur le tapis.
I would like a cup of coffee, please.               Je voudrais une tasse de café, s'il vous plaît.
Where is the nearest train station?                 Où est la gare la plus proche?
She opened the window to let in the fresh air.      Elle ouvrit la fenêtre pour laisser entrer l'air frais.
The meeting has been postponed to next Monday.      La réunion a été reportée à lundi prochain.

Architecture

The model follows the original Transformer encoder-decoder architecture, with Pre-Layer Normalisation applied before each sub-layer rather than after, which improves training stability in deeper networks.

Hyperparameter   Value
d_model          512
num_heads        8
ffn_hidden       2,048
num_layers       6
vocab_size       32,000 (shared BPE)
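The Pre-LN arrangement can be sketched as a single encoder layer using the hyperparameter names from the table above. This is an illustrative sketch only: it uses PyTorch's built-in `nn.MultiheadAttention` for brevity, whereas the repository implements attention from scratch.

```python
import torch
import torch.nn as nn

class PreLNEncoderLayer(nn.Module):
    """Pre-LN Transformer encoder layer: LayerNorm is applied *before*
    each sub-layer, and the residual adds the sub-layer output back."""
    def __init__(self, d_model=512, num_heads=8, ffn_hidden=2048, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(ffn_hidden, d_model),
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Pre-LN: normalise first, run the sub-layer, then add the residual.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, key_padding_mask=key_padding_mask)
        x = x + self.drop(attn_out)
        x = x + self.drop(self.ffn(self.norm2(x)))
        return x

layer = PreLNEncoderLayer().eval()
out = layer(torch.randn(2, 7, 512))  # (batch, seq_len, d_model)
```

A Post-LN layer would instead compute `norm(x + sublayer(x))`; Pre-LN keeps a clean residual path from input to output, which is what stabilises deep stacks.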

Implementation

The full architecture was implemented from scratch in Python using PyTorch — no pre-built transformer blocks. Key components include:

  • Multi-head self-attention and cross-attention with scaled dot-product scoring
  • Positional encoding using sinusoidal functions
  • Shared BPE vocabulary across source and target languages
  • Beam search decoding with k=5 for inference
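Of these components, the sinusoidal positional encoding is the most self-contained: it can be computed directly from the formulas in the paper. A dependency-free sketch (the repository's version would build the same table as a PyTorch tensor):

```python
import math

def sinusoidal_positions(max_len, d_model):
    """Sinusoidal positional encoding from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_positions(max_len=50, d_model=8)
# Position 0 alternates sin(0)=0 and cos(0)=1 across the dimension.
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Each dimension pair oscillates at a different wavelength, so relative offsets between positions correspond to fixed linear transformations of the encoding.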

Training

  • Dataset: opus_books — a parallel English/French literary corpus
  • Hardware: A100 GPU
  • Duration: ~1 hour across 15 epochs
  • Objective: Cross-entropy loss with label smoothing
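Label smoothing replaces the one-hot target with a softened distribution: (1 − ε) on the gold token and ε spread over the remaining classes, which discourages over-confident predictions. A minimal per-token sketch (ε = 0.1 is an assumed default, matching the original paper; the repository may use a different value):

```python
import math

def label_smoothed_nll(log_probs, target, eps=0.1):
    """Cross-entropy against a smoothed target: (1 - eps) mass on the
    gold token, eps shared evenly over the other classes."""
    vocab = len(log_probs)
    smooth = eps / (vocab - 1)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = (1.0 - eps) if i == target else smooth
        loss -= q * lp
    return loss

# With a uniform prediction the loss equals -log(1/V) for any eps,
# because the smoothed targets still sum to 1.
uniform = [math.log(0.25)] * 4
print(round(label_smoothed_nll(uniform, target=0), 4))  # 1.3863
```

Setting `eps=0.0` recovers the plain negative log-likelihood of the gold token; in PyTorch the same objective is available via `nn.CrossEntropyLoss(label_smoothing=0.1)`.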

Links

  • Github
  • Live Demo