Hands-on reimplementations with annotated notebooks. Each entry below is a self-contained set of notebooks.
Backpropagation & Autograd Engine
Backpropagation is the algorithm that computes gradients through a neural network by applying the chain rule backwards through the computation graph. Built here as a scalar-valued autograd engine — each operation tracks its inputs and knows how to propagate gradients — plus a minimal neural net library on top. ~200 lines of Python, zero dependencies.
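A minimal sketch of the core idea, in the spirit of micrograd (the class below is illustrative, not the notebooks' exact code): every operation records its inputs plus a closure that adds its contribution to their gradients, and `.backward()` replays those closures in reverse topological order.

```python
import math

class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # fills in the children's grads
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # += so gradients accumulate across uses
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then run the local closures from the output backwards
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# tiny check: gradient of tanh(x*w + b) with respect to its inputs
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()
print(x.grad, w.grad, b.grad)
```

Note the `+=` in each closure: a value that feeds several downstream nodes receives gradient from each of them, which is exactly the accumulation point notebook 06 stresses.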
9 notebooks: backpropagation and autograd
| Notebook | Summary |
| --- | --- |
| 01 Derivatives & Autograd | Limit definition; manual derivative computation |
| 02 Data Struct & Forward Pass | Building the `Value` class; first forward pass |
| 03 Backpropagation (by hand) | Hand-assigning gradients at each node |
| 04 Training a Neuron | `tanh` as an activation function |
| 05 Manual Backward Pass | Recursive `._backward()` calls (compute each node's gradient) |
| 06 Automated Backward Pass | Single `.backward()` call, propagates via topological sort; ensure gradients accumulate (multivariate: `+=` not `=`) |
| 07 Express tanh More Simply | Exercise: re-implement `tanh` using constituent operations |
| 08 PyTorch: Backpropagation | Exercise: perform backpropagation to update gradients in PyTorch |
| 09 PyTorch: Gradient Descent | Exercise: perform gradient descent (train the neural net) in PyTorch (both PyTorch exercises are sketched below) |

Sources: Andrej Karpathy - Zero to Hero · micrograd · lectures · exercises
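For the two PyTorch exercises, the same mechanics collapse into a few calls. A hedged sketch of the pattern (the toy regression data and learning rate are illustrative, not the exercises' actual setup):

```python
import torch

# toy problem: recover y = 3x + 2 with one weight and one bias
xs = torch.linspace(-1, 1, 20).unsqueeze(1)
ys = 3 * xs + 2

w = torch.randn(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(200):
    pred = xs @ w + b
    loss = ((pred - ys) ** 2).mean()

    # backpropagation: clear stale grads, then let autograd fill .grad
    w.grad = None
    b.grad = None
    loss.backward()

    # gradient descent: step each parameter against its gradient
    with torch.no_grad():
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad

print(w.item(), b.item(), loss.item())   # w approaches 3, b approaches 2
```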
Bigram Language Model & Neural Net Equivalent
A bigram model predicts the next character based only on the current one — the simplest possible language model. Built first as a counting model (character pair frequencies → probabilities), then rebuilt as an equivalent single-layer neural network trained with gradient descent, showing both approaches converge to the same solution.
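A compressed sketch of both views of the model, assuming a handful of lowercase names as a stand-in dataset (the word list, vocabulary handling, and learning rate below are illustrative):

```python
import torch

words = ["emma", "olivia", "ava", "isabella", "sophia"]   # stand-in dataset
stoi = {c: i + 1 for i, c in enumerate(sorted(set("".join(words))))}
stoi["."] = 0                                             # start/end boundary token
V = len(stoi)

# training pairs: index of current character -> index of next character
xs, ys = [], []
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        xs.append(stoi[c1]); ys.append(stoi[c2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# counting model: pair frequencies -> row-normalised probability table
N = torch.zeros((V, V))
for i, j in zip(xs.tolist(), ys.tolist()):
    N[i, j] += 1
P = (N + 1) / (N + 1).sum(dim=1, keepdim=True)            # Laplace smoothing
count_nll = -P[xs, ys].log().mean()

# equivalent neural net: one-hot input -> linear layer -> softmax
W = torch.randn((V, V), requires_grad=True)
for _ in range(300):
    logits = torch.nn.functional.one_hot(xs, V).float() @ W       # acts as log-counts
    probs = logits.exp() / logits.exp().sum(dim=1, keepdim=True)  # softmax by hand
    loss = -probs[torch.arange(len(ys)), ys].log().mean()         # negative log-likelihood
    W.grad = None
    loss.backward()
    with torch.no_grad():
        W -= 10 * W.grad

print(count_nll.item(), loss.item())   # counting-model NLL vs trained-net NLL
```

On a word list this small the two numbers differ a little, because Laplace smoothing matters when counts are tiny; on a realistic dataset, with mild regularisation of `W` playing the role of smoothing, the two losses converge, which is the equivalence the notebooks demonstrate.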
5 notebooks + 2 helpers: bigram model and NN equivalent
| Notebook | Summary |
| --- | --- |
| 01 Define Bigram Model | Character pair counting; building the probability table |
| 02 Sampling | Generating names by sampling from the distribution |
| helper: Broadcasting Tensors | PyTorch broadcasting rules with worked examples |
| 03 Loss & Smoothing | Negative log-likelihood loss; Laplace smoothing |
| 04 Bigrams → Neural Net | One-hot input → linear layer → softmax; mathematically equivalent to counting |
| helper: One-Hot Encoding | How one-hot vectors encode categorical inputs |
| 05 Optimisation | Training loop; gradient descent convergence |

Sources: Andrej Karpathy - Zero to Hero
JPEG Compression
JPEG is a lossy image compression standard. Built here from scratch: colour space conversion, 8×8 DCT blocks, quantization, zig-zag scanning, run-length encoding, and Huffman coding.
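A minimal sketch of the transform-and-quantize core on a single 8×8 block, using a hand-built orthonormal DCT-II matrix and the standard luminance quantization table from the JPEG spec (the input block here is random, purely for illustration):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: row k, column x."""
    T = np.array([[np.cos((2 * x + 1) * k * np.pi / (2 * n)) for x in range(n)]
                  for k in range(n)])
    T[0] *= 1 / np.sqrt(n)
    T[1:] *= np.sqrt(2 / n)
    return T

# standard JPEG luminance quantization table (spec Annex K)
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

T = dct_matrix()
block = np.random.randint(0, 256, (8, 8)).astype(float) - 128   # centre pixels on 0
coeffs = T @ block @ T.T                     # 2D DCT: transform rows and columns
quantized = np.round(coeffs / Q)             # the lossy step: small coefficients vanish
restored = T.T @ (quantized * Q) @ T + 128   # dequantize, inverse DCT, un-centre

print(int((quantized == 0).sum()), "of 64 coefficients zeroed")
print("mean abs reconstruction error:", float(np.abs(restored - (block + 128)).mean()))
```

The zig-zag scan, run-length encoding, and Huffman coding stages then operate on `quantized`, where the long runs of zeros are what actually buy the compression.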
1 notebook: JPEG compression
| Notebook | Summary |
| --- | --- |
| 01 JPEG Compression | Full encode/decode pipeline; quality comparison and PSNR |

Related: jpeg-compression · discrete-cosine-transform
Audio Fingerprinting
Audio fingerprinting identifies a song from a short, noisy clip by hashing the time-frequency peaks in its spectrogram. Built here from scratch — the same approach Shazam uses: FFT, STFT spectrogram, peak detection, and a hash-based song database.
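A hedged sketch of the fingerprinting core, assuming a mono signal array: window it into an STFT magnitude spectrogram, keep the loudest bin per frame as the constellation map, and hash pairs of nearby peaks. The frame size, threshold, fan-out, and the synthetic chirp standing in for a song are all illustrative choices:

```python
import numpy as np

def spectrogram(signal, frame=2048, hop=512):
    """Magnitude STFT built from Hann-windowed FFT frames, shape (time, freq)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop: i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def peaks(spec):
    """Constellation map: the loudest frequency bin of each frame, if loud enough."""
    threshold = 5 * spec.mean()
    return [(ti, int(row.argmax())) for ti, row in enumerate(spec)
            if row.max() > threshold]

def hashes(points, fan_out=5):
    """Pair each peak with the next few peaks: key (f1, f2, dt), value anchor time."""
    table = {}
    for i, (t1, f1) in enumerate(points):
        for t2, f2 in points[i + 1: i + 1 + fan_out]:
            table[(f1, f2, t2 - t1)] = t1
    return table

# toy demo: a noisy two-second excerpt should share many hashes with the "song"
sr = 22050
rng = np.random.default_rng(0)
t = np.arange(sr * 5) / sr
song = np.sin(2 * np.pi * (300 + 40 * t) * t)          # a slow chirp stands in for audio
clip = song[sr: sr * 3] + 0.5 * rng.standard_normal(sr * 2)

db = hashes(peaks(spectrogram(song)))
query = hashes(peaks(spectrogram(clip)))
matches = sum(1 for h in query if h in db)
print(f"{matches} / {len(query)} clip hashes found in the database")
```

A real database maps each hash to a song id and anchor time, and a match is declared when many of a clip's hashes line up with one song at a consistent time offset; the raw count above is only the simplest proxy for that vote.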
2 notebooks: DFT/FFT + audio fingerprinting
| Notebook | Summary |
| --- | --- |
| 01 DFT and FFT | Naive O(N²) DFT → Cooley-Tukey FFT; benchmark vs numpy (sketched below) |
| 02 Audio Fingerprinting | STFT → constellation map → hash DB → noisy clip matching |

Related: audio-fingerprinting · spectrogram · fast-fourier-transform
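A short sketch of the notebook-01 step, assuming power-of-two input lengths; the recursion is the textbook radix-2 Cooley-Tukey split into even- and odd-indexed samples:

```python
import numpy as np

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    return np.exp(-2j * np.pi * np.outer(n, n) / len(x)) @ x

def fft(x):
    """Radix-2 Cooley-Tukey FFT, O(N log N), for power-of-two lengths."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N <= 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd
    return np.concatenate([even + twiddled, even - twiddled])

x = np.random.randn(1024)
assert np.allclose(dft(x), np.fft.fft(x))
assert np.allclose(fft(x), np.fft.fft(x))
```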