Hands-on reimplementations with annotated notebooks. Each entry below is a self-contained set of notebooks.
Backpropagation & Autograd Engine
Backpropagation is the algorithm that computes gradients through a neural network by applying the chain rule backwards through the computation graph. Built here as a scalar-valued autograd engine — each operation tracks its inputs and knows how to propagate gradients — plus a minimal neural net library on top. ~200 lines of Python, zero dependencies.
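A minimal sketch of the core idea, in the spirit of micrograd (the class below is illustrative, not the notebooks' exact code): every operation records its inputs plus a closure that adds its contribution to their gradients, and `.backward()` replays those closures in reverse topological order.

```python
import math

class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # fills in the children's grads
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # += so gradients accumulate across uses
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then run the local closures from the output backwards
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# tiny check: gradient of tanh(x*w + b) with respect to its inputs
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()
print(x.grad, w.grad, b.grad)
```

Note the `+=` in each closure: a value that feeds several downstream nodes receives gradient from each of them, which is exactly the accumulation point notebook 06 stresses.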
9 notebooks: backpropagation and autograd
| Notebook | Summary |
| --- | --- |
| 01 Derivatives & Autograd | Limit definition; manual derivative computation |
| 02 Data Struct & Forward Pass | Building the `Value` class; first forward pass |
| 03 Backpropagation (by hand) | Hand-assigning gradients at each node |
| 04 Training a Neuron | `tanh` as an activation function |
| 05 Manual Backward Pass | Recursive `._backward()` calls (compute each node's gradient) |
| 06 Automated Backward Pass | Single `.backward()` call, propagates via topological sort; ensure gradients accumulate (multivariate: `+=` not `=`) |
| 07 Express tanh More Simply | Exercise: re-implement `tanh` using constituent operations |
| 08 PyTorch: Backpropagation | Exercise: perform backpropagation to update gradients in PyTorch |
| 09 PyTorch: Gradient Descent | Exercise: perform gradient descent (train the neural net) in PyTorch (both PyTorch exercises are sketched below) |

Sources: Andrej Karpathy - Zero to Hero · micrograd · lectures · exercises
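For the two PyTorch exercises, the same mechanics collapse into a few calls. A hedged sketch of the pattern (the toy regression data and learning rate are illustrative, not the exercises' actual setup):

```python
import torch

# toy problem: recover y = 3x + 2 with one weight and one bias
xs = torch.linspace(-1, 1, 20).unsqueeze(1)
ys = 3 * xs + 2

w = torch.randn(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(200):
    pred = xs @ w + b
    loss = ((pred - ys) ** 2).mean()

    # backpropagation: clear stale grads, then let autograd fill .grad
    w.grad = None
    b.grad = None
    loss.backward()

    # gradient descent: step each parameter against its gradient
    with torch.no_grad():
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad

print(w.item(), b.item(), loss.item())   # w approaches 3, b approaches 2
```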
Bigram Language Model & Neural Net Equivalent
A bigram model predicts the next character based only on the current one — the simplest possible language model. Built first as a counting model (character pair frequencies → probabilities), then rebuilt as an equivalent single-layer neural network trained with gradient descent, showing both approaches converge to the same solution.
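A compressed sketch of both views of the model, assuming a handful of lowercase names as a stand-in dataset (the word list, vocabulary handling, and learning rate below are illustrative):

```python
import torch

words = ["emma", "olivia", "ava", "isabella", "sophia"]   # stand-in dataset
stoi = {c: i + 1 for i, c in enumerate(sorted(set("".join(words))))}
stoi["."] = 0                                             # start/end boundary token
V = len(stoi)

# training pairs: index of current character -> index of next character
xs, ys = [], []
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        xs.append(stoi[c1]); ys.append(stoi[c2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# counting model: pair frequencies -> row-normalised probability table
N = torch.zeros((V, V))
for i, j in zip(xs.tolist(), ys.tolist()):
    N[i, j] += 1
P = (N + 1) / (N + 1).sum(dim=1, keepdim=True)            # Laplace smoothing
count_nll = -P[xs, ys].log().mean()

# equivalent neural net: one-hot input -> linear layer -> softmax
W = torch.randn((V, V), requires_grad=True)
for _ in range(300):
    logits = torch.nn.functional.one_hot(xs, V).float() @ W       # acts as log-counts
    probs = logits.exp() / logits.exp().sum(dim=1, keepdim=True)  # softmax by hand
    loss = -probs[torch.arange(len(ys)), ys].log().mean()         # negative log-likelihood
    W.grad = None
    loss.backward()
    with torch.no_grad():
        W -= 10 * W.grad

print(count_nll.item(), loss.item())   # counting-model NLL vs trained-net NLL
```

On a word list this small the two numbers differ a little, because Laplace smoothing matters when counts are tiny; on a realistic dataset, with mild regularisation of `W` playing the role of smoothing, the two losses converge, which is the equivalence the notebooks demonstrate.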
5 notebooks + 2 helpers: bigram model and NN equivalent
| Notebook | Summary |
| --- | --- |
| 01 Define Bigram Model | Character pair counting; building the probability table |
| 02 Sampling | Generating names by sampling from the distribution |
| helper: Broadcasting Tensors | PyTorch broadcasting rules with worked examples |
| 03 Loss & Smoothing | Negative log-likelihood loss; Laplace smoothing |
| 04 Bigrams → Neural Net | One-hot input → linear layer → softmax; mathematically equivalent to counting |
| helper: One-Hot Encoding | How one-hot vectors encode categorical inputs |
| 05 Optimisation | Training loop; gradient descent convergence |

Sources: Andrej Karpathy - Zero to Hero
JPEG Compression
JPEG is a lossy image compression standard. Built here from scratch: colour space conversion, 8×8 DCT blocks, quantization, zig-zag scanning, run-length encoding, and Huffman coding.
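A minimal sketch of the transform-and-quantize core on a single 8×8 block, using a hand-built orthonormal DCT-II matrix and the standard luminance quantization table from the JPEG spec (the input block here is random, purely for illustration):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: row k, column x."""
    T = np.array([[np.cos((2 * x + 1) * k * np.pi / (2 * n)) for x in range(n)]
                  for k in range(n)])
    T[0] *= 1 / np.sqrt(n)
    T[1:] *= np.sqrt(2 / n)
    return T

# standard JPEG luminance quantization table (spec Annex K)
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

T = dct_matrix()
block = np.random.randint(0, 256, (8, 8)).astype(float) - 128   # centre pixels on 0
coeffs = T @ block @ T.T                     # 2D DCT: transform rows and columns
quantized = np.round(coeffs / Q)             # the lossy step: small coefficients vanish
restored = T.T @ (quantized * Q) @ T + 128   # dequantize, inverse DCT, un-centre

print(int((quantized == 0).sum()), "of 64 coefficients zeroed")
print("mean abs reconstruction error:", float(np.abs(restored - (block + 128)).mean()))
```

The zig-zag scan, run-length encoding, and Huffman coding stages then operate on `quantized`, where the long runs of zeros are what actually buy the compression.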
1 notebook: JPEG compression
| Notebook | Summary |
| --- | --- |
| 01 JPEG Compression | Full encode/decode pipeline; quality comparison and PSNR |

Related: jpeg-compression · discrete-cosine-transform
Audio Fingerprinting
Audio fingerprinting identifies a song from a short, noisy clip by hashing the time-frequency peaks in its spectrogram. Built here from scratch — the same approach Shazam uses: FFT, STFT spectrogram, peak detection, and a hash-based song database.
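A hedged sketch of the fingerprinting core, assuming a mono signal array: window it into an STFT magnitude spectrogram, keep the loudest bin per frame as the constellation map, and hash pairs of nearby peaks. The frame size, threshold, fan-out, and the synthetic chirp standing in for a song are all illustrative choices:

```python
import numpy as np

def spectrogram(signal, frame=2048, hop=512):
    """Magnitude STFT built from Hann-windowed FFT frames, shape (time, freq)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop: i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def peaks(spec):
    """Constellation map: the loudest frequency bin of each frame, if loud enough."""
    threshold = 5 * spec.mean()
    return [(ti, int(row.argmax())) for ti, row in enumerate(spec)
            if row.max() > threshold]

def hashes(points, fan_out=5):
    """Pair each peak with the next few peaks: key (f1, f2, dt), value anchor time."""
    table = {}
    for i, (t1, f1) in enumerate(points):
        for t2, f2 in points[i + 1: i + 1 + fan_out]:
            table[(f1, f2, t2 - t1)] = t1
    return table

# toy demo: a noisy two-second excerpt should share many hashes with the "song"
sr = 22050
rng = np.random.default_rng(0)
t = np.arange(sr * 5) / sr
song = np.sin(2 * np.pi * (300 + 40 * t) * t)          # a slow chirp stands in for audio
clip = song[sr: sr * 3] + 0.5 * rng.standard_normal(sr * 2)

db = hashes(peaks(spectrogram(song)))
query = hashes(peaks(spectrogram(clip)))
matches = sum(1 for h in query if h in db)
print(f"{matches} / {len(query)} clip hashes found in the database")
```

A real database maps each hash to a song id and anchor time, and a match is declared when many of a clip's hashes line up with one song at a consistent time offset; the raw count above is only the simplest proxy for that vote.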
2 notebooks: DFT/FFT + audio fingerprinting
| Notebook | Summary |
| --- | --- |
| 01 DFT and FFT | Naive O(N²) DFT → Cooley-Tukey FFT; benchmark vs numpy (sketched below) |
| 02 Audio Fingerprinting | STFT → constellation map → hash DB → noisy clip matching |

Related: audio-fingerprinting · spectrogram · fast-fourier-transform
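A short sketch of the notebook-01 step, assuming power-of-two input lengths; the recursion is the textbook radix-2 Cooley-Tukey split into even- and odd-indexed samples:

```python
import numpy as np

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    return np.exp(-2j * np.pi * np.outer(n, n) / len(x)) @ x

def fft(x):
    """Radix-2 Cooley-Tukey FFT, O(N log N), for power-of-two lengths."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N <= 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd
    return np.concatenate([even + twiddled, even - twiddled])

x = np.random.randn(1024)
assert np.allclose(dft(x), np.fft.fft(x))
assert np.allclose(fft(x), np.fft.fft(x))
```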