Related: jpeg-compression, discrete-cosine-transform, fourier-transform
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path
%matplotlib inline
IMAGE_PATH = "path/to/image.png" # override, or drop an image into this folder1. Load image and inspect channels
Load a PNG or BMP (not JPEG — we want the raw pixel values). Inspect the shape and value range of each RGB channel.
# ── image discovery ───────────────────────────────────────────────────────────
# Drop any uncompressed file (TIFF, BMP, PNG, or camera RAW: DNG/CR2/NEF/ARW)
# into the same folder as this notebook. First match wins.
HERE = Path(".")
_uncompressed = [*HERE.glob("*.tif"), *HERE.glob("*.tiff"),
*HERE.glob("*.bmp"), *HERE.glob("*.png")]
_raw = [*HERE.glob("*.dng"), *HERE.glob("*.cr2"),
*HERE.glob("*.nef"), *HERE.glob("*.arw")]
if IMAGE_PATH == "path/to/image.png":
if _uncompressed:
IMAGE_PATH = _uncompressed[0]
elif _raw:
IMAGE_PATH = _raw[0]
# ── load ──────────────────────────────────────────────────────────────────────
img = None
if IMAGE_PATH and Path(IMAGE_PATH).exists():
p = Path(IMAGE_PATH)
if p.suffix.lower() in {".dng", ".cr2", ".nef", ".arw"}:
import rawpy
with rawpy.imread(str(p)) as raw:
img = raw.postprocess().astype(np.float32) # (H, W, 3)
else:
img = np.array(Image.open(p).convert("RGB"), dtype=np.float32)
print(f"Loaded: {p.name} shape={img.shape} range=[{img.min():.0f}, {img.max():.0f}]")
fig, axes = plt.subplots(1, 4, figsize=(14, 4))
axes[0].imshow(img.astype(np.uint8)); axes[0].set_title("RGB")
for i, (ch, name) in enumerate(zip(img.transpose(2, 0, 1), ["R", "G", "B"]), 1):
axes[i].imshow(ch, cmap="gray", vmin=0, vmax=255)
axes[i].set_title(f"{name} μ={ch.mean():.1f}")
for ax in axes: ax.axis("off")
plt.tight_layout(); plt.show()
else:
print("No image found — drop a TIFF/BMP/PNG/RAW into this folder and re-run, or set IMAGE_PATH manually.")
# ── toy 128×128 RGB matrix ────────────────────────────────────────────────────
# skimage.data.astronaut() is the standard photographic test image used in image
# compression courses: smooth skin tones (low-freq) + fine helmet text (high-freq).
from skimage import data as skdata
_full = Image.fromarray(skdata.astronaut()) # 512×512 PIL RGB
toy = np.array(_full.resize((128, 128), Image.LANCZOS), dtype=np.float32) # (128, 128, 3)
fig, axes = plt.subplots(1, 4, figsize=(14, 4))
axes[0].imshow(toy.astype(np.uint8)); axes[0].set_title("Toy: astronaut (128×128)")
for i, (ch, name) in enumerate(zip(toy.transpose(2, 0, 1), ["R", "G", "B"]), 1):
axes[i].imshow(ch, cmap="gray", vmin=0, vmax=255)
axes[i].set_title(f"{name} μ={ch.mean():.1f}")
for ax in axes: ax.axis("off")
plt.tight_layout(); plt.show()
print(f"toy.shape={toy.shape} dtype={toy.dtype}")
print("Use 'toy' throughout the pipeline for quick iteration; swap to 'img' for the real image.")No image found — drop a TIFF/BMP/PNG/RAW into this folder and re-run, or set IMAGE_PATH manually.toy.shape=(128, 128, 3) dtype=float32
Use 'toy' throughout the pipeline for quick iteration; swap to 'img' for the real image.2. RGB → YCbCr colour space conversion
Apply the ITU-R BT.601 matrix transform (from scratch — a matrix multiply + offset). Visualise Y, Cb, Cr channels separately.
Then apply 4:2:0 chroma subsampling: downsample Cb and Cr by 2× in both dimensions.
print(type(toy))
print(f"toy.shape={toy.shape} dtype={toy.dtype}")
<class 'numpy.ndarray'>
toy.shape=(128, 128, 3) dtype=float32array([[[154., 150., 158.],
[169., 165., 163.],
[137., 128., 129.],
...,
[129., 121., 114.],
[126., 118., 112.],
[125., 117., 111.]],
[[234., 227., 224.],
[206., 200., 192.],
[ 78., 68., 96.],
...,
[128., 119., 114.],
[126., 118., 112.],
[125., 117., 111.]],
[[229., 223., 218.],
[199., 195., 185.],
[ 74., 67., 96.],
...,
[129., 120., 116.],
[127., 118., 114.],
[125., 117., 112.]],
...,
[[187., 171., 175.],
[194., 177., 184.],
[171., 152., 152.],
...,
[ 32., 30., 24.],
[ 53., 48., 49.],
[108., 104., 106.]],
[[188., 172., 178.],
[180., 162., 166.],
[150., 129., 124.],
...,
[ 59., 55., 50.],
[ 71., 64., 63.],
[ 29., 28., 27.]],
[[184., 167., 172.],
[160., 141., 140.],
[140., 119., 112.],
...,
[ 53., 50., 43.],
[ 53., 48., 46.],
[ 9., 8., 7.]]], shape=(128, 128, 3), dtype=float32)3. 8×8 block splitting and level shift
Split each channel into non-overlapping 8×8 blocks (pad if image dimensions are not multiples of 8). Subtract 128 from each value to centre on zero.
4. 2D DCT-II — from scratch
Implement 1D DCT-II, then apply it separably (rows then columns) to get the 2D DCT.
Verify against scipy.fft.dct on a random block.
4.1 Energy compaction demo
For a sample 8×8 block, plot the DCT coefficient matrix as a heatmap. Show how much energy is concentrated in the top-left corner.
5. Quantization
Define the standard JPEG luminance and chrominance quantization tables.
Apply: Q_hat[k] = round(X[k] / Q[k]) per block.
Implement a quality factor q (1–100) that scales the table:
q > 50:Q_scaled = Q * (100 - q) / 50q < 50:Q_scaled = Q * 50 / q
Inspect how many coefficients round to zero at q=50 vs q=10.
6. Zig-zag scan
Reorder each 8×8 quantized block into a 64-element 1D array following the JPEG zig-zag pattern. Verify the ordering by checking the index table.
Observe that the tail of the array is mostly zeros for a typical block.
7. Entropy coding: RLE + Huffman
DC coefficients: delta coding
Encode the DC coefficient as the difference from the previous block’s DC value.
AC coefficients: run-length encoding
Encode the 63 AC values as (zero_run_length, value) pairs.
Special codes: (0,0) = end-of-block; (15,0) = 16 zeros.
Huffman coding
Build a Huffman tree from scratch:
- Compute symbol frequencies across all blocks
- Build the tree (min-heap)
- Assign bit codes
- Encode the RLE stream
Report the compressed bitstream length.
8. Decoder — reverse pipeline
- Huffman decode → RLE symbols → DC + AC coefficients
- Inverse zig-zag → 8×8 matrix
- Dequantize: multiply by
Q[k] - Inverse 2D DCT
- Add 128 (undo level shift)
- Upsample Cb, Cr (bilinear or nearest)
- YCbCr → RGB
9. Quality comparison and PSNR
Run the full encode → decode pipeline at quality levels [95, 80, 60, 30, 10].
For each:
- Show original vs reconstructed image side-by-side
- Compute PSNR:
10 * log10(255² / MSE)(higher = better) - Report compressed size in bytes (bitstream length)
Observe blocking artefacts at low quality.
Sources
- jpeg-compression
- discrete-cosine-transform
- JPEG standard: ITU-T T.81 / ISO/IEC 10918-1 (1992)