Related: jpeg-compression, discrete-cosine-transform, fourier-transform

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path
%matplotlib inline
 
IMAGE_PATH = "path/to/image.png"  # override, or drop an image into this folder

1. Load image and inspect channels

Load a PNG or BMP (not JPEG — we want the raw pixel values). Inspect the shape and value range of each RGB channel.

# ── image discovery ───────────────────────────────────────────────────────────
# Drop any losslessly-coded file (TIFF, BMP, PNG, or camera RAW: DNG/CR2/NEF/ARW)
# into the same folder as this notebook. First match wins.
HERE = Path(".")
_uncompressed = [*HERE.glob("*.tif"), *HERE.glob("*.tiff"),
                 *HERE.glob("*.bmp"), *HERE.glob("*.png")]
_raw          = [*HERE.glob("*.dng"), *HERE.glob("*.cr2"),
                 *HERE.glob("*.nef"), *HERE.glob("*.arw")]
 
if IMAGE_PATH == "path/to/image.png":
    if _uncompressed:
        IMAGE_PATH = _uncompressed[0]
    elif _raw:
        IMAGE_PATH = _raw[0]
 
# ── load ──────────────────────────────────────────────────────────────────────
img = None
if IMAGE_PATH and Path(IMAGE_PATH).exists():
    p = Path(IMAGE_PATH)
    if p.suffix.lower() in {".dng", ".cr2", ".nef", ".arw"}:
        import rawpy  # only needed for camera RAW files (pip install rawpy)
        with rawpy.imread(str(p)) as raw:
            img = raw.postprocess().astype(np.float32)  # (H, W, 3)
    else:
        img = np.array(Image.open(p).convert("RGB"), dtype=np.float32)
 
    print(f"Loaded: {p.name}  shape={img.shape}  range=[{img.min():.0f}, {img.max():.0f}]")
 
    fig, axes = plt.subplots(1, 4, figsize=(14, 4))
    axes[0].imshow(img.astype(np.uint8)); axes[0].set_title("RGB")
    for i, (ch, name) in enumerate(zip(img.transpose(2, 0, 1), ["R", "G", "B"]), 1):
        axes[i].imshow(ch, cmap="gray", vmin=0, vmax=255)
        axes[i].set_title(f"{name}  μ={ch.mean():.1f}")
    for ax in axes: ax.axis("off")
    plt.tight_layout(); plt.show()
else:
    print("No image found — drop a TIFF/BMP/PNG/RAW into this folder and re-run, or set IMAGE_PATH manually.")
 
# ── toy 128×128 RGB matrix ────────────────────────────────────────────────────
# skimage.data.astronaut() is the standard photographic test image used in image
# compression courses: smooth skin tones (low-freq) + fine helmet text (high-freq).
from skimage import data as skdata
_full = Image.fromarray(skdata.astronaut())                          # 512×512 PIL RGB
toy = np.array(_full.resize((128, 128), Image.LANCZOS), dtype=np.float32)  # (128, 128, 3)
 
fig, axes = plt.subplots(1, 4, figsize=(14, 4))
axes[0].imshow(toy.astype(np.uint8)); axes[0].set_title("Toy: astronaut  (128×128)")
for i, (ch, name) in enumerate(zip(toy.transpose(2, 0, 1), ["R", "G", "B"]), 1):
    axes[i].imshow(ch, cmap="gray", vmin=0, vmax=255)
    axes[i].set_title(f"{name}  μ={ch.mean():.1f}")
for ax in axes: ax.axis("off")
plt.tight_layout(); plt.show()
 
print(f"toy.shape={toy.shape}  dtype={toy.dtype}")
print("Use 'toy' throughout the pipeline for quick iteration; swap to 'img' for the real image.")
No image found — drop a TIFF/BMP/PNG/RAW into this folder and re-run, or set IMAGE_PATH manually.
[figure: toy astronaut RGB + per-channel panels]
toy.shape=(128, 128, 3)  dtype=float32
Use 'toy' throughout the pipeline for quick iteration; swap to 'img' for the real image.

2. RGB → YCbCr colour space conversion

Apply the ITU-R BT.601 matrix transform (from scratch — a matrix multiply + offset). Visualise Y, Cb, Cr channels separately.

Then apply 4:2:0 chroma subsampling: downsample Cb and Cr by 2× in both dimensions.
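A minimal sketch of both steps, assuming a float32 `(H, W, 3)` array in [0, 255] like `toy`. The matrix is the full-range (JFIF) BT.601 variant; `subsample_420` uses 2×2 block averaging, which is one common choice (nearest-neighbour decimation is another):

```python
import numpy as np

# Full-range (JFIF) BT.601 conversion: [Y, Cb, Cr] = M @ [R, G, B] + offset
M = np.array([[ 0.299,     0.587,     0.114   ],
              [-0.168736, -0.331264,  0.5     ],
              [ 0.5,      -0.418688, -0.081312]], dtype=np.float32)
OFFSET = np.array([0.0, 128.0, 128.0], dtype=np.float32)

def rgb_to_ycbcr(rgb):
    """(H, W, 3) RGB in [0, 255] -> (H, W, 3) YCbCr in [0, 255]."""
    return rgb @ M.T + OFFSET

def subsample_420(ch):
    """4:2:0 subsampling by 2x2 block averaging (H and W assumed even)."""
    h, w = ch.shape
    return ch.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

grey = np.full((4, 4, 3), 100.0, dtype=np.float32)
ycc = rgb_to_ycbcr(grey)              # neutral grey -> Y=100, Cb=Cr=128
cb_small = subsample_420(ycc[..., 1])  # half resolution in each dimension
```

A quick sanity check: for any neutral grey, the Cb and Cr rows of the matrix sum to zero, so both chroma channels land exactly on the 128 offset.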

print(type(toy))
print(f"toy.shape={toy.shape}  dtype={toy.dtype}")
 
 
<class 'numpy.ndarray'>
toy.shape=(128, 128, 3)  dtype=float32

3. 8×8 block splitting and level shift

Split each channel into non-overlapping 8×8 blocks (pad if image dimensions are not multiples of 8). Subtract 128 from each value to centre on zero.
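One way to sketch this step (padding by edge replication is an assumption here, though it is what JPEG encoders commonly do to avoid sharp artificial boundaries):

```python
import numpy as np

def to_blocks(ch):
    """Pad a 2-D channel to multiples of 8 (replicating the last row/column),
    level-shift by -128, and return an (n_blocks, 8, 8) array."""
    h, w = ch.shape
    ph, pw = (-h) % 8, (-w) % 8                 # padding needed per axis
    ch = np.pad(ch, ((0, ph), (0, pw)), mode="edge")
    H, W = ch.shape
    blocks = ch.reshape(H // 8, 8, W // 8, 8).swapaxes(1, 2).reshape(-1, 8, 8)
    return blocks - 128.0

ch = np.arange(20 * 12, dtype=np.float32).reshape(20, 12)
blocks = to_blocks(ch)          # 20x12 pads to 24x16 -> 3 * 2 = 6 blocks
```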

4. 2D DCT-II — from scratch

Implement 1D DCT-II, then apply it separably (rows then columns) to get the 2D DCT.

Verify against scipy.fft.dct on a random block.
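A sketch via an orthonormal DCT-II basis matrix, checked against `scipy.fft.dctn` (the n-dimensional form of `scipy.fft.dct`, equivalent to applying the 1-D transform along each axis):

```python
import numpy as np
from scipy.fft import dctn

def dct_matrix(N=8):
    """Orthonormal DCT-II basis: row k holds sqrt(2/N)*cos(pi*(2n+1)*k/(2N)),
    with the k=0 row scaled down by sqrt(2) so that C @ C.T = I."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

C = dct_matrix(8)

def dct2(block):
    """Separable 2-D DCT-II: 1-D DCT along one axis, then the other."""
    return C @ block @ C.T

block = np.random.default_rng(0).uniform(-128, 127, (8, 8))
assert np.allclose(dct2(block), dctn(block, norm="ortho"))  # scipy agrees
```

Because `C` is orthogonal, the inverse is simply `C.T @ coeffs @ C`, which the decoder reuses.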

4.1 Energy compaction demo

For a sample 8×8 block, plot the DCT coefficient matrix as a heatmap. Show how much energy is concentrated in the top-left corner.
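The compaction can also be quantified numerically as the share of squared-coefficient energy inside the top-left k×k corner. A sketch, using a smooth gradient block (a best case, where compaction is extreme):

```python
import numpy as np
from scipy.fft import dctn

def energy_fraction(coeffs, k):
    """Share of total squared-coefficient energy in the top-left k x k corner."""
    return np.sum(coeffs[:k, :k] ** 2) / np.sum(coeffs ** 2)

# smooth diagonal gradient: only first-row/first-column coefficients survive
block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 8.0 - 128.0
coeffs = dctn(block, norm="ortho")
frac = energy_fraction(coeffs, 2)   # nearly all energy in a 2x2 corner
```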

5. Quantization

Define the standard JPEG luminance and chrominance quantization tables. Apply: Q_hat[k] = round(X[k] / Q[k]) per block.

Implement a quality factor q (1–100) that scales the table:

  • q > 50: Q_scaled = Q * (100 - q) / 50
  • q < 50: Q_scaled = Q * 50 / q

Inspect how many coefficients round to zero at q=50 vs q=10.
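A sketch of the table and the scaling rule above, using the luminance table from Annex K of the JPEG standard (ITU-T T.81). Clipping scaled entries to [1, 255] is an implementation choice that makes q=100 degenerate gracefully to an all-ones (lossless-quantization) table:

```python
import numpy as np

# Luminance quantization table, Annex K of ITU-T T.81
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=np.float64)

def scale_table(Q, q):
    """Quality q in [1, 100]; q=50 leaves the table unchanged."""
    s = (100 - q) / 50 if q >= 50 else 50 / q
    return np.clip(np.round(Q * s), 1, 255)

def quantize(coeffs, Q):
    return np.round(coeffs / Q)

# toy comparison: same coefficients, different quality -> more zeros at q=10
coeffs = np.full((8, 8), 30.0)
n_zero_q50 = np.sum(quantize(coeffs, scale_table(Q_LUMA, 50)) == 0)
n_zero_q10 = np.sum(quantize(coeffs, scale_table(Q_LUMA, 10)) == 0)
```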

6. Zig-zag scan

Reorder each 8×8 quantized block into a 64-element 1D array following the JPEG zig-zag pattern. Verify the ordering by checking the index table.

Observe that the tail of the array is mostly zeros for a typical block.
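The index table can be generated rather than typed out: coefficients are visited anti-diagonal by anti-diagonal (constant row+col), alternating direction. A sketch:

```python
import numpy as np

def zigzag_indices(n=8):
    """(row, col) visiting order for the JPEG zig-zag scan: sort by
    anti-diagonal (row+col), alternating direction on odd/even diagonals."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

ZZ = zigzag_indices()

def zigzag(block):
    """Flatten an 8x8 block into a 64-element array in zig-zag order."""
    return np.array([block[r, c] for r, c in ZZ])

flat = zigzag(np.arange(64).reshape(8, 8))
```

Checking the first few entries against the standard index table: `(0,0), (0,1), (1,0), (2,0), (1,1), (0,2), …`, ending at `(7,7)`.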

7. Entropy coding: RLE + Huffman

DC coefficients: delta coding

Encode the DC coefficient as the difference from the previous block’s DC value.
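With NumPy this is a one-liner in each direction (sketch; coding the first block's DC against a predictor of 0 follows the JPEG convention):

```python
import numpy as np

dc = np.array([52, 55, 55, 60, 58])      # one DC coefficient per block
deltas = np.diff(dc, prepend=0)          # first block is coded against 0
restored = np.cumsum(deltas)             # decoder inverts with a running sum
```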

AC coefficients: run-length encoding

Encode the 63 AC values as (zero_run_length, value) pairs. Special codes: (0,0) = end-of-block; (15,0) = 16 zeros.
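A sketch of the AC run-length encoder under those conventions (note that, per the JPEG convention, no EOB is emitted when the block ends on a nonzero coefficient):

```python
def rle_encode(ac):
    """Encode AC values as (zero_run, value) pairs, with (15, 0) for a run
    of 16 zeros (ZRL) and (0, 0) for end-of-block (EOB)."""
    out, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
            continue
        while run > 15:                  # each full run of 16 zeros -> ZRL
            out.append((15, 0))
            run -= 16
        out.append((run, int(v)))
        run = 0
    if run:                              # trailing zeros collapse into EOB
        out.append((0, 0))
    return out

pairs = rle_encode([0, 0, 5, 0, -3] + [0] * 58)   # 63 AC values
```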

Huffman coding

Build a Huffman tree from scratch:

  1. Compute symbol frequencies across all blocks
  2. Build the tree (min-heap)
  3. Assign bit codes
  4. Encode the RLE stream

Report the compressed bitstream length.
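A compact sketch of steps 1–4, Huffman-coding the `(run, value)` pairs directly. This is a simplification: real JPEG codes `(run, size)` categories and appends raw amplitude bits. The tiebreak index in each heap entry keeps the heap from ever comparing the symbol lists:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Min-heap Huffman: repeatedly merge the two rarest groups, prefixing
    '0'/'1' onto every code in each merged group."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol stream
        return {s: "0" for s in freq}
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in freq}
    while len(heap) > 1:
        f1, _, g1 = heapq.heappop(heap)
        f2, i2, g2 = heapq.heappop(heap)
        for s in g1: codes[s] = "0" + codes[s]
        for s in g2: codes[s] = "1" + codes[s]
        heapq.heappush(heap, (f1 + f2, i2, g1 + g2))
    return codes

stream = [(0, 5), (0, 5), (1, -3), (0, 0), (0, 0), (0, 0)]
codes = huffman_codes(stream)
bits = "".join(codes[s] for s in stream)     # compressed bitstream
```

The bitstream length (here `len(bits)`) is the quantity to report; the Kraft sum of the code lengths equals 1 for any full Huffman tree.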

8. Decoder — reverse pipeline

  1. Huffman decode → RLE symbols → DC + AC coefficients
  2. Inverse zig-zag → 8×8 matrix
  3. Dequantize: multiply by Q[k]
  4. Inverse 2D DCT
  5. Add 128 (undo level shift)
  6. Upsample Cb, Cr (bilinear or nearest)
  7. YCbCr → RGB
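The numeric core of the inverse (steps 3–5) can be sanity-checked in isolation: reconstruction error is bounded by the quantization step. A sketch with a flat toy table, not the standard one:

```python
import numpy as np
from scipy.fft import dctn, idctn

Q = np.full((8, 8), 16.0)                    # flat toy table, step size 16
block = np.random.default_rng(0).uniform(0, 255, (8, 8))

qcoef = np.round(dctn(block - 128.0, norm="ortho") / Q)   # encode side
recon = idctn(qcoef * Q, norm="ortho") + 128.0            # steps 3-5

err = np.abs(block - recon).max()   # bounded via the orthonormal transform
```

Each coefficient error is at most Q/2 = 8, so (by orthonormality of the DCT) the spatial error norm is bounded by sqrt(64)·8 = 64.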

9. Quality comparison and PSNR

Run the full encode → decode pipeline at quality levels [95, 80, 60, 30, 10]. For each:

  • Show original vs reconstructed image side-by-side
  • Compute PSNR: 10 * log10(255² / MSE) (higher = better)
  • Report compressed size in bytes (bitstream length)

Observe blocking artefacts at low quality.
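A sketch of the PSNR helper (the 255² peak assumes 8-bit images; a perfect reconstruction gives MSE = 0, conventionally reported as infinite PSNR):

```python
import numpy as np

def psnr(orig, recon):
    """Peak signal-to-noise ratio in dB for 8-bit images (higher = better)."""
    mse = np.mean((np.asarray(orig, np.float64) - np.asarray(recon, np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 5.0)        # uniform error of 5 -> MSE = 25
```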
