Summary: JPEG compresses images by converting to a perceptual colour space (YCbCr), splitting each channel into 8×8 blocks, applying the discrete cosine transform (DCT), quantizing the DCT coefficients so that most high-frequency ones become zero, and entropy coding the result.

Overview

JPEG (1992) exploits two properties of human vision:

  1. We are more sensitive to luminance (brightness/intensity) than chrominance (colour).
  2. We are more sensitive to low spatial frequencies (large shapes) than high frequencies (fine detail).
    1. Spatial frequency is how rapidly brightness or colour changes from pixel to pixel across the image.

The algorithm is lossy: the original pixel values cannot be recovered exactly. The loss is concentrated in the quantization step and, when it is used, in the optional chroma downsampling; the remaining steps are reversible up to arithmetic rounding.

Encoder pipeline

Running dimensionality count (illustrative only)

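  • Imagine an input image with $100 \times 100$ pixels split across 3 channels: $R$, $G$, and $B$
  • Matrix entry values for $R$, $G$, and $B$ range between $0$ and $255$ (8 bits each)
  • Total: $100 \times 100 = 10{,}000$ values per channel, $30{,}000$ total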

1. Colour space conversion: RGB → YCbCr

Separate the luminance (brightness) of an image from the chrominance (colour).

  • $Y$ = luminance (roughly perceived brightness). Its RGB weights follow the eye’s spectral sensitivity (green dominates).
  • $C_b$ and $C_r$ = blue-difference (i.e. blueness) and red-difference (i.e. redness) chroma channels
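
As a rough illustration of the conversion, the sketch below uses the full-range weights from the JFIF convention (BT.601 luma coefficients). The function name `rgb_to_ycbcr` and the per-pixel interface are illustrative assumptions; a real encoder would process whole arrays.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF / BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b            # green carries the most weight
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference, centred on 128
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference, centred on 128
    # Round and clamp each component to the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


print(rgb_to_ycbcr(255, 255, 255))  # white -> (255, 128, 128)
print(rgb_to_ycbcr(0, 0, 0))        # black -> (0, 128, 128)
print(rgb_to_ycbcr(255, 0, 0))      # pure red -> (76, 85, 255)
```

Grey pixels land on Cb = Cr = 128, which is why the chroma channels of a photograph tend to look flat and compress well.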

Chroma downsampling (optional, also lossy!)

Human eyes are less sensitive to colour ($C_b$ and $C_r$) than to brightness ($Y$), so downsample the colour information (i.e. reduce colour resolution):

  • 4:4:4 - No downsampling

  • 4:2:2 - Downsample $C_b$ and $C_r$ by 2× horizontally only.

  • 4:2:0 - Downsample $C_b$ and $C_r$ by 2× horizontally and 2× vertically, a 4× reduction in chroma samples.

    • Four pixels share one $C_b$ and one $C_r$ value.
    • This quarters the chroma data with minimal visible impact, halving the total data (~2× compression) before any DCT.

    [Figure: JPEG subsampling ratios.]
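
A minimal sketch of 4:2:0 downsampling, assuming the chroma channel is a list of rows with even height and width and that each 2×2 neighbourhood is simply averaged. Averaging is one common choice, not the only one, and `subsample_420` is a hypothetical helper name.

```python
def subsample_420(channel):
    """Average each 2x2 neighbourhood of a chroma channel (list of rows).

    Assumes even height and width; real encoders pad or replicate edges first.
    """
    h, w = len(channel), len(channel[0])
    return [
        [
            round((channel[y][x] + channel[y][x + 1]
                   + channel[y + 1][x] + channel[y + 1][x + 1]) / 4)
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


cb = [
    [100, 102, 200, 202],
    [104, 106, 204, 206],
    [50, 50, 60, 60],
    [50, 50, 60, 60],
]
print(subsample_420(cb))  # [[103, 203], [50, 60]]
```

The 16 input samples become 4, which is the quartering of chroma data described above.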

2. Block splitting and level shift

For each channel $Y$, $C_b$, and $C_r$:

  1. Block splitting: Divide the channel into non-overlapping $8 \times 8$ pixel blocks, or sub-images.
  2. Level shift: Subtract 128 from each value (shift unsigned $[0, 255]$ → signed $[-128, 127]$).
    1. This centres the data on zero for the DCT: in the $Y$ channel, $-128$ is black and $127$ is white. The cosine basis functions are likewise centred on zero, oscillating over $[-1, 1]$.

If the image dimensions are not multiples of 8, pad the edges by repeating the boundary pixels (then discard the padding after decoding).

Visualise matrices: Level shift on one sub-image ($8 \times 8$ pixel block)
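
The padding, block splitting, and level shift can be sketched as follows. This assumes a channel stored as a list of rows of 8-bit values; `pad_to_multiple_of_8` and `split_into_blocks` are illustrative names, not part of any library.

```python
def pad_to_multiple_of_8(channel):
    """Pad a channel (list of rows) by repeating its last column and last row."""
    h, w = len(channel), len(channel[0])
    pad_w = (-w) % 8  # columns missing to reach a multiple of 8
    pad_h = (-h) % 8  # rows missing to reach a multiple of 8
    rows = [row + [row[-1]] * pad_w for row in channel]
    rows += [list(rows[-1]) for _ in range(pad_h)]
    return rows


def split_into_blocks(channel):
    """Yield level-shifted 8x8 blocks in raster order (left to right, top to bottom)."""
    padded = pad_to_multiple_of_8(channel)
    for top in range(0, len(padded), 8):
        for left in range(0, len(padded[0]), 8):
            yield [[padded[top + i][left + j] - 128 for j in range(8)]
                   for i in range(8)]


# A channel with 10 rows and 12 columns of mid-grey pixels.
channel = [[128] * 12 for _ in range(10)]
blocks = list(split_into_blocks(channel))
print(len(blocks))       # 4, because the padded size is 16 x 16
print(blocks[0][0][:4])  # [0, 0, 0, 0] after the level shift
```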

3. 2D DCT per block

Analytic calculation

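Apply the 2D DCT-II to each 8×8 block, individually for each channel $Y$, $C_b$, and $C_r$. Element-wise formula:

$$F(u, v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y) \cos\left[\frac{(2x + 1)\, u \pi}{2N}\right] \cos\left[\frac{(2y + 1)\, v \pi}{2N}\right]$$

  • where:
    • $f(x, y)$: pixel value at row $x$, column $y$ in the level-shifted block
    • $F(u, v)$: DCT coefficient at row frequency $u$, column frequency $v$
    • $x$: spatial row index (vertical)
    • $y$: spatial column index (horizontal)
    • $u$: vertical spatial frequency index
    • $v$: horizontal spatial frequency index
    • $N = 8$: block size
    • $C(k)$: normalization factor, $C(0) = 1/\sqrt{2}$ and $C(k) = 1$ for $k > 0$
  • This is the orthonormal scaling of the DCT-II, as used by JPEG (the unnormalized DCT-II drops the $\frac{2}{N} C(u) C(v)$ factor; see Wikipedia). The 64 pixel values become 64 DCT coefficients:
    • $F(0, 0)$: the DC coefficient, equal to 8 times the block mean; corresponds to the flat (constant) basis image
    • $F(u, v)$ for $(u, v) \neq (0, 0)$: the 63 AC coefficients, which capture spatial frequency content at increasing row and column frequencies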

Visualise matrices: 2D DCT on one sub-image ($8 \times 8$ pixel block)
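
To make the element-wise calculation concrete, here is a direct and deliberately slow evaluation of the 2D DCT-II. It assumes the orthonormal scaling $\frac{2}{N} C(u) C(v)$ with $C(0) = 1/\sqrt{2}$, under which the DC coefficient equals 8 times the block mean. The names `dct2` and `c` are illustrative.

```python
import math

N = 8  # block size


def c(k):
    """Normalization factor: 1/sqrt(2) for the zero frequency, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2(block):
    """2D DCT-II of an 8x8 block, evaluated directly from the element-wise formula."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for x in range(N):      # row
                for y in range(N):  # column
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2 / N) * c(u) * c(v) * total
    return out


flat = [[10] * N for _ in range(N)]  # constant block with mean 10
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))  # 80.0, i.e. 8 x the block mean
```

For a constant block every AC coefficient comes out as zero up to floating-point error, since the block contains no spatial variation.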

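In practice: dot products with precomputed basis images

  1. In practice, the 64 DCT coefficients are just 64 dot products. Each DCT coefficient $F(u, v)$ is the dot product of the pixel block $f$ with the corresponding DCT basis image $B_{u,v}$:

     $$F(u, v) = \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y)\, B_{u,v}(x, y)$$

    • where $B_{u,v}(x, y) = \frac{1}{4}\, C(u)\, C(v) \cos\left[\frac{(2x + 1)\, u \pi}{16}\right] \cos\left[\frac{(2y + 1)\, v \pi}{16}\right]$, with $C(0) = 1/\sqrt{2}$ and $C(k) = 1$ for $k > 0$, is an array of cosine values that can be precomputed once.
  2. To compute, flatten the pixel block into a 64-element vector $\mathbf{f}$, and each basis image likewise into $\mathbf{b}_{u,v}$. Then:

     $$F(u, v) = \mathbf{b}_{u,v}^\top \mathbf{f}$$

    • Each DCT coefficient answers: how much of basis image $B_{u,v}$ is needed to reconstruct this block $f$?
  3. Concatenate all 64 flattened basis images as rows of a $64 \times 64$ matrix $\mathbf{B}$, and the entire DCT becomes one matrix multiply:

     $$\mathbf{F} = \mathbf{B}\, \mathbf{f}$$

    • Because the basis images are orthonormal, $\mathbf{B}$ is orthogonal and the inverse DCT is simply $\mathbf{f} = \mathbf{B}^\top \mathbf{F}$. In practice, $\mathbf{B}$ is precomputed once, so no cosines are evaluated at runtime.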
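
The matrix form can be checked numerically with a small pure-Python sketch. It assumes orthonormally scaled basis images (factor $\frac{1}{4} C(u) C(v)$ with $C(0) = 1/\sqrt{2}$) and row-major flattening; `basis_matrix`, `matvec`, and `transpose` are illustrative helpers.

```python
import math

N = 8


def basis_matrix():
    """64x64 matrix whose row (u*N + v) is the flattened basis image for (u, v)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    rows = []
    for u in range(N):
        for v in range(N):
            rows.append([
                (2 / N) * c(u) * c(v)
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N)  # row-major flattening
            ])
    return rows


def matvec(m, vec):
    """Matrix-vector product: one dot product per matrix row."""
    return [sum(a * b for a, b in zip(row, vec)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


B = basis_matrix()                                   # precomputed once
f = [float((7 * i) % 256 - 128) for i in range(64)]  # arbitrary flattened block
F = matvec(B, f)                                     # forward DCT: F = B f
f_back = matvec(transpose(B), F)                     # inverse DCT: f = B^T F
print(max(abs(a - b) for a, b in zip(f, f_back)) < 1e-9)  # True
```

The round trip recovers the block to within floating-point error, which is what orthogonality of the matrix implies.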

4. Quantization — the lossy step

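  • Divide each DCT coefficient $F(u, v)$ by a quantization value $Q(u, v)$ and round to the nearest integer:

    $$F_q(u, v) = \mathrm{round}\left(\frac{F(u, v)}{Q(u, v)}\right)$$

  • The JPEG standard provides example quantization tables in which $Q(u, v)$ generally increases with frequency. High-frequency coefficients are divided by large numbers → they round to zero. The DC and low-frequency coefficients get small values of $Q(u, v)$, so they are kept more precisely.
  • There are separate tables for luminance ($Y$) and chrominance ($C_b$, $C_r$); the chroma tables use larger quantization values, again exploiting lower sensitivity.
  • JPEG quality factor $q$ (1–100): Scales the quantization table. At $q = 50$, the standard tables are used. In the widely used libjpeg (IJG) convention, each entry becomes $\lfloor (S \cdot Q_{50}(u, v) + 50) / 100 \rfloor$, clamped to at least 1, where:
    • Higher quality: At $q > 50$: $S = 200 - 2q$ (smaller entries → finer quantization → less loss).
    • Lower quality: At $q < 50$: $S = \lfloor 5000 / q \rfloor$ (larger entries → more loss).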

This division and rounding is where the pipeline deliberately discards information (alongside chroma downsampling, if enabled). The other steps are reversible up to arithmetic rounding.
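
A hedged sketch of quantization with quality scaling. The table is the example luminance table from Annex K of the JPEG standard, and the scaling follows the libjpeg (IJG) quality convention; the function names and the sample coefficient values are made up for illustration. Python's `round` rounds halves to even, which may differ from a real encoder on exact ties.

```python
# Example luminance quantization table (JPEG standard, Annex K), used at quality 50.
LUMA_Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(base, quality):
    """Scale a base table using the libjpeg (IJG) quality convention."""
    quality = min(100, max(1, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Entries are clamped to 1..255 so they fit in 8 bits and never divide by zero.
    return [[min(255, max(1, (q * scale + 50) // 100)) for q in row] for row in base]


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


print(scale_table(LUMA_Q50, 75)[0][:3])  # [8, 6, 5]: finer steps at higher quality
print(scale_table(LUMA_Q50, 25)[0][:3])  # [32, 22, 20]: coarser steps at lower quality

# Hypothetical DCT coefficients: large DC, moderate low-frequency AC, small high-frequency AC.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 210.0, -30.0, 12.0
quantized = quantize(coeffs, LUMA_Q50)
print(quantized[0][0], quantized[0][1], quantized[7][7])  # 13 -3 0
```

The high-frequency value 12 is divided by 99 and rounds to zero, which is exactly the information that cannot be recovered later.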

Visualise matrices: Quantize the luminance ($Y$) DCT coefficients for one sub-image ($8 \times 8$ pixel block)

  • Example: The DC coefficient: in the standard luminance table $Q(0, 0) = 16$, so $F_q(0, 0) = \mathrm{round}(F(0, 0) / 16)$
  • Top left quantized coefficients are much larger in magnitude. Our eyes perceive low frequency patterns well.
  • Bottom right quantized coefficients are smaller, and mostly zero. Our eyes don’t really perceive high frequency patterns anyway.

5. Serialization: Zig-zag scan

Reorder the 8×8 quantized coefficient block into a 1D array following a zig-zag path:

[Figure: JPEG zig-zag ordering, showing the serialization path.]

This orders coefficients from low to high 2D frequency. After quantization, the array typically ends with a long run of zeros (high-frequency coefficients that rounded to zero) — ideal for run-length encoding.
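
One compact way to generate the zig-zag path, offered as a sketch: sort positions by anti-diagonal and alternate the direction of travel on each diagonal. `zigzag_order` and `zigzag_scan` are illustrative names; production encoders typically use a hard-coded lookup table instead.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order for an n x n block."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Primary key: anti-diagonal index i + j.
        # Secondary key: row ascending on odd diagonals, descending on even ones.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def zigzag_scan(block):
    """Flatten a square block into a 1D list following the zig-zag path."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


# Fill the block with its own row-major index so the path is easy to read off.
block = [[i * 8 + j for j in range(8)] for i in range(8)]
print(zigzag_scan(block)[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```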

6. DC coefficient: delta coding

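The DC coefficient of each block (proportional to the block mean) changes slowly across the image. Rather than encoding it absolutely, encode the difference from the previous block’s DC, with the predictor initialised to 0 for the first block:

$$\Delta DC_i = DC_i - DC_{i-1}, \qquad DC_{-1} = 0$$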

This reduces the number of bits needed for smooth images.
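
A minimal sketch of DC delta coding and its inverse, assuming the quantized DC values arrive as a list in block order and that the predictor starts at 0. The function names and the sample DC values are illustrative.

```python
def dc_differences(dc_values):
    """Delta-code DC coefficients; the predictor starts at 0 for the first block."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def dc_reconstruct(diffs):
    """Invert the delta coding with a running sum."""
    total = 0
    values = []
    for d in diffs:
        total += d
        values.append(total)
    return values


dcs = [52, 54, 53, 53, 60]   # hypothetical quantized DC values of five neighbouring blocks
diffs = dc_differences(dcs)
print(diffs)                  # [52, 2, -1, 0, 7]
print(dc_reconstruct(diffs))  # [52, 54, 53, 53, 60]
```

After the first block, the differences are small numbers near zero, which need fewer bits than the absolute values.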

7. AC coefficients: Run Length Encoding (RLE)

Encode the remaining 63 AC coefficients (after zig-zag) as the following 2 symbols:

Symbol 1             Symbol 2
(RUNLENGTH, SIZE)    (AMPLITUDE)

  • $a$: non-zero, quantized AC coefficient
  • Symbol 1 (concatenated and Huffman-coded together):
    • RUNLENGTH: number of zeros preceding the current coefficient (0–15)
    • SIZE: number of bits required to represent $a$: $\mathrm{SIZE} = \lfloor \log_2 |a| \rfloor + 1$
    • Special symbols:
      • (0,0) = end-of-block (EOB): no AMPLITUDE follows, all remaining coefficients are zero;
      • (15,0) = zero-run-length marker (ZRL): 16 consecutive zeros, no AMPLITUDE follows.
  • Symbol 2:
    • AMPLITUDE: the actual coefficient value (bit representation of $a$)
      • appended as SIZE raw bits, after the Huffman code; not Huffman-coded itself
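
The symbol scheme can be sketched as below. It assumes the 63 AC coefficients are already in zig-zag order, and it uses the baseline JPEG convention for negative amplitudes (the low SIZE bits of the value plus $2^{\mathrm{SIZE}} - 1$). The function names are illustrative, and the bit strings stand in for real bit packing.

```python
def size_category(value):
    """SIZE: number of bits needed to represent |value| (0 for value == 0)."""
    return abs(value).bit_length()


def amplitude_bits(value):
    """AMPLITUDE as SIZE raw bits: the value itself if positive,
    value + 2**SIZE - 1 if negative (baseline JPEG convention)."""
    size = size_category(value)
    if value < 0:
        value += (1 << size) - 1
    return f"{value:0{size}b}"


def encode_ac(ac):
    """Turn zig-zag-ordered AC coefficients into ((RUNLENGTH, SIZE), AMPLITUDE) pairs."""
    symbols = []
    run = 0
    # Index of the last non-zero coefficient; everything after it is covered by EOB.
    last = max((i for i, a in enumerate(ac) if a != 0), default=-1)
    for a in ac[:last + 1]:
        if a == 0:
            run += 1
            if run == 16:
                symbols.append(((15, 0), ""))  # ZRL: 16 zeros, no amplitude
                run = 0
        else:
            symbols.append(((run, size_category(a)), amplitude_bits(a)))
            run = 0
    if last < len(ac) - 1:
        symbols.append(((0, 0), ""))  # EOB: the rest of the block is zero
    return symbols


ac = [5, 0, 0, -3] + [0] * 59
print(encode_ac(ac))  # [((0, 3), '101'), ((2, 2), '00'), ((0, 0), '')]
```

Here 63 coefficients collapse into three symbols, which shows why the long zero tail produced by quantization matters so much.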

8. Huffman coding

Encode the AC (RUNLENGTH, SIZE) symbols and the SIZE categories of the DC differences using Huffman codes. The JPEG standard provides typical Huffman tables (alternatively, the encoder can derive optimal tables from the image: “optimised Huffman”). The Huffman coder is lossless: it assigns shorter bit sequences to more frequent symbols.

The output is a compressed bitstream, usually wrapped in a JFIF (JPEG File Interchange Format) or Exif file. Marker segments in the file carry the quantization and Huffman tables the decoder needs, followed by the compressed data itself.
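
For intuition about how frequent symbols get shorter codes, here is a generic Huffman construction over a stream of (RUNLENGTH, SIZE) symbols. This is a textbook sketch, not the JPEG table format: real JPEG tables additionally limit codes to 16 bits and are stored as code-length counts. The symbol stream and the name `huffman_code` are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps tuple comparison away from the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical symbol stream: EOB (0, 0) dominates, as it often does in practice.
stream = [(0, 0)] * 8 + [(0, 1)] * 4 + [(1, 1)] * 2 + [(0, 2)] * 2
code = huffman_code(stream)
lengths = {s: len(bits) for s, bits in code.items()}
print(lengths[(0, 0)], lengths[(0, 1)], lengths[(1, 1)], lengths[(0, 2)])  # 1 2 3 3
print(sum(len(code[s]) for s in stream))  # 28 bits for 16 symbols
```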

Dimensionality count — final tally

Stage                                    Data size
Raw RGB pixels                           30,000 values × 8 bits = 240 kbits
After 4:2:0                              15,000 values × 8 bits = 120 kbits
After DCT + quantization (quality 50)    ~12,750 non-zero coefficients × ~4 bits avg ≈ ~51 kbits
After Huffman                            typically ~45–60 kbits for a photographic image

Compression ratio: roughly 4:1 at quality 50 for this 100×100 example.
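
The first two rows of the tally follow from simple counting, sketched here for the 100×100 example. The later rows depend on image content and are not derived; `stage_sizes` is an illustrative helper that assumes even dimensions and ignores block padding.

```python
def stage_sizes(width, height, bits_per_value=8):
    """Return (raw RGB bits, bits after 4:2:0 subsampling) for an image."""
    raw_values = width * height * 3            # R, G, B at full resolution
    luma_values = width * height               # Y stays at full resolution
    chroma_values = 2 * (width // 2) * (height // 2)  # Cb and Cr at quarter resolution
    return (raw_values * bits_per_value,
            (luma_values + chroma_values) * bits_per_value)


raw_bits, subsampled_bits = stage_sizes(100, 100)
print(raw_bits)         # 240000 bits = 240 kbits
print(subsampled_bits)  # 120000 bits = 120 kbits
```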

Decoder pipeline

The decoder runs the encoder steps in reverse:

  1. Huffman decode → RLE symbols → AC + DC coefficients (undoing the DC delta coding)
  2. Zig-zag inverse → 8×8 coefficient matrix
  3. Dequantize: multiply by $Q(u, v)$
  4. Inverse 2D DCT → pixel block
  5. Add 128 (undo level shift)
  6. Upsample chroma channels (bilinear or nearest)
  7. YCbCr → RGB

The dequantization step cannot recover the lost precision: if $F(u, v) / Q(u, v)$ rounded to 0, multiplying 0 back by $Q(u, v)$ gives 0, not the original value.
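
A tiny numeric illustration of why dequantization cannot undo the loss. The coefficient values and the quantization step of 40 are arbitrary choices for the example.

```python
def quantize(value, q):
    """Encoder side: divide by the quantization step and round."""
    return round(value / q)


def dequantize(level, q):
    """Decoder side: multiply the stored integer back by the step."""
    return level * q


q = 40  # arbitrary quantization step for illustration
results = []
for original in (415.0, 55.0, 12.0):
    level = quantize(original, q)
    restored = dequantize(level, q)
    results.append((original, level, restored))
    print(original, "->", level, "->", restored)
# 415.0 -> 10 -> 400
# 55.0 -> 1 -> 40
# 12.0 -> 0 -> 0
```

Every restored value is a multiple of the step, and anything smaller than half a step disappears entirely.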

Compression artefacts

Blocking: At low quality, quantization introduces large differences between adjacent blocks. Since DCT is applied independently per block, there is no information across block boundaries → visible 8×8 grid pattern.

Ringing (Gibbs phenomenon): Near sharp edges, the DCT is being asked to represent a discontinuity with a truncated frequency series. This causes oscillation on both sides of the edge, similar to the Gibbs phenomenon in Fourier series.

Colour bleeding: Chroma subsampling plus low-quality chrominance DCT causes colour to bleed across sharp luminance edges.

Typical compression ratios

Quality    Ratio    Use case
95         ~3:1     Archival, print
80         ~8:1     Web photos
60         ~15:1    Thumbnails
30         ~30:1    Preview images

JPEG is poorly suited for graphics with sharp edges, text, or flat colour regions — PNG (lossless) is better there. JPEG excels at photographic images where high-frequency DCT coefficients are genuinely small.