Summary: JPEG compresses images by converting to a perceptual colour space, splitting into 8×8 blocks, applying the discrete cosine transform (DCT), discarding high-frequency DCT coefficients via quantization, and entropy coding the result.
Overview
JPEG (1992) exploits two properties of human vision:
- We are more sensitive to luminance (brightness/intensity) than chrominance (colour).
- We are more sensitive to low spatial frequencies (large shapes) than high frequencies (fine detail).
  - Spatial frequency refers to how quickly a region of an image changes in brightness or colour.
The algorithm is lossy: the original pixel values cannot be recovered exactly. Loss is concentrated in the quantization step (plus the optional chroma downsampling, which is also lossy); all other steps are lossless.
Encoder pipeline
Running dimensionality count (illustrative only)
- Imagine an input image with 100×100 pixels split across 3 channels: R, G, B.
- Matrix entry values for R, G, and B range between 0 and 255 (8 bits each).
- Total: 100×100 = 10,000 values per channel, 30,000 in total.
1. Colour space conversion: RGB → YCbCr
Separate the luminance (brightness) of an image from the chrominance (colour).
- Y = luminance (roughly perceived brightness). Coefficients match the eye’s spectral sensitivity (green dominates).
- Cb and Cr = blue-difference (i.e. blueness) and red-difference (i.e. redness) chroma channels
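The conversion can be sketched in Python. The coefficients below are the JFIF (full-range BT.601) matrix; Cb and Cr are offset by 128 so all three channels stay within 0–255:

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF (full-range BT.601) RGB -> YCbCr for 8-bit values.

    Y weights green most heavily, matching the eye's spectral sensitivity.
    """
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

A neutral gray maps to Cb = Cr = 128 (no colour difference), which is why the level of 128 acts as "zero chroma".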
Images: Colour space conversion (RGB → YCbCr)
Note: To human eyes, changes in luminance are easy to perceive, but changes in either chrominance channel are hard to perceive.
Chroma downsampling (optional, also lossy!)
Human eyes are not very sensitive to colour (Cb and Cr), compared to brightness (Y), so downsample the colour information (i.e. reduce colour resolution):
- 4:4:4 - No downsampling.
- 4:2:2 - Downsample Cb and Cr by 2× horizontally only.
- 4:2:0 - Downsample Cb and Cr by 2× horizontally and 2× vertically (4× reduction in total).
  - Four pixels share one Cb and one Cr value.
  - This quarters the chroma data with minimal visible impact, halving the total data (~2× compression) before any DCT.
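A minimal 4:2:0-style downsample can be sketched as averaging each 2×2 chroma neighbourhood (averaging is one common choice; encoders may also use other filters or plain decimation). Assumes even dimensions:

```python
def downsample_420(chan):
    """Average each 2x2 block of a chroma channel.

    chan: list of equal-length rows (even width and height assumed).
    Returns a channel with half the width and half the height.
    """
    h, w = len(chan), len(chan[0])
    return [
        [(chan[y][x] + chan[y][x + 1] + chan[y + 1][x] + chan[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]
```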
Images: Chrominance downsampling (4:2:0)
Note: Chrominance matrices are now a quarter of their original size. Luminance is unchanged.
Dimensionality count (illustrative only)
- After conversion: 100×100×3 — same shape, different semantics.
- After 4:2:0 subsampling, Cb and Cr are a quarter of their original sizes:
  - Luminance is unchanged: Y is 100×100 (i.e. 10k pixels),
  - But chrominance is reduced: Cb and Cr are each 50×50 (i.e. 2.5k pixels each).
- Total values: 10,000 + 2,500 + 2,500 = 15,000 — already half the original 30,000.
2. Block splitting and level shift
For each channel Y, Cb, and Cr:
- Block splitting: Divide the channel into non-overlapping 8×8 pixel blocks, or sub-images.
- Level shift: Subtract 128 from each value (shift unsigned [0, 255] → signed [−128, 127]).
  - This centres the data around zero for the DCT: −128 is black, +127 is white. Zero-centred data matches the cosine basis functions, which oscillate within [−1, 1].
If the image dimensions are not multiples of 8, pad the edges by repeating the boundary pixels (then discard the padding after decoding).
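The padding, level shift, and block split can be sketched in pure Python (a channel is a list of rows; `split_blocks` is a name chosen here, not from any library):

```python
def split_blocks(chan, n=8):
    """Pad a channel to multiples of n by repeating edge pixels,
    level-shift every value by -128, and split into n x n blocks.

    Returns a 2D grid of blocks: result[block_row][block_col] is an n x n matrix.
    """
    h, w = len(chan), len(chan[0])
    H = -(-h // n) * n  # ceil to next multiple of n
    W = -(-w // n) * n
    # Repeat boundary pixels into the padding, shifting [0,255] -> [-128,127]
    padded = [[chan[min(y, h - 1)][min(x, w - 1)] - 128 for x in range(W)]
              for y in range(H)]
    return [
        [[row[bx:bx + n] for row in padded[by:by + n]]
         for bx in range(0, W, n)]
        for by in range(0, H, n)
    ]
```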
Visualise matrices: Level shift on one sub-image (8×8 pixel block)
Running example
- 100 is not a multiple of 8, so pad each Y dimension to 104 (next multiple of 8); similarly pad the 50×50 chroma channels to 56×56.
- Y: 104×104; Cb and Cr: 56×56 after padding.
- Block counts: Y gives 13×13 = 169 blocks; Cb and Cr give 7×7 = 49 blocks each.
- Total: 169 + 49 + 49 = 267 blocks, each an 8×8 matrix.
3. 2D DCT per block
Analytic calculation
Apply the 2D DCT-II to each 8×8 block, individually for each channel Y, Cb, and Cr. Element-wise formula:

$$G(u,v) = \frac{1}{4}\,\alpha(u)\,\alpha(v)\sum_{x=0}^{7}\sum_{y=0}^{7} g(x,y)\,\cos\!\left[\frac{(2x+1)u\pi}{16}\right]\cos\!\left[\frac{(2y+1)v\pi}{16}\right], \qquad \alpha(k)=\begin{cases}1/\sqrt{2} & k=0\\ 1 & \text{otherwise}\end{cases}$$

- where:
  - g(x, y): pixel value at row x, column y in the level-shifted block
  - G(u, v): DCT coefficient at row frequency u, column frequency v
  - x: spatial row index (vertical), 0–7
  - y: spatial column index (horizontal), 0–7
  - u: vertical spatial frequency index, 0–7
  - v: horizontal spatial frequency index, 0–7
  - N = 8: block size
- This is the DCT-II with the orthonormal scaling used in JPEG (Wikipedia also lists the unnormalized form). The 64 pixel values become 64 DCT coefficients:
  - G(0, 0): the DC coefficient — equals 8 times the block mean; corresponds to the flat (constant) basis image
  - G(u, v) for (u, v) ≠ (0, 0): the 63 AC coefficients — capture spatial frequency content at increasing row and column frequencies
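A direct (slow, O(N⁴)) implementation of the formula above; for a flat block the only non-zero output is the DC coefficient, equal to 8 × the block mean:

```python
import math

def dct2(block):
    """Orthonormal 8x8 2D DCT-II (the scaling used by JPEG).

    block: 8x8 list of level-shifted pixel values.
    Returns the 8x8 matrix of DCT coefficients G(u, v).
    """
    n = 8
    alpha = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    return [
        [0.25 * alpha(u) * alpha(v) * sum(
            block[x][y]
            * math.cos((2 * x + 1) * u * math.pi / 16)
            * math.cos((2 * y + 1) * v * math.pi / 16)
            for x in range(n) for y in range(n))
         for v in range(n)]
        for u in range(n)
    ]
```

Real encoders use fast factored DCTs (e.g. AAN), but the output is the same up to rounding.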
Visualise matrices: 2D DCT on one sub-image (8×8 pixel block)
Animation: 64 DCT basis images (8×8 px blocks), used to reconstruct a sub-image (8×8 px block)
- The 64 DCT basis images (each image is an 8×8 pixel block)
- Animation: Reconstructing a sub-image (8×8 pixel block) by combining a weighted sum of these basis images:
In practice: dot product with precomputed basis image blocks
- In practice, the 64 DCT coefficients are just 64 dot products. Each DCT coefficient G(u, v) is the dot product of the pixel block g with the corresponding DCT basis image B(u, v):
  - G(u, v) = ⟨g, B(u, v)⟩, where B(u, v) is an 8×8 array of cosine values that can be precomputed once.
- To compute, flatten the pixel block into a 64-element vector g, and each basis image likewise. Then each coefficient is a 64-element dot product.
- Each DCT coefficient answers: how much of basis image B(u, v) is needed to reconstruct this block g?
- Concatenate all 64 flattened basis images as rows of a 64×64 matrix B, and the entire DCT becomes one matrix multiply: G = B g.
- Because the basis images are orthonormal, B is orthogonal and the inverse DCT is simply g = Bᵀ G. In practice, B is precomputed once — no cosines at runtime.
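The matrix view can be sketched as follows: build the 64×64 basis matrix once, then the forward DCT is a matrix-vector product and the inverse is the product with the transpose:

```python
import math

def basis_matrix(n=8):
    """Rows are the n*n flattened orthonormal DCT basis images."""
    alpha = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    B = []
    for u in range(n):
        for v in range(n):
            B.append([0.25 * alpha(u) * alpha(v)
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                      for x in range(n) for y in range(n)])
    return B

def dct_flat(block_flat, B):
    """Forward DCT: one dot product per basis image (G = B g)."""
    return [sum(bi * xi for bi, xi in zip(row, block_flat)) for row in B]

def idct_flat(coeffs, B):
    """Inverse DCT: B is orthogonal, so the inverse is the transpose (g = B^T G)."""
    return [sum(B[k][i] * coeffs[k] for k in range(len(B)))
            for i in range(len(coeffs))]
```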
Intuition (for a smooth 8×8 pixel block)
- Consider a smooth, medium-gray block where every Y value is the same constant. After the mandatory level shift, all pixel values equal that constant minus 128.
- G(0, 0) is large — high energy, because the flat basis is all 1s (up to scaling), so every pixel contributes equally.
- G(7, 7) ≈ 0 — the high-frequency basis is a cosine product that oscillates rapidly between positive and negative values; for a uniform block these contributions cancel in the dot product.
Dimensionality count (illustrative only)
- Each 8×8 pixel block → 8×8 DCT coefficient matrix. Shape unchanged: 267 blocks, each 8×8.
- G(0, 0) (DC) = 8 × block mean (typically a large value like 500–900 for bright regions).
- G(7, 7) (highest-frequency AC) ≈ near zero for any smooth block.
4. Quantization — the lossy step
- Divide each DCT coefficient by a quantization value and round to the nearest integer: B(u, v) = round(G(u, v) / Q(u, v)).
- The JPEG standard defines a quantization table Q where Q(u, v) increases with frequency. High-frequency coefficients are divided by large numbers → they round to zero. The low-frequency DC coefficient has a small Q value (precise).
- There are separate tables for luminance (Y) and chrominance (Cb, Cr) — the chroma tables use larger quantization values, again exploiting lower sensitivity.
- JPEG quality factor (1–100): Scales the quantization table. At quality 50, the standard tables are used as-is.
- Higher quality: e.g. at quality 90, table entries are scaled down (smaller entries → finer quantization → less loss).
- Lower quality: e.g. at quality 10, table entries are scaled up (larger entries → more loss).
This division and rounding is the only place information is destroyed. Everything else in the pipeline is exactly reversible.
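A sketch of quantization using the standard luminance table from Annex K of the JPEG specification. The quality scaling below is the common IJG-style formula used by libjpeg-family encoders — an assumption about a typical encoder, not mandated by the standard:

```python
# Standard JPEG luminance quantization table (Annex K, quality 50)
Q50 = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def scale_table(q_table, quality):
    """IJG-style scaling: quality 50 leaves the table unchanged, 100 -> all ones."""
    s = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[max(1, (q * s + 50) // 100) for q in row] for row in q_table]

def quantize(G, q_table):
    """Divide each DCT coefficient by its table entry and round (the lossy step)."""
    return [[round(G[u][v] / q_table[u][v]) for v in range(8)] for u in range(8)]
```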
Visualise matrices: Quantize the luminance (Y) DCT coefficients for one sub-image (8×8 pixel block)
- Example: The DC coefficient is divided by the small top-left table entry, so it survives with high precision.
- Top-left quantized coefficients are much larger. Our eyes perceive low-frequency patterns well.
- Bottom-right quantized coefficients are smaller (mostly zero). Our eyes don’t really perceive high-frequency patterns anyway.
Dimensionality count (illustrative only)
- Each coefficient divided and rounded: 267 blocks, each 8×8.
- At quality 50, typically 40–55 of the 63 AC coefficients per sub-image (8×8 pixel block) round to zero — the zig-zag tail is a long run of 0s.
5. Serialization: Zig-zag scan
Reorder the 8×8 quantized coefficient block into a 1D array following a zig-zag path:
Image of serialization path
This orders coefficients from low to high 2D frequency. After quantization, the array typically ends with a long run of zeros (high-frequency coefficients that rounded to zero) — ideal for run-length encoding.
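The zig-zag order can be generated by walking the anti-diagonals of the 8×8 grid, reversing direction on alternate diagonals:

```python
def zigzag_order(n=8):
    """Return the 64 (row, col) index pairs in JPEG zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        diag = [(i, s - i) for i in range(max(0, s - n + 1), min(s, n - 1) + 1)]
        # odd diagonals run top-right -> bottom-left, even ones the reverse
        order.extend(diag if s % 2 else reversed(diag))
    return order
```

Applying this order to a quantized block yields the DC-first, low-to-high-frequency 1D array the RLE step expects.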
Visualise: array of serialised DCT coefficients
- Quantized DCT coefficients, now serialized into a 1D array of length 64: DC first, then the 63 AC values from low to high frequency.
Dimensionality count (illustrative only)
- Each block → a length-64 integer vector. Order: DC first, then AC from low to high frequency.
- Typical vector for a smooth block: non-zero values cluster at the start, followed by a long tail of zeros.
- Total: 267 × 64 = 17,088 integers, before entropy coding.
6. DC coefficient: delta coding
The DC coefficient of each block (the block mean) changes slowly across the image. Rather than encoding it absolutely, encode the difference from the previous block’s DC: Δᵢ = DCᵢ − DCᵢ₋₁.
This reduces the number of bits needed for smooth images.
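DC delta coding is a one-liner in spirit:

```python
def delta_encode_dc(dcs):
    """Encode each quantized DC as the difference from the previous block's DC.

    The predictor starts at 0, so the first block's DC is sent as-is.
    """
    prev, out = 0, []
    for dc in dcs:
        out.append(dc - prev)
        prev = dc
    return out
```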
Visualise: DC delta coding for our running block
- The quantized DC coefficient of the current block is compared with that of the previous block in the same channel.
- The difference Δ = DC(current) − DC(previous) is encoded instead of the raw DC — smaller magnitude → fewer bits under Huffman.
- For the very first block in a channel, the previous DC is defined as 0, so the raw DC value is sent directly.
Dimensionality count (illustrative only)
- DC sequences (one value per block, per channel):
  - Y: 169 DC values → 169 deltas
  - Cb and Cr: 49 deltas each
- Deltas are small for smooth images, compressing well under Huffman.
7. AC coefficients: Run Length Encoding (RLE)
Encode the remaining 63 AC coefficients (after zig-zag) as pairs of the following 2 symbols:

| Symbol 1 | Symbol 2 |
|---|---|
| (RUNLENGTH, SIZE) | (AMPLITUDE) |

- x: a non-zero, quantized AC coefficient
- Symbol 1 (concatenated and Huffman-coded together):
  - RUNLENGTH: number of zeros preceding the current coefficient (0–15)
  - SIZE: number of bits required to represent x
  - Special symbols:
    - (0, 0) = end-of-block (EOB) — no AMPLITUDE follows; all remaining coefficients are zero
    - (15, 0) = zero-run-length marker (ZRL) — 16 consecutive zeros, no AMPLITUDE follows
- Symbol 2:
  - AMPLITUDE: the actual coefficient value (bit representation of x)
  - Appended as SIZE raw bits after the Huffman code; not Huffman-coded itself
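The AC run-length encoding, including the ZRL and EOB special symbols, can be sketched as follows (units are `(run, size, amplitude)` tuples; `None` marks symbols that carry no amplitude):

```python
def bit_size(v):
    """Number of bits in the JPEG SIZE category of a non-zero coefficient."""
    return abs(v).bit_length()

def rle_ac(ac):
    """Run-length encode 63 quantized AC coefficients (zig-zag order).

    Trailing zeros are dropped and replaced by the EOB symbol;
    every run of 16 zeros inside the data becomes a ZRL symbol.
    """
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    out, run = [], 0
    for v in ac[:last + 1]:
        if v == 0:
            run += 1
            if run == 16:
                out.append((15, 0, None))  # ZRL: 16 consecutive zeros
                run = 0
        else:
            out.append((run, bit_size(v), v))
            run = 0
    out.append((0, 0, None))  # EOB terminates the block
    return out
```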
Visualise: AC RLE for our running block
AC values from the serialized zig-zag array (positions 1–63): encoding each non-zero coefficient x as (RUNLENGTH, SIZE)(AMPLITUDE), we get the following 20 units, which encode all 63 AC values.
- The (RUNLENGTH, SIZE) part is Huffman-coded;
- The AMPLITUDE is appended raw.

| (RUNLENGTH, SIZE) | AMPLITUDE | Note |
|---|---|---|
| (0, 2) | … | 0 preceding zeros; SIZE 2 covers the amplitude |
| (1, 2) | … | 1 zero before this value; SIZE 2 to store it |
| (0, 2) | … | |
| (0, 3) | … | SIZE 3 covers the amplitude |
| (0, 2) | … | |
| (0, 3) | … | |
| (0, 1) | … | SIZE 1 covers the amplitude |
| (0, 2) | … | |
| (0, 1) | … | |
| (0, 1) | … | |
| (0, 3) | … | |
| (0, 1) | … | |
| (0, 2) | … | |
| (0, 1) | … | |
| (0, 1) | … | |
| (0, 1) | … | |
| (0, 2) | … | |
| (5, 1) | … | 5 zeros before this value |
| (0, 1) | … | |
| (0, 0) | — | EOB: 38 trailing zeros collapsed to one symbol |
Dimensionality count (illustrative only)
- A block whose zig-zag AC sequence has only a few non-zero leading values and 60 trailing zeros encodes as:
  - a couple of (RUNLENGTH, SIZE)(AMPLITUDE) units plus EOB — 3 units instead of 63 values.
- The end-of-block symbol is the key saving: any block with a long zero tail gets a single terminator.
8. Huffman coding
Encode the (RUNLENGTH, SIZE) symbols (and the SIZE category of each DC delta) using Huffman codes. The JPEG standard defines default Huffman tables (alternatively, the encoder can derive optimal tables from the image — “optimised Huffman”). The Huffman coder is lossless: it assigns shorter bit sequences to more frequent symbols.
The output is a compressed bitstream with a JPEG File Interchange Format (JFIF/EXIF) header describing the tables used. JFIF is a wrapper holding the compressed data.
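A generic Huffman construction can be sketched with a heap. This is only illustrative: real JPEG caps code lengths at 16 bits, and most encoders simply use the standard’s default tables:

```python
import heapq

def huffman_code_lengths(freqs):
    """Compute Huffman code lengths from a {symbol: frequency} dict.

    Each merge of the two lightest subtrees adds one bit to every
    symbol inside them; frequent symbols end up with short codes.
    """
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    tiebreak = len(heap)  # unique counter so tuples never compare lists
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths
```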
Dimensionality count — final tally
| Stage | Data size |
|---|---|
| Raw RGB pixels | 30,000 values × 8 bits = 240 kbits |
| After 4:2:0 | 15,000 values × 8 bits = 120 kbits |
| After DCT + quantization (quality 50) | non-zero coefficients × ~4 bits avg ≈ ~51 kbits |
| After Huffman | typically ~45–60 kbits for a photographic image |

Compression ratio: roughly 4:1 at quality 50 for this 100×100 example.
Decoder pipeline
Exactly reverse:
- Huffman decode → RLE symbols → AC + DC coefficients
- Zig-zag inverse → 8×8 coefficient matrix
- Dequantize: multiply each coefficient by Q(u, v)
- Inverse 2D DCT → pixel block
- Add 128 (undo level shift)
- Upsample chroma channels (bilinear or nearest)
- YCbCr → RGB
The dequantization step cannot recover the lost precision — if a coefficient rounded to 0, multiplying 0 back by Q(u, v) gives 0, not the original value.
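The irreversibility is easy to demonstrate with a single coefficient:

```python
def quantize_roundtrip(coeff, q):
    """Quantize then dequantize one DCT coefficient.

    Rounding discards precision; multiplying back by q cannot restore it.
    """
    return round(coeff / q) * q
```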
Compression artefacts
Blocking: At low quality, quantization introduces large differences between adjacent blocks. Since DCT is applied independently per block, there is no information across block boundaries → visible 8×8 grid pattern.
Ringing (Gibbs phenomenon): Near sharp edges, the DCT is being asked to represent a discontinuity with a truncated frequency series. This causes oscillation on both sides of the edge, similar to the Gibbs phenomenon in Fourier series.
Colour bleeding: Chroma subsampling plus low-quality chrominance DCT causes colour to bleed across sharp luminance edges.
Typical compression ratios
| Quality | Ratio | Use case |
|---|---|---|
| 95 | ~3:1 | Archival, print |
| 80 | ~8:1 | Web photos |
| 60 | ~15:1 | Thumbnails |
| 30 | ~30:1 | Preview images |
JPEG is poorly suited for graphics with sharp edges, text, or flat colour regions — PNG (lossless) is better there. JPEG excels at photographic images where high-frequency DCT coefficients are genuinely small.






