DRAM Burst Buffer and Burst Length

Summary: DDR5’s 128-bit burst buffer implements 16n prefetch. Each ×8 DRAM chip fetches 128 bits from the array in one internal core cycle, then streams them as 16 consecutive 8-bit transfers. Across a 32-bit sub-channel of four ×8 chips, one burst is 64 bytes, exactly one x86 cache line. Because each multiplexer selection now serves 16 transfers instead of one, the column multiplexer switches 16× less often.

The prefetch architecture

Prefetch depth (n) is the number of bits each data pin (DQ) receives from the array per internal core access. One array access supplies n consecutive transfers on every pin, so the internal array only has to run at 1/n of the data rate. This decouples the array from the fast external interface and lets it operate at a lower, more stable frequency.

Generation	Prefetch	Internal core frequency	Core frequency if run at 4,800 MT/s
SDR	1n	data rate ÷ 1 (= I/O clock)	4,800 MHz
DDR	2n	data rate ÷ 2 (= I/O clock)	2,400 MHz
DDR2	4n	data rate ÷ 4 (= ½ × I/O clock)	1,200 MHz
DDR3/DDR4	8n	data rate ÷ 8 (= ¼ × I/O clock)	600 MHz
DDR5	16n	data rate ÷ 16 (= ⅛ × I/O clock)	300 MHz

For DDR5-4800 (4,800 MT/s data rate on a 2,400 MHz I/O clock, since data is transferred on both clock edges):

Internal array core runs at: 4,800 MT/s ÷ 16 = 2,400 MHz ÷ 8 = 300 MHz
Each internal core cycle fetches 16 × 8 bits = 128 bits per ×8 chip

The burst buffer holds exactly this 128-bit fetch result, then serialises it onto the 8 data pins of the ×8 interface as 16 transfers. These take 8 I/O clock cycles, because each clock cycle carries two transfers.
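The DDR5-4800 arithmetic above can be checked with a short Python sketch. It assumes a ×8 device, double-data-rate signalling (two transfers per I/O clock), and treats prefetch depth as bits per DQ pin per internal access; the helper name `prefetch_timing` is hypothetical.

```python
def prefetch_timing(data_rate_mts: int, prefetch: int, dq_width: int) -> dict:
    """Derive I/O clock, core frequency and burst size from a DDR data rate.

    Assumes double data rate signalling and that one array access delivers
    `prefetch` bits per DQ pin (hypothetical helper, not a vendor formula).
    """
    io_clock_mhz = data_rate_mts / 2        # DDR: one transfer on each clock edge
    core_mhz = data_rate_mts / prefetch     # one array access feeds `prefetch` transfers
    bits_per_fetch = prefetch * dq_width    # bits latched into the burst buffer
    transfers = prefetch                    # burst length equals prefetch depth
    io_cycles = transfers / 2               # two transfers per I/O clock cycle
    return {
        "io_clock_mhz": io_clock_mhz,
        "core_mhz": core_mhz,
        "bits_per_fetch": bits_per_fetch,
        "transfers": transfers,
        "io_cycles": io_cycles,
    }


ddr5 = prefetch_timing(data_rate_mts=4800, prefetch=16, dq_width=8)
print(ddr5)
# {'io_clock_mhz': 2400.0, 'core_mhz': 300.0, 'bits_per_fetch': 128, 'transfers': 16, 'io_cycles': 8.0}
```

Calling it with `prefetch=8` reproduces the DDR4 row of the table: at the same data rate the core would have to run twice as fast.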

TODO: Prefetch evolution diagram — show 1n through 16n. Each level: internal bus width doubles, external clock frequency stays fixed, internal core frequency halves. Illustrate how the prefetch buffer bridges internal and external domains.

The burst buffer

Two 128-bit temporary registers (burst buffers), one for reads and one for writes, sit between the column multiplexer and the read and write drivers.

TODO: Circuit diagram — Address Input → Bank Group/Bank Control (×5) → Row Decoder (×16) → 65,536 rows → Sense Amplifiers (×8,192) → Column Multiplexer → Burst Buffer (read) + Burst Buffer (write) → Read Driver / Write Driver → ×8 data wires. Source:

Column address split (10 bits → 6 + 4)

Field	Bits	Range	Purpose
Multiplexer select	6	0–63	Selects 1 of 64 contiguous groups of 128 bitlines (64 × 128 = 8,192 bitlines total)
Burst position	4	0–15	Selects 1 of 16 eight-bit segments within the 128-bit burst buffer
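The split can be sketched in Python as plain bit manipulation. This assumes the upper 6 bits of the 10-bit column address form the multiplexer select and the lower 4 bits the burst position, as in the simplified model above; the function and constant names are hypothetical.

```python
MUX_BITS = 6        # upper column-address bits: which 128-bitline group
BURST_BITS = 4      # lower column-address bits: which 8-bit slot in the burst buffer
GROUP_WIDTH = 128   # contiguous bitlines connected per multiplexer position


def split_column_address(col: int) -> tuple[int, int]:
    """Split a 10-bit column address into (mux_select, burst_position)."""
    if not 0 <= col < 1 << (MUX_BITS + BURST_BITS):
        raise ValueError(f"column address {col} does not fit in 10 bits")
    mux_select = col >> BURST_BITS                  # top 6 bits: 0-63
    burst_position = col & ((1 << BURST_BITS) - 1)  # bottom 4 bits: 0-15
    return mux_select, burst_position


def bitline_range(mux_select: int) -> range:
    """Contiguous bitlines wired to the burst buffer for this mux position."""
    start = mux_select * GROUP_WIDTH
    return range(start, start + GROUP_WIDTH)


mux, pos = split_column_address(0b1011010011)   # column 723
print(mux, pos)                  # 45 3
r = bitline_range(mux)
print(r.start, r.stop - 1)       # 5760 5887
```

Returning a `range` makes the contiguity constraint explicit: a multiplexer position can only ever name one unbroken run of 128 bitlines.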

The 128 selected bitlines must be contiguous — the multiplexer cannot select an arbitrary non-contiguous window.

How a burst read works

6-bit multiplexer select: Connects 128 contiguous bitlines to the burst buffer → loads 128 bits in one operation.
4-bit burst counter (0000 → 1111): Steps through the burst buffer 8 bits at a time → 16 consecutive 8-bit transfers to the read driver → out to the 8 data wires.

This is BL16 (burst length 16): one multiplexer command produces 16 data transfers = 128 bits total per chip.

Write works identically in reverse: the burst counter fills the 128-bit burst buffer from the write driver, then the multiplexer drives all 128 bits back to the selected bitlines simultaneously.
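The read and write paths can be modelled together as a toy Python class. It assumes an open row of 8,192 sense-amplifier bits held as a list of 0/1 values and most-significant-bit-first ordering within each 8-bit transfer; `BurstBuffer` is a hypothetical name, not a real hardware interface.

```python
class BurstBuffer:
    """Toy model of one x8 chip's column path (hypothetical, simplified).

    One multiplexer operation moves 128 contiguous bits between the open
    row and the buffer; a 4-bit counter moves 8 bits per transfer between
    the buffer and the DQ pins.
    """

    GROUP = 128
    SLOT = 8
    BURST = GROUP // SLOT  # 16 transfers: BL16

    def __init__(self, row_bits: list[int]):
        assert len(row_bits) == 8192
        self.row = row_bits

    def read_burst(self, mux_select: int) -> list[int]:
        start = mux_select * self.GROUP
        buffer = self.row[start:start + self.GROUP]   # one mux operation: 128 bits at once
        transfers = []
        for counter in range(self.BURST):             # burst counter 0000 -> 1111
            slot = buffer[counter * self.SLOT:(counter + 1) * self.SLOT]
            transfers.append(int("".join(map(str, slot)), 2))  # 8 bits onto the DQ pins
        return transfers

    def write_burst(self, mux_select: int, transfers: list[int]) -> None:
        assert len(transfers) == self.BURST
        buffer: list[int] = []
        for byte in transfers:                         # counter fills the buffer 8 bits at a time
            assert 0 <= byte < 256
            buffer.extend(int(b) for b in format(byte, "08b"))
        start = mux_select * self.GROUP
        self.row[start:start + self.GROUP] = buffer    # one mux operation: 128 bits back at once


chip = BurstBuffer([0] * 8192)
data = list(range(16))
chip.write_burst(5, data)
print(chip.read_burst(5) == data)       # True
print(chip.read_burst(4) == [0] * 16)   # True
```

The asymmetry the text describes shows up directly: the loop runs 16 times per burst, while the row is touched by a single slice operation.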

Bandwidth improvement

	Without burst buffer	With burst buffer
Multiplexer positions for 8,192 columns	8,192 ÷ 8 = 1,024	8,192 ÷ 128 = 64
Transfers per multiplexer position	1 × 8 bits	16 × 8 bits
Multiplexer switching overhead	baseline	16× lower
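The 16× figure follows from counting multiplexer reconfigurations needed to stream one whole open row off a ×8 chip. This Python sketch assumes every multiplexer position is visited exactly once; the names are hypothetical.

```python
ROW_BITS = 8192   # bitlines per open row in the simplified model
DQ_WIDTH = 8      # x8 device


def mux_switches_to_stream_row(bits_per_mux_position: int) -> int:
    """Multiplexer reconfigurations needed to move a whole open row off-chip."""
    return ROW_BITS // bits_per_mux_position


without_buffer = mux_switches_to_stream_row(DQ_WIDTH)     # 8 bits per position
with_buffer = mux_switches_to_stream_row(16 * DQ_WIDTH)   # 128 bits per position (BL16)
print(without_buffer, with_buffer, without_buffer // with_buffer)  # 1024 64 16
```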

Cache line alignment

For a 32-bit DDR5 sub-channel (4 chips × 8 bits each):

Per burst: 128 bits per chip × 4 chips = 512 bits = 64 bytes
64 bytes = one x86 cache line

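A single BL16 burst fills exactly one CPU cache line. This is deliberate: the CPU requests and evicts memory in cache-line units, so each burst should carry one whole line with no wasted partial transfer. DDR4 reached the same 64 bytes with BL8 on a 64-bit channel; DDR5 splits each DIMM into two independent 32-bit sub-channels and doubles the burst length to keep the 64-byte match.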
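The cache-line match can be checked by multiplying channel width by burst length. This Python sketch ignores ECC chips and uses a hypothetical helper, `burst_bytes`; it compares the DDR5 sub-channel with a DDR4-style 64-bit channel using BL8.

```python
def burst_bytes(chips: int, dq_per_chip: int, burst_length: int) -> int:
    """Bytes delivered by one burst across all chips of a (sub-)channel."""
    return chips * dq_per_chip * burst_length // 8


ddr5_subchannel = burst_bytes(chips=4, dq_per_chip=8, burst_length=16)  # 32-bit sub-channel, BL16
ddr4_channel = burst_bytes(chips=8, dq_per_chip=8, burst_length=8)      # 64-bit channel, BL8
print(ddr5_subchannel, ddr4_channel)  # 64 64
```

Both products equal 64 bytes: halving the data width while doubling the burst length leaves the bytes per burst unchanged.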

Burst Chop (BC8)

DDR5 supports halving the burst to 8 transfers (BC8):

64 bits per chip instead of 128 bits (32 bytes per sub-channel instead of 64)
Useful when interleaving read and write commands at fine granularity, or for access patterns that don’t align to 64-byte boundaries
Burst chop is chosen when the READ or WRITE command is issued (on-the-fly mode), not by interrupting a burst already in progress

Column-to-column timing (tCCD)

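Consecutive column (CAS) commands must be spaced by at least tCCD. The short variant can never be below 8 clock cycles, because a BL16 burst occupies the data bus for 8 cycles. The long variant adds the extra delay of reusing the same bank group's internal column path: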

Variant	When it applies	Typical DDR5
tCCD_S (short)	Two CAS commands to different bank groups	8–12 cycles
tCCD_L (long)	Two CAS commands to the same bank group	16–20 cycles

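Different bank groups have independent I/O paths, so CAS commands to different groups need only tCCD_S between them. A controller that rotates across bank groups can keep the data bus continuously busy at tCCD_S spacing. With 8 bank groups instead of DDR4’s 4, it has more room to avoid back-to-back accesses to the same group, so fewer commands pay the tCCD_L penalty.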
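The effect of the two spacing rules can be sketched as a tiny in-order scheduler in Python. It takes the low end of the typical ranges in the table (tCCD_S = 8 and tCCD_L = 16 cycles) as assumed illustrative values, not datasheet figures, and ignores every other timing constraint; `schedule_cas` is a hypothetical helper.

```python
def schedule_cas(bank_groups: list[int], tccd_s: int, tccd_l: int) -> list[int]:
    """Earliest issue cycle for each in-order CAS command (hypothetical helper).

    Each command must trail the previous command (any bank group) by tccd_s
    cycles, and the previous command to its own bank group by tccd_l cycles.
    """
    times: list[int] = []
    last_in_group: dict[int, int] = {}
    for bg in bank_groups:
        t = times[-1] + tccd_s if times else 0          # short rule: any bank group
        if bg in last_in_group:
            t = max(t, last_in_group[bg] + tccd_l)      # long rule: same bank group
        times.append(t)
        last_in_group[bg] = t
    return times


# Assumed illustrative values, not taken from a datasheet.
same_group = schedule_cas([0, 0, 0, 0], tccd_s=8, tccd_l=16)
rotating = schedule_cas([0, 1, 2, 3], tccd_s=8, tccd_l=16)
print(same_group)  # [0, 16, 32, 48]
print(rotating)    # [0, 8, 16, 24]
```

Rotating across bank groups issues four bursts in the time the same-group sequence needs for two and a half; alternating between just two groups already hides a tCCD_L of twice tCCD_S.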

Flexibility

The burst buffer does not force sequential access. If the next request targets:

A different 128-bit block in the same open row (row hit): the multiplexer loads the new block and a new burst can begin after tCCD
A different row in the same bank (row miss): full PRE + ACT + burst sequence
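The two cases can be sketched as the command sequence a memory controller would issue per read. This Python sketch assumes an open-page policy (rows stay open after an access) and also covers a bank with no open row, which needs ACT but no PRE; `commands_for_access` is a hypothetical helper and ignores all timing.

```python
def commands_for_access(open_rows: dict[int, int], bank: int, row: int) -> list[str]:
    """Commands for one read under an open-page policy (hypothetical helper).

    open_rows maps bank -> currently open row and is updated in place.
    """
    current = open_rows.get(bank)
    if current == row:
        cmds = ["RD"]                  # row hit: only a new column burst
    elif current is None:
        cmds = ["ACT", "RD"]           # bank idle: open the row first
    else:
        cmds = ["PRE", "ACT", "RD"]    # row miss: close the old row, open the new one
    open_rows[bank] = row
    return cmds


banks: dict[int, int] = {}
print(commands_for_access(banks, bank=0, row=7))  # ['ACT', 'RD']
print(commands_for_access(banks, bank=0, row=7))  # ['RD']
print(commands_for_access(banks, bank=0, row=9))  # ['PRE', 'ACT', 'RD']
```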

Sources

Branch Education — How Does Computer Memory Work?
Wikipedia — DDR5 SDRAM
KAD8 — DDR Memory Fundamentals: Architecture, Prefetch, and Addressing
Micron — DDR5 SDRAM New Features White Paper
