Cache Efficiency
Optimizing for CPU cache performance.
Cache Hierarchy
| Level | Size | Latency |
|---|---|---|
| L1 | 32-64 KB | ~4 cycles |
| L2 | 256-512 KB | ~12 cycles |
| L3 | 4-32 MB | ~40 cycles |
| RAM | GBs | ~200+ cycles |
Strategies
Contiguous Storage
Keep related data together:
#![allow(unused)] fn main() { // Good: Single contiguous allocation audio_buffers: Vec<AudioBuffer<S>> // Each buffer is contiguous samples: [S; BUFFER_SIZE] }
Sequential Access
Process in order:
#![allow(unused)] fn main() { // Good: Sequential iteration for sample in buffer.iter_mut() { *sample = process(*sample); } // Avoid: Random access for i in random_order { buffer[i] = process(buffer[i]); } }
Hot/Cold Separation
Separate frequently from rarely used data:
#![allow(unused)] fn main() { struct Block<S> { // Hot path (processing) state: S, coefficient: S, // Cold path (setup) name: String, // Rarely accessed } }
Avoid Pointer Chasing
Minimize indirection:
#![allow(unused)] fn main() { // Less ideal: Vec of trait objects blocks: Vec<Box<dyn Block>> // Better: Enum of concrete types blocks: Vec<BlockType<S>> }
Buffer Layout
Interleaved vs non-interleaved:
#![allow(unused)] fn main() { // Non-interleaved (better for processing) left: [L0, L1, L2, L3, ...] right: [R0, R1, R2, R3, ...] // Interleaved (worse for SIMD) data: [L0, R0, L1, R1, L2, R2, ...] }
bbx_audio uses non-interleaved buffers.