Benchmarking

Performance benchmarks for measuring SIMD optimization effectiveness and overall DSP performance.

Overview

The bbx_dsp crate includes Criterion benchmarks for:

  • Block micro-benchmarks - Individual SIMD-optimized blocks in isolation
  • Graph integration benchmarks - Realistic DSP graph configurations

Benchmarks support comparing scalar vs SIMD performance by running with and without the simd feature flag.
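As a rough illustration of what toggling the feature flag changes, the sketch below gates two implementations of the same function on a `simd` Cargo feature. It is not taken from bbx_dsp: the `apply_gain` function is hypothetical, and the "SIMD" variant is only a 4-wide chunked loop standing in for real vector code, so the example builds on stable Rust.

```rust
/// Scalar path: compiled when the `simd` feature is disabled.
#[cfg(not(feature = "simd"))]
fn apply_gain(samples: &mut [f32], gain: f32) {
    for s in samples.iter_mut() {
        *s *= gain;
    }
}

/// Stand-in for a vectorized path: compiled only with `--features simd`.
/// (Hypothetical; a real implementation would use SIMD types here.)
#[cfg(feature = "simd")]
fn apply_gain(samples: &mut [f32], gain: f32) {
    let mut chunks = samples.chunks_exact_mut(4);
    for chunk in &mut chunks {
        for s in chunk.iter_mut() {
            *s *= gain;
        }
    }
    // Handle the tail that does not fill a whole 4-sample chunk.
    for s in chunks.into_remainder() {
        *s *= gain;
    }
}

fn main() {
    let mut buffer = [1.0f32, 2.0, 3.0, 4.0, 5.0];
    apply_gain(&mut buffer, 0.5);
    println!("{buffer:?}");
}
```

Because both variants share one signature, the same benchmark code can measure either build; only the feature flag differs between the two runs.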

Available Benchmark Suites

simd_blocks

Micro-benchmarks for individual blocks:

Block            What's measured               Variations
OscillatorBlock  Waveform generation           sine, sawtooth, square, triangle
PannerBlock      Pan law + gain application    -
GainBlock        SIMD gain application         -
LfoBlock         Modulation signal generation  sine

Each block is benchmarked with:

  • Sample types: f32, f64
  • Buffer sizes: 256, 512, 1024

simd_graphs

Integration benchmarks for realistic DSP configurations:

Graph            Blocks                  Purpose
simple_chain     Oscillator              Single-block baseline
effect_chain     Oscillator → Overdrive  Signal chain overhead
modulated_synth  Oscillator + LFO        Modulation path
multi_osc        4 Oscillators           Multiple generator load

Running Benchmarks

Basic Commands

# Run all benchmarks (scalar mode)
cargo bench -p bbx_dsp

# Run all benchmarks (SIMD mode, requires nightly)
cargo +nightly bench -p bbx_dsp --features simd

# Run specific benchmark suite
cargo bench -p bbx_dsp --bench simd_blocks
cargo bench -p bbx_dsp --bench simd_graphs

# Run specific benchmark by name filter
cargo bench -p bbx_dsp -- oscillator
cargo bench -p bbx_dsp -- "graph_simple"

Comparing SIMD vs Scalar Performance

The recommended workflow for comparing performance:

# 1. Run scalar benchmarks and save as baseline
cargo bench --benches -p bbx_dsp -- --save-baseline scalar

# 2. Run SIMD benchmarks and compare against the saved baseline
#    (--baseline compares without overwriting the scalar results)
cargo +nightly bench --benches -p bbx_dsp --features simd -- --baseline scalar

This produces output showing the performance change:

oscillator_f32/sine/512
                        time:   [961.30 ns 962.33 ns 964.71 ns]
                        thrpt:  [530.73 Melem/s 532.04 Melem/s 532.61 Melem/s]
                 change:
                        time:   [-55.337% -53.509% -52.405%] (p = 0.00 < 0.05)
                        thrpt:  [+110.11% +115.10% +123.90%]
                        Performance has improved.
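The time and throughput changes in this output are two views of the same measurement. Since throughput is inversely proportional to time, a fractional time change t corresponds to a throughput change of 1/(1 + t) - 1. The sketch below (the function name is illustrative, not part of Criterion) reproduces the throughput line from the time line.

```rust
/// Convert a fractional change in time (e.g. -0.53509 for -53.509%)
/// into the equivalent fractional change in throughput.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Time changes from the sample output: lower bound, estimate, upper bound.
    for t in [-0.55337, -0.53509, -0.52405] {
        println!(
            "time {:+.3}% -> throughput {:+.2}%",
            t * 100.0,
            throughput_change(t) * 100.0
        );
    }
}
```

Note that the bounds swap places: the largest time reduction (-55.337%) yields the largest throughput gain (+123.90%), which is why it appears last on the thrpt line.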

Understanding Results

Output Format

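Criterion reports three values for each measurement. Together they describe a confidence interval (95% by default):

  • Lower bound - Lower end of the confidence interval (for time, the fastest plausible value)
  • Estimate - Criterion's best estimate of the time per iteration
  • Upper bound - Upper end of the confidence interval (for time, the slowest plausible value)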

Throughput

Benchmarks report throughput in Melem/s (million elements per second), representing samples processed per second.
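To make the unit concrete, the following sketch derives Melem/s from the buffer size and the time per iteration, using the figures from the sample output above (512 samples in about 962.33 ns). The helper name is made up for illustration.

```rust
/// Throughput in million elements per second, given the number of
/// elements processed per iteration and the iteration time in nanoseconds.
fn melem_per_sec(elements: u64, nanos_per_iter: f64) -> f64 {
    // elements / ns is Gelem/s; multiply by 1000 to get Melem/s.
    elements as f64 / nanos_per_iter * 1_000.0
}

fn main() {
    // Lower bound, estimate, and upper bound of the time per iteration.
    for nanos in [961.30, 962.33, 964.71] {
        println!("{nanos} ns -> {:.2} Melem/s", melem_per_sec(512, nanos));
    }
}
```

The shortest time gives the highest throughput, so the time and thrpt lines list their bounds in opposite order.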

HTML Reports

Criterion generates detailed HTML reports in target/criterion/. Open target/criterion/report/index.html to view:

  • Time distribution histograms
  • Regression analysis
  • Comparison charts between runs

Benchmark Naming Convention

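Benchmarks follow the pattern below. The {variant} segment is omitted for benchmarks that have no variations, such as the panner and the graph benchmarks: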

{category}_{sample_type}/{variant}/{buffer_size}

Examples:

  • oscillator_f32/sine/512 - f32 sine oscillator, 512 samples
  • panner_f64/1024 - f64 panner, 1024 samples
  • graph_simple_chain_f32/512 - Simple graph, f32, 512 samples

Use these names to filter benchmarks:

# All f32 benchmarks
cargo bench -p bbx_dsp -- f32

# All 512-sample benchmarks
cargo bench -p bbx_dsp -- /512

# All oscillator benchmarks
cargo bench -p bbx_dsp -- oscillator
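The naming pattern and the filters above can be modeled with a short sketch. It assumes that a plain filter with no regex metacharacters matches any benchmark ID containing it; Criterion's own matching may be more general (recent versions treat the filter as a regular expression, as far as I know). Both function names are hypothetical.

```rust
/// Build a benchmark ID following {category}_{sample_type}/{variant}/{buffer_size},
/// leaving out the variant segment when there is none.
fn bench_id(category: &str, sample_type: &str, variant: Option<&str>, buffer_size: usize) -> String {
    match variant {
        Some(v) => format!("{category}_{sample_type}/{v}/{buffer_size}"),
        None => format!("{category}_{sample_type}/{buffer_size}"),
    }
}

/// Simplified model of a name filter: substring match (an assumption).
fn matches_filter(id: &str, filter: &str) -> bool {
    id.contains(filter)
}

fn main() {
    let ids = [
        bench_id("oscillator", "f32", Some("sine"), 512),
        bench_id("panner", "f64", None, 1024),
        bench_id("graph_simple_chain", "f32", None, 512),
    ];
    for filter in ["f32", "/512", "oscillator"] {
        let hits: Vec<&String> = ids.iter().filter(|id| matches_filter(id, filter)).collect();
        println!("{filter}: {hits:?}");
    }
}
```

Including the leading slash in /512 keeps the filter anchored to the buffer-size segment, so it does not pick up names that merely contain the digits elsewhere.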

Adding New Benchmarks

Block Benchmarks

Add to bbx_dsp/benches/simd_blocks.rs:

fn bench_my_block<S: Sample>(c: &mut Criterion, type_name: &str) {
    let mut group = c.benchmark_group(format!("my_block_{}", type_name));

    for buffer_size in BUFFER_SIZES {
        // Report throughput in elements (samples) per second.
        group.throughput(Throughput::Elements(*buffer_size as u64));

        let bench_id = BenchmarkId::from_parameter(buffer_size);

        group.bench_with_input(bench_id, buffer_size, |b, &size| {
            // Setup runs once per benchmark and is not timed.
            let context = create_context(size);
            let mut block = MyBlock::<S>::new(/* params */);
            let inputs = create_input_buffers::<S>(size, 1);
            let mut outputs = create_output_buffers::<S>(size, 1);
            let modulation_values: Vec<S> = vec![];

            // Only the closure passed to iter() is measured.
            b.iter(|| {
                let input_slices = as_input_slices(&inputs);
                let mut output_slices = as_output_slices(&mut outputs);
                block.process(
                    black_box(&input_slices),
                    black_box(&mut output_slices),
                    black_box(&modulation_values),
                    black_box(&context),
                );
            });
        });
    }

    group.finish();
}
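The template above relies on buffer helpers defined in the benchmark file. Their real definitions are not shown here, so the following is only a guess at their shape, written against plain `Clone + Default` bounds instead of the crate's `Sample` trait so that it compiles on its own.

```rust
/// Hypothetical helper: one zero-initialized Vec per channel.
fn create_input_buffers<S: Clone + Default>(size: usize, channels: usize) -> Vec<Vec<S>> {
    vec![vec![S::default(); size]; channels]
}

/// Hypothetical helper: same layout as the inputs.
fn create_output_buffers<S: Clone + Default>(size: usize, channels: usize) -> Vec<Vec<S>> {
    vec![vec![S::default(); size]; channels]
}

/// Borrow each channel as an immutable slice.
fn as_input_slices<S>(buffers: &[Vec<S>]) -> Vec<&[S]> {
    buffers.iter().map(|b| b.as_slice()).collect()
}

/// Borrow each channel as a mutable slice.
fn as_output_slices<S>(buffers: &mut [Vec<S>]) -> Vec<&mut [S]> {
    buffers.iter_mut().map(|b| b.as_mut_slice()).collect()
}

fn main() {
    let inputs = create_input_buffers::<f32>(512, 1);
    let mut outputs = create_output_buffers::<f32>(512, 1);
    let input_slices = as_input_slices(&inputs);
    let output_slices = as_output_slices(&mut outputs);
    println!(
        "{} input channel(s), {} output channel(s), {} samples each",
        input_slices.len(),
        output_slices.len(),
        input_slices[0].len()
    );
}
```

If the real helpers allocate a Vec of slices like this sketch does, that allocation sits inside the timed closure in the template and is counted in every measurement. The cost is the same for scalar and SIMD builds, so comparisons remain fair, but absolute timings for small buffers will include it.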

Graph Benchmarks

Add to bbx_dsp/benches/simd_graphs.rs:

fn create_my_graph<S: Sample>(buffer_size: usize) -> Graph<S> {
    let mut builder = GraphBuilder::new(SAMPLE_RATE, buffer_size, NUM_CHANNELS);
    // Add blocks and connections
    builder.build()
}

fn bench_my_graph_f32(c: &mut Criterion) {
    bench_graph::<f32, _>(c, "f32", "my_graph", create_my_graph);
}

Tips

  • Warm cache: Criterion automatically warms up before measuring
  • Stable environment: Close other applications for consistent results
  • Multiple runs: Run benchmarks multiple times to verify consistency
  • Release mode: cargo bench builds with the bench profile, which inherits from release, so optimizations are enabled without passing --release