SIMD Optimizations

SIMD (Single Instruction, Multiple Data) support accelerates DSP by operating on several samples per instruction.

Enabling SIMD

Enable the simd feature flag in your Cargo.toml:

[dependencies]
bbx_dsp = { version = "...", features = ["simd"] }

Requirements:

  • Nightly Rust toolchain (uses the unstable portable_simd feature)
  • Build with: cargo +nightly build --features simd
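
As an illustration of why the feature flag and the nightly requirement go together, here is a minimal sketch of how a crate can gate a `std::simd` code path behind a Cargo feature while keeping a scalar path for stable builds. The function name `scale_in_place` is hypothetical, and this is an assumption about the general pattern, not the actual bbx source.

```rust
// In a real crate the nightly gate would sit at the crate root, for example:
//   #![cfg_attr(feature = "simd", feature(portable_simd))]

/// SIMD path: compiled only when the `simd` Cargo feature is enabled (nightly).
#[cfg(feature = "simd")]
pub fn scale_in_place(buf: &mut [f32], gain: f32) {
    use std::simd::f32x4;

    let gain_vec = f32x4::splat(gain);
    let mut chunks = buf.chunks_exact_mut(4);
    for chunk in &mut chunks {
        let scaled = f32x4::from_slice(chunk) * gain_vec;
        chunk.copy_from_slice(&scaled.to_array());
    }
    // Samples left over when the length is not a multiple of 4.
    for sample in chunks.into_remainder() {
        *sample *= gain;
    }
}

/// Scalar path: compiled when the feature is off, so stable builds still work.
#[cfg(not(feature = "simd"))]
pub fn scale_in_place(buf: &mut [f32], gain: f32) {
    for sample in buf.iter_mut() {
        *sample *= gain;
    }
}
```

Both variants share one signature, so callers do not need to know which path was compiled in.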

How It Works

SIMD processes multiple samples simultaneously:

Scalar: a[0]*b[0], a[1]*b[1], a[2]*b[2], a[3]*b[3]  (4 operations)
SIMD:   a[0..4] * b[0..4]                            (1 operation)

The implementation uses 4-lane vectors (f32x4 and f64x4) from Rust's std::simd.
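
The scalar versus 4-lane comparison above can be sketched in stable Rust, with an inner loop over four elements standing in for a single vector instruction. The function names are hypothetical and the code only illustrates the chunking idea; it does not use `std::simd` itself.

```rust
const LANES: usize = 4;

/// Scalar reference: one multiply per loop iteration.
fn multiply_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for i in 0..out.len() {
        out[i] = a[i] * b[i];
    }
}

/// Lane-wise version: walks the buffers in groups of four.
/// With a real SIMD type, each group would be a single multiply.
fn multiply_lanes(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == out.len() && b.len() == out.len());
    let full = out.len() - out.len() % LANES;

    for start in (0..full).step_by(LANES) {
        // Four independent multiplies with no dependency between lanes.
        for lane in 0..LANES {
            out[start + lane] = a[start + lane] * b[start + lane];
        }
    }

    // Leftover samples when the length is not a multiple of LANES.
    for i in full..out.len() {
        out[i] = a[i] * b[i];
    }
}
```

Both functions produce identical results; only the number of loop steps over the data differs.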

SIMD Operations

The bbx_core::simd module provides these vectorized operations:

Function                Description
fill_f32/f64            Fill a buffer with a constant value
apply_gain_f32/f64      Multiply samples by a gain factor
multiply_add_f32/f64    Element-wise multiplication of two buffers
sin_f32/f64             Vectorized sine computation

Additionally, the denormal module provides SIMD-accelerated batch denormal flushing:

  • flush_denormals_f32_batch
  • flush_denormals_f64_batch
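
To show what denormal flushing means, the following scalar sketch replaces every subnormal value in a buffer with zero. The name `flush_denormals_sketch` is hypothetical, and the threshold used by the real batch functions is not stated in this document, so treat this as an assumption about the behaviour, not the bbx implementation.

```rust
/// Replaces subnormal (denormal) values with 0.0 and leaves all other
/// values, including NaN and infinities, untouched.
fn flush_denormals_sketch(buf: &mut [f32]) {
    for sample in buf.iter_mut() {
        // `is_subnormal` is true for non-zero values smaller in magnitude
        // than f32::MIN_POSITIVE.
        if sample.is_subnormal() {
            *sample = 0.0;
        }
    }
}
```

A vectorized version would typically perform the same test as a per-lane compare followed by a select, which avoids branching on each sample.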

Sample Trait SIMD Methods

The Sample trait includes built-in SIMD support when the simd feature is enabled. This allows writing generic SIMD code that works for both f32 and f64.

Associated Type

Each Sample implementation has an associated SIMD type:

Sample Type    SIMD Type
f32            f32x4
f64            f64x4
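
The associated-type pattern can be sketched with a simplified stand-in trait. Everything here (`LaneSample`, `Lanes`, `splat`, `to_array`, `splat_then_unpack`) is hypothetical and uses plain arrays in place of `f32x4` and `f64x4`, so it compiles on stable Rust; the real `Sample` trait may be shaped differently.

```rust
const LANES: usize = 4;

/// Hypothetical, simplified stand-in for a sample trait that carries
/// its own vector type.
trait LaneSample: Copy {
    /// Vector type paired with this sample type
    /// (a plain array here; a `std::simd` type in a real SIMD build).
    type Lanes: Copy;

    fn splat(value: Self) -> Self::Lanes;
    fn to_array(lanes: Self::Lanes) -> [Self; LANES];
}

impl LaneSample for f32 {
    type Lanes = [f32; LANES];
    fn splat(value: f32) -> Self::Lanes { [value; LANES] }
    fn to_array(lanes: Self::Lanes) -> [f32; LANES] { lanes }
}

impl LaneSample for f64 {
    type Lanes = [f64; LANES];
    fn splat(value: f64) -> Self::Lanes { [value; LANES] }
    fn to_array(lanes: Self::Lanes) -> [f64; LANES] { lanes }
}

/// Generic code names the vector type only through `S::Lanes`,
/// so one function body serves both sample types.
fn splat_then_unpack<S: LaneSample>(value: S) -> [S; LANES] {
    S::to_array(S::splat(value))
}
```

The compiler picks the concrete vector type from the sample type at each call site, which is what lets generic DSP code avoid separate f32 and f64 versions.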

SIMD Methods

Method                                     Description
simd_splat(value)                          Create a vector with all lanes set to value
simd_from_slice(slice)                     Load 4 samples from a slice
simd_to_array(simd)                        Convert a SIMD vector to [Self; 4]
simd_select_gt(a, b, if_true, if_false)    Per-lane selection where a > b
simd_select_lt(a, b, if_true, if_false)    Per-lane selection where a < b
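
The per-lane selection semantics can be illustrated with plain arrays. This is a sketch of the presumed behaviour of `simd_select_gt`, inferred from its parameter names; `select_gt_sketch` and `clamp_to_ceiling` are hypothetical helpers, not part of bbx.

```rust
/// For each lane i: picks if_true[i] where a[i] > b[i], otherwise if_false[i].
fn select_gt_sketch(
    a: [f32; 4],
    b: [f32; 4],
    if_true: [f32; 4],
    if_false: [f32; 4],
) -> [f32; 4] {
    let mut out = if_false;
    for i in 0..4 {
        if a[i] > b[i] {
            out[i] = if_true[i];
        }
    }
    out
}

/// Example use: limit each lane to a ceiling without branching per sample
/// in the calling code.
fn clamp_to_ceiling(x: [f32; 4], ceiling: f32) -> [f32; 4] {
    let limit = [ceiling; 4];
    // Where x > limit, take limit; elsewhere keep x.
    select_gt_sketch(x, limit, limit, x)
}
```

Selection of this kind is how branch-free clamping, wrapping, and thresholding are usually expressed in vector code, since each lane may need a different outcome.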

Example: Generic SIMD Code

use bbx_core::sample::{Sample, SIMD_LANES};

fn apply_gain_simd<S: Sample>(output: &mut [S], gain: S) {
    let gain_vec = S::simd_splat(gain);
    let (chunks, remainder) = output.as_chunks_mut::<SIMD_LANES>();

    for chunk in chunks {
        let samples = S::simd_from_slice(chunk);
        let result = samples * gain_vec;
        chunk.copy_from_slice(&S::simd_to_array(result));
    }

    // Scalar fallback for remainder
    for sample in remainder {
        *sample = *sample * gain;
    }
}

This single implementation works for both f32 and f64 without code duplication.

Optimized Blocks

The following blocks use SIMD when the feature is enabled:

Block              Optimization
OscillatorBlock    Vectorized waveform generation (4 samples at a time)
LfoBlock           Vectorized modulation signal generation
GainBlock          Vectorized gain application
PannerBlock        Vectorized sin/cos gain calculation
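
As an example of the kind of sin/cos gain calculation a panner performs, here is a sketch of a common constant-power pan law evaluated for four pan positions at once. The formula and the name `constant_power_gains` are assumptions for illustration; this document does not specify which pan law PannerBlock uses.

```rust
use std::f32::consts::FRAC_PI_4;

/// Computes (left, right) gains for four pan positions in [-1.0, 1.0],
/// where -1.0 is hard left and 1.0 is hard right.
fn constant_power_gains(pan: [f32; 4]) -> ([f32; 4], [f32; 4]) {
    let mut left = [0.0f32; 4];
    let mut right = [0.0f32; 4];
    for i in 0..4 {
        // Map pan from [-1, 1] to an angle in [0, pi/2].
        let theta = (pan[i].clamp(-1.0, 1.0) + 1.0) * FRAC_PI_4;
        left[i] = theta.cos();
        right[i] = theta.sin();
    }
    (left, right)
}
```

Because each lane's gains depend only on that lane's pan value, the four evaluations are independent and map naturally onto a vectorized sine and cosine.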

Feature Propagation

The simd feature propagates through crate dependencies:

bbx_plugin --simd--> bbx_dsp --simd--> bbx_core

Enable simd on bbx_plugin for plugin builds:

[dependencies]
bbx_plugin = { version = "...", features = ["simd"] }

Trade-offs

Aspect         Scalar         SIMD
Complexity     Simple         More complex
Toolchain      Stable Rust    Nightly required
Debugging      Easy           Harder
Performance    Baseline       Up to 4x faster

Implementation Notes

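  • Lane width is 4 for both f32 and f64. An f32x4 fits a single 128-bit SSE/NEON register; an f64x4 is 256 bits wide, so targets with only 128-bit registers handle it as two register operations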
  • Remainder samples (when the buffer size isn't divisible by 4) fall back to scalar processing
  • Noise waveforms use scalar processing because each random value depends on the RNG state left by the previous one
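
The sequential dependency behind the noise note can be seen in a small sketch. It uses xorshift32 purely as an illustrative generator; the RNG actually used by bbx is not described here, and `xorshift32` and `fill_noise` are hypothetical names.

```rust
/// One xorshift32 step. The next output depends on the state left by
/// the previous call, so samples cannot be produced independently.
/// The state must be non-zero.
fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

/// Fills a buffer with noise in [-1.0, 1.0], one sample at a time.
fn fill_noise(buf: &mut [f32], state: &mut u32) {
    for sample in buf.iter_mut() {
        let r = xorshift32(state);
        *sample = (r as f32 / u32::MAX as f32) * 2.0 - 1.0;
    }
}
```

Since sample n+1 needs the state produced while generating sample n, there is no set of four independent lanes to compute in parallel without changing the generated sequence.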