Rust ❤️ Bela – MIDI and Sound

After a brief interlude on 3D graphics in the browser, we're picking our series on using Rust with Bela back up. So far we have discussed setting up a Rust cross-compilation project for Bela, the projects I plan to cover and benchmark-based feasibility checks, and the improved safe Rust API. This time, we'll actually use those APIs to begin interpreting MIDI signals and making some noise! It won't quite be music yet, but we're getting there.

Ensuring Real-Time Safety

One thing we should probably go into in more detail first is the unsafe-ty of our main trait:

pub unsafe trait BelaApplication: Sized + Send {
    fn render(&mut self, context: &mut RenderContext);
}

As discussed last time, implementing this trait is unsafe, as the rendering thread has higher priority than all operating system threads. Therefore, any and all system calls must be avoided, including allocation and even printing. Checking this by hand is tedious and error-prone, so what can we do?

Enter the no_std crate attribute. no_std tells the compiler to link only libcore, a platform-agnostic subset of the std crate. It is useful when writing firmware, kernel, or bootloader code, and also for real-time code that shouldn't perform any system calls, since those calls simply aren't available (at least not via std). For more details, check out The Embedded Rust Book.

So can we just go ahead and add

#![no_std]

to the beginning of our lib.rs and/or main.rs? Well… yes and no. We can't really add it to our main.rs, since initializing MIDI requires a &std::ffi::CStr specifying the port, and &std::ffi::CStr isn't part of libcore (whether it should be is still under discussion). Furthermore, there is no real reason a Bela application would have to be completely no_std. It is a full-fledged Linux with RTOS additions, after all. Adding no_std to our lib.rs would work a bit better, but there is another issue: no_std doesn't affect dependencies recursively! So while this would ensure our own code is free of system calls, our dependencies could still perform system calls behind our back. Ugh.

The simplest way to check for no_std compatibility (I also tried cargo-nono, but ran into hobofan/cargo-nono#47) is to install support for a target that has no std support at all, such as thumbv6m-none-eabi, and run cargo check (or cargo build) for that target:

rustup target add thumbv6m-none-eabi
cargo check --target thumbv6m-none-eabi

However, since our library and binary crates live in the same package, they share dependencies (including the no_std-incompatible bela crate) and can't really be checked separately. What we can do is create a new package in a subfolder of our bela_3 project, by navigating to the bela_3 folder and calling

cargo new --lib tonewheel-organ

moving our implementation (the Bela-independent part) there, and then adding the new dependency by path in our main Cargo.toml:

[dependencies]
# ...
tonewheel-organ = { path = "tonewheel-organ" }

Since we want to run checks and tests on our subproject, let's also add the corresponding target folder and Cargo.lock to our .gitignore:

/tonewheel-organ/target
/tonewheel-organ/Cargo.lock

Now we can run cargo check --target thumbv6m-none-eabi within the tonewheel-organ subfolder to check for no_std compatibility.

One thing we do lose, though, is most f32/f64 math support. While many of those functions are expensive (such as f32::sin, for which we created a fast approximation), they are often very useful during development. Sadly, there is no built-in way to enable just math support. However, there are crates that implement the missing functions. One of them is the official libm crate, which has a different API unless used via num_traits' Float trait. Another is the smaller, less precise, but (allegedly) faster micromath and its F32Ext trait. So let's add it to our tonewheel-organ/Cargo.toml and introduce a std feature, so we can explicitly build with standard library math support (and CPU intrinsics, if available!) when not actually targeting a no_std system:

[dependencies]
# ...
micromath = "2"

[features]
std = []

Additionally, we have to conditionally enable no_std and import micromath::F32Ext in our tonewheel-organ/src/lib.rs (imports are per module, so submodules that use these math functions need the import as well):

#![cfg_attr(not(feature = "std"), no_std)]

#[cfg(not(feature = "std"))]
use micromath::F32Ext;
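micromath's F32Ext is an extension trait: it gives f32 methods such as sqrt that core doesn't provide. The std-only sketch below mimics that pattern with a hypothetical F32ExtSketch trait; its bit-trick-plus-Newton sqrt is purely illustrative and is not micromath's actual implementation. It also hints at why the cfg-gated import is enough: when std is linked, the inherent f32::sqrt takes priority in method resolution, so x.sqrt() keeps using the standard library version and the trait method has to be named explicitly.

```rust
/// Hypothetical stand-in for an extension trait like micromath's F32Ext.
trait F32ExtSketch {
    fn sqrt(self) -> f32;
}

impl F32ExtSketch for f32 {
    fn sqrt(self) -> f32 {
        if self.is_nan() || self < 0.0 {
            return f32::NAN;
        }
        if self == 0.0 || self.is_infinite() {
            return self;
        }
        // Rough first guess: halve the exponent by shifting the bit pattern.
        let mut guess = f32::from_bits((self.to_bits() >> 1) + 0x1fc0_0000);
        // Refine with Newton-Raphson steps: g <- (g + x / g) / 2.
        // Accurate for normal inputs; subnormals would need more steps.
        for _ in 0..4 {
            guess = 0.5 * (guess + self / guess);
        }
        guess
    }
}
```

With std available, `2.0f32.sqrt()` calls the inherent method, while `F32ExtSketch::sqrt(2.0f32)` reaches the trait's approximation.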

Now we can use math functions as usual and don't have to worry as much about accidentally performing any system calls (malicious code in dependencies is another issue altogether). To use standard library math in our application instead, we enable the std feature we just introduced from our main Cargo.toml:

[dependencies]
# ...
tonewheel-organ = { path = "tonewheel-organ", features = ["std"] }

There are other, more local approaches to solving parts of the issue that don't require splitting our code into two packages, such as rust-assert-no-alloc, but separating the platform-specific parts from the actual processing is a good idea anyway. We might want to turn this into a VST plugin using vst-rs at some point, for example.

To get the same generated code quality as before, I found it necessary to enable “fat” LTO (link-time optimization) in my main Cargo.toml:

[profile.release]
lto = true # equivalent to "fat"

Fat LTO is more expensive than the "thin" option at compile time, but some optimizations, such as inlining the tone generation into render, didn't happen with the cheaper option, even though render is the only caller.

MIDI Parsing

Now that we have taken care of most of the ways our signal processing code can potentially become real-time unsafe, let's finally start processing some MIDI and making some sound! The first thing we have to do is open a MIDI connection. Since this is done via the Bela API, we have to do it in our binary crate, specifically in the constructor function passed to Bela::new, as it requires a SetupContext. This looks something like this:

struct Bela3 {
    midi: Midi,
    organ: TonewheelOrgan, // defined in no_std crate
}

impl Bela3 {
    fn new(context: &mut SetupContext) -> Option<Bela3> {
        // create Midi connection; ok to convert Result to Option
        let midi = context.new_midi(cstr!("hw:0,0,0")).ok()?;
        // create no_std audio processor, passing Bela's sample rate
        let organ = TonewheelOrgan::new(context.audio_sample_rate());
        Some(Bela3 { midi, organ })
    }
}

We will implement TonewheelOrgan later on in our no_std crate, so let's continue with our MIDI setup first.

The cstr! macro from the cstr crate allows us to easily create a static, null-terminated C string literal. Where does that value "hw:0,0,0" come from? It is the ALSA port name of our MIDI device (yes, for MIDI, the Bela uses plain Linux ALSA; there's no special RTOS support there). In this case, it's the virtual device, as I haven't connected a real MIDI device to the Bela's USB host port yet. If you are using a real keyboard, it should probably be "hw:1,0,0", but check the output of amidi -l on the Bela console to make sure. In theory, we could probably automate this by enumerating valid ports using the alsa crate, but we'll skip that for now.
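What cstr! produces is an ordinary &'static CStr with a terminating NUL byte. For comparison, here's a std-only sketch of the same value built with CStr::from_bytes_with_nul, where midi_port is a hypothetical helper name; on Rust 1.77 and newer, the c"hw:0,0,0" literal syntax gives the same result without a macro or a runtime check.

```rust
use std::ffi::CStr;

/// Hypothetical helper: the ALSA port name as a NUL-terminated C string.
fn midi_port() -> &'static CStr {
    // The explicit trailing \0 is required: from_bytes_with_nul rejects
    // input that lacks a final NUL or contains an interior NUL.
    CStr::from_bytes_with_nul(b"hw:0,0,0\0").expect("literal is a valid C string")
}
```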

In the render function, we can then receive MIDI messages, which are one to three bytes long. We will have to parse these messages to identify what they mean (and we'll get some extra help to do so), but for now, we just want to forward them to our no_std processor:

unsafe impl BelaApplication for Bela3 {
    fn render(&mut self, context: &mut RenderContext) {
        let Bela3 { midi, organ } = self;

        // forward incoming midi messages to no_std subcrate
        let mut buffer = [0u8; 3];
        while let Some(msg) =
            context.get_midi_message(midi, &mut buffer)
        {
            organ.process_midi_message(msg);
        }

get_midi_message retrieves a single message from the midi object, using an externally provided three-byte buffer (note to self: maybe that buffer should be a [MaybeUninit<u8>; 3]). If no messages are available, None is returned. So to process all available MIDI messages, we loop with while let on Some(msg) until it returns None. Inside the loop, we just pass the resulting [u8] slice to our yet-to-be-defined processor. So let's get started, using a one-to-one mapping between tonewheels and MIDI notes for now. For MIDI parsing, we'll be using the wmidi crate. Since we want to keep the inner crate no_std-compatible, we have to make sure to disable wmidi's default features when adding it to our tonewheel-organ/Cargo.toml:

[dependencies]
# ...
wmidi = { version = "4", default-features = false }
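The get_midi_message loop above hands out each message as a slice of a caller-owned stack buffer, so draining the queue in render never allocates. The following std-only mock sketches that pattern; MockMidi and its get_message method are hypothetical stand-ins for the bela crate's Midi and RenderContext::get_midi_message, not their actual implementation.

```rust
/// Hypothetical stand-in for a MIDI input holding already-received messages.
struct MockMidi {
    pending: Vec<Vec<u8>>, // mock storage only; a real input reads from ALSA
    next: usize,
}

impl MockMidi {
    /// Copies the next message into `buffer` and returns the filled part,
    /// or None once nothing is pending. Longer messages get truncated.
    fn get_message<'b>(&mut self, buffer: &'b mut [u8; 3]) -> Option<&'b [u8]> {
        let msg = self.pending.get(self.next)?;
        self.next += 1;
        let len = msg.len().min(buffer.len());
        buffer[..len].copy_from_slice(&msg[..len]);
        Some(&buffer[..len])
    }
}

/// Drains every pending message through one reused buffer; returns the count.
fn drain(midi: &mut MockMidi, mut handle: impl FnMut(&[u8])) -> usize {
    let mut buffer = [0u8; 3];
    let mut count = 0;
    while let Some(msg) = midi.get_message(&mut buffer) {
        handle(msg);
        count += 1;
    }
    count
}
```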

Then the MIDI handling in tonewheel-organ/src/lib.rs becomes fairly simple:

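use core::{convert::TryFrom, mem::transmute};
use packed_simd::{u32x4, u8x4}; // assuming packed_simd's SIMD vector types
use wmidi::MidiMessage;

pub struct TonewheelOrgan {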
    phasors: [u32x4; ROUNDED_TONE_WHEEL_CHUNKS],
    phasor_steps: [u32x4; ROUNDED_TONE_WHEEL_CHUNKS],
    active_notes: [u8x4; ROUNDED_TONE_WHEEL_CHUNKS],
}

impl TonewheelOrgan {
    pub fn process_midi_message(&mut self, msg: &[u8]) {
        // we only care about active_notes here
        let TonewheelOrgan { active_notes, .. } = self;
        // cast u8x4 array to u8 array (can we do this without unsafe?)
        let active_notes_scalar: &mut [u8; 4
                 * ROUNDED_TONE_WHEEL_CHUNKS] =
            unsafe { transmute::<&mut _, _>(active_notes) };
        // use wmidi to parse msg
        match MidiMessage::try_from(msg) {
            Ok(MidiMessage::NoteOn(_channel, note, _velocity)) => {
                let note = note as u8;
                // 24 = C1 to 107 = B7 map directly to tonewheels
                if (24..108).contains(&note) {
                    active_notes_scalar[(note - 24) as usize] = !0;
                // skip dummy tonewheels for 108 = C8 to 114 = F#8
                } else if (108..115).contains(&note) {
                    active_notes_scalar[(note - 19) as usize] = !0;
                }
            }
            Ok(MidiMessage::NoteOff(_channel, note, _velocity)) => {
                let note = note as u8;
                // 24 = C1 to 107 = B7 map directly to tonewheels
                if (24..108).contains(&note) {
                    active_notes_scalar[(note - 24) as usize] = 0;
                // skip dummy tonewheels for 108 = C8 to 114 = F#8
                } else if (108..115).contains(&note) {
                    active_notes_scalar[(note - 19) as usize] = 0;
                }
            }
            Ok(MidiMessage::Reset) => {
                // deactivate all notes on reset
                *active_notes_scalar =
                    [0; 4 * ROUNDED_TONE_WHEEL_CHUNKS];
            }
            _ => {}
        }
    }
}

So on NoteOn, we set the corresponding entry of active_notes_scalar (just a scalar view into the vectorized active_notes) to !0, which means that all bits are set. On NoteOff, we set the entry to zero instead. Finally, on Reset, i.e., MIDI's panic button, we deactivate all active notes. For now, we'll completely ignore all other messages, as well as the channel and velocity parts of the NoteOn and NoteOff messages. There also appears to be some weird remapping between MIDI notes and active_notes going on here. That is related to how we generate sounds, and to how our reference tonewheel organ is actually constructed.
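wmidi does the byte-level decoding for us. For reference, here's a std-only sketch of decoding just the three messages we react to from raw bytes (the RawEvent enum and decode function are hypothetical names; running status is not handled). It also follows the MIDI 1.0 convention that a Note On with velocity 0 means Note Off, which many keyboards send on key release; whether wmidi normalizes that case isn't covered here, so it may be worth checking before relying on NoteOff alone.

```rust
/// Hypothetical decoded form of the messages the organ reacts to.
#[derive(Debug, PartialEq)]
enum RawEvent {
    NoteOn { channel: u8, note: u8, velocity: u8 },
    NoteOff { channel: u8, note: u8 },
    Reset,
}

/// Decodes one raw MIDI message; anything else yields None.
fn decode(msg: &[u8]) -> Option<RawEvent> {
    match *msg {
        // System Reset is the single status byte 0xFF.
        [0xFF] => Some(RawEvent::Reset),
        // Data bytes must have their top bit cleared (0..=127).
        [status, note, velocity] if note < 0x80 && velocity < 0x80 => {
            // Upper nibble: message type; lower nibble: channel 0..=15.
            let channel = status & 0x0F;
            match status & 0xF0 {
                0x80 => Some(RawEvent::NoteOff { channel, note }),
                // Note On with velocity 0 counts as Note Off by convention.
                0x90 if velocity == 0 => Some(RawEvent::NoteOff { channel, note }),
                0x90 => Some(RawEvent::NoteOn { channel, note, velocity }),
                _ => None,
            }
        }
        _ => None,
    }
}
```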

Simulating the Tonewheels

The following information is based on an old Electric Druid article. The pitches of a tonewheel organ's wheels don't exactly match the frequencies of an equal-tempered scale, since the irrational $2^{\frac{1}{12}} \approx 1.05946\ldots$ has to be approximated using a small set of gears, and gear ratios can only be rational. While the true harmonic ratios are rational, having nine sets of gears and tonewheels for each note would have been prohibitive. Plus, that wouldn't solve the irrational ratios between fundamentals. So within an octave, the following gear ratios are used to approximate an equal-tempered scale:

Note    Driving (A)    Driven (B)    Ratio (A/B)
C       85             104           0.817307692
C♯      71             82            0.865853659
D       67             73            0.917808219
D♯      105            108           0.972222222
E       103            100           1.030000000
F       84             77            1.090909091
F♯      74             64            1.156250000
G       98             80            1.225000000
G♯      96             74            1.297297297
A       88             64            1.375000000
A♯      67             46            1.456521739
B       108            70            1.542857143
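To get a feel for how close these gear ratios come to equal temperament, here's a small sketch comparing the resulting fourth-octave frequencies against A4 = 440 Hz. It assumes the 20 Hz motor speed and the 16-tooth fourth-octave wheels used later in this post; the helper names are hypothetical. With these numbers, A lands on exactly 440 Hz and every other note stays within one cent of its equal-tempered pitch.

```rust
/// Gear ratios (driving, driven) from the table above, C through B.
const GEARS: [(f64, f64); 12] = [
    (85.0, 104.0), (71.0, 82.0), (67.0, 73.0), (105.0, 108.0),
    (103.0, 100.0), (84.0, 77.0), (74.0, 64.0), (98.0, 80.0),
    (96.0, 74.0), (88.0, 64.0), (67.0, 46.0), (108.0, 70.0),
];

const MOTOR_HZ: f64 = 20.0; // 1200 RPM
const TEETH_OCTAVE_4: f64 = 16.0;

/// Tonewheel frequency for semitone 0..12 (C..B) of the fourth octave.
fn wheel_frequency(semitone: usize) -> f64 {
    let (driving, driven) = GEARS[semitone];
    MOTOR_HZ * driving / driven * TEETH_OCTAVE_4
}

/// Equal-tempered frequency of the same note, with A4 (semitone 9) at 440 Hz.
fn equal_tempered(semitone: usize) -> f64 {
    440.0 * 2f64.powf((semitone as f64 - 9.0) / 12.0)
}

/// Deviation in cents (hundredths of an equal-tempered semitone).
fn cents(actual: f64, reference: f64) -> f64 {
    1200.0 * (actual / reference).log2()
}
```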

The octaves are then generated by varying the number of teeth on the tonewheels:

Octave    Tonewheels    Teeth
1         12            2
2         12            4
3         12            8
4         12            16
5         12            32
6         12            64
7         12            128
8         7             192

Wait a minute. There is something wrong with that table. The first seven octaves have powers of two for the number of teeth (which makes sense, since octaves are a factor of two apart), but the eighth octave is only a factor of 1.5 away from the seventh octave, which corresponds to a fifth, not an octave.

Well, the reason is simple: reliably manufacturing tonewheels with 256 teeth, precisely shaped to create a near-sinusoidal tone, wasn't viable at the time. So instead, 192-tooth wheels were used and corrected by shifting them up by a fourth (complementing the fifth to get an octave). So the C8 wheel was actually connected to the F gear, and so on up to the F♯8 wheel, which was connected to the B gear.
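Working through the numbers from the two tables above: a just fifth followed by a just fourth is exactly an octave, and with the actual gears, C8 comes out just slightly above twice C7 (about 1.85 cents sharp):

$$\frac{192}{128} \cdot \frac{4}{3} = \frac{3}{2} \cdot \frac{4}{3} = 2$$

$$\frac{f_{\mathrm{C8}}}{f_{\mathrm{C7}}} = \frac{192 \cdot \frac{84}{77}}{128 \cdot \frac{85}{104}} = \frac{13104}{6545} \approx 2.00214$$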

But why are we simulating the five missing wheels? The tonewheels were paired in “bins,” four octaves apart. While the bins were fairly well separated electromagnetically, there was crosstalk within each bin. Since we want to simulate effects like that, and want to avoid excess conditionals (especially if they straddle SIMD vector boundaries, as is the case here), it just makes sense to simulate the unused wheels, even if we don't use their output. Luckily, each octave has 12 notes, which is neatly divisible by four (our vector width). As a matter of fact, the real organ used dummy wheels as well to get similar mechanical loads.
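This layout is what the remapping in process_midi_message encodes. As a standalone sketch (wheel_index is a hypothetical helper, not part of the code above), MIDI notes 24 (C1) through 107 (B7) map straight onto wheel indices 0 to 83, indices 84 to 88 are the five dummy wheels, and notes 108 (C8) through 114 (F♯8) land on indices 89 to 95:

```rust
/// Total number of simulated wheels: 8 octaves of 12, including 5 dummies.
const TONE_WHEELS: usize = 96;

/// Hypothetical helper: the wheel index for a MIDI note, if the organ has one.
fn wheel_index(note: u8) -> Option<usize> {
    match note {
        // C1..=B7 map one-to-one onto the first seven octaves of wheels.
        24..=107 => Some((note - 24) as usize),
        // C8..=F#8 skip the five dummy wheels at indices 84..=88.
        108..=114 => Some((note - 19) as usize),
        _ => None,
    }
}
```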

Let's have a look at the tonewheel signal generation code (part of impl TonewheelOrgan):

    fn generate_base_signals(
        &mut self,
        signals: &mut [MaybeUninit<f32x4>; ROUNDED_TONE_WHEEL_CHUNKS],
    ) {
        let Self {
            phasors,
            phasor_steps,
            ..
        } = self;
        // bins pair the lower four octaves (a) with the upper four (b),
        // so each wheel in a is mixed with the wheel 48 positions higher
        let mid = ROUNDED_TONE_WHEEL_CHUNKS / 2;
        let (phasors_a, phasors_b) = phasors.split_at_mut(mid);
        let (phasor_steps_a, phasor_steps_b) =
            phasor_steps.split_at(mid);
        let (signals_a, signals_b) = signals.split_at_mut(mid);

        for (
            index,
            (
                phasors_a,
                phasor_steps_a,
                signals_a,
                phasors_b,
                phasor_steps_b,
                signals_b,
            ),
        ) in itertools::izip!(
            phasors_a.iter_mut(),
            phasor_steps_a.iter(),
            signals_a.iter_mut(),
            phasors_b.iter_mut(),
            phasor_steps_b.iter(),
            signals_b.iter_mut()
        )
        .enumerate()
        {
            let a = {
                let signals = sin_quadrant_cubic_x4(*phasors_a);
                *phasors_a += *phasor_steps_a;
                // simple "complex tone wheel" simulation (bottom
                // octave is notched for additional harmonics)
                if index < CHUNKS_PER_OCTAVE {
                    let s1 = signals;
                    let s2 = s1 * s1;
                    let s4 = s2 * s2;
                    let s6 = s4 * s2;
                    let s7 = s6 * s1;
                    s7 - 0.25 * s6 + 0.0765625
                } else {
                    signals
                }
            };

            let b = sin_quadrant_cubic_x4(*phasors_b);
            *phasors_b += *phasor_steps_b;

            signals_a.write(a + CROSSTALK_GAIN * b);
            signals_b.write(b + CROSSTALK_GAIN * a);
        }
    }

At the beginning, we split everything into a bottom and a top half, i.e., the lower four octaves and the upper four octaves. Then, we iterate over both in lockstep using itertools' izip! macro, which creates an iterator over a flat tuple instead of the nested pairs you'd get when chaining regular zip calls. Two sets of four tonewheel signals, a and b, are generated using our approximate sin_quadrant_cubic_x4 and then written out to signals (via signals_a and signals_b) with a small amount of crosstalk applied. CROSSTALK_GAIN is currently a guesstimated 1e-3, i.e., $20 \log_{10} 10^{-3} = -60\,\mathrm{dBFS}$, which is just barely noticeable.
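To make the lockstep pairing easier to follow without packed_simd, here's a scalar sketch of the same structure over 96 individual wheels. It is an illustration under a few assumptions rather than a drop-in replacement for the SIMD version: std's f32::sin stands in for sin_quadrant_cubic_x4, the complex-tonewheel shaping is omitted, and all names are hypothetical. Wheel i in the lower four octaves shares a bin with wheel i + 48 in the upper four.

```rust
use std::f32::consts::TAU;

/// 8 octaves of 12 wheels, paired into bins four octaves (48 wheels) apart.
const WHEELS: usize = 96;
const HALF: usize = WHEELS / 2;
const CROSSTALK_GAIN: f32 = 1e-3; // the guesstimate from the text, about -60 dB

/// Maps a full-range u32 phasor onto one period of a sine.
fn phasor_to_sin(phasor: u32) -> f32 {
    (phasor as f32 / 4_294_967_296.0 * TAU).sin()
}

/// Advances every phasor and mixes each wheel with its bin partner.
fn generate_base_signals_scalar(
    phasors: &mut [u32; WHEELS],
    steps: &[u32; WHEELS],
    signals: &mut [f32; WHEELS],
) {
    let (phasors_a, phasors_b) = phasors.split_at_mut(HALF);
    let (steps_a, steps_b) = steps.split_at(HALF);
    let (signals_a, signals_b) = signals.split_at_mut(HALF);
    for i in 0..HALF {
        let a = phasor_to_sin(phasors_a[i]);
        let b = phasor_to_sin(phasors_b[i]);
        // Wrapping addition makes each phasor roll over once per period.
        phasors_a[i] = phasors_a[i].wrapping_add(steps_a[i]);
        phasors_b[i] = phasors_b[i].wrapping_add(steps_b[i]);
        signals_a[i] = a + CROSSTALK_GAIN * b;
        signals_b[i] = b + CROSSTALK_GAIN * a;
    }
}
```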

But wait, there's something odd happening in the a branch. The topmost octave isn't the only unusual one. The first octave was only used for the foot pedals and used sharp-edged “complex” tonewheels that create a non-sinusoidal tone. Since I couldn't find detailed information on the waveform's shape, I just went with a simple polynomial distortion term

$$\sin^7 x - 0.25 \sin^6 x + 0.0765625.$$

I did find a photo of a complex tonewheel's waveform here, along with a bunch of detailed shots of the tonewheels and gears, but decided not to try modeling it too closely, since it is from a slightly different model, no information about the pedal mute setting is provided, etc.

The conditional here should be taken care of by the optimizer by splitting the loop (it was for me). Another neat feature of using a polynomial shaper is that we know exactly what the highest generated frequency will be (the base frequency times the highest exponent), and that it is low enough not to cause any aliasing. The constant term is just there to counter the DC offset.
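Both claims are easy to check for an exact sine input using the standard power-reduction identities (the shaper here is fed by the sin_quadrant_cubic_x4 approximation, so the real signal will deviate slightly):

$$\sin^7 x = \frac{35 \sin x - 21 \sin 3x + 7 \sin 5x - \sin 7x}{64}, \qquad \sin^6 x = \frac{10 - 15 \cos 2x + 6 \cos 4x - \cos 6x}{32}$$

So the shaped signal contains nothing above the 7th harmonic. For B1, the highest wheel in the bottom octave at $20 \cdot \frac{108}{70} \cdot 2 \approx 61.7$ Hz, that is roughly 432 Hz, far below the 22.05 kHz Nyquist frequency at Bela's 44.1 kHz sample rate. The only constant term comes from $\sin^6 x$:

$$\overline{\sin^7 x - 0.25 \sin^6 x} = -0.25 \cdot \frac{10}{32} = -0.078125,$$

which the constant 0.0765625 almost entirely cancels.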

Now we just need to compute the phasor_steps from our gear ratios and connect everything. For the steps, I used

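// the motor's run speed: 20 revolutions per second (1200 RPM), so
// despite the name, this value is a rotation frequency in Hz
const MOTOR_RPM_RUN: f32 = 20.0;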
// `const`s are inlined while `static`s have a fixed memory location
static FREQUENCY_MULTIPLIERS: [f32x4; ROUNDED_TONE_WHEEL_CHUNKS] = [
    // most octaves use the base gearings and an exponentially
    // increasing number of teeth
    // octave 1 -> 2 teeth
    f32x4::new(
        85.0 / 104.0 * 2.0,
        71.0 / 82.0 * 2.0,
        67.0 / 73.0 * 2.0,
        105.0 / 108.0 * 2.0,
    ),
    f32x4::new(
        103.0 / 100.0 * 2.0,
        84.0 / 77.0 * 2.0,
        74.0 / 64.0 * 2.0,
        98.0 / 80.0 * 2.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 2.0,
        88.0 / 64.0 * 2.0,
        67.0 / 46.0 * 2.0,
        108.0 / 70.0 * 2.0,
    ),
    // octave 2 -> 4 teeth
    f32x4::new(
        85.0 / 104.0 * 4.0,
        71.0 / 82.0 * 4.0,
        67.0 / 73.0 * 4.0,
        105.0 / 108.0 * 4.0,
    ),
    f32x4::new(
        103.0 / 100.0 * 4.0,
        84.0 / 77.0 * 4.0,
        74.0 / 64.0 * 4.0,
        98.0 / 80.0 * 4.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 4.0,
        88.0 / 64.0 * 4.0,
        67.0 / 46.0 * 4.0,
        108.0 / 70.0 * 4.0,
    ),
    // octave 3 -> 8 teeth
    f32x4::new(
        85.0 / 104.0 * 8.0,
        71.0 / 82.0 * 8.0,
        67.0 / 73.0 * 8.0,
        105.0 / 108.0 * 8.0,
    ),
    f32x4::new(
        103.0 / 100.0 * 8.0,
        84.0 / 77.0 * 8.0,
        74.0 / 64.0 * 8.0,
        98.0 / 80.0 * 8.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 8.0,
        88.0 / 64.0 * 8.0,
        67.0 / 46.0 * 8.0,
        108.0 / 70.0 * 8.0,
    ),
    // octave 4 -> 16 teeth
    f32x4::new(
        85.0 / 104.0 * 16.0,
        71.0 / 82.0 * 16.0,
        67.0 / 73.0 * 16.0,
        105.0 / 108.0 * 16.0,
    ),
    f32x4::new(
        103.0 / 100.0 * 16.0,
        84.0 / 77.0 * 16.0,
        74.0 / 64.0 * 16.0,
        98.0 / 80.0 * 16.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 16.0,
        88.0 / 64.0 * 16.0,
        67.0 / 46.0 * 16.0,
        108.0 / 70.0 * 16.0,
    ),
    // octave 5 -> 32 teeth
    f32x4::new(
        85.0 / 104.0 * 32.0,
        71.0 / 82.0 * 32.0,
        67.0 / 73.0 * 32.0,
        105.0 / 108.0 * 32.0,
    ),
    f32x4::new(
        103.0 / 100.0 * 32.0,
        84.0 / 77.0 * 32.0,
        74.0 / 64.0 * 32.0,
        98.0 / 80.0 * 32.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 32.0,
        88.0 / 64.0 * 32.0,
        67.0 / 46.0 * 32.0,
        108.0 / 70.0 * 32.0,
    ),
    // octave 6 -> 64 teeth
    f32x4::new(
        85.0 / 104.0 * 64.0,
        71.0 / 82.0 * 64.0,
        67.0 / 73.0 * 64.0,
        105.0 / 108.0 * 64.0,
    ),
    f32x4::new(
        103.0 / 100.0 * 64.0,
        84.0 / 77.0 * 64.0,
        74.0 / 64.0 * 64.0,
        98.0 / 80.0 * 64.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 64.0,
        88.0 / 64.0 * 64.0,
        67.0 / 46.0 * 64.0,
        108.0 / 70.0 * 64.0,
    ),
    // octave 7 -> 128 teeth
    f32x4::new(
        85.0 / 104.0 * 128.0,
        71.0 / 82.0 * 128.0,
        67.0 / 73.0 * 128.0,
        105.0 / 108.0 * 128.0,
    ),
    f32x4::new(
        103.0 / 100.0 * 128.0,
        84.0 / 77.0 * 128.0,
        74.0 / 64.0 * 128.0,
        98.0 / 80.0 * 128.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 128.0,
        88.0 / 64.0 * 128.0,
        67.0 / 46.0 * 128.0,
        108.0 / 70.0 * 128.0,
    ),
    // the final octave only has 192-tooth tone wheels and 5 dummy
    // wheels
    // octave 8 -> 192(!) teeth
    f32x4::splat(0.0), // 4 dummy wheels
    f32x4::new(
        0.0, // final 5th dummy wheel
        84.0 / 77.0 * 192.0,
        74.0 / 64.0 * 192.0,
        98.0 / 80.0 * 192.0,
    ),
    f32x4::new(
        96.0 / 74.0 * 192.0,
        88.0 / 64.0 * 192.0,
        67.0 / 46.0 * 192.0,
        108.0 / 70.0 * 192.0,
    ),
];

#[derive(Clone, Copy)]
#[repr(transparent)]
struct FrequencyPhasorConversionFactor(f32);

impl FrequencyPhasorConversionFactor {
    fn new(sample_rate: f32) -> FrequencyPhasorConversionFactor {
        FrequencyPhasorConversionFactor(
            (1u64 << u32::BITS) as f32 / sample_rate,
        )
    }

    fn to_step_x4(self, frequencies: f32x4) -> u32x4 {
        u32x4::from_cast(f32x4::splat(self.0) * frequencies)
    }
}

impl TonewheelOrgan {
    /* ... */

    pub fn new(sample_rate: f32) -> Self {
        let frequency_multipliers = &FREQUENCY_MULTIPLIERS;
        let phasors = [u32x4::splat(0); ROUNDED_TONE_WHEEL_CHUNKS];
        let conversion =
            FrequencyPhasorConversionFactor::new(sample_rate);
        let mut phasor_steps =
            [u32x4::splat(0); ROUNDED_TONE_WHEEL_CHUNKS];
        for chunk in 0..ROUNDED_TONE_WHEEL_CHUNKS {
            let frequencies =
                MOTOR_RPM_RUN * frequency_multipliers[chunk];
            phasor_steps[chunk] = conversion.to_step_x4(frequencies);
        }
        let active_notes = [u8x4::splat(0); ROUNDED_TONE_WHEEL_CHUNKS];
        TonewheelOrgan {
            phasors,
            phasor_steps,
            active_notes,
        }
    }

Sorry for the wall of code, but most of it is just a bunch of constants taken from the tables above, which are then multiplied by the motor's rotation frequency of 20 Hz (1200 RPM). In the future, we'll want to vary the motor speed as well, but first, let's see whether we even have the compute power to spare. After all the constants, I defined a helper to convert floating-point frequencies to fixed-point phasor steps. Then new is just a matter of multiplying it all together and initializing the rest to zero.
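Instead of writing FREQUENCY_MULTIPLIERS out by hand, the same table could also be generated from the two tables above. The following scalar sketch does that for all 96 wheels and then applies the same 2^32 / sample_rate conversion as FrequencyPhasorConversionFactor. The function names, the f64 intermediate math, and the plain [u32; 96] output (instead of packed_simd vectors) are assumptions of this sketch.

```rust
/// Gear ratios (driving, driven) for C through B, from the gear table.
const GEARS: [(f64, f64); 12] = [
    (85.0, 104.0), (71.0, 82.0), (67.0, 73.0), (105.0, 108.0),
    (103.0, 100.0), (84.0, 77.0), (74.0, 64.0), (98.0, 80.0),
    (96.0, 74.0), (88.0, 64.0), (67.0, 46.0), (108.0, 70.0),
];
const MOTOR_HZ: f64 = 20.0; // 1200 RPM
const WHEELS: usize = 96;
const DUMMIES: usize = 5; // wheel indices 84..=88

/// Tonewheel frequencies in Hz; dummy wheels stay at 0 Hz.
fn wheel_frequencies() -> [f64; WHEELS] {
    let mut frequencies = [0.0; WHEELS];
    // Octaves 1 to 7: the teeth count doubles each octave (2, 4, ..., 128).
    for octave in 0..7 {
        let teeth = (2u32 << octave) as f64;
        for (note, &(driving, driven)) in GEARS.iter().enumerate() {
            frequencies[octave * 12 + note] = MOTOR_HZ * driving / driven * teeth;
        }
    }
    // Octave 8: 192-tooth wheels on the F..B gears (shifted up a fourth).
    for (i, &(driving, driven)) in GEARS[5..].iter().enumerate() {
        frequencies[84 + DUMMIES + i] = MOTOR_HZ * driving / driven * 192.0;
    }
    frequencies
}

/// Fixed-point phasor increments: one full u32 turn per period.
fn phasor_steps(sample_rate: f64) -> [u32; WHEELS] {
    let factor = (1u64 << 32) as f64 / sample_rate;
    let frequencies = wheel_frequencies();
    let mut steps = [0u32; WHEELS];
    for (step, &frequency) in steps.iter_mut().zip(frequencies.iter()) {
        // `as` saturates on overflow; every frequency here is below sample_rate.
        *step = (frequency * factor) as u32;
    }
    steps
}
```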

Now we just need to sum up the active signals and pass the result on to the Bela. Within impl TonewheelOrgan, we add

    pub fn render_sample(&mut self) -> f32 {
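        // nightly-only: needs #![feature(maybe_uninit_uninit_array,
        // maybe_uninit_array_assume_init)] at the crate root
        let mut signals = MaybeUninit::uninit_array();
        self.generate_base_signals(&mut signals);
        // SAFETY: generate_base_signals writes every element of `signals`
        let signals =
            unsafe { MaybeUninit::array_assume_init(signals) };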

        // signal summation
        let mut signal_x4 = f32x4::splat(0.0);
        for chunk in 0..ROUNDED_TONE_WHEEL_CHUNKS {
            signal_x4 += f32x4::from_bits(
                i32x4::from_cast(i8x4::from_cast(
                    self.active_notes[chunk],
                )) & i32x4::from_bits(signals[chunk]),
            );
        }
        let signal = 0.05 * signal_x4.sum();

        // soft clipping
        let signal = signal.max(-1.0).min(1.0);
        let signal = 1.5 * signal - 0.5 * signal.powi(3);

        signal
    }

which simply calls the previously defined generate_base_signals, determines the active signals by applying active_notes as a bit mask, and sums the signals. Summation is done in two steps for performance: first, four separate sums are computed, one per SIMD lane, and then the final horizontal sum is computed only once (and scaled, because we don't want to add up multiple full-scale signals). The soft clipping code at the end is just there to keep it from sounding too bad if our total signal turns out a bit hot anyway.
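The bit-mask trick is easier to see on a single lane. In the following scalar sketch (all names hypothetical), an active_notes byte of !0 (255) becomes an all-ones 32-bit mask via the same u8 → i8 → i32 sign extension the SIMD casts perform. ANDing a float's bits with that mask either keeps the sample or turns it into 0.0, and four lane accumulators stand in for f32x4 before the single horizontal sum. The soft clipper is the same clamp-plus-cubic as in render_sample.

```rust
/// Sign-extends an active_notes byte (0 or 255) into a 32-bit mask,
/// like the u8x4 -> i8x4 -> i32x4 casts in render_sample.
fn byte_to_mask(active: u8) -> u32 {
    active as i8 as i32 as u32
}

/// Keeps `signal` if the mask is all ones, yields +0.0 if it is all zeros.
fn masked(signal: f32, active: u8) -> f32 {
    f32::from_bits(signal.to_bits() & byte_to_mask(active))
}

/// Sums the active signals with four lane accumulators and one final
/// horizontal sum, mirroring the f32x4 summation.
fn sum_active(signals: &[f32], active: &[u8]) -> f32 {
    let mut lanes = [0.0f32; 4];
    for (i, (&signal, &flag)) in signals.iter().zip(active).enumerate() {
        lanes[i % 4] += masked(signal, flag);
    }
    lanes.iter().sum()
}

/// Clamp to [-1, 1], then apply 1.5x - 0.5x^3, which reaches +/-1
/// with zero slope at the clamp points.
fn soft_clip(signal: f32) -> f32 {
    let x = signal.max(-1.0).min(1.0);
    1.5 * x - 0.5 * x * x * x
}
```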

The remainder of our render function (see above) is now just a matter of copying the output to all of the Bela's outputs:

        let audio_out_channels = context.audio_out_channels();
        for frame in
            context.audio_out().chunks_exact_mut(audio_out_channels)
        {
            // generate sample using no_std subcrate
            let signal = organ.render_sample();

            // write generated sample to all audio outputs
            for sample in frame {
                *sample = signal;
            }
        }
    }
}
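The chunks_exact_mut call assumes the output buffer is interleaved, i.e., each frame holds one sample per channel, back to back. A std-only sketch of that loop, with fill_all_channels as a hypothetical name and a closure standing in for organ.render_sample:

```rust
/// Writes one generated sample per frame to every channel of an
/// interleaved buffer laid out as [ch0, ch1, ..., ch0, ch1, ...].
/// Panics if `channels` is 0; a trailing partial frame is left untouched.
fn fill_all_channels(out: &mut [f32], channels: usize, mut next_sample: impl FnMut() -> f32) {
    for frame in out.chunks_exact_mut(channels) {
        let signal = next_sample();
        for sample in frame {
            *sample = signal;
        }
    }
}
```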

And that's pretty much it for this time! As usual, feel free to follow me and send me a DM on Mastodon if you have any questions or comments. I'm also active on the Bela forum.

But I don't want to leave you hanging, wondering what this very, very basic version of our organ sounds like, so here goes:

Sorry for the mediocre quality: I couldn't find any ⅛" jack to XLR adapters (or ⅛" to ¼" jack adapters either), so I had to record this with my mainboard's built-in sound card, which buzzes like crazy. First, I play a scale on the complex tonewheels, then an arpeggio for every octave. Some of them even sound kinda organ-ish. Due to the free-running oscillators, it also clicks a lot, but that's fine and to be expected. The real thing also had to apply some tricks to control clicking.