Rust ❤️ Bela – MIDI and Sound
After a brief interlude on 3D graphics in the browser, we're picking our series on using Rust with Bela back up. So far we have discussed setting up a Rust cross compilation project for Bela, the projects I plan to cover and benchmark-based feasibility checks, and the improved safe Rust API. This time, we'll actually use those APIs to begin interpreting MIDI signals and making some noise! It won't quite be music yet, but we're getting there.
Ensuring Real-Time Safety
One thing we should probably cover in more detail first is the unsafe
-ty of our main trait:
pub unsafe trait BelaApplication: Sized + Send {
fn render(&mut self, context: &mut RenderContext);
}
As discussed last time, implementing this trait is unsafe, as the rendering thread has higher priority than all operating system threads. Therefore, any and all system calls must be avoided, including allocation and even printing. Checking this by hand is tedious and error-prone, so what can we do?
Enter the no_std
crate attribute. no_std
is an attribute that tells Rust to only link libcore
, a platform-agnostic subset of the std
-crate, and is useful when writing firmware, kernel, or bootloader code. Or real-time code that shouldn't perform any system calls, since those calls aren't available (at least not via std
). For more details, check out The Embedded Rust Book.
So can we just go ahead and add
#![no_std]
to the beginning of our lib.rs
and/or main.rs
? Well… yes and no. We can't really add it to our main.rs
, since initializing MIDI requires a &std::ffi::CStr
specifying the port. And &std::ffi::CStr
isn't part of libcore
(whether it should be is still under discussion). Furthermore, there is no real reason a Bela application would have to be completely no_std
. It is a full-fledged Linux with RTOS additions after all. Adding no_std
to our lib.rs
would work a bit better, but there is another issue: no_std
doesn't affect dependencies recursively! So while this would ensure our code is system-call-free, our dependencies could perform system calls behind our back. Ugh.
The simplest way to check for no_std
-compatibility (I also tried cargo-nono
, but ran into hobofan/cargo-nono#47
) is to install support for a target that doesn't have std
-support at all, such as thumbv6m-none-eabi
and run cargo check
(or cargo build
) for that target:
rustup target add thumbv6m-none-eabi
cargo check --target thumbv6m-none-eabi
However, since our library and binary crates live in the same package, they share dependencies — including the no_std
-incompatible bela
-crate — and can't really be checked separately. What we can do, is create a new package in a sub-folder of our bela_3
, by navigating to our bela_3
-folder and calling
cargo new --lib tonewheel-organ
moving our implementation (the Bela-independent part) there, and then adding the new dependency by path in our Cargo.toml
:
[dependencies]
# ...
tonewheel-organ = { path = "tonewheel-organ" }
Since we want to run checks and tests on our subproject, let's also add the corresponding target
-folder and Cargo.lock
to our .gitignore
:
/tonewheel-organ/target
/tonewheel-organ/Cargo.lock
Now we can run cargo check --target thumbv6m-none-eabi
within the subfolder tonewheel-organ
to check for no_std
-compatibility.
One additional thing that gets lost though, is most f32
/f64
math support. While many of those functions are expensive, such as f32::sin
for which we created a fast approximation, they are often very useful while developing. Sadly, there is no built-in way to enable just math support. However, there are crates that implement the missing functions. One of them is the official libm
crate, which has a different API unless used via num-traits
's Float
trait. Another is the smaller, less precise, but (allegedly) faster micromath
and its F32Ext
trait. So let's add micromath to our tonewheel-organ/Cargo.toml
, and introduce a std
feature, so we can explicitly build with standard library math (and CPU intrinsics if available!) support when not actually targeting a no_std
system:
[dependencies]
# ...
micromath = "2"
[features]
std = []
Additionally, we have to conditionally enable no_std
and use micromath::F32Ext
in our tonewheel-organ/lib.rs
(the latter may also be necessary in submodules):
#![cfg_attr(not(feature = "std"), no_std)]
#[cfg(not(feature = "std"))]
use micromath::F32Ext;
Now we can use math functions as usual and don't have to worry as much about accidentally performing any system calls (malicious code in dependencies is another issue altogether). To do so in our application, we need to specify that we want to use the std
feature we introduced in the main Cargo.toml
:
[dependencies]
# ...
tonewheel-organ = { path = "tonewheel-organ", features = ["std"] }
There are other, more local approaches to solving parts of the issue that don't require separating our code into two packages, such as rust-assert-no-alloc
, but separating the platform-specific parts from the actual processing is a good idea anyway. We might want to turn this into a VST plugin using vst-rs
as well at some point, for example.
To get the same generated code quality as before, I found it necessary to enable “fat” LTO (link-time optimization) in my main Cargo.toml
:
[profile.release]
lto = true # equivalent to "fat"
More expensive than the "thin"
option at compile time, but some optimizations, such as inlining of tone generation into render
, didn't happen with the cheaper option, even though render
is the only caller.
MIDI Parsing
Now that we have taken care of a majority of the ways our signal processing code can potentially become real-time unsafe, let's finally start processing some MIDI and making some sound! The first thing we'll have to do, is open a MIDI connection. Since this is done via the Bela API, we have to do this in our binary crate. Specifically, in the constructor function passed to Bela::new
, as it requires a SetupContext
. This looks something like this:
struct Bela3 {
midi: Midi,
organ: TonewheelOrgan, // defined in no_std crate
}
impl Bela3 {
fn new(context: &mut SetupContext) -> Option<Bela3> {
// create Midi connection; ok to convert Result to Option
let midi = context.new_midi(cstr!("hw:0,0,0")).ok()?;
// create no_std audio processor, passing Bela's sample rate
let organ = TonewheelOrgan::new(context.audio_sample_rate());
Some(Bela3 { midi, organ })
}
}
We will implement TonewheelOrgan
later on in our no_std
-crate, so let's continue with our MIDI setup first.
The cstr!
-macro from the cstr
-crate allows us to easily create a static, null-terminated C-string literal. Where does that value "hw:0,0,0"
come from? That is the ALSA port name of our MIDI device — yes, for MIDI the Bela uses plain Linux ALSA, no special RTOS support there. In this case, the virtual device, as I didn't connect a real MIDI device to the Bela's USB host port yet. If you are using a real keyboard, it should probably be "hw:1,0,0"
, but check the output of amidi -l
on the Bela console to make sure. In theory, we could probably automate this by enumerating valid ports using the alsa
-crate, but we'll skip that for now.
In the render
-function, we can then receive MIDI messages, which are one- to three-byte messages. We will have to parse these messages to identify what they mean — and we'll get some extra help to do so — but for now, we just want to forward them to our no_std
processor:
unsafe impl BelaApplication for Bela3 {
fn render(&mut self, context: &mut RenderContext) {
let Bela3 { midi, organ } = self;
// forward incoming midi messages to no_std subcrate
let mut buffer = [0u8; 3];
while let Some(msg) =
context.get_midi_message(midi, &mut buffer)
{
organ.process_midi_message(msg);
}
get_midi_message
retrieves a single message from the midi
object, using an externally provided three-byte buffer — note to self, maybe that buffer should be a [MaybeUninit<u8>; 3]
. If no messages are available, None
is returned. So to process all available MIDI messages, we loop until it returns None
by using a while let
-match on Some(msg)
. Inside the loop, we just pass the resulting [u8]
slice to our yet-to-be-defined processor. So let's get started, using a one-to-one mapping between tone wheels and MIDI notes for now. For MIDI parsing we'll be using the wmidi
-crate. Since we want to keep the inner crate no_std
-compatible, we have to make sure to disable wmidi
's default features when adding it to our tonewheel-organ/Cargo.toml
:
[dependencies]
# ...
wmidi = { version = "4", default-features = false }
Then the MIDI handling in tonewheel-organ/lib.rs
becomes fairly simple:
pub struct TonewheelOrgan {
phasors: [u32x4; ROUNDED_TONE_WHEEL_CHUNKS],
phasor_steps: [u32x4; ROUNDED_TONE_WHEEL_CHUNKS],
active_notes: [u8x4; ROUNDED_TONE_WHEEL_CHUNKS],
}
impl TonewheelOrgan {
pub fn process_midi_message(&mut self, msg: &[u8]) {
// we only care about active_notes here
let TonewheelOrgan { active_notes, .. } = self;
// cast u8x4 array to u8 array (can we do this without unsafe?)
let active_notes_scalar: &mut [u8; 4
* ROUNDED_TONE_WHEEL_CHUNKS] =
unsafe { transmute::<&mut _, _>(active_notes) };
// use wmidi to parse msg
match MidiMessage::try_from(msg) {
Ok(MidiMessage::NoteOn(_channel, note, _velocity)) => {
let note = note as u8;
// 24 = C1 to 107 = B7 map directly to tonewheels
if (24..108).contains(&note) {
active_notes_scalar[(note - 24) as usize] = !0;
// skip dummy tonewheels for 108 = C8 to 114 = F#8
} else if (108..115).contains(&note) {
active_notes_scalar[(note - 19) as usize] = !0;
}
}
Ok(MidiMessage::NoteOff(_channel, note, _velocity)) => {
let note = note as u8;
// 24 = C1 to 107 = B7 map directly to tonewheels
if (24..108).contains(&note) {
active_notes_scalar[(note - 24) as usize] = 0;
// skip dummy tonewheels for 108 = C8 to 114 = F#8
} else if (108..115).contains(&note) {
active_notes_scalar[(note - 19) as usize] = 0;
}
}
Ok(MidiMessage::Reset) => {
// deactivate all notes on reset
*active_notes_scalar =
[0; 4 * ROUNDED_TONE_WHEEL_CHUNKS];
}
_ => {}
}
}
}
So on NoteOn
, we set the corresponding entry of active_notes_scalar
, which is just a scalar view into the vectorized active_notes
, to !0
, which just means that all bits are set. On NoteOff
we set the entry to zero instead. Finally, on Reset
, i.e., MIDI's panic button, we deactivate all active notes. For now, we'll completely ignore all other messages as well as the channel
and velocity
parts of the NoteOn
and NoteOff
messages. There also appears to be some weird remapping between MIDI notes and active_notes
going on here. That is related to how we generate sounds, and how our reference tonewheel organ is actually constructed.
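The index arithmetic in the handler above can be pulled out into a small helper, which makes the two note ranges easier to check in isolation. This is only a sketch: the names tonewheel_index, set_note, and WHEEL_COUNT are made up for illustration, and WHEEL_COUNT assumes that 4 * ROUNDED_TONE_WHEEL_CHUNKS equals 96 (8 octaves of 12 wheels).

```rust
/// Assumed number of simulated wheels: 8 octaves x 12 wheels,
/// including 5 dummies (hypothetical constant for this sketch).
const WHEEL_COUNT: usize = 96;

/// Maps a MIDI note number to an index into the scalar note table,
/// or `None` if the organ has no wheel for that note.
fn tonewheel_index(note: u8) -> Option<usize> {
    match note {
        // 24 = C1 to 107 = B7 map directly to wheels 0..=83
        24..=107 => Some((note - 24) as usize),
        // 108 = C8 to 114 = F#8 skip the five dummy wheels (84..=88)
        108..=114 => Some((note - 19) as usize),
        _ => None,
    }
}

/// Marks a note as active (`!0`, all bits set) or inactive (`0`).
/// Notes without a wheel are silently ignored.
fn set_note(active: &mut [u8; WHEEL_COUNT], note: u8, on: bool) {
    if let Some(index) = tonewheel_index(note) {
        active[index] = if on { !0 } else { 0 };
    }
}
```

With this mapping, C8 (note 108) lands on index 89, right after the five dummy entries 84 to 88, and F♯8 (note 114) lands on the last index, 95.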
Simulating the Tonewheels
The following information is based on an old Electric Druid article. The pitches of a tonewheel organ's wheels don't match the frequencies of an equal temperament scale exactly, since the irrational semitone ratio (the twelfth root of two) has to be approximated using a small set of gears, and gear ratios can only be rational. While the true harmonic ratios are rational, having nine sets of gears and tone wheels for each note would have been prohibitive. Plus, that wouldn't solve the irrational ratios between fundamentals. So within an octave, the following gear ratios are used to approximate an equal temperament scale:
Note | Driving (A) | Driven (B) | Ratio (A/B) |
---|---|---|---|
C | 85 | 104 | 0.817307692 |
C♯ | 71 | 82 | 0.865853659 |
D | 67 | 73 | 0.917808219 |
D♯ | 105 | 108 | 0.972222222 |
E | 103 | 100 | 1.030000000 |
F | 84 | 77 | 1.090909091 |
F♯ | 74 | 64 | 1.156250000 |
G | 98 | 80 | 1.225000000 |
G♯ | 96 | 74 | 1.297297297 |
A | 88 | 64 | 1.375000000 |
A♯ | 67 | 46 | 1.456521739 |
B | 108 | 70 | 1.542857143 |
The octaves are then generated by varying the number of teeth of the tonewheels:
Octave | Tonewheels | Teeth |
---|---|---|
1. | 12 | 2 |
2. | 12 | 4 |
3. | 12 | 8 |
4. | 12 | 16 |
5. | 12 | 32 |
6. | 12 | 64 |
7. | 12 | 128 |
8. | 7 | 192 |
Wait a minute. There is something wrong with that list. The first seven octaves have powers of two for the number of teeth — which makes sense, since octaves are a factor of two apart — but the eighth octave is only a factor of 1.5 — which corresponds to a fifth, not an octave — away from the seventh octave.
Well, the reason is simple: reliably manufacturing tonewheels with 256 teeth, precisely shaped to create a near-sinusoidal tone, wasn't viable at the time. So instead, 192-tooth wheels were used and corrected by shifting them up by a fourth (complementing the fifth to get an octave). So the C8 wheel was actually connected to the F-gear and so on up to the F♯8 wheel, which was connected to the B-gear.
But why are we simulating the missing five wheels? The tonewheels were paired in “bins,” four octaves apart. While the bins were fairly well electromagnetically separated, there was crosstalk within each bin. Since we want to simulate effects like that and want to avoid excess conditionals, especially if they straddle SIMD vector size boundaries as is the case here, it just makes sense to simulate the unused wheels, even if we don't use their output. Luckily, each octave has 12 notes, which is neatly divisible by four (our vector width). And as a matter of fact, the real organ used dummy wheels as well to get similar mechanical loads.
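To get a feel for how well the gear ratios approximate equal temperament, here is a small worked example. It is a sketch rather than part of the organ code: the function names are made up, and it assumes the 20 Hz motor speed used in this post plus the usual A4 = MIDI note 69 = 440 Hz reference.

```rust
/// Motor shaft speed in revolutions per second (20 Hz, as used in this post).
const MOTOR_HZ: f64 = 20.0;

/// Frequency of a tonewheel driven through a gear pair:
/// motor speed times gear ratio times number of teeth.
fn tonewheel_frequency(driving: f64, driven: f64, teeth: f64) -> f64 {
    MOTOR_HZ * driving / driven * teeth
}

/// Equal-temperament reference frequency for a MIDI note
/// (assumes A4 = note 69 = 440 Hz).
fn equal_temperament(note: u8) -> f64 {
    440.0 * 2f64.powf((f64::from(note) - 69.0) / 12.0)
}
```

For A4, the A gears (88/64) with a 16-tooth wheel give 20 × 1.375 × 16 = 440 Hz exactly. For C8, the F gears (84/77) with a 192-tooth wheel give roughly 4189.1 Hz, compared to roughly 4186.0 Hz for equal_temperament(108), so the fourth-plus-fifth trick lands within a few hertz.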
Let's have a look at the tonewheel signal generation code (part of impl TonewheelOrgan
):
fn generate_base_signals(
&mut self,
signals: &mut [MaybeUninit<f32x4>; ROUNDED_TONE_WHEEL_CHUNKS],
) {
let Self {
phasors,
phasor_steps,
..
} = self;
let mid = ROUNDED_TONE_WHEEL_CHUNKS / 2;
let (phasors_a, phasors_b) = phasors.split_at_mut(mid);
let (phasors_steps_a, phasor_steps_b) =
phasor_steps.split_at(mid);
let (signals_a, signals_b) = signals.split_at_mut(mid);
for (
index,
(
phasors_a,
phasor_steps_a,
signals_a,
phasors_b,
phasor_steps_b,
signals_b,
),
) in itertools::izip!(
phasors_a.iter_mut(),
phasors_steps_a.iter(),
signals_a.iter_mut(),
phasors_b.iter_mut(),
phasor_steps_b.iter(),
signals_b.iter_mut()
)
.enumerate()
{
let a = {
let signals = sin_quadrant_cubic_x4(*phasors_a);
*phasors_a += *phasor_steps_a;
// simple "complex tone wheel" simulation (bottom
// octave is notched for additional harmonics)
if index < CHUNKS_PER_OCTAVE {
let s1 = signals;
let s2 = s1 * s1;
let s4 = s2 * s2;
let s6 = s4 * s2;
let s7 = s6 * s1;
s7 - 0.25 * s6 + 0.0765625
} else {
signals
}
};
let b = sin_quadrant_cubic_x4(*phasors_b);
*phasors_b += *phasor_steps_b;
signals_a.write(a + CROSSTALK_GAIN * b);
signals_b.write(b + CROSSTALK_GAIN * a);
}
}
At the beginning, we split everything into a bottom and top half, i.e., the lower four octaves and the upper four octaves. Then, we iterate over both in lockstep using itertools
' izip!
macro, which creates an iterator over a large tuple instead of the nested pairs you'd get when chaining regular zip
functions. Then two sets of four tonewheel signals a
and b
are generated using our approximate sin_quadrant_cubic_x4
, and then written out to signals
(via signals_a
and signals_b
) with a small amount of crosstalk applied (CROSSTALK_GAIN
is currently a guesstimated 1e-3
, i.e., -60 dB, just barely noticeable).
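Stripped of SIMD and izip!, the pairing boils down to the following scalar sketch. The function name apply_crosstalk and the constant HALF are made up for illustration; HALF assumes 48 wheels per half (four octaves of twelve), and the gain is the guessed value mentioned above.

```rust
/// Crosstalk gain between the two wheels in a bin (the post's guessed value).
const CROSSTALK_GAIN: f32 = 1e-3;
/// Assumed wheels per half: four octaves of twelve wheels.
const HALF: usize = 48;

/// Mixes a little of each wheel's partner (four octaves away) into it.
fn apply_crosstalk(raw: &[f32; 2 * HALF]) -> [f32; 2 * HALF] {
    let mut mixed = [0.0f32; 2 * HALF];
    // lower four octaves and upper four octaves, iterated in lockstep
    let (low, high) = raw.split_at(HALF);
    for (index, (a, b)) in low.iter().zip(high.iter()).enumerate() {
        mixed[index] = a + CROSSTALK_GAIN * b;
        mixed[index + HALF] = b + CROSSTALK_GAIN * a;
    }
    mixed
}
```

A full-scale signal on wheel 0 therefore shows up on wheel 48 at a thousandth of its amplitude, and nowhere else.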
But wait, there's something odd happening in the a
-branch. The topmost octave isn't the only unusual one. The first octave was only used on the foot pedals and used sharp-edged “complex” tonewheels that create a non-sinusoidal tone. Since I couldn't find detailed info on the waveform shape, I just went with a simple polynomial distortion term: s^7 - 0.25 s^6 + 0.0765625, where s is the sine signal.
I did find a photo of a complex tonewheel's waveform here, along with a bunch of detailed shots of the tone wheels and gears, but decided not to try modeling it too closely, since it is a slightly different model, no info about the pedal mute setting is provided etc.
The conditional here should be taken care of by the optimizer by splitting the loop (it was for me). Another neat feature of using a polynomial shaper is that we know exactly what the highest generated frequency will be (base frequency times highest exponent), and that it is low enough not to get any aliasing. The constant term is just to avoid a DC offset.
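As a scalar sketch of the shaper and the aliasing argument (function names are made up for illustration): a degree-7 polynomial applied to a sine produces nothing above the seventh harmonic, so the worst case in the first octave is seven times the frequency of its highest wheel.

```rust
/// Scalar version of the polynomial shaper from the complex-tonewheel branch.
fn shape_complex(s: f32) -> f32 {
    let s2 = s * s;
    let s4 = s2 * s2;
    let s6 = s4 * s2;
    let s7 = s6 * s;
    s7 - 0.25 * s6 + 0.0765625
}

/// Highest frequency a polynomial of the given degree can produce
/// when applied to a sine at `base_hz`.
fn highest_harmonic(base_hz: f32, degree: u32) -> f32 {
    base_hz * degree as f32
}
```

The highest wheel in the first octave is B at 20 × 108/70 × 2 ≈ 61.7 Hz, so the shaper tops out at about 432 Hz, far below the Nyquist frequency of any common audio sample rate.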
Now we just need to compute the phasor_steps
from our gear ratios and connect everything. For the steps, I used
// motor shaft speed in revolutions per second (20 Hz), despite the name
const MOTOR_RPM_RUN: f32 = 20.0;
// `const`s are inlined while `static`s have a fixed memory location
static FREQUENCY_MULTIPLIERS: [f32x4; ROUNDED_TONE_WHEEL_CHUNKS] = [
// most octaves use the base gearings and an exponentially
// increasing number of teeth
// octave 1 -> 2 teeth
f32x4::new(
85.0 / 104.0 * 2.0,
71.0 / 82.0 * 2.0,
67.0 / 73.0 * 2.0,
105.0 / 108.0 * 2.0,
),
f32x4::new(
103.0 / 100.0 * 2.0,
84.0 / 77.0 * 2.0,
74.0 / 64.0 * 2.0,
98.0 / 80.0 * 2.0,
),
f32x4::new(
96.0 / 74.0 * 2.0,
88.0 / 64.0 * 2.0,
67.0 / 46.0 * 2.0,
108.0 / 70.0 * 2.0,
),
// octave 2 -> 4 teeth
f32x4::new(
85.0 / 104.0 * 4.0,
71.0 / 82.0 * 4.0,
67.0 / 73.0 * 4.0,
105.0 / 108.0 * 4.0,
),
f32x4::new(
103.0 / 100.0 * 4.0,
84.0 / 77.0 * 4.0,
74.0 / 64.0 * 4.0,
98.0 / 80.0 * 4.0,
),
f32x4::new(
96.0 / 74.0 * 4.0,
88.0 / 64.0 * 4.0,
67.0 / 46.0 * 4.0,
108.0 / 70.0 * 4.0,
),
// octave 3 -> 8 teeth
f32x4::new(
85.0 / 104.0 * 8.0,
71.0 / 82.0 * 8.0,
67.0 / 73.0 * 8.0,
105.0 / 108.0 * 8.0,
),
f32x4::new(
103.0 / 100.0 * 8.0,
84.0 / 77.0 * 8.0,
74.0 / 64.0 * 8.0,
98.0 / 80.0 * 8.0,
),
f32x4::new(
96.0 / 74.0 * 8.0,
88.0 / 64.0 * 8.0,
67.0 / 46.0 * 8.0,
108.0 / 70.0 * 8.0,
),
// octave 4 -> 16 teeth
f32x4::new(
85.0 / 104.0 * 16.0,
71.0 / 82.0 * 16.0,
67.0 / 73.0 * 16.0,
105.0 / 108.0 * 16.0,
),
f32x4::new(
103.0 / 100.0 * 16.0,
84.0 / 77.0 * 16.0,
74.0 / 64.0 * 16.0,
98.0 / 80.0 * 16.0,
),
f32x4::new(
96.0 / 74.0 * 16.0,
88.0 / 64.0 * 16.0,
67.0 / 46.0 * 16.0,
108.0 / 70.0 * 16.0,
),
// octave 5 -> 32 teeth
f32x4::new(
85.0 / 104.0 * 32.0,
71.0 / 82.0 * 32.0,
67.0 / 73.0 * 32.0,
105.0 / 108.0 * 32.0,
),
f32x4::new(
103.0 / 100.0 * 32.0,
84.0 / 77.0 * 32.0,
74.0 / 64.0 * 32.0,
98.0 / 80.0 * 32.0,
),
f32x4::new(
96.0 / 74.0 * 32.0,
88.0 / 64.0 * 32.0,
67.0 / 46.0 * 32.0,
108.0 / 70.0 * 32.0,
),
// octave 6 -> 64 teeth
f32x4::new(
85.0 / 104.0 * 64.0,
71.0 / 82.0 * 64.0,
67.0 / 73.0 * 64.0,
105.0 / 108.0 * 64.0,
),
f32x4::new(
103.0 / 100.0 * 64.0,
84.0 / 77.0 * 64.0,
74.0 / 64.0 * 64.0,
98.0 / 80.0 * 64.0,
),
f32x4::new(
96.0 / 74.0 * 64.0,
88.0 / 64.0 * 64.0,
67.0 / 46.0 * 64.0,
108.0 / 70.0 * 64.0,
),
// octave 7 -> 128 teeth
f32x4::new(
85.0 / 104.0 * 128.0,
71.0 / 82.0 * 128.0,
67.0 / 73.0 * 128.0,
105.0 / 108.0 * 128.0,
),
f32x4::new(
103.0 / 100.0 * 128.0,
84.0 / 77.0 * 128.0,
74.0 / 64.0 * 128.0,
98.0 / 80.0 * 128.0,
),
f32x4::new(
96.0 / 74.0 * 128.0,
88.0 / 64.0 * 128.0,
67.0 / 46.0 * 128.0,
108.0 / 70.0 * 128.0,
),
// the final octave only has 192-tooth tone wheels and 5 dummy
// wheels
// octave 8 -> 192(!) teeth
f32x4::splat(0.0), // 4 dummy wheels
f32x4::new(
0.0, // final 5th dummy wheel
84.0 / 77.0 * 192.0,
74.0 / 64.0 * 192.0,
98.0 / 80.0 * 192.0,
),
f32x4::new(
96.0 / 74.0 * 192.0,
88.0 / 64.0 * 192.0,
67.0 / 46.0 * 192.0,
108.0 / 70.0 * 192.0,
),
];
#[derive(Clone, Copy)]
#[repr(transparent)]
struct FrequencyPhasorConversionFactor(f32);
impl FrequencyPhasorConversionFactor {
fn new(sample_rate: f32) -> FrequencyPhasorConversionFactor {
FrequencyPhasorConversionFactor(
(1u64 << u32::BITS) as f32 / sample_rate,
)
}
fn to_step_x4(self, frequencies: f32x4) -> u32x4 {
u32x4::from_cast(f32x4::splat(self.0) * frequencies)
}
}
impl TonewheelOrgan {
/* ... */
pub fn new(sample_rate: f32) -> Self {
let frequency_multipliers = &FREQUENCY_MULTIPLIERS;
let phasors = [u32x4::splat(0); ROUNDED_TONE_WHEEL_CHUNKS];
let conversion =
FrequencyPhasorConversionFactor::new(sample_rate);
let mut phasor_steps =
[u32x4::splat(0); ROUNDED_TONE_WHEEL_CHUNKS];
for chunk in 0..ROUNDED_TONE_WHEEL_CHUNKS {
let frequencies =
MOTOR_RPM_RUN * frequency_multipliers[chunk];
phasor_steps[chunk] = conversion.to_step_x4(frequencies);
}
let active_notes = [u8x4::splat(0); ROUNDED_TONE_WHEEL_CHUNKS];
TonewheelOrgan {
phasors,
phasor_steps,
active_notes,
}
}
Sorry for the wall of code, but most of it is just a bunch of constants taken from the tables above, multiplied by the motor's rotation frequency of 20 Hz. In the future, we'll want to vary the motor RPM as well. But let's see if we even have the compute power left over. After all the constants, I defined a helper to perform floating-point frequency to fixed-point phasor step conversion. Then new
is just a matter of multiplying it all together, and initializing the rest to zero.
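For reference, here is the same frequency-to-step conversion as a scalar sketch, using f64 and made-up function names. The idea is that the full u32 range represents one cycle, so the phase can simply overflow and wrap around. The 44.1 kHz sample rate in the usage note is an assumption for the example; the real code takes whatever the Bela reports.

```rust
/// Converts a frequency in Hz to a 32-bit phase increment per sample.
fn phasor_step(frequency: f64, sample_rate: f64) -> u32 {
    // 2^32 phase units correspond to one full cycle
    let units_per_hz = (1u64 << 32) as f64 / sample_rate;
    // float-to-int `as` casts truncate toward zero (and saturate)
    (frequency * units_per_hz) as u32
}

/// Advances the phase by one sample; overflow wraps around,
/// which is exactly the behavior we want for a periodic phase.
fn advance(phase: u32, step: u32) -> u32 {
    phase.wrapping_add(step)
}
```

A tone at a quarter of the sample rate, e.g. 11025 Hz at 44.1 kHz, gets a step of about 2^30 (give or take one unit of rounding), so the phase returns to its starting point every four samples.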
Now we just need to sum up the active signals, and pass the result on to the Bela. Within impl TonewheelOrgan
, we add
pub fn render_sample(&mut self) -> f32 {
let mut signals = MaybeUninit::uninit_array();
self.generate_base_signals(&mut signals);
let signals =
unsafe { MaybeUninit::array_assume_init(signals) };
// signal summation
let mut signal_x4 = f32x4::splat(0.0);
for chunk in 0..ROUNDED_TONE_WHEEL_CHUNKS {
signal_x4 += f32x4::from_bits(
i32x4::from_cast(i8x4::from_cast(
self.active_notes[chunk],
)) & i32x4::from_bits(signals[chunk]),
);
}
let signal = 0.05 * signal_x4.sum();
// soft clipping
let signal = signal.max(-1.0).min(1.0);
let signal = 1.5 * signal - 0.5 * signal.powi(3);
signal
}
which simply calls the previously defined generate_base_signals
, determines active signals by applying active_notes
as a bit-mask, and sums the signals. Summation is done in two steps for performance: first four separate sums are computed for each SIMD lane, then the final horizontal sum is computed only once (and scaled, because we don't want to add multiple full-scale signals). The soft clipping code at the end is just to prevent it from sounding too bad if our total signal turns out a bit hot anyway.
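The bit-mask trick and the soft clipper are easier to follow in scalar form. The sketch below uses made-up function names and plain f32 instead of SIMD vectors, but performs the same casts: the u8 note flag is reinterpreted as i8 so that widening to 32 bits sign-extends 0xFF to all ones.

```rust
/// Keeps `signal` if `active` is `!0` (0xFF) and zeroes it if `active` is 0.
fn mask_signal(signal: f32, active: u8) -> f32 {
    // u8 -> i8 -> i32 sign-extends 0xFF to 0xFFFF_FFFF; 0 stays 0
    let mask = active as i8 as i32 as u32;
    f32::from_bits(mask & signal.to_bits())
}

/// Clamps to [-1, 1], then applies the cubic soft clipper.
fn soft_clip(signal: f32) -> f32 {
    let s = signal.max(-1.0).min(1.0);
    1.5 * s - 0.5 * s.powi(3)
}
```

The clipper maps ±1 to ±1 with zero slope at the ends, so a signal that runs hot flattens out smoothly instead of hitting a hard corner. Small signals are boosted by a factor of about 1.5.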
The remainder of our render
function (see above) is now just a matter of copying the output to all of the Bela's outputs:
let audio_out_channels = context.audio_out_channels();
for frame in
context.audio_out().chunks_exact_mut(audio_out_channels)
{
// generate sample using no_std subcrate
let signal = organ.render_sample();
// write generated sample to all audio outputs
for sample in frame {
*sample = signal;
}
}
}
}
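The output loop can be tried out away from the Bela with a plain slice. This sketch assumes an interleaved buffer layout, as the chunking in render suggests. The function name fill_interleaved and the closure parameter are made up for illustration.

```rust
/// Writes one generated sample per frame to every channel of an
/// interleaved output buffer.
fn fill_interleaved(
    output: &mut [f32],
    channels: usize,
    mut next_sample: impl FnMut() -> f32,
) {
    // each chunk is one frame, i.e., one sample per channel
    for frame in output.chunks_exact_mut(channels) {
        let signal = next_sample();
        for sample in frame {
            *sample = signal;
        }
    }
}
```

Calling it with a two-channel buffer of six samples and a closure that counts up from 1.0 yields [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]: the generator runs once per frame, not once per channel.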
And that's pretty much it for this time! As usual, feel free to follow me and send me a DM on Mastodon if you have any questions or comments. I'm also active on the Bela forum.
But I don't want to leave you hanging, wondering what this very, very basic version of our organ sounds like so here goes:
Sorry for the mediocre quality, I couldn't find any ⅛" jack to XLR adapters (or ⅛" to ¼" jack adapters either), so I had to record this with my mainboard's built-in sound card, which buzzes like crazy. First I play a scale on the complex tone wheels, then an arpeggio for every octave. Some of them even sound kinda organ-ish. Due to the free-running oscillators it also clicks a lot. But that's fine and to be expected. The real thing also had to apply some tricks to control clicking.