FFT in Bonzomatic

The following FFT explanation uses https://www.ap.com/news/more-about-ffts as its vocabulary base.

Bonzomatic can take the sound from a microphone or a loopback device (desktop sound, etc.), process it with an FFT, and make the result available to the shader via the texture texFFT.

Characteristics

FFT_SIZE, defined at , corresponds to the number of FFT output bins.

sampleBuf, defined , corresponds to the FFT length (the FFT input) and is sized as twice FFT_SIZE.

The library Bonzomatic uses to fetch audio frames is miniaudio. The initial configuration is defined at :

samplingRate = 44100, hardcoded

2 channels enabled, left and right

The library then defines an interface that expects a callback function, which is called for each incoming batch of audio frames. This callback is defined at . It is important to note that frameCount is the number of frames in the current batch, which has been observed to be samplingRate / 100 = 441, with interleaved data for the left and right channels.

On each call, the callback shifts the current content of sampleBuf and appends the newly received audio frames at the end.
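The shift-and-append behaviour can be sketched roughly as follows. This is not Bonzomatic's code: SlidingWindow and push are hypothetical names, and averaging the left and right channels into one mono sample is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sampleBuf: a fixed-length window that holds
// the most recent mono samples, oldest first.
struct SlidingWindow {
    std::vector<float> samples;

    explicit SlidingWindow(std::size_t length) : samples(length, 0.0f) {}

    // Append frameCount interleaved stereo frames (L, R, L, R, ...).
    // Existing samples are shifted toward the front; each new frame is
    // downmixed to mono by averaging its two channels (an assumption).
    void push(const float* interleaved, std::size_t frameCount) {
        assert(interleaved != nullptr || frameCount == 0);
        const std::size_t length = samples.size();
        // If more frames arrive than the window can hold, keep the newest.
        const std::size_t skip = frameCount > length ? frameCount - length : 0;
        const std::size_t kept = frameCount - skip;

        // Shift the existing content left by the number of new samples.
        for (std::size_t i = 0; i + kept < length; ++i) {
            samples[i] = samples[i + kept];
        }
        // Write the new mono samples at the end of the window.
        for (std::size_t i = 0; i < kept; ++i) {
            const float left = interleaved[(skip + i) * 2];
            const float right = interleaved[(skip + i) * 2 + 1];
            samples[length - kept + i] = 0.5f * (left + right);
        }
    }
};
```

With a window of 2048 samples and batches of 441 frames, it takes about five callbacks to fully renew the FFT input, so consecutive FFTs overlap heavily.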

The actual computation is done at . The steps that convert the FFT result into an exploitable texture are:

Compute the amplitude of the signal.

Amplify the signal with a configurable variable.

The array is then injected as a texture at
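As a minimal sketch of the two steps above, assuming the FFT output is available as complex bins: binsToAmplitudes and gain are hypothetical names, and any extra scaling by the FFT length that a real implementation may apply is left out.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Convert the first binCount complex FFT bins into amplitudes, then apply
// a gain. `gain` stands in for the configurable amplification variable
// (hypothetical name, not taken from Bonzomatic).
std::vector<float> binsToAmplitudes(
    const std::vector<std::complex<float>>& spectrum,
    std::size_t binCount,
    float gain) {
    assert(spectrum.size() >= binCount);
    std::vector<float> amplitudes(binCount);
    for (std::size_t i = 0; i < binCount; ++i) {
        // Amplitude is the modulus of the complex bin: sqrt(re^2 + im^2).
        amplitudes[i] = std::abs(spectrum[i]) * gain;
    }
    return amplitudes;
}
```

The resulting array of floats is what would be uploaded as a one-dimensional texture, one texel per bin.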

Knowing these details:

FFT bandwidth: 44100 / 2 = 22050, which means the FFT covers 0 Hz to about 22 kHz.

Bin width: 22050 / 1024 = 21.53, so each bin is 21.53 Hz wide.
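The same arithmetic, written as a small helper to find which bin a given frequency falls in. The constants come from the values above; the function names are made up for this sketch, and the mapping uses the usual DFT convention where bin k sits at k times the bin width.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr double kSamplingRate = 44100.0;  // hardcoded sampling rate
constexpr std::size_t kBinCount = 1024;    // number of FFT output bins

// Width of one bin in Hz: (samplingRate / 2) / binCount.
constexpr double binWidthHz() {
    return (kSamplingRate / 2.0) / static_cast<double>(kBinCount);
}

// Frequency in Hz associated with a bin index.
double binFrequencyHz(std::size_t bin) {
    assert(bin < kBinCount);
    return static_cast<double>(bin) * binWidthHz();
}

// Index of the bin closest to a frequency, clamped to the last bin.
std::size_t nearestBin(double hz) {
    assert(hz >= 0.0);
    const long bin = std::lround(hz / binWidthHz());
    const long last = static_cast<long>(kBinCount) - 1;
    return static_cast<std::size_t>(bin < last ? bin : last);
}
```

For instance, nearestBin(440.0) gives bin 20 and nearestBin(20000.0) gives bin 929, so roughly the last 95 bins sit above 20 kHz.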

Consequences

Besides the fact that the FFT amplitude is used "raw", without any prior normalisation, you should also take the FFT bandwidth into account:

Human hearing ranges from 20 Hz to 20 kHz, which means the first bin and the last 100 bins (which cover around 2 kHz) hold sound that is technically inaudible.

Warning: high-frequency notes in the following video

https://scrapbox.io/files/66feb93845de66001d7a1591.mp4

You also need to take into account what kind of sound is being played:

If the sound comes from a "raw" source, such as sound generated directly by the computer or a lossless format, then technically the whole bandwidth will contain data.

If the sound comes from a lossy format, such as MP3 or a stream from any VOD / streaming platform, it is highly probable that part of the high frequencies of the FFT will carry no information, as most of these formats simply strip those frequencies. E.g., MP3 with some parameters will mostly cut frequencies above 16-19 kHz.

The following examples apply some amplitude rectification to get something more visual.

FLAC file (roughly everything below 20 kHz is represented)

https://scrapbox.io/files/66febce97c7b8e001d7ef110.mp4

MP3 from a website (notice the drop well before 20 kHz, compared with the example above)

https://scrapbox.io/files/66febd3bb8a4de001c34529d.mp4

FFT Smoothed

The texture texFFTSmoothed, defined at , takes the FFT values but mixes the current and previous FFT values using fFFTSlightSmoothingFactor. The result is similar FFT data that is more robust to sudden changes and amplitude spikes.
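One common way to do such a mix is an exponential moving average. The sketch below assumes the mix is a linear blend between the previously smoothed value and the new FFT value, weighted by the smoothing factor; smoothSpectrum is a hypothetical helper, not a Bonzomatic function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blend the new FFT frame into the running smoothed spectrum, in place.
// factor = 0 keeps only the new values; a factor close to 1 reacts slowly.
void smoothSpectrum(std::vector<float>& smoothed,
                    const std::vector<float>& current,
                    float factor) {
    assert(smoothed.size() == current.size());
    assert(factor >= 0.0f && factor <= 1.0f);
    for (std::size_t i = 0; i < smoothed.size(); ++i) {
        smoothed[i] = smoothed[i] * factor + (1.0f - factor) * current[i];
    }
}
```

A spike that lasts a single frame only contributes (1 - factor) of its amplitude, and then decays by the factor on each following frame.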

FFT Integrated

The texture texFFTIntegrated, defined at https://github.com/Gargaj/Bonzomatic/blob/master/src/main.cpp#L537-L538, accumulates the FFT values. The accumulated values are not the raw values from fftData: they are first smoothed into fftDataSlightlySmoothed, the same way as for texFFTSmoothed, before being accumulated into fftDataIntegrated.

This generates a texture where each texel increments at the pace of the associated frequency bin. You can think of it as an fGlobalTime that, instead of increasing monotonically like a normal clock, increases based on the amplitude of the selected frequency, which can create a sense of rhythm if used well. See FFT based motion for a concrete example.

Also note the check defined at , which makes sure that the value of every bin in fftDataIntegrated never goes above maxIntegralValue, a hardcoded value defined at as 1024.
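The accumulation can be sketched like this. SpectrumIntegrator is a hypothetical type, and how the upper bound is enforced is an assumption: this sketch wraps a bin by subtracting the maximum once it is exceeded, which keeps the value moving forward instead of sticking at the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-bin "clock" that advances faster when its frequency is louder.
struct SpectrumIntegrator {
    std::vector<float> slightlySmoothed;
    std::vector<float> integrated;
    float smoothingFactor;
    float maxIntegralValue;

    SpectrumIntegrator(std::size_t binCount, float factor, float maxValue)
        : slightlySmoothed(binCount, 0.0f),
          integrated(binCount, 0.0f),
          smoothingFactor(factor),
          maxIntegralValue(maxValue) {}

    void update(const std::vector<float>& fft) {
        assert(fft.size() == integrated.size());
        for (std::size_t i = 0; i < fft.size(); ++i) {
            // Smooth first, so a single spike does not jump the clock.
            slightlySmoothed[i] = slightlySmoothed[i] * smoothingFactor
                                + (1.0f - smoothingFactor) * fft[i];
            integrated[i] += slightlySmoothed[i];
            // Assumed bound handling: wrap around past the maximum.
            if (integrated[i] > maxIntegralValue) {
                integrated[i] -= maxIntegralValue;
            }
        }
    }
};
```

In a shader, a value read from such a texture can replace the time term of an animation, so that the motion speeds up with the energy in the chosen bin.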