I started learning about spectrum analysis a long time ago at university,
but failed to gain true insight into how it works under the hood.
For me, it was just some magical math formulas I had to memorize and know about.
I knew the theory, but it didn't really click.

A few years ago I wanted to do some audio programming and play around with analyzing sound. That's
how I got interested in building an intuition for how the Fourier transform actually works.

I made these interactive examples mostly to improve my own understanding and learn by doing.
I am following a mechanistic approach and building from the ground up with simpler building blocks,
rather than introducing the formula and saying this is what it means.

Since I mainly do web programming for a living, this seems to be easier to wrap my mind around than some *abstract math*.

My language and math are very simplified, and probably a bit imprecise or not completely correct. I am also not including explanations for everything (like trigonometric functions, because that's out of scope) and I am completely ignoring some details that I don't think are relevant (like negative frequencies).

Feel free to submit any corrections or feedback in the GitHub repo.

Sound consists of oscillations, or repeating patterns in pressure, that propagate through a medium. Sound waves typically have complex patterns repeating at different frequencies. We can typically reconstruct any complex waveform from other periodic waves.

Sinusoidal waves are a good choice for this because they have nice mathematical properties. That said, real-world sound waves are not actually made out of sines.

To quote one reddit comment:

The only things "perfect" about a sine wave are its mathematical conveniences, it has no special connection to real-world phenomena. You could model those phenomena with equal precision with other waveforms. ~RickRussellTX

Sine waves also represent up-and-down motion of pistons on a crankshaft:

I suspect that the obsession with sine waves comes from the fact that in our daily machine-driven lives we have an awful lot of machines that rotate around an axis, or are attached to crankshafts and what-have you. A piston on a crankshaft produces "pure" sinusoidal up-and-down motion when operating at constant angular velocity. ~RickRussellTX

Sinusoids in fact cannot correctly reproduce discontinuous signals, but they are good enough for band-limited signals.

Probing with a single sine wave

We start with a relatively straightforward approach: *probing*, or analyzing, the target signal
by doing a *sine transform*, as originally done by Fourier himself.
I'm using the term *probe* here because I've read it somewhere else already and it feels appropriate,
but these are usually called analyzing functions.

The idea is to multiply the signal by a pure sine wave. The transform is the area under the resulting product, which tells us how closely the signal aligns with our test sine wave.

Here is a simplified formula for this idea: $$\mathit{transform} = \mathit{area}(\ \mathit{target\ signal} \times \mathit{probe}\ )$$

For completeness' sake, let's also include the math formula for our
sine transform that analyzes the presence of frequency *s* in the target signal:

$$\mathit{transform}_s = \int_{-\infty}^\infty f(t)\cdot\sin(2πst)\,dt$$

\(\sin(2πst)\) is our sine analysis function (aka *probe*) and \(f(t)\) is our target signal.

Our target signal is a **4Hz sine wave**.
We can multiply it by a single sine probe with a fixed phase. This means the probe won't slide left or right; only its frequency changes.

When we aggregate all the values in the transform together, we get a high positive result if the probe correlates with the signal. This number represents the magnitude of the transform. For non-matching frequencies the result is zero because all the peaks and troughs cancel out.

That is all there is to it: this is the magic trick behind the transform. On its own, a sine wave sums to zero over a full period, because its positive and negative halves cancel. But when we multiply it by a correlating signal, the negative parts line up and their products become positive, so the values no longer cancel and the transform sums to a non-zero number.

Notice how the resulting magnitude also depends on the amplitude of the signal we want to analyze.
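Here is a minimal Python sketch of the probing idea, assuming a 1-second window and a simple sum of thin slices to approximate the area. The helper names `area` and `sine` are made up for this illustration and are not taken from the interactive examples.

```python
import math

def area(signal, probe, duration=1.0, samples=1000):
    """Approximate the area under signal(t) * probe(t) over [0, duration]."""
    dt = duration / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each thin time slice
        total += signal(t) * probe(t) * dt
    return total

def sine(freq, amplitude=1.0):
    """Build a sine wave with zero phase (hypothetical helper)."""
    return lambda t: amplitude * math.sin(2 * math.pi * freq * t)

target = sine(4)  # the 4Hz target signal

# Probe the target with sine waves from 1Hz to 7Hz.
results = {s: area(target, sine(s)) for s in range(1, 8)}
for s, value in results.items():
    print(f"{s}Hz probe: {value:.3f}")

# Doubling the signal's amplitude doubles the magnitude of the match.
louder = area(sine(4, amplitude=2.0), sine(4))
print(f"4Hz probe on a louder signal: {louder:.3f}")
```

Only the 4Hz probe should give a clearly non-zero result (about 0.5 for this window), while the other probes cancel out to roughly zero. The louder signal should give about 1.0 for the same probe.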

Let's try to analyze a signal that is still **4Hz** but offset by a quarter of a turn (phase is **π/2**).

Why can't we match the signal?

The sine probe is out of phase with the target signal and won't correlate. Sinusoidal waves are periodic and repeat from the beginning after every full turn around the circle, 360° or 2π radians. When the probe is a quarter of a turn out of phase with the target wave, as it is here, the positive and negative parts of the product cancel out and we don't get a match.

Let's turn our probe into a *cosine* wave. This will bring it back by a quarter of a turn to match the signal perfectly.

Notice how we're also getting a so-called DC offset of magnitude 1 for all the frequencies that we are analyzing.
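The transform for our 0Hz probe is also a replica of the original wave! This is because the *cosine* of 0 is 1, so a 0Hz cosine probe is a constant 1 and multiplying by it leaves the signal unchanged.
The *sine* of 0 is 0, so a 0Hz sine probe is a constant 0 and we don't see the same effect.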
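The difference between the two probes can be checked with a small Python sketch. It assumes a 1-second window and a hypothetical `area` helper that sums thin slices of the product; none of these names come from the interactive examples.

```python
import math

def area(signal, probe, duration=1.0, samples=1000):
    """Approximate the area under signal(t) * probe(t) over [0, duration]."""
    dt = duration / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each thin time slice
        total += signal(t) * probe(t) * dt
    return total

def target(t):
    # 4Hz wave offset by a quarter of a turn (phase π/2).
    return math.sin(2 * math.pi * 4 * t + math.pi / 2)

def sine_probe(t):
    return math.sin(2 * math.pi * 4 * t)

def cosine_probe(t):
    return math.cos(2 * math.pi * 4 * t)

def zero_hz_cosine_probe(t):
    # cos(2π·0·t) is cos(0), a constant 1 for every t.
    return math.cos(0.0)

sine_match = area(target, sine_probe)
cosine_match = area(target, cosine_probe)
dc_match = area(target, zero_hz_cosine_probe)

print(f"sine probe:       {sine_match:.3f}")
print(f"cosine probe:     {cosine_match:.3f}")
print(f"0Hz cosine probe: {dc_match:.3f}")
```

The sine probe should come out at roughly zero and the cosine probe at about 0.5. The 0Hz cosine probe just adds up the signal itself, which is roughly zero here because the wave has no constant offset.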

Probing with sine and cosine

Let's try combining the sine & cosine transforms in our analysis of a
4Hz wave offset by π/4 (halfway between a sine and a cosine).
This is effectively the famous Fourier transform.

We are now dealing with two numbers, which we can show on a 2D *xy-plot*.

When we pick the 4Hz analysis waves, we can notice how both match partially. From this we can re-construct a complete match. Any movement in phase of the target wave will reflect in the sine and cosine components of our transform.

From our sine and cosine matches we can figure out the actual phase and magnitude (as if we were using a sinusoid of that phase).

The magnitude becomes the length of the diagonal line, which is equal to \(\sqrt{\mathit{sin}^2 + \mathit{cos}^2}\), where \(\mathit{sin}\) and \(\mathit{cos}\) are the results of the sine and cosine transforms. The phase can be found from the ratio or angle between the sine & cosine components, i.e. \(atan2(\mathit{sin}, \mathit{cos})\).
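As a rough sketch of this step in Python: the hypothetical helper `probe_components` below runs both probes over an assumed 1-second window, and the magnitude and angle are then computed from the two results. The argument order of `atan2` follows the text; swapping the arguments would measure the angle from the other axis, so treat the exact phase convention as an assumption.

```python
import math

def probe_components(signal, freq, duration=1.0, samples=1000):
    """Return (sine_part, cosine_part) of the signal at the probed frequency."""
    dt = duration / samples
    sine_part = cosine_part = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each thin time slice
        angle = 2 * math.pi * freq * t
        sine_part += signal(t) * math.sin(angle) * dt
        cosine_part += signal(t) * math.cos(angle) * dt
    return sine_part, cosine_part

def shifted_sine(phase):
    """4Hz sine wave with the given phase offset (hypothetical helper)."""
    return lambda t: math.sin(2 * math.pi * 4 * t + phase)

# A 4Hz wave offset by π/4 matches both probes partially.
sine_part, cosine_part = probe_components(shifted_sine(math.pi / 4), 4)
magnitude = math.hypot(sine_part, cosine_part)  # length of the diagonal
angle = math.atan2(sine_part, cosine_part)      # angle between the components

print(f"sine: {sine_part:.3f}, cosine: {cosine_part:.3f}")
print(f"magnitude: {magnitude:.3f}, angle: {angle:.3f} rad")

# The magnitude stays the same no matter how the target is shifted.
phases = [0, math.pi / 6, math.pi / 4, math.pi / 2, math.pi, 1.5 * math.pi]
magnitudes = [math.hypot(*probe_components(shifted_sine(p), 4)) for p in phases]
print([round(m, 3) for m in magnitudes])
```

For the π/4 example both components should come out equal, and the combined magnitude should be about 0.5, the same value a perfectly aligned single probe gives for this window.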

We don't need any more *probes* to cover the rest of the quadrants, since the next two
sinusoids (each shifted by a further π/2) would just be the negatives of our sine/cosine probes:

$$\sin(x + π) = {-\sin(x)}, \ \ \cos(x + π) = {-\cos(x)}$$

To further drive the point home, let's look at another interactive example.

Here we fix the frequencies of both the signal under test and our sine/cosine probes at **4Hz**
and just look at the phase of the signal.

**Drag** the target sine wave left and right. Notice that our sine & cosine probes give a match for any phase in-between.

- The frequency graph with bins shows magnitudes for all probed frequencies.
- Again, notice that the 0Hz band for our cosine transform produces an average of the signal (which is zero in this case).
- At a resolution of 0.25Hz there is something strange going on. We don't get a nice single bar for each of the frequencies. Instead, the spectrum spills over into neighboring buckets.
- At 0.25Hz resolution the 3.75Hz frequency bar appears stronger than the 4Hz bar, even though it's a 4Hz component (the actual component has a phase of π).
- Drag the target signal left or right to change its phase. Notice how this affects the bars in the spectrum plot.

Let's examine these anomalies in our next examples.

What happens if we analyze frequencies that are not whole numbers, but somewhere in-between?
Let's say this time we are analyzing a frequency of **3.5Hz** with phase **π/3**.

See how for non-integer frequencies we don't get a single bar. Instead, the spectrum spills over into adjacent buckets. The actual peak is somewhere between 3Hz & 4Hz. We can figure out the actual peak either by interpolating between the bins, or by increasing the number of bins, that is, by creating more granular probes for analyzing.
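The spill-over can be reproduced with a short Python sketch. It assumes a 1-second window and probes at whole-number frequencies from 0Hz to 8Hz; the `magnitude` helper is a hypothetical name that combines the sine and cosine probes into one number per bin.

```python
import math

def magnitude(signal, freq, duration=1.0, samples=2000):
    """Combined sine/cosine match of the signal at the probed frequency."""
    dt = duration / samples
    sine_part = cosine_part = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each thin time slice
        angle = 2 * math.pi * freq * t
        sine_part += signal(t) * math.sin(angle) * dt
        cosine_part += signal(t) * math.cos(angle) * dt
    return math.hypot(sine_part, cosine_part)

def target(t):
    # 3.5Hz wave with phase π/3, which falls between the 3Hz and 4Hz bins.
    return math.sin(2 * math.pi * 3.5 * t + math.pi / 3)

# One bin per whole-number frequency.
spectrum = {s: magnitude(target, s) for s in range(0, 9)}
for s, value in spectrum.items():
    print(f"{s}Hz: {value:.3f} " + "#" * round(value * 60))

# The two strongest bins sit on either side of the real frequency.
strongest = sorted(spectrum, key=spectrum.get, reverse=True)[:2]
print("strongest bins:", sorted(strongest))
```

Instead of one bar, every bin should show some energy, with the 3Hz and 4Hz bins being the strongest.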

What's even stranger is that the spill-over shape doesn't just taper off like a bell curve. Instead, it oscillates up and down while dampening as it spreads out, creating shapes known as lobes.

This *spilling over* happens because our sine isn't infinite in time and we are analyzing
frequencies that are not all periodic with the fixed time interval.

An infinite sine would give perfect peaks in the spectrum plot, but our sine signal is showing lobes in neighboring frequency bins. With a finite window, many of the probes that are close in frequency to the signal will give a perceptible match. That's because either the probes or the signal don't fit perfectly in the window (aren't periodic with it) and leave some dangling fragments near each side of the window.

In other words, our transforms don't always cancel out when they should, because they are sliced off at the ends in the wrong place. If we had an infinite window of time, or if we tailored the window to be periodic with each measured frequency and the target signal, all of this noise would cancel out and leave a pure spike for the probe that fully matches our sinusoid.

To illustrate the effect, here is an example of a **2.5Hz** wave and a 1-second time window.
Increasing the time interval changes the spectral resolution and we get a cleaner spike for the target frequency.
This shape of the frequency spectrum with side lobes spreading out is known as the **sinc** function (in this case it's the absolute value, i.e. |sinc|).
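A small Python sketch can show the spike getting cleaner as the window grows. It compares the match at the real frequency (2.5Hz) with the match at a nearby probe (3Hz) for windows of 1, 3 and 5 seconds. The window lengths, the 3Hz neighbor and the `magnitude` helper are all choices made for this illustration.

```python
import math

def magnitude(signal, freq, duration, samples_per_second=2000):
    """Combined sine/cosine match of the signal over [0, duration]."""
    samples = int(duration * samples_per_second)
    dt = duration / samples
    sine_part = cosine_part = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each thin time slice
        angle = 2 * math.pi * freq * t
        sine_part += signal(t) * math.sin(angle) * dt
        cosine_part += signal(t) * math.cos(angle) * dt
    return math.hypot(sine_part, cosine_part)

def target(t):
    return math.sin(2 * math.pi * 2.5 * t)  # the 2.5Hz wave

leak_ratio = {}
for window in (1, 3, 5):
    peak = magnitude(target, 2.5, window)  # probe at the real frequency
    leak = magnitude(target, 3.0, window)  # nearby probe that should not match
    leak_ratio[window] = leak / peak
    print(f"{window}s window: leak / peak = {leak_ratio[window]:.3f}")
```

The ratio should shrink as the window gets longer: the match at 2.5Hz keeps growing with the window, while the leftover at 3Hz does not.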

We can demonstrate this requirement to have a periodic window of time in the following interactive example.

There is a 2Hz target sine wave and a 4Hz sine probe. The time interval can be increased in increments of 0.125s or half of the probe's period. Notice how when the window size is periodic with the sine transform, we get a perfect cancellation. This is the behavior we want, our 4Hz probe isn't supposed to give a match for the 2Hz target signal. We get partial matches when the probe sine does not perfectly fit in the window (isn't periodic with it).
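The same experiment can be sketched in Python, assuming the window always starts at zero and grows in steps of 0.125s. The `area` helper is a hypothetical name for the sum of thin slices of the product.

```python
import math

def area(signal, probe, duration, samples=1000):
    """Approximate the area under signal(t) * probe(t) over [0, duration]."""
    dt = duration / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each thin time slice
        total += signal(t) * probe(t) * dt
    return total

def target(t):
    return math.sin(2 * math.pi * 2 * t)  # 2Hz target

def probe(t):
    return math.sin(2 * math.pi * 4 * t)  # 4Hz probe, period 0.25s

# Grow the window in steps of half a probe period.
windows = [0.125 * n for n in range(1, 9)]
results = [(w, area(target, probe, w)) for w in windows]
for w, value in results:
    periods = w / 0.25
    print(f"window {w:.3f}s ({periods:.1f} probe periods): {value:+.4f}")
```

Windows that hold a whole number of probe periods (0.25s, 0.5s and so on) should give roughly zero, while the windows in between leave a small leftover of about 0.053 in either direction.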

This is where those lobes and the *sinc* shape come from.
Note that for higher frequencies these fragments are smaller and have less effect on the transform,
causing smaller spectrum lobes.

How do complex numbers come into play?

There is a common way to write down the Fourier transform for frequency *s* that uses complex numbers:
$$F_s = \int_{-\infty}^\infty f(t)\cdot(\cos(2πst) - i \cdot\sin(2πst))dt$$

From my understanding, it's just a convenient way to write sine & cosine components in a single math expression. Multiplying the sine transform with the imaginary number separates it from the cosine component, since real & imaginary numbers don't mix.

Another common notation uses Euler's formula: $$e^{ix} = \cos{x} + i\sin{x}$$ which gives us this beautiful condensed version: $$F_s = \int_{-\infty}^\infty f(t)\cdot e^{-i2πst}dt$$
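To see that the complex version carries the same two numbers, here is a minimal Python sketch using the standard `cmath` module. It assumes a finite 1-second window instead of the infinite integral, and `fourier_component` is a made-up helper name.

```python
import cmath
import math

def fourier_component(signal, freq, duration=1.0, samples=1000):
    """Approximate the integral of signal(t) * e^(-i2π·freq·t) over the window."""
    dt = duration / samples
    total = 0j
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint of each thin time slice
        total += signal(t) * cmath.exp(-2j * math.pi * freq * t) * dt
    return total

def target(t):
    # 4Hz wave offset by π/4, halfway between a sine and a cosine.
    return math.sin(2 * math.pi * 4 * t + math.pi / 4)

F = fourier_component(target, 4)

# The real part is the cosine match, the imaginary part is minus the sine match.
print(f"cosine part (real):      {F.real:.3f}")
print(f"sine part (-imaginary):  {-F.imag:.3f}")
print(f"magnitude:               {abs(F):.3f}")
```

The real and imaginary parts are exactly the cosine and sine components from before (the sine one with a minus sign), and `abs(F)` is the length of the diagonal on the xy-plot.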

We can analyze frequencies in any waveform by multiplying it with other periodic waves for each of the frequencies we want to find.
These analysis waveforms, aka probes, must have some parts above zero and some below zero so that their sum averages to zero.
Sines and cosines are good candidates: the combination of these two transforms is sufficient to detect a
frequency regardless of where it is positioned in the wave, i.e. its phase (or even its duration).
But they exhibit issues when analyzing limited time intervals, since they don't taper off and instead extend to infinity.