Short-Time Fourier Transform
import IPython.display as ipd
import matplotlib.pyplot as plt
import librosa
import librosa.display
from mirdotcom import mirdotcom
mirdotcom.init()
Musical signals are highly non-stationary, i.e., their statistics change over time. A single Fourier transform computed over an entire 10-minute song would tell us which frequencies occur in the song, but not when they occur.
The short-time Fourier transform (STFT) (Wikipedia; FMP, p. 53) is obtained by computing the Fourier transform for successive frames in a signal.
\[ X(m, \omega) = \sum_n x(n) w(n-m) e^{-j \omega n} \]
As we increase \(m\), we slide the window function \(w\) to the right. For the resulting frame, \(x(n) w(n-m)\), we compute the Fourier transform. Therefore, the STFT \(X\) is a function of both time, \(m\), and frequency, \(\omega\).
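To make the sliding-window idea concrete, here is a minimal NumPy sketch of the definition above. The function name `stft_sketch`, the Hann window, and the `frame_length` and `hop_length` values are assumptions chosen for illustration, not part of the definition. Each frame's FFT is also taken relative to the start of the frame, so the phase differs from \(X(m, \omega)\) by a factor of \(e^{-j \omega m}\), while the magnitudes agree.

```python
import numpy as np

def stft_sketch(x, frame_length=1024, hop_length=256):
    """Illustrative STFT: window successive frames and take the FFT of each."""
    w = np.hanning(frame_length)  # window function w (assumed: Hann)
    n_frames = 1 + (len(x) - frame_length) // hop_length
    X = np.empty((frame_length // 2 + 1, n_frames), dtype=complex)
    for i in range(n_frames):
        m = i * hop_length                # frame start: the window slides right
        frame = x[m:m + frame_length] * w # x(n) w(n - m)
        X[:, i] = np.fft.rfft(frame)      # Fourier transform of this frame
    return X

# Synthetic test signal: one second of a 440 Hz sine at 22050 Hz.
sr = 22050
t = np.arange(sr) / sr
x_test = np.sin(2 * np.pi * 440 * t)

X_test = stft_sketch(x_test)
print(X_test.shape)  # (frequency bins, frames)
```

The rows of the result index frequency and the columns index time, which matches the two arguments of \(X(m, \omega)\). A smaller `hop_length` gives more overlap between frames and finer time sampling, at the cost of more computation.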
Let’s load a file:
filename = mirdotcom.get_audio("brahms_hungarian_dance_5.mp3")
x, sr = librosa.load(filename)
ipd.Audio(x, rate=sr)