Spectral Features#

import IPython.display as ipd
import librosa
import librosa.display
import matplotlib.pyplot as plt
import sklearn.preprocessing

from mirdotcom import mirdotcom

mirdotcom.init()

For classification, we add new features to our arsenal: spectral moments (centroid, bandwidth, skewness, kurtosis) and other spectral statistics.

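A moment is a term from physics and statistics that summarizes the shape of a distribution. There are raw moments, which are taken about zero, and central moments, which are taken about the mean.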

You are probably already familiar with two examples of moments: mean and variance. The first raw moment is known as the mean. The second central moment is known as the variance.
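To make the two definitions concrete, here is a minimal sketch in NumPy. The helper names `raw_moment` and `central_moment` and the sample data are made up for illustration; they are not part of any library used in this notebook.

```python
import numpy as np


def raw_moment(values, order):
    """n-th raw moment: the average of values**n."""
    values = np.asarray(values, dtype=float)
    return np.mean(values**order)


def central_moment(values, order):
    """n-th central moment: the average of (values - mean)**n."""
    values = np.asarray(values, dtype=float)
    return np.mean((values - values.mean()) ** order)


data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # made-up sample

mean = raw_moment(data, 1)  # first raw moment
variance = central_moment(data, 2)  # second central moment (population variance)

print(mean, variance)
```

The second central moment computed this way matches `np.var(data)`, which also divides by the number of samples rather than by one less.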

Spectral Centroid#

Load an audio file:

filename = mirdotcom.get_audio("simple_loop.wav")
x, sr = librosa.load(filename)
ipd.Audio(x, rate=sr)

The spectral centroid (Wikipedia) indicates the frequency around which the energy of a spectrum is centered. It is a weighted mean of the bin frequencies, with the spectral magnitudes as weights:

\[ f_c = \frac{\sum_k S(k) f(k)}{\sum_k S(k)} \]

where \(S(k)\) is the spectral magnitude at frequency bin \(k\) and \(f(k)\) is the frequency at bin \(k\).
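As a rough illustration of the formula, the sketch below computes the centroid of a single synthetic frame with plain NumPy. The sample rate, frame length, Hann window, and the two test tones are all assumptions chosen for the example, not values taken from the audio file used here.

```python
import numpy as np

sr = 22050  # sample rate in Hz (assumed)
n = 2048  # frame length in samples (assumed)
time = np.arange(n) / sr

# Toy frame: two sinusoids of equal amplitude at 440 Hz and 880 Hz.
frame = np.sin(2 * np.pi * 440 * time) + np.sin(2 * np.pi * 880 * time)

S = np.abs(np.fft.rfft(frame * np.hanning(n)))  # spectral magnitude S(k)
f = np.fft.rfftfreq(n, d=1 / sr)  # bin frequencies f(k) in Hz

centroid = np.sum(S * f) / np.sum(S)  # weighted mean of f(k)
print(centroid)  # lands roughly midway between the two tones
```

Because both tones carry about the same magnitude, the weighted mean falls near the midpoint of 440 Hz and 880 Hz.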

librosa.feature.spectral_centroid computes the spectral centroid for each frame in a signal:

spectral_centroids = librosa.feature.spectral_centroid(y=x, sr=sr)[0]
spectral_centroids.shape
(97,)

Compute the time variable for visualization:

frames = range(len(spectral_centroids))
t = librosa.frames_to_time(frames, sr=sr)
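The conversion itself is simple arithmetic. The sketch below reproduces it in NumPy, assuming a sample rate of 22050 Hz and a hop length of 512 samples, which to my knowledge are the defaults of librosa.load and of librosa's spectral feature functions:

```python
import numpy as np

sr = 22050  # assumed sample rate in Hz
hop_length = 512  # assumed hop length in samples

frames = np.arange(5)
t = frames * hop_length / sr  # time in seconds at which each frame starts
print(t)
```

If the audio was loaded at a different sample rate or the features were computed with a different hop length, the same values have to be used in the conversion, otherwise the overlay will not line up with the waveform.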

Define a helper function to normalize the spectral centroid for visualization:

def normalize(x, axis=0):
    return sklearn.preprocessing.minmax_scale(x, axis=axis)
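For a one-dimensional array, min-max scaling amounts to the following. This is a NumPy-only sketch with a hypothetical helper name, `minmax_normalize`, shown only to make the scaling explicit:

```python
import numpy as np


def minmax_normalize(values):
    """Rescale a 1-D array linearly so its minimum maps to 0 and its maximum to 1."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # constant input: nothing to scale
    return (values - values.min()) / span


scaled = minmax_normalize([1000.0, 2500.0, 4000.0])
print(scaled)  # the three values map to 0, 0.5 and 1
```

After scaling, the feature no longer carries units of Hz. It only shows where the feature is relatively high or low, which is enough for overlaying it on a waveform.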

Plot the spectral centroid along with the waveform:

librosa.display.waveshow(x, sr=sr, alpha=0.4)
plt.plot(
    t, normalize(spectral_centroids), color="r"
)  # normalize for visualization purposes
plt.ylabel("Amplitude")
[Figure: waveform with the normalized spectral centroid overlaid in red]

As with the zero crossing rate, there is a spurious rise in the spectral centroid at the beginning of the signal. The silence at the beginning has such a small amplitude that high-frequency components have a chance to dominate. One workaround is to add a small constant to the signal before computing the spectral centroid. The constant adds energy at 0 Hz, which pulls the centroid toward zero in quiet portions:

spectral_centroids = librosa.feature.spectral_centroid(y=x + 0.01, sr=sr)[0]
librosa.display.waveshow(x, sr=sr, alpha=0.4)
plt.plot(
    t, normalize(spectral_centroids), color="r"
)  # normalize for visualization purposes
plt.ylabel("Amplitude")
[Figure: waveform with the normalized spectral centroid, computed on x + 0.01, overlaid in red]
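The effect of the added constant can be checked on a synthetic near-silent frame. In this sketch the "silence" is simulated with very low-level white noise, and the noise level, frame length, and rectangular window are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 22050  # assumed sample rate in Hz
n = 2048  # assumed frame length in samples

# Near-silent frame: very low-level white noise, a stand-in for recorded silence.
quiet = 1e-4 * rng.standard_normal(n)


def centroid(frame):
    """Spectral centroid of one frame, following the weighted-mean formula."""
    S = np.abs(np.fft.rfft(frame))
    f = np.fft.rfftfreq(n, d=1 / sr)
    return np.sum(S * f) / np.sum(S)


print(centroid(quiet))  # flat noise spectrum: the centroid sits far from 0 Hz
print(centroid(quiet + 0.01))  # the offset piles magnitude into the 0 Hz bin
```

In louder frames the signal's own spectrum dwarfs the 0 Hz component, so the constant changes the centroid much less there.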

Spectral Bandwidth#

librosa.feature.spectral_bandwidth computes the order-\(p\) spectral bandwidth:

\[ \left( \sum_k S(k) \left| f(k) - f_c \right|^p \right)^{\frac{1}{p}} \]

where \(S(k)\) is the spectral magnitude at frequency bin \(k\), \(f(k)\) is the frequency at bin \(k\), and \(f_c\) is the spectral centroid. By default, librosa normalizes \(S\) so that it sums to one within each frame and uses \(p = 2\). When \(p = 2\), this is like a weighted standard deviation.
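The formula can be tried on a toy spectrum. The sketch below assumes that the magnitudes are first normalized to sum to one and that the deviation from the centroid is taken in absolute value, which is my understanding of librosa's default behavior. The helper name and the three-bin spectra are invented for illustration.

```python
import numpy as np


def spectral_bandwidth(S, f, p=2):
    """Order-p bandwidth of one frame, with S normalized to sum to 1 (assumed)."""
    S = S / np.sum(S)  # treat the spectrum as a distribution
    f_c = np.sum(S * f)  # spectral centroid
    return np.sum(S * np.abs(f - f_c) ** p) ** (1 / p)


f = np.array([100.0, 200.0, 300.0])  # toy bin frequencies in Hz
narrow = np.array([0.0, 1.0, 0.0])  # all magnitude in one bin
wide = np.array([1.0, 0.0, 1.0])  # magnitude split between the outer bins

print(spectral_bandwidth(narrow, f))  # 0.0
print(spectral_bandwidth(wide, f))  # 100.0
```

Both spectra have the same centroid (200 Hz), so the bandwidth is what tells them apart: it measures how far the magnitude spreads around the centroid.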

spectral_bandwidth_2 = librosa.feature.spectral_bandwidth(y=x + 0.01, sr=sr)[0]
spectral_bandwidth_3 = librosa.feature.spectral_bandwidth(y=x + 0.01, sr=sr, p=3)[0]
spectral_bandwidth_4 = librosa.feature.spectral_bandwidth(y=x + 0.01, sr=sr, p=4)[0]
librosa.display.waveshow(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_bandwidth_2), color="r")
plt.plot(t, normalize(spectral_bandwidth_3), color="g")
plt.plot(t, normalize(spectral_bandwidth_4), color="y")
plt.legend(("p = 2", "p = 3", "p = 4"))
plt.ylabel("Amplitude")
[Figure: waveform with the normalized spectral bandwidth for p = 2 (red), p = 3 (green), and p = 4 (yellow)]

Spectral Contrast#

Spectral contrast considers the spectral peak, the spectral valley, and their difference in each frequency subband.

librosa.feature.spectral_contrast computes the spectral contrast for each time frame. With the default n_bands=6, the output has n_bands + 1 = 7 rows, one per subband:

spectral_contrast = librosa.feature.spectral_contrast(y=x, sr=sr)
spectral_contrast.shape
(7, 97)
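To give a feel for the peak-versus-valley idea and for why seven rows come back, here is a simplified sketch for one synthetic frame. It is not librosa's implementation: the octave-spaced band edges starting at 200 Hz reflect my reading of librosa's defaults, while the 20% quantile, the magnitude-based dB difference, and the test signal are simplifications chosen for the example.

```python
import numpy as np

sr = 22050  # assumed sample rate in Hz
n = 2048  # assumed frame length in samples
fmin, n_bands = 200.0, 6  # assumed to mirror librosa's defaults

# Octave-spaced edges 0, 200, 400, ..., 12800 Hz give n_bands + 1 subbands.
edges = np.concatenate(([0.0], fmin * 2.0 ** np.arange(n_bands + 1)))

rng = np.random.default_rng(0)
time = np.arange(n) / sr
# Toy frame: a 1 kHz tone over faint white noise.
frame = np.sin(2 * np.pi * 1000 * time) + 0.01 * rng.standard_normal(n)

S = np.abs(np.fft.rfft(frame * np.hanning(n)))
f = np.fft.rfftfreq(n, d=1 / sr)


def band_contrast(S_band, quantile=0.2):
    """Peak minus valley in dB: mean of the largest bins vs. mean of the smallest."""
    ordered = np.sort(S_band)
    m = max(1, int(round(quantile * len(ordered))))
    valley, peak = ordered[:m].mean(), ordered[-m:].mean()
    return 20 * np.log10(peak / valley)


contrast = np.array(
    [band_contrast(S[(f >= lo) & (f < hi)]) for lo, hi in zip(edges[:-1], edges[1:])]
)
print(contrast.shape)  # (7,)
print(np.argmax(contrast))  # index of the subband holding the 1 kHz tone
```

Subbands that contain only noise have a small gap between peak and valley, while the subband with the tone has a large one. That gap is what separates tonal from noise-like content within a band.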

Display:

plt.imshow(
    normalize(spectral_contrast, axis=1), aspect="auto", origin="lower", cmap="coolwarm"
)
plt.ylabel("Subband")
plt.xlabel("Time (frames)")
[Figure: normalized spectral contrast, with subband on the y-axis and time in frames on the x-axis]

Spectral Rolloff#

Spectral rolloff is the frequency below which a specified percentage of the total spectral energy lies, e.g. 85%.
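The definition translates into a cumulative sum followed by a threshold search. The sketch below works on a toy magnitude spectrum. The helper name and the data are made up, and accumulating magnitudes rather than squared magnitudes is an assumption based on my understanding of how librosa treats its default input.

```python
import numpy as np


def spectral_rolloff(S, f, roll_percent=0.85):
    """Lowest bin frequency where the cumulative magnitude reaches roll_percent of the total."""
    cumulative = np.cumsum(S)
    threshold = roll_percent * cumulative[-1]
    k = np.searchsorted(cumulative, threshold)  # first k with cumulative[k] >= threshold
    return f[k]


f = np.array([0.0, 100.0, 200.0, 300.0, 400.0])  # toy bin frequencies in Hz
S = np.array([4.0, 3.0, 2.0, 0.5, 0.5])  # toy magnitudes, total = 10

print(spectral_rolloff(S, f))  # 200.0
print(spectral_rolloff(S, f, roll_percent=0.5))  # 100.0
```

Lowering the percentage moves the rolloff toward lower frequencies, because less of the spectrum has to be accumulated before the threshold is reached.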

librosa.feature.spectral_rolloff computes the rolloff frequency for each frame in a signal:

spectral_rolloff = librosa.feature.spectral_rolloff(y=x + 0.01, sr=sr)[0]
librosa.display.waveshow(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_rolloff), color="r")
plt.ylabel("Amplitude")
[Figure: waveform with the normalized spectral rolloff overlaid in red]