Basic Feature Extraction


Before we can classify audio, we must extract the characteristics of each signal that are most relevant to the problem we are trying to solve. For example, to classify instruments by timbre, we want features that distinguish sounds by their timbre rather than their pitch. To perform pitch detection, we want the opposite: features that distinguish pitch rather than timbre.

This process is known as feature extraction.

from pathlib import Path

import matplotlib.pyplot as plt
import librosa
import librosa.display
import numpy
import sklearn.preprocessing  # import the submodule explicitly so sklearn.preprocessing is available

from mirdotcom import mirdotcom

mirdotcom.init()

Let’s begin with twenty audio files: ten kick drum samples and ten snare drum samples. Each file contains a single drum hit.

Read and store each signal:

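# Path.glob only accepts relative patterns, so glob from the audio directory itself.
# sorted() makes the file order deterministic across platforms.
# librosa.load returns (signal, sample_rate) and resamples to 22050 Hz by default.
audio_dir = Path(mirdotcom.AUDIO_DIRECTORY) / "drum_samples" / "train"
kick_signals = [librosa.load(p)[0] for p in sorted(audio_dir.glob("kick_*.mp3"))]
snare_signals = [librosa.load(p)[0] for p in sorted(audio_dir.glob("snare_*.mp3"))]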

len(kick_signals)

len(snare_signals)

Display the kick drum signals:

plt.figure(figsize=(15, 6))
for i, x in enumerate(kick_signals):
    plt.subplot(2, 5, i + 1)
    librosa.display.waveshow(x[:10000])

    # Y-axis label only on the first column
    if i == 0 or i == 5:
        plt.ylabel("Amplitude")

    # X-axis label only on the bottom row
    if i >= 5:
        plt.xlabel("Time")
    else:
        plt.xlabel("")

    plt.gca().set_xticks([0, 0.25, 0.5])
    plt.ylim(-1, 1)

[Figure: waveforms of the ten kick drum samples, first 10,000 samples of each]

Display the snare drum signals:

plt.figure(figsize=(15, 6))
for i, x in enumerate(snare_signals):
    plt.subplot(2, 5, i + 1)
    librosa.display.waveshow(x[:10000])

    # Y-axis label only on the first column
    if i == 0 or i == 5:
        plt.ylabel("Amplitude")

    # X-axis label only on the bottom row
    if i >= 5:
        plt.xlabel("Time")
    else:
        plt.xlabel("")

    plt.gca().set_xticks([0, 0.25, 0.5])
    plt.ylim(-1, 1)

[Figure: waveforms of the ten snare drum samples, first 10,000 samples of each]

Constructing a Feature Vector

A feature vector is simply an ordered collection of features. Here is a simple function that constructs a two-dimensional feature vector from a signal, using the zero crossing rate and the spectral centroid of the first analysis frame:

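def extract_features(signal):
    # Both librosa functions return an array of shape (1, n_frames);
    # [0, 0] keeps only the value from the first frame.
    return [
        librosa.feature.zero_crossing_rate(y=signal)[0, 0],
        # Spectral centroid in Hz; relies on librosa's default sr=22050,
        # which matches the rate used when loading the signals.
        librosa.feature.spectral_centroid(y=signal)[0, 0],
    ]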
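To make the two features concrete, here is a minimal NumPy sketch that computes them by hand for a single frame. It is an illustration, not librosa’s implementation: librosa frames the signal itself (by default 2048-sample frames with a hop of 512, centered with padding) and computes the centroid from an STFT, so its numbers will differ somewhat. The helper names frame_zcr and frame_centroid, and the use of synthetic sine tones, are assumptions of this sketch.

```python
import numpy as np

def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose sign differs (hypothetical helper)."""
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])

def frame_centroid(frame, sr):
    """Magnitude-weighted mean frequency in Hz of one Hann-windowed frame (hypothetical helper)."""
    window = np.hanning(len(frame))
    magnitudes = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * magnitudes) / np.sum(magnitudes)

sr = 22050
t = np.arange(2048) / sr                 # one 2048-sample frame
low = np.sin(2 * np.pi * 100 * t)        # synthetic low tone
high = np.sin(2 * np.pi * 3000 * t)      # synthetic brighter tone

print(frame_zcr(low), frame_centroid(low, sr))
print(frame_zcr(high), frame_centroid(high, sr))
```

A sine at f Hz crosses zero about 2f times per second, so the 100 Hz tone gives a zero crossing rate of roughly 200 / 22050, about 0.009 per sample, while the 3000 Hz tone gives about 0.27; the centroids land near 100 Hz and 3000 Hz respectively. Both features grow as energy moves toward higher frequencies, which is why they are plausible candidates for separating low kicks from brighter snares.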

To collect the feature vectors for every signal in a class, apply extract_features in a list comprehension and convert the result to a NumPy array with one row per signal and one column per feature:

kick_features = numpy.array([extract_features(x) for x in kick_signals])
snare_features = numpy.array([extract_features(x) for x in snare_signals])

Visualize the differences in features by plotting separate histograms for each of the classes:

plt.figure(figsize=(14, 5))
plt.hist(kick_features[:, 0], color="b", range=(0, 0.2), alpha=0.5, bins=20)
plt.hist(snare_features[:, 0], color="r", range=(0, 0.2), alpha=0.5, bins=20)
plt.legend(("kicks", "snares"))
plt.xlabel("Zero Crossing Rate")
plt.ylabel("Count")

[Figure: overlaid histograms of zero crossing rate, kicks in blue and snares in red]

plt.figure(figsize=(14, 5))
plt.hist(kick_features[:, 1], color="b", range=(0, 4000), bins=30, alpha=0.6)
plt.hist(snare_features[:, 1], color="r", range=(0, 4000), bins=30, alpha=0.6)
plt.legend(("kicks", "snares"))
plt.xlabel("Spectral Centroid (Hz)")
plt.ylabel("Count")

[Figure: overlaid histograms of spectral centroid, kicks in blue and snares in red]

Feature Scaling

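The features that we used in the previous example, zero crossing rate and spectral centroid, are expressed in different units and on very different scales: zero crossing rate is a fraction of samples between 0 and 1, while spectral centroid is a frequency in hertz (the histogram above spans 0 to 4000 Hz). This discrepancy can cause problems during classification, because methods that compare examples by distance, such as k-nearest neighbors, will be dominated by the feature with the larger numeric range. Therefore, we will rescale each feature dimension (each column, not each row) to a common range and store the scaling parameters for later use.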
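To see how unscaled features skew a distance comparison, here is a small sketch using Euclidean distance on three feature vectors. The numbers are invented for illustration and are not taken from the drum samples.

```python
import numpy as np

# Hypothetical [zero crossing rate, spectral centroid (Hz)] feature vectors.
a = np.array([0.02, 1500.0])
b = np.array([0.15, 1600.0])  # very different ZCR, similar centroid
c = np.array([0.02, 1900.0])  # identical ZCR, different centroid

d_ab = np.linalg.norm(a - b)
d_ac = np.linalg.norm(a - c)
print(d_ab)  # ~100.0: the large ZCR difference barely registers
print(d_ac)  # 400.0
```

Even though b’s zero crossing rate is 7.5 times a’s, b is judged four times closer to a than c is, because the centroid column is measured in the hundreds while the ZCR column is measured in hundredths.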

Many techniques exist for scaling features. For now, we’ll use sklearn.preprocessing.MinMaxScaler, which linearly maps each feature dimension onto a target range. Its default range is 0 to 1; passing feature_range=(-1, 1), as we do below, maps each feature dimension to the range -1 to 1 instead.
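For a feature table with rows i (signals) and columns j (features), MinMaxScaler with feature_range=(a, b) applies the following mapping to each column, where the minimum and maximum are taken over the rows k of the data it was fitted on:

```latex
x'_{ij} = a + \frac{\left(x_{ij} - \min_k x_{kj}\right)(b - a)}{\max_k x_{kj} - \min_k x_{kj}}
```

With a = -1 and b = 1, each column’s minimum maps to -1 and its maximum maps to 1.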

Let’s concatenate all of our feature vectors into one feature table:

feature_table = numpy.vstack((kick_features, snare_features))
print(feature_table.shape)

(20, 2)

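Fit the scaler to the feature table and scale each feature dimension to the range -1 to 1. fit_transform learns each column’s minimum and maximum and applies the mapping in one step: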

scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
training_features = scaler.fit_transform(feature_table)
print(training_features.min(axis=0))
print(training_features.max(axis=0))

[-1. -1.]
[1. 1.]
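The scaling parameters are fitted once, on training data, and then reused for any new example. A fitted MinMaxScaler stores them in its data_min_ and data_max_ attributes, and scaler.transform applies them. The NumPy-only sketch below mirrors that fit/transform split so the arithmetic is visible; the helper names fit_minmax and apply_minmax and all numbers are hypothetical.

```python
import numpy as np

def fit_minmax(table):
    """Store the per-column (per-feature) minimum and maximum (hypothetical helper)."""
    return table.min(axis=0), table.max(axis=0)

def apply_minmax(table, data_min, data_max, low=-1.0, high=1.0):
    """Map each column linearly so data_min -> low and data_max -> high.

    Unlike MinMaxScaler, this sketch does not guard against a constant
    column (zero range), which would cause a division by zero.
    """
    unit = (table - data_min) / (data_max - data_min)  # in [0, 1] on the fitted data
    return low + unit * (high - low)

# Invented [zero crossing rate, spectral centroid] rows standing in for training data.
train = np.array([[0.01, 500.0],
                  [0.05, 1500.0],
                  [0.11, 3500.0]])
data_min, data_max = fit_minmax(train)
scaled_train = apply_minmax(train, data_min, data_max)

# A new example is scaled with the *training* parameters,
# so it can land outside [-1, 1].
test = np.array([[0.13, 4000.0]])
scaled_test = apply_minmax(test, data_min, data_max)
print(scaled_train)
print(scaled_test)
```

With the scaler fitted above, the equivalent step for new feature vectors would be scaler.transform(new_features), where new_features is a hypothetical array built with extract_features. Calling fit_transform again on new data would instead discard the training parameters.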

Plot the scaled features:

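# Rows from kick_features come first in the stacked table, then snare_features.
n_kicks = len(kick_features)
plt.scatter(training_features[:n_kicks, 0], training_features[:n_kicks, 1], c="b", label="kicks")
plt.scatter(training_features[n_kicks:, 0], training_features[n_kicks:, 1], c="r", label="snares")
plt.legend()
plt.xlabel("Zero Crossing Rate (scaled)")
plt.ylabel("Spectral Centroid (scaled)")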

[Figure: scatter plot of scaled zero crossing rate against scaled spectral centroid, kicks in blue and snares in red]
