Unsupervised Instrument Classification Using K-Means#

import urllib.request

import IPython.display
import librosa
import matplotlib.pyplot as plt
import mir_eval
import numpy
import sklearn.cluster
import sklearn.preprocessing

from mirdotcom import mirdotcom

mirdotcom.init()

This tutorial is loosely based on Lab 3 (2010).

Read Audio#

Retrieve an audio file, load it into an array, and listen to it.

?urllib.request.urlretrieve
?librosa.load
?IPython.display.Audio
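A minimal sketch; the URL here is a placeholder, and the filename is one of the example files listed at the end of this notebook:

# Placeholder URL; substitute the actual location of your audio file.
url = 'https://example.com/audio/125_bounce.wav'
urllib.request.urlretrieve(url, filename='125_bounce.wav')

# Load the file into a one-dimensional array x at sampling rate fs.
x, fs = librosa.load('125_bounce.wav')

# Listen to the signal.
IPython.display.Audio(x, rate=fs)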

Detect Onsets#

Detect onsets in the audio signal:

?librosa.onset.onset_detect
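For example, with librosa's default parameters (a sketch, assuming x and fs from above):

onset_frames = librosa.onset.onset_detect(y=x, sr=fs)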

Convert the onsets from units of frames to seconds (and samples):

?librosa.frames_to_time
?librosa.frames_to_samples
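A sketch, continuing from onset_frames above:

onset_times = librosa.frames_to_time(onset_frames, sr=fs)
onset_samples = librosa.frames_to_samples(onset_frames)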

Listen to detected onsets:

?mir_eval.sonify.clicks
?IPython.display.Audio
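For example, mixing a click at each onset into the original signal:

clicks = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + clicks, rate=fs)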

Extract Features#

Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the librosa API reference.

First, define which features to extract:

def extract_features(x, fs):
    feature_1 = librosa.zero_crossings(x).sum()                       # zero crossing count
    feature_2 = librosa.feature.spectral_centroid(y=x, sr=fs).mean()  # spectral centroid; substitute any feature you like
    return [feature_1, feature_2]

For each onset, extract a feature vector from the signal:

# Assumptions:
# x: input audio signal
# fs: sampling frequency
# onset_samples: onsets in units of samples
frame_sz = int(fs * 0.100)  # 100 ms per frame; int() so it can be used as a slice bound
features = numpy.array(
    [extract_features(x[i : i + frame_sz], fs) for i in onset_samples]
)

Scale Features#

Use sklearn.preprocessing.MinMaxScaler to scale your features to be within [-1, 1].

?sklearn.preprocessing.MinMaxScaler
?sklearn.preprocessing.MinMaxScaler.fit_transform
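A sketch, using a feature range of [-1, 1] rather than the default [0, 1]:

scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = scaler.fit_transform(features)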

Plot Features#

Use scatter to plot features on a 2-D plane. (Choose two features at a time.)

?plt.scatter
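For example, plotting the two scaled features against each other:

plt.scatter(features_scaled[:, 0], features_scaled[:, 1])
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')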

Cluster Using K-Means#

Use KMeans to cluster your features and compute labels.

?sklearn.cluster.KMeans
?sklearn.cluster.KMeans.fit_predict
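A sketch; the choice of two clusters is an assumption, e.g. two instrument types:

model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)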

Plot Features by Class Label#

Use scatter, but this time choose a different marker color (or type) for each class.

?plt.scatter
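For example, with two clusters:

plt.scatter(features_scaled[labels == 0, 0], features_scaled[labels == 0, 1], c='b', label='Class 0')
plt.scatter(features_scaled[labels == 1, 0], features_scaled[labels == 1, 1], c='r', label='Class 1')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.legend()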

Listen to Click Track#

Create a beep for each onset within a class:

beeps = mir_eval.sonify.clicks(onset_times[labels == 0], fs, length=len(x))
?IPython.display.Audio
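For example, mixing the beeps with the original signal:

IPython.display.Audio(x + beeps, rate=fs)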

Listen to Clustered Frames#

Use the concatenate_segments function from the feature sonification exercise to concatenate frames from the same cluster into one signal. Then listen to the signal.

def concatenate_segments(segments, fs=44100, pad_time=0.300):
    padded_segments = [
        numpy.concatenate([segment, numpy.zeros(int(pad_time * fs))])
        for segment in segments
    ]
    return numpy.concatenate(padded_segments)
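For example, to gather the 100-millisecond frames belonging to cluster 0 (assuming x, onset_samples, frame_sz, and labels from earlier):

segments = [
    x[i : i + frame_sz]
    for i, label in zip(onset_samples, labels)
    if label == 0
]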


concatenated_signal = concatenate_segments(segments, fs)
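Then listen:

IPython.display.Audio(concatenated_signal, rate=fs)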

Compare across separate classes. What do you hear?

For Further Exploration#

Use a different number of clusters in KMeans.

Use a different initialization method in KMeans.
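For example, changing both at once (the values here are arbitrary choices):

model = sklearn.cluster.KMeans(n_clusters=3, init='random')
labels = model.fit_predict(features_scaled)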

Use different features. Compare tonal features against timbral features.

?librosa.feature
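For instance, a tonal feature set might average chroma, while a timbral set might average MFCCs (a sketch; each is just one possible choice):

def extract_features_tonal(x, fs):
    # Mean chroma vector: emphasizes pitch content.
    return librosa.feature.chroma_stft(y=x, sr=fs).mean(axis=1)

def extract_features_timbral(x, fs):
    # Mean MFCC vector: emphasizes spectral envelope, i.e. timbre.
    return librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13).mean(axis=1)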

Use different audio files.

# filename = '1_bar_funk_groove.mp3'
# filename = '58bpm.wav'
# filename = '125_bounce.wav'
# filename = 'prelude_cmaj_10s.wav'