In [ ]:
import numpy, scipy, matplotlib.pyplot as plt, sklearn.preprocessing, sklearn.cluster, librosa, mir_eval, IPython.display, urllib.request

Unsupervised Instrument Classification Using K-Means

This lab is loosely based on Lab 3 (2010).

Read Audio

Retrieve an audio file, load it into an array, and listen to it.

In [ ]:
urllib.request.urlretrieve?
In [ ]:
librosa.load?
In [ ]:
IPython.display.Audio?
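
A minimal sketch of this step. The URL and filename below are only placeholders; substitute any short audio file you like.

In [ ]:
# Download a file (placeholder URL), load it into an array, and listen.
filename = urllib.request.urlretrieve('http://example.com/simple_loop.wav', filename='simple_loop.wav')[0]
x, fs = librosa.load(filename)
IPython.display.Audio(x, rate=fs)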

Detect Onsets

Detect onsets in the audio signal:

In [ ]:
librosa.onset.onset_detect?
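
A minimal sketch, assuming x and fs from the loading step:

In [ ]:
# Onsets are returned in units of frames (at librosa's default hop length of 512 samples).
onset_frames = librosa.onset.onset_detect(y=x, sr=fs)
print(onset_frames)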

Convert the onsets from units of frames to seconds (and samples):

In [ ]:
librosa.frames_to_time?
In [ ]:
librosa.frames_to_samples?
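
For example:

In [ ]:
onset_times = librosa.frames_to_time(onset_frames, sr=fs)   # onsets in seconds
onset_samples = librosa.frames_to_samples(onset_frames)     # onsets in samples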

Listen to detected onsets:

In [ ]:
mir_eval.sonify.clicks?
In [ ]:
IPython.display.Audio?
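
One way to do this is to sonify the onset times as clicks and mix them with the original signal (a sketch, assuming onset_times from the previous step):

In [ ]:
clicks = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + clicks, rate=fs)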

Extract Features

Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the librosa API reference.

First, define which features to extract:

In [ ]:
def extract_features(x, fs):
    feature_1 = librosa.zero_crossings(x).sum() # placeholder
    feature_2 = 0 # placeholder
    return [feature_1, feature_2]
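
As one possible completion (only an illustration; any two or more features will do), the second placeholder could be the mean spectral centroid:

In [ ]:
def extract_features(x, fs):
    # Number of zero crossings in the frame (a rough noisiness/brightness cue).
    zcr = librosa.zero_crossings(x).sum()
    # Mean spectral centroid of the frame, in Hz.
    centroid = librosa.feature.spectral_centroid(y=x, sr=fs).mean()
    return [zcr, centroid]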

For each onset, extract a feature vector from the signal:

In [ ]:
# Assumptions:
# x: input audio signal
# fs: sampling frequency
# onset_samples: onsets in units of samples
frame_sz = int(fs*0.100)  # 100-ms frame length in samples; must be an integer to use as a slice index
features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])

Scale Features

Use sklearn.preprocessing.MinMaxScaler to scale your features to be within [-1, 1].

In [ ]:
sklearn.preprocessing.MinMaxScaler?
In [ ]:
sklearn.preprocessing.MinMaxScaler.fit_transform?
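
A sketch, assuming the features array from the previous step:

In [ ]:
# feature_range=(-1, 1) maps each feature column onto [-1, 1].
scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = scaler.fit_transform(features)
print(features_scaled.min(axis=0), features_scaled.max(axis=0))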

Plot Features

Use scatter to plot features on a 2-D plane. (Choose two features at a time.)

In [ ]:
plt.scatter?
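
For example, with the first two (scaled) features; the axis labels depend on which features you extracted:

In [ ]:
plt.scatter(features_scaled[:,0], features_scaled[:,1])
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')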

Cluster Using K-Means

Use KMeans to cluster your features and compute labels.

In [ ]:
sklearn.cluster.KMeans?
In [ ]:
sklearn.cluster.KMeans.fit_predict?
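
A sketch; the number of clusters here is only a starting point:

In [ ]:
model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)
print(labels)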

Plot Features by Class Label

Use scatter, but this time choose a different marker color (or type) for each class.

In [ ]:
plt.scatter?
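
For example, with two clusters (extend the pattern if you chose more):

In [ ]:
plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.legend(('Class 0', 'Class 1'))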

Listen to Click Track

Create a beep for each onset within a class:

In [ ]:
beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
In [ ]:
IPython.display.Audio?
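
For example, mix the beeps with the original signal so you can hear which onsets fell into class 0:

In [ ]:
IPython.display.Audio(x + beeps, rate=fs)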

Listen to Clustered Frames

Use the concatenate_segments function from the feature sonification exercise to concatenate frames from the same cluster into one signal. Then listen to the signal.

In [ ]:
def concatenate_segments(segments, fs=44100, pad_time=0.300):
    # Append 300 ms of silence to each segment so the excerpts are audibly separated.
    padded_segments = [numpy.concatenate([segment, numpy.zeros(int(pad_time*fs))]) for segment in segments]
    return numpy.concatenate(padded_segments)
# Assumption: segments is a list of signal excerpts taken from the same cluster (see the sketch below).
concatenated_signal = concatenate_segments(segments, fs)
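
A sketch of the whole step, assuming onset_samples, frame_sz, and labels from earlier: gather the excerpts assigned to one cluster, concatenate them, and listen.

In [ ]:
segments = [x[i:i+frame_sz] for i in onset_samples[labels==0]]
concatenated_signal = concatenate_segments(segments, fs)
IPython.display.Audio(concatenated_signal, rate=fs)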

Compare across separate classes. What do you hear?

For Further Exploration

Use a different number of clusters in KMeans.

Use a different initialization method in KMeans.
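
For example (the values here are only illustrative):

In [ ]:
model = sklearn.cluster.KMeans(n_clusters=3, init='random', n_init=10)
labels = model.fit_predict(features_scaled)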

Use different features. Compare tonal features against timbral features.

In [ ]:
librosa.feature?
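
For instance, a sketch of a tonal feature set (mean chroma vector) alongside a timbral one (mean MFCCs):

In [ ]:
def extract_features(x, fs):
    # Mean chroma vector: 12 values summarizing pitch-class content (tonal).
    chroma = librosa.feature.chroma_stft(y=x, sr=fs).mean(axis=1)
    # Mean MFCC vector: a coarse summary of spectral shape (timbral).
    mfcc = librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13).mean(axis=1)
    return numpy.concatenate([chroma, mfcc])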

Use different audio files.

In [ ]:
#filename = '1_bar_funk_groove.mp3'
#filename = '58bpm.wav'
#filename = '125_bounce.wav'
#filename = 'prelude_cmaj_10s.wav'