In [ ]:
import numpy, scipy, matplotlib.pyplot as plt, sklearn.preprocessing, sklearn.cluster, librosa, mir_eval, IPython.display, urllib.request

Unsupervised Instrument Classification Using K-Means

This lab is loosely based on Lab 3 (2010).

Read Audio

Retrieve an audio file, load it into an array, and listen to it.

In [ ]:
urllib.request.urlretrieve?
In [ ]:
librosa.load?
In [ ]:
IPython.display.Audio?
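
A minimal sketch of this step. The URL and filename below are only placeholders; substitute any short audio file you like.

In [ ]:
# Download a file (placeholder URL), load it into an array, and listen.
filename = urllib.request.urlretrieve('http://example.com/simple_loop.wav', filename='simple_loop.wav')[0]
x, fs = librosa.load(filename)
IPython.display.Audio(x, rate=fs)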

Detect Onsets

Detect onsets in the audio signal:

In [ ]:
librosa.onset.onset_detect?
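
A minimal sketch, assuming x and fs from the loading step:

In [ ]:
# Onsets are returned in units of frames (at librosa's default hop length of 512 samples).
onset_frames = librosa.onset.onset_detect(y=x, sr=fs)
print(onset_frames)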

Convert the onsets from units of frames to seconds (and samples):

In [ ]:
librosa.frames_to_time?
In [ ]:
librosa.frames_to_samples?
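
For example:

In [ ]:
onset_times = librosa.frames_to_time(onset_frames, sr=fs)   # onsets in seconds
onset_samples = librosa.frames_to_samples(onset_frames)     # onsets in samples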

Listen to detected onsets:

In [ ]:
mir_eval.sonify.clicks?
In [ ]:
IPython.display.Audio?
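
One way to do this is to sonify the onset times as clicks and mix them with the original signal (a sketch, assuming onset_times from the previous step):

In [ ]:
clicks = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + clicks, rate=fs)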

Extract Features

Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the librosa API reference.

First, define which features to extract:

In [ ]:
def extract_features(x, fs):
    feature_1 = librosa.zero_crossings(x).sum() # placeholder
    feature_2 = 0 # placeholder
    return [feature_1, feature_2]
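
As one possible completion (only an illustration; any two or more features will do), the second placeholder could be the mean spectral centroid:

In [ ]:
def extract_features(x, fs):
    # Number of zero crossings in the frame (a rough noisiness/brightness cue).
    zcr = librosa.zero_crossings(x).sum()
    # Mean spectral centroid of the frame, in Hz.
    centroid = librosa.feature.spectral_centroid(y=x, sr=fs).mean()
    return [zcr, centroid]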

For each onset, extract a feature vector from the signal:

In [ ]:
# Assumptions:
# x: input audio signal
# fs: sampling frequency
# onset_samples: onsets in units of samples
frame_sz = int(fs*0.100)  # 100-ms frame length in samples; must be an integer to use as a slice index
features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])

Scale Features

Use sklearn.preprocessing.MinMaxScaler to scale your features to be within [-1, 1].

In [ ]:
sklearn.preprocessing.MinMaxScaler?
In [ ]:
sklearn.preprocessing.MinMaxScaler.fit_transform?
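
A sketch, assuming the features array from the previous step:

In [ ]:
# feature_range=(-1, 1) maps each feature column onto [-1, 1].
scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = scaler.fit_transform(features)
print(features_scaled.min(axis=0), features_scaled.max(axis=0))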

Plot Features

Use scatter to plot features on a 2-D plane. (Choose two features at a time.)

In [ ]:
plt.scatter?
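
For example, with the first two (scaled) features; the axis labels depend on which features you extracted:

In [ ]:
plt.scatter(features_scaled[:,0], features_scaled[:,1])
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')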

Cluster Using K-Means

Use KMeans to cluster your features and compute labels.

In [ ]:
sklearn.cluster.KMeans?
In [ ]:
sklearn.cluster.KMeans.fit_predict?
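
A sketch; the number of clusters here is only a starting point:

In [ ]:
model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)
print(labels)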

Plot Features by Class Label

Use scatter, but this time choose a different marker color (or type) for each class.

In [ ]:
plt.scatter?
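
For example, with two clusters (extend the pattern if you chose more):

In [ ]:
plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.legend(('Class 0', 'Class 1'))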

Listen to Click Track

Create a beep for each onset within a class:

In [ ]:
beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
In [ ]:
IPython.display.Audio?
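
For example, mix the beeps with the original signal so you can hear which onsets fell into class 0:

In [ ]:
IPython.display.Audio(x + beeps, rate=fs)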

Listen to Clustered Frames

Use the concatenate_segments function from the feature sonification exercise to concatenate frames from the same cluster into one signal. Then listen to the signal.

In [ ]:
def concatenate_segments(segments, fs=44100, pad_time=0.300):
    # Append 300 ms of silence to each segment so the excerpts are audibly separated.
    padded_segments = [numpy.concatenate([segment, numpy.zeros(int(pad_time*fs))]) for segment in segments]
    return numpy.concatenate(padded_segments)
# Assumption: segments is a list of signal excerpts taken from the same cluster (see the sketch below).
concatenated_signal = concatenate_segments(segments, fs)
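
A sketch of the whole step, assuming onset_samples, frame_sz, and labels from earlier: gather the excerpts assigned to one cluster, concatenate them, and listen.

In [ ]:
segments = [x[i:i+frame_sz] for i in onset_samples[labels==0]]
concatenated_signal = concatenate_segments(segments, fs)
IPython.display.Audio(concatenated_signal, rate=fs)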

Compare across separate classes. What do you hear?

For Further Exploration

Use a different number of clusters in KMeans.

Use a different initialization method in KMeans.
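
For example (the values here are only illustrative):

In [ ]:
model = sklearn.cluster.KMeans(n_clusters=3, init='random', n_init=10)
labels = model.fit_predict(features_scaled)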

Use different features. Compare tonal features against timbral features.

In [ ]:
librosa.feature?
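
For instance, a sketch of a tonal feature set (mean chroma vector) alongside a timbral one (mean MFCCs):

In [ ]:
def extract_features(x, fs):
    # Mean chroma vector: 12 values summarizing pitch-class content (tonal).
    chroma = librosa.feature.chroma_stft(y=x, sr=fs).mean(axis=1)
    # Mean MFCC vector: a coarse summary of spectral shape (timbral).
    mfcc = librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13).mean(axis=1)
    return numpy.concatenate([chroma, mfcc])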

Use different audio files.

In [ ]:
#filename = '1_bar_funk_groove.mp3'
#filename = '58bpm.wav'
#filename = '125_bounce.wav'
#filename = 'prelude_cmaj_10s.wav'