In [1]:
import numpy, scipy, matplotlib.pyplot as plt, sklearn, librosa, mir_eval, IPython.display, urllib
plt.rcParams['figure.figsize'] = (14, 4)

K-Means Clustering

Sometimes, an unsupervised learning technique is preferred. Perhaps you do not have access to adequate training data, or perhaps the training data's labels are not completely clear. Maybe you just want to quickly sort real-world, unseen, data into groups based on its feature similarity.

In such cases, clustering is a great option!

Download an audio file:

In [2]:
filename = '125_bounce.wav'
urllib.urlretrieve('http://audio.musicinformationretrieval.com/' + filename,
                   filename=filename)
Out[2]:
('125_bounce.wav', <httplib.HTTPMessage instance at 0x1138a6c20>)

Play the audio file:

In [3]:
IPython.display.Audio(filename)
Out[3]:

Load the audio file into an array.

In [4]:
x, fs = librosa.load(filename)
print fs
22050

Plot audio signal:

In [5]:
librosa.display.waveplot(x, fs)
Out[5]:
<matplotlib.collections.PolyCollection at 0x115623690>

Onset Detection

Detect onsets:

In [6]:
onset_frames = librosa.onset.onset_detect(x, sr=fs, delta=0.04, wait=4)
onset_times = librosa.frames_to_time(onset_frames, sr=fs)
onset_samples = librosa.frames_to_samples(onset_frames)

Listen to detected onsets:

In [7]:
x_with_beeps = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[7]:

Feature Extraction

Let's compute the zero crossing rate and energy for each detected onset.

Plot the zero crossing rate:

In [8]:
def extract_features(x, fs):
    zcr = librosa.zero_crossings(x).sum()
    energy = scipy.linalg.norm(x)
    return [zcr, energy]
In [9]:
frame_sz = fs*0.090
features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])
print features.shape
(36, 2)

Feature Scaling

Scale the features (using the scale function) from -1 to 1.

In [10]:
min_max_scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = min_max_scaler.fit_transform(features)
print features_scaled.shape
print features_scaled.min(axis=0)
print features_scaled.max(axis=0)
(36, 2)
[-1. -1.]
[ 1.  1.]

Plot the features.

In [11]:
plt.scatter(features_scaled[:,0], features_scaled[:,1])
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Spectral Centroid (scaled)')
Out[11]:
<matplotlib.text.Text at 0x115603410>

Using K-Means

Time to cluster! Let's initialize the algorithm to find three clusters.

In [12]:
model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)
print labels
[0 0 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 0 1 1 1 1 0 0]

Plot the results.

In [13]:
plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Energy (scaled)')
plt.legend(('Class 0', 'Class 1'))
Out[13]:
<matplotlib.legend.Legend at 0x115c3ded0>

Listen to onsets assigned to Class 0:

In [14]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[14]:

Class 1:

In [15]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[15]:

Affinity Propagation

In scikit-learn, other clustering algorithms such as affinity propagation can cluster without defining the number of clusters beforehand.

All we need to do is swap out KMeans for AffinityPropagation:

In [16]:
model = sklearn.cluster.AffinityPropagation()
labels = model.fit_predict(features_scaled)
print labels
[0 0 2 1 2 0 1 0 2 2 0 1 2 1 2 0 1 0 2 2 0 1 2 1 2 0 1 0 2 0 2 1 2 2 0 0]

Plot features:

In [17]:
plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')
plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')
plt.scatter(features_scaled[labels==2,0], features_scaled[labels==2,1], c='y')
plt.xlabel('Zero Crossing Rate (scaled)')
plt.ylabel('Energy (scaled)')
plt.legend(('Class 0', 'Class 1', 'Class 2'))
Out[17]:
<matplotlib.legend.Legend at 0x115c81590>

Play a beep upon each frame in the same cluster:

Class 0:

In [18]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[18]:

Class 1:

In [19]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[19]:

Class 2:

In [20]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels==2], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Out[20]: