K-Means Clustering#
import IPython.display
import matplotlib.pyplot as plt
import librosa
import mir_eval
import numpy
import scipy.linalg
import sklearn.cluster
import sklearn.preprocessing
from mirdotcom import mirdotcom
mirdotcom.init()
Sometimes an unsupervised learning technique is preferred. Perhaps you do not have access to adequate training data, or the labels on the training data are unreliable. Or perhaps you simply want to sort real-world, unseen data into groups based on feature similarity.
In such cases, clustering is a great option!
Play the audio file:
filename = mirdotcom.get_audio("125_bounce.wav")
IPython.display.Audio(filename)
Load the audio file into an array.
x, fs = librosa.load(filename)
print(fs)
22050
Plot audio signal:
librosa.display.waveshow(x, sr=fs)
plt.ylabel("Amplitude")
Onset Detection#
Detect onsets:
onset_frames = librosa.onset.onset_detect(y=x, sr=fs, delta=0.04, wait=4)
onset_times = librosa.frames_to_time(onset_frames, sr=fs)
onset_samples = librosa.frames_to_samples(onset_frames)
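The two conversions above are simple arithmetic on the frame index. Here is a minimal sketch of that arithmetic, assuming librosa's default hop length of 512 samples and no extra frame offset; the function names and the example frame indices are made up for illustration.

```python
import numpy


def frames_to_samples_sketch(frames, hop_length=512):
    # Each analysis frame advances by hop_length samples.
    return numpy.asarray(frames) * hop_length


def frames_to_time_sketch(frames, sr=22050, hop_length=512):
    # Dividing the sample index by the sampling rate gives seconds.
    return frames_to_samples_sketch(frames, hop_length) / float(sr)


example_frames = numpy.array([0, 10, 43])  # hypothetical onset frames
example_samples = frames_to_samples_sketch(example_frames)
example_times = frames_to_time_sketch(example_frames)
print(example_samples)
print(example_times)
```

If you pass a different hop_length to onset_detect, pass the same value to the conversion functions, otherwise the onset times will be off.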
Listen to detected onsets:
x_with_beeps = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Feature Extraction#
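Let’s compute the number of zero crossings and the energy (measured here as the L2 norm) of a 90 ms segment starting at each detected onset.
Define a function that extracts both features from a segment, then apply it at every onset:
def extract_features(x, fs):
    # Count the sign changes in the segment.
    zcr = librosa.zero_crossings(x).sum()
    # L2 norm of the segment, used as a simple energy measure.
    energy = scipy.linalg.norm(x)
    return [zcr, energy]


# Length of the analysis segment: 90 ms, in samples.
frame_sz = int(fs * 0.090)
features = numpy.array(
    [extract_features(x[i : i + frame_sz], fs) for i in onset_samples]
)
print(features.shape)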
print(features.shape)
(37, 2)
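To get a feel for what these two features capture, here is a rough NumPy-only sketch applied to two synthetic tones. It is an approximation and not librosa's implementation: it counts sign changes between neighboring samples, so the count can differ slightly from librosa.zero_crossings. The function name, tone frequencies, and amplitudes are assumptions chosen for illustration.

```python
import numpy


def extract_features_sketch(segment):
    # A zero crossing is a sign change between consecutive samples.
    signs = numpy.signbit(segment)
    zero_crossings = int(numpy.count_nonzero(signs[1:] != signs[:-1]))
    # L2 norm: square root of the sum of squared samples.
    l2_norm = float(numpy.sqrt(numpy.sum(segment ** 2)))
    return [zero_crossings, l2_norm]


sr = 22050
t = numpy.arange(int(0.090 * sr)) / sr  # 90 ms of sample times

# Hypothetical stand-ins: a loud low tone and a quiet high tone.
low_tone = 0.5 * numpy.sin(2 * numpy.pi * 100 * t + 0.3)
high_tone = 0.1 * numpy.sin(2 * numpy.pi * 4000 * t + 0.3)

low_features = extract_features_sketch(low_tone)
high_features = extract_features_sketch(high_tone)
print(low_features)
print(high_features)
```

The high tone crosses zero far more often, while the louder low tone has the larger norm, so the two features pull such sounds apart along different axes.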
Feature Scaling#
Scale each feature to the range -1 to 1 using scikit-learn’s MinMaxScaler.
min_max_scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = min_max_scaler.fit_transform(features)
print(features_scaled.shape)
print(features_scaled.min(axis=0))
print(features_scaled.max(axis=0))
(37, 2)
[-1. -1.]
[1. 1.]
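Min-max scaling is a per-column linear map, so it is easy to reproduce by hand. The sketch below shows the formula in plain NumPy; the function name and the toy feature values are made up for illustration, and the guard for constant columns is an assumption of this sketch.

```python
import numpy


def min_max_scale(features, feature_range=(-1.0, 1.0)):
    lo, hi = feature_range
    col_min = features.min(axis=0)
    col_max = features.max(axis=0)
    # Avoid dividing by zero when a column is constant.
    span = numpy.where(col_max > col_min, col_max - col_min, 1.0)
    # Map each column to [0, 1], then stretch and shift to [lo, hi].
    unit = (features - col_min) / span
    return unit * (hi - lo) + lo


# Hypothetical (zero crossings, energy) rows.
toy = numpy.array([[10.0, 0.5], [250.0, 2.0], [700.0, 8.0]])
toy_scaled = min_max_scale(toy)
print(toy_scaled.min(axis=0))
print(toy_scaled.max(axis=0))
```

Scaling matters here because k-means relies on Euclidean distance: without it, the feature with the larger numeric range would dominate the clustering.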
Plot the features.
plt.scatter(features_scaled[:, 0], features_scaled[:, 1])
plt.xlabel("Zero Crossing Rate (scaled)")
plt.ylabel("Energy (scaled)")
Using K-Means#
Time to cluster! Let’s initialize the algorithm to find two clusters.
model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)
print(labels)
[0 0 0 1 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1 1 0 0]
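If you are curious what happens inside fit_predict, here is a minimal sketch of the basic k-means loop (often called Lloyd's algorithm). It is a simplification and not scikit-learn's implementation: scikit-learn chooses initial centroids with k-means++ by default, whereas here the caller supplies them. The function name and the toy points are assumptions for illustration.

```python
import numpy


def kmeans_sketch(points, initial_centroids, n_iter=100):
    centroids = numpy.array(initial_centroids, dtype=float)

    def assign(current_centroids):
        # Distance from every point to every centroid, then nearest one.
        diffs = points[:, None, :] - current_centroids[None, :, :]
        return numpy.linalg.norm(diffs, axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of the points assigned to it
        # (an empty cluster keeps its previous centroid).
        new_centroids = numpy.array([
            points[labels == k].mean(axis=0)
            if numpy.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if numpy.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids


# Two hypothetical, well-separated groups in scaled feature space.
toy_points = numpy.array([
    [-0.9, -0.8], [-0.8, -0.9], [-0.7, -0.7],
    [0.7, 0.8], [0.8, 0.9], [0.9, 0.7],
])
toy_labels, toy_centroids = kmeans_sketch(toy_points, toy_points[[0, 3]])
print(toy_labels)
print(toy_centroids)
```

The label numbers themselves are arbitrary: which group ends up as 0 or 1 depends on the initial centroids, which is also why repeated runs of KMeans can swap the labels.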
Plot the results.
plt.scatter(features_scaled[labels == 0, 0], features_scaled[labels == 0, 1], c="b")
plt.scatter(features_scaled[labels == 1, 0], features_scaled[labels == 1, 1], c="r")
plt.xlabel("Zero Crossing Rate (scaled)")
plt.ylabel("Energy (scaled)")
plt.legend(("Class 0", "Class 1"))
Listen to onsets assigned to Class 0:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels == 0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Class 1:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels == 1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Affinity Propagation#
In scikit-learn, other clustering algorithms such as affinity propagation can cluster without defining the number of clusters beforehand.
All we need to do is swap out KMeans for AffinityPropagation:
model = sklearn.cluster.AffinityPropagation()
labels = model.fit_predict(features_scaled)
print(labels)
[0 0 0 3 2 3 0 1 0 2 3 0 1 3 2 3 0 1 0 2 3 0 2 3 2 3 0 1 0 3 0 3 2 3 3 0 0]
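Because affinity propagation picks the number of clusters itself, it helps to check how many it found before plotting. A small sketch, using the labels printed above copied in as a literal array (the variable names are made up for illustration):

```python
import numpy

# Labels copied from the printed output above.
ap_labels = numpy.array([
    0, 0, 0, 3, 2, 3, 0, 1, 0, 2, 3, 0, 1, 3, 2, 3, 0, 1, 0,
    2, 3, 0, 2, 3, 2, 3, 0, 1, 0, 3, 0, 3, 2, 3, 3, 0, 0,
])

# Distinct cluster ids and how many onsets fall into each.
cluster_ids, cluster_sizes = numpy.unique(ap_labels, return_counts=True)
n_found = len(cluster_ids)
for cluster_id, size in zip(cluster_ids, cluster_sizes):
    print(f"Class {cluster_id}: {size} onsets")
```

In the notebook itself you would call numpy.unique on labels directly; the result tells you how many classes to plot and listen to.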
Plot features:
plt.scatter(features_scaled[labels == 0, 0], features_scaled[labels == 0, 1], c="b")
plt.scatter(features_scaled[labels == 1, 0], features_scaled[labels == 1, 1], c="r")
plt.scatter(features_scaled[labels == 2, 0], features_scaled[labels == 2, 1], c="y")
# The labels above also contain a fourth cluster (label 3), so plot it too.
plt.scatter(features_scaled[labels == 3, 0], features_scaled[labels == 3, 1], c="g")
plt.xlabel("Zero Crossing Rate (scaled)")
plt.ylabel("Energy (scaled)")
plt.legend(("Class 0", "Class 1", "Class 2", "Class 3"))
Play a beep at each onset assigned to the same cluster:
Class 0:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels == 0], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Class 1:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels == 1], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
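Class 2:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels == 2], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)
Class 3:
x_with_beeps = mir_eval.sonify.clicks(onset_times[labels == 3], fs, length=len(x))
IPython.display.Audio(x + x_with_beeps, rate=fs)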