Unsupervised Instrument Classification Using K-Means#
import urllib.request
import numpy
import matplotlib.pyplot as plt
import sklearn.cluster
import sklearn.preprocessing
import IPython.display
import librosa
import mir_eval
import mir_eval.sonify
This tutorial is loosely based on Lab 3 (2010).
Read Audio#
Retrieve an audio file, load it into an array, and listen to it.
?urllib.request.urlretrieve
?librosa.load
?IPython.display.Audio
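For example, a minimal sketch. The URL below is an assumption; substitute any short audio clip with clearly separated note or drum onsets.
# Hypothetical source; replace with your own file if this URL is unavailable.
filename = '125_bounce.wav'
url = 'http://audio.musicinformationretrieval.com/' + filename
urllib.request.urlretrieve(url, filename=filename)

# Load the file into a 1-D floating-point array x with sampling rate fs.
x, fs = librosa.load(filename)

# Listen to the signal.
IPython.display.Audio(x, rate=fs)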
Detect Onsets#
Detect onsets in the audio signal:
?librosa.onset.onset_detect
Convert the onsets from units of frames to seconds (and samples):
?librosa.frames_to_time
?librosa.frames_to_samples
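One possible sketch, assuming x and fs come from the loading step above:
# Estimate onsets as frame indices. backtrack=True pushes each onset back
# to the preceding local energy minimum, which yields cleaner segment starts.
onset_frames = librosa.onset.onset_detect(y=x, sr=fs, backtrack=True)

# Convert the frame indices into seconds and into sample indices.
onset_times = librosa.frames_to_time(onset_frames, sr=fs)
onset_samples = librosa.frames_to_samples(onset_frames)

print(onset_times)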
Listen to detected onsets:
?mir_eval.sonify.clicks
?IPython.display.Audio
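For example, superimpose a click at each estimated onset on the original signal (onset_times comes from the previous step):
# Synthesize a click track the same length as x and mix it with the signal.
clicks = mir_eval.sonify.clicks(onset_times, fs, length=len(x))
IPython.display.Audio(x + clicks, rate=fs)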
Extract Features#
Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the librosa API reference.
First, define which features to extract:
def extract_features(x, fs):
    feature_1 = librosa.zero_crossings(x).sum()  # placeholder
    feature_2 = 0                                # placeholder
    return [feature_1, feature_2]
For each onset, extract a feature vector from the signal:
# Assumptions:
# x: input audio signal
# fs: sampling frequency
# onset_samples: onsets in units of samples
frame_sz = int(fs * 0.100)  # 100 ms per frame; cast to int for slicing
features = numpy.array(
    [extract_features(x[i : i + frame_sz], fs) for i in onset_samples]
)
Scale Features#
Use sklearn.preprocessing.MinMaxScaler to scale your features to be within [-1, 1].
?sklearn.preprocessing.MinMaxScaler
?sklearn.preprocessing.MinMaxScaler.fit_transform
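A minimal sketch, assuming features is the array constructed above:
# Rescale each feature dimension independently into the range [-1, 1].
scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
features_scaled = scaler.fit_transform(features)
print(features_scaled.min(axis=0), features_scaled.max(axis=0))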
Plot Features#
Use scatter to plot features on a 2-D plane. (Choose two features at a time.)
?plt.scatter
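For example, plotting the first feature dimension against the second:
plt.scatter(features_scaled[:, 0], features_scaled[:, 1])
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.show()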
Cluster Using K-Means#
Use KMeans to cluster your features and compute labels.
?sklearn.cluster.KMeans
?sklearn.cluster.KMeans.fit_predict
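One way to do this, assuming two clusters (e.g., two instrument types in a simple drum loop):
# Fit k-means on the scaled features and obtain one cluster label per onset.
model = sklearn.cluster.KMeans(n_clusters=2)
labels = model.fit_predict(features_scaled)
print(labels)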
Plot Features by Class Label#
Use scatter, but this time choose a different marker color (or type) for each class.
?plt.scatter
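A sketch for two clusters; extend the colors if you chose more:
# Color each point according to its cluster label.
plt.scatter(features_scaled[labels == 0, 0], features_scaled[labels == 0, 1], c='b', label='Class 0')
plt.scatter(features_scaled[labels == 1, 0], features_scaled[labels == 1, 1], c='r', label='Class 1')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.legend()
plt.show()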
Listen to Click Track#
Create a beep for each onset within a class:
beeps = mir_eval.sonify.clicks(onset_times[labels == 0], fs, length=len(x))
?IPython.display.Audio
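Then mix the beeps with the original signal and listen. Repeat with labels == 1 (and any further classes) to hear which onsets landed in each cluster:
IPython.display.Audio(x + beeps, rate=fs)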
Listen to Clustered Frames#
Use the concatenate_segments function from the feature sonification exercise to concatenate frames from the same cluster into one signal. Then listen to the signal.
def concatenate_segments(segments, fs=44100, pad_time=0.300):
    padded_segments = [
        numpy.concatenate([segment, numpy.zeros(int(pad_time * fs))])
        for segment in segments
    ]
    return numpy.concatenate(padded_segments)
concatenated_signal = concatenate_segments(segments, fs)
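Here, segments is a list of the per-onset frames belonging to one cluster. For example, for cluster 0, reusing onset_samples, frame_sz, and labels from the earlier steps:
# Gather the 100 ms frame starting at each onset assigned to cluster 0.
segments = [x[i : i + frame_sz] for i in onset_samples[labels == 0]]
concatenated_signal = concatenate_segments(segments, fs)
IPython.display.Audio(concatenated_signal, rate=fs)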
Compare across separate classes. What do you hear?
For Further Exploration#
Use a different number of clusters in KMeans.
Use a different initialization method in KMeans. (A brief sketch of both appears at the end of this section.)
Use different features. Compare tonal features against timbral features.
?librosa.feature
Use different audio files.
# filename = '1_bar_funk_groove.mp3'
# filename = '58bpm.wav'
# filename = '125_bounce.wav'
# filename = 'prelude_cmaj_10s.wav'
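As a starting point for the first two suggestions, a sketch that changes both the number of clusters and the initialization method (the parameter values here are arbitrary):
# Three clusters with random initialization instead of the default k-means++.
model = sklearn.cluster.KMeans(n_clusters=3, init='random', n_init=10)
labels = model.fit_predict(features_scaled)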