%matplotlib inline
import matplotlib.pyplot as plt, IPython.display as ipd, numpy
import librosa, librosa.display
import stanford_mir; stanford_mir.init()
Peak picking is the act of locating peaks in a signal. For example, in onset detection, we may want to find peaks in a novelty function. These peaks would correspond to the musical onsets.
Let's load an example audio file.
x, sr = librosa.load('audio/58bpm.wav')
print(x.shape, sr)
Listen to the audio file:
ipd.Audio(x, rate=sr)
Plot the signal:
plt.figure(figsize=(14, 5))
librosa.display.waveshow(x, sr=sr)  # waveplot was removed in librosa 0.10
Compute an onset envelope:
hop_length = 256
onset_envelope = librosa.onset.onset_strength(y=x, sr=sr, hop_length=hop_length)
onset_envelope.shape
Generate a time variable:
N = len(x)
T = N/float(sr)
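# Time of each frame in seconds (frame index * hop_length / sr)
t = librosa.frames_to_time(numpy.arange(len(onset_envelope)), sr=sr, hop_length=hop_length)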
Plot the onset envelope:
plt.figure(figsize=(14, 5))
plt.plot(t, onset_envelope)
plt.xlabel('Time (sec)')
plt.xlim(xmin=0)
plt.ylim(0)
In this onset strength envelope, we clearly see many peaks. Some correspond to onsets, and others don't. How do we create a peak picker that detects true peaks while avoiding spurious ones?
librosa.util has a peak_pick function. We can tune its parameters to suit our signal. Let's see how it works:
def peak_pick(x, pre_max, post_max, pre_avg, post_avg, delta, wait):
    '''Uses a flexible heuristic to pick peaks in a signal.

    A sample n is selected as a peak if the corresponding x[n]
    fulfills the following three conditions:

    1. `x[n] == max(x[n - pre_max:n + post_max])`
    2. `x[n] >= mean(x[n - pre_avg:n + post_avg]) + delta`
    3. `n - previous_n > wait`

    where `previous_n` is the last sample picked as a peak (greedily).
    '''
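To make the three conditions concrete, here is a minimal NumPy sketch that applies them literally, one sample at a time. The name simple_peak_pick and the toy envelope are hypothetical, and the sketch assumes post_max and post_avg are at least 1 so that each window contains x[n]. It is not librosa's implementation, which may treat the signal edges differently.

```python
import numpy as np

def simple_peak_pick(x, pre_max, post_max, pre_avg, post_avg, delta, wait):
    """Hypothetical re-implementation of the three documented conditions."""
    x = np.asarray(x, dtype=float)
    peaks = []
    previous_n = -np.inf  # no peak has been picked yet
    for n in range(len(x)):
        # Windows follow the docstring's slice notation, clipped at the start.
        max_window = x[max(0, n - pre_max):n + post_max]
        avg_window = x[max(0, n - pre_avg):n + post_avg]
        is_local_max = x[n] == max_window.max()          # condition 1
        above_mean = x[n] >= avg_window.mean() + delta   # condition 2
        far_enough = n - previous_n > wait               # condition 3
        if is_local_max and above_mean and far_enough:
            peaks.append(n)
            previous_n = n  # greedy: later candidates are measured from here
    return np.array(peaks, dtype=int)

# Toy envelope: a small bump at frame 1 and clear peaks at frames 4, 7 and 12.
envelope = [0, 0.3, 0, 0, 5, 0, 0, 4, 0, 0, 0, 0, 6, 0, 0]

peaks_wait4 = simple_peak_pick(envelope, 2, 3, 2, 3, delta=0.5, wait=4)
peaks_wait2 = simple_peak_pick(envelope, 2, 3, 2, 3, delta=0.5, wait=2)
print(peaks_wait4)  # frames 4 and 12
print(peaks_wait2)  # frames 4, 7 and 12
```

The bump at frame 1 is a local maximum but fails the mean-plus-delta test. The peak at frame 7 passes the first two tests but is dropped when wait=4, because it lies only 3 frames after the peak picked at frame 4.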
Get the frame indices of the peaks:
onset_frames = librosa.util.peak_pick(onset_envelope, pre_max=7, post_max=7, pre_avg=7, post_avg=7, delta=0.5, wait=5)
onset_frames
Plot the onset envelope along with the detected peaks:
plt.figure(figsize=(14, 5))
plt.plot(t, onset_envelope)
plt.grid(False)
plt.vlines(t[onset_frames], 0, onset_envelope.max(), color='r', alpha=0.7)
plt.xlabel('Time (sec)')
plt.xlim(0, T)
plt.ylim(0)
Superimpose a click track upon the original:
clicks = librosa.clicks(frames=onset_frames, sr=sr, hop_length=hop_length, length=N)
ipd.Audio(x+clicks, rate=sr)
Using the parameters above, the peak picker appears to have high precision, i.e. few false positives. However, recall can be improved: it misses several onsets that actually occur in the audio signal.
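Precision and recall can be made quantitative when reference onset times are available. The sketch below is a simplified illustration: the helper names, the 50 ms tolerance, and the onset times are all assumptions for the example, not annotations of the audio file used here. It matches onsets greedily, which is simpler than the optimal matching an evaluation library such as mir_eval would use.

```python
def count_hits(reference, estimated, tolerance=0.05):
    """Greedily match each estimated onset to at most one unused
    reference onset within `tolerance` seconds (hypothetical helper)."""
    reference = sorted(reference)
    used = set()
    hits = 0
    for est in sorted(estimated):
        for i, ref in enumerate(reference):
            if i not in used and abs(est - ref) <= tolerance:
                used.add(i)
                hits += 1
                break
    return hits

def precision_recall(reference, estimated, tolerance=0.05):
    hits = count_hits(reference, estimated, tolerance)
    # Precision: fraction of detections that are correct.
    precision = hits / len(estimated) if len(estimated) else 0.0
    # Recall: fraction of true onsets that were detected.
    recall = hits / len(reference) if len(reference) else 0.0
    return precision, recall

# Made-up onset times in seconds, for illustration only.
reference_onsets = [0.5, 1.0, 1.5, 2.0, 2.5]
estimated_onsets = [0.51, 1.52, 2.49]

precision, recall = precision_recall(reference_onsets, estimated_onsets)
print(precision, recall)  # 1.0 0.6
```

Every detection here lands near a true onset, so precision is perfect, but two of the five reference onsets go undetected, which is the same pattern described above.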
Adjust the hop length from 256 to 512 or 1024. How does that affect the onset envelope, and consequently, the peak picking?
Adjust the peak_pick parameters pre_max, post_max, pre_avg, post_avg, delta, and wait. How do the detected peaks change?
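One thing to keep in mind while experimenting: pre_max, post_max, pre_avg, post_avg, and wait are counted in frames, so their duration in seconds changes with the hop length. The sketch below does that arithmetic for the values used above (7 frames and 5 frames). The helper frames_to_seconds is hypothetical, and sr = 22050 is assumed from librosa.load's default.

```python
sr = 22050  # sampling rate in Hz (librosa.load's default)

def frames_to_seconds(n_frames, hop_length, sr):
    """Duration covered by n_frames hops (hypothetical helper)."""
    return n_frames * hop_length / sr

for hop_length in (256, 512, 1024):
    resolution_ms = 1000 * frames_to_seconds(1, hop_length, sr)
    window_ms = 1000 * frames_to_seconds(7, hop_length, sr)  # pre_max = 7
    wait_ms = 1000 * frames_to_seconds(5, hop_length, sr)    # wait = 5
    print(f"hop={hop_length}: {resolution_ms:.1f} ms/frame, "
          f"pre_max spans {window_ms:.1f} ms, wait spans {wait_ms:.1f} ms")
```

Doubling the hop length doubles every one of these durations, so parameters tuned for one hop length usually need rescaling for another.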
Try this notebook again on other audio files:
ls audio