Onset Detection

Automatic detection of musical events in an audio signal is one of the most fundamental tasks in music information retrieval. Here, we will show how to detect an onset, the very instant that marks the beginning of the transient part of a sound, or the earliest moment at which a transient can be reliably detected.

Load an audio file as a NumPy array x, along with its sampling rate sr.

In [3]:
import librosa, numpy, IPython.display as ipd  # modules used by the cells in this notebook
x, sr = librosa.load('audio/classic_rock_beat.wav')

librosa.onset.onset_detect

librosa.onset.onset_detect works in the following way:

  1. Compute a spectral novelty function.
  2. Find peaks in the spectral novelty function.
  3. [optional] Backtrack from each peak to a preceding local minimum. Backtracking can be useful for finding segmentation points such that the onset occurs shortly after the beginning of the segment.
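
Steps 2 and 3 can be illustrated on a toy novelty curve. The following is a minimal pure-Python sketch, not librosa's implementation: the function names pick_peaks and backtrack are hypothetical, the window handling is simplified, and the novelty values are made up for illustration. The keyword names mirror those accepted by librosa.util.peak_pick.

```python
def pick_peaks(novelty, pre_max=1, post_max=1, pre_avg=1, post_avg=1,
               delta=0.0, wait=1):
    """Return indices that are local maxima, exceed the local mean by
    at least delta, and lie more than `wait` frames after the last peak."""
    peaks = []
    last = None
    for n, value in enumerate(novelty):
        # Windows are clipped at the start of the curve (a simplification).
        max_win = novelty[max(0, n - pre_max): n + post_max + 1]
        avg_win = novelty[max(0, n - pre_avg): n + post_avg + 1]
        is_local_max = value == max(max_win)
        above_mean = value >= sum(avg_win) / len(avg_win) + delta
        far_enough = last is None or n - last > wait
        if is_local_max and above_mean and far_enough:
            peaks.append(n)
            last = n
    return peaks


def backtrack(peaks, novelty):
    """Move each peak back to the nearest preceding local minimum."""
    starts = []
    for p in peaks:
        n = p
        while n > 0 and novelty[n - 1] < novelty[n]:
            n -= 1
        starts.append(n)
    return starts


# Made-up novelty curve with two clear bursts of energy.
novelty = [0.0, 0.1, 0.0, 0.2, 0.9, 0.3, 0.1, 0.1, 0.5, 1.2, 0.4, 0.0]
peaks = pick_peaks(novelty, delta=0.2)
starts = backtrack(peaks, novelty)
print(peaks)   # [4, 9]
print(starts)  # [2, 7]
```

The small bump at index 1 is a local maximum but is rejected because it does not exceed the local mean by delta. Backtracking moves each peak to the frame where the novelty curve begins to rise, which is why the backtracked positions fall slightly before the peaks.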

Compute the frame indices for estimated onsets in a signal:

In [6]:
onset_frames = librosa.onset.onset_detect(y=x, sr=sr, wait=1, pre_avg=1, post_avg=1, pre_max=1, post_max=1)
print(onset_frames) # frame numbers of estimated onsets
[ 20  29  38  57  65  75  84  93 103 112 121 131 140 148 158 167 176 185
 213 232 241 250 260 268 278 288]

Convert onsets to units of seconds:

In [7]:
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
print(onset_times)
[0.46439909 0.67337868 0.88235828 1.32353741 1.50929705 1.7414966
 1.95047619 2.15945578 2.39165533 2.60063492 2.80961451 3.04181406
 3.25079365 3.43655329 3.66875283 3.87773243 4.08671202 4.29569161
 4.94585034 5.38702948 5.59600907 5.80498866 6.03718821 6.22294785
 6.45514739 6.68734694]
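
The conversion itself is simple arithmetic: each frame index is multiplied by the hop length and divided by the sampling rate. Here is a minimal sketch, assuming librosa's default hop length of 512 samples and the default sampling rate of 22050 Hz used by librosa.load; the function name frames_to_time_sketch is hypothetical.

```python
def frames_to_time_sketch(frames, sr=22050, hop_length=512):
    """Convert frame indices to seconds: time = frame * hop_length / sr."""
    return [frame * hop_length / sr for frame in frames]


# First three onset frames from the output above.
times = frames_to_time_sketch([20, 29, 38])
print([round(t, 8) for t in times])  # [0.46439909, 0.67337868, 0.88235828]
```

These values agree with the first three entries printed by librosa.frames_to_time above.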

Plot the onsets on top of a spectrogram of the audio:

In [8]:
S = librosa.stft(x)
logS = librosa.amplitude_to_db(abs(S))

Let's also plot the onsets with the time-domain waveform.

librosa.clicks

We can add a click at the location of each detected onset.

In [11]:
clicks = librosa.clicks(frames=onset_frames, sr=sr, length=len(x))
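
To see roughly what such a click track contains, here is a minimal pure-Python sketch that places a short decaying sine burst at the sample position of each onset frame. It is not librosa's implementation: the function name click_track_sketch, the click frequency, the click duration, and the decay constant are all illustrative assumptions, and the hop length of 512 samples is assumed to match the one used for onset detection.

```python
import math


def click_track_sketch(onset_frames, sr=22050, hop_length=512, length=None,
                       click_freq=1000.0, click_duration=0.1):
    """Build a signal with a decaying sine burst at each onset frame."""
    click_len = int(sr * click_duration)
    decay = 0.01 * sr  # decay time constant in samples (illustrative)
    click = [math.sin(2 * math.pi * click_freq * n / sr) * math.exp(-n / decay)
             for n in range(click_len)]

    # Frame index -> sample index of the start of each click.
    starts = [frame * hop_length for frame in onset_frames]
    if length is None:
        length = (max(starts) + click_len) if starts else 0

    track = [0.0] * length
    for start in starts:
        for n, sample in enumerate(click):
            if start + n >= length:
                break  # truncate a click that runs past the end
            track[start + n] += sample
    return track


track = click_track_sketch([2, 10], length=8000)
print(len(track))  # 8000
```

Passing length=len(x), as in the cell above, makes the click track the same length as the original signal so that the two can be added sample by sample.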

Listen to the original audio plus the detected onsets. One way is to add the signals together, sample-wise:

In [12]:
ipd.Audio(x + clicks, rate=sr)

Another method is to play the original track in one stereo channel and the click track in the other stereo channel:

In [13]:
ipd.Audio(numpy.vstack([x, clicks]), rate=sr)

You can also replace the default click with a custom sound loaded from an audio file:

In [14]:
cowbell, _ = librosa.load('audio/cowbell.wav')

More cowbell?

In [16]:
clicks = librosa.clicks(frames=onset_frames, sr=sr, length=len(x), click=cowbell)
In [17]:
ipd.Audio(x + clicks, rate=sr)

Questions

In librosa.onset.onset_detect, use the backtrack=True parameter. What does that do, and how does it affect the detected onsets? (See librosa.onset.onset_backtrack.)

In librosa.onset.onset_detect, you can use the keyword parameters found in librosa.util.peak_pick, e.g. pre_max, post_max, pre_avg, post_avg, delta, and wait, to control the peak-picking algorithm. Adjust these parameters. How do they affect the detected onsets?