%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (13, 5)

Onset-based Segmentation with Backtracking

librosa.onset.onset_detect works by finding peaks in a spectral novelty function. However, these peaks may not actually coincide with the initial rise in energy or how we perceive the beginning of a musical note.

The optional keyword parameter backtrack=True will backtrack from each peak to a preceding local minimum. Backtracking can be useful for finding segmentation points such that the onset occurs shortly after the beginning of the segment. We will use backtrack=True to perform onset-based segmentation of a signal.
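To make "backtrack from each peak to a preceding local minimum" concrete, here is a minimal NumPy sketch on a toy envelope. It is only an illustration of the idea, not librosa's implementation: the function name `backtrack_to_local_min`, the toy values, and the choice to treat index 0 as a fallback minimum are all assumptions.

```python
import numpy as np

def backtrack_to_local_min(peaks, envelope):
    """For each peak index, return the index of the last local minimum at or before it.

    Hypothetical helper for illustration only.
    """
    envelope = np.asarray(envelope, dtype=float)
    # Interior local minima: no larger than the left neighbour,
    # strictly smaller than the right neighbour.
    interior = np.flatnonzero(
        (envelope[1:-1] <= envelope[:-2]) & (envelope[1:-1] < envelope[2:])
    ) + 1
    # Assumption: include index 0 so every peak has a candidate to its left.
    minima = np.unique(np.concatenate([[0], interior]))
    # searchsorted(..., side='right') - 1 selects the last minimum <= each peak.
    return minima[np.searchsorted(minima, peaks, side='right') - 1]

# Toy novelty curve with peaks at indices 4 and 8.
envelope = np.array([0.0, 0.1, 0.05, 0.3, 0.9, 0.4, 0.2, 0.25, 0.8, 0.3])
peaks = np.array([4, 8])

backtracked = backtrack_to_local_min(peaks, envelope)
print(backtracked)  # [2 6]
```

Each peak moves back to the point where the curve last bottomed out, which is where the rise toward the peak begins.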

Load an audio file into the NumPy array x and sampling rate sr.

x, sr = librosa.load('audio/classic_rock_beat.wav')
print(x.shape, sr)

(151521,) 22050

Listen:

ipd.Audio(x, rate=sr)

Compute the frame indices for estimated onsets in a signal:

hop_length = 512
onset_frames = librosa.onset.onset_detect(y=x, sr=sr, hop_length=hop_length)
print(onset_frames) # frame numbers of estimated onsets

[ 20  29  38  57  66  75  84  93 103 112 121 131 140 149 158 167 176 185
 196 204 213 232 241 250 260 269 278 288]

Convert onsets to units of seconds:

onset_times = librosa.frames_to_time(onset_frames, sr=sr, hop_length=hop_length)
print(onset_times)

[ 0.46439909  0.67337868  0.88235828  1.32353741  1.53251701  1.7414966
  1.95047619  2.15945578  2.39165533  2.60063492  2.80961451  3.04181406
  3.25079365  3.45977324  3.66875283  3.87773243  4.08671202  4.29569161
  4.55111111  4.73687075  4.94585034  5.38702948  5.59600907  5.80498866
  6.03718821  6.2461678   6.45514739  6.68734694]

Convert onsets to units of samples:

onset_samples = librosa.frames_to_samples(onset_frames, hop_length=hop_length)
print(onset_samples)

[ 10240  14848  19456  29184  33792  38400  43008  47616  52736  57344
  61952  67072  71680  76288  80896  85504  90112  94720 100352 104448
 109056 118784 123392 128000 133120 137728 142336 147456]
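Both conversions are simple arithmetic when no frame offset is applied: a frame index times `hop_length` gives a sample index, and dividing that by `sr` gives seconds. The sketch below checks this against the first four onsets printed above; it is a plain NumPy illustration, not a replacement for the librosa functions.

```python
import numpy as np

sr = 22050
hop_length = 512
first_frames = np.array([20, 29, 38, 57])   # first four onset frames printed above

first_samples = first_frames * hop_length   # frame index -> sample index
first_times = first_samples / sr            # sample index -> seconds

print(first_samples)  # [10240 14848 19456 29184]
print(first_times)
```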

Plot the onsets on top of a spectrogram of the audio:

S = librosa.stft(x)
logS = librosa.amplitude_to_db(numpy.abs(S)) # librosa.logamplitude was removed in later librosa releases
librosa.display.specshow(logS, sr=sr, x_axis='time', y_axis='log')
plt.vlines(onset_times, 0, 10000, color='k')

<matplotlib.collections.LineCollection at 0x1150e5490>

As we see in the spectrogram, the detected onsets seem to fall slightly after the start of each rise in energy, that is, during the attack rather than before it.

Let's listen to these segments. We will create a function to do the following:

Divide the signal into segments beginning at each detected onset.
Pad each segment with 500 ms of silence.
Concatenate the padded segments.

def concatenate_segments(x, onset_samples, pad_duration=0.500):
    """Concatenate segments into one signal, separated by silence.

    Note: relies on the global sampling rate `sr` defined above.
    """
    silence = numpy.zeros(int(pad_duration*sr)) # padding inserted after each segment
    frame_sz = min(numpy.diff(onset_samples))   # shortest inter-onset interval, used as the length of every segment
    return numpy.concatenate([
        numpy.concatenate([x[i:i+frame_sz], silence]) # pad segment with silence
        for i in onset_samples
    ])
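To see what the function produces, the following sketch runs the same logic on a toy signal where the arithmetic is easy to follow. The name `concatenate_segments_sr` and its explicit `sr` argument are assumptions made so that the example is self-contained; the toy sampling rate and onsets are made up.

```python
import numpy as np

def concatenate_segments_sr(x, onset_samples, sr, pad_duration=0.5):
    """Hypothetical variant of concatenate_segments that takes sr explicitly."""
    silence = np.zeros(int(pad_duration * sr))
    # Shortest inter-onset interval, used as the length of every segment.
    seg_len = int(np.min(np.diff(onset_samples)))
    return np.concatenate([
        np.concatenate([x[i:i + seg_len], silence])
        for i in onset_samples
    ])

toy_sr = 1000                               # toy sampling rate for easy arithmetic
toy_x = np.arange(1000, dtype=float)        # ramp, so each sample equals its own index
toy_onsets = np.array([100, 300, 600])      # gaps of 200 and 300 samples

toy_out = concatenate_segments_sr(toy_x, toy_onsets, toy_sr, pad_duration=0.1)
print(len(toy_out))  # 3 segments * (200 samples + 100 samples of silence) = 900
```

Because every segment is cut to the shortest inter-onset interval, audio between the end of a segment and the next onset is dropped, and a final segment that runs past the end of the signal comes out shorter than the others.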

Concatenate the segments:

concatenated_signal = concatenate_segments(x, onset_samples, 0.500)

Listen to the concatenated signal:

ipd.Audio(concatenated_signal, rate=sr)

As we hear, there is a little glitch between segments. It occurs because the segment boundaries fall during the attack, not before it.

`librosa.onset.onset_backtrack`

We can avoid this glitch by backtracking from the detected onsets.

When setting the parameter backtrack=True, librosa.onset.onset_detect will call librosa.onset.onset_backtrack. For each detected onset, librosa.onset.onset_backtrack searches backward for a local minimum.

onset_frames = librosa.onset.onset_detect(y=x, sr=sr, hop_length=hop_length, backtrack=True)

Convert onsets to units of seconds:

onset_times = librosa.frames_to_time(onset_frames, sr=sr, hop_length=hop_length)

Convert onsets to units of samples:

onset_samples = librosa.frames_to_samples(onset_frames, hop_length=hop_length)

Plot the onsets on top of a spectrogram of the audio:

S = librosa.stft(x)
logS = librosa.amplitude_to_db(numpy.abs(S)) # librosa.logamplitude was removed in later librosa releases
librosa.display.specshow(logS, sr=sr, x_axis='time', y_axis='log')
plt.vlines(onset_times, 0, 10000, color='k')

<matplotlib.collections.LineCollection at 0x114cef190>

Notice how the vertical lines denoting the segment boundaries now appear before each rise in energy.

Concatenate the segments:

concatenated_signal = concatenate_segments(x, onset_samples, 0.500)

Listen to the concatenated signal:

ipd.Audio(concatenated_signal, rate=sr)

While listening, notice how cleanly the segments are now separated: each one begins just before the attack rather than partway through it.

Questions

Try with other audio files:

ls audio

125_bounce.wav         classic_rock_beat.wav  oboe_c6.wav
58bpm.wav              conga_groove.wav       prelude_cmaj.wav
beatbox_steve.wav      funk_groove.mp3        simple_loop.wav
c_strum.wav            jangle_pop.mp3         simple_piano.wav
clarinet_c6.wav        latin_groove.mp3       tone_440.wav
