%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (13, 5)
librosa.onset.onset_detect works by finding peaks in a spectral novelty function. However, these peaks may not actually coincide with the initial rise in energy or how we perceive the beginning of a musical note.
The optional keyword parameter backtrack=True will backtrack from each peak to a preceding local minimum. Backtracking can be useful for finding segmentation points such that the onset occurs shortly after the beginning of the segment. We will use backtrack=True to perform onset-based segmentation of a signal.
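To make the idea of "peaks in a novelty function" concrete, here is a minimal sketch on a toy energy curve. The energy values, the threshold, and the helper pick_peaks are all made up for illustration; librosa computes its novelty function from a spectrogram and uses a more elaborate peak picker, so treat this only as an approximation of the idea.

```python
# Toy per-frame energy values (made up for illustration).
energy = [0, 0, 1, 5, 10, 8, 6, 5, 4, 5, 12, 15, 10, 7]

# Novelty: half-wave rectified frame-to-frame increase in energy.
novelty = [0] + [max(0, b - a) for a, b in zip(energy, energy[1:])]

def pick_peaks(curve, threshold):
    """Return indices of strict local maxima that exceed threshold (hypothetical helper)."""
    return [
        i for i in range(1, len(curve) - 1)
        if curve[i] > threshold and curve[i] > curve[i - 1] and curve[i] > curve[i + 1]
    ]

peaks = pick_peaks(novelty, threshold=2)
print(novelty)  # [0, 0, 1, 4, 5, 0, 0, 0, 0, 1, 7, 3, 0, 0]
print(peaks)    # [4, 10]
```

In this toy curve the energy starts rising at frames 2 and 9, but the novelty peaks land at frames 4 and 10, which is the kind of lag that backtracking is meant to correct.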
Load an audio file into the NumPy array x and sampling rate sr.
x, sr = librosa.load('audio/classic_rock_beat.wav')
print(x.shape, sr)
Listen:
ipd.Audio(x, rate=sr)
Compute the frame indices for estimated onsets in a signal:
hop_length = 512
onset_frames = librosa.onset.onset_detect(y=x, sr=sr, hop_length=hop_length)
print(onset_frames) # frame numbers of estimated onsets
Convert onsets to units of seconds:
onset_times = librosa.frames_to_time(onset_frames, sr=sr, hop_length=hop_length)
print(onset_times)
Convert onsets to units of samples:
onset_samples = librosa.frames_to_samples(onset_frames, hop_length=hop_length)
print(onset_samples)
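The two conversions above are simple arithmetic: a frame index times the hop length gives a sample index, and a sample index divided by the sampling rate gives a time in seconds. The sketch below spells that out in plain Python; the function names and the example frame numbers are hypothetical, and it assumes no extra frame offset is applied.

```python
def frames_to_samples_sketch(frames, hop_length=512):
    """Sample index at the start of each frame: frame * hop_length."""
    return [f * hop_length for f in frames]

def samples_to_time_sketch(samples, sr=22050):
    """Time in seconds of each sample index: sample / sr."""
    return [s / sr for s in samples]

example_frames = [0, 10, 43]  # made-up frame numbers
example_samples = frames_to_samples_sketch(example_frames, hop_length=512)
example_times = samples_to_time_sketch(example_samples, sr=22050)
print(example_samples)  # [0, 5120, 22016]
print(example_times)
```

With hop_length=512 and sr=22050 (the default rate used by librosa.load), each frame step corresponds to roughly 23 ms.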
Plot the onsets on top of a spectrogram of the audio:
S = librosa.stft(x)
logS = librosa.amplitude_to_db(numpy.abs(S)) # magnitude spectrogram in dB
librosa.display.specshow(logS, sr=sr, x_axis='time', y_axis='log')
plt.vlines(onset_times, 0, 10000, color='k')
As we see in the spectrogram, the detected onsets do not line up exactly with the start of each rise in energy.
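Let's listen to these segments. We will create a function that cuts a segment of the signal starting at each onset, pads each segment with silence, and concatenates the padded segments: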
def concatenate_segments(x, onset_samples, pad_duration=0.500):
    """Concatenate segments into one signal."""
    # Note: sr is read from the notebook's global scope (set by librosa.load above).
    silence = numpy.zeros(int(pad_duration*sr)) # silence
    frame_sz = min(numpy.diff(onset_samples))   # shortest inter-onset gap, so every segment has the same length
    return numpy.concatenate([
        numpy.concatenate([x[i:i+frame_sz], silence]) # pad segment with silence
        for i in onset_samples
    ])
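To see what this kind of function produces, here is a small self-contained check on a toy signal. It is a sketch, not the notebook's function: the variant below takes sr as an explicit argument (an assumption made so that it runs on its own), and the signal, onset positions, and sampling rate are made up.

```python
import numpy

def concatenate_segments_sketch(x, onset_samples, sr, pad_duration=0.5):
    """Cut equal-length segments at each onset, pad each with silence, and join them."""
    silence = numpy.zeros(int(pad_duration * sr))
    frame_sz = int(min(numpy.diff(onset_samples)))  # shortest gap between onsets
    padded = [numpy.concatenate([x[i:i + frame_sz], silence]) for i in onset_samples]
    return numpy.concatenate(padded)

x_toy = numpy.arange(20, dtype=float)   # toy "signal": 0, 1, ..., 19
onsets_toy = numpy.array([2, 7, 13])    # made-up onset sample indices
out = concatenate_segments_sketch(x_toy, onsets_toy, sr=4, pad_duration=0.5)

# 3 segments * (5 samples + 2 samples of silence) = 21 samples
print(len(out))
print(out[:7])
```

Because the segment length is the shortest gap between onsets, longer notes are truncated. Also note that a final onset close to the end of the signal would yield a shorter last segment.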
Concatenate the segments:
concatenated_signal = concatenate_segments(x, onset_samples, 0.500)
Listen to the concatenated signal:
ipd.Audio(concatenated_signal, rate=sr)
As we hear, there is a little glitch between segments because the segment boundaries fall during the attack, not before it.
librosa.onset.onset_backtrack
We can avoid this glitch by backtracking from the detected onsets.
When setting the parameter backtrack=True, librosa.onset.onset_detect will call librosa.onset.onset_backtrack.
For each detected onset, librosa.onset.onset_backtrack searches backward for a local minimum.
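The backward search can be illustrated on a toy curve. The sketch below is not librosa's implementation: the curve values, the peak positions, and the helper backtrack_to_local_min are hypothetical, and it simply walks left from each peak while the curve keeps decreasing.

```python
# Toy novelty curve and its detected peaks (made up for illustration).
novelty = [0, 0, 1, 4, 5, 0, 0, 0, 0, 1, 7, 3, 0, 0]
peaks = [4, 10]

def backtrack_to_local_min(curve, peak):
    """Walk left from peak until the curve stops decreasing (hypothetical helper)."""
    i = peak
    while i > 0 and curve[i - 1] < curve[i]:
        i -= 1
    return i

backtracked = [backtrack_to_local_min(novelty, p) for p in peaks]
print(backtracked)  # [1, 8]
```

Each backtracked index sits at the foot of the rise that leads to the peak, so a segment cut there contains the whole attack.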
onset_frames = librosa.onset.onset_detect(y=x, sr=sr, hop_length=hop_length, backtrack=True)
Convert onsets to units of seconds:
onset_times = librosa.frames_to_time(onset_frames, sr=sr, hop_length=hop_length)
Convert onsets to units of samples:
onset_samples = librosa.frames_to_samples(onset_frames, hop_length=hop_length)
Plot the onsets on top of a spectrogram of the audio:
S = librosa.stft(x)
logS = librosa.amplitude_to_db(numpy.abs(S)) # magnitude spectrogram in dB
librosa.display.specshow(logS, sr=sr, x_axis='time', y_axis='log')
plt.vlines(onset_times, 0, 10000, color='k')
Notice how the vertical lines denoting the segment boundaries now appear before each rise in energy.
Concatenate the segments:
concatenated_signal = concatenate_segments(x, onset_samples, 0.500)
Listen to the concatenated signal:
ipd.Audio(concatenated_signal, rate=sr)
While listening, notice that the segments are now cleanly separated: each boundary falls before the attack, so the glitch is gone.
Try with other audio files:
ls audio