In [1]:
%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (13, 5)

Tempo Estimation

Tempo (Wikipedia) refers to the speed of a musical piece. More precisely, tempo refers to the rate of the musical beat and is given by the reciprocal of the beat period. Tempo is often defined in units of beats per minute (BPM).
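
The reciprocal relationship between beat period and tempo can be written as a pair of small helpers. This is a minimal sketch; the names `period_to_bpm` and `bpm_to_period` are illustrative and are not part of librosa.

```python
def period_to_bpm(beat_period_sec):
    """Convert a beat period in seconds to a tempo in beats per minute."""
    return 60.0 / beat_period_sec


def bpm_to_period(bpm):
    """Convert a tempo in beats per minute to a beat period in seconds."""
    return 60.0 / bpm


print(period_to_bpm(0.5))   # 120.0
print(bpm_to_period(58.0))  # a little over one second per beat
```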

In classical music, common tempo markings include grave, largo, lento, adagio, andante, moderato, allegro, vivace, and presto. See Basic tempo markings for more.

Tempogram

Tempo can vary locally within a piece. Therefore, we introduce the tempogram (FMP, p. 317) as a feature matrix which indicates the prevalence of certain tempi at each moment in time.

Fourier Tempogram

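The Fourier tempogram (FMP, p. 319) is the magnitude spectrogram of the novelty function. Because the novelty function peaks at note onsets, a frequency in its spectrum, expressed in cycles per minute, corresponds to a tempo in BPM.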

Load an audio file:

In [2]:
x, sr = librosa.load('audio/58bpm.wav')
ipd.Audio(x, rate=sr)
Out[2]:

The tempo of this excerpt is about 58 BPM, or 116 BPM if the beat is counted at double time.

Compute the onset envelope, i.e. novelty function:

In [3]:
hop_length = 200 # samples per frame
onset_env = librosa.onset.onset_strength(y=x, sr=sr, hop_length=hop_length, n_fft=2048)

Plot the onset envelope:

In [4]:
frames = range(len(onset_env))
t = librosa.frames_to_time(frames, sr=sr, hop_length=hop_length)
In [5]:
plt.plot(t, onset_env)
plt.xlim(0, t.max())
plt.ylim(0)
plt.xlabel('Time (sec)')
plt.title('Novelty Function')
Out[5]:
<matplotlib.text.Text at 0x11cc19d90>
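
The time axis of the novelty function comes from a simple conversion between frame indices and seconds. The sketch below reproduces that arithmetic with NumPy only, assuming the default `librosa.load` sampling rate of 22050 Hz and the `hop_length` of 200 used in this notebook. It is meant to illustrate the formula, not to replace `librosa.frames_to_time`.

```python
import numpy as np

sr = 22050        # sampling rate in Hz (librosa.load default, assumed here)
hop_length = 200  # audio samples between consecutive novelty frames

frames = np.arange(5)
times = frames * hop_length / float(sr)  # seconds at the start of each frame

# The novelty function is itself a signal, sampled at this frame rate.
frame_rate = sr / float(hop_length)      # novelty frames per second

print(times)
print(frame_rate)
```

The frame rate matters later: every lag or frequency measured on the novelty function is expressed in frames and has to be converted with this rate before it can be read as seconds or BPM.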

Compute the short-time Fourier transform (STFT) of the novelty function. Since the novelty function is already sampled once per frame rather than once per audio sample, the hop length of this STFT should be small. Here we use a hop of one frame:

In [6]:
S = librosa.stft(onset_env, hop_length=1, n_fft=512)
fourier_tempogram = numpy.absolute(S)

Plot the Fourier tempogram:

In [7]:
librosa.display.specshow(fourier_tempogram, sr=sr, hop_length=hop_length, x_axis='time')
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x11cc0a910>
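
To see how a frequency in the spectrum of a novelty function maps to a tempo, consider a single analysis window, i.e. one column of a Fourier tempogram. The sketch below uses a synthetic novelty function rather than the audio above; the frame rate of 100 Hz and the raised-cosine shape are assumptions chosen to keep the arithmetic clean.

```python
import numpy as np

frame_rate = 100.0   # novelty frames per second (assumed for this example)
true_bpm = 120.0     # tempo of the synthetic novelty function
t = np.arange(1000) / frame_rate  # 10 seconds of frames

# Synthetic novelty function: a raised cosine that peaks once per beat.
novelty = 0.5 * (1.0 + np.cos(2 * np.pi * (true_bpm / 60.0) * t))

# Remove the mean so the DC bin does not dominate, then take the magnitude spectrum.
spectrum = np.abs(np.fft.rfft(novelty - novelty.mean()))
freqs_hz = np.fft.rfftfreq(len(novelty), d=1.0 / frame_rate)

# A frequency in Hz (cycles per second) times 60 is a tempo in BPM.
estimated_bpm = 60.0 * freqs_hz[np.argmax(spectrum)]
print(estimated_bpm)
```

A real novelty function is closer to an impulse train than to a cosine, so its spectrum also has strong peaks at integer multiples of the tempo.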

Autocorrelation Tempogram

Consider a segment from the above novelty function:

In [8]:
n0 = 100
n1 = 500
plt.plot(t[n0:n1], onset_env[n0:n1])
plt.xlim(t[n0], t[n1])
plt.xlabel('Time (sec)')
plt.title('Novelty Function')
Out[8]:
<matplotlib.text.Text at 0x120e9fb10>

Plot the autocorrelation of this segment:

In [9]:
tmp = numpy.log1p(onset_env[n0:n1])
r = librosa.autocorrelate(tmp)
In [10]:
plt.plot(t[:n1-n0], r)
plt.xlim(t[0], t[n1-n0])
plt.xlabel('Lag (sec)')
plt.ylim(0)
Out[10]:
(0, 147.3261039663505)

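Lags at which the autocorrelation is high are good candidates for the beat period. Convert each lag to a tempo (60 divided by the lag in seconds) to view the same curve on a BPM axis: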

In [11]:
lags = t[1:n1-n0] # skip lag 0, which would cause a division by zero
plt.plot(60/lags, r[1:])
plt.xlim(20, 200)
plt.xlabel('Tempo (BPM)')
plt.ylim(0)
Out[11]:
(0, 132.57569384938006)
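
The same idea can be turned into a tempo estimate by picking the lag with the highest autocorrelation. The following is a minimal NumPy sketch on a synthetic impulse train, not on the audio above; the frame rate of 100 Hz and the 30 to 300 BPM search range are assumptions made for illustration.

```python
import numpy as np

frame_rate = 100.0   # novelty frames per second (assumed for this example)
period_frames = 50   # one beat every 0.5 s, i.e. 120 BPM

# Synthetic novelty function: a unit impulse on every beat.
novelty = np.zeros(1000)
novelty[::period_frames] = 1.0

# Full autocorrelation; keep the non-negative lags only.
r = np.correlate(novelty, novelty, mode='full')[len(novelty) - 1:]

# Restrict the search to lags that correspond to tempi between 30 and 300 BPM.
# This also excludes lag 0, where 60 / lag would divide by zero.
min_lag = int(round(frame_rate * 60.0 / 300.0))
max_lag = int(round(frame_rate * 60.0 / 30.0))
candidate_lags = np.arange(min_lag, max_lag + 1)

best_lag = candidate_lags[np.argmax(r[min_lag:max_lag + 1])]
estimated_bpm = 60.0 * frame_rate / best_lag
print(best_lag, estimated_bpm)
```

The autocorrelation also peaks at integer multiples of the beat period (100, 150, and 200 frames here), which correspond to half, a third, and a quarter of the tempo. In this finite signal those peaks are slightly lower than the one at the true period, but on real audio such multiples are a common source of octave errors.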

We will apply this principle of autocorrelation to estimate the tempo at every segment in the novelty function.

librosa.feature.tempogram implements an autocorrelation tempogram, a short-time autocorrelation of the (spectral) novelty function.

Compute a tempogram:

In [12]:
tempogram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr, hop_length=hop_length, win_length=400)
In [13]:
librosa.display.specshow(tempogram, sr=sr, hop_length=hop_length, x_axis='time', y_axis='tempo')
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x12111c350>
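
To make "short-time autocorrelation" concrete, here is a minimal NumPy sketch that windows the novelty function around each frame and autocorrelates each window. It is not librosa's implementation, which may differ in padding, windowing, and normalization; the function name `local_autocorrelation` and the synthetic input are assumptions made for illustration.

```python
import numpy as np


def local_autocorrelation(novelty, win_length):
    """Return a (win_length, n_frames) matrix of windowed autocorrelations."""
    # Zero-pad so that every frame has a full window centered on it.
    padded = np.pad(novelty, win_length // 2, mode='constant')
    window = np.hanning(win_length)
    n_frames = len(novelty)
    tg = np.zeros((win_length, n_frames))
    for i in range(n_frames):
        segment = padded[i:i + win_length] * window
        ac = np.correlate(segment, segment, mode='full')[win_length - 1:]
        # Normalize each column by its lag-0 value when that value is nonzero.
        tg[:, i] = ac / ac[0] if ac[0] > 0 else ac
    return tg


# Synthetic novelty function: one impulse every 50 frames.
novelty = np.zeros(400)
novelty[::50] = 1.0

tg = local_autocorrelation(novelty, win_length=200)

# Strongest nonzero lag in the column at frame 200.
best_lag = np.argmax(tg[1:, 200]) + 1
print(tg.shape, best_lag)
```

Each column of `tg` is indexed by lag in frames. Dividing 60 times the frame rate by the lag gives the tempo axis that `y_axis='tempo'` displays above.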

Estimating Global Tempo

We will use librosa.beat.tempo to estimate the global tempo in an audio file.

Estimate the tempo:

In [14]:
tempo = librosa.beat.tempo(y=x, sr=sr)
print(tempo)
[ 117.45383523]

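Visualize the tempo estimate on top of the input signal. The grid below starts at time zero, so it shows the estimated beat spacing only, not the actual beat positions: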

In [15]:
T = len(x)/float(sr)
seconds_per_beat = 60.0/tempo[0]
beat_times = numpy.arange(0, T, seconds_per_beat)
In [16]:
librosa.display.waveplot(x)
plt.vlines(beat_times, -1, 1, color='r')
Out[16]:
<matplotlib.collections.LineCollection at 0x121282710>

Listen to the input signal with a click track using the tempo estimate:

In [17]:
clicks = librosa.clicks(times=beat_times, sr=sr, length=len(x))
ipd.Audio(x + clicks, rate=sr)
Out[17]:
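
For readers who want to see what a click track consists of, the sketch below builds one with NumPy only: an evenly spaced beat grid derived from a tempo, with a short decaying sinusoid placed at each beat. The tempo, duration, click frequency, and decay time are all assumed values for illustration, and the code is not a reimplementation of `librosa.clicks`.

```python
import numpy as np

sr = 22050       # sampling rate in Hz (assumed)
duration = 3.0   # length of the click track in seconds (assumed)
bpm = 120.0      # tempo used to space the clicks (assumed)

# Evenly spaced beat times, starting at time zero.
seconds_per_beat = 60.0 / bpm
beat_times = np.arange(0, duration, seconds_per_beat)

# One click: a 1 kHz sinusoid with an exponential decay, 50 ms long.
click_len = int(0.05 * sr)
n = np.arange(click_len)
click = np.sin(2 * np.pi * 1000.0 * n / sr) * np.exp(-n / (0.01 * sr))

# Add a click at every beat, truncating any click that runs past the end.
track = np.zeros(int(duration * sr))
for beat_time in beat_times:
    start = int(round(beat_time * sr))
    end = min(start + click_len, len(track))
    track[start:end] += click[:end - start]

print(len(beat_times), len(track))
```

Because the grid starts at time zero, the clicks line up with the music only if the first beat happens to fall there; finding the beat positions themselves is a beat tracking problem rather than a tempo estimation problem.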