In [1]:
%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (14, 5)

Energy and RMSE

The energy (Wikipedia; FMP, p. 66) of a signal corresponds to the total magnitude of the signal. For audio signals, that roughly corresponds to how loud the signal is. The energy in a signal is defined as

$$ \sum_n \left| x(n) \right|^2 $$

The root-mean-square energy (RMSE) in a signal of $N$ samples is defined as

$$ \sqrt{ \frac{1}{N} \sum_n \left| x(n) \right|^2 } $$
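
As a quick check of the two definitions, here is a minimal NumPy sketch. The helper names `signal_energy` and `signal_rmse` and the 10-cycle test tone are assumptions made for illustration; they are not part of librosa.

```python
import numpy

def signal_energy(x):
    # Sum of squared magnitudes over all samples (hypothetical helper).
    return numpy.sum(numpy.abs(x)**2)

def signal_rmse(x):
    # Square root of the mean squared magnitude (hypothetical helper).
    return numpy.sqrt(numpy.mean(numpy.abs(x)**2))

# Illustrative test tone: 10 full cycles of a unit-amplitude sine in 1000 samples.
n = numpy.arange(1000)
tone = numpy.sin(2 * numpy.pi * 10 * n / 1000)

print(signal_energy(tone), signal_rmse(tone))
```

For a unit-amplitude sine spanning whole periods, the energy should come out close to N/2 and the RMSE close to 1/sqrt(2), or about 0.707. The energy grows with the number of samples, while the RMSE does not, which makes RMSE easier to compare across signals of different lengths.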

Let's load a signal:

In [2]:
x, sr = librosa.load('audio/simple_loop.wav')

Listen to the signal:

In [3]:
ipd.Audio(x, rate=sr)
Out[3]:

Plot the signal:

In [4]:
librosa.display.waveplot(x, sr=sr)
Out[4]:
<matplotlib.collections.PolyCollection at 0x11d679e50>

Compute the short-time energy using a list comprehension:

In [5]:
hop_length = 256
frame_length = 1024
In [6]:
energy = numpy.array([
    sum(abs(x[i:i+frame_length]**2))
    for i in range(0, len(x), hop_length)
])
In [7]:
energy.shape
Out[7]:
(194,)
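
The list comprehension above can also be written without a Python-level loop. The following is a sketch using only NumPy; the function name `short_time_energy` is hypothetical. It zero-pads the end of the signal so that the trailing partial frames give the same sums as the slices `x[i:i+frame_length]` do.

```python
import numpy

def short_time_energy(x, frame_length=1024, hop_length=256):
    # Hypothetical helper: one energy value per frame start 0, hop, 2*hop, ...
    x = numpy.asarray(x, dtype=float)
    starts = numpy.arange(0, len(x), hop_length)
    # Zeros add nothing to a sum of squares, so padding reproduces the
    # shorter slices that occur near the end of the signal.
    padded = numpy.concatenate([x, numpy.zeros(frame_length)])
    # Index matrix of shape (n_frames, frame_length).
    idx = starts[:, None] + numpy.arange(frame_length)[None, :]
    frames = padded[idx]
    return numpy.sum(frames**2, axis=1)
```

Calling `short_time_energy(x, frame_length, hop_length)` should return an array with one value per frame start, the same length as `energy` above. The index matrix trades memory for speed, so for very long signals the loop may still be the more practical choice.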

Compute the RMSE using librosa.feature.rmse:

In [8]:
rmse = librosa.feature.rmse(x, frame_length=frame_length, hop_length=hop_length)[0]
In [9]:
rmse.shape
Out[9]:
(190,)

Plot both the energy and RMSE along with the waveform:

In [10]:
frames = range(len(energy))
t = librosa.frames_to_time(frames, sr=sr, hop_length=hop_length)
In [11]:
librosa.display.waveplot(x, sr=sr, alpha=0.4)
plt.plot(t, energy/energy.max(), 'r--')             # normalized for visualization
plt.plot(t[:len(rmse)], rmse/rmse.max(), color='g') # normalized for visualization
plt.legend(('Energy', 'RMSE'))
Out[11]:
<matplotlib.legend.Legend at 0x10fbd8ad0>

Questions

Write a function, strip, that removes leading and trailing silence from a signal. Make sure it works for a variety of signals recorded in different environments and with different signal-to-noise ratios (SNR).
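
One possible starting point, offered as a sketch rather than a finished answer: compute a frame-wise RMS and keep everything between the first and last frame that lies within some margin of the loudest frame. The parameter `threshold_db` and its default of -40 dB are assumptions chosen for illustration. A fixed relative threshold will not suit every recording, so a robust version would likely need to estimate the noise floor of each signal.

```python
import numpy

def strip(x, frame_length=1024, hop_length=256, threshold_db=-40.0):
    # Sketch only: trims leading/trailing frames whose RMS is more than
    # |threshold_db| dB below the loudest frame (threshold is an assumption).
    x = numpy.asarray(x, dtype=float)
    if len(x) == 0:
        return x
    starts = numpy.arange(0, len(x), hop_length)
    rms = numpy.array([
        numpy.sqrt(numpy.mean(x[i:i+frame_length]**2))
        for i in starts
    ])
    peak = rms.max()
    if peak == 0:
        return x[:0]  # the whole signal is digital silence
    # Convert the dB margin to a linear amplitude ratio.
    active = numpy.flatnonzero(rms >= peak * 10**(threshold_db / 20))
    first = starts[active[0]]
    last = min(len(x), starts[active[-1]] + frame_length)
    return x[first:last]
```

Because trimming happens on frame boundaries, up to one frame of silence can remain on either side. Lowering `hop_length` tightens the cut at the cost of more computation.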