Understanding Audio Features through Sonification

In this exercise notebook, we will segment audio files, extract features from each segment, and analyze the results. Goals:

  1. Detect onsets in an audio signal.

  2. Segment the audio signal at each onset.

  3. Compute features for each segment.

  4. Gain intuition into the features by listening to each segment separately.

import IPython.display as ipd
import matplotlib.pyplot as plt
import librosa
import librosa.display
import numpy

from mirdotcom import mirdotcom

mirdotcom.init()

Step 1: Retrieve Audio

Load the audio file simple_loop.wav into an array.

fp = mirdotcom.get_audio("simple_loop.wav")
x, sr = librosa.load(fp)

Show the sample rate:

print(sr)
22050

Listen to the audio signal.

ipd.Audio(x, rate=sr)

Display the audio signal.

librosa.display.waveshow(x, sr=sr)
plt.ylabel("Amplitude")
[Figure: waveform of simple_loop.wav, amplitude versus time]

Compute the short-time Fourier transform:

X = librosa.stft(x)
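To get a feel for what the short-time Fourier transform produces, here is a minimal NumPy-only sketch. The helper `stft_sketch` is hypothetical, and the frame size of 2048 with a hop of 512 is assumed to match librosa's defaults. Unlike librosa, this sketch does not pad the signal edges, so the number of frames differs slightly.

```python
import numpy as np

def stft_sketch(x, n_fft=2048, hop_length=512):
    """Hypothetical helper: Hann-windowed STFT without edge padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack([
        x[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Transpose so rows are frequency bins and columns are frames.
    return np.fft.rfft(frames, axis=1).T

sr = 22050
t = np.arange(sr) / sr                  # one second of sample times
tone = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone
X_sketch = stft_sketch(tone)

# Find the frequency bin with the most energy, averaged over frames.
peak_bin = int(np.abs(X_sketch).mean(axis=1).argmax())
peak_hz = peak_bin * sr / 2048
print(X_sketch.shape, peak_hz)
```

Each column is the spectrum of one windowed frame, and the strongest bin should sit within one bin width of 1000 Hz.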

For display purposes, compute the log amplitude of the STFT magnitude. Taking the absolute value first discards the phase explicitly:

Xmag = librosa.amplitude_to_db(numpy.abs(X))
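The decibel conversion itself is a short formula. The sketch below reproduces it with NumPy alone. The name `amplitude_to_db_sketch` is hypothetical, and the defaults (reference 1.0, floor 1e-5, 80 dB dynamic range) are assumptions meant to mirror librosa's; check the librosa documentation for the exact behavior.

```python
import numpy as np

def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5, top_db=80.0):
    """Hypothetical helper: map magnitudes to decibels relative to `ref`."""
    mag = np.abs(S)                                   # discard phase
    db = 20.0 * np.log10(np.maximum(amin, mag))       # floor avoids log(0)
    db -= 20.0 * np.log10(max(amin, ref))
    return np.maximum(db, db.max() - top_db)          # limit the dynamic range

S = np.array([1.0 + 0.0j, 0.0 + 0.1j, 1e-9 + 0.0j])
db = amplitude_to_db_sketch(S)
print(db)
```

A magnitude ten times smaller than the reference maps to -20 dB, and very small values are clipped so that the display does not waste its color range on near-silence.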

Display the spectrogram.

# Play with the parameters, including x_axis and y_axis
librosa.display.specshow(Xmag, sr=sr, x_axis="time", y_axis="log")
[Figure: spectrogram of simple_loop.wav, time on the x-axis and log-scaled frequency on the y-axis]

Step 2: Detect Onsets

Find the frames at which onsets occur in the audio signal, then convert those frame indices to times in seconds.

onset_frames = librosa.onset.onset_detect(y=x, sr=sr)
print(onset_frames)
[12 33 55 66 76]
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
print(onset_times)
[0.27863946 0.7662585  1.27709751 1.53251701 1.76471655]

Convert the onset frames into sample indices.

onset_samples = librosa.frames_to_samples(onset_frames)
print(onset_samples)
[ 6144 16896 28160 33792 38912]
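Both conversions are simple arithmetic on the hop length. The sketch below assumes the hop length is 512 samples (librosa's default, which is consistent with the values printed above) and reproduces the sample indices and times in plain Python.

```python
hop_length = 512   # assumed: default hop between analysis frames
sr = 22050
onset_frames = [12, 33, 55, 66, 76]

# frame index -> sample index -> time in seconds
onset_samples = [f * hop_length for f in onset_frames]
onset_times = [s / sr for s in onset_samples]

print(onset_samples)
print([round(t, 4) for t in onset_times])
```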

Play a “beep” at each onset.

# Use the `length` parameter so the click track is the same length as the original signal
clicks = librosa.clicks(times=onset_times, sr=sr, length=len(x))
# Play the click track "added to" the original signal
ipd.Audio(x + clicks, rate=sr)
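If you are curious how a click track can be built by hand, the following is a rough NumPy sketch. The helper `click_track_sketch` and its parameters (a 1 kHz decaying sine burst lasting 100 ms) are illustrative assumptions, not librosa's implementation.

```python
import numpy as np

def click_track_sketch(times, sr, length, freq=1000.0, duration=0.1):
    """Hypothetical helper: decaying sine bursts at the given times (seconds)."""
    n = int(duration * sr)
    t = np.arange(n) / sr
    click = np.sin(2 * np.pi * freq * t) * np.exp(-t / (duration / 5))
    out = np.zeros(length)
    for start in (np.asarray(times) * sr).astype(int):
        if start >= length:
            continue                      # skip clicks beyond the signal
        end = min(start + n, length)      # truncate clicks that run past the end
        out[start:end] += click[: end - start]
    return out

sr = 22050
clicks_sketch = click_track_sketch([0.25, 0.75, 0.99], sr=sr, length=sr)
print(clicks_sketch.shape)
```

Because the output has the same length as the target signal, it can be added to that signal sample by sample.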

Step 3: Segment the Audio

Extract a 100-ms segment beginning at each onset, and save the segments into an array, segments.

frame_sz = int(0.100 * sr)
segments = numpy.array([x[i : i + frame_sz] for i in onset_samples])
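The list comprehension above yields equal-length rows only when every onset lies at least 100 ms before the end of the signal. For other files that may not hold, so here is a hedged sketch of one way to guard against it. The helper `extract_segments` is hypothetical and zero-pads any slice that runs past the end.

```python
import numpy as np

def extract_segments(x, starts, seg_len):
    """Hypothetical helper: fixed-length slices, zero-padded at the signal end."""
    segments = np.zeros((len(starts), seg_len), dtype=x.dtype)
    for row, start in enumerate(starts):
        chunk = x[start : start + seg_len]   # may be shorter near the end
        segments[row, : len(chunk)] = chunk
    return segments

x_demo = np.arange(10, dtype=float)
segs = extract_segments(x_demo, starts=[0, 4, 8], seg_len=4)
print(segs)
```

The last row starts at sample 8, so only two real samples are available and the remainder is filled with zeros.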

Here is a function that adds 300 ms of silence onto the end of each segment and concatenates them into one signal.

Later, we will use this function to listen to each segment, perhaps sorted in a different order.

def concatenate_segments(segments, sr=22050, pad_time=0.300):
    padded_segments = [
        numpy.concatenate([segment, numpy.zeros(int(pad_time * sr))])
        for segment in segments
    ]
    return numpy.concatenate(padded_segments)

concatenated_signal = concatenate_segments(segments, sr=sr)

Listen to the newly concatenated signal.

ipd.Audio(concatenated_signal, rate=sr)
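As a quick sanity check on what you should hear, the arithmetic below works out the expected length of the concatenated signal. It assumes the five onsets found earlier, 100-ms segments, and 300 ms of padding per segment.

```python
sr = 22050
n_segments = 5
seg_len = int(0.100 * sr)   # samples per segment
pad_len = int(0.300 * sr)   # samples of trailing silence per segment

total_samples = n_segments * (seg_len + pad_len)
print(total_samples, total_samples / sr)
```

Each segment contributes 0.4 seconds, so the whole clip should last about two seconds.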

Step 4: Extract Features

For each segment, count the zero crossings. Because every segment has the same length, this count is proportional to the zero crossing rate.

zcrs = [int(numpy.sum(librosa.zero_crossings(segment))) for segment in segments]
print(zcrs)
[11, 570, 11, 10, 568]
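To build intuition for why these counts fall into two clusters, here is a small sketch that counts sign changes directly on synthetic tones. The helper `count_zero_crossings` is hypothetical; librosa's function has additional options (such as padding and a threshold), so its counts can differ slightly.

```python
import numpy as np

def count_zero_crossings(x):
    """Hypothetical helper: count sign changes between adjacent samples."""
    negative = np.signbit(x)
    return int(np.count_nonzero(negative[1:] != negative[:-1]))

sr = 22050
t = np.arange(int(0.100 * sr)) / sr          # 100 ms of sample times
low_tone = np.sin(2 * np.pi * 100 * t)       # 100 Hz
high_tone = np.sin(2 * np.pi * 3000 * t)     # 3 kHz

low_count = count_zero_crossings(low_tone)
high_count = count_zero_crossings(high_tone)
print(low_count, high_count)
```

A sinusoid crosses zero about twice per cycle, so the count grows with frequency. Segments dominated by low-frequency content produce small counts, while bright or noisy segments produce large ones.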

Use argsort to find an index array, ind, such that segments[ind] is sorted by zero crossing rate.

ind = numpy.argsort(zcrs)
print(ind)
[3 0 2 4 1]
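The sketch below shows how an index array from argsort reorders a second array. The `labels` array is a hypothetical stand-in for the segments, and `kind="stable"` is an added choice that keeps tied values (the two segments with 11 crossings) in their original order.

```python
import numpy as np

zcrs = np.array([11, 570, 11, 10, 568])
labels = np.array(["seg0", "seg1", "seg2", "seg3", "seg4"])

# Indices that would sort zcrs in ascending order.
ind = np.argsort(zcrs, kind="stable")
print(ind)
print(labels[ind])   # fancy indexing applies the same ordering
```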

Sort the segments by zero crossing rate, and concatenate the sorted segments.

concatenated_signal = concatenate_segments(segments[ind], sr)

Step 5: Listen to Segments

Listen to the sorted segments. What do you hear?

ipd.Audio(concatenated_signal, rate=sr)

More Exercises

Repeat the steps above using other features from librosa.feature, e.g. rms (called rmse in older librosa releases), spectral_centroid, spectral_bandwidth.
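As a starting point, here is a simplified sketch of two such features computed over a whole segment with NumPy. The helpers `rms_sketch` and `spectral_centroid_sketch` are hypothetical; librosa's versions work frame by frame and return arrays, so treat this only as an illustration of the underlying definitions.

```python
import numpy as np

def rms_sketch(segment):
    """Hypothetical helper: root-mean-square energy of one segment."""
    return float(np.sqrt(np.mean(segment ** 2)))

def spectral_centroid_sketch(segment, sr):
    """Hypothetical helper: magnitude-weighted mean frequency in Hz.

    Assumes the segment is not silent (the magnitude sum must be nonzero).
    """
    mag = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

sr = 22050
t = np.arange(int(0.100 * sr)) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)     # 440 Hz tone, amplitude 0.5

rms_value = rms_sketch(tone)
centroid_hz = spectral_centroid_sketch(tone, sr)
print(rms_value, centroid_hz)
```

For a pure tone, the RMS is close to the amplitude divided by the square root of two, and the centroid sits near the tone's frequency. Either value can replace the zero crossing count when sorting segments.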

Repeat the steps above for other audio files:

mirdotcom.list_audio()
simple_piano.wav
latin_groove.mp3
clarinet_c6.wav
cowbell.wav
classic_rock_beat.mp3
oboe_c6.wav
sir_duke_trumpet_fast.mp3
sir_duke_trumpet_slow.mp3
jangle_pop.mp3
125_bounce.wav
brahms_hungarian_dance_5.mp3
58bpm.wav
conga_groove.wav
funk_groove.mp3
tone_440.wav
sir_duke_piano_fast.mp3
thx_original.mp3
simple_loop.wav
classic_rock_beat.wav
c_strum.wav
prelude_cmaj.wav
sir_duke_piano_slow.mp3
busta_rhymes_hits_for_days.mp3