Evaluation method: determine which estimated onsets are “correct”, where correctness is defined as being within a small window of a reference onset.
mir_eval finds the largest feasible set of matches using the Hopcroft-Karp algorithm. (See _bipartite_match.)
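To make the matching step concrete, here is a minimal sketch in plain Python. It is not mir_eval's implementation: it uses a simple augmenting-path search (Kuhn's algorithm) in place of Hopcroft-Karp, which finds a matching of the same size but less efficiently. The function name count_matches and the toy onset times are hypothetical, chosen only for illustration.

```python
def count_matches(ref, est, window=0.05):
    """Size of the largest one-to-one matching between reference and
    estimated onsets whose times differ by at most `window` seconds."""
    # candidates[i] lists the estimated onsets close enough to reference i.
    candidates = [
        [j for j, e in enumerate(est) if abs(r - e) <= window]
        for r in ref
    ]
    owner = {}  # estimated index -> reference index currently matched to it

    def try_assign(i, visited):
        # Try to give reference i an estimate, re-routing earlier matches
        # along an augmenting path when that frees one up.
        for j in candidates[i]:
            if j in visited:
                continue
            visited.add(j)
            if j not in owner or try_assign(owner[j], visited):
                owner[j] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(ref)))


# Hypothetical toy data (seconds), not taken from the audio example.
ref = [0.50, 1.00, 1.04]
est = [0.52, 1.03, 1.08, 2.00]

print(count_matches(ref, est))                # 3
print(count_matches(ref, est, window=0.025))  # 2
```

In this toy example, a greedy closest-pair strategy would pair the reference at 1.04 with the estimate at 1.03 and leave the reference at 1.00 unmatched, giving only 2 matches. A maximum matching pairs 1.00 with 1.03 and 1.04 with 1.08, giving 3, which is why a bipartite matching algorithm is used.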
Let's evaluate an onset detector on the following audio:
y, sr = librosa.load('audio/simple_piano.wav')
Estimate the onsets in the signal using onset_detect:
est_onsets = librosa.onset.onset_detect(y=y, sr=sr, units='time')
est_onsets
Load a fictional reference annotation.
ref_onsets = numpy.array([0, 0.270, 0.510, 1.02,
1.50, 2.02, 2.53, 3.01])
Plot the estimated and reference onsets together.
Evaluate using mir_eval.onset.evaluate:
mir_eval.onset.evaluate(ref_onsets, est_onsets)
Of the 8 reference onsets, 7 were matched by an estimated onset, so recall = 7/8 = 0.875.
Of the 14 estimated onsets, 7 matched a reference onset, so precision = 7/14 = 0.5.
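The arithmetic above can be reproduced in a few lines. This is a sketch only: the helper name onset_scores is hypothetical, the counts are the ones quoted above, and the F-measure is computed as the standard harmonic mean of precision and recall, which is assumed here to be what the evaluation reports alongside them.

```python
def onset_scores(n_matched, n_ref, n_est):
    """Return (precision, recall, f_measure) from raw match counts."""
    # Precision: fraction of estimated onsets that matched a reference.
    precision = n_matched / n_est if n_est else 0.0
    # Recall: fraction of reference onsets that were matched.
    recall = n_matched / n_ref if n_ref else 0.0
    # F-measure: harmonic mean of the two, defined as 0 when both are 0.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


precision, recall, f_measure = onset_scores(n_matched=7, n_ref=8, n_est=14)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
# precision=0.500 recall=0.875 F=0.636
```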
The default matching tolerance is 50 milliseconds. To reduce the matching tolerance, adjust the window keyword parameter, which is given in seconds:
mir_eval.onset.evaluate(ref_onsets, est_onsets, window=0.002)