ISMIR Oral Session 2 - Tempo and Rhythm
Session chair: Anssi Klapuri
IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS
By Matthias Gruhne, Christian Dittmar, and Daniel Gaertner
Matthias described their approach to generating beat histograms, using techniques similar to those of Burred, Gouyon, Foote and Tzanetakis. Problem: a beat histogram cannot be used directly as a feature because it depends on tempo. Because of this dependency, similar rhythms appear far apart in a Euclidean space. Challenge: reduce the tempo dependence.
Solution: logarithmic transformation. See the figure:
This leads to a histogram with a tempo independent part which can be separated from the tempo dependent part. This tempo independent part can then be used in a Euclidean space to find similar rhythms.
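One way to see why the logarithmic transformation helps: a tempo change multiplies every peak position in the beat histogram by the same factor, and on a logarithmic axis a multiplication becomes a constant shift. The sketch below only illustrates that property; it is not the authors' implementation, and the peak values and helper names (log_positions, relative_pattern) are made up for the example.

```python
import math

def log_positions(peaks_bpm):
    """Map beat-histogram peak positions (in BPM) onto a log2 axis."""
    return [math.log2(p) for p in peaks_bpm]

def relative_pattern(peaks_bpm):
    """Log positions relative to the first peak: unchanged by a tempo change."""
    logs = log_positions(peaks_bpm)
    return [x - logs[0] for x in logs]

# Hypothetical histogram peaks for one rhythm, and the same rhythm 1.5x faster.
slow = [60.0, 120.0, 180.0]
fast = [1.5 * p for p in slow]

# On the log axis every peak moves by the same amount, log2(1.5).
shifts = [f - s for s, f in zip(log_positions(slow), log_positions(fast))]
print(shifts)

# The pattern relative to the first peak is the tempo-independent part.
print(relative_pattern(slow))
print(relative_pattern(fast))
```

The two relative patterns agree up to floating-point rounding, so a Euclidean distance between them is close to zero even though the raw peak positions differ.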
Evaluation: results went from 20% to 70%, and from 66% to 69%. (Needs a significance test here, I think.)
USING SOURCE SEPARATION TO IMPROVE TEMPO DETECTION
By Parag Chordia and Alex Rae - presented by George Tzanetakis
Well, this is unusual: George will be presenting Parag and Alex's work. Anssi suggests that we can use the wisdom of the crowds to answer the questions.
Motivation: Tempo detection is often unreliable for complex music.
Humans often resolve rhythms by entraining to a rhythmically regular part.
Idea: Separate the music into components; some components may be more reliable.
Method:
Source separation
track tempo for each source
decide global tempo by either:
Pick one with most regular structure
Look for common tempo across all sources/layers
Here's the system:
PLCA (Probabilistic Latent Component Analysis) is a source separation method. Issues: The number of components needs to be specified in advance. It could merge sources, or one source could be split into multiple layers.
Autocorrelation is used for tempo detection. Regular sources will have higher peaks.
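A minimal sketch of autocorrelation-based tempo detection, to make the "regular sources will have higher peaks" point concrete. This is an illustration under stated assumptions, not the system from the talk: the onset envelopes are synthetic impulse trains, the frame rate of 100 frames per second and the 40 to 200 BPM search range are assumed, and all function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def estimate_tempo(onset_env, frame_rate, min_bpm=40.0, max_bpm=200.0):
    """Return (tempo in BPM, peak strength) from the strongest autocorrelation peak."""
    min_lag = int(round(60.0 * frame_rate / max_bpm))
    max_lag = int(round(60.0 * frame_rate / min_bpm))
    ac = autocorrelation(onset_env, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])
    # Normalise by the lag-0 value so strengths are comparable across sources.
    strength = ac[best_lag] / ac[0] if ac[0] > 0 else 0.0
    return 60.0 * frame_rate / best_lag, strength

def impulse_train(positions, length):
    """Synthetic onset envelope: 1.0 at each onset frame, 0.0 elsewhere."""
    env = [0.0] * length
    for p in positions:
        env[p] = 1.0
    return env

FRAME_RATE = 100.0  # assumed: onset-envelope frames per second

# A perfectly regular source (one onset every 0.5 s) and an irregular one.
regular = impulse_train(range(0, 1000, 50), 1000)
irregular = impulse_train(
    [0, 37, 110, 160, 190, 275, 300, 390, 410,
     480, 560, 590, 660, 730, 760, 850, 880, 960], 1000)

regular_tempo, regular_strength = estimate_tempo(regular, FRAME_RATE)
irregular_tempo, irregular_strength = estimate_tempo(irregular, FRAME_RATE)
print(regular_tempo, regular_strength)
print(irregular_tempo, irregular_strength)
```

The normalised peak height could serve as a per-source regularity score when picking the source with the most regular structure.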
Other approach - a machine learning approach - a supervised learning problem
Global tempo using clustering - merge all tempo candidates into a single vector, group candidates that lie within a 5% tolerance of each other (also at 0.5x and 2x), and build a peak histogram showing the confidence for each tempo.
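The talk did not go into the details of the clustering, so the following is only a rough sketch of the idea under my own assumptions: each candidate collects a vote from every candidate that matches it within 5%, directly or after halving or doubling, and the candidate with the most votes wins. The function names and the candidate values are hypothetical.

```python
def tempo_votes(candidates, tolerance=0.05, octave_factors=(0.5, 1.0, 2.0)):
    """For each candidate tempo, count the candidates that agree with it.

    Two tempi agree if one of them, scaled by 0.5, 1 or 2, lies within
    `tolerance` (relative) of the other. Assumed voting scheme, not the
    authors' exact clustering.
    """
    votes = {}
    for ref in candidates:
        count = 0
        for other in candidates:
            if any(abs(other * f - ref) <= tolerance * ref for f in octave_factors):
                count += 1
        votes[ref] = count
    return votes

def global_tempo(candidates, tolerance=0.05):
    """Pick the candidate with the most votes (first one wins on ties)."""
    votes = tempo_votes(candidates, tolerance)
    return max(votes, key=votes.get)

# Hypothetical per-source tempo candidates in BPM, pooled into one list.
candidates = [120.0, 61.0, 118.0, 240.0, 90.0, 122.0]
votes = tempo_votes(candidates)
print(votes)
print(global_tempo(candidates))
```

Here the 90 BPM outlier only votes for itself, while the candidates around 120 BPM (and their half and double tempi) reinforce each other.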
Evaluation
MIREX06 (20 mixed-genre excerpts)
Accuracy:
MIREX06: 0.50
THIS : 0.60
Question: How many sources were specified to PLCA? Answer: 8. George thinks it doesn't matter too much.
Question: Other papers show that similar techniques do not show improvement for larger datasets
A MID-LEVEL REPRESENTATION FOR CAPTURING DOMINANT TEMPO AND PULSE INFORMATION IN MUSIC RECORDINGS
By Peter Grosche and Meinard Müller
Example - a waltz - where the downbeat is not too strong compared to beats 2 & 3. It is hard to find onsets in the energy curves. Instead, use:
Create a spectrogram
Log compression of the spectrogram
Derivative
Accumulation
This yields a novelty curve, which can be used for onset detection. Downbeats are missing. How to beat track this? Compute a tempogram - a spectrogram of the novelty curve. This yields a periodicity kernel. All kernels are combined and rectified - this gives a predominant local pulse (PLP) curve. The PLP curve is dynamic but can be constrained to track at the bar, beat or tatum level.
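The four novelty-curve steps listed above (spectrogram, log compression, derivative, accumulation) can be sketched as follows. This is a dependency-free illustration, not the authors' code: the frame size, hop size, compression constant and the synthetic test signal are all assumptions, and the half-wave rectification of the derivative (keeping only energy increases) is my assumption about how the derivative step is used.

```python
import cmath
import math

def magnitude_spectrogram(signal, frame_size=64, hop=32):
    """Naive DFT magnitude spectrogram (frames x bins); slow but dependency-free."""
    bins = frame_size // 2 + 1
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_size)
              for n in range(frame_size)]  # Hann window
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)] for k in range(bins)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        frames.append([abs(sum(v * w for v, w in zip(frame, twiddle[k])))
                       for k in range(bins)])
    return frames

def novelty_curve(signal, compression=1000.0, frame_size=64, hop=32):
    """Spectrogram -> log compression -> derivative -> accumulation."""
    spec = magnitude_spectrogram(signal, frame_size, hop)
    logspec = [[math.log(1.0 + compression * m) for m in frame] for frame in spec]
    novelty = [0.0]  # no predecessor for the first frame
    for prev, cur in zip(logspec, logspec[1:]):
        # Derivative per bin, keeping only increases (assumed rectification),
        # then accumulation over all frequency bins.
        novelty.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return novelty

def decaying_burst(n, start, tau=100.0):
    """Synthetic note: a sinusoid that starts at `start` and decays exponentially."""
    if n < start:
        return 0.0
    return math.exp(-(n - start) / tau) * math.sin(math.pi * (n - start) / 4.0)

ONSETS = (500, 1300)  # assumed onset positions in samples
signal = [sum(decaying_burst(n, s) for s in ONSETS) for n in range(2000)]

novelty = novelty_curve(signal)
# Frame i covers samples 32*i .. 32*i + 63 with the default frame and hop sizes.
first_peak = max(range(0, 30), key=novelty.__getitem__)
second_peak = max(range(30, len(novelty)), key=novelty.__getitem__)
print(first_peak, second_peak)
```

The peaks of the novelty curve fall on the frames that first contain each onset, while the silent and decaying stretches contribute essentially nothing.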
Issues: PLP likes to fill in the gaps - which is not always appropriate. Trouble with the Borodin String Quartet No. 2. But when tempo is tightly constrained, it works much better.
This was a very good talk. Meinard presented lots of examples including examples where the system did not work well.
Question: Realtime? Currently kernels are 4 to 6 seconds. With a latency of 4 to 6 seconds it should work in an online scenario.
Question: How different from DTW on the tempogram? Not connected to DTW in any way.
Question: How important is the hop size? Not that important since a sliding window is used.