LOOKING THROUGH THE “GLASS CEILING”: A CONCEPTUAL FRAMEWORK FOR THE PROBLEMS OF SPECTRAL SIMILARITY
Alexandros Nanopoulos
Ioannis Karydis, Miloš Radovanovic, Mirjana Ivanovic
Abstract: Spectral similarity measures have been shown to exhibit good performance in several Music Information Retrieval (MIR) applications. They are also known, however, to possess several undesirable properties, namely allowing the existence of hub songs (songs which frequently appear in nearest neighbor lists of other songs), “orphans” (songs which practically never appear), and difficulties in distinguishing the farthest from the nearest neighbor due to the concentration effect caused by high dimensionality of data space. In this paper we develop a conceptual framework that allows connecting all three undesired properties. We show that hubs and “orphans” are expected to appear in high-dimensional data spaces, and relate the cause of their appearance with the concentration property of distance / similarity measures. We verify our conclusions on real music data, examining groups of frames generated by Gaussian Mixture Models (GMMs), considering two similarity measures: Earth Mover’s Distance (EMD) in combination with Kullback-Leibler (KL) divergence, and Monte Carlo (MC) sampling. The proposed framework can be useful to MIR researchers to address problems of spectral similarity, understand their fundamental origins, and thus be able to develop more robust methods for their remedy.
The problem is mainly due to the high-dimensional vector space, so problems like hubs and orphans are to be expected. So, let's look at how to deal with this problem of high dimensionality.
One problem: in Euclidean space, as the number of dimensions grows, it becomes harder to distinguish between the farthest and the nearest neighbor.
This is a natural result of high dimensionality and leads to the problem of hubs and orphans.
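To make hubs and orphans concrete, here is a minimal sketch that counts how often each point shows up in the k-nearest-neighbor lists of the other points. Everything in it is an assumption made for illustration: the data are synthetic i.i.d. Gaussian points with plain Euclidean distance (not the GMM / EMD-KL setup on real music from the paper), and the helper name `k_occurrences` and the values of `k`, `n` and the dimensions are made up.

```python
import numpy as np

def k_occurrences(points, k=5):
    """For each point, count how many other points list it among their k nearest neighbors."""
    # Squared Euclidean distances between all pairs, via the Gram matrix.
    sq = (points ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    # Indices of the k smallest distances in each row (order within the k is arbitrary).
    knn = np.argpartition(d2, k, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=len(points))

# Assumption: synthetic i.i.d. Gaussian data stands in for real song models.
rng = np.random.default_rng(0)
n, k = 2000, 5
stats = {}
for dim in (3, 100):
    counts = k_occurrences(rng.normal(size=(n, dim)), k=k)
    stats[dim] = {"max": int(counts.max()), "orphans": int((counts == 0).sum())}
    print(f"dim={dim:4d}  largest hub count={stats[dim]['max']}  orphans={stats[dim]['orphans']}")
```

In a run like this, the high-dimensional data should show a much larger maximum count (a hub) and many more points with a count of zero (orphans) than the low-dimensional data, even though nothing about the data generation favors any particular point.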
Another way of looking at this is to plot the ratio between the standard deviation and the mean of the neighbor distances as a function of dimensionality.
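A rough way to see the concentration effect for yourself is sketched below. It is only an illustration under stated assumptions: uniformly random synthetic points, Euclidean distance to a single random query point, and made-up names (`concentration_stats`) and sizes. It reports the spread of the distances relative to their mean, plus the relative gap between the farthest and the nearest neighbor.

```python
import numpy as np

def concentration_stats(dim, n_points=1000, seed=0):
    """Return (std/mean of distances, (farthest - nearest) / nearest) for one random query."""
    rng = np.random.default_rng(seed)
    # Assumption: i.i.d. uniform points in the unit hypercube, not real music features.
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    spread = dists.std() / dists.mean()
    contrast = (dists.max() - dists.min()) / dists.min()
    return spread, contrast

dims = (2, 10, 100, 1000)
results = {d: concentration_stats(d) for d in dims}
for d in dims:
    spread, contrast = results[d]
    print(f"dim={d:5d}  std/mean={spread:.3f}  (farthest-nearest)/nearest={contrast:.3f}")
```

Both numbers should shrink as the dimensionality grows: the distances bunch up around their mean, so the farthest neighbor is only marginally farther away than the nearest one.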
Conclusion - high dimensionality is responsible for problems of hubs, orphans and the concentration effect.
This was an interesting talk and has lots of potential impact on spectral similarity.