Dix expériences sur la modélisation du timbre polyphonique (Ten Experiments on the Modelling of Polyphonic Timbre)
Institution: Paris 6
Disciplines:
Directors:
Abstract EN:
Most systems that extract high-level music descriptions from audio signals rely on a common, implicit model of the global sound, or polyphonic timbre, of a musical signal. This model represents the timbre of a texture as the long-term distribution of its local spectral features. The underlying assumption is rarely made explicit: the perception of the timbre of a texture is assumed to result from the most statistically significant feature windows. This thesis questions the validity of this assumption. To do so, we construct an explicit measure of the timbre similarity between polyphonic music textures, along with variants inspired by previous work in Music Information Retrieval. We show that the precision of such measures is bounded, and that the remaining error rate is not incidental. Notably, this class of algorithms tends to create false positives, which we call hubs, that are almost always the same songs regardless of the query. Their study shows that the perceptual saliency of feature observations is not necessarily correlated with their statistical significance with respect to the global distribution. In other words, music listeners routinely “hear” things that are not statistically significant in musical signals, but are instead the result of high-level cognitive reasoning, which depends on cultural expectations, a priori knowledge, and context. Much of the music we hear as being “piano music” is really music that we expect to be piano music. Such statistical/perceptual paradoxes are instrumental in the observed discrepancy between human perception of timbre and the models studied here.
Abstract FR (translated from French):
Most systems that index musical signals model their "polyphonic timbre" as a global statistical distribution of instantaneous spectral features. This thesis questions the validity of that model. We construct explicit measures of timbre similarity between two polyphonic textures, and show that the precision of this type of algorithm is limited and that its residual error rate is not accidental. In particular, this class of measures tends to create false positives that are always the same songs, regardless of the initial query: "hubs". Their study establishes that the perceptual importance of instantaneous features does not depend on their statistical salience relative to their long-term distribution: every day we "hear" things in polyphonic music that are nevertheless not significantly present (statistically) in the audio signal.
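The modelling approach the abstract describes (summarise each track by the long-term distribution of its local features, compare distributions, then look for songs that recur across many result lists) can be illustrated with a minimal sketch. It is not the thesis's implementation: the diagonal Gaussian model, the symmetrised Kullback-Leibler divergence, the k-occurrence count used to flag hubs, the function names, and the toy feature vectors are all assumptions chosen for illustration.

```python
from statistics import fmean, pvariance


def fit_diag_gaussian(frames):
    """Summarise a track by the per-dimension mean and variance of its frames.

    `frames` is a list of equal-length feature vectors (hypothetical stand-ins
    for local spectral features). Frame order is discarded, which is the
    "long-term distribution" assumption under discussion.
    """
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # Small floor keeps the variances strictly positive (assumed regulariser).
    variances = [pvariance(d) + 1e-9 for d in dims]
    return means, variances


def symmetric_kl(a, b):
    """Symmetrised KL divergence, KL(a||b) + KL(b||a), for diagonal Gaussians."""
    (means_a, vars_a), (means_b, vars_b) = a, b
    total = 0.0
    for m1, v1, m2, v2 in zip(means_a, vars_a, means_b, vars_b):
        d2 = (m1 - m2) ** 2
        total += 0.5 * (v1 / v2 + v2 / v1 - 2.0) + 0.5 * d2 * (1.0 / v1 + 1.0 / v2)
    return total


def k_occurrences(models, k):
    """Count how often each track appears in the k nearest neighbours of the others.

    A track whose count is far above k is a candidate hub: it is returned for
    many queries regardless of what the query is.
    """
    n = len(models)
    counts = [0] * n
    for q in range(n):
        others = [i for i in range(n) if i != q]
        others.sort(key=lambda i: symmetric_kl(models[q], models[i]))
        for i in others[:k]:
            counts[i] += 1
    return counts


# Toy data: four "tracks", each with three 2-dimensional frames (made up).
tracks = [
    [[0.0, 1.0], [0.2, 1.2], [0.1, 0.8]],
    [[0.1, 1.1], [0.3, 0.9], [0.2, 1.0]],
    [[2.0, 3.0], [2.2, 3.1], [1.9, 2.8]],
    [[1.0, 2.0], [1.5, 2.5], [0.5, 1.5]],
]
models = [fit_diag_gaussian(t) for t in tracks]
counts = k_occurrences(models, k=2)
print(counts)
```

With only four toy tracks the counts say little, and the same procedure on a real collection would need real feature extraction, which is outside this sketch. The counts always sum to the number of tracks times k, so the mean is exactly k; it is the skew of the counts around that mean that would point to hubs.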