Session A (Chair: Eva Zangerle)
Tuesday, November 5, 9.15-12.30 h
[A-00] (Anniversary paper) Data Usage in MIR: History & Future Recommendations Wenqin Chen; Jessica Keast; Jordan Moody; Corinne Moriarty; Felicia Villalobos; Virtue Winter; Xueqi Zhang; Xuanqi Lyu; Elizabeth Freeman; Jessie Wang; Sherry Cai; Katherine Kinnaird "This paper examines the unique issues of data access that MIR has faced over the last 20 years. We explore datasets used in ISMIR papers, examine the evolution of data access over time, and offer three proposals to increase equity of access to data."
[A-01] Zero-shot Learning for Audio-based Music Classification and Tagging Jeong Choi; Jongpil Lee; Jiyoung Park; Juhan Nam "We investigate the paradigm of zero-shot learning applied to the music domain, organize two side-information setups for the music classification task, and propose a data split scheme and associated evaluation settings for multi-label zero-shot learning."
[A-02] Learning Notation Graph Construction for Full-Pipeline Optical Music Recognition Alexander Pacha; Jorge Calvo-Zaragoza; Jan Hajic, jr. "An Optical Music Recognition system must infer the relationships between detected symbols to understand the semantics of a music score. This notation assembly stage is formulated as a machine learning problem and solved using deep learning."
[A-03] An Attention Mechanism for Musical Instrument Recognition Siddharth Gururani; Mohit Sharma; Alexander Lerch "Instrument recognition in multi-instrument recordings is formulated as a multi-instance multi-label classification problem. We train a model on the weakly labeled OpenMIC dataset using an attention mechanism to aggregate predictions over time."
[A-04] MIDI-Sheet Music Alignment Using Bootleg Score Synthesis Thitaree Tanprasert; Teerapat Jenrungrot; Meinard Müller; Timothy Tsai "We propose a mid-level representation, called a bootleg score, which enables alignment between sheet music images and MIDI."
[A-05] mirdata: Software for Reproducible Usage of Datasets Rachel Bittner; Magdalena Fuentes; David Rubinstein; Andreas Jansson; Keunwoo Choi; Thor Kell "The lack of a standardized way to access and load commonly used datasets is a hurdle towards accelerated and reproducible research. To mitigate this, we present a tool for easy access to data and means to check the integrity of a dataset."
[A-06] Cover Detection Using Dominant Melody Embeddings Guillaume Doras; Geoffroy Peeters "We propose a cover detection method based on vector embeddings extracted from the dominant melody of the audio. This architecture improves state-of-the-art accuracy on large datasets and scales to query collections of thousands of tracks in a few seconds."
[A-07] Identifying Expressive Semantics in Orchestral Conducting Kinematics Yu-Fen Huang; Tsung-Ping Chen; Nikki Moran; Simon Coleman; Li Su "In this pioneering investigation of conducting movement using RNNs, we highlight the potential for this framework to be applied to further explore other issues in music conducting."
[A-08] The RomanText Format: A Flexible and Standard Method for Representing Roman Numeral Analyses Mark Gotham; Dmitri Tymoczko; Michael Cuthbert "We provide a technical standard, converter code, and example corpora for Roman-numeral analysis, enabling a range of computational, musical, and pedagogical use cases."
[A-09] 20 Years of Playlists: A Statistical Analysis on Popularity and Diversity Lorenzo Porcaro; Emilia Gomez "We find it extremely valuable to compare playlist datasets generated in different contexts, as this allows us to understand how changes in the listening experience are affecting playlist creation strategies."
[A-10] Identification and Cross-Document Alignment of Measures in Music Score Images Simon Waloschek; Aristotelis Hadjakos; Alexander Pacha "Musicologists regularly compare multiple sources of the same musical piece. To enable cross-source navigation in music score images, we propose a machine-learning approach which automatically detects and aligns measures across multiple sources."
[A-11] Query-by-Blending: A Music Exploration System Blending Latent Vector Representations of Lyric Word, Song Audio, and Artist Kento Watanabe; Masataka Goto "Query-by-Blending is a music exploration system that lets users find music by combining three musical aspects: lyric word, song audio, and artist. We propose an embedding method that constructs a unified vector space using unsupervised learning."
[A-12] Improving Structure Evaluation Through Automatic Hierarchy Expansion Brian McFee; Katherine Kinnaird "We propose a method to expose latent hierarchical content in structural segmentation labels. This results in more accurate comparisons between multi-level segmentations."
[A-13] Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations Gabriel Meseguer Brocal; Geoffroy Peeters "In this paper, we apply conditioning learning to source separation and introduce a control mechanism to the standard U-Net architecture. The control mechanism allows multiple instrument separations with just one model without losing performance."
[A-14] An Initial Computational Model for Musical Schemata Theory Andreas Katsiavalos; Tom Collins; Bret Battey "This paper presents a novel classifier for short polyphonic passages in Classical works that performs musical schemata recognition and prototype extraction using high-level musical constructs and similarity functions."
Session B (Chair: Katerina Kosta)
Tuesday, November 5, 13.30-17.00 h
[B-00] (Anniversary paper) Music Performance Analysis: A Survey Alexander Lerch; Claire Arthur; Ashis Pati; Siddharth Gururani "Music is a performing art. Even so, the performance itself is only infrequently explicitly acknowledged in MIR research. This paper surveys music performance research with the goal of increasing awareness for this topic in the ISMIR community."
[B-01] Evolution of the Informational Complexity of Contemporary Western Music Thomas Parmer; Yong-Yeol Ahn "We find evidence for a global, inverted U-shaped relationship between complexity and hedonistic value within Western contemporary music, suggesting that the most popular songs cluster around average complexity values."
[B-02] Deep Unsupervised Drum Transcription Keunwoo Choi; Kyunghyun Cho "DrummerNet is a drum transcriber trained in an unsupervised fashion. DrummerNet learns to transcribe by learning to reconstruct the audio with the transcription estimate. Unsupervised learning + a large dataset allow DrummerNet to be less-biased."
[B-03] Estimating Unobserved Audio Features for Target-Based Orchestration Jon Gillick; Carmine-Emanuele Cella; David Bamman "We show that neural networks can predict features of the sum of 30 or more individual music notes based only on precomputed features of the source notes. This holds promise for computationally expensive applications like target-based orchestration."
[B-04] Towards Automatically Correcting Tapped Beat Annotations for Music Recordings Jonathan Driedger; Hendrik Schreiber; Bas de Haas; Meinard Müller "A framework for correcting beat annotations that were created by humans tapping to the beat of music recordings. It includes an automated correction procedure, visualizations to inspect the correction process, and a new dataset of beat annotations."
[B-05] Algorithmic Ability to Predict the Musical Future: Datasets and Evaluation Berit Janssen; Tom Collins; Iris Yuping Ren "We introduce a dataset and evaluation methods to compare music prediction models, used in the MIREX Patterns for Prediction task. We compare three models in our framework, and discuss how to improve evaluation strategies and music prediction models."
[B-06] Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval Stefan Balke; Matthias Dorfer; Luis Carvalho; Andreas Arzt; Gerhard Widmer "The amount of temporal context given to a CNN is adapted by an additional soft-attention network, enabling the network to react to local and global tempo deviations in the input audio spectrogram."
[B-07] Contributing to New Musicological Theories with Computational Methods: The Case of Centonization in Arab-Andalusian Music Thomas Nuttall; Miguel García-Casado; Víctor Núñez-Tarifa; Rafael Caro Repetto; Xavier Serra "Here we demonstrate how relatively uncomplicated statistical methods can support and contribute to new musicological theory, namely that developed by expert performer and researcher of Arab-Andalusian music of the Moroccan tradition, Amin Chaachoo."
[B-08] Temporal Convolutional Networks for Speech and Music Detection in Radio Broadcast Quentin Lemaire; Andre Holzapfel "This study shows that a novel deep neural network architecture for sequential data (non-causal Temporal Convolution Network) can outperform state-of-the-art architectures in the task of speech and music detection."
[B-09] Towards Explainable Music Emotion Recognition: The Route via Mid-level Features Shreyan Chowdhury; Andreu Vall Portabella; Verena Haunschmid; Gerhard Widmer "Explainable predictions of emotion from music can be obtained by introducing an intermediate representation of mid-level perceptual features in the predictor deep neural network."
[B-10] Community-Based Cover Song Detection Jonathan Donier "We approach cover song detection by considering larger sets of potential versions for a given work, and create and exploit the graph of relationships between these versions. We show a significant improvement in performance over a 1-vs-1 method."
[B-11] Tracking Beats and Microtiming in Afro-Latin American Music Using Conditional Random Fields and Deep Learning Magdalena Fuentes; Lucas Maia; Martín Rocamora; Luiz Biscainho; Helene-Camille Crayencour; Slim Essid; Juan Bello "A CRF model is able to automatically and jointly track beats and microtiming in timekeeper instruments of Afro-Latin American music, in particular samba and candombe. This allows the study of microtiming profiles' dependency on genre and performer."
[B-12] Harmony Transformer: Incorporating Chord Segmentation into Harmony Recognition Tsung-Ping Chen; Li Su "Incorporating chord segmentation into chord recognition using the Transformer model achieves improved performance over prior art."
[B-13] Statistical Music Structure Analysis Based on a Homogeneity-, Repetitiveness-, and Regularity-Aware Hierarchical Hidden Semi-Markov Model Go Shibata; Ryo Nishikimi; Eita Nakamura; Kazuyoshi Yoshii "This paper proposes a solid statistical approach to music structure analysis based on a homogeneity-, repetitiveness-, and regularity-aware hierarchical hidden semi-Markov model."
[B-14] Towards Measuring Intonation Quality of Choir Recordings: A Case Study on Bruckner's Locus Iste Christof Weiss; Sebastian J. Schlecht; Sebastian Rosenzweig; Meinard Müller "This paper proposes an intonation cost measure for assessing the intonation quality of choir singing. While capturing local frequency deviations, the measure includes a grid shift compensation for cases when the entire choir is drifting in pitch."
[B-15] Guitar Tablature Estimation with a Convolutional Neural Network Andrew Wiggins; Youngmoo Kim "We propose a guitar tablature estimation system that uses a convolutional neural network to predict fingerings used by the guitarist from audio of an acoustic guitar performance."
Session C (Chair: Jin Ha Lee)
Wednesday, November 6, 09.00-12.30 h
[C-00] (Anniversary paper) Intelligent User Interfaces for Music Discovery: The Past 20 Years and What's to Come Peter Knees; Markus Schedl; Masataka Goto "We reflect on the evolution of music discovery interfaces from using content-based analysis, to metadata, to interaction data, while access and listening habits shift from personal collections to streaming services; and extrapolate future trends."
[C-01] Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice Kyungyun Lee; Juhan Nam "The paper introduces a new method of obtaining a consistent singing voice representation from both monophonic and mixed music signals. Also, it presents a simple music mashup pipeline to create a large synthetic singer dataset."
[C-02] Augmenting Music Listening Experiences on Voice Assistants Morteza Behrooz; Sarah Mennicken; Jennifer Thom; Rohit Kumar; Henriette Cramer "Using metadata about playlists, artists, and tracks, we present an approach inspired by story generation techniques to dynamically augment music streaming sessions on smart speakers with contextualized transitions."
[C-03] Coupled Recurrent Models for Polyphonic Music Composition John Thickstun; Zaid Harchaoui; Dean Foster; Sham Kakade "This paper investigates automatic music composition via parameterized, probabilistic models of scores. We consider ways to exploit the structure of music to strengthen these models, borrowing ideas from convolutional and recurrent neural networks."
[C-04] Hit Song Prediction: Leveraging Low- and High-Level Audio Features Eva Zangerle; Michael Vötter; Ramona Huber; Yi-Hsuan Yang "We show that for predicting the potential success of a song, both low- and high-level audio features are important. We use a deep and wide neural network to model these features and perform a regression task on the track's rank in the charts."
[C-05] Da-TACOS: A Dataset for Cover Song Identification and Understanding Furkan Yesiler; Chris Tralie; Albin Correya; Diego Furtado Silva; Philip Tovstogan; Emilia Gomez; Xavier Serra "This work aims to understand the links among cover songs with computational approaches and to improve reproducibility of the Cover Song Identification task by providing a benchmark dataset and frameworks for comparative algorithm evaluation."
[C-06] Harmonic Syntax in Time: Rhythm Improves Grammatical Models of Harmony Daniel Harasim; Timothy O'Donnell; Martin Rohrmeier "This paper integrates rhythm into syntactic models of harmony using a novel grammar of rhythmic phrases."
[C-07] Learning to Traverse Latent Spaces for Musical Score Inpainting Ashis Pati; Alexander Lerch; Gaëtan Hadjeres "Recurrent Neural Networks can be trained using latent embeddings of a Variational Auto-Encoder-based model to perform interactive music generation tasks such as inpainting."
[C-08] Detecting Stable Regions in Frequency Trajectories for Tonal Analysis of Traditional Georgian Vocal Music Sebastian Rosenzweig; Frank Scherbaum; Meinard Müller "This paper gives a mathematically rigorous description of two conceptually different approaches (one based on morphological operations, the other based on binary time-frequency masks) for detecting stable regions in frequency trajectories."
[C-09] The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale Dmitry Bogdanov; Alastair Porter; Hendrik Schreiber; Julián Urbano; Sergio Oramas "The AcousticBrainz Genre Dataset allows researchers to explore how the same music pieces are annotated differently by different communities following their own genre taxonomies, and how these differences can be addressed by genre recognition systems."
[C-10] Data-Driven Song Recognition Estimation Using Collective Memory Dynamics Models Christos Koutlis; Manos Schinas; Vasiliki Gkatziaki; Symeon Papadopoulos; Yiannis Kompatsiaris "In this paper, a composite track recognition model based on chart data, YouTube views, and Spotify popularity is proposed and evaluated on real data obtained from a survey conducted in Sweden."
[C-11] Towards Interpretable Polyphonic Transcription with Invertible Neural Networks Rainer Kelz; Gerhard Widmer "Invertible Neural Networks enable direct interpretability of the latent space."
[C-12] Learning to Generate Music With Sentiment Lucas Ferreira; Jim Whitehead "A new LSTM method for generating symbolic music with sentiment."
[C-13] Backtracking Search Heuristics for Solving the All-partition Array Problem Brian Bemman; David Meredith "This paper provides search heuristics for use with a greedy backtracking algorithm which solve a hard variant of a set-covering problem found in 12-tone serial music."
[C-14] Modeling and Learning Structural Breaks in Sonata Forms Laurent Feisthauer; Louis Bigo; Mathieu Giraud "We trained a neural network with high-level musical features to find the medial caesura in string quartet movements written by Mozart. It correctly finds the MC for a little over half of the corpus."
[C-15] Auto-adaptive Resonance Equalization using Dilated Residual Networks Maarten Grachten; Emmanuel Deruty; Alexandre Tanguy "We propose a method to fully automate resonance equalization in mixing and mastering musical audio. The method predicts the resonance attenuation factor using neural networks trained and evaluated on ground truth collected from sound engineers."
Session D (Chair: Florence Levé)
Wednesday, November 6, 14.30-17.30 h
[D-01] Analyzing User Interactions with Music Information Retrieval System: An Eye-tracking Approach Xiao Hu; Ying Que; Noriko Kando; Wenwei Lian "Eye movement measures can be used in investigating user interactions with MIR systems."
[D-02] A Cross-Scape Plot Representation for Visualizing Symbolic Melodic Similarity Saebyul Park; Taegyun Kwon; Jongpil Lee; Jeounghoon Kim; Juhan Nam "We propose a cross-scape plot representation to visualize multi-scaled melodic similarity between two pieces of symbolic music. We evaluate its effectiveness on examples from folk music collections with similarity-based categories and plagiarism cases."
[D-03] JosquIntab: A Dataset for Content-based Computational Analysis of Music in Lute Tablature Reinier de Valk; Ryaan Ahmed; Tim Crawford "We present JosquIntab, a dataset of automatically created transcriptions (MIDI/MEI) of 64 lute intabulations; the creation algorithm; and our evaluation method. In two use cases, we demonstrate its usefulness for both MIR and musicological research."
[D-04] A Dataset of Rhythmic Pattern Reproductions and Baseline Automatic Assessment System Felipe Falcão; Baris Bozkurt; Xavier Serra; Nazareno Andrade; Ozan Baysal "The present work is an effort to address the shortage of music datasets designed for rhythmic assessment. A new dataset and baseline rhythmic assessment system are provided in order to support comparative studies about rhythmic assessment."
[D-05] Self-Supervised Methods for Learning Semantic Similarity in Music Mason Bretan; Larry Heck "By combining self-supervised learning techniques based on contextual prediction with adversarial training, we demonstrate it is possible to impose a prior distribution on a learned latent space without degrading the quality of the features."
[D-06] Blending Acoustic and Language Model Predictions for Automatic Music Transcription Adrien Ycart; Andrew McLeod; Emmanouil Benetos; Kazuyoshi Yoshii "Dynamically integrating predictions from an acoustic and a language model with a blending model improves automatic music transcription performance on the MAPS dataset. Results are further improved by operating on 16th-note timesteps rather than 40ms."
[D-07] Modelling the Syntax of North Indian Melodies with a Generalized Graph Grammar Christoph Finkensiep; Richard Widdess; Martin Rohrmeier "Note- and interval-based models of hierarchical structure can be unified with a graph representation. Furthermore, leaps in melodies can be explained by latent structures such as the relative stability of pitches in a mode."
[D-08] A Comparative Study of Neural Models for Polyphonic Music Sequence Transduction Adrien Ycart; Daniel Stoller; Emmanouil Benetos "A systematic study using various neural models and automatic music transcription systems shows that a cross-entropy-loss CNN improves transduction performance, while an LSTM does not. Using an adversarial set-up also does not yield improvement."
[D-09] Learning Similarity Metrics for Melody Retrieval Folgert Karsdorp; Peter Kranenburg; Enrique Manjavacas "We compare different recurrent neural architectures to represent symbolic melodies as continuous vectors. We show how duplet and triplet loss functions can be used to learn distributional representations of symbolic music in an induced melody space."
[D-10] Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other Sebastian Böck; Matthew Davies; Peter Knees "Multi-task learning helps to improve beat tracking accuracy if additional tempo information is used."
[D-11] Can We Increase Inter- and Intra-Rater Agreement in Modeling General Music Similarity? Arthur Flexer; Taric Lallai "Models of general music similarity are problematic due to the subjective nature of music perception, which is shown and discussed by conducting a user experiment trying to improve the MIREX 'Audio Music Similarity' task."
[D-12] AIST Dance Video Database: Multi-Genre, Multi-Dancer, and Multi-Camera Database for Dance Information Processing Shuhei Tsuchida; Satoru Fukayama; Masahiro Hamasaki; Masataka Goto "AIST Dance Video Database is the first large-scale database containing original street dance videos with copyright-cleared music. It accelerates research of dance information processing such as dance-motion classification and dancer identification."
[D-13] Microtiming Analysis in Traditional Shetland Fiddle Music Estefania Cano; Scott Beveridge "The analysis of microtiming variations on a corpus of Shetland fiddle music revealed characteristic patterns in the duration of beats and eighth notes that may be related to the suitability of fiddle music as an accompaniment to dancing."
[D-14] SUPRA: Digitizing the Stanford University Piano Roll Archive Zhengshan Shi; Craig Sapp; Kumaran Arul; Jerry McBride; Julius Smith "This paper describes the digitization process of SUPRA, an online database of historical piano roll recordings, which has resulted in an initial dataset of 478 performances of pianists from the early twentieth century transcribed to MIDI format."
[D-15] Fast and Flexible Neural Audio Synthesis Lamtharn Hantrakul; Jesse Engel; Adam Roberts; Chenjie Gu "We present an autoregressive WaveRNN model capable of synthesizing realistic audio that closely follows fine-scale temporal conditioning for loudness and fundamental frequency."
Session E (Chair: Hanna Lukashevich)
Thursday, November 7, 09.00-12.30 h
[E-00] (Anniversary paper) 20 Years of Automatic Chord Recognition from Audio Johan Pauwels; Ken O'Hanlon; Emilia Gomez; Mark B. Sandler "Looking back on 20 years of automatic chord recognition in order to move forwards."
[E-01] DeepSRGM - Sequence Classification and Ranking in Indian Classical Music Via Deep Learning Sathwik Tejaswi Madhusudhan; Girish Chowdhary "In this work, we propose deep learning based methods for Raga recognition and sequence ranking in Indian classical music. Our approach employs efficient pre-processing and learns temporal sequences in music data using LSTM Recurrent Neural Networks."
[E-02] Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN Anders Elowsson; Anders Friberg "When analyzing musical harmony with a CNN it can be beneficial to: start from a pretrained pitch transcription system using deep layered learning, compute a pitch chroma within the CNN, and promote key invariance through pooling across key class."
[E-03] Convolutional Composer Classification Harsh Verma; John Thickstun "This paper investigates the effectiveness of simple convolutional models for attributing composers to musical scores, evaluated on a corpus of 2,500 scores authored by a variety of composers spanning the Renaissance era to the early 20th century."
[E-04] A Diplomatic Edition of Il Lauro Secco: Ground Truth for OMR of White Mensural Notation Emilia Parada-Cabaleiro; Anton Batliner; Björn Schuller "We present a symbolic representation in mensural notation of the anthology Il Lauro Secco. For musicological analysis we encoded the repertoire in **mens and MEI; to support OMR research we present ground truth in agnostic and semantic formats."
[E-05] The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music Oriol Nieto; Matthew McCallum; Matthew Davies; Andrew Robertson; Adam Stark; Eran Egozy "Human-annotated dataset containing beats, downbeats, and structural segmentation for over 900 pop tracks."
[E-06] FMP Notebooks: Educational Material for Teaching and Learning Fundamentals of Music Processing Meinard Müller; Frank Zalkow "The FMP notebooks include open-source Python code, Jupyter notebooks, detailed explanations, as well as numerous audio and music examples for teaching and learning MIR and audio signal processing."
[E-07] Automatic Assessment of Sight-reading Exercises Jiawen Huang; Alexander Lerch "This paper shows the relevance of different features as well as the contribution of different feature groups to different assessment categories for sight-reading exercises."
[E-08] Supervised Symbolic Music Style Translation Using Synthetic Data Ondrej Cífka; Umut Simsekli; Gael Richard "Synthetic data is useful for learning to efficiently transform musical style."
[E-09] Deep Music Analogy Via Latent Representation Disentanglement Ruihan Yang; Dingsu Wang; Ziyu Wang; Tianyao Chen; Junyan Jiang; Gus Xia "We contribute a representation disentanglement method tailored for music composition, which enables domain-free music analogy-making."
[E-10] Query by Video: Cross-modal Music Retrieval Bochen Li; Aparna Kumar "This paper presents a cross-modal distance learning model to retrieve music for videos based on emotion concepts. The emotion constraints on the model allow for efficient training."
[E-11] Investigating CNN-based Instrument Family Recognition for Western Classical Music Recordings Michael Taenzer; Jakob Abeßer; Stylianos I. Mimilakis; Christof Weiss; Meinard Müller "This paper describes extensive experiments for CNN-based instrument family recognition systems. In particular, it studies the effect of data normalization, pre-processing, and augmentation techniques on the generalization capability of the models."
[E-12] A Bi-Directional Transformer for Musical Chord Recognition Jonggwon Park; Kyoyun Choi; Sungwook Jeon; Dokyun Kim; Jonghun Park "We propose a bi-directional Transformer model based on a self-attention mechanism for chord recognition. Through an attention-map analysis, we visualize how attention is performed and conclude that the model can effectively capture long-term dependencies."
[E-13] SAMBASET: A Dataset of Historical Samba de Enredo Recordings for Computational Music Analysis Lucas Maia; Magdalena Fuentes; Luiz Biscainho; Martín Rocamora; Slim Essid "SAMBASET is a large samba de enredo dataset that includes rich metadata, beat and downbeat annotations. It could provide challenges to state-of-the-art algorithms in MIR tasks such as rhythmic analysis, vocal F0 and chord estimation, among others."
[E-14] Deep-Rhythm for Global Tempo Estimation in Music Hadrien Foroughmand; Geoffroy Peeters "We estimate tempo and describe rhythm using a new 4D representation of the tempo-related harmonic series, which serves as input to a convolutional neural network trained to estimate tempo or rhythm pattern classes."
[E-15] Large-vocabulary Chord Transcription Via Chord Structure Decomposition Junyan Jiang; Ke Chen; Wei Li; Gus Xia "In this paper, we propose a new model for large-vocabulary chord recognition via chord structure decomposition, achieving state-of-the-art performance on different metrics."
Session F (Chair: Audrey Laplante)
Thursday, November 7, 13.30-17.00 h
[F-01] BandNet: A Neural Network-based, Multi-Instrument Beatles-Style MIDI Music Composition Machine Yichao Zhou; Wei Chu; Sam Young; Xin Chen"We propose a recurrent neural network (RNN)-based MIDI music composition machine that is able to learn musical knowledge from existing Beatles' music and generate full songs in the style of the Beatles with little human intervention."
[F-02] Can We Listen To It Together?: Factors Influencing Reception of Music Recommendations and Post-Recommendation Behavior Jin Ha Lee; Liz Pritchard; Chris Hubbles"In addition to the aesthetic qualities of music and the respondent’s taste, expectations regarding the delivery, familiarity, trust in the recommender’s abilities, and the rationale for suggestions affected people’s reception of recommendations."
[F-03] Adversarial Learning for Improved Onsets and Frames Music TranscriptionJong Wook Kim; Juan Bello"Piano roll prediction in music transcription can be improved by appending an additional loss incurred by an adversarial discriminator."
[F-04] Automatic Music Transcription and Ethnomusicology: a User Study Andre Holzapfel; Emmanouil Benetos"After decades of developing Automatic Music Transcription (AMT) systems, this paper conducts a first user study with experienced transcribers to shed light on the potential and drawbacks of incorporating AMT into manual transcription practice."
[F-05] LakhNES: Improving Multi-instrumental Music Generation with Cross-domain Pre-training Chris Donahue; Huanru Henry Mao; Yiting Ethan Li; Garrison Cottrell; Julian McAuley"We use transfer learning to improve multi-instrumental music generation by first pre-training a Transformer on a large heterogeneous music dataset (Lakh MIDI) and subsequently fine tuning it on a domain of interest (NES-MDB)."
[F-06] Taking Form: A Representation Standard, Conversion Code, and Example Corpora for Recording, Visualizing, and Studying Analyses of Musical Form Mark Gotham; Matthew Ireland"We provide new specification standards for representing human analyses of musical form, along with corpora of examples, and code for working with them."
[F-07] Learning Complex Basis Functions for Invariant Representations of Audio Stefan Lattner; Monika Dörfler; Andreas Arzt"The "Complex Autoencoder" learns features invariant to transposition and time-shift of audio in CQT representation. The features are competitive in a repeated section discovery, and in an audio-to-score alignment task."
[F-08] Folded CQT RCNN For Real-time Recognition of Instrument Playing Techniques Jean-Francois DUCHER; Philippe Esling"We extend state-of-the-art deep learning models for instrument recognition to the real-time classification of instrument playing techniques. Our models generalize better with a proper taxonomy and an adapted input transform."
[F-09] humdrumR: a New Take on an Old Approach to Computational MusicologyNathaniel Condit-Schultz; Claire Arthur"Describes a new software toolkit for computational musicology research."
[F-10] Tunes Together: Perception and Experience of Collaborative PlaylistsSo Yeon Park; Audrey Laplante; Jin Ha Lee; Blair Kaneshiro"Collaborative playlists (CPs) are critical in bringing back social connectedness to music enjoyment. We characterize purposes and connotations of CPs as well as elucidate similarities and differences between users and non-users with the CP Framework."
[F-11] A Holistic Approach to Polyphonic Music Transcription with Neural Networks Miguel Roman; Antonio Pertusa; Jorge Calvo-Zaragoza"A neural network architecture is trained in an end-to-end manner to transcribe music scores in humdrum **kern format from polyphonic audio files."
[F-12] Generalized Metrics for Single-f0 Estimation Evaluation Rachel Bittner; Juan Jose Bosch"We show a variety of limitations in widely used metrics for measuring the accuracy of single-f0 estimation systems, and propose a generalization which considers non-binary voicing decisions and a weighted scoring of pitch estimations."
[F-13] Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders Yin-Jyun Luo; Kat Agres; Dorien Herremans"We disentangle pitch and timbre of musical instrument sounds by learning separate interpretable latent spaces using Gaussian mixture variational autoencoders. The model is verified by controllable sound synthesis and many-to-many timbre transfer."
[F-14] The ISMIR Explorer - A Visual Interface for Exploring 20 Years of ISMIR Publications Thomas Low; Christian Hentschel; Sayantan Polley; Anustup Das; Harald Sack; Andreas Nurnberger; Sebastian Stober"We present a visual user interface for exploring the cumulative ISMIR proceedings based on locally aligned neighborhood maps containing semantically similar papers. Use this to search for related work or to discover interesing new topics!"
[F-15] Pattern Clustering in Monophonic Music by Learning a Non-Linear Embedding From Human Annotations Timothy de Reuse; Ichiro Fujinaga"Musical pattern discovery can be taken as a clustering task, incorporating manual annotations of repeated patterns as a way of specifying the kinds of patterns desired."
[F-16] A Study of Annotation and Alignment Accuracy for Performance Comparison in Complex Orchestral Music Thassilo Gadermaier; Gerhard Widmer"Annotations of the "beat" of complex orchestral music have considerable uncertainty due to disagreement among annotators. We compare these typical uncertainties to the accuracies achieved by transferring annotations using dynamic time warping."
[F-17] Mapping Timing Strategies in Drum Performance George Sioros; Guilherme Câmara; Anne Danielsen"We present a novel method for the analysis and visualization of microtiming relations between instruments and apply it to drum performances with three different timing profiles (on, pushed and laidback) from a laboratory experiment."
[F-18] Improving Singing Aid System for Laryngectomees With Statistical Voice Conversion and VAE-SPACE Li Li; Tomoki Toda; Kazuho Morikawa; Kazuhiro Kobayashi; Shoji Makino"An improved singing aid system for laryngectomees is developed, which converts electrolaryngeal (EL) speech into singing voices according to the melodic information by applying a statistical VC approach to enhance phonetic features and VAE-SPACE to control pitch."
Session G (Chair: Yi-Hsuan Yang)
Friday, November 8, 09.00-12.30 h
[G-01] Approachable Music Composition with Machine Learning at Scale Cheng-Zhi Anna Huang; Curtis Hawthorne; Adam Roberts; Monica Dinculescu; James Wexler; Leon Hong; Jacob Howcroft"We go behind the scenes of the Bach Doodle: its design and how we sped up the machine learning model Coconet to run in the browser. We are also releasing a dataset of 21.6 million melody and harmonization pairs, along with user ratings."
[G-02] Scalable Searching and Ranking for Melodic Pattern Queries Philippe Rigaux; Nicolas Travers"In this paper we focus on the problem of scalable content-based retrieval. We consider search with a monophonic query pattern in order to retrieve, from a very large collection of scores, one or more fragments "similar" to this pattern."
[G-03] Adaptive Time-Frequency Scattering for Periodic Modulation Recognition in Music Signals Changhong Wang; Emmanouil Benetos; Vincent Lostanlen; Elaine Chew"The scattering transform provides a versatile and compact representation for analysing playing techniques."
[G-04] Controlling Symbolic Music Generation based on Concept Learning from Domain Knowledge Taketo Akama"ExtRes is a generative model that learns decoupled concept spaces, given human domain knowledge. It provides concept-aware (e.g., rhythm, contour) controllability in interpolation and variation generation for symbolic music."
[G-05] Unmixer: An Interface for Extracting and Remixing Loops Jordan Smith; Yuta Kawasaki; Masataka Goto"Unmixer is a web interface where users can upload music, extract loops, remix them, and mash up loops from different songs. To extract loops with source separation, we use a nonnegative tensor factorization method improved with a sparsity constraint."
[G-06] Quantifying Disruptive Influence in the AllMusic Guide Flavio Figueiredo; Nazareno Andrade"What is disruption? Unlike being popular, being disruptive usually means bringing something ground-breaking to the table. In this work, we measure and detail how artists are disruptive using a human-curated music corpus."
[G-07] Leveraging Knowledge Bases and Parallel Annotations for Music Genre Translation Elena Epure; Anis Khlif; Romain Hennequin"In this paper, we explore the problem of translating music genres between multiple tag systems, with or without a common annotated corpus."
[G-08] Generating Structured Drum Pattern Using Variational Autoencoder and Self-similarity Matrix I-Chieh Wei; Chih-Wei Wu; Li Su"A drum pattern generation model based on VAE-GAN is presented; the proposed method generates symbolic drum patterns given a melodic track. A self-similarity matrix (SSM) is incorporated in the process to encapsulate structural information."
[G-09] Rendering Music Performance With Interpretation Variations Using Conditional Variational RNN Akira Maezawa; Kazuhiko Yamamoto; Takuya Fujishima"Our performance rendering method discovers latent sources of expressive variety, and also allows users to control such sources of expressive variations when rendering."
[G-10] An Interactive Workflow for Generating Chord Labels for Homorhythmic Music in Symbolic Formats Yaolong Ju; Samuel Howes; Cory McKay; Nathaniel Condit-Schultz; Jorge Calvo-Zaragoza; Ichiro Fujinaga"An Interactive Workflow for Generating Chord Labels for Homorhythmic Music in Symbolic Formats"
[G-11] Quantifying Musical Style: Ranking Symbolic Music based on Similarity to a Style Jeffrey Ens; Philippe Pasquier"StyleRank is a method to rank MIDI files based on their similarity to a style defined by an arbitrary corpus."
[G-12] Audio Query-based Music Source Separation Jie Hwan Lee; Hyeong-Seok Choi; Kyogu Lee"An audio-query-based source separation method that is capable of separating the music source regardless of the number and/or kind of target signals. Various useful scenarios are suggested, such as zero-shot separation and latent interpolation."
[G-13] Mosaic Style Transfer Using Sparse Autocorrelograms Daniel MacKinlay; Zdravko Botev"We apply sparse dictionary decomposition twice to autocorrelograms of signals, yielding a novel analysis of, and method for, mosaicing music style transfer that naturally handles time-scaling of the source audio."
[G-14] Automatic Choreography Generation with Convolutional Encoder-decoder Network Juheon Lee; Seohyun Kim; Kyogu Lee"In this paper, we propose an encoder-decoder neural network that generates choreography matching given music. Our evaluation shows that the proposed network creates natural choreography that matches the music."
[G-15] Hierarchical Classification Networks for Singing Voice Segmentation and Transcription Fu Zih-Sing; Li Su"A note transcription method for singing voice, implemented with novel hierarchical classification networks, achieves better performance than previous approaches."
[G-16] VirtuosoNet: A Hierarchical RNN-based System for Modeling Expressive Piano Performance Dasaem Jeong; Taegyun Kwon; Yoojin Kim; Kyogu Lee; Juhan Nam"We present an RNN-based model that reads MusicXML and generates human-like performance MIDI. The model employs a hierarchical approach using an attention network and an independent measure-level estimation module. We share our code and dataset."
[G-17] MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music Daniel Yang; Thitaree Tanprasert; Teerapat Jenrungrot; Mengyi Shan; Timothy Tsai"We develop a system which enables a person to take a cell phone picture of a page of sheet music and automatically retrieve the matching portion of a corresponding MIDI file."
[G-18] A Convolutional Approach to Melody Line Identification in Symbolic Scores Federico Simonetta; Carlos Eduardo Cancino-Chacón; Stavros Ntalampiras; Gerhard Widmer"We propose a new approach to identifying the most salient melody line in a symbolic score, consisting of a CNN estimating the probability that each note in the score belongs to the melody. This task is important for both MIR and Musicology."