List of accepted papers

Session A

Tuesday, November 5, 9.15-12.30 h

[A-00] Anniversary paper: Data Usage in MIR: History & Future Recommendations
Wenqin Chen (Smith College); Jessica Keast (Smith College); Jordan Moody (Smith College); Corinne Moriarty (Smith College); Felicia Villalobos (Smith College); Virtue Winter (Smith College); Xueqi Zhang (Smith College); Xuanqi Lyu (Smith College); Elizabeth Freeman (Smith College); Jessie Wang (Smith College); Sherry Cai (Smith College); Katherine Kinnaird (Smith College)
"This paper examines the unique issues of data access that MIR has faced over the last 20 years. We explore datasets used in ISMIR papers, examine the evolution of data access over time, and offer three proposals to increase equity of access to data."

[A-01] Zero-shot Learning for Audio-based Music Classification and Tagging
Jeong Choi (KAIST); Jongpil Lee (KAIST); Jiyoung Park (NAVER Corp.); Juhan Nam (KAIST)
"Investigated the paradigm of zero-shot learning applied to the music domain. Organized two side-information setups for the music classification task. Proposed a data split scheme and associated evaluation settings for multi-label zero-shot learning."

[A-02] Learning Notation Graph Construction for Full-Pipeline Optical Music Recognition
Alexander Pacha (TU Wien); Jorge Calvo-Zaragoza (University of Alicante); Jan Hajic, jr. (Charles University)
"An Optical Music Recognition system must infer the relationships between detected symbols to understand the semantics of a music score. This notation assembly stage is formulated as a machine learning problem and solved using deep learning."

[A-03] An Attention Mechanism for Musical Instrument Recognition
Siddharth Gururani (Georgia Institute of Technology); Mohit Sharma (Georgia Institute of Technology); Alexander Lerch (Georgia Institute of Technology)
"Instrument recognition in multi-instrument recordings is formulated as a multi-instance multi-label classification problem. We train a model on the weakly labeled OpenMIC dataset using an attention mechanism to aggregate predictions over time."

[A-04] MIDI-Sheet Music Alignment Using Bootleg Score Synthesis
Thitaree Tanprasert (Harvey Mudd College); Teerapat Jenrungrot (Harvey Mudd College); Meinard Müller (International Audio Laboratories Erlangen); Timothy Tsai (Harvey Mudd College)
"We propose a mid-level representation called a bootleg score representation which enables alignment between sheet music images and MIDI."

[A-05] mirdata: Software for Reproducible Usage of Datasets
Rachel Bittner (Spotify); Magdalena Fuentes (L2S CentraleSupélec and LTCI Télécom ParisTech); David Rubinstein (Spotify); Andreas Jansson (Spotify); Keunwoo Choi (Spotify); Thor Kell (Spotify)
"The lack of a standardized way to access and load commonly used datasets is a hurdle towards accelerated and reproducible research. To mitigate this, we present a tool for easy access to data and means to check the integrity of a dataset."

[A-06] Cover Detection Using Dominant Melody Embeddings
Guillaume Doras (Sacem); Geoffroy Peeters (Telecom ParisTech)
"We propose a cover detection method based on vector embedding extraction out of audio dominant melody. This architecture improves state-of-the-art accuracy on large datasets, and scales to query collections of thousands of tracks in a few seconds."

[A-07] Identifying Expressive Semantics in Orchestral Conducting Kinematics
Yu-Fen Huang (Academia Sinica); Tsung-Ping Chen (Academia Sinica); Nikki Moran (Edinburgh University); Simon Coleman (Edinburgh University); Li Su (Academia Sinica)
"As the pioneering investigation on conducting movement using RNN, we highlight the potential for this framework to be applied to further explore other issues in music conducting."

[A-08] The RomanText Format: A Flexible and Standard Method for Representing Roman Numeral Analyses
Mark Gotham (Cornell University); Dmitri Tymoczko; Michael Cuthbert
"We provide a technical standard, converter code and example corpora for Roman-numeral analysis, enabling a range of computational, musical, and pedagogical use cases."

[A-09] 20 Years of Playlists: A Statistical Analysis on Popularity and Diversity
Lorenzo Porcaro (Pompeu Fabra University); Emilia Gomez (Universitat Pompeu Fabra)
"We find it extremely valuable to compare playlist datasets generated in different contexts, as it allows us to understand how changes in the listening experience are affecting playlist creation strategies."

[A-10] Identification and Cross-Document Alignment of Measures in Music Score Images
Simon Waloschek (Center of Music and Film Informatics, Detmold University of Music); Aristotelis Hadjakos (Center of Music and Film Informatics, Detmold University of Music); Alexander Pacha (TU Wien)
"Musicologists regularly compare multiple sources of the same musical piece. To enable cross-source navigation in music score images, we propose a machine-learning approach which automatically detects and aligns measures across multiple sources."

[A-11] Query-by-Blending: A Music Exploration System Blending Latent Vector Representations of Lyric Word, Song Audio, and Artist
Kento Watanabe (National Institute of Advanced Industrial Science and Technology (AIST)); Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST))
"Query-by-Blending is a music exploration system that lets users find music by combining three musical aspects: lyric word, song audio, and artist. We propose an embedding method of constructing a unified vector space by using unsupervised learning."

[A-12] Improving Structure Evaluation Through Automatic Hierarchy Expansion
Brian McFee (New York University); Katherine Kinnaird (Smith College)
"We propose a method to expose latent hierarchical content in structural segmentation labels. This results in more accurate comparisons between multi-level segmentations."

[A-13] Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations
Gabriel Meseguer Brocal (IRCAM); Geoffroy Peeters (Telecom ParisTech)
"In this paper, we apply conditioning learning to source separation and introduce a control mechanism to the standard U-Net architecture. The control mechanism allows multiple instrument separations with just one model without losing performance."

[A-14] An Initial Computational Model for Musical Schemata Theory
Andreas Katsiavalos; Tom Collins (tbc); Bret Battey
"This paper presents a novel classifier for short polyphonic passages in Classical works that performs musical schemata recognition and prototype extraction with the utilisation of high-level musical constructs and similarity functions."

Session B

Tuesday, November 5, 13.30-17.00 h

[B-00] Anniversary paper: Music Performance Analysis: A Survey
Alexander Lerch (Georgia Institute of Technology); Claire Arthur (Georgia Institute of Technology); Ashis Pati (Georgia Institute of Technology); Siddharth Gururani (Georgia Institute of Technology)
"Music is a performing art. Even so, the performance itself is only infrequently explicitly acknowledged in MIR research. This paper surveys music performance research with the goal of increasing awareness for this topic in the ISMIR community."

[B-01] Evolution of the Informational Complexity of Contemporary Western Music
Thomas Parmer (Indiana University); Yong-Yeol Ahn (Indiana University)
"We find evidence for a global, inverted U-shaped relationship between complexity and hedonistic value within Western contemporary music, suggesting that the most popular songs cluster around average complexity values."

[B-02] Deep Unsupervised Drum Transcription
Keunwoo Choi (Spotify); Kyunghyun Cho (New York University)
"DrummerNet is a drum transcriber trained in an unsupervised fashion. DrummerNet learns to transcribe by learning to reconstruct the audio with the transcription estimate. Unsupervised learning + a large dataset allow DrummerNet to be less biased."

[B-03] Estimating Unobserved Audio Features for Target-Based Orchestration
Jon Gillick (UC Berkeley); Carmine-Emanuele Cella (University of California, Berkeley); David Bamman (UC Berkeley)
"We show that neural networks can predict features of the sum of 30 or more individual music notes based only on precomputed features of the source notes. This holds promise for computationally expensive applications like target-based orchestration."

[B-04] Towards Automatically Correcting Tapped Beat Annotations for Music Recordings
Jonathan Driedger (Chordify); Hendrik Schreiber (tagtraum industries incorporated); Bas de Haas (Chordify); Meinard Müller (International Audio Laboratories Erlangen)
"A framework for correcting beat annotations that were created by humans tapping to the beat of music recordings. It includes an automated correction procedure, visualizations to inspect the correction process, and a new dataset of beat annotations."

[B-05] Algorithmic Ability to Predict the Musical Future: Datasets and Evaluation
Berit Janssen (Utrecht University); Tom Collins (tbc); Iris Yuping Ren (Utrecht University)

[B-06] Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval
Stefan Balke (Johannes Kepler University Linz); Matthias Dorfer (Johannes Kepler University); Luis Carvalho (Johannes Kepler University); Andreas Arzt (Johannes Kepler University); Gerhard Widmer (Johannes Kepler University)
"The amount of temporal context given to a CNN is adapted by an additional soft-attention network, enabling the network to react to local and global tempo deviations in the input audio spectrogram."

[B-07] Contributing to New Musicological Theories with Computational Methods: The Case of Centonization in Arab-Andalusian Music
Thomas Nuttall (Music Technology Group, Universitat Pompeu Fabra, Barcelona); Miguel García-Casado (Music Technology Group, Universitat Pompeu Fabra, Barcelona); Víctor Núñez-Tarifa (Music Technology Group, Universitat Pompeu Fabra, Barcelona); Rafael Caro Repetto (Music Technology Group, Universitat Pompeu Fabra, Barcelona); Xavier Serra (Universitat Pompeu Fabra)
"Here we demonstrate how relatively uncomplicated statistical methods can support and contribute to new musicological theory, namely that developed by expert performer and researcher of Arab-Andalusian music of the Moroccan tradition, Amin Chaachoo."

[B-08] Temporal Convolutional Networks for Speech and Music Detection in Radio Broadcast
Quentin Lemaire (KTH Royal Institute of Technology); Andre Holzapfel (KTH Royal Institute of Technology in Stockholm)
"This study shows that a novel deep neural network architecture for sequential data (non-causal Temporal Convolution Network) can outperform state-of-the-art architectures in the task of speech and music detection."

[B-09] Towards Explainable Music Emotion Recognition: The Route via Mid-level Features
Shreyan Chowdhury (Johannes Kepler University Linz); Andreu Vall Portabella (Johannes Kepler University); Verena Haunschmid (Johannes Kepler University Linz); Gerhard Widmer (Johannes Kepler University)
"Explainable predictions of emotion from music can be obtained by introducing an intermediate representation of mid-level perceptual features in the predictor deep neural network."

[B-10] Community-Based Cover Song Detection
Jonathan Donier (Spotify Ltd)
"We approach cover song detection by considering larger sets of potential versions for a given work, and create and exploit the graph of relationships between these versions. We show a significant improvement in performance over a 1-vs-1 method."

[B-11] Tracking Beats and Microtiming in Afro-Latin American Music Using Conditional Random Fields and Deep Learning
Magdalena Fuentes (L2S CentraleSupélec and LTCI Télécom ParisTech); Lucas Maia (Universidade Federal do Rio de Janeiro); Martín Rocamora (Universidad de la República); Luiz Biscainho (UFRJ); Helene-Camille Crayencour (CNRS); Slim Essid (Telecom ParisTech); Juan Bello (New York University)
"A CRF model is able to automatically and jointly track beats and microtiming in timekeeper instruments of Afro-Latin American music, in particular samba and candombe. This allows the study of microtiming profiles' dependency on genre and performer."

[B-12] Harmony Transformer: Incorporating Chord Segmentation into Harmony Recognition
Tsung-Ping Chen (Academia Sinica); Li Su (Academia Sinica)
"Incorporating chord segmentation into chord recognition using the Transformer model achieves improved performance over prior art."

[B-13] Statistical Music Structure Analysis Based on a Homogeneity-, Repetitiveness-, and Regularity-Aware Hierarchical Hidden Semi-Markov Model
Go Shibata (Kyoto University); Ryo Nishikimi (Kyoto University); Eita Nakamura (Kyoto University); Kazuyoshi Yoshii (Kyoto University)
"This paper proposes a solid statistical approach to music structure analysis based on a homogeneity-, repetitiveness-, and regularity-aware hierarchical hidden semi-Markov model."

[B-14] Towards Measuring Intonation Quality of Choir Recordings: A Case Study on Bruckner's Locus Iste
Christof Weiss (International Audio Laboratories Erlangen); Sebastian J. Schlecht (International Audio Laboratories Erlangen); Sebastian Rosenzweig (International Audio Laboratories Erlangen); Meinard Müller (International Audio Laboratories Erlangen)
"This paper proposes an intonation cost measure for assessing the intonation quality of choir singing. While capturing local frequency deviations, the measure includes a grid shift compensation for cases when the entire choir is drifting in pitch."

[B-15] Guitar Tablature Estimation with a Convolutional Neural Network
Andrew Wiggins (Drexel University); Youngmoo Kim (Drexel University)
"We propose a guitar tablature estimation system that uses a convolutional neural network to predict fingerings used by the guitarist from audio of an acoustic guitar performance."

Session C

Wednesday, November 6, 09.00-12.30 h

[C-00] Anniversary paper: Intelligent User Interfaces for Music Discovery: The Past 20 Years and What's to Come
Peter Knees (Vienna University of Technology); Markus Schedl (Johannes Kepler University); Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST))
"We reflect on the evolution of music discovery interfaces from using content-based analysis, to metadata, to interaction data, while access and listening habits shift from personal collections to streaming services; and extrapolate future trends."

[C-01] Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice
Kyungyun Lee (KAIST); Juhan Nam (KAIST)
"The paper introduces a new method of obtaining a consistent singing voice representation from both monophonic and mixed music signals. Also, it presents a simple music mashup pipeline to create a large synthetic singer dataset."

[C-02] Augmenting Music Listening Experiences on Voice Assistants
Morteza Behrooz (University of California Santa Cruz); Sarah Mennicken (Spotify); Jennifer Thom (Spotify); Rohit Kumar (Spotify); Henriette Cramer (Spotify)
"Using metadata about playlists, artists, and tracks, we present an approach inspired by story generation techniques to dynamically augment music streaming sessions on smart speakers with contextualized transitions."

[C-03] Coupled Recurrent Models for Polyphonic Music Composition
John Thickstun (University of Washington); Zaid Harchaoui (University of Washington); Dean Foster (Amazon); Sham Kakade (University of Washington)
"This paper investigates automatic music composition via parameterized, probabilistic models of scores. We consider ways to exploit the structure of music to strengthen these models, borrowing ideas from convolutional and recurrent neural networks."

[C-04] Hit Song Prediction: Leveraging Low- and High-Level Audio Features
Eva Zangerle (University of Innsbruck); Michael Vötter (University of Innsbruck); Ramona Huber (University of Innsbruck); Yi-Hsuan Yang (Academia Sinica)
"We show that for predicting the potential success of a song, both low- and high-level audio features are important. We use a deep and wide neural network to model these features and perform a regression task on the track’s rank in the charts."

[C-05] Da-TACOS: A Dataset for Cover Song Identification and Understanding
Furkan Yesiler (Universitat Pompeu Fabra); Chris Tralie; Albin Correya (Music Technology Group, Universitat Pompeu Fabra); Diego Furtado Silva (Universidade Federal de São Carlos); Philip Tovstogan (Music Technology Group, Universitat Pompeu Fabra); Emilia Gomez (Universitat Pompeu Fabra); Xavier Serra (Universitat Pompeu Fabra)
"This work aims to understand the links among cover songs with computational approaches and to improve reproducibility of the Cover Song Identification task by providing a benchmark dataset and frameworks for comparative algorithm evaluation."

[C-06] Harmonic Syntax in Time: Rhythm Improves Grammatical Models of Harmony
Daniel Harasim (École Polytechnique Fédérale de Lausanne); Timothy O'Donnell (McGill University); Martin Rohrmeier (Lausanne)
"This paper integrates rhythm into harmonic syntax models of harmony using a novel grammar of rhythmic phrases."

[C-07] Learning to Traverse Latent Spaces for Musical Score Inpainting
Ashis Pati (Georgia Institute of Technology); Alexander Lerch (Georgia Institute of Technology); Gaëtan Hadjeres (Sony CSL)
"Recurrent Neural Networks can be trained using latent embeddings of a Variational Auto-Encoder-based model to perform interactive music generation tasks such as inpainting."

[C-08] Detecting Stable Regions in Frequency Trajectories for Tonal Analysis of Traditional Georgian Vocal Music
Sebastian Rosenzweig (International Audio Laboratories Erlangen); Frank Scherbaum (University of Potsdam); Meinard Müller (International Audio Laboratories Erlangen)
"This paper gives a mathematically rigorous description of two conceptually different approaches (one based on morphological operations, the other based on binary time-frequency masks) for detecting stable regions in frequency trajectories."

[C-09] The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale
Dmitry Bogdanov (Universitat Pompeu Fabra); Alastair Porter (Universitat Pompeu Fabra); Hendrik Schreiber (tagtraum industries incorporated); Julián Urbano (Delft University of Technology); Sergio Oramas (Pandora Media Inc.)
"The AcousticBrainz Genre Dataset allows researchers to explore how the same music pieces are annotated differently by different communities following their own genre taxonomies, and how these differences can be addressed by genre recognition systems."

[C-10] Data-Driven Song Recognition Estimation Using Collective Memory Dynamics Models
Christos Koutlis (Center for Research and Technology Hellas); Manos Schinas (CERTH); Vasiliki Gkatziaki (CERTH-ITI); Symeon Papadopoulos (Information Technologies Institute / Centre for Research & Technology - Hellas, GR); Yiannis Kompatsiaris (CERTH-ITI)
"In this paper a composite track recognition model based on chart data, YouTube views and Spotify popularity is proposed and is evaluated on real data obtained from a survey conducted in Sweden."

[C-11] Towards Interpretable Polyphonic Transcription with Invertible Neural Networks
Rainer Kelz (Austrian Research Institute for Artificial Intelligence (OFAI)); Gerhard Widmer (Johannes Kepler University)
"Invertible Neural Networks enable direct interpretability of the latent space."

[C-12] Learning to Generate Music With Sentiment
Lucas Ferreira (University of California, Santa Cruz); Jim Whitehead (University of California, Santa Cruz)
"A new LSTM method for generating symbolic music with sentiment."

[C-13] Backtracking Search Heuristics for Solving the All-partition Array Problem
Brian Bemman (Aalborg University); David Meredith (Aalborg University)
"This paper provides search heuristics for use with a greedy backtracking algorithm which solve a hard variant of a set-covering problem found in 12-tone serial music."

[C-14] Modeling and Learning Structural Breaks in Sonata Forms
Laurent Feisthauer (Université de Lille); Louis Bigo (Université de Lille); Mathieu Giraud (CNRS, Université de Lille)
"We trained a neural network with high-level musical features to find the medial caesura (MC) in string quartet movements written by Mozart. It correctly finds the MC for a little over half of the corpus."

[C-15] Auto-adaptive Resonance Equalization using Dilated Residual Networks
Maarten Grachten (N/A); Emmanuel Deruty (Sony CSL Paris); Alexandre Tanguy (Yascore)
"We propose a method to fully automate resonance equalization in mixing and mastering musical audio. The method predicts the resonance attenuation factor using neural networks trained and evaluated on ground truth collected from sound engineers."

Session D

Wednesday, November 6, 14.30-17.30 h

[D-01] Analyzing User Interactions with Music Information Retrieval System: An Eye-tracking Approach
Xiao Hu (The University of Hong Kong); Ying Que (The University of Hong Kong); Noriko Kando (National Institute of Informatics); Wenwei Lian (The University of Hong Kong)
"Eye movement measures can be used in investigating user interactions with MIR systems."

[D-02] A Cross-Scape Plot Representation for Visualizing Symbolic Melodic Similarity
Saebyul Park (KAIST); Taegyun Kwon (KAIST); Jongpil Lee (KAIST); Jeounghoon Kim (KAIST); Juhan Nam (KAIST)
"We propose a cross-scape plot representation to visualize multi-scaled melody similarity between two pieces of symbolic music. We evaluate its effectiveness on examples from folk music collections with similarity-based categories and plagiarism cases."

[D-03] JosquIntab: A Dataset for Content-based Computational Analysis of Music in Lute Tablature
Reinier de Valk (Department of Computing, Goldsmiths, University of London, UK); Ryaan Ahmed (MIT School of Humanities, Arts, and Social Sciences); Tim Crawford (Department of Computing, Goldsmiths, University of London, UK)
"We present JosquIntab, a dataset of automatically created transcriptions (MIDI/MEI) of 64 lute intabulations; the creation algorithm; and our evaluation method. In two use cases, we demonstrate its usefulness for both MIR and musicological research."

[D-04] A Dataset of Rhythmic Pattern Reproductions and Baseline Automatic Assessment System
Felipe Falcão (Universidade Federal de Campina Grande); Baris Bozkurt (Izmir Demokrasi Universitesi); Xavier Serra (Universitat Pompeu Fabra); Nazareno Andrade (Universidade Federal de Campina Grande); Ozan Baysal (Istanbul Technical University)
"The present work is an effort to address the shortage of music datasets designed for rhythmic assessment. A new dataset and baseline rhythmic assessment system are provided in order to support comparative studies about rhythmic assessment."

[D-05] Self-Supervised Methods for Learning Semantic Similarity in Music
Mason Bretan (Samsung Research America); Larry Heck (Samsung Research America)
"By combining self-supervised learning techniques based on contextual prediction with adversarial training we demonstrate it is possible to impose a prior distribution on a learned latent space without degrading the quality of the features."

[D-06] Blending Acoustic and Language Model Predictions for Automatic Music Transcription
Adrien Ycart (Centre for Digital Music, Queen Mary University of London); Andrew McLeod (Kyoto University); Emmanouil Benetos (Queen Mary University of London); Kazuyoshi Yoshii (Kyoto University)
"Dynamically integrating predictions from an acoustic and a language model with a blending model improves automatic music transcription performance on the MAPS dataset. Results are further improved by operating on 16th-note timesteps rather than 40ms."

[D-07] Modelling the Syntax of North Indian Melodies with a Generalized Graph Grammar
Christoph Finkensiep (EPFL); Richard Widdess (SOAS University of London); Martin Rohrmeier (Lausanne)
"Note- and interval-based models of hierarchical structure can be unified with a graph representation. Furthermore, leaps in melodies can be explained by latent structures such as the relative stability of pitches in a mode."

[D-08] A Comparative Study of Neural Models for Polyphonic Music Sequence Transduction
Adrien Ycart (Centre for Digital Music, Queen Mary University of London); Daniel Stoller (Queen Mary University of London); Emmanouil Benetos (Queen Mary University of London)
"A systematic study using various neural models and automatic music transcription systems shows that a cross-entropy-loss CNN improves transduction performance, while an LSTM does not. Using an adversarial set-up also does not yield improvement."

[D-09] Learning Similarity Metrics for Melody Retrieval
Folgert Karsdorp (Meertens Institute); Peter Kranenburg (Meertens Institute, Amsterdam, Netherlands); Enrique Manjavacas (University of Antwerp)
"We compare different recurrent neural architectures to represent symbolic melodies as continuous vectors. We show how duplet and triplet loss functions can be used to learn distributional representations of symbolic music in an induced melody space."

[D-10] Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other
Sebastian Böck (Austrian Research Institute for Artificial Intelligence); Matthew Davies (INESC TEC); Peter Knees (Vienna University of Technology)
"Multi-task learning helps to improve beat tracking accuracy if additional tempo information is used."

[D-11] Can We Increase Inter- and Intra-Rater Agreement in Modeling General Music Similarity?
Arthur Flexer (Austrian Research Institute for Artificial Intelligence); Taric Lallai (Austrian Research Institute for Artificial Intelligence)
"Models of general music similarity are problematic due to the subjective nature of music perception, which is shown and discussed by conducting a user experiment trying to improve the MIREX 'Audio Music Similarity' task."

[D-12] AIST Dance Video Database: Multi-Genre, Multi-Dancer, and Multi-Camera Database for Dance Information Processing
Shuhei Tsuchida (National Institute of Advanced Industrial Science and Technology (AIST)); Satoru Fukayama (National Institute of Advanced Industrial Science and Technology (AIST)); Masahiro Hamasaki (National Institute of Advanced Industrial Science and Technology (AIST)); Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST))
"AIST Dance Video Database is the first large-scale database containing original street dance videos with copyright-cleared music. It accelerates research of dance information processing such as dance-motion classification and dancer identification."

[D-13] Microtiming Analysis in Traditional Shetland Fiddle Music
Estefania Cano (Fraunhofer IDMT); Scott Beveridge (Sonquito)
"The analysis of microtiming variations on a corpus of Shetland fiddle music revealed characteristic patterns in the duration of beats and eighth notes that may be related to the suitability of fiddle music as an accompaniment to dancing."

[D-14] SUPRA: Digitizing the Stanford University Piano Roll Archive
Zhengshan Shi (Stanford University, CCRMA); Craig Sapp (Stanford University); Kumaran Arul (Stanford University); Jerry McBride (Stanford University); Julius Smith (Stanford University)
"This paper describes the digitization process of SUPRA, an online database of historical piano roll recordings, which has resulted in an initial dataset of 478 performances of pianists from the early twentieth century transcribed to MIDI format."

[D-15] Fast and Flexible Neural Audio Synthesis
Lamtharn Hantrakul (Google Brain); Jesse Engel (Google); Adam Roberts (Google Brain); Chenjie Gu (DeepMind)
"We present an autoregressive WaveRNN model capable of synthesizing realistic audio that closely follows fine-scale temporal conditioning for loudness and fundamental frequency."

Session E

Thursday, November 7, 09.00-12.30 h

[E-00] Anniversary paper: 20 Years of Automatic Chord Recognition from Audio
Johan Pauwels (Queen Mary University of London); Ken O'Hanlon (QMUL); Emilia Gomez (Universitat Pompeu Fabra); Mark B. Sandler (Queen Mary University of London)
"Looking back on 20 years of automatic chord recognition in order to move forwards."

[E-01] DeepSRGM - Sequence Classification and Ranking in Indian Classical Music Via Deep Learning
Sathwik Tejaswi Madhusudhan (University of Illinois, Urbana Champaign); Girish Chowdhary (University of Illinois at Urbana Champaign)
"In this work, we propose deep learning based methods for Raga recognition and sequence ranking in Indian classical music. Our approach employs efficient pre-processing and learns temporal sequences in music data using LSTM Recurrent Neural Networks."

[E-02] Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN
Anders Elowsson (KTH Royal Institute of Technology); Anders Friberg (KTH Royal Institute of Technology)
"When analyzing musical harmony with a CNN it can be beneficial to: start from a pretrained pitch transcription system using deep layered learning, compute a pitch chroma within the CNN, and promote key invariance through pooling across key class."

[E-03] Convolutional Composer Classification
Harsh Verma (University of Washington); John Thickstun (University of Washington)
"This paper investigates the effectiveness of simple convolutional models for attributing composers to musical scores, evaluated on a corpus of 2,500 scores authored by a variety of composers spanning the Renaissance era to the early 20th century."

[E-04] A Diplomatic Edition of Il Lauro Secco: Ground Truth for OMR of White Mensural Notation
Emilia Parada-Cabaleiro (University of Augsburg); Anton Batliner (University of Augsburg); Björn Schuller (University of Augsburg)
"We present a symbolic representation in mensural notation of the anthology Il Lauro Secco. For musicological analysis we encoded the repertoire in **mens and MEI; to support OMR research we present ground truth in agnostic and semantic formats."

[E-05] The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music
Oriol Nieto (Pandora); Matthew McCallum (Pandora); Matthew Davies (INESC TEC); Andrew Robertson (Ableton); Adam Stark (MI·MU); Eran Egozy (MIT)
"Human annotated dataset containing beats, downbeats, and structural segmentation for over 900 pop tracks."

[E-06] FMP Notebooks: Educational Material for Teaching and Learning Fundamentals of Music Processing
Meinard Müller (International Audio Laboratories Erlangen); Frank Zalkow (International Audio Laboratories Erlangen)
"The FMP notebooks include open-source Python code, Jupyter notebooks, detailed explanations, as well as numerous audio and music examples for teaching and learning MIR and audio signal processing."

[E-07] Automatic Assessment of Sight-reading Exercises
Jiawen Huang (Georgia Institute of Technology); Alexander Lerch (Georgia Institute of Technology)
"This paper shows the relevancy of different features as well as the contribution of different feature groups to different assessment categories for sight-reading exercises."

[E-08] Supervised Symbolic Music Style Translation Using Synthetic Data
Ondrej Cífka (Télécom Paris); Umut Simsekli (Telecom ParisTech); Gael Richard (Telecom ParisTech)
"Synthetic data is useful for learning to efficiently transform musical style."

[E-09] Deep Music Analogy Via Latent Representation Disentanglement
Ruihan Yang (New York University Shanghai); Dingsu Wang (NYU Shanghai); Ziyu Wang (NYU Shanghai); Tianyao Chen (NYU Shanghai); Junyan Jiang (Carnegie Mellon University); Gus Xia (New York University Shanghai)
"We contribute a representation disentanglement method tailored for music composition, which enables domain-free music analogy-making."

[E-10] Query by Video: Cross-modal Music Retrieval
Bochen Li (University of Rochester); Aparna Kumar (Spotify)
"This paper presents a cross-modal distance learning model to retrieve music for videos based on emotion concepts. The emotion constraints on the model allow for efficient training."

[E-11] Investigating CNN-based Instrument Family Recognition for Western Classical Music RecordingsMichael Taenzer (Fraunhofer IDMT); Jakob Abeßer (Fraunhofer IDMT); Stylianos I. Mimilakis (Fraunhofer IDMT); Christof Weiss (International Audio Laboratories Erlangen); Meinard Müller (International Audio Laboratories Erlangen)"This paper describes extensive experiments for CNN-based instrument family recognition systems. In particular, it studies the effect of data normalization, pre-processing, and augmentation techniques on the generalization capability of the models."

[E-12] A Bi-Directional Transformer for Musical Chord RecognitionJonggwon Park (Seoul National University); Kyoyun Choi (Seoul National University); Sungwook Jeon (Seoul National University); Dokyun Kim (Seoul National University); Jonghun Park (Seoul National University)"We propose a bi-directional Transformer model based on a self-attention mechanism for chord recognition. Through an attention map analysis, we visualize how attention was performed and conclude that the model can effectively capture long-term dependencies."

[E-13] SAMBASET: A Dataset of Historical Samba de Enredo Recordings for Computational Music AnalysisLucas Maia (Universidade Federal do Rio de Janeiro); Magdalena Fuentes (L2S CentraleSupélec and LTCI Télécom ParisTech); Luiz Biscainho (UFRJ); Martín Rocamora (Universidad de la República); Slim Essid (Telecom Paristech)"SAMBASET is a large samba de enredo dataset that includes rich metadata, beat and downbeat annotations. It could provide challenges to state-of-the-art algorithms in MIR tasks such as rhythmic analysis, vocal F0 and chord estimation, among others."

[E-14] Deep-Rhythm for Global Tempo Estimation in MusicHadrien Foroughmand (IRCAM Lab - CNRS - Sorbonne Université); Geoffroy Peeters (Telecom ParisTech)"Estimation of tempo or rhythm description using a new 4D representation of the harmonic series related to tempo used as an input to a convolutional neural network which is trained to estimate the tempo or the rhythm pattern classes."

[E-15] Large-vocabulary Chord Transcription Via Chord Structure DecompositionJunyan Jiang (Carnegie Mellon University); Ke Chen (Fudan University); Wei Li (Fudan University); Gus Xia (New York University Shanghai)"In this paper, we propose a new model for large-vocabulary chord recognition by chord structure decomposition with state-of-the-art performance on different metrics."

Session F

Thursday, November 7, 13.30-17.00 h

[F-01] BandNet: A Neural Network-based, Multi-Instrument Beatles-Style MIDI Music Composition MachineYichao Zhou (UC Berkeley); Wei Chu (Liulishuo); Sam Young (UCLA); Xin Chen (Snap Inc)"We propose a recurrent neural network (RNN)-based MIDI music composition machine that is able to learn musical knowledge from existing Beatles' music and generate full songs in the style of the Beatles with little human intervention."

[F-02] Can We Listen To It Together?: Factors Influencing Reception of Music Recommendations and Post-Recommendation BehaviorJin Ha Lee (University of Washington); Liz Pritchard (University of Washington); Chris Hubbles "In addition to the aesthetic qualities of music and the respondent’s taste, expectations regarding the delivery, familiarity, trust in the recommender’s abilities, and the rationale for suggestions affected people’s reception of recommendations."

[F-03] Adversarial Learning for Improved Onsets and Frames Music TranscriptionJong Wook Kim (New York University); Juan Bello (New York University)"Piano roll prediction in music transcription can be improved by appending an additional loss incurred by an adversarial discriminator."

[F-04] Automatic Music Transcription and Ethnomusicology: a User StudyAndre Holzapfel (KTH Royal Institute of Technology in Stockholm); Emmanouil Benetos (Queen Mary University of London)"After decades of developing Automatic Music Transcription (AMT) systems, this paper conducts a first user study with experienced transcribers to shed light on the potential and drawbacks of incorporating AMT into manual transcription practice."

[F-05] LakhNES: Improving Multi-instrumental Music Generation with Cross-domain Pre-trainingChris Donahue (UC San Diego); Huanru Henry Mao (UC San Diego); Yiting Ethan Li (UC San Diego); Garrison Cottrell (University of California, San Diego); Julian McAuley (UCSD)"We use transfer learning to improve multi-instrumental music generation by first pre-training a Transformer on a large heterogeneous music dataset (Lakh MIDI) and subsequently fine tuning it on a domain of interest (NES-MDB)."

[F-06] Taking Form: A Representation Standard, Conversion Code, and Example Corpora for Recording, Visualizing, and Studying Analyses of Musical FormMark Gotham (Cornell University); Matthew Ireland (Cambridge University)"We provide new specification standards for representing human analyses of musical form, along with corpora of examples, and code for working with them."

[F-07] Learning Complex Basis Functions for Invariant Representations of AudioStefan Lattner (Sony Computer Science Laboratories, Paris); Monika Dörfler (University of Vienna); Andreas Arzt (Johannes Kepler University)"The "Complex Autoencoder" learns features invariant to transposition and time-shift of audio in CQT representation. The features are competitive in a repeated section discovery, and in an audio-to-score alignment task."

[F-08] Folded CQT RCNN For Real-time Recognition of Instrument Playing TechniquesJean-Francois DUCHER (IRCAM); Philippe Esling (IRCAM)"We extend state-of-the-art deep learning models for instrument recognition to the real-time classification of instrument playing techniques. Our models generalize better with a proper taxonomy and an adapted input transform."

[F-09] humdrumR: a New Take on an Old Approach to Computational MusicologyNathaniel Condit-Schultz (Georgia Institute of Technology); Claire Arthur (Georgia Institute of Technology)"Describes a new software toolkit for computational musicology research."

[F-10] Tunes Together: Perception and Experience of Collaborative PlaylistsSo Yeon Park (Stanford University); Audrey Laplante (Université de Montréal); Jin Ha Lee (University of Washington); Blair Kaneshiro (Stanford University)"Collaborative playlists (CPs) are critical in bringing back social connectedness to music enjoyment. We characterize purposes and connotations of CPs as well as elucidate similarities and differences between users and non-users with the CP Framework."

[F-11] A Holistic Approach to Polyphonic Music Transcription with Neural NetworksMiguel Roman (University of Alicante); Antonio Pertusa (University of Alicante); Jorge Calvo-Zaragoza (University of Alicante)"A neural network architecture is trained in an end-to-end manner to transcribe music scores in humdrum **kern format from polyphonic audio files."

[F-12] Generalized Metrics for Single-f0 Estimation EvaluationRachel Bittner (Spotify); Juan Jose Bosch (Spotify)"We show a variety of limitations in widely used metrics for measuring the accuracy of single-f0 estimation systems, and propose a generalization which considers non-binary voicing decisions and a weighted scoring of pitch estimations."

[F-13] Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational AutoencodersYin-Jyun Luo (Singapore University of Technology and Design); Kat Agres (ASTAR IHPC); Dorien Herremans (SUTD)"We disentangle pitch and timbre of musical instrument sounds by learning separate interpretable latent spaces using Gaussian mixture variational autoencoders. The model is verified by controllable sound synthesis and many-to-many timbre transfer."

[F-14] The ISMIR Explorer - A Visual Interface for Exploring 20 Years of ISMIR PublicationsThomas Low (Otto von Guericke University); Christian Hentschel (Hasso-Plattner-Institut); Sayantan Polley (Otto von Guericke University); Anustup Das (Otto von Guericke University); Harald Sack (FIZ Karlsruhe); Andreas Nurnberger (Magdeburg University); Sebastian Stober (Otto von Guericke University)"We present a visual user interface for exploring the cumulative ISMIR proceedings based on locally aligned neighborhood maps containing semantically similar papers. Use this to search for related work or to discover interesting new topics!"

[F-15] Pattern Clustering in Monophonic Music by Learning a Non-Linear Embedding From Human AnnotationsTimothy de Reuse (McGill University); Ichiro Fujinaga (McGill University)"Musical pattern discovery can be taken as a clustering task, incorporating manual annotations of repeated patterns as a way of specifying the kinds of patterns desired."

[F-16] A Study of Annotation and Alignment Accuracy for Performance Comparison in Complex Orchestral MusicThassilo Gadermaier (Johannes Kepler University Linz); Gerhard Widmer (Johannes Kepler University)"Annotations of the "beat" of complex orchestral music have considerable uncertainty due to disagreement of annotators. A comparison of typical uncertainties to accuracies achieved by transfer of annotations using dynamic time-warping is given."

[F-17] Mapping Timing Strategies in Drum PerformanceGeorge Sioros (University of Oslo); Guilherme Câmara (University of Oslo); Anne Danielsen (University of Oslo)"We present a novel method for the analysis and visualization of microtiming relations between instruments and apply it to drum performances with three different timing profiles (on, pushed and laidback) from a laboratory experiment."

[F-18] Improving Singing Aid System for Laryngectomees With Statistical Voice Conversion and VAE-SPACELi Li (University of Tsukuba); Tomoki Toda (Nagoya University); Kazuho Morikawa (Graduate School of Informatics, Nagoya University); Kazuhiro Kobayashi (Nagoya University); Shoji Makino (University of Tsukuba)"An improved singing aid system for laryngectomees is developed, which converts EL speeches into singing voices according to the melodic information by applying a statistical VC approach to enhance phonetic features and VAE-SPACE to control pitch."

Session G

Friday, November 8, 09.00-12.30 h

[G-01] Approachable Music Composition with Machine Learning at ScaleCheng-Zhi Anna Huang (Google Brain); Curtis Hawthorne (Google Brain); Adam Roberts (Google Brain); Monica Dinculescu (Google Brain); James Wexler; Leon Hong (Google); Jacob Howcroft (Google)"We show how the Bach Doodle works behind the scenes, including its design and how we sped up the machine learning model Coconet to run in the browser. We are also releasing a dataset of 21.6 million melody and harmonization pairs, along with user ratings."

[G-02] Scalable Searching and Ranking for Melodic Pattern QueriesPhilippe Rigaux (CNAM); Nicolas Travers (DVRC)"We focus in this paper on the scalable content-based retrieval problem. We consider the search mechanism with a monophonic query pattern in order to retrieve from a very large collection of scores one or more fragments "similar" to this pattern."

[G-03] Adaptive Time–Frequency Scattering for Periodic Modulation Recognition in Music SignalsChanghong Wang (Queen Mary University of London); Emmanouil Benetos (Queen Mary University of London); Vincent Lostanlen (Cornell Lab of Ornithology); Elaine Chew (CNRS-UMR9912/STMS IRCAM, Paris, France)"Scattering transform provides a versatile and compact representation for analysing playing techniques."

[G-04] Controlling Symbolic Music Generation based on Concept Learning from Domain KnowledgeTaketo Akama (Sony CSL)"ExtRes is a generative model that learns decoupled concept spaces, given human domain knowledge. It provides concept-aware (e.g., rhythm, contour) controllability in interpolation and variation generation for symbolic music."

[G-05] Unmixer: An Interface for Extracting and Remixing LoopsJordan Smith (Queen Mary University of London); Yuta Kawasaki (National Institute of Advanced Industrial Science and Technology (AIST)); Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST))"Unmixer is a web interface where users can upload music, extract loops, remix them, and mash-up loops from different songs. To extract loops with source separation, we use a nonnegative tensor factorization method improved with a sparsity constraint."

[G-06] Quantifying Disruptive Influence in the AllMusic GuideFlavio Figueiredo (UFMG); Nazareno Andrade (Universidade Federal de Campina Grande)"What is disruption? Different from being popular, being disruptive usually means bringing something ground-breaking to the table. In this work, we measure and detail how artists are disruptive using a human-curated music corpus."

[G-07] Leveraging knowledge bases and parallel annotations for music genre translationElena Epure (Deezer R&D); Anis KHLIF (Deezer R&D); Romain Hennequin (DEEZER)"In this paper, we explore the problem of translation of music genres between multiple tag systems, with or without a common annotated corpus."

[G-08] Generating Structured Drum Pattern Using Variational Autoencoder and Self-similarity MatrixI-CHIEH WEI (Institute of Information Science Academia Sinica); Chih-Wei Wu (Netflix, Inc.); Li Su (Academia Sinica)"A drum pattern generation model based on VAE-GAN is presented; the proposed method generates symbolic drum patterns given a melodic track. Self-similarity matrix (SSM) is incorporated in the process for encapsulating structural information."

[G-09] Rendering Music Performance With Interpretation Variations Using Conditional Variational RNNAkira Maezawa (Yamaha Corporation); Kazuhiko Yamamoto (Yamaha Corporation); Takuya Fujishima (Yamaha Corporation)"Our performance rendering method discovers latent sources of expressive variety, and also allows users to control such sources of expressive variations when rendering."

[G-10] An Interactive Workflow for Generating Chord Labels for Homorhythmic Music in Symbolic FormatsYaolong Ju (McGill University); Samuel Howes (McGill University); Cory McKay (Marianopolis College); Nathaniel Condit-Schultz (Georgia Institute of Technology); Jorge Calvo-Zaragoza (University of Alicante); Ichiro Fujinaga (McGill University)"An Interactive Workflow for Generating Chord Labels for Homorhythmic Music in Symbolic Formats"

[G-11] Quantifying Musical Style: Ranking Symbolic Music based on Similarity to a StyleJeffrey Ens (Simon Fraser University); Philippe Pasquier (Simon Fraser University)"StyleRank is a method to rank MIDI files based on their similarity to a style defined by an arbitrary corpus."

[G-12] Audio Query-based Music Source SeparationJie Hwan Lee (Seoul National University); Hyeong-Seok Choi (Seoul National University); Kyogu Lee (Seoul National University)"An audio query-based source separation method that is capable of separating the music source regardless of the number and/or kind of target signals. Various useful scenarios are suggested, such as zero-shot separation and latent interpolation."

[G-13] Mosaic Style Transfer Using Sparse AutocorrelogramsDaniel MacKinlay (UNSW Sydney); Zdravko Botev (UNSW Sydney)"We apply sparse dictionary decomposition twice to autocorrelograms of signals, to get a novel analysis of and method for mosaicing music style transfer, which has the novel feature of handling time-scaling of the source audio naturally."

[G-14] Automatic Choreography Generation with Convolutional Encoder-decoder NetworkJuheon Lee (Seoul National University); Seohyun Kim (Seoul National University); Kyogu Lee (Seoul National University)"In this paper, we propose an encoder-decoder neural network that generates choreography that matches the given music. Our evaluation shows that the proposed network creates natural choreography that matches the music."

[G-15] Hierarchical Classification Networks for Singing Voice Segmentation and TranscriptionFu Zih-Sing (National Taiwan University); Li Su (Academia Sinica)"A note transcription method for singing voice, implemented by novel hierarchical classification networks, achieves better performance than previous methods."

[G-16] VirtuosoNet: A Hierarchical RNN-based System for Modeling Expressive Piano PerformanceDasaem Jeong (KAIST); Taegyun Kwon (KAIST); Yoojin Kim (KAIST); Kyogu Lee (Seoul National University); Juhan Nam (KAIST)"We present an RNN-based model that reads MusicXML and generates human-like performance MIDI. The model employs a hierarchical approach by using an attention network and an independent measure-level estimation module. We share our code and dataset."

[G-17] MIDI Passage Retrieval Using Cell Phone Pictures of Sheet MusicDaniel Yang (Harvey Mudd College); Thitaree Tanprasert (Harvey Mudd College); Teerapat Jenrungrot (Harvey Mudd College); Mengyi Shan (Harvey Mudd College); Timothy Tsai (Harvey Mudd College)"We develop a system which enables a person to take a cell phone picture of a page of sheet music, and to automatically retrieve the matching portion of a corresponding MIDI file."

[G-18] A Convolutional Approach to Melody Line Identification in Symbolic ScoresFederico Simonetta (Università di Milano); Carlos Eduardo Cancino-Chacón (Austrian Research Institute for Artificial Intelligence); Stavros Ntalampiras (University of Milan); Gerhard Widmer (Johannes Kepler University)"We propose a new approach to identifying the most salient melody line in a symbolic score, consisting of a CNN estimating the probability that each note in the score belongs to the melody. This task is important for both MIR and Musicology."