To-read list
#list
#mir
Codified audio language modeling learns useful representations for music information retrieval
Melody transcription via generative pre-training
A diffusion-inspired training strategy for singing voice extraction in the waveform domain
MT3: Multi-Task Multitrack Music Transcription
Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-Supervised Learning
MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
Perceptual Vs. Automated Judgements of Music Copyright Infringement
Vocal Melody Extraction Using Patch-Based CNN
Automatic lyrics alignment and transcription in polyphonic music: Does background music help?
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming
#synthesis
MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
Continuous descriptor-based control for deep audio synthesis
Masked Autoencoders that Listen
#feature_conditioning
FiLM: Visual reasoning with a general conditioning layer
Attentional feature fusion
Arbitrary style transfer in real-time with adaptive instance normalization
Guided Image-to-Image Translation with Bi-Directional Feature Transformation
Hierarchical question-image co-attention for visual question answering
#analysis_voice
Robust Bayesian pitch tracking based on the harmonic model
Phoneme-to-audio alignment with recurrent neural networks for speaking and singing voice
The “Overdrive” mode in the “Complete Vocal Technique”: a preliminary study
Production Strategies of Vocal Attitudes
#analysis_music
The evolution of popular music: USA 1960–2010
Investigating style evolution of Western classical music: A computational approach
P4KxSpotify: A Dataset of Pitchfork Music Reviews and Spotify Musical Features
Evolution of the Informational Complexity of Contemporary Western Music
Score-informed analysis of tuning, intonation, pitch modulation, and dynamics in jazz solos
#data_availability
Learning Multiple Dense Prediction Tasks from Partially Annotated Data
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Connectionist temporal localization for sound event detection with sequential labeling
Interactive Multi-Class Tiny-Object Detection
W-CTC: A Connectionist Temporal Classification Loss with Wild Cards
Weighted Training for Cross-Task Learning
Deep Hough-Transform Line Priors
#generation
S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation
Analysis of Acoustic Features Affecting “Singing-ness” and Its Application to Singing-Voice Synthesis from Speaking-Voice
The Singing Tutor: Expression Categorization and Segmentation of the Singing Voice
Automatic Characterization of Dynamics and Articulation of Expressive Monophonic Recordings
Relationships Between Lyrics and Melody in Popular Music