McGill Billboard
For each piece in the McGill Billboard dataset, the nonnegative-least-squares (NNLS) chromagram is computed with the Chordino VAMP plugin using its default settings. Combining the 12-D treble chroma and the 12-D bass chroma, we obtain for each track a 24-by-T chromagram, where T is the length of the track in frames. Each input sequence for the Harmony Transformer contains 100 segments (around 23 sec) and is generated through a sliding window of frame size 21 with hop size 5. Following [13] (see Section 2.2), pieces with id numbers smaller than 1000 are used for training and the remaining pieces for testing; duplicate pieces are filtered out. As a result, there are 5,647 sequences for training and 1,628 sequences for testing.
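The preprocessing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`stack_chroma`, `sliding_windows`, `split_by_id`), the frame-major layout (a list of T frames rather than a 24-by-T matrix), and the assumption that the treble and bass chroma have already been exported from Chordino as per-frame lists are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def stack_chroma(treble: Sequence[Frame], bass: Sequence[Frame]) -> List[Frame]:
    """Concatenate per-frame 12-D treble and 12-D bass chroma into 24-D frames.

    Assumption: both inputs are frame-major (one 12-bin list per frame).
    """
    if len(treble) != len(bass):
        raise ValueError("treble and bass chroma must have the same number of frames")
    stacked = []
    for t, b in zip(treble, bass):
        if len(t) != 12 or len(b) != 12:
            raise ValueError("each chroma frame must have 12 bins")
        stacked.append(list(t) + list(b))
    return stacked


def sliding_windows(
    frames: Sequence[Frame], frame_size: int = 21, hop_size: int = 5
) -> List[List[Frame]]:
    """Cut a chromagram into windows of `frame_size` frames, one every `hop_size` frames.

    Only full windows are kept; a track shorter than `frame_size` yields no windows.
    """
    last_start = len(frames) - frame_size
    return [
        list(frames[s:s + frame_size])
        for s in range(0, last_start + 1, hop_size)
    ]


def split_by_id(piece_ids: Sequence[int], threshold: int = 1000) -> Tuple[List[int], List[int]]:
    """Split piece ids into training (id < threshold) and testing (id >= threshold)."""
    train = [i for i in piece_ids if i < threshold]
    test = [i for i in piece_ids if i >= threshold]
    return train, test
```

With these defaults, a track of 516 frames yields exactly 100 full windows. How consecutive windows are grouped into 100-segment input sequences, and how duplicate pieces are detected, is not specified in this section, so the sketch stops at the window level.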