Procedural music generation project (2018)

(c) Ed Earl 2019; all rights reserved.

Listen here:

The idea of this project is to produce new music by changing the chords of existing music and finding new notes to fit the new chords, while maintaining certain musical features of the original song, such as which notes are chord tones and the way in which parts move. Only the note pitch values are changed - the timing, volume and duration of notes are preserved (with a minor exception described below). My expectation was that the imperfections in the techniques used to do this might produce surprising results, and the resulting MIDI artefacts usually sound musical, sometimes interesting, without being immediately recognisable as deriving from the reference song in an obvious way.

The inputs are (1) a musical work in MIDI format; (2) chord annotations for each timestep of the work; and (3) a set of new chords. (Obtaining chord annotations (2) is the most onerous part of the process, as MIDI data with accompanying chord annotations are not widely available. I combined data from many sources and also annotated many works by hand.) The input chords are ranked by frequency of occurrence and mapped to the set of new chords. Each chord in the original list of annotations is then replaced according to the mapping. This gives a new list of chord annotations for each timestep.

Secondary musical features are computed from the input music as follows.

  • The music is segmented into independent monophonic voices using an algorithm based on Mearns, “The Computational Analysis of Harmony in Western Art Music”, 2013.
  • Each note is labelled, with reference to the original chord annotations, as either a chord tone, a recognised non-chord tone such as a passing tone or neighbour tone, or an unrecognised non-chord tone, using my own hand-written algorithm. (The labelling process uses “horizontal” information such as whether a note was approached by skip or by step, which is why the music needs to be segmented into voices.)
  • A matrix is derived from the notes in which each value is the pitch delta from the previous note. For each voice, the probability of every first order transition is calculated, i.e. the probability of a pitch delta of size x, given that the previous pitch delta in the same voice was of size y. Intuitively, this captures information such as “an octave jump up in the bassline is usually followed by a drop of a third”.
  • Meredith’s SIATECCompress algorithm (“Music analysis and point-set compression”, Meredith, 2015) is used to find short patterns of notes which are repeated in different places in the input work, and the locations of the repetitions. Intuitively, this captures information such as “the bassline plays the same pattern of four notes in bars 1, 5, 16 and 20”.

New music is generated using a “generate-and-test” strategy. Notes are changed at random (within constraints), and the resulting music is assigned a score; if the score is better than before, the changes are kept. For the first 100 iterations, every note pitch value is changed; thereafter, one randomly selected note pitch value is changed per iteration. As an optimisation, a list is kept of all single-note changes which have been tried, allowing the algorithm to avoid wasting its time by testing the same change more than once. A run of 1000 iterations seems to strike a good balance between time taken and quality of the output.

Note pitch values are changed using the following sequence of operations.

  • A note pitch value is chosen randomly from within the range of the part in the input music.
  • The locations of repetitions of short patterns of notes, found using SIATECCompress, are preserved exactly, by copying the note pitch value from one location to another. Using the same example as before, if a note pitch value is changed in the bassline in bar 16, the same value would be copied to the equivalent beat position in bars 1, 5 and 20. (This doesn’t mean that the actual note pitch values are the same as the input song, but that the locations of repetitions are the same.)
  • Every note pitch value is quantized to the current scale, the scale being derived from the chord annotation. This is done regardless of whether the note is labelled as a chord tone.
  • Clashing ties are broken. In the original music, a note could be tied over a chord change, sounding consonant against both chords. In the new music, tied notes will not necessarily sound consonant when the chord changes; when a tied note is unacceptably dissonant (defined as falling outside the scale), the note duration is shortened to make the note end when the chord changes.
  • Within each block of notes which share the same chord annotation, the note lowest in pitch is adjusted to the root note of the chord. This is an arbitrary step with little justification other than that it makes the generated music sound more coherent.

The scoring method has a big impact on the character of the musical artefacts which are produced, the aim being for better scores to be associated with higher quality musical artefacts. In this case, the standard of quality was some combination of sounding musically convincing, surprising and interesting to my ears.

A common practical problem I have encountered with the “generate-and-test” strategy is a kind of paralysis in which the search gets stuck in local minima, unable to improve the score by making small changes. This problem seems to be exacerbated by using a larger number of score components; therefore, I tried to find an effective scoring implementation based on as few components as possible. After testing many scoring methods, and tweaking score weights by trial and error, I arrived at the following.

  • The new notes are labelled with reference to the new chord annotations, and a score penalty is assigned based on the proportion of labels which differ from the original labels. More weight is assigned to voices which play mostly chord tones in the input music. This is supposed to incentivise a pattern of chord tones and non-chord tones which is as similar as possible to the original music.
  • A score penalty is assigned based on the difference between each voice’s average note pitch in the new music and in the original music.This incentivises voices to play in a similar register to the original music.
  • Note deltas are calculated for the new music. The probability of each note delta, given the previous note delta, is calculated from the first order transition table derived from the original song, and a score component is derived from the probabilities. This incentivises each voice to have similar patterns of movement to the original music, and also constrains the voices to move in ways which are generally musically convincing (for example, avoiding sudden large skips).