Making Music from Images With Python

To say that music is an integral part of my life is a massive understatement. I've been playing guitar for over 20 years, and not a day goes by without my listening to music. In fact, before embarking on the academic journey that has made me a physical chemist over the past 12 years, I was on the brink of pursuing a career in music.

Though I didn't pursue that path, I still get to enjoy music and do cool things with it and through it. For instance, in this project I thought it'd be cool to write a program that makes music from images. There have been some pretty neat attempts at this in the past; however, the results are not quite... musical...

For the remainder of this article, I'll show my attempt at generating songs from images that, in my humble opinion, sound pretty cool.

This post primarily details the main results and showcases some neat examples of the program. If you want to see the full code, you can find it here and/or here.

The Basic Idea

My strategy and thought process to accomplish this is as follows:

Images are made of pixels
Pixels are composed of arrays of numbers that designate color
Color is described via color spaces like RGB or HSV
The color space could be partitioned into sections
Musical scales are subdivided into notes via sound intervals
Sound is vibration and so each note is associated with a frequency
Therefore, each subdivision of the color space could be mapped to a specific note in a musical scale, and that note in turn has a frequency associated with it.

Let's give it a go!

Using the HSV Color Space

HSV is a color space described by three values: Hue, Saturation, and Value (also called Brightness).

Hue is defined as "the degree to which a stimulus can be described as similar to or different from stimuli that are described as red, orange, yellow, green, blue, violet". In other words, Hue represents color.

Saturation is defined as "colorfulness of an area judged in proportion to its brightness". In other words, Saturation represents the amount to which a color is mixed with white.

Brightness is defined as "perception elicited by the luminance of a visual target". In other words, Brightness represents the amount to which a color is mixed with black.

Approximate hue ranges of basic colors (in degrees, on a 0-360 scale):

Orange: 0-44
Yellow: 44-76
Green: 76-150
Blue: 150-260
Violet: 260-320
Red: 320-360

I'll work in the HSV color space because it is already naturally subdivided, which makes the subsequent frequency mapping more intuitive. In addition, the hue channel (which is mostly responsible for color in this space) is decoupled from the other two channels, which simplifies things.

Here's a comparison between color spaces for an image I found online:

Extracting the Hue channel

Now that we have our image in HSV, let's extract the hue (H) value of every pixel. This can be done with a nested for loop over the height and width of the image.
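The post's own loop isn't shown here, so this is a minimal sketch, assuming the image is already an RGB NumPy array of shape (height, width, 3). It uses the standard library's colorsys module for the per-pixel RGB-to-HSV conversion; colorsys reports hue as a fraction of a full turn, so the value is rescaled to degrees. If you load images with OpenCV instead, cv2.cvtColor(img, cv2.COLOR_BGR2HSV) converts the whole image at once, but for 8-bit images it stores hue on a 0-179 scale.

```python
import colorsys

import numpy as np


def extract_hues(rgb_image):
    """Return the hue of every pixel, in degrees (0-360), using a nested loop."""
    height, width, _ = rgb_image.shape
    hues = []
    for i in range(height):            # loop over rows
        for j in range(width):         # loop over columns
            r, g, b = rgb_image[i, j] / 255.0   # colorsys expects floats in [0, 1]
            h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
            hues.append(h * 360.0)     # colorsys returns hue as a fraction of a full turn
    return np.array(hues)


# Tiny 2x2 RGB test image: red, green / blue, yellow
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 0]]], dtype=np.uint8)
print(extract_hues(image))  # approximately [0, 120, 240, 60]
```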

Now that I have an array containing the H value of each pixel, I'll place it into a pandas dataframe. Each row in the dataframe is a pixel, so each column holds information about that pixel. I'll call this dataframe pixels_df, which can be seen below.

The dataframe currently has a single column called 'hues', where each row holds the H-channel value of one pixel of the image I loaded.
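As a small sketch of building pixels_df, assuming the hue channel is available as a 2D NumPy array (the 2x2 values below are made up for illustration), flattening it gives one row per pixel in the same order as the nested loop over height, then width:

```python
import numpy as np
import pandas as pd

# Hue channel (in degrees) of a hypothetical 2x2 image
hue_channel = np.array([[0.0, 120.0],
                        [240.0, 60.0]])

# ravel() flattens row by row, the same order as looping over height, then width
pixels_df = pd.DataFrame({'hues': hue_channel.ravel()})
print(pixels_df)
```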

Converting hues to frequencies

My initial idea for converting a hue value into a frequency was to map the H value onto a predetermined set of frequencies. The mapping function is shown below.

The function takes as inputs the H value and an array of frequencies to map H onto. The example uses an array called scale_freqs, whose frequencies correspond to the A harmonic minor scale.

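Then, an array of threshold values for H (called thresholds) is defined. These thresholds partition the hue range so that each H value falls into exactly one bin and is converted into the corresponding frequency from scale_freqs. The conversion is applied to every pixel in the dataframe using a lambda function, and the results are stored in a notes column.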
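Since the mapping function itself isn't reproduced here, this is a hedged sketch of the same idea. It assumes hue is measured in degrees (0-360) and splits that range into as many equal-width bins as there are notes; the original threshold values aren't given in the post, so equal bins are an assumption, as is using A3 through G#4 for the A harmonic minor frequencies. For OpenCV's 0-179 hue scale, pass hue_max=180.

```python
import pandas as pd

# A harmonic minor from A3 to G#4, equal temperament, in Hz (assumed octave)
scale_freqs = [220.00, 246.94, 261.63, 293.66, 329.63, 349.23, 415.30]


def hue2freq(h, scale_freqs, hue_max=360.0):
    """Map a hue value to one note of the scale using equal-width hue bins."""
    n = len(scale_freqs)
    # Upper edge of each bin: hue_max/n, 2*hue_max/n, ..., hue_max
    thresholds = [hue_max * (i + 1) / n for i in range(n)]
    for threshold, freq in zip(thresholds, scale_freqs):
        if h <= threshold:
            return freq
    return scale_freqs[-1]  # anything above hue_max falls into the last bin


pixels_df = pd.DataFrame({'hues': [0.0, 50.0, 100.0, 200.0, 359.0]})
# Apply the mapping row by row with a lambda, storing the result in a 'notes' column
pixels_df['notes'] = pixels_df.apply(lambda row: hue2freq(row['hues'], scale_freqs), axis=1)
print(pixels_df)
```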

Converting NumPy Array into Playable Audio

Cool! Now that every pixel has a frequency, I'll convert the notes column into a NumPy array called frequencies, which I can then use to make a playable audio file :] To do this, I can use the wavfile.write function from scipy.io, making sure to convert the data to an appropriate type (here np.float32, which scipy writes as 32-bit floating-point audio).
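An array of frequencies isn't audio yet: each frequency first has to be rendered as a waveform. A minimal sketch, assuming each note is a 0.1-second sine tone at a 22,050 Hz sample rate (both values are assumptions, not taken from the original code):

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 22050   # samples per second (assumed)
NOTE_DURATION = 0.1   # seconds per note (assumed)


def frequencies_to_song(frequencies, sample_rate=SAMPLE_RATE, duration=NOTE_DURATION, amplitude=0.5):
    """Turn each frequency into a short sine tone and concatenate the tones."""
    t = np.linspace(0, duration, int(duration * sample_rate), endpoint=False)
    notes = [amplitude * np.sin(2 * np.pi * f * t) for f in frequencies]
    return np.concatenate(notes)


frequencies = np.array([220.00, 261.63, 329.63, 415.30])  # e.g. pixels_df['notes'].to_numpy()
song = frequencies_to_song(frequencies)
# float32 samples in [-1, 1] are written as a 32-bit floating-point WAV
wavfile.write('hue_notes.wav', SAMPLE_RATE, song.astype(np.float32))
```

Under these assumptions, 60 pixels give a 6-second clip, while all 230,400 pixels would take 23,040 seconds, or 6.4 hours.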

You can hear the song I made from the first 60 pixels of the image below (I could use all 230,400 pixels that make up the image, but that song would be several hours long).

Song 1 - Also known as the song that came before Blur's 1997 hit "Song 2"

That's pretty neat! Let me play with it a bit more.

Including Octaves

I decided to include the effect of octaves (i.e., making notes sound higher or lower) in my 'song-making' routine. The octave for a given note is chosen at random from an array.
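Shifting a note up one octave doubles its frequency and shifting it down one octave halves it, so a random octave choice can be sketched as a random multiplier. The multiplier array [0.5, 1, 2] and the seeded generator are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the "song" is reproducible

# 0.5 = one octave down, 1 = unchanged, 2 = one octave up
octaves = np.array([0.5, 1.0, 2.0])


def apply_random_octaves(frequencies, octaves=octaves, rng=rng):
    """Shift each note by an octave multiplier chosen at random from `octaves`."""
    choices = rng.choice(octaves, size=len(frequencies))
    return np.asarray(frequencies) * choices


base = np.array([220.00, 261.63, 329.63])
print(apply_random_octaves(base))
```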

Let's give it a listen!

Song of Octaves (the lesser known cousin of Song of Storms)

Awesome! Now we are getting some variety. However, we have all these pixels; why not use them by picking the frequencies from random pixels?
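Picking notes from random pixels amounts to sampling indices into the per-pixel frequencies. A sketch, assuming sampling with replacement and a hypothetical helper named sample_random_pixels:

```python
import numpy as np

rng = np.random.default_rng(seed=1)


def sample_random_pixels(frequencies_per_pixel, n_notes, rng=rng):
    """Pick the notes of a song from randomly chosen pixels (with replacement)."""
    idx = rng.integers(0, len(frequencies_per_pixel), size=n_notes)
    return np.asarray(frequencies_per_pixel)[idx]


pixel_freqs = np.array([220.00, 246.94, 261.63, 293.66])  # one frequency per pixel
melody = sample_random_pixels(pixel_freqs, n_notes=16)
print(melody)
```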

Calcucore

Sweet! We now basically have a song generator that we can play around with for as long as we wish!

I know it's a bit of a meme, but "Is this math rock?"

Generating Other Scales

So far I've shown how music can be generated from images using the A harmonic minor scale. As cool as this scale is, it would be nice to get more variety, not only in the starting note (the tonic) of the scale, but also in the intervals, beyond those defined by the harmonic minor scale structure. This adds a lot more flavor and variety to the generated songs.

To do this, I first needed a way to procedurally generate frequencies for whatever tonic we wish to use. Katie He has a wonderful article here with a really neat exploration of Python and music. I adapted one of her functions to map piano notes to frequencies, as shown below:

This function serves as the starting point for my song/scale-generating routine and can be used to generate a dictionary that maps the notes of the 88 keys of a standard piano to frequencies in hertz, as shown below.
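Katie He's function isn't reproduced here; instead, this is a sketch of the same kind of mapping built on the standard twelve-tone equal-temperament formula, in which key n of an 88-key piano (A0 = 1, A4 = 49) has frequency 440 * 2^((n - 49)/12) Hz. Writing sharps with '#' is an assumption about the naming convention.

```python
def get_piano_notes(base_freq=440.0):
    """Map the 88 piano keys (A0..C8) to equal-temperament frequencies in Hz."""
    # 12 note names per octave; sharps written with '#'
    names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    keys = [f'{name}{octave}' for octave in range(0, 9) for name in names]
    # A standard piano runs from A0 to C8
    keys = keys[keys.index('A0'): keys.index('C8') + 1]
    # Key n (1-based, A0 = 1) has frequency base_freq * 2 ** ((n - 49) / 12); A4 is key 49
    return {key: base_freq * 2 ** ((n - 49) / 12) for n, key in enumerate(keys, start=1)}


note_freqs = get_piano_notes()
print(len(note_freqs), note_freqs['A4'], round(note_freqs['C4'], 2))  # 88 440.0 261.63
```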

Then we need to define the twelve tones of the chromatic scale as a list so that we can index notes out of it.

With this, we can find the index of our tonic in the list of tones. This is needed because I'll then reindex the list so that it starts on our desired tonic.
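A sketch of the tone list and the reindexing step, assuming the chromatic list starts on A and uses '#' for sharps (the helper name reindex_from_tonic is hypothetical):

```python
# The twelve tones of the chromatic scale, starting on A (assumed ordering)
tones = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#']


def reindex_from_tonic(tonic, tones=tones):
    """Rotate the chromatic list so that it starts on the desired tonic."""
    start = tones.index(tonic)
    return tones[start:] + tones[:start]


print(reindex_from_tonic('D'))
# ['D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B', 'C', 'C#']
```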

After this, I can define a bunch of different scale arrays, where each element is an index into the reindexed array from the previous step.
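Each scale can be written as the semitone offsets of its notes above the tonic, which double as indices into the reindexed list. The dictionary layout and key names below are assumptions for illustration; the offsets are the standard interval patterns of these scales and modes.

```python
tones = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#']

# Semitones above the tonic = indices into the tone list reindexed from the tonic
scales = {
    'major':          [0, 2, 4, 5, 7, 9, 11],
    'dorian':         [0, 2, 3, 5, 7, 9, 10],
    'phrygian':       [0, 1, 3, 5, 7, 8, 10],
    'lydian':         [0, 2, 4, 6, 7, 9, 11],
    'mixolydian':     [0, 2, 4, 5, 7, 9, 10],
    'aeolian':        [0, 2, 3, 5, 7, 8, 10],
    'locrian':        [0, 1, 3, 5, 6, 8, 10],
    'harmonic_minor': [0, 2, 3, 5, 7, 8, 11],
    'melodic_minor':  [0, 2, 3, 5, 7, 9, 11],  # ascending form
}


def scale_notes(tonic, scale_name):
    """Note names of a scale: reindex the tones from the tonic, then pick the scale's indices."""
    start = tones.index(tonic)
    reindexed = tones[start:] + tones[:start]
    return [reindexed[i] for i in scales[scale_name]]


print(scale_notes('A', 'harmonic_minor'))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G#']
print(scale_notes('D', 'major'))           # ['D', 'E', 'F#', 'G', 'A', 'B', 'C#']
```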

Almost ready! I'll also define intervals here in case I want to make harmonies for our songs.
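For harmonies, an interval can be stored as a number of semitones and turned into a frequency ratio of 2^(semitones/12) in equal temperament. The labels below are hypothetical, and just-intonation ratios (e.g., 3/2 for a perfect fifth) would be an equally valid design choice:

```python
# Harmony intervals as semitone counts (hypothetical labels)
harmony_semitones = {
    'unison': 0, 'minor_2nd': 1, 'major_2nd': 2, 'minor_3rd': 3, 'major_3rd': 4,
    'perfect_4th': 5, 'tritone': 6, 'perfect_5th': 7, 'minor_6th': 8,
    'major_6th': 9, 'minor_7th': 10, 'major_7th': 11, 'octave': 12,
}


def harmony_frequency(freq, interval):
    """Frequency of the note `interval` above `freq` in 12-tone equal temperament."""
    return freq * 2 ** (harmony_semitones[interval] / 12)


print(round(harmony_frequency(220.0, 'perfect_5th'), 2))  # 329.63
print(harmony_frequency(220.0, 'octave'))                 # 440.0
```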

And now I can take the results from the previous steps and make a song!
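Putting the pieces together, here is one hedged sketch of a complete image-to-song routine. It is not the original implementation: the function names, the parameters, the equal-width hue bins, the octave multipliers, and the 0.1-second sine tones are all assumptions, and "octave" here means the scientific-pitch octave of the tonic (so octave=3 with tonic 'A' starts on A3 = 220 Hz).

```python
import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def tonic_frequency(tonic, octave):
    """Equal-temperament frequency of `tonic` in scientific pitch notation (A4 = 440 Hz)."""
    semitones_from_a4 = NOTE_NAMES.index(tonic) - NOTE_NAMES.index('A') + 12 * (octave - 4)
    return 440.0 * 2 ** (semitones_from_a4 / 12)


def make_song(hues, tonic, intervals, octave=3, n_notes=32, duration=0.1,
              sample_rate=22050, hue_max=360.0, seed=None):
    """Hypothetical routine: random pixels -> scale degrees -> random octaves -> sine tones."""
    rng = np.random.default_rng(seed)
    # Frequencies of the scale, built as semitone offsets above the tonic
    scale_freqs = tonic_frequency(tonic, octave) * 2 ** (np.asarray(intervals) / 12)
    # Sample hues from random pixels and map each one to a scale degree (equal-width bins)
    picked = rng.choice(np.asarray(hues, dtype=float), size=n_notes)
    degrees = np.minimum((picked / hue_max * len(scale_freqs)).astype(int), len(scale_freqs) - 1)
    # Move each note down an octave, keep it, or move it up an octave at random
    freqs = scale_freqs[degrees] * rng.choice([0.5, 1.0, 2.0], size=n_notes)
    # Render every note as a short sine tone and join them into one signal
    t = np.linspace(0, duration, int(duration * sample_rate), endpoint=False)
    song = np.concatenate([0.5 * np.sin(2 * np.pi * f * t) for f in freqs])
    return song, freqs


hues = np.array([0.0, 90.0, 180.0, 270.0, 359.0])  # stand-in for pixels_df['hues']
a_harmonic_minor = [0, 2, 3, 5, 7, 8, 11]
song, freqs = make_song(hues, 'A', a_harmonic_minor, octave=3, n_notes=8, seed=0)
```

The resulting song array can be written to a .wav file with scipy.io.wavfile.write exactly as before.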

Let's test it out on a few images!

Examples of Image To Music Routine

Pixel Art Song - Made using A Harmonic Minor and the 3rd Octave range

Waterfall Song - Made using D Major and the 1st Octave range

Peacock Song - Made using E Dorian and the 3rd Octave range

Cat Song - Made using F Aeolian and the 2nd Octave range

Water Song - Made using B Lydian and the 2nd Octave range

Earth Song - Made using G Melodic Minor and the 3rd Octave range

Catterina Song - Made using A Harmonic Minor and the 3rd Octave range

Including harmony in songs through 2D NumPy arrays

The routine I developed allows harmony to be added to our songs. The user defines which harmony to use (e.g., perfect fifth, minor 6th) and the corresponding note interval is then deduced using the routines I showed earlier. Below, I show an example of a song made from an image, its harmony, and both of them combined into a single .wav file via a 2D NumPy array.

According to the documentation for scipy.io.wavfile.write, a 2D array must have shape (Nsamples, Nchannels) to be written to a .wav file. Notice that our array currently has shape (2, 264600), i.e., Nchannels = 2 and Nsamples = 264600. To give our NumPy array the correct shape for scipy.io.wavfile.write, I'll transpose it first.
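A minimal sketch of the stacking and transposing, assuming the melody and harmony are already equal-length 1D signals (here one second of A3 and the C4 a minor third above it, at an assumed 22,050 Hz sample rate). Each voice ends up on its own channel:

```python
import numpy as np
from scipy.io import wavfile

sample_rate = 22050
t = np.linspace(0, 1.0, sample_rate, endpoint=False)             # one second of audio
melody = 0.4 * np.sin(2 * np.pi * 220.00 * t)                    # A3
harmony = 0.4 * np.sin(2 * np.pi * 220.00 * 2 ** (3 / 12) * t)   # a minor third above (C4)

stereo = np.vstack([melody, harmony])  # shape (Nchannels, Nsamples) = (2, 22050)
print(stereo.shape)
stereo = stereo.T                      # shape (Nsamples, Nchannels), as wavfile.write expects
wavfile.write('melody_and_harmony.wav', sample_rate, stereo.astype(np.float32))
```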

Nature Song - Made using A# Harmonic Minor, the 2nd Octave range, and minor 3rd harmony.

Adding Effects to Our Music with Pedalboard

Now I'm going to load the .wav files and do some extra manipulation on them using pedalboard, Spotify's audio effects library, which is awesome. You can read more about the pedalboard library here and here.

Water Song - Made using B Lydian and the 2nd Octave range

Catterina Song - Made using A Harmonic Minor and the 3rd Octave range

Nature Song - Made using A# Harmonic Minor, the 2nd Octave range, and minor 3rd harmony.

Using Librosa For Mapping Other Musical Quantities

Librosa is a wonderful package for carrying out a variety of operations on audio data. Here I used it to convert frequencies into note names and MIDI numbers. MIDI (Musical Instrument Digital Interface) files are a format that can be used with a wide variety of electronic musical instruments, computers, and other audio devices. As such, saving our songs in this format lets other interested musicians or programmers take them and experiment.
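librosa provides librosa.hz_to_midi and librosa.hz_to_note for these conversions. The sketch below reproduces the underlying formula in plain NumPy, MIDI number = 69 + 12 * log2(f / 440 Hz), so it runs without librosa; the helper names and the '#' sharp spelling are assumptions, and librosa's own note strings may be spelled differently.

```python
import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def hz_to_midi(freqs):
    """MIDI note number for each frequency (A4 = 440 Hz = MIDI 69)."""
    return 69 + 12 * np.log2(np.asarray(freqs, dtype=float) / 440.0)


def midi_to_note(midi):
    """Name of the nearest note, e.g. 60 -> 'C4' (MIDI 60 is middle C)."""
    m = int(round(midi))
    return f'{NOTE_NAMES[m % 12]}{m // 12 - 1}'


freqs = [220.00, 261.63, 415.30]
midis = hz_to_midi(freqs)
print(np.round(midis).astype(int))       # [57 60 68]
print([midi_to_note(m) for m in midis])  # ['A3', 'C4', 'G#4']
```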

Making a MIDI from our Song

Now that I've generated a dataframe containing frequencies, notes, and MIDI numbers, I can make a MIDI file out of it! I could then use this MIDI file to generate sheet music for our song :]

To make a MIDI file, I'll use the midiutil package, which lets us build MIDI files from an array of MIDI numbers. You can configure your file in a variety of ways by setting volume, tempo, and tracks. For now, I'll just make a single-track MIDI file.

Conclusion

I showed how music can be made from images and how our songs can be exported to .wav files for further processing. I also showed how harmonies can be built with this method; more complex, rich, and/or weird harmonies could be built the same way. There are other things I'd like to add to this method, but I'll save them for another day. There's tons of experimentation that can be done here.

I had fun making this project and I hope you have fun using it and building upon it!