Making Music From Images

This project will showcase how an image can be turned into sound in Python. The program will take a picture as input and produce a numpy array composed of frequencies that can be played.

The basic idea is as follows:

Images are made of pixels.
Pixels are composed of arrays of numbers that designate color
Color is described via color spaces like RGB or HSV for example
The color could be potentially mapped into a wavelength
Wavelength can be readily converted into a frequency
Sound is vibration that can be chracterized by frequencies
Therefore, an image could be translated into sound

With that in mind, let's get started!

Modules Used in this work

Here are the most important modules used for this project:

open cv : Used to carry out operations on images
numpy : Used to perform computations on array data
pandas : Used to load, process, analyze, operate and export dataframes
matplotlib.pyplot :Used for plotting/visualizing our results
librosa : Used for musical/audio operations

The API documentation for each of these modules can be found here:

In [36]:

#Importing modules
import cv2
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import pandas as pd
import IPython.display as ipd
import librosa
from midiutil import MIDIFile
import random
from pedalboard import Pedalboard, Chorus, Reverb
from pedalboard.io import AudioFile

Loading Image

I'll start by loading the image using the imread function in OpenCV. The imread function loads images in BGR format by default so I'll change this to RGB to ensure that the colors are parsed/displayed correctly. The image on the left is how imread loads the image (BGR format) and the image on the right is after converting the color space to RGB (this is how the picture looked from the source file.

In [2]:

#Load the image
ori_img = cv2.imread('colors.jpg')
img = cv2.cvtColor(ori_img, cv2.COLOR_BGR2RGB)

#Get shape of image
height, width, depth = img.shape
dpi = plt.rcParams['figure.dpi']
figsize = width / float(dpi), height / float(dpi)

#Plot the image
fig, axs = plt.subplots(1, 2, figsize = figsize)
axs[0].title.set_text('BGR') 
axs[0].imshow(ori_img)
axs[1].title.set_text('RGB') 
axs[1].imshow(img)
plt.show()
print('           Image Properties')
print('Height = ',height, 'Width = ', width)
print('Number of pixels in image = ', height * width)

           Image Properties
Height =  360 Width =  640
Number of pixels in image =  230400

Using HSV Color Space

HSV is color space that is controlled by 3 values. The 3 values are Hue, Saturation, and Brightness.

Hue is defined as "the degree to which a stimulus can be described as similar to or different from stimuli that are described as red, orange, yellow, green, blue, violet". In other words, Hue represents color.

Saturation is defined as "colorfulness of an area judged in proportion to its brightness". In other words, Saturation represents the amount to which a color is mixed with white.

Brightness is defined as "perception elicited by the luminance of a visual target". In other words, Saturation represents the amount to which a color is mixed with black.

Hue values of basic colors:

Orange 0-44
Yellow 44- 76
Green 76-150
Blue 150-260
Violet 260-320
Red 320-360

I'll work in HSV color space because I figured it be a little easier to work

In [3]:

#Need function that reads pixel hue value 
hsv = cv2.cvtColor(ori_img, cv2.COLOR_BGR2HSV)
#Plot the image
fig, axs = plt.subplots(1, 3, figsize = (15,15))
names = ['BGR','RGB','HSV']
imgs  = [ori_img, img, hsv]
i = 0
for elem in imgs:
    axs[i].title.set_text(names[i])
    axs[i].imshow(elem)
    axs[i].grid(False)
    i += 1
plt.show()

Extract Hue from Image

Now that we have our image in HSV, let's extract the hue (H) value from every pixel. This can be done via a nested for loop over the height and width of the image.

In [4]:

i=0 ; j=0
#Initialize array the will contain Hues for every pixel in image
hues = [] 
for i in range(height):
    for j in range(width):
        hue = hsv[i][j][0] #This is the hue value at pixel coordinate (i,j)
        hues.append(hue)

Now that we have an array containing the H value for every pixel, I'll place that result into a pandas dataframe. Each row in the dataframe is a pixel and thus each column will contain information about that pixel. I'll call this dataframe pixels_df

In [5]:

pixels_df = pd.DataFrame(hues, columns=['hues'])
pixels_df

Out[5]:

	hues
0	113
1	89
2	99
3	94
4	87
...	...
230395	100
230396	100
230397	103
230398	103
230399	98

230400 rows × 1 columns

Converting hues to frequencies (1st Idea)

My initial idea at converting a hue value into a frequency involved a simple mapping between a predetermined set of frequencies to the H value. The mapping function is shown below. The function takes the H value and an array containing frequencies to map H to as inputs. Below, the example uses an array called scale_freqs to define the frequencies. The frequencies used in scale_freqs correspond to the A Harmonic Minor Scale. Then, an array of threshold values (called thresholds) for H is defined. This array of thresholds can then be used to convert H into a frequency from scale_freqs.

In [6]:

#Define frequencies that make up A-Harmonic Minor Scale
scale_freqs = [220.00, 246.94 ,261.63, 293.66, 329.63, 349.23, 415.30] 
def hue2freq(h,scale_freqs):
    thresholds = [26 , 52 , 78 , 104,  128 , 154 , 180]
    note = scale_freqs[0]
    if (h <= thresholds[0]):
         note = scale_freqs[0]
    elif (h > thresholds[0]) & (h <= thresholds[1]):
        note = scale_freqs[1]
    elif (h > thresholds[1]) & (h <= thresholds[2]):
        note = scale_freqs[2]
    elif (h > thresholds[2]) & (h <= thresholds[3]):
        note = scale_freqs[3]
    elif (h > thresholds[3]) & (h <= thresholds[4]):    
        note = scale_freqs[4]
    elif (h > thresholds[4]) & (h <= thresholds[5]):
        note = scale_freqs[5]
    elif (h > thresholds[5]) & (h <= thresholds[6]):
        note = scale_freqs[6]
    else:
        note = scale_freqs[0]
    
    return note

I can then apply this mapping using a lambda function to every row in the hues column to get the frequencies associated with each H value. The results of this process will be saved into a column called notes

In [7]:

pixels_df['notes'] = pixels_df.apply(lambda row : hue2freq(row['hues'],scale_freqs), axis = 1)     
pixels_df

Out[7]:

	hues	notes
0	113	329.63
1	89	293.66
2	99	293.66
3	94	293.66
4	87	293.66
...	...	...
230395	100	293.66
230396	100	293.66
230397	103	293.66
230398	103	293.66
230399	98	293.66

230400 rows × 2 columns

Cool! Now, I'll convert the notes column into a numpy array called frequencies since I can then use this to make a playable audio file :]

In [8]:

frequencies = pixels_df['notes'].to_numpy()

Finally, I can make a song out of the pixels using the method below. The picture I am using has 230,400 pixels. Even though, I could make a song that includes every pixel, I decided to restrict my song to include only the first 30 pixels for now because if I were to use all of them in order, the song would be over 6 hours long if I were to give every note a duration of 0.1s

In [9]:

song = np.array([]) 
sr = 22050 # sample rate
T = 0.1    # 0.1 second duration
t = np.linspace(0, T, int(T*sr), endpoint=False) # time variable
#Make a song with numpy array :]
#nPixels = int(len(frequencies))#All pixels in image
nPixels = 60
for i in range(nPixels):  
    val = frequencies[i]
    note  = 0.5*np.sin(2*np.pi*val*t)
    song  = np.concatenate([song, note])
ipd.Audio(song, rate=sr) # load a NumPy array

Out[9]:

That's pretty neat! Let me play with it a bit more. I decided to include the effect of octaves (i.e., make notes sound higher or lower) into my 'song-making' routine. The octave to be used for a given note will be chosen at random from an array.

In [10]:

song = np.array([]) 
octaves = np.array([0.5,1,2])
sr = 22050 # sample rate
T = 0.1    # 0.1 second duration
t = np.linspace(0, T, int(T*sr), endpoint=False) # time variable
#Make a song with numpy array :]
#nPixels = int(len(frequencies))#All pixels in image
nPixels = 60
for i in range(nPixels):
    octave = random.choice(octaves)
    val =  octave * frequencies[i]
    note  = 0.5*np.sin(2*np.pi*val*t)
    song  = np.concatenate([song, note])
ipd.Audio(song, rate=sr) # load a NumPy array

Out[10]:

Awesome! We do have all these pixels, how about we try using them by picking the frequencies from random pixels?

In [11]:

song = np.array([]) 
octaves = np.array([1/2,1,2])
sr = 22050 # sample rate
T = 0.1    # 0.1 second duration
t = np.linspace(0, T, int(T*sr), endpoint=False) # time variable
#Make a song with numpy array :]
#nPixels = int(len(frequencies))#All pixels in image
nPixels = 60
for i in range(nPixels):
    octave = random.choice(octaves)
    val =  octave * random.choice(frequencies)
    note  = 0.5*np.sin(2*np.pi*val*t)
    song  = np.concatenate([song, note])
ipd.Audio(song, rate=sr) # load a NumPy array

Out[11]:

I know it's a bit of a meme, but "Is this math rock?"

Let me compile everything so far into a single function

In [144]:

def img2music(img, scale = [220.00, 246.94 ,261.63, 293.66, 329.63, 349.23, 415.30],
              sr = 22050, T = 0.1, nPixels = 60, useOctaves = True, randomPixels = False,
              harmonize = 'U0'):
    """
    Args:
        img    :     (array) image to process
        scale  :     (array) array containing frequencies to map H values to
        sr     :     (int) sample rate to use for resulting song
        T      :     (int) time in seconds for dutation of each note in song
        nPixels:     (int) how many pixels to use to make song
    Returns:
        song   :     (array) Numpy array of frequencies. Can be played by ipd.Audio(song, rate = sr)
    """
    #Convert image to HSV
    hsv = cv2.cvtColor(ori_img, cv2.COLOR_BGR2HSV)
    
    #Get shape of image
    height, width, depth = ori_img.shape

    i=0 ; j=0 ; k=0
    #Initialize array the will contain Hues for every pixel in image
    hues = [] 
    for i in range(height):
        for j in range(width):
            hue = hsv[i][j][0] #This is the hue value at pixel coordinate (i,j)
            hues.append(hue)
            
    #Make dataframe containing hues and frequencies
    pixels_df = pd.DataFrame(hues, columns=['hues'])
    pixels_df['frequencies'] = pixels_df.apply(lambda row : hue2freq(row['hues'],scale), axis = 1) 
    frequencies = pixels_df['frequencies'].to_numpy()
    
    #Make harmony dictionary (i.e. fundamental, perfect fifth, major third, octave)
    #unison           = U0 ; semitone         = ST ; major second     = M2
    #minor third      = m3 ; major third      = M3 ; perfect fourth   = P4
    #diatonic tritone = DT ; perfect fifth    = P5 ; minor sixth      = m6
    #major sixth      = M6 ; minor seventh    = m7 ; major seventh    = M7
    #octave           = O8
    harmony_select = {'U0' : 1,
                      'ST' : 16/15,
                      'M2' : 9/8,
                      'm3' : 6/5,
                      'M3' : 5/4,
                      'P4' : 4/3,
                      'DT' : 45/32,
                      'P5' : 3/2,
                      'm6': 8/5,
                      'M6': 5/3,
                      'm7': 9/5,
                      'M7': 15/8,
                      'O8': 2
                     }
    harmony = np.array([]) #This array will contain the song harmony
    harmony_val = harmony_select[harmonize] #This will select the ratio for the desired harmony
                                               
    song_freqs = np.array([]) #This array will contain the chosen frequencies used in our song :]
    song = np.array([])       #This array will contain the song signal
    octaves = np.array([0.5,1,2])#Go an octave below, same note, or go an octave above
    t = np.linspace(0, T, int(T*sr), endpoint=False) # time variable
    #Make a song with numpy array :]
    #nPixels = int(len(frequencies))#All pixels in image
    for k in range(nPixels):
        if useOctaves:
            octave = random.choice(octaves)
        else:
            octave = 1
        
        if randomPixels == False:
            val =  octave * frequencies[k]
        else:
            val = octave * random.choice(frequencies)
            
        #Make note and harmony note    
        note   = 0.5*np.sin(2*np.pi*val*t)
        h_note = 0.5*np.sin(2*np.pi*harmony_val*val*t)  
        
        #Place notes into corresponfing arrays
        song       = np.concatenate([song, note])
        harmony    = np.concatenate([harmony, h_note])                                     
        #song_freqs = np.concatenate([song_freqs, val])
                                               
    return song, pixels_df, harmony

One more thing that would be nice to have is a procedural way to generate musical scales. Katie He has a lovely set of routines made that I copied below from this article https://towardsdatascience.com/music-in-python-2f054deb41f4

In [13]:

def get_piano_notes():   
    # White keys are in Uppercase and black keys (sharps) are in lowercase
    octave = ['C', 'c', 'D', 'd', 'E', 'F', 'f', 'G', 'g', 'A', 'a', 'B'] 
    base_freq = 440 #Frequency of Note A4
    keys = np.array([x+str(y) for y in range(0,9) for x in octave])
    # Trim to standard 88 keys
    start = np.where(keys == 'A0')[0][0]
    end = np.where(keys == 'C8')[0][0]
    keys = keys[start:end+1]
    
    note_freqs = dict(zip(keys, [2**((n+1-49)/12)*base_freq for n in range(len(keys))]))
    note_freqs[''] = 0.0 # stop
    return note_freqs

def get_sine_wave(frequency, duration, sample_rate=44100, amplitude=4096):
    t = np.linspace(0, duration, int(sample_rate*duration)) # Time axis
    wave = amplitude*np.sin(2*np.pi*frequency*t)
    return wave

I'll build upon those routines and use them to make scales:

In [141]:

def makeScale(whichOctave, whichKey, whichScale, makeHarmony = 'U0'):
    
    #Load note dictionary
    note_freqs = get_piano_notes()
    
    #Define tones. Upper case are white keys in piano. Lower case are black keys
    scale_intervals = ['A','a','B','C','c','D','d','E','F','f','G','g']
    
    #Find index of desired key
    index = scale_intervals.index(whichKey)
    
    #Redefine scale interval so that scale intervals begins with whichKey
    new_scale = scale_intervals[index:12] + scale_intervals[:index]
    
    #Choose scale
    if whichScale == 'AEOLIAN':
        scale = [0, 2, 3, 5, 7, 8, 10]
    elif whichScale == 'BLUES':
        scale = [0, 2, 3, 4, 5, 7, 9, 10, 11]
    elif whichScale == 'PHYRIGIAN':
        scale = [0, 1, 3, 5, 7, 8, 10]
    elif whichScale == 'CHROMATIC':
        scale = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    elif whichScale == 'DIATONIC_MINOR':
        scale = [0, 2, 3, 5, 7, 8, 10]
    elif whichScale == 'DORIAN':
        scale = [0, 2, 3, 5, 7, 9, 10]
    elif whichScale == 'HARMONIC_MINOR':
        scale = [0, 2, 3, 5, 7, 8, 11]
    elif whichScale == 'LYDIAN':
        scale = [0, 2, 4, 6, 7, 9, 11]
    elif whichScale == 'MAJOR':
        scale = [0, 2, 4, 5, 7, 9, 11]
    elif whichScale == 'MELODIC_MINOR':
        scale = [0, 2, 3, 5, 7, 8, 9, 10, 11]
    elif whichScale == 'MINOR':    
        scale = [0, 2, 3, 5, 7, 8, 10]
    elif whichScale == 'MIXOLYDIAN':     
        scale = [0, 2, 4, 5, 7, 9, 10]
    elif whichScale == 'NATURAL_MINOR':   
        scale = [0, 2, 3, 5, 7, 8, 10]
    elif whichScale == 'PENTATONIC':    
        scale = [0, 2, 4, 7, 9]
    else:
        print('Invalid scale name')
    
    #Make harmony dictionary (i.e. fundamental, perfect fifth, major third, octave)
    #unison           = U0
    #semitone         = ST
    #major second     = M2
    #minor third      = m3
    #major third      = M3
    #perfect fourth   = P4
    #diatonic tritone = DT
    #perfect fifth    = P5
    #minor sixth      = m6
    #major sixth      = M6
    #minor seventh    = m7
    #major seventh    = M7
    #octave           = O8
    harmony_select = {'U0' : 1,
                      'ST' : 16/15,
                      'M2' : 9/8,
                      'm3' : 6/5,
                      'M3' : 5/4,
                      'P4' : 4/3,
                      'DT' : 45/32,
                      'P5' : 3/2,
                      'm6': 8/5,
                      'M6': 5/3,
                      'm7': 9/5,
                      'M7': 15/8,
                      'O8': 2
                     }
    
    #Get length of scale (i.e., how many notes in scale)
    nNotes = len(scale)
    
    #Initialize arrays
    freqs = []
    #harmony = []
    #harmony_val = harmony_select[makeHarmony]
    for i in range(nNotes):
        note = new_scale[scale[i]] + str(whichOctave)
        freqToAdd = note_freqs[note]
        freqs.append(freqToAdd)
        #harmony.append(harmony_val*freqToAdd)
    return freqs#,harmony

In [132]:

test_scale,test_harmony = makeScale(3, 'a', 'HARMONIC_MINOR',makeHarmony = 'm6')
print(test_scale)
print(test_harmony)

[233.08188075904496, 130.8127826502993, 138.59131548843604, 155.56349186104046, 174.61411571650194, 184.9972113558172, 220.0]
[372.93100921447194, 209.3004522404789, 221.74610478149768, 248.90158697766475, 279.38258514640313, 295.9955381693075, 352.0]

Cool! The scale generator I made could easily accomodate new scales. Build your own scales :]

Now I'll load a few images for demonstration

In [138]:

#Pixel Art
pixel_art = cv2.imread('pixel_art1.png')
pixel_art2 = cv2.cvtColor(pixel_art, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(pixel_art2)
plt.grid(False)
plt.show()

In [16]:

pixel_scale = makeScale(3, 'a', 'HARMONIC_MINOR')
pixel_song, pixel_df  = img2music(pixel_art, pixel_scale, T = 0.2, randomPixels = True)
ipd.Audio(pixel_song, rate = sr)

Out[16]:

In [17]:

pixel_df

Out[17]:

	hues	frequencies
0	113	174.614116
1	89	155.563492
2	99	155.563492
3	94	155.563492
4	87	155.563492
...	...	...
230395	100	155.563492
230396	100	155.563492
230397	103	155.563492
230398	103	155.563492
230399	98	155.563492

230400 rows × 2 columns

In [18]:

#Waterfall
waterfall = cv2.imread('waterfall.jpg')
waterfall2 = cv2.cvtColor(waterfall, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(waterfall2)
plt.grid(False)
plt.show()

In [19]:

waterfall_scale = makeScale(1, 'd', 'MAJOR')
waterfall_song, waterfall_df  = img2music(waterfall, waterfall_scale, T = 0.3,
                                          randomPixels = True, useOctaves = True)
ipd.Audio(waterfall_song, rate = sr)

Out[19]:

In [20]:

waterfall_df

Out[20]:

	hues	frequencies
0	113	58.270470
1	89	51.913087
2	99	51.913087
3	94	51.913087
4	87	51.913087
...	...	...
230395	100	51.913087
230396	100	51.913087
230397	103	51.913087
230398	103	51.913087
230399	98	51.913087

230400 rows × 2 columns

In [21]:

#Peacock
peacock = cv2.imread('peacock.jpg')
peacock2 = cv2.cvtColor(peacock, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(peacock2)
plt.grid(False)
plt.show()

In [22]:

peacock_scale = makeScale(3, 'E', 'DORIAN')
peacock_song, peacock_df  = img2music(peacock, peacock_scale, T = 0.2, randomPixels = False,
                          useOctaves = True, nPixels = 120)
ipd.Audio(peacock_song, rate = sr)

Out[22]:

In [23]:

peacock_df

Out[23]:

	hues	frequencies
0	113	246.941651
1	89	220.000000
2	99	220.000000
3	94	220.000000
4	87	220.000000
...	...	...
230395	100	220.000000
230396	100	220.000000
230397	103	220.000000
230398	103	220.000000
230399	98	220.000000

230400 rows × 2 columns

In [24]:

#Cat
cat = cv2.imread('cat1.jpg')
cat2 = cv2.cvtColor(cat, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(cat2)
plt.grid(False)
plt.show()

In [25]:

cat_scale = makeScale(2, 'f', 'AEOLIAN')
cat_song, cat_df  = img2music(cat, cat_scale, T = 0.4, randomPixels = True,
                          useOctaves = True, nPixels = 120)
ipd.Audio(cat_song, rate = sr)

Out[25]:

In [26]:

cat_df

Out[26]:

	hues	frequencies
0	113	69.295658
1	89	123.470825
2	99	123.470825
3	94	123.470825
4	87	123.470825
...	...	...
230395	100	123.470825
230396	100	123.470825
230397	103	123.470825
230398	103	123.470825
230399	98	123.470825

230400 rows × 2 columns

In [27]:

#water
water = cv2.imread('water.jpg')
water2 = cv2.cvtColor(water, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(water2)
plt.grid(False)
plt.show()

In [35]:

water_scale = makeScale(2, 'B', 'LYDIAN')
water_song, water_df  = img2music(water, water_scale, T = 0.2, randomPixels = False,
                          useOctaves = True, nPixels = 60)
ipd.Audio(water_song, rate = sr)

Out[35]:

In [29]:

water_df

Out[29]:

	hues	frequencies
0	113	92.498606
1	89	87.307058
2	99	87.307058
3	94	87.307058
4	87	87.307058
...	...	...
230395	100	87.307058
230396	100	87.307058
230397	103	87.307058
230398	103	87.307058
230399	98	87.307058

230400 rows × 2 columns

In [30]:

#earth
earth = cv2.imread('earth.jpg')
earth2 = cv2.cvtColor(earth, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(earth2)
plt.grid(False)
plt.show()

In [31]:

earth_scale = makeScale(3, 'g', 'MELODIC_MINOR')
earth_song, earth_df  = img2music(earth, earth_scale, T = 0.3, randomPixels = False,
                          useOctaves = True, nPixels = 60)
ipd.Audio(earth_song, rate = sr)

Out[31]:

In [32]:

earth_df

Out[32]:

	hues	frequencies
0	113	155.563492
1	89	138.591315
2	99	138.591315
3	94	138.591315
4	87	138.591315
...	...	...
230395	100	138.591315
230396	100	138.591315
230397	103	138.591315
230398	103	138.591315
230399	98	138.591315

230400 rows × 2 columns

In [33]:

#old_building
old_building = cv2.imread('old_building.jpeg')
old_building2 = cv2.cvtColor(old_building, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(old_building2)
plt.grid(False)
plt.show()

In [34]:

old_building_scale = makeScale(2, 'd', 'PHYRIGIAN')
old_building_song, old_building_df  = img2music(old_building, old_building_scale,
                                     T = 0.3, randomPixels = True, useOctaves = True, nPixels = 60)
ipd.Audio(old_building_song, rate = sr)

Out[34]:

In [68]:

#mom
mom = cv2.imread('mami.jpg')
mom2 = cv2.cvtColor(mom, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(mom2)
plt.grid(False)
plt.show()

In [69]:

mom_scale = makeScale(3, 'g', 'MAJOR')
mom_song, mom_df  = img2music(mom, anto_scale,
                                     T = 0.3, randomPixels = True, useOctaves = True, nPixels = 60)
ipd.Audio(mom_song, rate = sr)

Out[69]:

In [70]:

#old_building
catterina = cv2.imread('catterina.jpg')
catterina2 = cv2.cvtColor(catterina, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(catterina2)
plt.grid(False)
plt.show()

In [76]:

catterina_scale = makeScale(3, 'A', 'HARMONIC_MINOR')
catterina_song, catterina_df  = img2music(catterina, catterina_scale,
                                     T = 0.2, randomPixels = True, useOctaves = True, nPixels = 60)
ipd.Audio(catterina_song, rate = sr)

Out[76]:

Cool! The scale generator I made could easily accomodate new scales. Build your own scales :]

Exporting song into a .wav file

The following code can be used to export the song into a .wav file. Since the numpy arrays we are generating are dtype = float32 we need to specifiy that in the data paramter.

In [84]:

from scipy.io import wavfile
wavfile.write('earth_song.wav'    , rate = 22050, data = earth_song.astype(np.float32))
wavfile.write('water_song.wav'    , rate = 22050, data = water_song.astype(np.float32))
wavfile.write('catterina_song.wav', rate = 22050, data = catterina_song.astype(np.float32))

I'll also do it now for an example in which I'm using harmony

In [149]:

#nature
nature = cv2.imread('nature1.webp')
nature2 = cv2.cvtColor(nature, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(nature2)
plt.grid(False)
plt.show()

In [150]:

nature_scale = makeScale(3, 'a', 'HARMONIC_MINOR')
nature_song, nature_df, nature_harmony = img2music(nature, nature_scale, 
                                                 T = 0.2, randomPixels = True, harmonize = 'm3')
#This is the original song we made from the picture
ipd.Audio(nature_song, rate = sr)

Out[150]:

In [151]:

#This is the harmony to the song we made from the picture
ipd.Audio(nature_harmony, rate = sr)

Out[151]:

The song and harmony arrays are both 1D. I can combine them into a 2D array using np.vstack. This will allow us to save our harmonized song into a single .wav file :]

In [157]:

nature_harmony_combined = np.vstack((nature_song, nature_harmony))
ipd.Audio(combined, rate = sr)

Out[157]:

In [176]:

print(nature_harmony_combined.shape)

(2, 264600)

From the documentation for scipy.io.wavfile.write, if want to write a 2D array into a .wav file, the 2D array must be have dimensions in the form of (Nsamples, Nchannels). Notice how the shape of our array is currently (2, 264600). This means we have Nchannels = 2 and Nsamples = 264600. To ensure our numpy array has the correct shape for scipy.io.wavfile.write I'll transpose the array first.

In [173]:

wavfile.write('nature_harmony_combined.wav', rate = 22050, 
              data = nature_harmony_combined.T.astype(np.float32))

Adding Effects to Our Music with Pedalboard

Now I'm going to load the .wav files and do some extra manipulation on it using the pedalboard module from Spotify. You can read more about the pedalboard library here and here.

In [208]:

from pedalboard import Pedalboard, Chorus, Reverb, Compressor, Gain, LadderFilter 
from pedalboard import Phaser, Delay, PitchShift, Distortion
from pedalboard.io import AudioFile
# Read in a whole audio file:
with AudioFile('water_song.wav', 'r') as f:
    audio = f.read(f.frames)
    samplerate = f.samplerate

# Make a Pedalboard object, containing multiple plugins:
board = Pedalboard([
    #Delay(delay_seconds=0.25, mix=1.0),
    Compressor(threshold_db=-100, ratio=25),
    Gain(gain_db=150),
    Chorus(),
    LadderFilter(mode=LadderFilter.Mode.HPF12, cutoff_hz=900),
    Phaser(),
    Reverb(room_size=0.5),
])

# Run the audio through this pedalboard!
effected = board(audio, samplerate)

# Write the audio back as a wav file:
with AudioFile('processed-water_song.wav', 'w', samplerate, effected.shape[0]) as f:
    f.write(effected)

ipd.Audio('processed-water_song.wav')

Out[208]:

In [121]:

# Read in a whole audio file:
with AudioFile('catterina_song.wav', 'r') as f:
    audio = f.read(f.frames)
    samplerate = f.samplerate
print(samplerate)
# Make a Pedalboard object, containing multiple plugins:
board = Pedalboard([
    LadderFilter(mode=LadderFilter.Mode.HPF12, cutoff_hz=100),
    Delay(delay_seconds = 0.3),
    Reverb(room_size = 0.6, wet_level=0.2, width = 1.0),
    PitchShift(semitones = 6),
])

# Run the audio through this pedalboard!
effected = board(audio, samplerate)

# Write the audio back as a wav file:
with AudioFile('processed-catterina_song.wav', 'w', samplerate, effected.shape[0]) as f:
    f.write(effected)

ipd.Audio('processed-catterina_song.wav')

22050.0

Out[121]:

In [226]:

# Read in a whole audio file:
with AudioFile('nature_harmony_combined.wav', 'r') as f:
    audio = f.read(f.frames)
    samplerate = f.samplerate
# Make a Pedalboard object, containing multiple plugins:
board = Pedalboard([
    LadderFilter(mode=LadderFilter.Mode.HPF12, cutoff_hz=100),
    Delay(delay_seconds = 0.1),
    Reverb(room_size = 1, wet_level=0.1, width = 0.5),
    PitchShift(semitones = 6),
    #Chorus(rate_hz = 15),
    Phaser(rate_hz = 5, depth = 0.5, centre_frequency_hz = 500.0),
])

# Run the audio through this pedalboard!
effected = board(audio, samplerate)

# Write the audio back as a wav file:
with AudioFile('processed-nature_harmony_combined.wav', 'w', samplerate, effected.shape[0]) as f:
    f.write(effected)

ipd.Audio('processed-nature_harmony_combined.wav')

Out[226]:

Neat!

Using Librosa For Mapping Other Musical Quantities

Librosa is a wonderful package that allows one to carry out a variety of operations on sound data. Here I used it to readily convert frequencies into 'Notes' and 'Midi Numbers'.

In [177]:

#Convert frequency to a note
catterina_df['notes'] = catterina_df.apply(lambda row : librosa.hz_to_note(row['frequencies']), 
                                           axis = 1)  
#Convert note to a midi number
catterina_df['midi_number'] = catterina_df.apply(lambda row : librosa.note_to_midi(row['notes']), 
                                                 axis = 1)    
catterina_df

Out[177]:

	hues	frequencies	notes	midi_number
0	113	164.813778	E3	52
1	89	146.832384	D3	50
2	99	146.832384	D3	50
3	94	146.832384	D3	50
4	87	146.832384	D3	50
...	...	...	...	...
230395	100	146.832384	D3	50
230396	100	146.832384	D3	50
230397	103	146.832384	D3	50
230398	103	146.832384	D3	50
230399	98	146.832384	D3	50

230400 rows × 4 columns

Making a MIDI from our Song

Now that I've generated a dataframe containing frequencies, notes and midi numbers I can make a midi file out of it! I could then use this MIDI file to generate sheet music for our song :]

To make a MIDI file, I'll make use of the midiutil package. This package allows us to build MIDI files from an array of MIDI numbers. You can configure your file in a variety of ways by setting up volume, tempos and tracks. For now, I'll just make a single track midi file

In [178]:

#Convert midi number column to a numpy array
midi_number = catterina_df['midi_number'].to_numpy()

In [124]:

degrees  = list(midi_number) # MIDI note number
track    = 0
channel  = 0
time     = 0   # In beats
duration = 1   # In beats
tempo    = 240  # In BPM
volume   = 100 # 0-127, as per the MIDI standard

MyMIDI = MIDIFile(1) # One track, defaults to format 1 (tempo track
                     # automatically created)
MyMIDI.addTempo(track,time, tempo)

for pitch in degrees:
    MyMIDI.addNote(track, channel, pitch, time, duration, volume)
    time = time + 1
with open("catterina.mid", "wb") as output_file:
    MyMIDI.writeFile(output_file)

Converting hues to frequencies (2nd Idea)

My second idea to convert color into sound was via a 'spectral' method. This is something that I'm still playing around with.

In [242]:

#Convert hue to wavelength[nm] via interpolation. Assume spectrum is contained between 400-650nm
def hue2wl(h, wlMax = 650, wlMin = 400, hMax = 270, hMin = 0):
    #h *= 2
    hMax /= 2
    hMin /= 2
    wlRange = wlMax - wlMin
    hRange = hMax - hMin
    wl =  wlMax - ((h* (wlRange))/(hRange))
    return wl

In [99]:

#Array with hue values from 0 degrees to 270 degrees
h_array = np.arange(0,270,1)
h_array.shape

# define vectorized sigmoid
hue2wl_v = np.vectorize(hue2wl)
test = hue2wl_v(h_array)
test.shape
np.min(test)

plt.title("Interpolation of Hue and Wavelength") 
plt.xlabel("Hue()") 
plt.ylabel("Wavelength[nm]") 
plt.scatter(h_array, test, c = cm.gist_rainbow_r(np.abs(h_array)), edgecolor='none')
plt.gca().invert_yaxis()
plt.style.use('seaborn-darkgrid')
plt.show()

In [185]:

img = cv2.imread('colors.jpg')
#Convert a hue value to wavelength via interpolation
#Assume that visible spectrum is contained between 400-650nm
def hue2wl(h, wlMax = 650, wlMin = 400, hMax = 270, hMin = 0):
    #h *= 2
    hMax /= 2
    hMin /= 2
    wlRange = wlMax - wlMin
    hRange = hMax - hMin
    wl =  wlMax - ((h* (wlRange))/(hRange))
    return wl

def wl2freq(wl):
    wavelength = wl
    sol = 299792458.00 #this is the speed of light in m/s
    sol *= 1e9 #Convert speed of light to nm/s
    freq = (sol / wavelength) * (1e-12)
    return freq

def img2music2(img, fName):
    
    #Get height and width of image
    height, width, _ = img.shape
    
    #Convet from BGR to HSV
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    
    #Populate hues array with H channel for each pixel
    i=0 ; j=0
    hues = []
    for i in range(height):
        for j in range(width):
            hue = hsv[i][j][0] #This is the hue value at pixel coordinate (i,j)
            hues.append(hue)
            
    #Make pandas dataframe        
    hues_df = pd.DataFrame(hues, columns=['hues'])
    hues_df['nm'] = hues_df.apply(lambda row : hue2wl(row['hues']), axis = 1)  
    hues_df['freq'] = hues_df.apply(lambda row : wl2freq(row['nm']), axis = 1) 
    hues_df['notes'] = hues_df.apply(lambda row : librosa.hz_to_note(row['freq']), axis = 1)  
    hues_df['midi_number'] = hues_df.apply(lambda row : librosa.note_to_midi(row['notes']), axis = 1) 
    
    print("Done making song from image!") 
    
    return hues_df

In [182]:

df = img2music2(img,'color')
df

Done making song from image!

Out[182]:

	hues	nm	freq	notes	midi_number
0	113	440.740741	680.201375	F5	77
1	89	485.185185	617.892852	D♯5	75
2	99	466.666667	642.412410	E5	76
3	94	475.925926	629.914114	D♯5	75
4	87	488.888889	613.211846	D♯5	75
...	...	...	...	...	...
230395	100	464.814815	644.971822	E5	76
230396	100	464.814815	644.971822	E5	76
230397	103	459.259259	652.773900	E5	76
230398	103	459.259259	652.773900	E5	76
230399	98	468.518519	639.873231	D♯5	75

230400 rows × 5 columns

In [183]:

#Convert midi number column to a numpy array
sr = 22050 # sample rate
song = df['freq'].to_numpy()
ipd.Audio(song, rate = sr) # load a NumPy array

Out[183]:

In [184]:

a_HarmonicMinor = [220.00, 246.94 ,261.63, 293.66, 329.63, 349.23, 415.30, 440.00] 
frequencies = df['freq'].to_numpy()
song = np.array([]) 
harmony = np.array([]) 
octaves = np.array([1/4,1,2,1,2])
sr = 22050 # sample rate
T = 0.25    # 0.1 second duration
t = np.linspace(0, T, int(T*sr), endpoint=False) # time variable
#Make a song with numpy array :]
nPixels = int(len(frequencies)/height)
nPixels = 30
#for j in tqdm(range(nPixels), desc="Processing Frame"):#Add progress bar for frames processed      
for i in range(nPixels):  
    octave = random.choice(octaves)
    val =  octave * frequencies[i]
    note  = 0.5*np.sin(2*np.pi*val*t)
    song  = np.concatenate([song, note])
ipd.Audio(song, rate=sr) # load a NumPy array

Out[184]:

Conclusion

I showed how musis can be made from images and how our songs can be exported into .wav files for subsequent processing. There's tons of expereimentation that can be done with this. I had fun making this project and I hope you have using it and building upon it!