Determining Soccer Player Jersey Color Using K-Means Clustering
jekyll
jupyter_notebook
clustering
image_data
video_data
featured
portfolio
Determining Soccer Player Jersey Colors from Video Footage Via K-Means Clustering
In this project, I took video footage provided by Trace from different soccer games and determined the player jersey colors using the K-Means clustering algorithm. This notebook details my process towards accomplishing that goal. The routine developed here takes video footage as input and generates a pandas dataframe containing the results of the clustering process as output. I had limited experience with image/video processing in Python prior to this project, so this was an incredible learning experience for me!
The rough outline of my process involved the following steps:
- Learn the basics of image/video processing/manipulation
- Learn about classification algorithms to identify objects in images (i.e., generate bounding boxes)
- Learn about different color spaces
- Learn how to extract color in different ways
- Create an algorithm that identifies the dominant color in the bounding boxes in each frame of the video
- Refine the algorithm to handle special cases
This project required me to do data cleanup, clustering, image/video processing, elementary classification of objects in images, reading JSON files, and a variety of dataframe/numpy array/list operations. With that said, let's get started!
Table of Contents
- Chapter 1: Image/Video Processing Basics
- Chapter 2: Extracting Color From Images
- Chapter 3: Identifying humans in Images
- Chapter 4: Working with the Video Files
- Chapter 5: Loading the JSON files and Checking Bounding Boxes
- Chapter 6: Applying KMeans Clustering to Bounding Boxes from JSON+MP4 Data
- Conclusions
Modules Used in this work
Here are the most important modules used for this project:
- opencv (cv2) : Used to carry out operations on images
- PIL : Used to carry out operations on images
- numpy : Used to perform computations on array data
- pandas : Used to load, process, analyze, operate on and export dataframes
- sklearn : Used to carry out the K-Means clustering routine
- matplotlib.pyplot : Used for plotting/visualizing our results
- json : Used to handle the json files that contain the player bounding box information
The API documentation for each of these modules can be found on their respective project sites.
#Importing modules
import cv2
import numpy as np
from PIL import Image, ImageChops
import scipy
import scipy.misc
import scipy.cluster
import sys, glob, time, struct, urllib, os, os.path
import urllib.request
import imutils
import json
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import pandas as pd
from tqdm import *
np.set_printoptions(threshold=sys.maxsize)
Image/Video Processing Basics
This section just covers things I had to learn about image/video processing and manipulation before I could carry out the project. Feel free to skip this section :]
To try my hand at the various processing techniques available, I decided to use one of my favorite moments in soccer history as a reference image: the brilliant goal that Ronaldinho scored against Real Madrid while playing for FC Barcelona on November 19, 2005, shown below.
This image not only brings back great memories but it also has several elements that will be useful for me to consider moving forward. For instance, it has some pretty clear player objects that I could try to generate bounding boxes for, it has plenty of 'field' as part of the image that I could use to learn how to mask certain colors, and it has players from both teams which I could use to start figuring out how to classify stuff.
Loading Images
The first thing I needed to learn was how to load images into my notebook. If you have the image saved on your computer, you can simply use the cv2.imread function. The image I'm using for this portion of my work, however, comes from a URL. Loading it then requires us to:
- Pass our URL into urllib.request.urlopen
- Create a numpy array from the image in the URL
- Use cv2.imdecode to read the image data from the memory cache and convert it into an image format
- Since cv2.imdecode loads images in BGR format by default, use cv2.cvtColor(img, cv2.COLOR_BGR2RGB) to process and render the image in its original RGB
The results of this process are shown below:
#Render image from URL
req = urllib.request.urlopen('https://www.sportbible.com/cdn-cgi/image/width=648,quality=70,format=webp,fit=pad,dpr=1/https%3A%2F%2Fs3-images.sportbible.com%2Fs3%2Fcontent%2Fcf2701795dd2a49b4d404d9fa38f99fd.jpg')
arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
bgr_img = cv2.imdecode(arr, -1) # 'Load it as it is'
# Determine the figures size in inches to fit image
dpi = plt.rcParams['figure.dpi']
height, width, depth = bgr_img.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(bgr_img)
plt.show()
#Convert image to RGB from BGR
rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)
plt.figure(figsize=figsize)
plt.imshow(rgb_img)
plt.show()
Rotating an image
There are a few different methods to rotate an image. The imutils package has the easiest implementation via the imutils.rotate_bound
function, since all it requires is the image to be rotated and the angle by which we want to rotate it. In addition, this function ensures that the displayed rotated image is not cropped and is fully contained within the bounds. The other methods require the construction of a rotation matrix first, followed by its application; a sketch of that route follows the code below.
#Rotating an image
rotated0 = imutils.rotate_bound(rgb_img,0)
rotated45 = imutils.rotate_bound(rgb_img,45)
rotated90 = imutils.rotate_bound(rgb_img,90)
fig,axs = plt.subplots(1,3, figsize=(30,15))
axs[0].imshow(rotated0)
axs[1].imshow(rotated45)
axs[2].imshow(rotated90)
plt.show()
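For reference, here is a minimal sketch of the rotation-matrix route mentioned above, using cv2.getRotationMatrix2D and cv2.warpAffine (the variable names here are my own). Note that, unlike rotate_bound, this approach crops anything that rotates outside the original canvas:
#Rotating via an explicit rotation matrix (minimal sketch)
(h, w) = rgb_img.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, 45, 1.0) #rotate 45 degrees about the center, no scaling
rotated_manual = cv2.warpAffine(rgb_img, M, (w, h)) #output keeps the original canvas size, so corners get clipped
plt.imshow(rotated_manual)
plt.show()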
Cropping Images
When an image is loaded through OpenCV, it is stored as a numpy array, so cropping the image is just numpy slicing. There are multiple ways to crop; I'll show a simple example here that crops an image by different percentages of its height and width. There are fancier ways to crop by defining regions of interest (ROIs) and contouring, which I'll show in later sections.
#Need to find the starting/ending column and row index first for the desired cropping
cropIni = [0.15,0.3,0.45]
#Crop width and height of image by 15% each
startRow1 = int(height*cropIni[0]) ;startCol1 = int(width*cropIni[0])
endRow1 = int(height*(1-cropIni[0])) ;endCol1 = int(width*(1-cropIni[0]))
#Crop width and height of image by 30% each
startRow2= int(height*cropIni[1]) ;startCol2 = int(width*cropIni[1])
endRow2 = int(height*(1-cropIni[1])) ;endCol2 = int(width*(1-cropIni[1]))
#Crop width and height of image by 45% each
startRow3 = int(height*cropIni[2]) ;startCol3 = int(width*cropIni[2])
endRow3 = int(height*(1-cropIni[2])) ;endCol3 = int(width*(1-cropIni[2]))
#This is just slicing the array
fig,axs = plt.subplots(1,3, figsize=(30,15))
crop1 = rgb_img[startRow1:endRow1, startCol1:endCol1]
crop2 = rgb_img[startRow2:endRow2, startCol2:endCol2]
crop3 = rgb_img[startRow3:endRow3, startCol3:endCol3]
axs[0].imshow(crop1)
axs[1].imshow(crop2)
axs[2].imshow(crop3)
plt.show()
Resizing Images
There are many ways to resize images. Here I'll show how an image can be resized using the resize function in OpenCV. Even though the images look identical, it can be seen that the size (height and width) of the image changes when we resize it.
#Resizing an image
#cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
xscale = [0.75,0.5,0.25]
yscale = [0.75,0.5,0.25]
rimg1 = cv2.resize(rgb_img, (0,0), fx=xscale[0], fy=yscale[0])
rimg2 = cv2.resize(rgb_img, (0,0), fx=xscale[1], fy=yscale[1])
rimg3 = cv2.resize(rgb_img, (0,0), fx=xscale[2], fy=yscale[2])
fig,axs = plt.subplots(1,3, figsize=(30,15))
axs[0].imshow(rimg1)
axs[1].imshow(rimg2)
axs[2].imshow(rimg3)
plt.show()
print("The width, height and depth of this image are ",rimg1.shape)
print("The width, height and depth of this image are ",rimg2.shape)
print("The width, height and depth of this image are ",rimg3.shape)
The width, height and depth of this image are (304, 486, 3) The width, height and depth of this image are (202, 324, 3) The width, height and depth of this image are (101, 162, 3)
Adjusting brightness/contrast of Images
Adjusting the brightness/contrast of images can be done via the addWeighted function in OpenCV, a process referred to as blending. This function uses the following transformation to make those adjustments to the image:
$result = \alpha \cdot src1 + \beta \cdot src2 + \gamma$
In the equation above, the blended image is produced by applying the $\alpha$ value to the source image, the $\beta$ value to some other image (it can be the same source image), and increasing the result by $\gamma$.
The effects of blending are shown in the plots below. The first row of plots shows the effect of varying $\alpha$ while keeping the other two parameters constant ($\alpha$ decreases from left to right). The second row of plots shows the effect of varying $\beta$ while keeping the other two parameters constant ($\beta$ increases from left to right). The third row of plots shows the effect of varying $\gamma$ while keeping the other two parameters constant ($\gamma$ increases from left to right).
Decreasing $\alpha$ causes the image to darken.
Increasing $\beta$ causes the image to have more contrast (here both sources are the same image, so pixel values saturate quickly).
Increasing $\gamma$ causes the image to brighten overall.
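As a concrete instance of the transformation: with $\alpha = 0.75$, $\beta = 0$, and $\gamma = 0$, a pixel of value 200 maps to $0.75 \times 200 = 150$, which is why the image darkens.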
#cv2.addWeighted(source_img1, alpha, source_img2, beta, gamma)
alpha = [0.75, 0.5, 0.25]
beta = [0, 1 , 10]
gamma = [0, 10 ,100]
#Vary alpha
alpha_img1 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[0])
alpha_img2 = cv2.addWeighted(rgb_img, alpha[1], rgb_img, beta[0], gamma[0])
alpha_img3 = cv2.addWeighted(rgb_img, alpha[2], rgb_img, beta[0], gamma[0])
#Vary beta
beta_img1 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[0])
beta_img2 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[1], gamma[0])
beta_img3 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[2], gamma[0])
#Vary gamma
gamma_img1 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[0])
gamma_img2 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[1])
gamma_img3 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[2])
#Plot results
fig,axs = plt.subplots(3,3, figsize=(30,30))
#First row varies alpha, second row varies beta, third row varies gamma
axs[0,0].imshow(alpha_img1)
axs[0,1].imshow(alpha_img2)
axs[0,2].imshow(alpha_img3)
axs[1,0].imshow(beta_img1)
axs[1,1].imshow(beta_img2)
axs[1,2].imshow(beta_img3)
axs[2,0].imshow(gamma_img1)
axs[2,1].imshow(gamma_img2)
axs[2,2].imshow(gamma_img3)
plt.show()
Change Color Space of Images
There are a variety of color spaces used in image processing that can facilitate a wide array of tasks, like edge detection and masking, to name a few. Converting between color spaces can be readily done with OpenCV through the cvtColor function.
A few common color spaces are listed below
- RGB -> Many images are initially encoded using this format
- HSV -> Provides greater control on color Hues
- GRAY -> Makes many image processing methods more accurate
The same image is displayed below in (from left to right) RGB, grayscale, BGR, and HSV.
gray_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
bgr_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2BGR)
hsv_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2HSV)
fig,axs = plt.subplots(1,4, figsize=(30,15))
axs[0].imshow(rgb_img)
axs[1].imshow(gray_img)
axs[2].imshow(bgr_img)
axs[3].imshow(hsv_img)
plt.show()
Blurring Images
Blurring is an important operation when trying to detect edges (i.e., the lines that delineate the transition from one group of pixels to another) since it makes the transition between object boundaries smoother. This can be used to separate an object from a background, for instance.
There are four categories of blurring I looked into for this project:
- Average blurring -> Fast but may not preserve object edges
- Gaussian blurring -> Slower than Average blurring but better at edge preservation
- Median filtering -> Robust to outliers
- Bilateral filtering -> Much slower than above methods. More parameters (more tunable).
The effects of using different blurring methods are shown in the plots below. The first row of plots shows the effect of using average blurring while increasing the kernel size from left to right. The second row of plots shows the effect of using gaussian blurring while increasing the kernel size from left to right. The third row of plots shows the effect of using median blurring while increasing the kernel size from left to right. The fourth row of plots shows the effect of using bilateral blurring while increasing the diameter, sigmaColor, and sigmaSpace parameters from left to right.
params = [(3, 20, 5, 5), (9, 20, 40, 20), (15, 20, 160, 60)]
fig,axs = plt.subplots(4, 3, figsize=(30,30))
i = 0
for (k, diameter, sigmaColor, sigmaSpace) in params:
simpleblur_image = cv2.blur(rgb_img, (k,k))
gaussblur_image = cv2.GaussianBlur(rgb_img, (k,k), 0)
medianblur_image = cv2.medianBlur(rgb_img, k)
bilateralblur_image = cv2.bilateralFilter(rgb_img, diameter, sigmaColor, sigmaSpace)
axs[0,i].imshow(simpleblur_image)
axs[1,i].imshow(gaussblur_image)
axs[2,i].imshow(medianblur_image)
axs[3,i].imshow(bilateralblur_image)
i+=1
#Plot results
plt.show()
Detecting Edges in Images
Edge detection is an image-processing technique that allows for the identification of the boundaries (i.e., edges) of objects within an image. Edges allow us to identify the underlying structure of an image and hence make them one of the most important bits of information that we need from images.
The Canny algorithm was used below to detect the edges on the image.
#cv2.Canny(image, minVal, maxVal)
img_gray = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
thresholds = [(5,150), (100,150), (200,225)]
fig,axs = plt.subplots(1,4, figsize=(30,15))
i = 0
axs[i].imshow(rgb_img)
for (minVal, maxVal) in thresholds:
edge_img = cv2.Canny(img_gray, minVal, maxVal, apertureSize = 3, L2gradient = False)
axs[i+1].imshow(edge_img)
i += 1
plt.show()
Masking Colors in Images
Oftentimes, one may want to show only specific colors in an image. This can be accomplished by masking. The inRange function in OpenCV allows this to be readily done when working in HSV space.
The images shown below are (from left to right) the result of applying no mask, of masking the green hues, the red hues and blue hues respectively.
#Remove green background/field from image prior to clustering
green = np.array([60,255,255]) #This is green in HSV
loGreen = np.array([30,25,25]) #Lower green threshold
hiGreen = np.array([90,255,255]) #Upper green threshold
loBlue = np.array([0,25,25]) #Lower blue threshold
hiBlue = np.array([30,255,255]) #Upper blue threshold
loRed = np.array([120,25,25]) #Lower red threshold
hiRed = np.array([180,255,255]) #Upper red threshold
#Convert image to HSV. Note: rgb_img is RGB but COLOR_BGR2HSV treats it as BGR,
#swapping the red and blue channels; the red/blue hue thresholds above account for that swap
hsv = cv2.cvtColor(rgb_img, cv2.COLOR_BGR2HSV)
gmask = cv2.inRange(hsv, loGreen, hiGreen)
rmask = cv2.inRange(hsv, loRed , hiRed)
bmask = cv2.inRange(hsv, loBlue , hiBlue)
gresult = rgb_img.copy()
bresult = rgb_img.copy()
rresult = rgb_img.copy()
gresult[gmask==255] = (255,255,255)
bresult[bmask==255] = (255,255,255)
rresult[rmask==255] = (255,255,255)
fig,axs = plt.subplots(1,4, figsize=(30,15))
axs[0].imshow(rgb_img)
axs[1].imshow(gresult)
axs[2].imshow(rresult)
axs[3].imshow(bresult)
plt.show()
Drawing Contours in Images
Contours are curves joining the continuous points along an object's boundary that share the same color or intensity, and they are useful for locating and outlining objects.
Below, I convert the image to grayscale, blur it, and apply a threshold; cv2.findContours then extracts the contours, which cv2.drawContours draws onto a copy of the original image.
#Getting image contours
img_gray = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
gaussblur_image = cv2.GaussianBlur(img_gray, (15,15), 0)
retval, thresh = cv2.threshold(gaussblur_image, 125, 130, cv2.THRESH_TOZERO)
img_contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
thresh_img = rgb_img.copy()
cv2.drawContours(thresh_img, img_contours, -1, (127, 50, 250),2)
fig,axs = plt.subplots(1,1, figsize=(30,15))
axs.imshow(thresh_img)
plt.show()
Selecting Regions of Interest in Images
Selecting an ROI is another form of cropping. The method shown here is a good way to quickly crop your images if you don't have to process too many of them.
#Select ROI from image
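#Note: cv2.selectROI opens an interactive window, so this cell needs a GUI session (it won't run headless)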
imagedraw = cv2.selectROI('select',rgb_img)
cv2.waitKey(0)
cv2.destroyWindow('select')
#cropping the area of the image within the bounding box using imCrop() function
roi_image = rgb_img[int(imagedraw[1]):int(imagedraw[1]+imagedraw[3]),
int(imagedraw[0]):int(imagedraw[0]+imagedraw[2])]
fig,axs = plt.subplots(1,1, figsize=(5,5))
axs.imshow(roi_image)
plt.show()
Extracting color from images
At this point, I felt pretty comfortable manipulating images and doing the basic processing operations that I was confident would be sufficient to achieve my goal of determining the player color from images. In order to determine color I tried the following things:
- Extract color at a single pixel
- Extract color via pixel by pixel averaging
- Use K-Means clustering to get k-colors in image
Extract color at a single pixel
Extracting the color at a single pixel can be easily done by providing the pixel's coordinates to the image array (keeping in mind that numpy indexes images as [row, column], i.e. [y, x]). I wrote a little loop that does this for a variety of pixels in the image below. The result is shown for 17 different pixels.
#Get color from single pixel in image
#Make list of pixel coordinates based on image shape
y = range(0, height, 25)
x = range(0, width, 25)
#Combine lists above into a list of tuples
merged_list = tuple(zip(x, y))
#Initialize the plot
fig,axs = plt.subplots(1, len(y), figsize=(30,30))
i = 0
#Iterate over elements in tuple list of pixel coordinates
for (x, y) in merged_list:
    #Return the rgb tuple at the (x, y) coordinate; numpy indexes images as [row, col] = [y, x]
    r, g, b = (rgb_img[y, x])
# Creating rgb array from rgb tuple
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [r, g, b]
#Display rgb array
axs[i].imshow(color_of_pix)
i += 1
plt.show()
Extract dominant color via pixel by pixel averaging
Now that we can extract the color at a single pixel, we can extend the method to determine the average color of the image. Passing an x,y coordinate to our image array returns an RGB tuple for a pixel. By adding up the value of each element in the tuple at every pixel, we get the "total counts" associated with each of the RGB channels. Finally, dividing the counts in each of the RGB color channels by the total number of pixels in the image gives the average color of the image. The result of this process is shown below: the average color turns out to be a light brown, which does appear reasonable from visual inspection of the image. However, can we improve on this?
#Determining the average color pixel by pixel
def most_common_used_color(img):
# Get width and height of Image
height, width, depth = img.shape
# Initialize Variable
r_total = 0
g_total = 0
b_total = 0
count = 0
# Iterate through each pixel
for x in range(0, height):
for y in range(0, width):
# r,g,b value of pixel
r, g, b = (img[x, y])
r_total += r
g_total += g
b_total += b
count += 1
return (r_total/count, g_total/count, b_total/count)
#Function to convert RGB channels to hex code
def rgb2hex(rgb_tuple):
r = round(rgb_tuple[0])
g = round(rgb_tuple[1])
b = round(rgb_tuple[2])
return "#{:02x}{:02x}{:02x}".format(r,g,b)
# call function
common_color = most_common_used_color(rgb_img)
print(common_color)
print(rgb2hex(common_color))
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [common_color[0], common_color[1], common_color[2]]
fig,axs = plt.subplots(1, 2, figsize=(10,10))
axs[0].imshow(rgb_img)
axs[1].imshow(color_of_pix)
plt.show()
(119.83675506782502, 116.59529797286999, 97.04080170705686) #787561
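As a quick aside, the same average can be computed with a single vectorized numpy call, which is much faster than looping pixel by pixel (a minimal sketch; the variable name avg_color is my own):
#Vectorized average color; equivalent to the loop above up to floating-point rounding
avg_color = rgb_img.reshape(-1, 3).mean(axis=0)
print(avg_color, rgb2hex(avg_color))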
Extract dominant colors via K-Means Clustering
The player jersey color detection routine can be further improved by making use of the K-Means clustering algorithm. This routine allows us to extract however many "dominant colors" from an image by specifying the number of clusters, k, that the routine should use. The value of k can be chosen a priori if one knows how many clusters the data should fall into. Otherwise, a commonly used approach is the elbow method shown below: plot the distortion (inertia) of the fitted model against the value of k, and use the inflection point (aka the elbow) of the curve as the k value. The elbow plot for the image we've been working with indicates that the optimal k is between 3 and 4. Given that, I'll try both :]
#Determine optimal k value for clustering using elbow method
distortions = [] #Initialize array with distortions from each clustering run
K = range(1,11) #Explore k values between 1 and 10
#Convert image into a 1D array of pixels (once, outside the loop)
flat_img = np.reshape(rgb_img,(-1,3))
#Run the clustering routine
for k in K:
    kmeanModel = KMeans(n_clusters=k)
    kmeanModel.fit(flat_img)
    distortions.append(kmeanModel.inertia_)
plt.figure(figsize=(16,8))
plt.style.use('Solarize_Light2')
plt.plot(K, distortions, 'bx-')
plt.yticks(fontsize=20)
plt.xticks(fontsize=20)
plt.xlabel('k', fontsize=20)
plt.ylabel('Distortion', fontsize=20)
plt.title('Elbow Method showing optimal k')
plt.grid(True)
plt.show()
Running K-Means Clustering on Image
Having established that k should be either 3 or 4, I can write a little routine that will take an image and determine the k-dominant colors in it. The results for k = 3, k = 4, and k = 10 (which I did just for funsies) cases are shown below.
def KMeansTest(img,clusters):
"""
Args:
        img : (ndarray) image to cluster (e.g., a cropped player bounding box)
clusters : (int) how many clusters to use for KMEANS
Returns:
rgb_array : (tuple) Dominant colors in image in RGB format
"""
org_img = img.copy()
#print('Org image shape --> ',img.shape)
#Convert image into a 1D array
flat_img = np.reshape(img,(-1,3))
arrayLen = flat_img.shape
#Do the clustering
kmeans = KMeans(n_clusters = clusters, random_state=0, tol = 1e-4)
kmeans.fit(flat_img)
#Define the array with centroids
dominant_colors = np.array(kmeans.cluster_centers_,dtype='uint')
#Calculate percentages
percentages = (np.unique(kmeans.labels_,return_counts=True)[1])/flat_img.shape[0]
#Combine centroids representing dominant colors and percentages associated with each centroid into an array
    pc = list(zip(percentages,dominant_colors))
    #Sort on the percentage only; letting sorted() compare the centroid arrays on ties would raise an error
    pc = sorted(pc, key=lambda t: t[0], reverse=True)
    rgb_array = []
    for i in range(clusters):
        rgb_array.append(pc[i][1])
    return rgb_array
def plotKMeansResult(nClusters,rgb_array):
"""
Args:
rgb_array : (tuple) Dominant colors in image in RGB format
nClusters : (int) how many clusters were used for KMEANS
"""
fig,axs = plt.subplots(1, nClusters, figsize=(20,20))
    for i in range(nClusters):
        color_of_pix = np.zeros((5, 5, 3), np.uint8)
        color_of_pix[:] = [rgb_array[i][0], rgb_array[i][1], rgb_array[i][2]]
        axs[i].grid(False)
        axs[i].imshow(color_of_pix)
plt.show()
#Call K-Means function with K = 3
nClusters = 3
rgb_array = KMeansTest(rgb_img, nClusters)
plotKMeansResult(nClusters,rgb_array)
#Call K-Means function with K = 4
nClusters = 4
rgb_array = KMeansTest(rgb_img, nClusters)
plotKMeansResult(nClusters,rgb_array)
#Call K-Means function with K = 10
nClusters = 10
rgb_array = KMeansTest(rgb_img, nClusters)
plotKMeansResult(nClusters,rgb_array)
Identifying humans in Images
Before jumping into clustering the video footage, there was one more thing I wanted to look into: how to classify/identify players/humans in pictures.
After a bit of reading, I came across the HOG person detector and the Haar cascade classifiers in OpenCV, which provide pretrained models capable of detecting different objects like cats, faces, and humans.
#Detecting humans with HOG
path2xml = r'C:\Users\vmurc\Documents\GitHub\opencv\data\haarcascades\haarcascade_fullbody.xml'
fbCascade = cv2.CascadeClassifier(path2xml)
# Initializing the HOG person detector
image = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# Resizing the Image
image = imutils.resize(image, width = min(1000, image.shape[1]))
# Detecting all the regions in the image that has a person inside it
#(regions, _) = hog.detectMultiScale(image, winStride = (2,2), padding = (4, 4), scale = 1.1)
players = fbCascade.detectMultiScale(image, scaleFactor = 1.005, minSize=(20, 20), minNeighbors = 1)
image2 = rgb_img.copy()
# Drawing the regions in the Image
i=0
for (x, y, w, h) in players:
cv2.rectangle(image2, (x, y), (x + w, y + h), (0, 255, 0), 3)
currentbox = image2[y:y+h,x:x+w]
i+=1
fig,axs = plt.subplots(1, 2, figsize=(20,20))
axs[0].imshow(rgb_img)
axs[0].grid(False)
axs[1].imshow(image2)
axs[1].grid(False)
plt.show()
I spent a bit of time playing around with the parameters of the detector and wasn't able to get much better results. I tried both the default HOG people detector and the Haar cascade classifier haarcascade_fullbody, and neither gave the results I wanted. I think all the audience in the background is potentially throwing off the detector.
Even though detecting the players themselves is not part of the project (I was given json files containing player bounding box coordinates), I still wanted to ensure that I had a successful attempt at using the detector. I tried a different image, shown below, that I thought would give me a successful detection. After playing with the parameters for a few minutes, I found a combination that worked! I then cropped the image to just the player bounding box (BB) and applied the K-Means routine to the contents of that BB; the results are shown below.
I'll need to look more into ways of refining/automating the parameters of the detector function, but I'm content with my progress on this so far.
#Render new image from URL
req = urllib.request.urlopen('https://i.pinimg.com/736x/73/f5/d6/73f5d6a847c9308f35864ffe2fa729c4.jpg')
arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
bgr_img2 = cv2.imdecode(arr, -1) # 'Load it as it is'
rgb_img2 = cv2.cvtColor(bgr_img2, cv2.COLOR_BGR2RGB)
#Detecting humans with HOG
path2xml = r'C:\Users\vmurc\Documents\GitHub\opencv\data\haarcascades\haarcascade_fullbody.xml'
fbCascade = cv2.CascadeClassifier(path2xml)
# Initializing the HOG person detector
image = cv2.cvtColor(rgb_img2, cv2.COLOR_RGB2GRAY)
# Resizing the Image
image = imutils.resize(image, width = min(50000, image.shape[1]))
# Detecting all the regions in the image that has a person inside it
players = fbCascade.detectMultiScale(image, scaleFactor = 1.01, minSize=(300, 300), minNeighbors = 1)
image2 = rgb_img2.copy()
# Drawing the regions in the Image
i = 0
for (x, y, w, h) in players:
cv2.rectangle(image2, (x, y), (x + w, y + h), (0, 255, 0), 3)
currentbox = image2[y:y+h,x:x+w]
i += 1
i = 0
img_list = [rgb_img2, image2, currentbox]
fig,axs = plt.subplots(1, 3, figsize=(20,20))
for img in img_list:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(img)
i+=1
plt.show()
#Apply K-Means function on the bounding box with K = 4
nClusters = 4
rgb_array = KMeansTest(currentbox, nClusters)
plotKMeansResult(nClusters,rgb_array)
Working with Videos and Extracting frames
Having familiarized myself with a variety of image manipulation/processing techniques and gained a good understanding of how K-Means can be used to extract the dominant colors in an image, I decided to start processing the video footage, confident that I had the basis for a functional routine to determine the jersey color from images. The first thing that needs to be done is to get the video files, which I can do with the following routine:
#Establish paths to MP4 data
rMP4Path = r'\game_1779\LCamera' #Relative path containing the MP4 data from L Camera
lMP4Path = r'\game_1779\RCamera' #Relative path containing the MP4 data from R Camera
def getListOfFiles(rPath , fType):
"""
Args:
rPath: (str) path to file
fType: (str) type of file to look for (i.e., .mp4, .json, etc.)
Returns:
lFiles: (list) List of files in rPath of type fType
"""
#1. Establish the current working directory
directory = os.getcwd()
#2. List all files in rPath of type fType
    lFiles = glob.glob(directory + rPath + r"\*" + fType)
return lFiles
rc_mp4s = getListOfFiles(rMP4Path , ".mp4")
n_mp4_RC = len(rc_mp4s)
print("There are " + str(n_mp4_RC) + " MP4 files for R Camera")
print(rc_mp4s)
There are 19 MP4 files for R Camera ['C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_001.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_002.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_003.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_004.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_005.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_006.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_007.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_008.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_009.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_010.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_011.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_012.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_013.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_014.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_015.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_016.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_017.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_018.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_019.MP4']
lc_mp4s = getListOfFiles(lMP4Path , ".mp4")
n_mp4_LC = len(lc_mp4s)
print("There are " + str(n_mp4_RC) + " MP4 files for L Camera")
print(lc_mp4s)
There are 19 MP4 files for L Camera ['C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_001.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_002.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_003.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_004.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_005.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_006.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_007.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_008.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_009.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_010.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_011.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_012.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_013.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_014.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_015.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_016.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_017.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_018.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_019.MP4']
def get_frame(video_file, frame_index):
"""
Args:
video_file: (str) path to .MP4 video file
frame_index: (int) query frame index
Returns:
frame: (ndarray, size (y, x, 3)) video frame
Uses OpenCV BGR channels
"""
video_capture = cv2.VideoCapture(video_file)
video_capture.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
success, frame = video_capture.read()
if not success:
raise ValueError(
"Couldn't retrieve frame {0} from video {1}".format(
frame_index,
video_file
)
)
return frame
#Get frame 2500 for the first mp4 video in each directory
whichFrame = 2500
lc_frame = get_frame(rc_mp4s[0], whichFrame)
rc_frame = get_frame(lc_mp4s[0], whichFrame)
#Convert color from BGR to RGB
lc_frame = cv2.cvtColor(lc_frame, cv2.COLOR_BGR2RGB)
rc_frame = cv2.cvtColor(rc_frame, cv2.COLOR_BGR2RGB)
lc_rc = [lc_frame,rc_frame]
fig,axs = plt.subplots(1, 2, figsize=(20,20))
i = 0
for frame in lc_rc:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(frame)
i += 1
plt.show()
One more thing that I want to extract from the video is the number of frames in it. This can be done using the following routine:
#Determine number of frames in video
def count_frames(video_file):
"""
Args:
video_file: (str) path to .MP4 video file
Returns:
nFrames: (int) Number of frames in mp4
"""
cap = cv2.VideoCapture(video_file)
length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
return(length)
nfRC = count_frames(rc_mp4s[0])
nfLC = count_frames(lc_mp4s[0])
print(nfRC, nfLC)
7230 7230
Let's also find out if all the videos have the same number of frames
for i in range(len(rc_mp4s)):
nfRC = count_frames(rc_mp4s[i])
nfLC = count_frames(lc_mp4s[i])
print(i+1, nfRC, nfLC)
1 7230 7230 2 7200 7200 3 7200 7200 4 7200 7200 5 7200 7200 6 7200 7200 7 7200 7200 8 7200 7200 9 7200 7200 10 7200 7200 11 7200 7200 12 7200 7200 13 7200 7200 14 7200 7200 15 7200 7200 16 7200 7200 17 7200 7200 18 7200 7200 19 6314 6228
The different mp4 files have different numbers of frames; that's good to know as we build the processing routine.
Loading the JSON files and Checking Bounding Boxes
The JSON files contain the player bounding box coordinates. The first thing that needs to be done is to load the json files, which can be done using the routine I made earlier.
#Generate pandas dataframe containing player jersey colors from mp4 and json data provided by Trace
directory = os.getcwd()
jsonPath = r'\game_1779\object_detector'
jsonList = getListOfFiles(jsonPath , ".json")
n_json = len(jsonList)
print("There are " + str(n_json) + " json files.")
print([s.replace(directory + jsonPath + '\\', '') for s in jsonList]) #Remove directory when printing
There are 38 json files. ['video_CAMB_CAMCAMBA_20180727_133419_001.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_002.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_003.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_004.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_005.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_006.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_007.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_008.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_009.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_010.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_011.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_012.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_013.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_014.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_015.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_016.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_017.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_018.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_019.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_001.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_002.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_003.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_004.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_005.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_006.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_007.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_008.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_009.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_010.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_011.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_012.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_013.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_014.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_015.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_016.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_017.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_018.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_019.MP4.json']
Each json file is associated with an MP4 file, so when batch processing all the files I'll have to ensure that the right json file is paired with the right MP4 file. To do this, I'll first strip the paths and the .json extension from the list of filenames generated earlier and place the result into a list called json_strip
json_strip = [s.replace(directory + jsonPath + '\\', '') for s in jsonList]
json_strip = [s.replace(".json", '') for s in json_strip]
print(len(json_strip))
print(json_strip)
38 ['video_CAMB_CAMCAMBA_20180727_133419_001.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_002.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_003.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_004.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_005.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_006.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_007.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_008.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_009.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_010.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_011.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_012.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_013.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_014.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_015.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_016.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_017.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_018.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_019.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_001.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_002.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_003.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_004.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_005.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_006.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_007.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_008.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_009.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_010.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_011.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_012.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_013.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_014.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_015.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_016.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_017.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_018.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_019.MP4']
Then, I'll strip the paths from the list of MP4 files for each camera (LCAMERA and RCAMERA) I had generated earlier and place that result into two lists called mp4_strip_LC
and mp4_strip_RC
respectively.
mp4_strip_LC = [s.replace(directory + lMP4Path + '\\', '') for s in lc_mp4s]
print(mp4_strip_LC)
['video_CAMB_CAMCAMBB_20180727_133418_001.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_002.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_003.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_004.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_005.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_006.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_007.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_008.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_009.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_010.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_011.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_012.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_013.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_014.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_015.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_016.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_017.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_018.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_019.MP4']
mp4_strip_RC = [s.replace(directory + rMP4Path + '\\', '') for s in rc_mp4s]
print(mp4_strip_RC)
['video_CAMB_CAMCAMBA_20180727_133419_001.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_002.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_003.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_004.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_005.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_006.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_007.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_008.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_009.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_010.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_011.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_012.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_013.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_014.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_015.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_016.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_017.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_018.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_019.MP4']
As a sanity check, I'll use a list comprehension to ensure that the stripping process does indeed result in a file match.
#Check that there is an mp4 from LCamera associated with json file
mp4Name = mp4_strip_LC[0]
print(mp4Name)
matches = [match for match in json_strip if mp4Name in match]
print(matches)
video_CAMB_CAMCAMBB_20180727_133418_001.MP4 ['video_CAMB_CAMCAMBB_20180727_133418_001.MP4']
#Check that there is an mp4 from RCamera associated with json file
mp4Name = mp4_strip_RC[0]
print(mp4Name)
matches = [match for match in json_strip if mp4Name in match]
print(matches)
video_CAMB_CAMCAMBA_20180727_133419_001.MP4 ['video_CAMB_CAMCAMBA_20180727_133419_001.MP4']
Yay! It works! Finally, I'll use those lists to get the index associated with those files from each camera.
#Get the index of the json file containing the data for the [0] mp4 file from LCamera
mp4Name = mp4_strip_LC[0]
json_strip.index(mp4Name)
19
#Get the index of the json file containing the data for the [0] mp4 file from RCamera
mp4Name = mp4_strip_RC[0]
json_strip.index(mp4Name)
0
Awesome! Now, I'll put these code snippets into a routine that'll allow me to readily match json files to MP4 files with a single function! This will be important for batch processing later.
def matchJSON2MP4(jsonList, jsonPath, MP4list, MP4Path, whichMP4):
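    """
    Args:
        jsonList: (list) full paths to the .json files
        jsonPath: (str) relative path to the directory containing the json files
        MP4list : (list) full paths to the .mp4 files for one camera
        MP4Path : (str) relative path to that camera's mp4 directory
        whichMP4: (int) index of the mp4 file to match
    Returns:
        index: (int) index in jsonList of the json file matching the chosen mp4
    Note: relies on the global directory variable set earlier via os.getcwd()
    """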
json_strip = [s.replace(directory + jsonPath + '\\', '') for s in jsonList]
json_strip = [s.replace(".json", '') for s in json_strip]
mp4_strip = [s.replace(directory + MP4Path + '\\', '') for s in MP4list]
mp4Name = mp4_strip[whichMP4]
index = json_strip.index(mp4Name)
print(index)
return index
whichMP4 = 0
jval_RC = matchJSON2MP4(jsonList, jsonPath, rc_mp4s, rMP4Path, whichMP4)
jval_LC = matchJSON2MP4(jsonList, jsonPath, lc_mp4s, lMP4Path, whichMP4)
0 19
With the framework above I can start reading the json files and getting the bounding box info. To do this, I'll generate a dictionary that holds the coordinates of each player bounding box (detection) for each frame in the video being processed.
#Get dictionary from json file
def read_json_dict(path2json):
"""
Args:
path2json: (str) path to .MP4 json file containing player bounding boxes
Returns:
bb_dict: (dict) Dictionary containing bounding boxes in each frame
"""
    #Open the JSON file and parse it into a dictionary
    with open(path2json) as f:
        bb_dict = json.load(f)
    return(bb_dict)
bb_dict_LC = read_json_dict(jsonList[jval_LC])#This is for first video in the LCamera folder
bb_dict_RC = read_json_dict(jsonList[jval_RC])#This is for first video in the RCamera folder
#print(bb_dict)
The above code gives me the bounding boxes for every frame in the current video.
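For reference, here is the structure the accessors below assume the json dictionary has (sketched from how the dictionary is indexed in this notebook, not from any official spec):
#Assumed json structure (sketch):
#{
#  "frames": [
#    {"frame_index": 62,
#     "detections": [[x1, y1, x2, y2, confidence], ...]},
#    ...
#  ]
#}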
Next, I'll determine how many bounding boxes there are in a given frame
#Determine number of bounding boxes in frame
def count_bboxes(bb_dict,frame_index):
"""
Args:
bb_dict: (dict) dictionary from json file
        frame_index: (int) which frame is being processed
Returns:
nDetections: (int) Number of bounding boxes in frame
"""
bbs = bb_dict['frames'][frame_index]['detections']
nDetections = len(bbs)
#print(nDetections, " bounding boxes found in frame ", frame_index)
return(nDetections)
whichFrame = 0
lc_n_bbs = count_bboxes(bb_dict_LC,whichFrame)
rc_n_bbs = count_bboxes(bb_dict_RC,whichFrame)
print(lc_n_bbs, rc_n_bbs)
1 1
Next, I'll determine the first frame that contains detections.
#Find first frame that contains detections
def findFirstFrame(bb_dict):
"""
Args:
bb_dict: (dict) dictionary from json file
Returns:
firstFrame: (int) First frame to process in video
"""
firstFrame = bb_dict['frames'][0]['frame_index']
    print('This is the first frame to process in video ', firstFrame)
return(firstFrame)
firstFrame_LC = findFirstFrame(bb_dict_LC)
firstFrame_RC = findFirstFrame(bb_dict_RC)
print(firstFrame_LC, firstFrame_RC)
This is the first frame to process in video 0 This is the first frame to process in video 62 0 62
Next, note that detections are only recorded at certain frame_index values, and these values may differ between videos. Let's figure out the detection collection interval for a video based on its json file.
#Find the spacing between frames that contain detections
def findFrameSpacing(bb_dict):
"""
Args:
bb_dict: (dict) dictionary from json file
Returns:
spacing: (int) Spacing between frames in json
"""
frame0 = bb_dict['frames'][0]['frame_index']
frame1 = bb_dict['frames'][1]['frame_index']
spacing = abs(frame1 - frame0)
print('The frame spacing is ', spacing)
return(spacing)
frameSpacing_LC = findFrameSpacing(bb_dict_LC)
frameSpacing_RC = findFrameSpacing(bb_dict_RC)
print(frameSpacing_LC, frameSpacing_RC)
The frame spacing is 6 The frame spacing is 6 6 6
Next, let's extract all the bounding box coordinates for the current frame from the json file.
#Extract bounding boxes for a given frame from json
def get_bb4frame(bb_dict,frame_index):
"""
Args:
bb_dict: (dict) dictionary from json file
        frame_index: (int) which frame is being processed
    Returns:
        bbs: (list) bounding box coordinates [x1, y1, x2, y2, confidence] for each detection in the frame
"""
bbs = bb_dict['frames'][frame_index]['detections']
#print('These are the coordinates for all bounding boxes in frame', frame_index)
#print(bbs)
return(bbs)
whichFrame = 150
bbs_Frame_LC = get_bb4frame(bb_dict_LC,whichFrame) #BB coordinates for current frame
bbs_Frame_RC = get_bb4frame(bb_dict_RC,whichFrame) #BB coordinates for current frame
print(bbs_Frame_LC)
print(bbs_Frame_RC)
[[1384, 413, 1420, 479, 0.73], [623, 293, 674, 391, 1.0], [940, 304, 976, 379, 0.94], [914, 276, 935, 333, 0.86], [622, 211, 656, 266, 0.94], [482, 202, 522, 265, 0.89], [534, 206, 555, 263, 0.62], [376, 189, 407, 252, 0.82], [558, 196, 582, 252, 0.93], [735, 202, 751, 246, 0.62], [112, 164, 144, 225, 0.92], [439, 173, 462, 223, 0.98], [363, 162, 384, 210, 0.96], [315, 153, 333, 196, 0.97]] [[614, 675, 635, 724, 0.9], [1521, 400, 1555, 475, 0.97], [1756, 364, 1800, 460, 1.0], [1786, 333, 1818, 406, 0.95], [2630, 284, 2677, 401, 0.99], [1740, 301, 1760, 346, 0.89], [2093, 262, 2111, 320, 0.97], [1872, 271, 1890, 315, 0.94], [2222, 254, 2261, 312, 0.74], [2347, 226, 2379, 300, 0.99], [2459, 213, 2489, 289, 0.89], [2493, 212, 2525, 279, 0.9], [2521, 187, 2553, 261, 0.74], [2592, 190, 2624, 261, 0.98], [2326, 208, 2344, 256, 0.8], [2271, 210, 2291, 253, 0.96], [2402, 199, 2427, 251, 0.94]]
Finally, let's extract the bounding box coordinates for a specific bounding box from the json file
#Extract bounding box coordinates for a specific bounding box in current frame from json
def makeRectangleFromJSON(bb_dict,whichBB):
"""
Args:
bb_dict: (dict) dictionary from json file
whichBB: (int) what bounding box is being processed
Returns:
x1 ,y1 ,x2 ,y2: (tuple) tuple containing pixel coordinates for the upper-left and lower-right corners of the bounding box
"""
x1 ,y1 ,x2 ,y2 = bb_dict[whichBB][0],bb_dict[whichBB][1],bb_dict[whichBB][2],bb_dict[whichBB][3]
#print(x1 ,y1 ,x2 ,y2, ' These are the coordinates for bounding box ', whichBB)
return(x1 ,y1 ,x2 ,y2)
whichBB = 0
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,whichBB) #BB coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,whichBB) #BB coordinates for current BB
print(x1L ,y1L ,x2L ,y2L)
print(x1R ,y1R ,x2R ,y2R)
1384 413 1420 479 614 675 635 724
Let's see if our method is working by visualizing the bounding boxes!
#Make the list of json files
jsonPath = r'\game_1779\object_detector'
jsonList = getListOfFiles(jsonPath , ".json")
#Establish paths to MP4 data
rMP4Path = r'\game_1779\LCamera' #Relative path containing the MP4 data from L Camera
lMP4Path = r'\game_1779\RCamera' #Relative path containing the MP4 data from R Camera
#Make the list of mp4 files from each camera
rc_mp4s = getListOfFiles(rMP4Path , ".mp4")
lc_mp4s = getListOfFiles(lMP4Path , ".mp4")
#Find the json file to use for the current video
whichVideo = 0
jval_RC = matchJSON2MP4(jsonList, jsonPath, rc_mp4s, rMP4Path, whichVideo)
jval_LC = matchJSON2MP4(jsonList, jsonPath, lc_mp4s, lMP4Path, whichVideo)
print(jval_RC,jval_LC)
#Get json dictionary of all bounding boxes in video
bb_dict_LC = read_json_dict(jsonList[jval_LC])#This is for first video in the LCamera folder
bb_dict_RC = read_json_dict(jsonList[jval_RC])#This is for first video in the RCamera folder
#Find first frame with detections
firstFrame_LC = findFirstFrame(bb_dict_LC)
firstFrame_RC = findFirstFrame(bb_dict_RC)
print(firstFrame_LC,firstFrame_RC)
#Determine frame spacing
frameSpacing_LC = findFrameSpacing(bb_dict_LC)
frameSpacing_RC = findFrameSpacing(bb_dict_RC)
#Which frame to look at
whichFrame = 300
#Get a frame from a video --> The second input needs to be adjusted by the first
#frame with detections and the spacing to get the right data match
lc_frame = get_frame(lc_mp4s[whichVideo], firstFrame_LC + whichFrame*frameSpacing_LC)
rc_frame = get_frame(rc_mp4s[whichVideo], firstFrame_RC + whichFrame*frameSpacing_RC)
#Convert color from BGR to RGB
lc_frame = cv2.cvtColor(lc_frame, cv2.COLOR_BGR2RGB)
rc_frame = cv2.cvtColor(rc_frame, cv2.COLOR_BGR2RGB)
#Make a copy of the frame to store for display of all the bounding boxes
rc_frame_copy = rc_frame.copy()
lc_frame_copy = lc_frame.copy()
lc_rc = [lc_frame,rc_frame]
fig,axs = plt.subplots(1, 2, figsize=(20,20))
i = 0
for frame in lc_rc:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(frame)
i += 1
#Determine number of bounding boxes in current frame
lc_n_bbs = count_bboxes(bb_dict_LC,whichFrame)
rc_n_bbs = count_bboxes(bb_dict_RC,whichFrame)
print(lc_n_bbs,rc_n_bbs)
#Get BB coordinates for current frame
bbs_Frame_LC = get_bb4frame(bb_dict_LC,whichFrame) #BB coordinates for current frame
bbs_Frame_RC = get_bb4frame(bb_dict_RC,whichFrame) #BB coordinates for current frame
#Plot the individual bounding boxes from each frame for RCamera
fig, axs = plt.subplots(1, rc_n_bbs, figsize=(15,15))
#Loop over bounding boxes in current frame
i = 0
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
cv2.rectangle(rc_frame_copy, (x1R, y1R), (x2R, y2R), (0, 0, 255), 2)
plt.style.use('ggplot')
if rc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
#Plot the individual bounding boxes from each frame for LCamera
fig,axs = plt.subplots(1, lc_n_bbs, figsize=(15,15))
i = 0
for bb in range(lc_n_bbs): #LCamera
#Get coordinates for current BB
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,bb) #BB coordinates for current BB
currentbox = lc_frame[y1L:y2L,x1L:x2L]
cv2.rectangle(lc_frame_copy, (x1L, y1L), (x2L, y2L), (0, 0, 255), 4)
plt.style.use('ggplot')
if lc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
#Plot frames with bounding boxes drawn in
fig,axs = plt.subplots(1, 2, figsize=(20,20))
frame_w_bbs = [lc_frame_copy, rc_frame_copy]
i = 0
for frame in frame_w_bbs:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(frame)
i += 1
plt.show()
0 19 0 19 This is the first frame to process in video 0 This is the first frame to process in video 62 0 62 The frame spacing is 6 The frame spacing is 6 24 14
Awesome! The methods so far allow me to successfully extract the player bounding boxes. A few things can be seen from some of the bounding boxes. First, there are instances of false positives; a false positive in this data is a bounding box with no player in it. This is something that will need to be addressed in the future.
Applying KMeans Clustering to Bounding Boxes from JSON
Now let's try to apply the KMeans clustering routine on the bounding boxes and see what happens. I'll stick to processing the same video and frame I've been using so far just so I can focus on the clustering itself for a bit.
#Apply K-Means function on the bounding box with K = 4
nClusters = 4
#Plot the individual bounding boxes from each frame for RCamera
fig, axs = plt.subplots(1, rc_n_bbs, figsize=(15,15))
#Loop over bounding boxes in current frame
i = 0
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
cv2.rectangle(rc_frame_copy, (x1R, y1R), (x2R, y2R), (0, 0, 255), 2)
plt.style.use('ggplot')
if rc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
#Plot the individual bounding boxes from each frame for LCamera
fig,axs = plt.subplots(1, lc_n_bbs, figsize=(15,15))
i = 0
for bb in range(lc_n_bbs): #LCamera
#Get coordinates for current BB
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,bb) #BB coordinates for current BB
currentbox = lc_frame[y1L:y2L,x1L:x2L]
cv2.rectangle(lc_frame_copy, (x1L, y1L), (x2L, y2L), (0, 0, 255), 4)
plt.style.use('ggplot')
if lc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
plt.show()
print("******K-Means Results for BBs from R Camera*************")
#Do K-Means clustering on each bounding box from the R camera and visualize the results
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
rgb_array = KMeansTest(currentbox, nClusters)
print('KM Results for BB ' + str(bb) + ' RCamera')
plotKMeansResult(nClusters,rgb_array)
print("******K-Means Results for BBs from L Camera*************")
#Do K-Means clustering on each bounding box from the L camera and visualize the results
for bb in range(lc_n_bbs): #LCamera
#Get coordinates for current BB
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,bb) #BB coordinates for current BB
currentbox = lc_frame[y1L:y2L,x1L:x2L]
rgb_array = KMeansTest(currentbox, nClusters)
print('KM Results for BB ' + str(bb) + ' LCamera')
plotKMeansResult(nClusters, rgb_array)
[Cell output: K-Means color swatches for each bounding box (BBs 0-13 from the R camera, BBs 0-23 from the L camera).]
Applying Mask To Image Data
One of the main takeaways from the results above is that green is the dominant color in every bounding box. The green hues mostly come from the grass on the field, and this is where masking comes into play. My approach involves setting low and high thresholds for each of the three channels of the HSV color space (Hue, Saturation, and Value, i.e., brightness). If a pixel's color falls within these thresholds, it gets masked out.
In addition to this, I added a bit of error handling for cases where the masking process removes too many pixels. The clustering routine requires the processed image to contain at least as many pixels as there are clusters. Therefore, if the masked image ends up with fewer pixels than the desired number of clusters, the image is ignored. This situation typically occurs when a bounding box contains nothing but field.
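To pick sensible hue bounds it helps to know where green sits on OpenCV's hue scale, which runs 0-179 rather than 0-359. Here's a quick sanity check, using pure green as a convenient probe value:
#Where does pure green land in OpenCV's HSV space?
green_bgr = np.uint8([[[0, 255, 0]]]) #A single pure-green pixel (BGR order)
print(cv2.cvtColor(green_bgr, cv2.COLOR_BGR2HSV)) #[[[60 255 255]]]: hue 60 on the 0-179 scale
The hue window of 20 to 90 used below brackets that value with plenty of margin for real grass tones.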
def KMeansMaskGreen(img, clusters, lowHue, highHue, lowSat, highSat, loBright, hiBright):
"""
Args:
img : (array) cropped player bounding box image
clusters : (int) how many clusters to use for KMeans
lowHue, highHue : (int) hue thresholds for the green mask
lowSat, highSat : (int) saturation thresholds for the green mask
loBright, hiBright : (int) value (brightness) thresholds for the green mask
Returns:
rgb_array : (list) dominant colors in the image in RGB format, sorted by coverage
"""
org_img = img.copy()
loGreen = np.array([lowHue, lowSat, loBright]) #low green threshold
hiGreen = np.array([highHue, highSat, hiBright]) #Upper green threshold
#Convert image to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
#Make the mask
mask = cv2.inRange(hsv, loGreen, hiGreen)
mask_img = img.copy()
mask_img[mask==255] = (255,255,255)
#Remove white pixels from image so that they don't interfere with the process
mask_img = mask_img[np.all(mask_img != 255 , axis=-1)]
#Convert image into a 1D array
flat_img = np.reshape(mask_img,(-1,3))
arrayLen = flat_img.shape
#Ensure that masking didn't remove everything (Generally happens in false positives)
if mask_img.shape[0] <= clusters:
#print('Cropped image has dimensions lower than number of desired clusters.Not clustering current image')
rgb_array = np.empty((clusters,3,))
rgb_array[:] = np.nan
return rgb_array
else:
rgb_array = []
#Do the clustering
kmeans = KMeans(n_clusters = clusters, random_state=0, tol = 1e-4)
kmeans.fit(flat_img)
#Define the array with centroids
dominant_colors = np.array(kmeans.cluster_centers_,dtype='uint')
#Calculate percentages
percentages = (np.unique(kmeans.labels_,return_counts=True)[1])/flat_img.shape[0]
#Combine centroids representing dominant colors and percentages
#associated with each centroid into an array
pc = list(zip(percentages,dominant_colors))
pc = sorted(pc, reverse = True, key = lambda x: x[0]) #Sort clusters by coverage, largest first
for i in range(clusters):
rgb_array.append(pc[i][1])
return rgb_array
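Before moving on, here's a minimal sanity check of that fallback path, using a made-up patch whose color is just an assumption of a plausible field green:
#An all-"field" patch should be masked away entirely and come back as NaNs
fake_grass = np.full((20, 20, 3), (50, 120, 60), dtype=np.uint8) #Grass-like patch
result = KMeansMaskGreen(fake_grass, 4, 20, 90, 20, 255, 20, 255)
print(np.isnan(result).all()) #True: this "bounding box" would be ignored downstream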
def plotKMeansResult2(nClusters,rgb_array, clustered_image, mode = 1):
"""
Args:
nClusters : (int) how many clusters were used for KMeans
rgb_array : (tuple) dominant colors in image in RGB format
clustered_image : (array) the image that was clustered (shown when mode = 2)
mode : (int) 1 shows only the cluster colors; 2 also shows the clustered image
"""
if mode == 1: #This display mode shows only the clustered colors
fig,axs = plt.subplots(1, nClusters, figsize=(20,20))
for i in range(nClusters):
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [rgb_array[i][0], rgb_array[i][1], rgb_array[i][2]]
axs[i].grid(False)
axs[i].imshow(color_of_pix)
elif mode == 2: #This display mode shows the clustered image and the clustered colors
fig,axs = plt.subplots(1, nClusters + 1, figsize=(5,5))
axs[0].grid(False)
axs[0].imshow(clustered_image) #First panel shows the image that was clustered
for i in range(nClusters):
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [rgb_array[i][0], rgb_array[i][1], rgb_array[i][2]]
axs[i+1].imshow(color_of_pix)
axs[i+1].grid(False)
else:
print('Invalid display mode. mode must equal 1 or 2')
plt.show()
nClusters = 4
lowHue = 20
highHue = 90
lowSat = 20
highSat = 255
loBright = 20
hiBright = 255
print("******K-Means Results for BBs from R Camera*************")
#Run K-Means clustering on each bounding box and visualize the results
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
rgb_array = KMeansMaskGreen(currentbox, nClusters,
lowHue, highHue,
lowSat, highSat,
loBright, hiBright)
print('KM Results for BB ' + str(bb) + ' RCamera')
plotKMeansResult2(nClusters, rgb_array, currentbox, mode = 2)
[Cell output: for each of BBs 0-13 from the R camera, the masked crop alongside its four dominant colors.]
Using a mask to remove the green colors has helped quite a bit in improving the color detection routine!
Remove Bottom Half of Bounding Box to Focus on Jersey Data
Since the goal is to determine jersey colors only, cropping the bottom portion of each bounding box should also bolster the analysis, since it lets the clustering focus on the region that matters most.
def crop_image(image,howMuch):
"""
Args:
image : (array) image of player bounding box
howMuch : (int) percent of image height to keep (between 0 and 100)
Returns:
cropped_img : (array) cropped image containing the top howMuch% of rows
"""
val = howMuch/100
cropped_img = image[0:int(image.shape[0]*val), 0:image.shape[1]] #Keep top rows, full width
return cropped_img
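A quick shape check, using a made-up 100x40 dummy box:
#crop_image should keep the top howMuch% of rows and the full width
dummy_bb = np.zeros((100, 40, 3), dtype=np.uint8)
print(crop_image(dummy_bb, 50).shape) #(50, 40, 3)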
howMuch = 50
print("******K-Means Results for BBs from R Camera*************")
#Run K-Means clustering on each cropped bounding box and visualize the results
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
croped_bb = crop_image(currentbox,howMuch)
rgb_array = KMeansMaskGreen(croped_bb, nClusters,
lowHue, highHue,
lowSat, highSat,
loBright, hiBright)
print('KM Results for BB ' + str(bb) + ' RCamera')
plotKMeansResult2(nClusters, rgb_array, croped_bb, mode = 2)
[Cell output: for each of BBs 0-13 from the R camera, the cropped box alongside its four dominant colors.]
Much better!
Processing the Entire MP4 File
Let's try processing every frame and see what happens! I took all my routines so far and placed them into the wrapper function below. It takes the path to the JSON files, the path to the MP4 files, the video to be processed, and the number of clusters for K-Means as input, and returns a pandas dataframe containing the dominant colors in RGB format for each bounding box in each frame of the current video.
def getJerseyColorsFromMP4(jsonPath,MP4Path,whichVideo,nClusters):
#Make the list of mp4 and json files from each camera
print('Retrieving MP4 and JSON files...')
mp4List = getListOfFiles(MP4Path , ".mp4")
jsonList = getListOfFiles(jsonPath , ".json")
#Find the json file to use for the current video
jval = matchJSON2MP4(jsonList, jsonPath, mp4List, MP4Path, whichVideo)
#Get json dictionary of all bounding boxes in the video
bb_dict = read_json_dict(jsonList[jval])
#Find first frame with detections
firstFrame = findFirstFrame(bb_dict)
#Determine frame spacing
frameSpacing = findFrameSpacing(bb_dict)
#Which frame to look at
whichFrame = 0
whichFrameAdj = firstFrame + whichFrame*frameSpacing #Adjust for video data to match json detection
nf = int(count_frames(mp4List[whichVideo])/10) #Number of frames to process (a tenth of the full frame count)
print('Initializing arrays...')
#Initialize arrays
dom_color1 = []
dom_color2 = []
dom_color3 = []
frame_list = []
bb_list = []
video_list = []
#Loop over the frames of the video
print('Starting jersey color detection ...')
while whichFrameAdj < nf: #Effectively a single pass; the loop below advances whichFrameAdj past nf
for i in tqdm(range(nf), desc="Processing Frame"): #Progress bar for frames processed
#Get a frame from video
frame = get_frame(mp4List[whichVideo], whichFrameAdj)
#Convert color from BGR to RGB
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
#Make a copy of the frame to store for display of all the bounding boxes
frame_copy = frame.copy()
#Determine number of bounding boxes in current frame
n_bbs = count_bboxes(bb_dict,whichFrame)
#Get BB coordinates for current frame
bbs_frame = get_bb4frame(bb_dict,whichFrame) #BB coordinates for current frame
#Loop over bounding boxes in current frame
for bb in range(n_bbs):
#print('****Frame ' + str(whichFrameAdj) + ' BB ' + str(bb) + '****')
frame_list.append(whichFrameAdj) #Append frame ID to list
bb_list.append(bb)
video_list.append(whichVideo)
x1 ,y1 ,x2 ,y2 = makeRectangleFromJSON(bbs_frame,bb) #Coordinates for current BB
currentbox = frame[y1:y2,x1:x2]
cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 0, 255), 2)
#Crop the bounding box
croped_bb = crop_image(currentbox,howMuch)
#Do the clustering
rgb_array = KMeansMaskGreen(croped_bb, nClusters,
lowHue, highHue, lowSat, highSat, loBright, hiBright)
#Append dominant RGB colors into respective arrays
dom_color1.append(rgb_array[0])
dom_color2.append(rgb_array[1])
dom_color3.append(rgb_array[2])
whichFrame += 1
whichFrameAdj = firstFrame + whichFrame*frameSpacing #Adjust for video data to match json
print('Making pandas dataframe containing results...')
jerseyColor_df = pd.DataFrame({'Video ID': video_list,
'Frame ID': frame_list,
'BB in Frame': bb_list,
'Jersey Color 1': dom_color1,
'Jersey Color 2': dom_color2,
'Jersey Color 3': dom_color3})
print('PROCESS COMPLETED')
return jerseyColor_df
jsonPath = r'\game_1779\object_detector' #Path to json files (raw string so backslashes aren't treated as escapes)
MP4Path = r'\game_1779\LCamera' #Path to MP4 files
whichVideo = 0 #Which video to look at?
howMuch = 50 #How much to crop off the original bounding box height
nClusters = 4 #k-value for k-means clustering routine
#Masking parameters
lowHue = 20 #low hue value
highHue = 90 #High hue value
lowSat = 20 #low saturation value
highSat = 255 #High saturation value
loBright = 20 #low brightness value
hiBright = 255 #High brightness value
jerseyColor_df = getJerseyColorsFromMP4(jsonPath,MP4Path,whichVideo,nClusters)
Retrieving MP4 and JSON files...
The first frame to process in the video is 62
The frame spacing is 6
Initializing arrays...
Starting jersey color detection ...
Processing Frame: 100%|██████████████████████████████████████████████████████████████| 723/723 [09:48<00:00, 1.23it/s]
Making pandas dataframe containing results... PROCESS COMPLETED
It seems the function ran without hiccups! In the example shown here, a total of 723 frames were processed in just under 10 minutes, for an average processing rate of 1.23 frames/s. This rate of course depends on how many bounding boxes are being processed in a given frame.
Let's take a look at the dataframe we made.
pd.set_option('display.max_rows', 10)
jerseyColor_df
| | Video ID | Frame ID | BB in Frame | Jersey Color 1 | Jersey Color 2 | Jersey Color 3 |
|---|---|---|---|---|---|---|
| 0 | 0 | 62 | 0 | [89, 86, 78] | [178, 171, 163] | [63, 58, 50] |
| 1 | 0 | 68 | 0 | [99, 94, 74] | [121, 113, 88] | [113, 108, 79] |
| 2 | 0 | 68 | 1 | [63, 7, 7] | [14, 9, 7] | [56, 42, 26] |
| 3 | 0 | 92 | 0 | [132, 131, 131] | [97, 96, 92] | [170, 172, 164] |
| 4 | 0 | 92 | 1 | [84, 90, 101] | [122, 126, 134] | [54, 53, 44] |
| ... | ... | ... | ... | ... | ... | ... |
| 9302 | 0 | 4394 | 19 | [54, 33, 10] | [84, 55, 31] | [106, 84, 56] |
| 9303 | 0 | 4394 | 20 | [136, 125, 85] | [82, 17, 15] | [99, 72, 54] |
| 9304 | 0 | 4394 | 21 | [131, 129, 108] | [93, 89, 97] | [152, 155, 165] |
| 9305 | 0 | 4394 | 22 | [60, 33, 15] | [136, 115, 86] | [110, 72, 59] |
| 9306 | 0 | 4394 | 23 | [40, 44, 57] | [6, 12, 18] | [141, 142, 147] |
9307 rows × 6 columns
The dataframe has the desired structure. As can be seen, a total of 9,307 bounding boxes were processed, which works out to about 15.8 bounding boxes per second. Now I'll remove any rows in the dataframe that contain arrays with NaNs (these are the false positives).
#Remove NaN detections (false positives) from the dataframe
test_df = jerseyColor_df[~jerseyColor_df.applymap(lambda x : np.isnan(x).any()).any(axis=1)]
test_df
| | Video ID | Frame ID | BB in Frame | Jersey Color 1 | Jersey Color 2 | Jersey Color 3 |
|---|---|---|---|---|---|---|
| 0 | 0 | 62 | 0 | [89, 86, 78] | [178, 171, 163] | [63, 58, 50] |
| 1 | 0 | 68 | 0 | [99, 94, 74] | [121, 113, 88] | [113, 108, 79] |
| 2 | 0 | 68 | 1 | [63, 7, 7] | [14, 9, 7] | [56, 42, 26] |
| 3 | 0 | 92 | 0 | [132, 131, 131] | [97, 96, 92] | [170, 172, 164] |
| 4 | 0 | 92 | 1 | [84, 90, 101] | [122, 126, 134] | [54, 53, 44] |
| ... | ... | ... | ... | ... | ... | ... |
| 9302 | 0 | 4394 | 19 | [54, 33, 10] | [84, 55, 31] | [106, 84, 56] |
| 9303 | 0 | 4394 | 20 | [136, 125, 85] | [82, 17, 15] | [99, 72, 54] |
| 9304 | 0 | 4394 | 21 | [131, 129, 108] | [93, 89, 97] | [152, 155, 165] |
| 9305 | 0 | 4394 | 22 | [60, 33, 15] | [136, 115, 86] | [110, 72, 59] |
| 9306 | 0 | 4394 | 23 | [40, 44, 57] | [6, 12, 18] | [141, 142, 147] |
9026 rows × 6 columns
There were 281 false positives in the dataset processed so far, which means the player classification was roughly 97% accurate, which is quite good! Let's now try to see what the top 5 colors in the processed MP4 frames are by running the KMeans routine on the RGB columns from our dataframe.
jcrgb1 = test_df[['Jersey Color 1']].to_numpy()
jcrgb2 = test_df[['Jersey Color 2']].to_numpy()
jcrgb3 = test_df[['Jersey Color 3']].to_numpy()
jc_list = [jcrgb1,jcrgb2,jcrgb3]
title_List = ['Dominant Colors in Jersey Color 1',
'Dominant Colors in Jersey Color 2',
'Dominant Colors in Jersey Color 3']
clusters = 5
listElem = 0
for elem in jc_list:
#Collect the valid RGB triplets from this color column
totColor = []
for row in elem:
if len(row[0]) == 3: #Skip any malformed entries
totColor.append(row[0])
#Do KMeans
kmeans = KMeans(n_clusters = clusters)
kmeans.fit(totColor)
#Define the array with centroids
dominant_colors = np.array(kmeans.cluster_centers_,dtype = 'uint')
#Calculate the fraction of samples assigned to each centroid
percentages = (np.unique(kmeans.labels_,return_counts=True)[1])/len(totColor)
pc = zip(percentages,dominant_colors)
pc = sorted(pc, reverse = True, key = lambda x: x[0]) #Key on the percentage so ties don't compare arrays
#Plotting utility
print(title_List[listElem])
block = np.ones((50,50,3),dtype='uint')
plt.figure(figsize=(12,8))
for i in range(clusters):
plt.subplot(1,clusters,i+1)
block[:] = pc[i][1] #Fill a 50x50 swatch with this cluster's RGB color
plt.imshow(block)
plt.xticks([])
plt.yticks([])
plt.xlabel(str(round(pc[i][0]*100,2))+'%')
bar = np.ones((50,500,3),dtype='uint')
plt.figure(figsize=(12,8))
listElem += 1
start = 0
i = 0
for p,c in pc:
end = start+int(p*bar.shape[1])
if i == clusters-1: #Let the last color run to the edge to absorb rounding
bar[:,start:] = c
else:
bar[:,start:end] = c
start = end
i+=1
plt.imshow(bar)
plt.xticks([])
plt.yticks([])
plt.title('Color Distribution')
plt.show()
Dominant Colors in Jersey Color 1
Dominant Colors in Jersey Color 2
Dominant Colors in Jersey Color 3
Nice! The clustering process seems to have worked quite well. The team jersey colors for this particular game were red and white, and the visualization above shows that my algorithm recovers exactly those shades. One more thing that would be interesting to incorporate into this routine is the ability to assess whether the player in a given bounding box belongs to Team 1 or Team 2. I have an idea for how to do this (see the sketch below), but I'll have to flesh it out later.
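Here is a minimal sketch of one way that could work; it's purely hypothetical and not part of the routine above. It assumes the two teams form the two most separable groups in the 'Jersey Color 1' column of test_df:
#Hypothetical team assignment: cluster the dominant jersey colors with k = 2
#and treat each centroid as a team's color prototype
colors = np.vstack(test_df['Jersey Color 1'].to_numpy()) #(n_boxes, 3) RGB triplets
team_km = KMeans(n_clusters = 2, random_state = 0).fit(colors)
test_df = test_df.assign(Team = team_km.labels_ + 1) #1 or 2 for each bounding box
This would of course mislabel referees and goalkeepers, so it's only a starting point.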
I could now easily wrap this clustering routine in another function that loops through every mp4 file in a directory and concatenates the per-video dataframes, giving the player jersey colors for an entire game!
def getJerseyColorsFromGame(jsonPath,MP4Path,whichVideo,nClusters):
#Gather every MP4 for the game (whichVideo is unused here; the loop visits all videos)
mp4_list = getListOfFiles(MP4Path , ".mp4")
n_mp4 = len(mp4_list)
df_list = []
#Run the per-video routine on each file and collect the resulting dataframes
for vid in range(n_mp4):
jerseyColor_df = getJerseyColorsFromMP4(jsonPath,MP4Path,vid,nClusters)
df_list.append(jerseyColor_df)
#Stack the per-video results into one dataframe for the whole game
allJerseyColors = pd.concat(df_list, ignore_index=True)
return allJerseyColors
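A minimal usage sketch, assuming the same paths as before (the CSV filename is just an example):
#Process every video in the folder and persist the combined results
allJerseyColors = getJerseyColorsFromGame(jsonPath, MP4Path, whichVideo, nClusters)
allJerseyColors.to_csv('game_1779_jersey_colors.csv', index=False)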
Conclusions
The work shown above demonstrates how the K-Means clustering algorithm can be used to extract jersey colors from soccer game video footage. The results of the clustering process agree with the expected output: the dominant colors detected are shades of red and white, matching the actual team jersey colors for this game.