Determining Soccer Player Jersey Color Using K-Means Clustering
jekyll
jupyter_notebook
clustering
image_data
video_data
featured
portfolio
Determining Soccer Player Jersey Colors from Video Footage Via K-Means Clustering
In this project, I took video footage provided by Trace from different soccer games and determined the player jersey colors using the K-Means clustering algorithm. This notebook details my process towards accomplishing that goal. The routine developed here takes video footage as input and generates a pandas dataframe containing the results of the clustering process as output. I had limited experience with image/video processing in Python prior to this project, so this was an incredible learning experience for me!
The rough outline of my process involved the following steps:
- Learn the basics of image/video processing/manipulation
- Learn about classification algorithms to identify objects in images (i.e., generate bounding boxes)
- Learn about different color spaces
- Learn how to extract color in different ways
- Create an algorithm that identifies the dominant color in the bounding boxes in each frame of the video
- Refine the algorithm to handle special cases
This project required me to do data cleanup, clustering, image/video processing, elementary classification of objects in images, reading JSON files, and a variety of dataframe/numpy array/list operations. With that said, let's get started!
Table of Contents
- Chapter 1: Image/Video Processing Basics
- Chapter 2: Extracting Color From Images
- Chapter 3: Identifying humans in Images
- Chapter 4: Working with the Video Files
- Chapter 5: Loading the JSON files and Checking Bounding Boxes
- Chapter 6: Applying KMeans Clustering to Bounding Boxes from JSON+MP4 Data
- Conclusions
Modules Used in this work
Here are the most important modules used for this project:
- opencv (cv2) : Used to carry out operations on images
- PIL : Used to carry out operations on images
- numpy : Used to perform computations on array data
- pandas : Used to load, process, analyze, operate on and export dataframes
- sklearn : Used to carry out the K-Means clustering routine
- matplotlib.pyplot : Used for plotting/visualizing our results
- json : Used to handle the json files that contain the player bounding box information
The API documentation for each of these modules can be found on their respective project sites.
#Importing modules
import cv2
import numpy as np
from PIL import Image, ImageChops
import scipy
import scipy.misc
import scipy.cluster
import sys, glob, time, struct, urllib, os, os.path
import urllib.request
import imutils
import json
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import pandas as pd
from tqdm import *
np.set_printoptions(threshold=sys.maxsize)
Image/Video Processing Basics
This section just covers things I had to learn about image/video processing and manipulation before I could carry out the project. Feel free to skip this section :]
To try my hand at the various processing techniques available, I decided to use one of my favorite moments in soccer history as a reference image: the brilliant goal that Ronaldinho scored against Real Madrid while playing for FC Barcelona on November 19, 2005, shown below.
This image not only brings back great memories but it also has several elements that will be useful for me to consider moving forward. For instance, it has some pretty clear player objects that I could try to generate bounding boxes for, it has plenty of 'field' as part of the image that I could use to learn how to mask certain colors, and it has players from both teams which I could use to start figuring out how to classify stuff.
Loading Images
The first thing I needed to learn was how to load images into my notebook. If you have the image saved on your computer, you can simply use the cv2.imread function. The image I'm using for this portion of my work, however, comes from a URL. Loading it then requires us to:
- Pass our URL into urllib.request.urlopen
- Create a numpy array from the image in the URL
- Use cv2.imdecode to read the image data from the memory cache and convert it into an image format
- Since cv2.imdecode loads images in BGR format by default, use cv2.cvtColor(img, cv2.COLOR_BGR2RGB) to process and render the image in its original RGB
The results of this process are shown below:
#Render image from URL
req = urllib.request.urlopen('https://www.sportbible.com/cdn-cgi/image/width=648,quality=70,format=webp,fit=pad,dpr=1/https%3A%2F%2Fs3-images.sportbible.com%2Fs3%2Fcontent%2Fcf2701795dd2a49b4d404d9fa38f99fd.jpg')
arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
bgr_img = cv2.imdecode(arr, -1) # 'Load it as it is'
# Determine the figures size in inches to fit image
dpi = plt.rcParams['figure.dpi']
height, width, depth = bgr_img.shape
figsize = width / float(dpi), height / float(dpi)
plt.figure(figsize=figsize)
plt.imshow(bgr_img)
plt.show()
#Convert image to RGB from BGR
rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)
plt.figure(figsize=figsize)
plt.imshow(rgb_img)
plt.show()
Rotating an image
There are a few different methods to rotate an image. The imutils package has the easiest implementation via the imutils.rotate_bound
function, since all it requires is the image to be rotated and the angle by which we want to rotate it. In addition, this function ensures that the displayed rotated image is not cropped and is fully contained within the bounds. The other methods require the construction of a rotation matrix first, followed by its application; a sketch of that route follows the code below.
#Rotating an image
rotated0 = imutils.rotate_bound(rgb_img,0)
rotated45 = imutils.rotate_bound(rgb_img,45)
rotated90 = imutils.rotate_bound(rgb_img,90)
fig,axs = plt.subplots(1,3, figsize=(30,15))
axs[0].imshow(rotated0)
axs[1].imshow(rotated45)
axs[2].imshow(rotated90)
plt.show()
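For reference, here is a minimal sketch of the rotation-matrix route mentioned above, using cv2.getRotationMatrix2D and cv2.warpAffine (the variable names here are my own). Note that, unlike rotate_bound, this approach crops anything that rotates outside the original canvas:
#Rotating via an explicit rotation matrix (minimal sketch)
(h, w) = rgb_img.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, 45, 1.0) #rotate 45 degrees about the center, no scaling
rotated_manual = cv2.warpAffine(rgb_img, M, (w, h)) #output keeps the original canvas size, so corners get clipped
plt.imshow(rotated_manual)
plt.show()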
Cropping Images
When an image is loaded through OpenCV, it is stored as a numpy array, so cropping the image is just numpy slicing. There are multiple ways to crop; I'll show a simple example here that crops an image by different percentages of its height and width. There are fancier ways to crop by defining regions of interest (ROIs) and contouring, which I'll show in later sections.
#Need to find the starting/ending column and row index first for the desired cropping
cropIni = [0.15,0.3,0.45]
#Crop width and height of image by 15% each
startRow1 = int(height*cropIni[0]) ;startCol1 = int(width*cropIni[0])
endRow1 = int(height*(1-cropIni[0])) ;endCol1 = int(width*(1-cropIni[0]))
#Crop width and height of image by 30% each
startRow2= int(height*cropIni[1]) ;startCol2 = int(width*cropIni[1])
endRow2 = int(height*(1-cropIni[1])) ;endCol2 = int(width*(1-cropIni[1]))
#Crop width and height of image by 45% each
startRow3 = int(height*cropIni[2]) ;startCol3 = int(width*cropIni[2])
endRow3 = int(height*(1-cropIni[2])) ;endCol3 = int(width*(1-cropIni[2]))
#This is just slicing the array
fig,axs = plt.subplots(1,3, figsize=(30,15))
crop1 = rgb_img[startRow1:endRow1, startCol1:endCol1]
crop2 = rgb_img[startRow2:endRow2, startCol2:endCol2]
crop3 = rgb_img[startRow3:endRow3, startCol3:endCol3]
axs[0].imshow(crop1)
axs[1].imshow(crop2)
axs[2].imshow(crop3)
plt.show()
Resizing Images
There are many ways to resize images. Here I'll show how an image can be resized using the resize function in OpenCV. Even though the images look identical, it can be seen that the size (height and width) of the image changes when we resize it.
#Resizing an image
#cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
xscale = [0.75,0.5,0.25]
yscale = [0.75,0.5,0.25]
rimg1 = cv2.resize(rgb_img, (0,0), fx=xscale[0], fy=yscale[0])
rimg2 = cv2.resize(rgb_img, (0,0), fx=xscale[1], fy=yscale[1])
rimg3 = cv2.resize(rgb_img, (0,0), fx=xscale[2], fy=yscale[2])
fig,axs = plt.subplots(1,3, figsize=(30,15))
axs[0].imshow(rimg1)
axs[1].imshow(rimg2)
axs[2].imshow(rimg3)
plt.show()
print("The width, height and depth of this image are ",rimg1.shape)
print("The width, height and depth of this image are ",rimg2.shape)
print("The width, height and depth of this image are ",rimg3.shape)
The width, height and depth of this image are (304, 486, 3) The width, height and depth of this image are (202, 324, 3) The width, height and depth of this image are (101, 162, 3)
Adjusting brightness/contrast of Images
Adjusting the brightness/contrast of images can be done via the addWeighted function in OpenCV, a process referred to as blending. This function uses the following transformation to make those adjustments to the image:
$result = \alpha \cdot src1 + \beta \cdot src2 + \gamma$
In the equation above, the blended image is produced by applying the $\alpha$ value to the source image, the $\beta$ value to some other image (it can be the same source image), and increasing the result by $\gamma$.
The effects of blending are shown in the plots below. The first row of plots shows the effect of varying $\alpha$ while keeping the other two parameters constant ($\alpha$ decreases from left to right). The second row of plots shows the effect of varying $\beta$ while keeping the other two parameters constant ($\beta$ increases from left to right). The third row of plots shows the effect of varying $\gamma$ while keeping the other two parameters constant ($\gamma$ increases from left to right).
Decreasing $\alpha$ causes the image to darken.
Increasing $\beta$ causes the image to have more contrast (here both sources are the same image, so pixel values saturate quickly).
Increasing $\gamma$ causes the image to brighten overall.
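As a concrete instance of the transformation: with $\alpha = 0.75$, $\beta = 0$, and $\gamma = 0$, a pixel of value 200 maps to $0.75 \times 200 = 150$, which is why the image darkens.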
#cv2.addWeighted(source_img1, alpha, source_img2, beta, gamma)
alpha = [0.75, 0.5, 0.25]
beta = [0, 1 , 10]
gamma = [0, 10 ,100]
#Vary alpha
alpha_img1 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[0])
alpha_img2 = cv2.addWeighted(rgb_img, alpha[1], rgb_img, beta[0], gamma[0])
alpha_img3 = cv2.addWeighted(rgb_img, alpha[2], rgb_img, beta[0], gamma[0])
#Vary beta
beta_img1 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[0])
beta_img2 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[1], gamma[0])
beta_img3 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[2], gamma[0])
#Vary gamma
gamma_img1 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[0])
gamma_img2 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[1])
gamma_img3 = cv2.addWeighted(rgb_img, alpha[0], rgb_img, beta[0], gamma[2])
#Plot results
fig,axs = plt.subplots(3,3, figsize=(30,30))
#First row varies alpha, second row varies beta, third row varies gamma
axs[0,0].imshow(alpha_img1)
axs[0,1].imshow(alpha_img2)
axs[0,2].imshow(alpha_img3)
axs[1,0].imshow(beta_img1)
axs[1,1].imshow(beta_img2)
axs[1,2].imshow(beta_img3)
axs[2,0].imshow(gamma_img1)
axs[2,1].imshow(gamma_img2)
axs[2,2].imshow(gamma_img3)
plt.show()
Change Color Space of Images
There are a variety of color spaces used in image processing that can facilitate a wide array of tasks, like edge detection and masking, to name a few. Converting between color spaces can be readily done with OpenCV through the cvtColor function.
A few common color spaces are listed below
- RGB -> Many images are initially encoded using this format
- HSV -> Provides greater control on color Hues
- GRAY -> Makes many image processing methods more accurate
The same image is displayed below in (from left to right) RGB, grayscale, BGR, and HSV.
gray_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
bgr_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2BGR)
hsv_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2HSV)
fig,axs = plt.subplots(1,4, figsize=(30,15))
axs[0].imshow(rgb_img)
axs[1].imshow(gray_img)
axs[2].imshow(bgr_img)
axs[3].imshow(hsv_img)
plt.show()
Blurring Images
Blurring is an important operation when trying to detect edges (i.e., the lines that delineate the transition from one group of pixels to another) since it makes the transition between object boundaries smoother. This can be used to separate an object from a background, for instance.
There are four categories of blurring I looked into for this project:
- Average blurring -> Fast but may not preserve object edges
- Gaussian blurring -> Slower than Average blurring but better at edge preservation
- Median filtering -> Robust to outliers
- Bilateral filtering -> Much slower than above methods. More parameters (more tunable).
The effects of using different blurring methods are shown in the plots below. The first row of plots shows the effect of using average blurring while increasing the kernel size from left to right. The second row of plots shows the effect of using gaussian blurring while increasing the kernel size from left to right. The third row of plots shows the effect of using median blurring while increasing the kernel size from left to right. The fourth row of plots shows the effect of using bilateral blurring while increasing the diameter, sigmaColor, and sigmaSpace parameters from left to right.
params = [(3, 20, 5, 5), (9, 20, 40, 20), (15, 20, 160, 60)]
fig,axs = plt.subplots(4, 3, figsize=(30,30))
i = 0
for (k, diameter, sigmaColor, sigmaSpace) in params:
simpleblur_image = cv2.blur(rgb_img, (k,k))
gaussblur_image = cv2.GaussianBlur(rgb_img, (k,k), 0)
medianblur_image = cv2.medianBlur(rgb_img, k)
bilateralblur_image = cv2.bilateralFilter(rgb_img, diameter, sigmaColor, sigmaSpace)
axs[0,i].imshow(simpleblur_image)
axs[1,i].imshow(gaussblur_image)
axs[2,i].imshow(medianblur_image)
axs[3,i].imshow(bilateralblur_image)
i+=1
#Plot results
plt.show()
Detecting Edges in Images
Edge detection is an image-processing technique that allows for the identification of the boundaries (i.e., edges) of objects within an image. Edges allow us to identify the underlying structure of an image and hence make them one of the most important bits of information that we need from images.
The Canny algorithm was used below to detect the edges on the image.
#cv2.Canny(image, minVal, maxVal)
img_gray = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
thresholds = [(5,150), (100,150), (200,225)]
fig,axs = plt.subplots(1,4, figsize=(30,15))
i = 0
axs[i].imshow(rgb_img)
for (minVal, maxVal) in thresholds:
edge_img = cv2.Canny(img_gray, minVal, maxVal, apertureSize = 3, L2gradient = False)
axs[i+1].imshow(edge_img)
i += 1
plt.show()
Masking Colors in Images
Oftentimes, one may want to show only specific colors in an image. This can be accomplished by masking. The inRange function in OpenCV allows this to be readily done when working in HSV space.
The images shown below are (from left to right) the result of applying no mask, of masking the green hues, the red hues and blue hues respectively.
#Remove green background/field from image prior to clustering
green = np.array([60,255,255]) #This is green in HSV
loGreen = np.array([30,25,25]) #Lower green threshold
hiGreen = np.array([90,255,255]) #Upper green threshold
loBlue = np.array([0,25,25]) #Lower blue threshold
hiBlue = np.array([30,255,255]) #Upper blue threshold
loRed = np.array([120,25,25]) #Lower red threshold
hiRed = np.array([180,255,255]) #Upper red threshold
#Convert image to HSV. Note: rgb_img is RGB but COLOR_BGR2HSV treats it as BGR,
#swapping the red and blue channels; the red/blue hue thresholds above account for that swap
hsv = cv2.cvtColor(rgb_img, cv2.COLOR_BGR2HSV)
gmask = cv2.inRange(hsv, loGreen, hiGreen)
rmask = cv2.inRange(hsv, loRed , hiRed)
bmask = cv2.inRange(hsv, loBlue , hiBlue)
gresult = rgb_img.copy()
bresult = rgb_img.copy()
rresult = rgb_img.copy()
gresult[gmask==255] = (255,255,255)
bresult[bmask==255] = (255,255,255)
rresult[rmask==255] = (255,255,255)
fig,axs = plt.subplots(1,4, figsize=(30,15))
axs[0].imshow(rgb_img)
axs[1].imshow(gresult)
axs[2].imshow(rresult)
axs[3].imshow(bresult)
plt.show()
Drawing Contours in Images
Contours are curves joining the continuous points along an object's boundary that share the same color or intensity, and they are useful for locating and outlining objects.
Below, I convert the image to grayscale, blur it, and apply a threshold; cv2.findContours then extracts the contours, which cv2.drawContours draws onto a copy of the original image.
#Getting image contours
img_gray = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
gaussblur_image = cv2.GaussianBlur(img_gray, (15,15), 0)
retval, thresh = cv2.threshold(gaussblur_image, 125, 130, cv2.THRESH_TOZERO)
img_contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
thresh_img = rgb_img.copy()
cv2.drawContours(thresh_img, img_contours, -1, (127, 50, 250),2)
fig,axs = plt.subplots(1,1, figsize=(30,15))
axs.imshow(thresh_img)
plt.show()
Selecting Regions of Interest in Images
Selecting an ROI is another form of cropping. The method shown here is a good way to quickly crop your images if you don't have to process too many of them.
#Select ROI from image
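#Note: cv2.selectROI opens an interactive window, so this cell needs a GUI session (it won't run headless)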
imagedraw = cv2.selectROI('select',rgb_img)
cv2.waitKey(0)
cv2.destroyWindow('select')
#cropping the area of the image within the bounding box using imCrop() function
roi_image = rgb_img[int(imagedraw[1]):int(imagedraw[1]+imagedraw[3]),
int(imagedraw[0]):int(imagedraw[0]+imagedraw[2])]
fig,axs = plt.subplots(1,1, figsize=(5,5))
axs.imshow(roi_image)
plt.show()
Extracting color from images
At this point, I felt pretty comfortable manipulating images and doing the basic processing operations that I was confident would be sufficient to achieve my goal of determining the player color from images. In order to determine color I tried the following things:
- Extract color at a single pixel
- Extract color via pixel by pixel averaging
- Use K-Means clustering to get k-colors in image
Extract color at a single pixel
Extracting the color at a single pixel can be easily done by providing the pixel's coordinates to the image array (keeping in mind that numpy indexes images as [row, column], i.e. [y, x]). I wrote a little loop that does this for a variety of pixels in the image below. The result is shown for 17 different pixels.
#Get color from single pixel in image
#Make list of pixel coordinates based on image shape
y = range(0, height, 25)
x = range(0, width, 25)
#Combine lists above into a list of tuples
merged_list = tuple(zip(x, y))
#Initialize the plot
fig,axs = plt.subplots(1, len(y), figsize=(30,30))
i = 0
#Iterate over elements in tuple list of pixel coordinates
for (x, y) in merged_list:
    #Return the rgb tuple at the (x, y) coordinate; numpy indexes images as [row, col] = [y, x]
    r, g, b = (rgb_img[y, x])
# Creating rgb array from rgb tuple
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [r, g, b]
#Display rgb array
axs[i].imshow(color_of_pix)
i += 1
plt.show()
Extract dominant color via pixel by pixel averaging
Now that we can extract the color at a single pixel, we can extend the method to determine the average color of the image. Passing an x,y coordinate to our image array returns an RGB tuple for a pixel. By adding up the value of each element in the tuple at every pixel, we get the "total counts" associated with each of the RGB channels. Finally, dividing the counts in each of the RGB color channels by the total number of pixels in the image gives the average color of the image. The result of this process is shown below: the average color turns out to be a light brown, which does appear reasonable from visual inspection of the image. However, can we improve on this?
#Determining the average color pixel by pixel
def most_common_used_color(img):
# Get width and height of Image
height, width, depth = img.shape
# Initialize Variable
r_total = 0
g_total = 0
b_total = 0
count = 0
# Iterate through each pixel
for x in range(0, height):
for y in range(0, width):
# r,g,b value of pixel
r, g, b = (img[x, y])
r_total += r
g_total += g
b_total += b
count += 1
return (r_total/count, g_total/count, b_total/count)
#Function to convert RGB channels to hex code
def rgb2hex(rgb_tuple):
r = round(rgb_tuple[0])
g = round(rgb_tuple[1])
b = round(rgb_tuple[2])
return "#{:02x}{:02x}{:02x}".format(r,g,b)
# call function
common_color = most_common_used_color(rgb_img)
print(common_color)
print(rgb2hex(common_color))
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [common_color[0], common_color[1], common_color[2]]
fig,axs = plt.subplots(1, 2, figsize=(10,10))
axs[0].imshow(rgb_img)
axs[1].imshow(color_of_pix)
plt.show()
(119.83675506782502, 116.59529797286999, 97.04080170705686) #787561
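As a quick aside, the same average can be computed with a single vectorized numpy call, which is much faster than looping pixel by pixel (a minimal sketch; the variable name avg_color is my own):
#Vectorized average color; equivalent to the loop above up to floating-point rounding
avg_color = rgb_img.reshape(-1, 3).mean(axis=0)
print(avg_color, rgb2hex(avg_color))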
Extract dominant colors via K-Means Clustering
The player jersey color detection routine can be further improved by making use of the K-Means clustering algorithm. This routine allows us to extract however many "dominant colors" from an image by specifying the number of clusters, k, that the routine should use. The value of k can be chosen a priori if one knows how many clusters the data should fall into. Otherwise, a commonly used approach is the elbow method shown below: plot the distortion (inertia) of the fitted model against the value of k, and use the inflection point (aka the elbow) of the curve as the k value. The elbow plot for the image we've been working with indicates that the optimal k is between 3 and 4. Given that, I'll try both :]
#Determine optimal k value for clustering using elbow method
distortions = [] #Initialize array with distortions from each clustering run
K = range(1,11) #Explore k values between 1 and 10
#Convert image into a 1D array of pixels (once, outside the loop)
flat_img = np.reshape(rgb_img,(-1,3))
#Run the clustering routine
for k in K:
    kmeanModel = KMeans(n_clusters=k)
    kmeanModel.fit(flat_img)
    distortions.append(kmeanModel.inertia_)
plt.figure(figsize=(16,8))
plt.style.use('Solarize_Light2')
plt.plot(K, distortions, 'bx-')
plt.yticks(fontsize=20)
plt.xticks(fontsize=20)
plt.xlabel('k', fontsize=20)
plt.ylabel('Distortion', fontsize=20)
plt.title('Elbow Method showing optimal k')
plt.grid(True)
plt.show()
Running K-Means Clustering on Image
Having established that k should be either 3 or 4, I can write a little routine that will take an image and determine the k-dominant colors in it. The results for k = 3, k = 4, and k = 10 (which I did just for funsies) cases are shown below.
def KMeansTest(img,clusters):
"""
Args:
        img : (ndarray) image to cluster (e.g., a cropped player bounding box)
clusters : (int) how many clusters to use for KMEANS
Returns:
rgb_array : (tuple) Dominant colors in image in RGB format
"""
org_img = img.copy()
#print('Org image shape --> ',img.shape)
#Convert image into a 1D array
flat_img = np.reshape(img,(-1,3))
arrayLen = flat_img.shape
#Do the clustering
kmeans = KMeans(n_clusters = clusters, random_state=0, tol = 1e-4)
kmeans.fit(flat_img)
#Define the array with centroids
dominant_colors = np.array(kmeans.cluster_centers_,dtype='uint')
#Calculate percentages
percentages = (np.unique(kmeans.labels_,return_counts=True)[1])/flat_img.shape[0]
#Combine centroids representing dominant colors and percentages associated with each centroid into an array
    pc = list(zip(percentages,dominant_colors))
    #Sort on the percentage only; letting sorted() compare the centroid arrays on ties would raise an error
    pc = sorted(pc, key=lambda t: t[0], reverse=True)
    rgb_array = []
    for i in range(clusters):
        rgb_array.append(pc[i][1])
    return rgb_array
def plotKMeansResult(nClusters,rgb_array):
"""
Args:
rgb_array : (tuple) Dominant colors in image in RGB format
nClusters : (int) how many clusters were used for KMEANS
"""
fig,axs = plt.subplots(1, nClusters, figsize=(20,20))
    for i in range(nClusters):
        color_of_pix = np.zeros((5, 5, 3), np.uint8)
        color_of_pix[:] = [rgb_array[i][0], rgb_array[i][1], rgb_array[i][2]]
        axs[i].grid(False)
        axs[i].imshow(color_of_pix)
plt.show()
#Call K-Means function with K = 3
nClusters = 3
rgb_array = KMeansTest(rgb_img, nClusters)
plotKMeansResult(nClusters,rgb_array)
#Call K-Means function with K = 4
nClusters = 4
rgb_array = KMeansTest(rgb_img, nClusters)
plotKMeansResult(nClusters,rgb_array)
#Call K-Means function with K = 10
nClusters = 10
rgb_array = KMeansTest(rgb_img, nClusters)
plotKMeansResult(nClusters,rgb_array)
Identifying humans in Images
Before jumping into clustering the video footage, there was one more thing I wanted to look into: how to classify/identify players/humans in pictures.
After a bit of reading, I came across the HOG person detector and the Haar cascade classifiers in OpenCV, which provide pretrained models capable of detecting different objects like cats, faces, and humans.
#Detecting humans with HOG
path2xml = r'C:\Users\vmurc\Documents\GitHub\opencv\data\haarcascades\haarcascade_fullbody.xml'
fbCascade = cv2.CascadeClassifier(path2xml)
# Initializing the HOG person detector
image = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2GRAY)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# Resizing the Image
image = imutils.resize(image, width = min(1000, image.shape[1]))
# Detecting all the regions in the image that has a person inside it
#(regions, _) = hog.detectMultiScale(image, winStride = (2,2), padding = (4, 4), scale = 1.1)
players = fbCascade.detectMultiScale(image, scaleFactor = 1.005, minSize=(20, 20), minNeighbors = 1)
image2 = rgb_img.copy()
# Drawing the regions in the Image
i=0
for (x, y, w, h) in players:
cv2.rectangle(image2, (x, y), (x + w, y + h), (0, 255, 0), 3)
currentbox = image2[y:y+h,x:x+w]
i+=1
fig,axs = plt.subplots(1, 2, figsize=(20,20))
axs[0].imshow(rgb_img)
axs[0].grid(False)
axs[1].imshow(image2)
axs[1].grid(False)
plt.show()
I spent a bit of time playing around with the parameters of the detector and wasn't able to get much better results. I tried both the default HOG people detector and the Haar cascade classifier haarcascade_fullbody, and neither gave the results I wanted. I think all the audience in the background is potentially throwing off the detector.
Even though detecting the players themselves is not part of the project (I was given json files containing player bounding box coordinates), I still wanted to ensure that I had a successful attempt at using the detector. I tried a different image, shown below, that I thought would give me a successful detection. After playing with the parameters for a few minutes, I found a combination that worked! I then cropped the image to just the player bounding box (BB) and applied the K-Means routine to the contents of that BB; the results are shown below.
I'll need to look more into ways of refining/automating the parameters of the detector function, but I'm content with my progress on this so far.
#Render new image from URL
req = urllib.request.urlopen('https://i.pinimg.com/736x/73/f5/d6/73f5d6a847c9308f35864ffe2fa729c4.jpg')
arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
bgr_img2 = cv2.imdecode(arr, -1) # 'Load it as it is'
rgb_img2 = cv2.cvtColor(bgr_img2, cv2.COLOR_BGR2RGB)
#Detecting humans with HOG
path2xml = r'C:\Users\vmurc\Documents\GitHub\opencv\data\haarcascades\haarcascade_fullbody.xml'
fbCascade = cv2.CascadeClassifier(path2xml)
# Initializing the HOG person detector
image = cv2.cvtColor(rgb_img2, cv2.COLOR_RGB2GRAY)
# Resizing the Image
image = imutils.resize(image, width = min(50000, image.shape[1]))
# Detecting all the regions in the image that has a person inside it
players = fbCascade.detectMultiScale(image, scaleFactor = 1.01, minSize=(300, 300), minNeighbors = 1)
image2 = rgb_img2.copy()
# Drawing the regions in the Image
i = 0
for (x, y, w, h) in players:
cv2.rectangle(image2, (x, y), (x + w, y + h), (0, 255, 0), 3)
currentbox = image2[y:y+h,x:x+w]
i += 1
i = 0
img_list = [rgb_img2, image2, currentbox]
fig,axs = plt.subplots(1, 3, figsize=(20,20))
for img in img_list:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(img)
i+=1
plt.show()
#Apply K-Means function on the bounding box with K = 4
nClusters = 4
rgb_array = KMeansTest(currentbox, nClusters)
plotKMeansResult(nClusters,rgb_array)
Working with Videos and Extracting frames
Having familiarized myself with a variety of image manipulation/processing techniques and gained a good understanding of how K-Means can be used to extract the dominant colors in an image, I decided to start processing the video footage, confident that I had the basis for a functional routine to determine the jersey color from images. The first thing that needs to be done is to get the video files, which I can do with the following routine:
#Establish paths to MP4 data
rMP4Path = r'\game_1779\LCamera' #Relative path containing the MP4 data from L Camera
lMP4Path = r'\game_1779\RCamera' #Relative path containing the MP4 data from R Camera
def getListOfFiles(rPath , fType):
"""
Args:
rPath: (str) path to file
fType: (str) type of file to look for (i.e., .mp4, .json, etc.)
Returns:
lFiles: (list) List of files in rPath of type fType
"""
#1. Establish the current working directory
directory = os.getcwd()
#2. List all files in rPath of type fType
    lFiles = glob.glob(directory + rPath + r"\*" + fType)
return lFiles
rc_mp4s = getListOfFiles(rMP4Path , ".mp4")
n_mp4_RC = len(rc_mp4s)
print("There are " + str(n_mp4_RC) + " MP4 files for R Camera")
print(rc_mp4s)
There are 19 MP4 files for R Camera ['C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_001.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_002.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_003.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_004.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_005.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_006.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_007.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_008.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_009.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_010.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_011.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_012.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_013.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_014.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_015.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_016.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_017.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_018.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\LCamera\\video_CAMB_CAMCAMBA_20180727_133419_019.MP4']
lc_mp4s = getListOfFiles(lMP4Path , ".mp4")
n_mp4_LC = len(lc_mp4s)
print("There are " + str(n_mp4_RC) + " MP4 files for L Camera")
print(lc_mp4s)
There are 19 MP4 files for L Camera ['C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_001.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_002.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_003.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_004.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_005.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_006.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_007.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_008.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_009.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_010.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_011.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_012.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_013.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_014.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_015.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_016.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_017.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_018.MP4', 'C:\\Users\\vmurc\\Documents\\Python Scripts\\Jupyter Notebooks\\game_1779\\RCamera\\video_CAMB_CAMCAMBB_20180727_133418_019.MP4']
def get_frame(video_file, frame_index):
"""
Args:
video_file: (str) path to .MP4 video file
frame_index: (int) query frame index
Returns:
frame: (ndarray, size (y, x, 3)) video frame
Uses OpenCV BGR channels
"""
video_capture = cv2.VideoCapture(video_file)
video_capture.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
success, frame = video_capture.read()
if not success:
raise ValueError(
"Couldn't retrieve frame {0} from video {1}".format(
frame_index,
video_file
)
)
return frame
#Get frame 2500 for the first mp4 video in each directory
whichFrame = 2500
lc_frame = get_frame(rc_mp4s[0], whichFrame)
rc_frame = get_frame(lc_mp4s[0], whichFrame)
#Convert color from BGR to RGB
lc_frame = cv2.cvtColor(lc_frame, cv2.COLOR_BGR2RGB)
rc_frame = cv2.cvtColor(rc_frame, cv2.COLOR_BGR2RGB)
lc_rc = [lc_frame,rc_frame]
fig,axs = plt.subplots(1, 2, figsize=(20,20))
i = 0
for frame in lc_rc:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(frame)
i += 1
plt.show()
One more thing that I want to extract from the video is the number of frames in it. This can be done using the following routine:
#Determine number of frames in video
def count_frames(video_file):
"""
Args:
video_file: (str) path to .MP4 video file
Returns:
nFrames: (int) Number of frames in mp4
"""
cap = cv2.VideoCapture(video_file)
length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
return(length)
nfRC = count_frames(rc_mp4s[0])
nfLC = count_frames(lc_mp4s[0])
print(nfRC, nfLC)
7230 7230
Let's also find out if all the videos have the same number of frames
for i in range(len(rc_mp4s)):
nfRC = count_frames(rc_mp4s[i])
nfLC = count_frames(lc_mp4s[i])
print(i+1, nfRC, nfLC)
1 7230 7230 2 7200 7200 3 7200 7200 4 7200 7200 5 7200 7200 6 7200 7200 7 7200 7200 8 7200 7200 9 7200 7200 10 7200 7200 11 7200 7200 12 7200 7200 13 7200 7200 14 7200 7200 15 7200 7200 16 7200 7200 17 7200 7200 18 7200 7200 19 6314 6228
The different mp4 files have different numbers of frames; that's good to know as we build the processing routine.
Loading the JSON files and Checking Bounding Boxes
The JSON files contain the player bounding box coordinates. The first thing that needs to be done is to load the json files, which can be done using the routine I made earlier.
#Generate pandas dataframe containing player jersey colors from mp4 and json data provided by Trace
directory = os.getcwd()
jsonPath = r'\game_1779\object_detector'
jsonList = getListOfFiles(jsonPath , ".json")
n_json = len(jsonList)
print("There are " + str(n_json) + " json files.")
print([s.replace(directory + jsonPath + '\\', '') for s in jsonList]) #Remove directory when printing
There are 38 json files. ['video_CAMB_CAMCAMBA_20180727_133419_001.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_002.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_003.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_004.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_005.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_006.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_007.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_008.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_009.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_010.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_011.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_012.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_013.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_014.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_015.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_016.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_017.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_018.MP4.json', 'video_CAMB_CAMCAMBA_20180727_133419_019.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_001.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_002.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_003.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_004.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_005.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_006.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_007.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_008.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_009.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_010.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_011.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_012.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_013.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_014.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_015.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_016.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_017.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_018.MP4.json', 'video_CAMB_CAMCAMBB_20180727_133418_019.MP4.json']
Each json file is associated with an MP4 file, so when batch processing all the files I'll have to ensure that the right json file is paired with the right MP4 file. To do this, I'll first strip the paths and the .json extension from the list of filenames generated earlier and place the result into a list called json_strip
json_strip = [s.replace(directory + jsonPath + '\\', '') for s in jsonList]
json_strip = [s.replace(".json", '') for s in json_strip]
print(len(json_strip))
print(json_strip)
38 ['video_CAMB_CAMCAMBA_20180727_133419_001.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_002.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_003.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_004.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_005.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_006.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_007.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_008.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_009.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_010.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_011.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_012.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_013.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_014.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_015.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_016.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_017.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_018.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_019.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_001.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_002.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_003.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_004.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_005.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_006.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_007.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_008.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_009.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_010.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_011.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_012.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_013.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_014.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_015.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_016.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_017.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_018.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_019.MP4']
Then, I'll strip the paths from the list of MP4 files for each camera (LCAMERA and RCAMERA) I had generated earlier and place that result into two lists called mp4_strip_LC
and mp4_strip_RC
respectively.
mp4_strip_LC = [s.replace(directory + lMP4Path + '\\', '') for s in lc_mp4s]
print(mp4_strip_LC)
['video_CAMB_CAMCAMBB_20180727_133418_001.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_002.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_003.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_004.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_005.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_006.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_007.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_008.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_009.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_010.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_011.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_012.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_013.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_014.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_015.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_016.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_017.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_018.MP4', 'video_CAMB_CAMCAMBB_20180727_133418_019.MP4']
mp4_strip_RC = [s.replace(directory + rMP4Path + '\\', '') for s in rc_mp4s]
print(mp4_strip_RC)
['video_CAMB_CAMCAMBA_20180727_133419_001.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_002.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_003.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_004.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_005.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_006.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_007.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_008.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_009.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_010.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_011.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_012.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_013.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_014.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_015.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_016.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_017.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_018.MP4', 'video_CAMB_CAMCAMBA_20180727_133419_019.MP4']
As a sanity check, I'll use a list comprehension to ensure that the stripping process does indeed result in a file match.
#Check that there is an mp4 from LCamera associated with json file
mp4Name = mp4_strip_LC[0]
print(mp4Name)
matches = [match for match in json_strip if mp4Name in match]
print(matches)
video_CAMB_CAMCAMBB_20180727_133418_001.MP4 ['video_CAMB_CAMCAMBB_20180727_133418_001.MP4']
#Check that there is an mp4 from RCamera associated with json file
mp4Name = mp4_strip_RC[0]
print(mp4Name)
matches = [match for match in json_strip if mp4Name in match]
print(matches)
video_CAMB_CAMCAMBA_20180727_133419_001.MP4 ['video_CAMB_CAMCAMBA_20180727_133419_001.MP4']
Yay! It works! Finally, I'll use those lists to get the index associated with those files from each camera.
#Get the index of the json file containing the data for the [0] mp4 file from LCamera
mp4Name = mp4_strip_LC[0]
json_strip.index(mp4Name)
19
#Get the index of the json file containing the data for the [0] mp4 file from RCamera
mp4Name = mp4_strip_RC[0]
json_strip.index(mp4Name)
0
Awesome! Now, I'll put these code snippets into a routine that'll allow me to readily match json files to MP4 files with a single function! This will be important for batch processing later.
def matchJSON2MP4(jsonList, jsonPath, MP4list, MP4Path, whichMP4):
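    """
    Args:
        jsonList: (list) full paths to the .json files
        jsonPath: (str) relative path to the directory containing the json files
        MP4list : (list) full paths to the .mp4 files for one camera
        MP4Path : (str) relative path to that camera's mp4 directory
        whichMP4: (int) index of the mp4 file to match
    Returns:
        index: (int) index in jsonList of the json file matching the chosen mp4
    Note: relies on the global directory variable set earlier via os.getcwd()
    """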
json_strip = [s.replace(directory + jsonPath + '\\', '') for s in jsonList]
json_strip = [s.replace(".json", '') for s in json_strip]
mp4_strip = [s.replace(directory + MP4Path + '\\', '') for s in MP4list]
mp4Name = mp4_strip[whichMP4]
index = json_strip.index(mp4Name)
print(index)
return index
whichMP4 = 0
jval_RC = matchJSON2MP4(jsonList, jsonPath, rc_mp4s, rMP4Path, whichMP4)
jval_LC = matchJSON2MP4(jsonList, jsonPath, lc_mp4s, lMP4Path, whichMP4)
0 19
With the framework above I can start reading the json files and getting the bounding box info. To do this, I'll generate a dictionary that holds the coordinates of each player bounding box (detection) for each frame in the video being processed.
#Get dictionary from json file
def read_json_dict(path2json):
"""
Args:
path2json: (str) path to .MP4 json file containing player bounding boxes
Returns:
bb_dict: (dict) Dictionary containing bounding boxes in each frame
"""
    #Open the JSON file and parse it into a dictionary
    with open(path2json) as f:
        bb_dict = json.load(f)
    return(bb_dict)
bb_dict_LC = read_json_dict(jsonList[jval_LC])#This is for first video in the LCamera folder
bb_dict_RC = read_json_dict(jsonList[jval_RC])#This is for first video in the RCamera folder
#print(bb_dict)
The above code gives me the bounding boxes for every frame in the current video.
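For reference, here is the structure the accessors below assume the json dictionary has (sketched from how the dictionary is indexed in this notebook, not from any official spec):
#Assumed json structure (sketch):
#{
#  "frames": [
#    {"frame_index": 62,
#     "detections": [[x1, y1, x2, y2, confidence], ...]},
#    ...
#  ]
#}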
Next, I'll determine how many bounding boxes there are in a given frame
#Determine number of bounding boxes in frame
def count_bboxes(bb_dict,frame_index):
"""
Args:
bb_dict: (dict) dictionary from json file
        frame_index: (int) which frame is being processed
Returns:
nDetections: (int) Number of bounding boxes in frame
"""
bbs = bb_dict['frames'][frame_index]['detections']
nDetections = len(bbs)
#print(nDetections, " bounding boxes found in frame ", frame_index)
return(nDetections)
whichFrame = 0
lc_n_bbs = count_bboxes(bb_dict_LC,whichFrame)
rc_n_bbs = count_bboxes(bb_dict_RC,whichFrame)
print(lc_n_bbs, rc_n_bbs)
1 1
Next, I'll determine the first frame that contains detections.
#Find first frame that contains detections
def findFirstFrame(bb_dict):
"""
Args:
bb_dict: (dict) dictionary from json file
Returns:
firstFrame: (int) First frame to process in video
"""
firstFrame = bb_dict['frames'][0]['frame_index']
    print('This is the first frame to process in video ', firstFrame)
return(firstFrame)
firstFrame_LC = findFirstFrame(bb_dict_LC)
firstFrame_RC = findFirstFrame(bb_dict_RC)
print(firstFrame_LC, firstFrame_RC)
This is the first frame to process in video 0 This is the first frame to process in video 62 0 62
Next, note that detections are only recorded at certain frame_index values, and these values may differ between videos. Let's figure out the detection collection interval for a video based on its json file.
#Find the spacing between frames that contain detections
def findFrameSpacing(bb_dict):
"""
Args:
bb_dict: (dict) dictionary from json file
Returns:
spacing: (int) Spacing between frames in json
"""
frame0 = bb_dict['frames'][0]['frame_index']
frame1 = bb_dict['frames'][1]['frame_index']
spacing = abs(frame1 - frame0)
print('The frame spacing is ', spacing)
return(spacing)
frameSpacing_LC = findFrameSpacing(bb_dict_LC)
frameSpacing_RC = findFrameSpacing(bb_dict_RC)
print(frameSpacing_LC, frameSpacing_RC)
The frame spacing is 6 The frame spacing is 6 6 6
Next, let's extract all the bounding box coordinates for the current frame from the json file.
#Extract bounding boxes for a given frame from json
def get_bb4frame(bb_dict,frame_index):
"""
Args:
bb_dict: (dict) dictionary from json file
        frame_index: (int) which frame is being processed
    Returns:
        bbs: (list) bounding box coordinates [x1, y1, x2, y2, confidence] for each detection in the frame
"""
bbs = bb_dict['frames'][frame_index]['detections']
#print('These are the coordinates for all bounding boxes in frame', frame_index)
#print(bbs)
return(bbs)
whichFrame = 150
bbs_Frame_LC = get_bb4frame(bb_dict_LC,whichFrame) #BB coordinates for current frame
bbs_Frame_RC = get_bb4frame(bb_dict_RC,whichFrame) #BB coordinates for current frame
print(bbs_Frame_LC)
print(bbs_Frame_RC)
[[1384, 413, 1420, 479, 0.73], [623, 293, 674, 391, 1.0], [940, 304, 976, 379, 0.94], [914, 276, 935, 333, 0.86], [622, 211, 656, 266, 0.94], [482, 202, 522, 265, 0.89], [534, 206, 555, 263, 0.62], [376, 189, 407, 252, 0.82], [558, 196, 582, 252, 0.93], [735, 202, 751, 246, 0.62], [112, 164, 144, 225, 0.92], [439, 173, 462, 223, 0.98], [363, 162, 384, 210, 0.96], [315, 153, 333, 196, 0.97]] [[614, 675, 635, 724, 0.9], [1521, 400, 1555, 475, 0.97], [1756, 364, 1800, 460, 1.0], [1786, 333, 1818, 406, 0.95], [2630, 284, 2677, 401, 0.99], [1740, 301, 1760, 346, 0.89], [2093, 262, 2111, 320, 0.97], [1872, 271, 1890, 315, 0.94], [2222, 254, 2261, 312, 0.74], [2347, 226, 2379, 300, 0.99], [2459, 213, 2489, 289, 0.89], [2493, 212, 2525, 279, 0.9], [2521, 187, 2553, 261, 0.74], [2592, 190, 2624, 261, 0.98], [2326, 208, 2344, 256, 0.8], [2271, 210, 2291, 253, 0.96], [2402, 199, 2427, 251, 0.94]]
Finally, let's extract the bounding box coordinates for a specific bounding box from the json file
#Extract bounding box coordinates for a specific bounding box in current frame from json
def makeRectangleFromJSON(bb_dict,whichBB):
"""
Args:
bb_dict: (dict) dictionary from json file
whichBB: (int) what bounding box is being processed
Returns:
x1 ,y1 ,x2 ,y2: (tuple) tuple containing pixel coordinates for the upper-left and lower-right corners of the bounding box
"""
x1 ,y1 ,x2 ,y2 = bb_dict[whichBB][0],bb_dict[whichBB][1],bb_dict[whichBB][2],bb_dict[whichBB][3]
#print(x1 ,y1 ,x2 ,y2, ' These are the coordinates for bounding box ', whichBB)
return(x1 ,y1 ,x2 ,y2)
whichBB = 0
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,whichBB) #BB coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,whichBB) #BB coordinates for current BB
print(x1L ,y1L ,x2L ,y2L)
print(x1R ,y1R ,x2R ,y2R)
1384 413 1420 479 614 675 635 724
Let's see if our method is working by visualizing the bounding boxes!
#Make the list of json files
jsonPath = r'\game_1779\object_detector'
jsonList = getListOfFiles(jsonPath , ".json")
#Establish paths to MP4 data
rMP4Path = r'\game_1779\LCamera' #Relative path containing the MP4 data from L Camera
lMP4Path = r'\game_1779\RCamera' #Relative path containing the MP4 data from R Camera
#Make the list of mp4 files from each camera
rc_mp4s = getListOfFiles(rMP4Path , ".mp4")
lc_mp4s = getListOfFiles(lMP4Path , ".mp4")
#Find the json file to use for the current video
whichVideo = 0
jval_RC = matchJSON2MP4(jsonList, jsonPath, rc_mp4s, rMP4Path, whichVideo)
jval_LC = matchJSON2MP4(jsonList, jsonPath, lc_mp4s, lMP4Path, whichVideo)
print(jval_RC,jval_LC)
#Get json dictionary of all bounding boxes in video
bb_dict_LC = read_json_dict(jsonList[jval_LC])#This is for first video in the LCamera folder
bb_dict_RC = read_json_dict(jsonList[jval_RC])#This is for first video in the RCamera folder
#Find first frame with detections
firstFrame_LC = findFirstFrame(bb_dict_LC)
firstFrame_RC = findFirstFrame(bb_dict_RC)
print(firstFrame_LC,firstFrame_RC)
#Determine frame spacing
frameSpacing_LC = findFrameSpacing(bb_dict_LC)
frameSpacing_RC = findFrameSpacing(bb_dict_RC)
#Which frame to look at
whichFrame = 300
#Get a frame from a video --> The second input needs to be adjusted by the first
#frame with detections and the spacing to get the right data match
lc_frame = get_frame(lc_mp4s[whichVideo], firstFrame_LC + whichFrame*frameSpacing_LC)
rc_frame = get_frame(rc_mp4s[whichVideo], firstFrame_RC + whichFrame*frameSpacing_RC)
#Convert color from BGR to RGB
lc_frame = cv2.cvtColor(lc_frame, cv2.COLOR_BGR2RGB)
rc_frame = cv2.cvtColor(rc_frame, cv2.COLOR_BGR2RGB)
#Make a copy of the frame to store for display of all the bounding boxes
rc_frame_copy = rc_frame.copy()
lc_frame_copy = lc_frame.copy()
lc_rc = [lc_frame,rc_frame]
fig,axs = plt.subplots(1, 2, figsize=(20,20))
i = 0
for frame in lc_rc:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(frame)
i += 1
#Determine number of bounding boxes in current frame
lc_n_bbs = count_bboxes(bb_dict_LC,whichFrame)
rc_n_bbs = count_bboxes(bb_dict_RC,whichFrame)
print(lc_n_bbs,rc_n_bbs)
#Get BB coordinates for current frame
bbs_Frame_LC = get_bb4frame(bb_dict_LC,whichFrame) #BB coordinates for current frame
bbs_Frame_RC = get_bb4frame(bb_dict_RC,whichFrame) #BB coordinates for current frame
#Plot the individual bounding boxes from each frame for RCamera
fig, axs = plt.subplots(1, rc_n_bbs, figsize=(15,15))
#Loop over bounding boxes in current frame
i = 0
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
cv2.rectangle(rc_frame_copy, (x1R, y1R), (x2R, y2R), (0, 0, 255), 2)
plt.style.use('ggplot')
if rc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
#Plot the individual bounding boxes from each frame for LCamera
fig,axs = plt.subplots(1, lc_n_bbs, figsize=(15,15))
i = 0
for bb in range(lc_n_bbs): #LCamera
#Get coordinates for current BB
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,bb) #BB coordinates for current BB
currentbox = lc_frame[y1L:y2L,x1L:x2L]
cv2.rectangle(lc_frame_copy, (x1L, y1L), (x2L, y2L), (0, 0, 255), 4)
plt.style.use('ggplot')
if lc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
#Plot frames with bounding boxes drawn in
fig,axs = plt.subplots(1, 2, figsize=(20,20))
frame_w_bbs = [lc_frame_copy, rc_frame_copy]
i = 0
for frame in frame_w_bbs:
plt.style.use('ggplot')
axs[i].grid(False)
axs[i].imshow(frame)
i += 1
plt.show()
0 19 0 19 This is the first frame to process in video 0 This is the first frame to process in video 62 0 62 The frame spacing is 6 The frame spacing is 6 24 14
Awesome! The methods so far allow me to successfully extract the player bounding boxes. A few things can be seen from some of the bounding boxes. First, there are instances of false positives; a false positive in this data is a bounding box with no player in it. This is something that will need to be addressed in the future.
Applying KMeans Clustering to Bounding Boxes from JSON
Now let's try to apply the KMeans clustering routine on the bounding boxes and see what happens. I'll stick to processing the same video and frame I've been using so far just so I can focus on the clustering itself for a bit.
#Apply K-Means function on the bounding box with K = 4
nClusters = 4
#Plot the individual bounding boxes from each frame for RCamera
fig, axs = plt.subplots(1, rc_n_bbs, figsize=(15,15))
#Loop over bounding boxes in current frame
i = 0
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
cv2.rectangle(rc_frame_copy, (x1R, y1R), (x2R, y2R), (0, 0, 255), 2)
plt.style.use('ggplot')
if rc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
#Plot the individual bounding boxes from each frame for LCamera
fig,axs = plt.subplots(1, lc_n_bbs, figsize=(15,15))
i = 0
for bb in range(lc_n_bbs): #LCamera
#Get coordinates for current BB
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,bb) #BB coordinates for current BB
currentbox = lc_frame[y1L:y2L,x1L:x2L]
cv2.rectangle(lc_frame_copy, (x1L, y1L), (x2L, y2L), (0, 0, 255), 4)
plt.style.use('ggplot')
if lc_n_bbs > 1:
axs[bb].imshow(currentbox)
else:
axs.imshow(currentbox)
plt.show()
print("******K-Means Results for BBs from R Camera*************")
#Do K-Means clustering on each bounding box from the R camera and visualize the results
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
rgb_array = KMeansTest(currentbox, nClusters)
print('KM Results for BB ' + str(bb) + ' RCamera')
plotKMeansResult(nClusters,rgb_array)
print("******K-Means Results for BBs from L Camera*************")
#Do K-Means clustering on each bounding box from the L camera and visualize the results
for bb in range(lc_n_bbs): #LCamera
#Get coordinates for current BB
x1L ,y1L ,x2L ,y2L = makeRectangleFromJSON(bbs_Frame_LC,bb) #BB coordinates for current BB
currentbox = lc_frame[y1L:y2L,x1L:x2L]
rgb_array = KMeansTest(currentbox, nClusters)
print('KM Results for BB ' + str(bb) + ' LCamera')
plotKMeansResult(nClusters, rgb_array)
[Cell output: K-Means color swatches for each bounding box (BBs 0-13 from the R camera, BBs 0-23 from the L camera).]
Applying Mask To Image Data
One of the main takeaways from the results above is that green is the dominant color in every bounding box. The green hues mostly come from the grass on the field, and this is where masking comes into play. My approach involves setting low and high thresholds for each of the three channels of the HSV color space (Hue, Saturation, and Value, i.e., brightness). If a pixel's color falls within these thresholds, it gets masked out.
In addition to this, I added a bit of error handling for cases where the masking process removes too many pixels. The clustering routine requires the processed image to contain at least as many pixels as there are clusters. Therefore, if the masked image ends up with fewer pixels than the desired number of clusters, the image is ignored. This situation typically occurs when a bounding box contains nothing but field.
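To pick sensible hue bounds it helps to know where green sits on OpenCV's hue scale, which runs 0-179 rather than 0-359. Here's a quick sanity check, using pure green as a convenient probe value:
#Where does pure green land in OpenCV's HSV space?
green_bgr = np.uint8([[[0, 255, 0]]]) #A single pure-green pixel (BGR order)
print(cv2.cvtColor(green_bgr, cv2.COLOR_BGR2HSV)) #[[[60 255 255]]]: hue 60 on the 0-179 scale
The hue window of 20 to 90 used below brackets that value with plenty of margin for real grass tones.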
def KMeansMaskGreen(img, clusters, lowHue, highHue, lowSat, highSat, loBright, hiBright):
"""
Args:
img : (array) cropped player bounding box image
clusters : (int) how many clusters to use for KMeans
lowHue, highHue : (int) hue thresholds for the green mask
lowSat, highSat : (int) saturation thresholds for the green mask
loBright, hiBright : (int) value (brightness) thresholds for the green mask
Returns:
rgb_array : (list) dominant colors in the image in RGB format, sorted by coverage
"""
org_img = img.copy()
loGreen = np.array([lowHue, lowSat, loBright]) #low green threshold
hiGreen = np.array([highHue, highSat, hiBright]) #Upper green threshold
#Convert image to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
#Make the mask
mask = cv2.inRange(hsv, loGreen, hiGreen)
mask_img = img.copy()
mask_img[mask==255] = (255,255,255)
#Remove white pixels from image so that they don't interfere with the process
mask_img = mask_img[np.all(mask_img != 255 , axis=-1)]
#Convert image into a 1D array
flat_img = np.reshape(mask_img,(-1,3))
arrayLen = flat_img.shape
#Ensure that masking didn't remove everything (Generally happens in false positives)
if mask_img.shape[0] <= clusters:
#print('Cropped image has dimensions lower than number of desired clusters.Not clustering current image')
rgb_array = np.empty((clusters,3,))
rgb_array[:] = np.nan
return rgb_array
else:
rgb_array = []
#Do the clustering
kmeans = KMeans(n_clusters = clusters, random_state=0, tol = 1e-4)
kmeans.fit(flat_img)
#Define the array with centroids
dominant_colors = np.array(kmeans.cluster_centers_,dtype='uint')
#Calculate percentages
percentages = (np.unique(kmeans.labels_,return_counts=True)[1])/flat_img.shape[0]
#Combine centroids representing dominant colors and percentages
#associated with each centroid into an array
pc = list(zip(percentages,dominant_colors))
pc = sorted(pc, reverse = True, key = lambda x: x[0]) #Sort clusters by coverage, largest first
for i in range(clusters):
rgb_array.append(pc[i][1])
return rgb_array
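Before moving on, here's a minimal sanity check of that fallback path, using a made-up patch whose color is just an assumption of a plausible field green:
#An all-"field" patch should be masked away entirely and come back as NaNs
fake_grass = np.full((20, 20, 3), (50, 120, 60), dtype=np.uint8) #Grass-like patch
result = KMeansMaskGreen(fake_grass, 4, 20, 90, 20, 255, 20, 255)
print(np.isnan(result).all()) #True: this "bounding box" would be ignored downstream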
def plotKMeansResult2(nClusters,rgb_array, clustered_image, mode = 1):
"""
Args:
nClusters : (int) how many clusters were used for KMeans
rgb_array : (tuple) dominant colors in image in RGB format
clustered_image : (array) the image that was clustered (shown when mode = 2)
mode : (int) 1 shows only the cluster colors; 2 also shows the clustered image
"""
if mode == 1: #This display mode shows only the clustered colors
fig,axs = plt.subplots(1, nClusters, figsize=(20,20))
for i in range(nClusters):
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [rgb_array[i][0], rgb_array[i][1], rgb_array[i][2]]
axs[i].grid(False)
axs[i].imshow(color_of_pix)
elif mode == 2: #This display mode shows the clustered image and the clustered colors
fig,axs = plt.subplots(1, nClusters + 1, figsize=(5,5))
axs[0].grid(False)
axs[0].imshow(clustered_image) #First panel shows the image that was clustered
for i in range(nClusters):
color_of_pix = np.zeros((5, 5, 3), np.uint8)
color_of_pix[:] = [rgb_array[i][0], rgb_array[i][1], rgb_array[i][2]]
axs[i+1].imshow(color_of_pix)
axs[i+1].grid(False)
else:
print('Invalid display mode. mode must equal 1 or 2')
plt.show()
nClusters = 4
lowHue = 20
highHue = 90
lowSat = 20
highSat = 255
loBright = 20
hiBright = 255
print("******K-Means Results for BBs from R Camera*************")
#Run K-Means clustering on each bounding box and visualize the results
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
rgb_array = KMeansMaskGreen(currentbox, nClusters,
lowHue, highHue,
lowSat, highSat,
loBright, hiBright)
print('KM Results for BB ' + str(bb) + ' RCamera')
plotKMeansResult2(nClusters, rgb_array, currentbox, mode = 2)
[Cell output: for each of BBs 0-13 from the R camera, the masked crop alongside its four dominant colors.]
Using a mask to remove the green colors has helped quite a bit in improving the color detection routine!
Remove Bottom Half of Bounding Box to Focus on Jersey Data
Since the goal is to determine jersey colors only, cropping the bottom portion of each bounding box should also bolster the analysis, since it lets the clustering focus on the region that matters most.
def crop_image(image,howMuch):
"""
Args:
image : (array) image of player bounding box
howMuch : (int) percent of image height to keep (between 0 and 100)
Returns:
cropped_img : (array) cropped image containing the top howMuch% of rows
"""
val = howMuch/100
cropped_img = image[0:int(image.shape[0]*val), 0:image.shape[1]] #Keep top rows, full width
return cropped_img
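A quick shape check, using a made-up 100x40 dummy box:
#crop_image should keep the top howMuch% of rows and the full width
dummy_bb = np.zeros((100, 40, 3), dtype=np.uint8)
print(crop_image(dummy_bb, 50).shape) #(50, 40, 3)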
howMuch = 50
print("******K-Means Results for BBs from R Camera*************")
#Run K-Means clustering on each cropped bounding box and visualize the results
for bb in range(rc_n_bbs): #RCamera
#Get coordinates for current BB
x1R ,y1R ,x2R ,y2R = makeRectangleFromJSON(bbs_Frame_RC,bb) #BB coordinates for current BB
currentbox = rc_frame[y1R:y2R,x1R:x2R]
croped_bb = crop_image(currentbox,howMuch)
rgb_array = KMeansMaskGreen(croped_bb, nClusters,
lowHue, highHue,
lowSat, highSat,
loBright, hiBright)
print('KM Results for BB ' + str(bb) + ' RCamera')
plotKMeansResult2(nClusters, rgb_array, croped_bb, mode = 2)
[Cell output: for each of BBs 0-13 from the R camera, the cropped box alongside its four dominant colors.]
Much better!
Processing the Entire MP4 File
Let's try processing every frame and see what happens! I took all my routines so far and placed them into the wrapper function below. It takes the path to the JSON files, the path to the MP4 files, the video to be processed, and the number of clusters for K-Means as input, and returns a pandas dataframe containing the dominant colors in RGB format for each bounding box in each frame of the current video.
def getJerseyColorsFromMP4(jsonPath,MP4Path,whichVideo,nClusters):
#Make the list of mp4 and json files from each camera
print('Retrieving MP4 and JSON files...')
mp4List = getListOfFiles(MP4Path , ".mp4")
jsonList = getListOfFiles(jsonPath , ".json")
#Find the json file to use for the current video
jval = matchJSON2MP4(jsonList, jsonPath, mp4List, MP4Path, whichVideo)
#Get json dictionary of all bounding boxes in the video
bb_dict = read_json_dict(jsonList[jval])
#Find first frame with detections
firstFrame = findFirstFrame(bb_dict)
#Determine frame spacing
frameSpacing = findFrameSpacing(bb_dict)
#Which frame to look at
whichFrame = 0
whichFrameAdj = firstFrame + whichFrame*frameSpacing #Adjust for video data to match json detection
nf = int(count_frames(mp4List[whichVideo])/10) #Number of frames to process (a tenth of the full frame count)
print('Initializing arrays...')
#Initialize arrays
dom_color1 = []
dom_color2 = []
dom_color3 = []
frame_list = []
bb_list = []
video_list = []
#Loop over the frames of the video
print('Starting jersey color detection ...')
while whichFrameAdj < nf: #Effectively a single pass; the loop below advances whichFrameAdj past nf
for i in tqdm(range(nf), desc="Processing Frame"): #Progress bar for frames processed
#Get a frame from video
frame = get_frame(mp4List[whichVideo], whichFrameAdj)
#Convert color from BGR to RGB
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
#Make a copy of the frame to store for display of all the bounding boxes
frame_copy = frame.copy()
#Determine number of bounding boxes in current frame
n_bbs = count_bboxes(bb_dict,whichFrame)
#Get BB coordinates for current frame
bbs_frame = get_bb4frame(bb_dict,whichFrame) #BB coordinates for current frame
#Loop over bounding boxes in current frame
for bb in range(n_bbs):
#print('****Frame ' + str(whichFrameAdj) + ' BB ' + str(bb) + '****')
frame_list.append(whichFrameAdj) #Append frame ID to list
bb_list.append(bb)
video_list.append(whichVideo)
x1 ,y1 ,x2 ,y2 = makeRectangleFromJSON(bbs_frame,bb) #Coordinates for current BB
currentbox = frame[y1:y2,x1:x2]
cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 0, 255), 2)
#Crop the bounding box
croped_bb = crop_image(currentbox,howMuch)
#Do the clustering
rgb_array = KMeansMaskGreen(croped_bb, nClusters,
lowHue, highHue, lowSat, highSat, loBright, hiBright)
#Append dominant RGB colors into respective arrays
dom_color1.append(rgb_array[0])
dom_color2.append(rgb_array[1])
dom_color3.append(rgb_array[2])
whichFrame += 1
whichFrameAdj = firstFrame + whichFrame*frameSpacing #Adjust for video data to match json
print('Making pandas dataframe containing results...')
jerseyColor_df = pd.DataFrame({'Video ID': video_list,
'Frame ID': frame_list,
'BB in Frame': bb_list,
'Jersey Color 1': dom_color1,
'Jersey Color 2': dom_color2,
'Jersey Color 3': dom_color3})
print('PROCESS COMPLETED')
return jerseyColor_df
jsonPath = r'\game_1779\object_detector' #Path to json files (raw string so backslashes aren't treated as escapes)
MP4Path = r'\game_1779\LCamera' #Path to MP4 files
whichVideo = 0 #Which video to look at?
howMuch = 50 #How much to crop off the original bounding box height
nClusters = 4 #k-value for k-means clustering routine
#Masking parameters
lowHue = 20 #low hue value
highHue = 90 #High hue value
lowSat = 20 #low saturation value
highSat = 255 #High saturation value
loBright = 20 #low brightness value
hiBright = 255 #High brightness value
jerseyColor_df = getJerseyColorsFromMP4(jsonPath,MP4Path,whichVideo,nClusters)
Retrieving MP4 and JSON files...
The first frame to process in the video is 62
The frame spacing is 6
Initializing arrays...
Starting jersey color detection ...
Processing Frame: 100%|██████████████████████████████████████████████████████████████| 723/723 [09:48<00:00, 1.23it/s]
Making pandas dataframe containing results... PROCESS COMPLETED
It seems the function ran without hiccups! In the example shown here, a total of 723 frames were processed in just under 10 minutes, for an average processing rate of 1.23 frames/s. This rate of course depends on how many bounding boxes are being processed in a given frame.
Let's take a look at the dataframe we made.
pd.set_option('display.max_rows', 10)
jerseyColor_df
| | Video ID | Frame ID | BB in Frame | Jersey Color 1 | Jersey Color 2 | Jersey Color 3 |
|---|---|---|---|---|---|---|
| 0 | 0 | 62 | 0 | [89, 86, 78] | [178, 171, 163] | [63, 58, 50] |
| 1 | 0 | 68 | 0 | [99, 94, 74] | [121, 113, 88] | [113, 108, 79] |
| 2 | 0 | 68 | 1 | [63, 7, 7] | [14, 9, 7] | [56, 42, 26] |
| 3 | 0 | 92 | 0 | [132, 131, 131] | [97, 96, 92] | [170, 172, 164] |
| 4 | 0 | 92 | 1 | [84, 90, 101] | [122, 126, 134] | [54, 53, 44] |
| ... | ... | ... | ... | ... | ... | ... |
| 9302 | 0 | 4394 | 19 | [54, 33, 10] | [84, 55, 31] | [106, 84, 56] |
| 9303 | 0 | 4394 | 20 | [136, 125, 85] | [82, 17, 15] | [99, 72, 54] |
| 9304 | 0 | 4394 | 21 | [131, 129, 108] | [93, 89, 97] | [152, 155, 165] |
| 9305 | 0 | 4394 | 22 | [60, 33, 15] | [136, 115, 86] | [110, 72, 59] |
| 9306 | 0 | 4394 | 23 | [40, 44, 57] | [6, 12, 18] | [141, 142, 147] |
9307 rows × 6 columns
The dataframe has the desired structure. As can be seen, a total of 9,307 bounding boxes were processed, which works out to about 15.8 bounding boxes per second. Now I'll remove any rows in the dataframe that contain arrays with NaNs (these are the false positives).
#Remove NaN detections (false positives) from the dataframe
test_df = jerseyColor_df[~jerseyColor_df.applymap(lambda x : np.isnan(x).any()).any(axis=1)]
test_df
| | Video ID | Frame ID | BB in Frame | Jersey Color 1 | Jersey Color 2 | Jersey Color 3 |
|---|---|---|---|---|---|---|
| 0 | 0 | 62 | 0 | [89, 86, 78] | [178, 171, 163] | [63, 58, 50] |
| 1 | 0 | 68 | 0 | [99, 94, 74] | [121, 113, 88] | [113, 108, 79] |
| 2 | 0 | 68 | 1 | [63, 7, 7] | [14, 9, 7] | [56, 42, 26] |
| 3 | 0 | 92 | 0 | [132, 131, 131] | [97, 96, 92] | [170, 172, 164] |
| 4 | 0 | 92 | 1 | [84, 90, 101] | [122, 126, 134] | [54, 53, 44] |
| ... | ... | ... | ... | ... | ... | ... |
| 9302 | 0 | 4394 | 19 | [54, 33, 10] | [84, 55, 31] | [106, 84, 56] |
| 9303 | 0 | 4394 | 20 | [136, 125, 85] | [82, 17, 15] | [99, 72, 54] |
| 9304 | 0 | 4394 | 21 | [131, 129, 108] | [93, 89, 97] | [152, 155, 165] |
| 9305 | 0 | 4394 | 22 | [60, 33, 15] | [136, 115, 86] | [110, 72, 59] |
| 9306 | 0 | 4394 | 23 | [40, 44, 57] | [6, 12, 18] | [141, 142, 147] |
9026 rows × 6 columns
There were 281 false positives in the dataset processed so far, which means the player classification was roughly 97% accurate, which is quite good! Let's now try to see what the top 5 colors in the processed MP4 frames are by running the KMeans routine on the RGB columns from our dataframe.
jcrgb1 = test_df[['Jersey Color 1']].to_numpy()
jcrgb2 = test_df[['Jersey Color 2']].to_numpy()
jcrgb3 = test_df[['Jersey Color 3']].to_numpy()
jc_list = [jcrgb1,jcrgb2,jcrgb3]
title_List = ['Dominant Colors in Jersey Color 1',
'Dominant Colors in Jersey Color 2',
'Dominant Colors in Jersey Color 3']
clusters = 5
listElem = 0
for elem in jc_list:
#Collect the valid RGB triplets from this color column
totColor = []
for row in elem:
if len(row[0]) == 3: #Skip any malformed entries
totColor.append(row[0])
#Do KMeans
kmeans = KMeans(n_clusters = clusters)
kmeans.fit(totColor)
#Define the array with centroids
dominant_colors = np.array(kmeans.cluster_centers_,dtype = 'uint')
#Calculate the fraction of samples assigned to each centroid
percentages = (np.unique(kmeans.labels_,return_counts=True)[1])/len(totColor)
pc = zip(percentages,dominant_colors)
pc = sorted(pc, reverse = True, key = lambda x: x[0]) #Key on the percentage so ties don't compare arrays
#Plotting utility
print(title_List[listElem])
block = np.ones((50,50,3),dtype='uint')
plt.figure(figsize=(12,8))
for i in range(clusters):
plt.subplot(1,clusters,i+1)
block[:] = pc[i][1] #Fill a 50x50 swatch with this cluster's RGB color
plt.imshow(block)
plt.xticks([])
plt.yticks([])
plt.xlabel(str(round(pc[i][0]*100,2))+'%')
bar = np.ones((50,500,3),dtype='uint')
plt.figure(figsize=(12,8))
listElem += 1
start = 0
i = 0
for p,c in pc:
end = start+int(p*bar.shape[1])
if i == clusters-1: #Let the last color run to the edge to absorb rounding
bar[:,start:] = c
else:
bar[:,start:end] = c
start = end
i+=1
plt.imshow(bar)
plt.xticks([])
plt.yticks([])
plt.title('Color Distribution')
plt.show()
Dominant Colors in Jersey Color 1
Dominant Colors in Jersey Color 2
Dominant Colors in Jersey Color 3
Nice! The clustering process seems to have worked quite well. The team jersey colors for this particular game were red and white, and the visualization above shows that my algorithm recovers exactly those shades. One more thing that would be interesting to incorporate into this routine is the ability to assess whether the player in a given bounding box belongs to Team 1 or Team 2. I have an idea for how to do this (see the sketch below), but I'll have to flesh it out later.
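Here is a minimal sketch of one way that could work; it's purely hypothetical and not part of the routine above. It assumes the two teams form the two most separable groups in the 'Jersey Color 1' column of test_df:
#Hypothetical team assignment: cluster the dominant jersey colors with k = 2
#and treat each centroid as a team's color prototype
colors = np.vstack(test_df['Jersey Color 1'].to_numpy()) #(n_boxes, 3) RGB triplets
team_km = KMeans(n_clusters = 2, random_state = 0).fit(colors)
test_df = test_df.assign(Team = team_km.labels_ + 1) #1 or 2 for each bounding box
This would of course mislabel referees and goalkeepers, so it's only a starting point.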
I could now easily wrap this clustering routine in another function that loops through every mp4 file in a directory and concatenates the per-video dataframes, giving the player jersey colors for an entire game!
def getJerseyColorsFromGame(jsonPath,MP4Path,whichVideo,nClusters):
#Gather every MP4 for the game (whichVideo is unused here; the loop visits all videos)
mp4_list = getListOfFiles(MP4Path , ".mp4")
n_mp4 = len(mp4_list)
df_list = []
#Run the per-video routine on each file and collect the resulting dataframes
for vid in range(n_mp4):
jerseyColor_df = getJerseyColorsFromMP4(jsonPath,MP4Path,vid,nClusters)
df_list.append(jerseyColor_df)
#Stack the per-video results into one dataframe for the whole game
allJerseyColors = pd.concat(df_list, ignore_index=True)
return allJerseyColors
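A minimal usage sketch, assuming the same paths as before (the CSV filename is just an example):
#Process every video in the folder and persist the combined results
allJerseyColors = getJerseyColorsFromGame(jsonPath, MP4Path, whichVideo, nClusters)
allJerseyColors.to_csv('game_1779_jersey_colors.csv', index=False)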
Conclusions
The work shown above demonstrates how the K-Means clustering algorithm can be used to extract jersey colors from soccer game video footage. The results of the clustering process agree with the expected output: the dominant colors detected are shades of red and white, matching the actual team jersey colors for this game.