Sunday, August 26, 2018

Tracking a simple, marked object with OpenCV / Python - part III

In the two previous posts, I showed two simple methods for object tracking (one based on background detection and one based on ArUco markers), but neither of them is sufficient for my application.
In the first part, I simply couldn't find the center of my object, and in the second part, motion blur caused too much visual noise.
In this final chapter, I show a method which takes advantage of sensor saturation and can track a small, strong light source even with heavy motion blur.

The basic idea is that pure white is relatively rare, even in a well-lit image. To produce it reliably, you have to saturate your camera's sensor, which can be done with a strong light source (for example, a strong LED). In this post, I will use the flash of my mobile phone.
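If you want to check whether this assumption holds for your camera and scene, you can count the near-white pixels of a frame: with the light source turned off, the ratio should be close to zero. Here is a minimal sketch of such a check (the limit of 253 matches the threshold used later; camera index 0 is a guess):

import numpy as np
import cv2

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    ratio = np.count_nonzero(gray > 253) / float(gray.size) # ratio of saturated pixels
    print("Saturated pixel ratio:", ratio)
cap.release()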

We will start with the following code:


#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import cv2 
 
cap = cv2.VideoCapture(1) # This is my USB camera. 
                          # If you want to use your built-in camera,
                          # you probably have to write zero here

np.set_printoptions(threshold=np.inf) # If I print an array, I would like to see all of it (newer numpy rejects np.nan here)

showOriginal = True
pause = False

while(True):
    if not pause:
        # Capture frame-by-frame
        ret, frame = cap.read()
        #print(frame.shape) #480x640
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # I keep only the white pixels
        nonzeros = np.nonzero(thres)
        try:
            cx = int(nonzeros[1].mean()) # If we have an all-black image, this will cause ValueError
            cy = int(nonzeros[0].mean())
            print "The coords are:", cx, cy
        except ValueError:
            print "I can't find the object"
            cx, cy = 0,0
            valid = False
    
    if showOriginal:
        cv2.imshow('Main', frame)
    else:
        cv2.imshow('Main', thres)
        
    
    pressedKey = cv2.waitKey(1) & 0xFF  

    if pressedKey == ord('q'):
        break
    elif pressedKey == ord('s'):
        showOriginal = not showOriginal
    elif pressedKey == ord('p'):
        pause = not pause
        
# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()


The important part of the code is the threshold, where I keep only the truly white pixels (intensity above 253).

Let's see how it works. In the first picture, you can see the raw image and the processed one:
You can see a mostly white image with a saturated light source; around the light source, there is a white area. After the threshold, only the area of our light source remains white.
This looks fine: we can easily detect our object "marked" with the light source; we only have to find the mass center of the image. It's like our previous background-based algorithm, but it can handle a dynamic background and we can easily find the same central point. Sadly, we are not done yet, because motion blur can cause us some more headache:
 
As you can see, motion blur appears here too, but thanks to our strong light source, we get strong, detectable boundaries. We only have to decide where the light source is in the grabbed frame. We have three more or less reasonable positions for it:
First, let's talk about the green position. It can be good for some applications and it's easy to calculate, but it's definitely not the current position of the object. If you use it, you will measure the acceleration incorrectly. (Remark: the green point is not the mass center.)

If you want to know the exact position (and acceleration) of the object, you have to choose between the red and the blue dot. Both of them can be valid. To select between them, you have to know the previous position and the movement characteristics of the object. Usually, the farther one is the right choice, but sometimes the closer one can also be right (for example, if the tracked object makes small, fast circles).
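If you do need the red or blue endpoint instead, one possible heuristic (a hypothetical helper, not part of the final code below) is to pick the skeleton pixel farthest from the previous position:

import numpy as np

def pickEndpoint(skeletonPixels, previousPosition):
    " skeletonPixels: iterable of (x, y) tuples; previousPosition: (x, y) from the last frame. "
    pts = np.array(list(skeletonPixels))
    # "Usually, the farther one is the right choice", so take the pixel
    # with the largest squared distance from the previous position
    d2 = (pts[:,0] - previousPosition[0])**2 + (pts[:,1] - previousPosition[1])**2
    return tuple(pts[np.argmax(d2)])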

In my application, the acceleration is not so important, but I need a simple, stable algorithm, so I will choose the green dot. (Finding the others is similar.)
To do that, we have to reduce this long white stripe to a line and find its midpoint.

For this, we have to find the morphological skeleton of the stripe. Sadly, there is no built-in function for the morphological skeleton in OpenCV itself, but it's not hard to implement (and there are plenty of examples on the internet). I used this and this.
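(Side note: the opencv-contrib package does ship a thinning function in its ximgproc module which produces a similar skeleton. If you have it installed, something like the following may save you the manual implementation; I haven't verified it in this exact setup:)

import cv2

def getSkeletonContrib(binaryImage):
    " Requires the opencv-contrib-python package; returns None if it's unavailable. "
    if hasattr(cv2, 'ximgproc'):
        return cv2.ximgproc.thinning(binaryImage, thinningType=cv2.ximgproc.THINNING_ZHANGSUEN)
    return None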

Here is our new, modified code which can show us the skeleton, too:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import cv2 
 
cap = cv2.VideoCapture(1) # This is my USB camera. 
                          # If you want to use your built-in camera,
                          # you probably have to write zero here

np.set_printoptions(threshold=np.inf) # If I print an array, I would like to see all of it (newer numpy rejects np.nan here)

show = 0
pause = False

def getSkeleton(img):
    " The input is a binary image. "
    originalInput = img # Keep a reference: the result is masked with it at the end
    img = img.copy()    # Work on a copy, I don't want to modify the original image

    size = np.size(img)
    skel = np.zeros(img.shape,np.uint8)
    element = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
    
    # I added an additional step, because the original algorithm keeps parts of the contours.
    # I make a dilation first, so the leftover contours will fall outside the original shape.
    # After the loop, they can be eliminated with a simple logical AND (see the return line).
    img = cv2.dilate(img, np.ones((3,3)), iterations = 5 )
    
    while True:
        eroded = cv2.erode(img,element)
        temp = cv2.dilate(eroded,element)
        temp = cv2.subtract(img,temp)
        skel = cv2.bitwise_or(skel,temp)
        img = eroded.copy()
        
        zeros = size - cv2.countNonZero(img)
        if zeros==size:
            break
    
    return cv2.bitwise_and(skel, originalInput)

while(True):
    if not pause:
        # Capture frame-by-frame
        ret, frame = cap.read()
        #print(frame.shape) #480x640
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # Usually, it's easier to work with binary images
        thres = cv2.erode(thres, np.ones((3,3)), iterations = 10 ) # noise filtering
        skeleton = getSkeleton(thres)
    
    
    if show == 0:
        cv2.imshow('Main', frame)
    elif show == 1:
        cv2.imshow('Main', thres)
    elif show == 2:
        cv2.imshow('Main', skeleton)
    
    pressedKey = cv2.waitKey(1) & 0xFF  

    if pressedKey == ord('q'):
        break
    elif pressedKey == ord('s'):
        show = (show + 1) % 3
    elif pressedKey == ord('p'):
        pause = not pause
        
# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()

And here is an example:

We are almost done. We got a long line with some smaller branches. 

In our last step, we have to find its midpoint. Sadly, this can be pretty hard. If you want an accurate solution, you have to fit a function to all of the white pixels with least squares. There are built-in algorithms for this in numpy (polyfit) or in scipy. After that, you have to find the midpoint of the fitted curve and hope that you chose the right degree and that the branches won't cause too much error. In my experience, this solution is sensitive to its parameters and produces too much noise.
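For reference, here is a minimal sketch of this least-squares approach (the function name, the degree, and the sample count are my own choices, and it assumes the stripe can be described as y being a function of x):

import numpy as np

def fitMidpoint(xs, ys, degree=3):
    " xs, ys are the coordinates of the skeleton's white pixels. "
    poly = np.polyfit(xs, ys, degree) # least-squares polynomial fit
    sampleX = np.linspace(xs.min(), xs.max(), 200)
    sampleY = np.polyval(poly, sampleX)
    # Walk along the curve and stop at half of its arc length
    segments = np.hypot(np.diff(sampleX), np.diff(sampleY))
    arcLength = np.concatenate(([0.0], np.cumsum(segments)))
    i = np.searchsorted(arcLength, arcLength[-1] / 2)
    return sampleX[i], sampleY[i]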
For my application, I need a low-noise, stable solution which is not sensitive to its parameters, so I can make a simplification.
As I mentioned before, the mass center of the thresholded image is not a valid solution. If you use it, you will get a usable result only for linear movement. For circular movement, the mass center will be nearer to the rotation center than the tracked object is. The following figure shows the problem:
Here, the red dot marks the searched point, the green dot is the mass center, and the blue dot is the rotation center. If we just calculate the mass center during the rotation, we will measure a lower speed.
So using the mass center as the position of the tracked object is a bad idea, but starting the search around the mass center can help us.
The idea is that the searched position (the red dot) must be an element of the skeleton, and it is near the mass center. So if we find the point of the skeleton nearest to the mass center, we can use it as the solution.

Calculating the mass center is easy: you have to get all the nonzero pixels from the thresholded image and calculate the mean of the X and Y coordinates:


    ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # Usually, it's easier to work with binary images
    thres = cv2.erode(thres, np.ones((3,3)), iterations = 10 ) # noise filtering
    nonzeros = np.nonzero(thres)
    x = nonzeros[1] # np.nonzero returns (rows, cols): the column indices are the X coordinates
    y = nonzeros[0]
    if np.size(x)>0:
        cX = x.mean()
        cY = y.mean()

Finding the point of the skeleton nearest to the mass center is a little bit more complicated. There are two methods for it:

  1. You can create an image where the mass center is the only white pixel, dilate it, and check whether the resulting image and the skeleton have any common white pixel (with a bitwise AND). You have to keep dilating until you find a common pixel. This can be the winning solution if the skeleton contains lots of white pixels. (A sketch of this method follows after the list.)
  2. You can iterate through every pixel of the skeleton and calculate its distance from the mass center. In my typical use-case, this is the faster solution, so I will implement this one.
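As mentioned in the first point, a minimal sketch of the dilation-based method could look like this (the function name and the iteration cap are my own assumptions, not part of the final code):

import numpy as np
import cv2

def findNearestByDilation(skeleton, massCenter, maxIterations=1000):
    " Grow a seed pixel at the mass center until it touches the skeleton. "
    seed = np.zeros(skeleton.shape, np.uint8)
    seed[massCenter[1], massCenter[0]] = 255 # rows are Y, columns are X
    kernel = np.ones((3,3), np.uint8)
    for _ in range(maxIterations):
        common = cv2.bitwise_and(seed, skeleton)
        nonzeros = np.nonzero(common)
        if np.size(nonzeros[0]) > 0:
            return int(nonzeros[1][0]), int(nonzeros[0][0]) # (x, y) of a touching pixel
        seed = cv2.dilate(seed, kernel)
    return None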
Here is the final source code:


#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import math
import cv2
 
cap = cv2.VideoCapture(1) # This is my USB camera. 
                          # If you want to use your built-in camera,
                          # you probably have to write zero here

np.set_printoptions(threshold=np.inf) # If I print an array, I would like to see all of it (newer numpy rejects np.nan here)

show = 0
pause = False

def getSkeleton(img):
    " The input is a binary image. "
    originalInput = img # Keep a reference: the result is masked with it at the end
    img = img.copy()    # Work on a copy, I don't want to modify the original image

    size = np.size(img)
    skel = np.zeros(img.shape,np.uint8)
    element = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
    
    # I added an additional step, because the original algorithm keeps parts of the contours.
    # I make a dilation first, so the leftover contours will fall outside the original shape.
    # After the loop, they can be eliminated with a simple logical AND (see the return line).
    img = cv2.dilate(img, np.ones((3,3)), iterations = 5 )
    
    while True:
        eroded = cv2.erode(img,element)
        temp = cv2.dilate(eroded,element)
        temp = cv2.subtract(img,temp)
        skel = cv2.bitwise_or(skel,temp)
        img = eroded.copy()
        
        zeros = size - cv2.countNonZero(img)
        if zeros==size:
            break
    
    return cv2.bitwise_and(skel, originalInput)

def getMassCenter(img):
    nonzeros = np.nonzero(img)
    x = nonzeros[1]
    y = nonzeros[0]
    if np.size(x)>0:
        return int(x.mean()), int(y.mean())
    else:
        return None, None       
    
def getNonzeroPixels(img):
    nonzeros = np.nonzero(img)
    X = nonzeros[1]
    Y = nonzeros[0]
    for x, y in zip(X, Y):
        yield x, y 
    
def getDistanceSquare(a, b):
    return math.pow(a[0]-b[0], 2) + math.pow(a[1]-b[1], 2)
    
while(True):
    if not pause:
        # Capture frame-by-frame
        ret, frame = cap.read()
        #frame = cv2.imread("skeleton_1.png") 
        #print(frame.shape) #480x640
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # Usually, it's easier to work with binary images
        thres = cv2.erode(thres, np.ones((3,3)), iterations = 10 ) # noise filtering
        massCenter = getMassCenter(thres)        
        skeleton = getSkeleton(thres)
        
        if massCenter[0] is not None:
            closestPoint = (0,0)
            closestDistanceSquare = float('inf') # any real distance is smaller than this
            
            for p in getNonzeroPixels(skeleton):
                distanceSquare = getDistanceSquare(p, massCenter)
                if distanceSquare < closestDistanceSquare:
                    closestDistanceSquare = distanceSquare
                    closestPoint = p
            
            cv2.circle(frame, massCenter, 1, (0,255,0), 10)
            cv2.circle(frame, closestPoint, 1, (255,0,0), 10)
            cv2.circle(thres, closestPoint, 1, (127,), 10)
            cv2.circle(skeleton, closestPoint, 1, (127,), 10)
        
    if show == 0:
        cv2.imshow('Main', frame)
    elif show == 1:
        cv2.imshow('Main', thres)
    elif show == 2:
        cv2.imshow('Main', skeleton)
    
    pressedKey = cv2.waitKey(1) & 0xFF  

    if pressedKey == ord('q'):
        break
    elif pressedKey == ord('s'):
        show = (show + 1) % 3
    elif pressedKey == ord('p'):
        pause = not pause
        
# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()


I tried to decompose it into smaller functions; I hope the code is clear enough to understand the basic ideas behind it.

Let's see how it works. Here, you can see a real image and our previous sample image.

As you can see, my original figure was a little bit exaggerated about the distance between the mass center (green) and the searched point (blue), but it can still cause errors under the right conditions.

If your environment contains too much noise, you may have to modify this program and use filtering, blurring, or other algorithms, but the basic idea will work as long as you can mark your object with a strong light source.
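For example, one hypothetical pre-processing step is a median filter on the grayscale image before the threshold: it removes isolated saturated pixels without smearing the blob of the light source too much (the kernel size of 5 is a guess, tune it for your resolution):

import cv2

def denoise(gray):
    " Median filter: effective against salt noise, keeps the blob's edges sharp. "
    return cv2.medianBlur(gray, 5)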
