Sunday, August 26, 2018

Calculating the real-world position of the tracked object

In the previous posts, I showed a simple but robust object tracking algorithm which can track a robot with a cheap camera.
The problem is that the camera is usually not mounted on the ceiling, so the plane the robot moves on is not parallel to the image plane of the camera: the scene appears in a perspective (angled) view.

With a tracker, we can get the position of our robot in the coordinate system of the monitor.
This is useless in most real-life applications; we have to convert these coordinates somehow into real-world coordinates.

Theoretically, this is a very easy, basic task. We have two 2D coordinate systems (the monitor and the floor). We only need a transformation matrix, and we have to multiply the monitor coordinates of the robot with it.
Calculating the transformation matrix is also easy. It's taught in most universities in the first semester and OpenCV contains a function for it.

The cv2.getPerspectiveTransform(pts1, pts2) function needs an array with four points in the floor coordinate system (pts1) and their coordinates in the monitor coordinate system (pts2). The return value is a 3x3 matrix (if you don't know why we need a 3x3 matrix for a 2D transformation, but you are curious, read this). The returned M matrix can be used by cv2.warpPerspective(img, M, (cols, rows)), which warps an image from the pts1 coordinate system into the pts2 coordinate system, so if you want to transform the camera frame into the floor coordinate system, swap the two point sets (or invert the matrix).
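
As a quick sketch (the point values below are made up for illustration, not measured):

import numpy as np
import cv2

# Four hypothetical reference points: the corners of a 100cm x 100cm square on the floor
# and the pixel coordinates where they appear in the camera image (made-up values)
floorPts  = np.float32([[0, 0], [100, 0], [100, 100], [0, 100]])
screenPts = np.float32([[112, 64], [518, 80], [560, 410], [90, 392]])

M = cv2.getPerspectiveTransform(floorPts, screenPts)        # maps floor -> screen
M_floor = cv2.getPerspectiveTransform(screenPts, floorPts)  # maps screen -> floor

# A bird's-eye view of the floor square (output size is (width, height) in floor units):
# topView = cv2.warpPerspective(frame, M_floor, (100, 100))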

This is easy; it's not really worth a blog post. You only have one problem: how will you get your four points? The easiest solution is to mark four measured points on the floor, so you know their floor coordinates. You start your measurement by marking these points manually (or finding them automatically) on the image, which gives you their monitor coordinates.

The problem is that placing the marks precisely on the floor is time-consuming. Finding these marks automatically on the camera frame can be hard, because your algorithm must be precise, and it must also be fast, because you have to run it periodically. Marking them manually is also problematic, because during a long measurement the camera will move slightly, which causes an offset error.

In this blog post, I will show you an easy method based on ArUco which solves this problem. I already wrote about ArUco; there we saw that ArUco can recognize its markers even from a perspective view. This is what we will exploit.

The aruco.detectMarkers function has three return values: the corners of the found markers, their IDs, and the rejected points. We will use the corners as reference points: we know their coordinates in the floor coordinate system (we know the marker's exact size) and in the monitor coordinate system.
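
The core of the idea, lifted from the full example below (my marker is 20cm x 20cm; substitute your own size):

import numpy as np
import cv2
import cv2.aruco as aruco

# frame is a camera image that contains the marker (see the full example below)
aruco_dict = aruco.Dictionary_get(aruco.DICT_6X6_250)
parameters = aruco.DetectorParameters_create()
corners, ids, rejectedImgPoints = aruco.detectMarkers(frame, aruco_dict, parameters=parameters)

if len(corners) > 0:
    floorCoords = np.float32([[20, 0], [20, 20], [0, 20], [0, 0]]) # the marker corners on the floor, in cm
    screenCoords = corners[0][0]                                   # the same corners in pixels
    M = cv2.getPerspectiveTransform(floorCoords, screenCoords)     # maps floor -> screen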

We have only one technical problem to solve: we can't use warpPerspective, because we don't want to display or work with the transformed image; we only need the floor coordinates of the tracked object. Luckily, it's easy to use the return value of cv2.getPerspectiveTransform(pts1, pts2) directly.
The details are on the manual page, but here is a simple function which does it:

def convertFromScreenToFloorCoordSystem(x, y, M):
    """
    x and y are in screen coordinates,
    M is the return value of getPerspectiveTransform (it maps floor -> screen)
    """
    # M maps floor coordinates to screen coordinates, so we need its inverse.
    M_inv = np.linalg.inv(M)

    # The matrix works in homogeneous coordinates: we multiply (x, y, 1) by M_inv
    # and divide by the third component to get back the 2D floor coordinates.
    X = (M_inv[0,0]*x + M_inv[0,1]*y + M_inv[0,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])
    Y = (M_inv[1,0]*x + M_inv[1,1]*y + M_inv[1,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])

    return X, Y

Of course, if you would like to use this method for measurements, you should use as big a marker as you can to minimize the error.

And as a working example, here is a program which calculates the floor coordinates of your mouse pointer if the camera sees the marker.


#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import cv2
import cv2.aruco as aruco
 
 
cap = cv2.VideoCapture(1)


mouseX = 0
mouseY = 0

def mouseHandler(event, x, y, flags, param):
    global mouseX,mouseY
    if event == cv2.EVENT_MOUSEMOVE:
        mouseX, mouseY = x, y

def convertFromScreenToFloorCoordSystem(x, y, M):
    """ 
    x and y are in screen coordinates,
    M is the return value of getPerspectiveTransform
    """
    M_inv = np.linalg.inv(M)
    
    X = (M_inv[0,0]*x + M_inv[0,1]*y + M_inv[0,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])
    Y = (M_inv[1,0]*x + M_inv[1,1]*y + M_inv[1,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])
    
    return X, Y


cv2.namedWindow('image')
cv2.setMouseCallback('image', mouseHandler)

while(True):
    ret, frame = cap.read()

    aruco_dict = aruco.Dictionary_get(aruco.DICT_6X6_250)
    parameters =  aruco.DetectorParameters_create()
    corners, ids, rejectedImgPoints = aruco.detectMarkers(frame, aruco_dict, parameters=parameters)
   
    if len(corners) > 0:
        # Draw it for debugging    
        frame = aruco.drawDetectedMarkers(frame, corners)
        floorCoords = np.float32([[20, 0], [20, 20], [0, 20], [0, 0]]) # My marker is 20cm X 20cm
        screenCoords = corners[0][0] # I use the first found marker
        
        M = cv2.getPerspectiveTransform(floorCoords, screenCoords)
        print convertFromScreenToFloorCoordSystem(mouseX, mouseY, M)
        
        
    cv2.imshow('image', frame)
    key = cv2.waitKey(1) & 0xFF 
    if key == ord('q'):
        break
 
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Tracking a simple, marked object with OpenCV / Python - Built-in solutions

In the previous posts, we investigated different simple, custom object tracking methods. This series wouldn't be complete if I didn't mention the built-in object tracking algorithms in OpenCV.

I don't want to write too much about them, because there are plenty of tutorials (maybe the best one to start with is this). Instead, I would like to write about the reasons why I bothered to write my own object tracker (which even needs a strong light source) instead of just using one of the built-in solutions.

Well... The thing is that they don't really work, if the tracked object is fast. They are good and general, but they can only track objects with low speed (compared to the speed of the camera/computer).

If you are interested in the built-in algorithms, just read the article I linked. Maybe one of them works nicely in your application.
You can try them easily: the link contains a short source code where you can test all of them, and all of them are described there, so try them before you write your own tracker.
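
For completeness, here is a minimal usage sketch of the built-in tracker API (this assumes OpenCV 3.x with the contrib modules installed; it's not the code from the linked article):

import cv2

cap = cv2.VideoCapture(1)
ret, frame = cap.read()

bbox = cv2.selectROI('Tracking', frame, False)  # draw a box around the object by hand
tracker = cv2.TrackerKCF_create()               # KCF is one of several built-in trackers
tracker.init(frame, bbox)

while True:
    ret, frame = cap.read()
    ok, bbox = tracker.update(frame)            # ok is False if the tracker lost the object
    if ok:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('Tracking', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()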

Tracking a simple, marked object with OpenCV / Python - part III

In the two previous posts, two simple methods were shown for object tracking (one based on background detection and one based on ArUco), but for my application, neither of them is sufficient.
With the first method, I simply couldn't find the exact center of my object, and with the second one, the motion blur caused too much visual noise.
In this final chapter, I show you a method which takes advantage of saturation and can track a small, strong light source even with heavy blur.

The basic idea is that pure white is relatively rare even in a well-lit image. To produce it, you have to saturate your camera sensor, which can be done with a strong light source (for example, a strong LED). In this post, I will use the flash of my mobile phone.

We will start with the following code:


#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import cv2 
 
cap = cv2.VideoCapture(1) # This is my USB camera. 
                          # If you want to use your built-in camera,
                          # you probably have to write zero here

np.set_printoptions(threshold=np.nan) # If I print an array, I would like to see it...

showOriginal = True
pause = False

while(True):
    if not pause:
        # Capture frame-by-frame
        ret, frame = cap.read()
        #print(frame.shape) #480x640
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # I keep only the white pixels
        nonzeros = np.nonzero(thres)
        try:
            cx = int(nonzeros[1].mean()) # If we have an all-black image, this will cause ValueError
            cy = int(nonzeros[0].mean())
            print "The coords are:", cx, cy
        except ValueError:
            print "I can't find the object"
            cx, cy = 0,0
            valid = False
    
    if showOriginal:
        cv2.imshow('Main', frame)
    else:
        cv2.imshow('Main', thres)
        
    
    pressedKey = cv2.waitKey(1) & 0xFF  

    if pressedKey == ord('q'):
        break
    elif pressedKey == ord('s'):
        showOriginal = not showOriginal
    elif pressedKey == ord('p'):
        pause = not pause
        
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()


The important part of the code is the threshold, where I keep only the truly white pixels (intensity between 253 and 255).

Let's see how it works. In the first picture, you can see the raw image and the processed one:
You can see a mostly white image with a saturated light source. Around the light source, there is a white area. After the threshold, only the area of our light source remains white.
This looks fine: we can easily detect our object "marked" with the light source, we only have to find the mass center of the thresholded image. It's like our previous background-based algorithm, but it can handle a dynamic background and we can reliably find the same central point. Sadly, we are not done yet, because motion blur can cause us some more headache:
 
As you can see, motion blur appears here too, but thanks to our strong light source, we still get strong, detectable boundaries. We only have to decide where the light source is in the grabbed frame. We have three more or less reasonable positions for it:
First, let's talk about the green position. It can be good for some applications and it's easy to calculate, but it's definitely not the current position of the object. If you use it, you will measure the acceleration incorrectly. (Remark: the green point is not the mass center.)

If you want to know the exact position (and acceleration) of the object, you have to choose between the red and blue dots. Both of them can be valid. To choose between them, you have to know the previous position and the motion characteristics of the object. Usually, the farther one is the right choice, but sometimes the closer one can also be right (for example, if the tracked object makes small, fast circles).

In my application, the acceleration is not so important, but I need a simple, stable algorithm, so I will choose the green dot. (Finding the others is similar.)
For this, we have to reduce this long white stripe to a line and find its midpoint.

To get that line, we have to find the morphological skeleton of the stripe. Sadly, there is no built-in function in OpenCV for the morphological skeleton, but it's not so hard to implement (and there are plenty of implementations on the internet). I used this and this.

Here is our new, modified code which can show us the skeleton, too:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import cv2 
 
cap = cv2.VideoCapture(1) # This is my USB camera. 
                          # If you want to use your built-in camera,
                          # you probably have to write zero here

np.set_printoptions(threshold=np.nan) # If I print an array, I would like to see it...

show = 0
pause = False

def getSkeleton(img):
    " The input is a binary image. "
    originalInput = img # I don't want to modify the original image
    img = img.copy()

    size = np.size(img)
    skel = np.zeros(img.shape,np.uint8)
    element = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
    done = False
    
    # I added an additional step, because the original algorithm keeps parts of the contours.
    # I dilate the image first, so the resulting contours will be outside the original shape.
    # After the last step, I can eliminate them with a simple logical AND against the original input.
    img = cv2.dilate(img, np.ones((3,3)), iterations = 5 )
    
    while True:
        eroded = cv2.erode(img,element)
        temp = cv2.dilate(eroded,element)
        temp = cv2.subtract(img,temp)
        skel = cv2.bitwise_or(skel,temp)
        img = eroded.copy()
        
        zeros = size - cv2.countNonZero(img)
        if zeros==size:
            break
    
    return cv2.bitwise_and(skel, originalInput)

while(True):
    if not pause:
        # Capture frame-by-frame
        ret, frame = cap.read()
        #print(frame.shape) #480x640
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # Usually, it's easier to work with binary images
        thres = cv2.erode(thres, np.ones((3,3)), iterations = 10 ) # noise filtering
        skeleton = getSkeleton(thres)
    
    
    if show == 0:
        cv2.imshow('Main', frame)
    elif show == 1:
        cv2.imshow('Main', thres)
    elif show == 2:
        cv2.imshow('Main', skeleton)
    
    pressedKey = cv2.waitKey(1) & 0xFF  

    if pressedKey == ord('q'):
        break
    elif pressedKey == ord('s'):
        show = (show + 1) % 3
    elif pressedKey == ord('p'):
        pause = not pause
        
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

And here is an example:

We are almost done. We got a long line with some smaller branches. 

In our last step, we have to find its midpoint. Sadly, this can be pretty hard. If you want an accurate solution, you have to fit a function to all of the white pixels with a least-squares fit. There are built-in functions for this in numpy (polyfit) or in scipy. After this, you have to find the midpoint of the fitted curve and hope that you chose the right degree and that the branches don't cause too much error. In my experience, this solution is sensitive to its parameters and produces too much noise.
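
A minimal sketch of that least-squares idea (the function name and the default degree are my own guesses, and it expects the binary skeleton image produced by getSkeleton):

import numpy as np

def getMidpointByPolyfit(skeleton, degree=2):
    """ Fit y = f(x) to the skeleton pixels and take the point at the middle of the x range. """
    ys, xs = np.nonzero(skeleton)        # coordinates of the white skeleton pixels
    poly = np.polyfit(xs, ys, degree)    # least-squares polynomial fit
    # This breaks down for near-vertical stripes (many y values for the same x),
    # which is one reason why I don't use it.
    xMid = (xs.min() + xs.max()) / 2.0   # crude midpoint along the x axis
    yMid = np.polyval(poly, xMid)
    return xMid, yMid
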
For my application, I need a low-noise, stable solution which is not sensitive to its parameters, so I make a simplification.
As I mentioned before, the mass center of the thresholded image is not a valid solution. If you use it, you will get usable results only for linear movement. In circular movement, the mass center will be nearer to the rotation center than the tracked object is. The following figure shows the problem:
Here, the red dot marks the searched point, the green dot is the mass center and the blue dot is the rotation center. If we just calculate the mass center during the rotation, we will measure a lower speed.
So using the mass center as the position of the tracked object is a bad idea, but starting the search around the mass center can help us.
The idea is that the searched position (the red dot) must be an element of the skeleton, and it is close to the mass center. So if we find the point of the skeleton nearest to the mass center, we can use it as the solution.

Calculating the mass center is easy: you have to get all nonzero pixels from the thresholded image and calculate the mean of the X and Y coordinates:


    ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # Usually, it's easier to work with binary images
    thres = cv2.erode(thres, np.ones((3,3)), iterations = 10 ) # noise filtering
    nonzeros = np.nonzero(thres) # np.nonzero returns (row indices, column indices), i.e. (y, x)
    x = nonzeros[1]
    y = nonzeros[0]
    if np.size(x)>0:
        cX = x.mean()
        cY = y.mean()

Finding the point of the skeleton nearest to the mass center is a little bit more complicated. You can use two methods for it:

  1. You can create an image where the mass center is the only white pixel, dilate it, and check whether the dilated image and the skeleton have any common white pixel (with a bitwise AND). You keep dilating until you find a common pixel. This can be the winning solution if the skeleton contains lots of white pixels. (A sketch of this approach follows the list.)
  2. You can iterate through every pixel of the skeleton and calculate its distance from the mass center. In my typical use case this is the faster solution, so this is the one I will implement.
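
A rough sketch of the first, dilation-based approach (the function name and the iteration limit are my own, and it is not used in the final code below):

import numpy as np
import cv2

def findNearestSkeletonPointByDilation(skeleton, massCenter, maxIterations=100):
    """ Grow a single white pixel at the mass center until it overlaps the skeleton. """
    seed = np.zeros(skeleton.shape, np.uint8)
    seed[massCenter[1], massCenter[0]] = 255        # image indexing is (row, col) = (y, x)
    kernel = np.ones((3,3), np.uint8)
    for i in range(maxIterations):
        common = cv2.bitwise_and(seed, skeleton)
        if cv2.countNonZero(common) > 0:
            ys, xs = np.nonzero(common)
            return int(xs[0]), int(ys[0])           # any overlapping pixel is close enough
        seed = cv2.dilate(seed, kernel)
    return None
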
The final source code can be found here:


#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import math
import cv2 
 
cap = cv2.VideoCapture(1) # This is my USB camera. 
                          # If you want to use your built-in camera,
                          # you probably have to write zero here

np.set_printoptions(threshold=np.nan) # If I print an array, I would like to see it...

show = 0
pause = False

def getSkeleton(img):
    " The input is a binary image. "
    originalInput = img # I don't want to modify the original image
    img = img.copy()

    size = np.size(img)
    skel = np.zeros(img.shape,np.uint8)
    element = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
    done = False
    
    # I added an additional step, because the original algorithm keeps parts of the contours.
    # I dilate the image first, so the resulting contours will be outside the original shape.
    # After the last step, I can eliminate them with a simple logical AND against the original input.
    img = cv2.dilate(img, np.ones((3,3)), iterations = 5 )
    
    while True:
        eroded = cv2.erode(img,element)
        temp = cv2.dilate(eroded,element)
        temp = cv2.subtract(img,temp)
        skel = cv2.bitwise_or(skel,temp)
        img = eroded.copy()
        
        zeros = size - cv2.countNonZero(img)
        if zeros==size:
            break
    
    return cv2.bitwise_and(skel, originalInput)

def getMassCenter(img):
    nonzeros = np.nonzero(img)
    x = nonzeros[1]
    y = nonzeros[0]
    if np.size(x)>0:
        return int(x.mean()), int(y.mean())
    else:
        return None, None       
    
def getNonzeroPixels(img):
    nonzeros = np.nonzero(img)
    X = nonzeros[1]
    Y = nonzeros[0]
    for x, y in zip(X, Y):
        yield x, y 
    
def getDistanceSquare(a, b):
    return math.pow(a[0]-b[0], 2) + math.pow(a[1]-b[1], 2)
    
while(True):
    if not pause:
        # Capture frame-by-frame
        ret, frame = cap.read()
        #frame = cv2.imread("skeleton_1.png") 
        #print(frame.shape) #480x640
        # Our operations on the frame come here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        ret, thres = cv2.threshold(gray, 253, 255, cv2.THRESH_BINARY) # Usually, it's easier to work with binary images
        thres = cv2.erode(thres, np.ones((3,3)), iterations = 10 ) # noise filtering
        massCenter = getMassCenter(thres)        
        skeleton = getSkeleton(thres)
        
        if massCenter[0] is not None:
            closestPoint = (0,0)
            closestDistanceSquare = 99999999
            
            for p in getNonzeroPixels(skeleton):
                distanceSquare = getDistanceSquare(p, massCenter)
                if distanceSquare < closestDistanceSquare:
                    closestDistanceSquare = distanceSquare
                    closestPoint = p
            
            cv2.circle(frame, massCenter, 1, (0,255,0), 10)
            cv2.circle(frame, closestPoint, 1, (255,0,0), 10)
            cv2.circle(thres, closestPoint, 1, (127,), 10)
            cv2.circle(skeleton, closestPoint, 1, (127,), 10)
        
    if show == 0:
        cv2.imshow('Main', frame)
    elif show == 1:
        cv2.imshow('Main', thres)
    elif show == 2:
        cv2.imshow('Main', skeleton)
    
    pressedKey = cv2.waitKey(1) & 0xFF  

    if pressedKey == ord('q'):
        break
    elif pressedKey == ord('s'):
        show = (show + 1) % 3
    elif pressedKey == ord('p'):
        pause = not pause
        
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()


I tried to decompose it into smaller functions; I hope the code is clear enough to understand the basic ideas behind it.

Let's see how it works. Here, you can see a real image and our previous sample image.

As you can see, my original figure was a little bit exaggerated about the distance between the mass center (green) and the searched point (blue), but it still causes errors under the right conditions.

If your environment contains too much noise, you may have to modify this program and use filtering, blurring, or other algorithms, but the basic idea will work as long as you can mark your object with a strong light source.

Saturday, August 25, 2018

Tracking a simple, marked object with OpenCV / Python - part II

Continuing my previous post, I show you how you can track your object when it is marked with a QR-like code.
The biggest advantage of using such codes is that there are lots of free libraries for them.
I will use ArUco, which is a nice library for augmented reality applications. It can detect QR-code-like square markers and calculate their positions and orientations.

I won't give you an introduction to ArUco; you can learn the necessary information from this blog post, for example. I also used the code from there and printed some tags, but if you don't want to print them, you can use the display of your mobile phone.
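
For reference, the detection itself is only a few lines; here is a minimal sketch (the dictionary is the same DICT_6X6_250 I use elsewhere in this series, and the file name is just a placeholder):

import cv2
import cv2.aruco as aruco

frame = cv2.imread('marker_photo.png')                  # placeholder input image
aruco_dict = aruco.Dictionary_get(aruco.DICT_6X6_250)
parameters = aruco.DetectorParameters_create()
corners, ids, rejected = aruco.detectMarkers(frame, aruco_dict, parameters=parameters)

frame = aruco.drawDetectedMarkers(frame, corners, ids)  # draw the detections for a quick check
cv2.imshow('Detected markers', frame)
cv2.waitKey(0)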

Here is the used marker with a white boundary:

And in this image, you can see the detected marker:


This looks nice: the library found the rotated, perspective-distorted code, marked it, and it's really fast. Sadly, it has a problem if you use a common USB camera with a slow shutter: motion blur.

You can see the problem on this image, where I moved the paper:


So, in conclusion, you can use ArUco if you would like to track an object with low speed or with a good camera, but in my application, ArUco alone is not a good solution.

Tracking a simple, marked object with OpenCV / Python

In this post, I will cover a simple case of the object tracking problem, where I want to measure the speed of a robot. The robot is made by me, so it's markable and reshapeable. I want a simple, robust algorithm which can work with a simple USB webcam and on a low-cost computer.
Because of the speed measurement, it's important to find the exact coordinates of the robot. An approximate bounding box is not enough, because it would add too much error to the measurement.
The tracking can be divided into two parts:

  1. We have to find the (marked) robot on the first frame
  2. We have to follow the robot during fast movements
The first step is very easy, but the second can be hard because of the blur during the movement.
During this post, I will use Python 2.7 and OpenCV.

Using the difference between frames

Firstly, I simply detected the background and found the searched object from the difference. It's fast and robust, but it can be hard to ensure a static background: you have to stabilize the camera, keep the lighting constant and extract the background. Even if you do all of this, you still have to apply some filtering to the difference (between the background and the new frames).
Here is a simple code which prints the (assumed) center of your object:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import cv2

cap = cv2.VideoCapture(1) # This is my USB camera.
                          # If you want to use your built-in camera,
                          # you probably have to write zero here

background = None

np.set_printoptions(threshold=np.nan) # If I print an array, I would like to see it...

while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()
    #print(frame.shape) #480x640
    # Our operations on the frame come here
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if background is None:
        cv2.imshow('Main', gray)
    else:
        # np.abs(gray-background) is not good for the difference, because our array contains unsigned bytes.
        # If we calculate 1-3, we get 254 instead of -2.
        # But OpenCV has our back:
        diff = cv2.absdiff(gray, background)
        ret, thres = cv2.threshold(diff, 20, 255, cv2.THRESH_BINARY) # Usually, it's easier to work with binary images
        kernel = np.ones((3,3),np.uint8) # You can do some nice tricks with a good kernel matrix, but in this case
                                         # we just want to clear the single pixels
        filtered = cv2.erode(thres, kernel, iterations = 1)
        # Now, we find the mass center of the image
        nonzeros = np.nonzero(filtered)
        try:
            cx = int(nonzeros[1].mean()) # If we have an all-black image, this will cause ValueError
            cy = int(nonzeros[0].mean())
            print "The coords are:", cx, cy
        except ValueError:
            print "I can't find the object"
            cx, cy = 0, 0
            valid = False
        # I show every image, stacked vertically
        stacked = np.concatenate((background, gray, diff, filtered), axis=0)
        cv2.imshow('Main', stacked)
    pressedKey = cv2.waitKey(1) & 0xFF
    if pressedKey == ord('q'):
        break
    elif pressedKey == ord('b'):
        background = gray
# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()

You can see a sample output here, where the script finds a glass:


In this image, you can see the basic problems with this method.
Firstly, I had to create an image without my object. Sometimes, this can be hard.
Secondly, the object modified its environment: it created shadows and reflections.
And lastly, you can't find the same exact point of the glass twice with this method during movement.

Regardless of its drawbacks, this algorithm is simple and fast enough to use in some special cases. For my use case it wasn't good, because the moving robot changes its contours, which changes its center of mass, which added too much noise to my speed measurement.