Sunday, August 26, 2018

Calculating the real-world position of the tracked object

In the previous posts, I showed a simple but robust object tracking algorithm which can track a robot with a cheap camera.
The problem is that the camera is usually not mounted on the ceiling, so the plane the robot moves on is not parallel to the camera's image plane: the camera sees the floor from an oblique angle, in a perspective view.

With a tracker we can get the position of our robot in the coordinate system of the monitor.
This is useless in most real-life applications; we have to convert these coordinates somehow into real-world coordinates.

Theoretically, this is a very easy, basic task. We have two 2D coordinate systems (the monitor and the floor). We only need a transformation matrix, and we have to multiply the monitor coordinates of the robot by it.
Calculating the transformation matrix is also easy. It's taught in most universities in the first semester, and OpenCV contains a function for it.

The cv2.getPerspectiveTransform(pts1, pts2) function needs an array with four points in the floor coordinate system (pts1) and their coordinates in the monitor coordinate system (pts2). The return value is a 3x3 matrix which maps pts1 coordinates to pts2 coordinates (if you don't know why we need a 3x3 matrix for a 2D transformation, but you are curious, read this). The returned M matrix can be used by cv2.warpPerspective(img, M, (width, height)), which warps an image from the pts1 coordinate system into the pts2 one; to warp the camera image onto the floor you would simply swap the two point sets (or pass the cv2.WARP_INVERSE_MAP flag).
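Here is a minimal sketch of how that looks; the point coordinates and the 'frame.png' file name are made-up example values:

import numpy as np
import cv2

# Hypothetical reference points: where they are on the floor (in cm)
# and where the camera sees them (in pixels)
floorPts = np.float32([[0, 0], [100, 0], [100, 100], [0, 100]])
imagePts = np.float32([[212, 80], [489, 95], [505, 372], [190, 350]])

M = cv2.getPerspectiveTransform(floorPts, imagePts)  # maps floor -> image

# Warp the camera frame onto the floor plane (the opposite direction),
# so the inverse mapping flag is needed; the output here is 100x100 px, i.e. 1 px per cm
frame = cv2.imread('frame.png')
topView = cv2.warpPerspective(frame, M, (100, 100),
                              flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)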

This is easy; it's not really worth a blog post. You only have one problem: how will you get your four points? The easiest solution is to mark four points on the floor and measure them, so you know their floor coordinates. At the start of each measurement you mark these points manually (or find them automatically) on the image, which gives you their monitor coordinates.

The problem is that placing the marks accurately on the floor is time-consuming, and you have to be precise. Finding these marks automatically on the camera frame can be hard, because your algorithm must be precise (and fast, since you have to run it periodically). The manual marking is also problematic, because during a long measurement the camera will move slightly, which causes an offset error.

In this blog post, I will show you an easy method based on ArUco which solves this problem easily. I already wrote about ArUco; there we saw that ArUco can recognize its codes even from an oblique, perspective view. This is what we will exploit.

The aruco.detectMarkers function has three return values: the corners of the found markers, their IDs and the rejected points. We will use the corners as fixed points: we know their coordinates in the floor coordinate system (because we know the exact size of the marker) and in the monitor coordinate system.
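One detail worth knowing: detectMarkers returns the four corners of each marker clockwise, starting from the marker's top-left corner, so the floor coordinates must be listed in the same order. A minimal sketch, assuming a 20 cm marker whose top-left corner is the origin of the floor coordinate system:

corners, ids, rejected = aruco.detectMarkers(frame, aruco_dict, parameters=parameters)
if len(corners) > 0:
    screenPts = corners[0][0]                  # 4x2 array: TL, TR, BR, BL corners in pixels
    floorPts = np.float32([[0, 0], [20, 0],    # the same four corners in cm,
                           [20, 20], [0, 20]]) # listed in the same TL, TR, BR, BL order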

We have only one technical problem left: we can't use warpPerspective, because we don't want to display or process the transformed image; we only need the floor coordinates of the tracked object. Fortunately, it's easy to use the matrix returned by cv2.getPerspectiveTransform(pts1, pts2) directly.
The details are on the manual page, but here is a simple function which does it:

def convertFromScreenToFloorCoordSystem(x, y, M):
    """ 
    x and y are in screen coordinates,
    M is the return value of getPerspectiveTransform
    """
    # M maps floor -> screen coordinates, so invert it to go from screen to floor
    M_inv = np.linalg.inv(M)
    
    # Apply the projective transformation: divide by the third (homogeneous) coordinate
    X = (M_inv[0,0]*x + M_inv[0,1]*y + M_inv[0,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])
    Y = (M_inv[1,0]*x + M_inv[1,1]*y + M_inv[1,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])
    
    return X, Y
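As a side note, OpenCV can also do this point conversion for you: cv2.perspectiveTransform applies a 3x3 matrix to a list of points. A sketch of an equivalent function (the name is mine, the input shape requirement comes from OpenCV):

def convertFromScreenToFloorCoordSystem2(x, y, M):
    """ Same conversion, but letting OpenCV do the math. """
    pt = np.float32([[[x, y]]])  # perspectiveTransform expects shape (N, 1, 2)
    X, Y = cv2.perspectiveTransform(pt, np.linalg.inv(M))[0, 0]
    return X, Y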

Of course, if you would like to use this method for measurements, you have to use as big a marker as you can to minimize the error.

And as a working example, here is a program which calculates the floor coordinates of your mouse pointer when the camera sees the marker.


#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import cv2
import cv2.aruco as aruco
 
 
cap = cv2.VideoCapture(1)  # camera index; it may be 0 on your machine


mouseX = 0
mouseY = 0

def mouseHandler(event, x, y, flags, param):
    global mouseX,mouseY
    if event == cv2.EVENT_MOUSEMOVE:
        mouseX, mouseY = x, y

def convertFromScreenToFloorCoordSystem(x, y, M):
    """ 
    x and y are in screen coordinates,
    M is the return value of getPerspectiveTransform
    """
    M_inv = np.linalg.inv(M)
    
    X = (M_inv[0,0]*x + M_inv[0,1]*y + M_inv[0,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])
    Y = (M_inv[1,0]*x + M_inv[1,1]*y + M_inv[1,2]) / (M_inv[2,0]*x + M_inv[2,1]*y + M_inv[2,2])
    
    return X, Y


cv2.namedWindow('image')
cv2.setMouseCallback('image', mouseHandler)

# The dictionary and the detector parameters don't change, so create them once
aruco_dict = aruco.Dictionary_get(aruco.DICT_6X6_250)
parameters = aruco.DetectorParameters_create()

while True:
    ret, frame = cap.read()

    corners, ids, rejectedImgPoints = aruco.detectMarkers(frame, aruco_dict, parameters=parameters)
   
    if len(corners) > 0:
        # Draw the detected marker for debugging
        frame = aruco.drawDetectedMarkers(frame, corners)
        floorCoords = np.float32([[20, 0], [20, 20], [0, 20], [0, 0]]) # my marker is 20 cm x 20 cm
        screenCoords = corners[0][0] # I use the first found marker

        M = cv2.getPerspectiveTransform(floorCoords, screenCoords)
        print(convertFromScreenToFloorCoordSystem(mouseX, mouseY, M))
        
        
    cv2.imshow('image', frame)
    key = cv2.waitKey(1) & 0xFF 
    if key == ord('q'):
        break
 
# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()
