Fast Object Tracking

Purpose: Track the motion of a tennis ball in real-time using a webcam.



Why bother?

Being able to quickly track an object opens up interesting possibilities within the realm of Human-Computer Interaction. As an example, any trackable object can quickly and easily be used as a mouse replacement - creating a direct relationship between the object's position and the placement of the mouse cursor. The implementation that I chose was done so specifically to fulfill the role of creating a digital goalie to be paired with a non-digital hockey game.
 


Equipment:

Note: OpenCV is not required for this project, I merely use it to grab frames from the webcam.


Synopsis

Generally speaking, these are the steps that I take in determining the location of the ball, taking a webcam frame as input:
  1. Generate Heuristic for Object Identification
  2. Perform automated background removal
  3. Compute Integral Image
  4. Isolate region of highest intensity in image
  5. Compute coordinates of actual object based on intensity data


Heuristic Generation

This is the weakest part of this algorithm, as for my simple example I had control over a number of conditions of my environment. Shortcomings here include dependency on consistent lighting throughout execution, as well as vulnerability to lose objects that have shadows cast on them. There are numerous ways of doing this, most of them more robust than this- however, this is very fast and suited the need for my application. Benefits of this simple technique are that it's simple (and thus easy to implement), and also that it responds very well to near total occlusion (try covering the tennis ball with your hand, leaving only small gaps between your fingers).
  1. Take a picture of your object (no flash) in the lighting conditions in which you will be using the algorithm.
  2. Crop out all but a rectangle encompassing the middle of the object (example: )
  3. Take some piece of color-based information from the sample given. In my example I was using the color average of the entire tile. This works due to the homogeneity of color inherent to tennis balls. To avoid the lighting dependency I considered using the ratios of the channels of the color averages against eachother, but ultimately never ended up implementing it.

    I'm sure using an image derivative (of a more complex or patterned object) could be utilized for this, comparing/evaluating the distance of corners/edges between the control (sample) image and the live feed. This would have been entirely over-engineering a solution to the problem I faced and I have thus left it to theory.


Background Elimination

Now that we have an idea of what our object "looks like" to the computer, we can proceed to eliminate anything that doesn't fit the heuristic we're using. When using something as simple as a color average, you need only set every pixel outside TOLERANCE distance from TARGET (color average) color to 0. TOLERANCE can vary depending on your conditions, I would recommend either writing a calibration utility or merely looking at the background-removal function's results and tinkering with it iteratively.

For those merely following what I've done to duplicate the effect: Zero out all pixel values (across all channels) that do not come within your specified tolerance of your color average.
 


Integral Image

Now, take the image that had background processing applied to it and compute the integral image. Essentially, this means that (starting from the top left at position (0,0)) the value at (n,n) = (n-1,n) + (n,n-1) - (n-1,n-1). If (n-1,n), (n,n-1), or (n-1,n-1) does not exist, assume it to be 0.

This provides a convenient starting step for the intensity isolation method used, as it allows rapid calculation of the intensity of a rectangular area of the image. (MUL is expensive, ADD is not)
 


Intensity Isolation ("Shifting and Constricting Window" Algorithm)

  1. Set initial conditions:
    • Minimum Window Size
    • Window Size Stepdown increment
  2. Set window size to capture_size
    • Set intensity to 1, prevIntensity to 0
  3. Loop while the current intensity is greater than prevIntensity*(1-tolerance) and the windowSize is greater than the minimum window size
    1. Decrement the window size (make sure window size doesn't go beneath minwinsize)
    2. For each possible non-overlapping window position:
      1. Calculate average (per-pixel) intensity of the window using the integral image
      2. If the average intensity is greater than the previous intensity*(1-tolerance)
        • Set prevIntensity to Intensity
        • Set intensity to the average intensity value
        • Store top left point of the window coordinates
        • Store current window size [Note: I was tracking a tennis ball, which is theoretically spherical and can thus be well contained within a square. You can generalize this algorithm to handle rectangles by adding another dimension of window size and constricting appropriately. More explanation on this topic can be found at the end of the description.]
  4. If the window size is greater than some max_size or the intensity is less than some amount proportional to the window size, return a window of size 0. Otherwise, return top left point coordinates and window size information

 


3D Coordinate Translation

Given some information about the true size of an object (width and height in pixels at a fixed distance from the camera), it is certainly possible to measure the true 3D location of the object. Since you are given the object's center and size with this algorithm, it is just a matter of writing a function that takes the size of the fully constricted window as input and outputs a distance 'z'. Your object will then be found within the cube at (Window.x + WindowSizeX/2, Window.y + WindowSizeY/2, z).