Fingertip tracking has a number of compelling applications and is therefore of interest to many people. It is surprising, then, that there are so few high-level descriptions of techniques for it. Here I describe two methods for fingertip tracking: k-curvature and convexity defects. While this post is technical, it does not give implementation details. Rather, it is a high-level description of these two techniques.

The first step in tracking fingertips is locating the hand. This can be accomplished through foreground-background segmentation. There are a fair number of techniques for this kind of segmentation. Some of them are rather sophisticated. I have used simple background subtraction in the past. These details are beyond the scope of this post.
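For readers who have not seen background subtraction before, here is a minimal sketch of the idea. It assumes grayscale frames stored as nested lists of integer intensities and a fixed camera; the function name `subtract_background` and the threshold of 30 are my own illustrative choices, not part of any library.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs noticeably from the background."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy 3x4 grayscale images (hypothetical values). The bright pixels stand in for a hand.
background = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame = [
    [10, 200, 12, 10],
    [11, 210, 205, 10],
    [10, 10, 10, 9],
]

mask = subtract_background(frame, background)
for row in mask:
    print(row)
```

Small differences caused by sensor noise fall under the threshold and are ignored. A real pipeline would usually follow this with contour extraction to get the ordered outline points used in the rest of this post.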

When the hand is properly segmented from the background, its outline is represented by an ordered series of points, as illustrated below. One possible formulation of the goal of a fingertip tracking algorithm is to identify which of these points correspond to the fingertips. In other words, the goal is to identify the points marked red in the image below.

One way of finding these points is to find the convexity defects in this set of points.

A convex curve is one that bulges outward everywhere, in contrast to a concave curve, which bends inward. Given a curve described as a series of points, its convex hull is the smallest convex polygon that encloses all of those points, much like a rubber band stretched around them. A convexity defect is a stretch of the curve that dips inward, away from the hull. In other words, it is a run of points on the curve that are not on the convex hull.
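To make the idea of a convex hull concrete, here is a small sketch using Andrew's monotone chain algorithm. This is a standard textbook method that I am substituting for illustration; it is not necessarily what any particular library does internally. The toy outline is a hypothetical five-point "hand" with two fingertips and one valley between them.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last point while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical outline: palm corners, two fingertips, and the valley at (3, 6).
outline = [(0, 0), (6, 0), (5, 16), (3, 6), (1, 16)]
hull = convex_hull(outline)
print(hull)
```

The valley point `(3, 6)` lies inside the hull and so does not appear in the result, which is exactly what makes it part of a convexity defect.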

There are two reasons that convex curves are important to fingertip tracking. First, OpenCV has efficient algorithms for both determining the convex hull and finding its convexity defects. Second, the convexity defects of the points that represent the outline of a hand give clues as to the location of the fingertips. How is this accomplished?

Below is an illustration of the output of OpenCV's convexity defect function. It identifies those places on the curve that are concave. In the case of the hand, these are the spaces between fingers. Most importantly, each convexity defect starts and ends at a point on the convex hull, and these start and end points correspond to fingertip locations.
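Here is a rough pure-Python sketch of the same idea, so the mechanics are visible without any library. It is not OpenCV's implementation, and the names `convexity_defects` and `min_depth` are my own. It assumes the outline is an ordered list of `(x, y)` tuples and reports, for each gap between consecutive hull points, the start point, the end point, the deepest point, and its depth.

```python
import math


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices via Andrew's monotone chain (an illustrative choice)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity_defects(contour, min_depth=1.0):
    """Return (start, end, deepest_point, depth) for each inward dip of the contour."""
    hull = set(convex_hull(contour))
    # Indices of contour points that lie on the hull, in contour order.
    hull_idx = [i for i, p in enumerate(contour) if p in hull]
    n = len(contour)
    defects = []
    for j in range(len(hull_idx)):
        s, e = hull_idx[j], hull_idx[(j + 1) % len(hull_idx)]
        start, end = contour[s], contour[e]
        edge_len = math.hypot(end[0] - start[0], end[1] - start[1])
        if edge_len == 0:
            continue
        best_depth, best_point = 0.0, None
        i = (s + 1) % n
        while i != e:
            # Perpendicular distance from the contour point to the hull edge.
            depth = abs(cross(start, end, contour[i])) / edge_len
            if depth > best_depth:
                best_depth, best_point = depth, contour[i]
            i = (i + 1) % n
        if best_point is not None and best_depth >= min_depth:
            defects.append((start, end, best_point, best_depth))
    return defects


# Hypothetical outline: palm corners, two fingertips, and a valley between them.
outline = [(0, 0), (6, 0), (5, 16), (3, 6), (1, 16)]
defects = convexity_defects(outline)
for start, end, deepest, depth in defects:
    print("fingertip candidates:", start, end, "valley:", deepest, "depth:", depth)
```

On this toy outline there is a single defect whose start and end points are the two fingertips. The `min_depth` cutoff is there because real outlines are noisy and produce many shallow dips that have nothing to do with fingers.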

There is an important limitation of using convexity defects for fingertip tracking. Given a hand with a single extended finger, there are no gaps between fingers, and so none of the convexity defects this approach relies on. This means that this approach requires more than one finger to be extended to function properly.

Another method for fingertip tracking is called k-curvature. It is accomplished as follows:

For each point in the set of points describing a curve, the algorithm determines the angle between the two lines that start at the point in question and end k points away in either direction. This is illustrated above. If the angle is below a certain threshold, often 60 degrees, the point is marked as being of interest. The result of this process, with the outline of a hand as input, is illustrated below.
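The angle test can be sketched in a few lines. This assumes the outline is a closed, ordered list of `(x, y)` points, so indices wrap around at the ends; the function names are my own, and `k=1` is used only because the toy outline is so small. Real outlines have hundreds of points and need a much larger k.

```python
import math


def k_curvature_angle(contour, i, k):
    """Angle in degrees at contour[i] between the lines to the points k steps away."""
    n = len(contour)
    p = contour[i]
    a = contour[(i - k) % n]
    b = contour[(i + k) % n]
    v1 = (a[0] - p[0], a[1] - p[1])
    v2 = (b[0] - p[0], b[1] - p[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 180.0  # Degenerate case: treat coincident points as flat.
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # Guard against rounding error.
    return math.degrees(math.acos(cos_angle))


def k_curvature_candidates(contour, k, max_angle=60.0):
    """Indices of points whose k-curvature angle is below the threshold."""
    return [
        i for i in range(len(contour))
        if k_curvature_angle(contour, i, k) < max_angle
    ]


# Hypothetical outline: palm corners, two fingertips, and a valley between them.
outline = [(0, 0), (6, 0), (5, 16), (3, 6), (1, 16)]
candidates = k_curvature_candidates(outline, k=1)
print(candidates)
```

On this outline the two fingertips (indices 2 and 4) and the valley between them (index 3) all pass the test, because the angle alone says how sharp a corner is, not which way it points.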

At this point, the algorithm has not distinguished between the spaces between fingers and the fingertips themselves. Often, implementations look for those places in which the points k away are lower in height than the point under consideration. This works for upward pointing hands, but not downward pointing hands.

Another way of distinguishing fingertips from the spaces between fingers is to keep only those points that are further from the centroid than the points k away. (The centroid is the average position of all the points in the set, which lands roughly in the center of the hand.) This allows for detecting fingertips in a downward pointing hand configuration.
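A minimal sketch of the centroid test follows. It assumes a closed, ordered outline of `(x, y)` points and a list of candidate indices such as a k-curvature pass might produce; the candidate list here is hard-coded for illustration, and the function names are my own.

```python
import math


def centroid(points):
    """Average position of all the points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def keep_fingertips(contour, candidates, k):
    """Keep candidates that are further from the centroid than both points k away."""
    cx, cy = centroid(contour)
    n = len(contour)

    def dist(j):
        x, y = contour[j % n]
        return math.hypot(x - cx, y - cy)

    return [
        i for i in candidates
        if dist(i) > dist(i - k) and dist(i) > dist(i + k)
    ]


# Hypothetical outline: palm corners, two fingertips, and a valley between them.
outline = [(0, 0), (6, 0), (5, 16), (3, 6), (1, 16)]
# Assumed output of an earlier sharp-angle pass: both fingertips and the valley.
candidates = [2, 3, 4]

fingertips = keep_fingertips(outline, candidates, k=1)
print([outline[i] for i in fingertips])
```

The valley at index 3 is dropped because it sits closer to the centroid than its neighbours. Since the test only uses distances, it gives the same answer if the whole outline is rotated or flipped upside down.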

These techniques are far from perfect. The hand easily succumbs to occlusion, meaning that it is easy to hide any given fingertip behind another part of the hand. Further, these techniques are two-dimensional and unable to detect, for example, a finger pointing directly at the camera. Some of these imperfections might be remedied with probabilistic tracking techniques such as particle filters. Others might be remedied with 3D cameras. Ultimately, fingertip tracking using cameras has not been perfected.