Thomas Royal

Composer | Pianist | Technologist

A Week of Computer Vision Implementation Failures with One Success

May 24, 2012

I have been grappling with the linkage between sound and image as practiced in computer music performance. The idea is to foster a sense of a world set apart from our own by linking movement, vision, and sound. To that end, I have been investigating a number of computer vision techniques for extracting meaningful data from a live video stream. (Warning: this is technical.)

Lucas-Kanade Optical Flow

For some time, my use of OpenCV was limited to other people's implementations of object tracking with Haar classifiers and blob tracking. I am impressed with the robustness of the Haar tracker, but the technique is expensive: beyond its computational cost, training a classifier to recognize a particular object can take a week. Blob tracking is useful but limited, and a week ago I did not know enough to build anything more robust than simple background subtraction.

The first OpenCV function I found potentially useful in artistic contexts was calcOpticalFlowPyrLK(). Given two frames of video and a set of landmarks corresponding to points of interest in the first frame, calcOpticalFlowPyrLK() returns a second set of points representing where those landmarks appear in the second frame. Essentially, these point pairs describe detected movement. I used these measurements of movement to create the video above.
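
Below is a minimal sketch of this loop using OpenCV's Python bindings: pick corners with goodFeaturesToTrack(), then follow them from frame to frame. The window size, pyramid depth, and feature parameters are illustrative guesses, not tuned values.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)  # live camera stream

    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Pick landmarks (corners) on the first frame to track.
    points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                     qualityLevel=0.3, minDistance=7)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # new_points[i] is where points[i] landed on the new frame;
        # status[i] is 1 if the point was found, 0 if it was lost.
        new_points, status, err = cv2.calcOpticalFlowPyrLK(
            prev_gray, gray, points, None,
            winSize=(15, 15), maxLevel=2)

        found = status.flatten() == 1
        motion = new_points[found] - points[found]  # per-point movement

        prev_gray = gray
        points = new_points[found].reshape(-1, 1, 2)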

Optical Flow Object Tracking

Having discovered the optical flow function, I wondered whether I could find a method for tracking objects that was less expensive than the Haar classifier. I found this project report that describes a method of tracking objects using optical flow. I tried to implement something similar, using a moving mask to constrain the landmark detector. This produced interesting results. Unfortunately, I had significant problems with features drifting and the tracker losing the object.
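
The heart of the idea, as I implemented it, is to re-detect landmarks only inside a rectangle that follows the current point cloud, so that fresh features stay attached to the object. A hedged sketch, with the box size and detector parameters invented for illustration:

    import cv2
    import numpy as np

    def refresh_features(gray, points, half=40, max_corners=50):
        """Re-detect corners inside a box centered on the tracked points."""
        cx, cy = points.reshape(-1, 2).mean(axis=0)
        x0, y0 = int(cx - half), int(cy - half)
        x1, y1 = int(cx + half), int(cy + half)

        # The "moving mask": white only where the object currently is.
        mask = np.zeros_like(gray)
        mask[max(y0, 0):y1, max(x0, 0):x1] = 255

        return cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                       qualityLevel=0.3, minDistance=5,
                                       mask=mask)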

Optical Flow Object Tracking with Color Histogram Reinforcement

I decided to try to implement Mathias Kölsch and Matthew Turk's "Flocks of Features" method for object tracking, as described here. To say the least, it was a learning experience. For the most part, I am not sufficiently experienced in computer vision and mathematics to implement a technique directly from a paper. While the accompanying video shows some successful tracking, the flock can be seen drifting from the object.
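
My reading of the method's maintenance step is roughly this: features that stray too far from the flock's median are respawned at nearby pixels that score highly against a color histogram of the object, computed once at initialization (e.g., a normalized hue histogram from cv2.calcHist). The sketch below uses cv2.calcBackProject as that color cue; the distance threshold and search box are guesses, not values from the paper.

    import cv2
    import numpy as np

    def rein_in_flock(points, frame_hsv, hist, max_dist=60):
        """Replace outlier features using color-histogram back-projection."""
        pts = points.reshape(-1, 2)
        median = np.median(pts, axis=0)

        # Per-pixel probability of matching the object's hue histogram.
        prob = cv2.calcBackProject([frame_hsv], [0], hist, [0, 180], 1)

        for i, p in enumerate(pts):
            if np.linalg.norm(p - median) > max_dist:
                # Respawn the stray near the median, at the most
                # object-colored pixel in a small search box.
                x0, y0 = (median - max_dist / 2).astype(int)
                patch = prob[max(y0, 0):y0 + max_dist,
                             max(x0, 0):x0 + max_dist]
                if patch.size:
                    dy, dx = np.unravel_index(patch.argmax(), patch.shape)
                    pts[i] = (max(x0, 0) + dx, max(y0, 0) + dy)

        return pts.reshape(-1, 1, 2).astype(np.float32)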

While I am confident that I could develop a better implementation, the process of recording the video made me question my motives. I did not see a musical application for tracking the position of arbitrary objects. Generally, one's interaction with a musical instrument is more sophisticated than merely placing it in space. While crafting metaphors to musical instruments may be limiting, I can say that, as a musician, I found merely moving objects around a 3D space less than inspiring.

Optical Flow Musical Instrument - Superclusterfon

This is perhaps one of my more successful experiments to date. In it, I use the optical flow algorithm to detect motion. Whenever the slightest amount of motion occurs, two things happen. One, a "galaxy" appears, moving at the detected rate of motion. Two, one of several pitches is given a little volume. The speed of motion determines the pitch; faster movements produce different pitches, not necessarily higher or lower ones. There is also a process of accumulation: the more movement there is at a particular speed, the louder that pitch becomes. (This is admittedly not all that evident.)
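
A toy sketch of this mapping, with the pitch set, speed bins, jitter threshold, and decay rate all invented for illustration. Binning speed into pitch indices, rather than scaling pitch with speed, is what makes faster movement merely different rather than higher.

    import numpy as np

    PITCHES = [220.0, 277.2, 329.6, 392.0, 466.2]  # Hz, arbitrary set
    amps = np.zeros(len(PITCHES))                  # per-pitch loudness

    def on_motion(flow_vectors, gain=0.05, decay=0.98):
        """flow_vectors: (N, 2) array of per-landmark displacements."""
        global amps
        amps *= decay                              # everything fades
        speeds = np.linalg.norm(flow_vectors, axis=1)
        for s in speeds[speeds > 0.5]:             # ignore jitter
            # Each speed band owns one pitch, so faster movement is a
            # *different* pitch, not a higher one.
            idx = min(int(s // 2.0), len(PITCHES) - 1)
            amps[idx] = min(amps[idx] + gain, 1.0) # accumulate loudness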

The process of crafting this taught me several important things about the linkage between physical movement and music:

  1. To be interesting, linkages between sound and movement must connect significant differentiation in movement with significant differentiation in sound.
  2. Analogous physical movements must be reflected in analogous sounds, allowing for surprising linkages.
  3. Differentiation in scale should be seen and heard, meaning that slightly different movements should produce slightly different sounds and largely different movements should produce largely different sounds.
  4. Difference must be immediately perceptible at all scales.
  5. Much fun is to be had in breaking these silly principles.

This particular implementation could be improved by detecting different kinds of movement and reflecting them in more highly differentiated sound processes. Differentiating movement has proved challenging. With the release of more refined sensors, this may become easier. Until then, the grind continues.