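Human gesture serves three functional roles [Cad94]: semiotic (communicating meaningful information), ergotic (manipulating and creating artefacts), and epistemic (learning about the environment through tactile exploration).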
All three functions may be augmented with an instrument. Examples include a handkerchief for the semiotic good-bye gesture, a turntable for the ergotic shaping gesture of pottery, and a dedicated artefact for exploring the world (for example, a force-feedback system such as the pantograph [Ram94], used to sense the invisible).
In Human-Computer Interaction, gesture has been exploited primarily for its ergotic function: typing on a keyboard, moving a mouse, and clicking buttons. The epistemic role of gesture has emerged from pen computing and virtual reality: ergotic gestures applied to an electronic pen, a data-glove, or a body-suit are transformed into meaningful expressions for the computer system. Special-purpose interaction languages have been defined, typically 2-D pen gestures as in the Apple Newton, or 3-D hand gestures to navigate in virtual spaces or to control objects remotely [BBL93].
Unlike the electronic pen and the keyboard, which both have non-computerized counterparts, mice, data-gloves, and body-suits are ``artificial add-ons'' that wire the user down to the computer. They are not real end-user instruments (as a hammer would be), but convenient tricks that let computer scientists sense human gesture.
We claim that computer vision can transform ordinary artefacts and even body parts into effective input devices. Krueger's seminal work on the videoplace [Kru93] and, more recently, Wellner's concept of the digital desk [WMG93] show that the camera can be used as a non-intrusive sensor for human gesture. However, to be effective, the processing behind the camera must be fast and robust. The techniques used by Krueger and Wellner are simple concept demonstrations: they are fast but fragile, and work only within highly constrained environments.
We are exploring advanced computer vision techniques to observe human gesture non-intrusively, in a fast and robust manner. In the next section, we present FingerPaint, an experiment in the use of cross-correlation as a means of tracking natural pointing devices for a digital desk. By ``natural pointing device'', we mean a bare finger or any real-world artefact such as a pen or an eraser.