Let Your Fingers Do the Singing

Prof. Bob Pritchard and Prof. Sidney Fels have created one of the few systems in the world that translates hand gestures to digitally synthesized speech and song - photo by Martin Dee

UBC Reports | Vol. 53 | No. 11 | Nov. 1, 2007

By Julie-Ann Backhouse

Over 200 years ago, Wolfgang von Kempelen created a manually operated speech machine. It produced spoken words by pumping a bellows and shaping the air through tubes into vowels and consonants. His talking device from the 1770s is considered the start of speech synthesis and inspired a line of successors curious about generating speech with artificial vocal tracts.

Now UBC researchers have created a new system that translates hand gestures to speech using a computerized glove. It is one of the few gesture-controlled systems in the world that can create digitally synthesized speech and song with a wave of the hands.

The project, Gesturally Realized Audio, Speech and Song Performance (GRASSP), is led by composer and music professor Bob Pritchard of the UBC School of Music. It investigates how sound can be shaped and how speech or song can be produced using hand gestures and technology.

“As an artist I’m interested in fresh ways of expressing human emotion and how we understand the human condition,” said Pritchard. “This gesture-controlled system is not unlike conducting an orchestra, adding elements and moving sound around.”

Collaborating with Pritchard is UBC Prof. Sidney Fels, Director of the Media and Graphics Interdisciplinary Centre (MAGIC). Fels, a professor of computer and electrical engineering, first developed the gesture-controlled speech system called Glove-Talk.

With GRASSP, musicians or performers use sensitized gloves to control and create speech, song and electro-acoustic sounds via software that models the vocal tract. Through specific hand movements, they can also control the processing of multi-channel sound from other acoustic and digital instruments.

This gesture-based system gives musicians and performers access to an unlimited range of sounds and words not available with traditional text-to-speech synthesizers. It also allows greater pitch variation and integrates visuals with vocal expression.

It takes about 100 hours to learn to use the gloves. Once trained, performers use all 10 fingers and a foot pedal to produce vowels, consonants and other vocal sounds, and to control pitch and volume.

“The tipping point comes when the vocalist, or musician, starts to get really expressive with it,” notes Fels. “At that point it becomes integrated into the person, part of the performance, and is no longer only technology.”

“A gesture-based system expands options for performers, allowing them to move sound around the stage, or to develop the performance for a specific site, or to activate moving and still images,” says Pritchard. It is anticipated that this gesture-controlled system will soon include features to activate synthetic faces, kinetic sculptures, or moving robots, for interactive performances.

The researchers are currently refining GRASSP on many fronts: making the system portable; adding adaptive features to allow for unique expressive styles; working with a textile artist from the Emily Carr School of Art + Design to address aesthetic elements of the gloves; and collaborating with UBC linguistics professor Eric Vatikiotis-Bateson to analyze voice production.

The research team plans to extend the system to facial gestures, allowing performers to produce digitized sound that models throat and mouth movements.

“Music is about shaping sound, forming a continuous sonic wave, and science has tried to artificially reproduce sound for a long time — it’s the basis of modern communication,” says Fels. “This takes it a little further.”

Both Pritchard and Fels are members of the UBC Media and Graphics Interdisciplinary Centre (MAGIC) and the UBC Institute for Computing, Information & Cognitive Systems (ICICS).

This project received funding from the Social Sciences and Humanities Research Council of Canada (SSHRC) and has recently secured joint funding from the Natural Sciences and Engineering Research Council (NSERC) and the Canada Council for the Arts.

How Does It Work?

GRASSP uses several input devices: a Cyberglove from Immersion Corp. to shape vowel and some consonant sounds, a self-made contact-sensitive glove to control stop sounds such as B, D and G, a Polhemus Fastrak™ to control vowel sounds, and a foot pedal to control volume.
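
The device list above can be read as a routing table from sensors to synthesizer controls. The Python sketch below is a hypothetical illustration of that routing, not GRASSP's actual software. The `InputFrame` structure, the 0.0 to 1.0 sensor normalization, the `vowel_formants` and `map_frame` helpers, and the idea of blending four corner vowels by hand position are all assumptions made for illustration. The corner values are approximate average first and second formant frequencies (F1, F2) for adult male speakers, as commonly tabulated in phonetics references, not GRASSP calibration data.

```python
from dataclasses import dataclass

# Hypothetical snapshot of the four GRASSP input devices at one instant.
# Field names and 0.0-1.0 normalization are assumptions for illustration.
@dataclass
class InputFrame:
    glove_flex: list[float]          # Cyberglove bend sensors, 0.0 (straight) to 1.0 (bent)
    contacts: dict[str, bool]        # contact glove: stop consonant -> contact closed?
    tracker_xy: tuple[float, float]  # Fastrak hand position: x front->back, y low->high
    pedal: float                     # foot pedal, 0.0 (up) to 1.0 (fully down)

# Approximate (F1, F2) formant targets in Hz for four corner vowels.
VOWELS = {"i": (270.0, 2290.0), "u": (300.0, 870.0),
          "ae": (660.0, 1720.0), "a": (730.0, 1090.0)}

STOPS = ("B", "D", "G")  # stop consonants handled by the contact glove


def clamp(v: float) -> float:
    """Keep a sensor reading inside the assumed 0.0-1.0 range."""
    return min(max(v, 0.0), 1.0)


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation from a (t=0) to b (t=1)."""
    return a + (b - a) * t


def vowel_formants(x: float, y: float) -> tuple[float, float]:
    """Bilinearly blend the corner vowels by hand position.

    x: 0.0 = front vowels, 1.0 = back vowels; y: 0.0 = low, 1.0 = high.
    """
    x, y = clamp(x), clamp(y)
    low = [lerp(f, b, x) for f, b in zip(VOWELS["ae"], VOWELS["a"])]
    high = [lerp(f, b, x) for f, b in zip(VOWELS["i"], VOWELS["u"])]
    return (lerp(low[0], high[0], y), lerp(low[1], high[1], y))


def map_frame(frame: InputFrame) -> dict:
    """Route one input frame to hypothetical synthesizer controls."""
    f1, f2 = vowel_formants(*frame.tracker_xy)
    # If several fingers touch at once, the first stop in STOPS wins.
    stop = next((s for s in STOPS if frame.contacts.get(s, False)), None)
    # Mean finger bend as a stand-in "constriction" amount for consonant shaping.
    flex = frame.glove_flex
    constriction = sum(clamp(v) for v in flex) / len(flex) if flex else 0.0
    return {"f1": f1, "f2": f2, "stop": stop,
            "constriction": constriction, "volume": clamp(frame.pedal)}


if __name__ == "__main__":
    frame = InputFrame(glove_flex=[0.1, 0.3], contacts={"D": True},
                       tracker_xy=(0.0, 1.0), pedal=0.8)
    print(map_frame(frame))
```

Keeping each device in its own branch of `map_frame` means one input can be recalibrated or swapped without touching the others. A real performance system would presumably also smooth readings over time and adapt to each performer's style, which this sketch does not attempt.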
