The road to loquacious instrumentation is rough but passable
Companies are constantly looking for ways to differentiate their wares, and an
increasingly common feature vendors use towards that goal is making products talk. At
first the task seems easy: "It can't be that hard," they reason, "Why, I
can buy greeting cards that talk, picture frames that record and play back messages, even
message recorders the size of a credit card!" The problem, though, is that adding a
voice isn't always the easy and inexpensive thing that most people believe it is. If
you're contemplating such an addition to a product, this column and its sequel will
cover many of the issues involved in implementing this technology.
Behind the Scenes
The real challenge is remembering that speech generation is more than a technical issue.
Our eons of socialization have led us to associate speech with humans, so people are very
sensitive to a device speaking to them. For instance, remember the auto warning
announcements and the rebellions that occurred over their whiny, irritating
pronouncements? However, a voice at the right time and in the right place can be
effective. Recently, two of my sons got a tour of the cockpit of a 737-400 (I was peeking
over their shoulders). The pilot went through all the controls, bells, whistles, lights,
and voices. The voice and words sounded right in the cockpit, and I could imagine that
they would really help in anxious situations. However, I also wondered how much time and
money the manufacturer spent getting the voices just right for that application.
Therefore, before I add voice to a design, I try to find out as much as I can about the
application so that the design fits the client's needs. The following are some of the
questions for which I make sure I get answers:
- How do I record the message?
- How will I store the audio message in the product?
- How loud should the audio message play back?
- Should the volume be user adjustable?
- How should the product handle multiple languages?
- What fidelity or tonal quality should the audio have?
- What mood or instructions should the message content convey?
- What are appropriate message lengths?
You can see from this list that adding a voice to a product isn't so simple, and every one
of the answers affects the product's design and production cost. At my firm, talking products
don't fall into the mass-production category. They typically sell in quantities
of hundreds to thousands of units a year, and that volume really limits how we
implement a voice solution. With this aspect in mind, let me answer a few of the previous
questions with my observations.
Recording -- I've taken the tack that recording should be easy and
also easily available. Most of my clients would rather record their own people speaking
(low cost) than hire a professional announcer. Recording these voices in the
Windows .wav PCM format allows people to record themselves on a PC or to use files
from a recording studio. Also, decoding a .wav file in an embedded program is a
straightforward exercise that allows easy programming of an audio message into the
Message Storage -- The .wav file is linearly arranged and uses PCM
encoding, so the data need no manipulation before replay. I've been using flash memory for
the storage, so we can easily reprogram the embedded system for other messages or other
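As a rough illustration of how little work decoding a PCM .wav file takes, the sketch below walks the RIFF chunks and pulls out the format fields and the raw sample bytes. It is written in Python for readability rather than as embedded firmware, and it assumes an uncompressed PCM file (format tag 1). The names `parse_wav` and `make_test_wav` are hypothetical; the second exists only to build a small file in memory to try the parser on.

```python
import io
import struct
import wave


def parse_wav(data: bytes) -> dict:
    """Walk the RIFF chunks of a PCM .wav image; return format and samples."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    fmt = None
    samples = None
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            # tag, channels, sample rate, byte rate, block align, bits/sample
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", body, 0)
            if tag != 1:
                raise ValueError("this sketch handles uncompressed PCM only")
            fmt = {"channels": channels, "sample_rate": rate, "bits": bits}
        elif chunk_id == b"data":
            samples = body  # linear PCM, ready to play back in order
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    if fmt is None or samples is None:
        raise ValueError("missing fmt or data chunk")
    return {**fmt, "samples": samples}


def make_test_wav(rate: int = 11025, frames: int = 100) -> bytes:
    """Build a tiny 8-bit mono file of silence in memory (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)      # 1 byte per sample = 8-bit PCM
        w.setframerate(rate)
        w.writeframes(bytes([128] * frames))  # 128 is the 8-bit midpoint
    return buf.getvalue()
```

Calling `parse_wav(make_test_wav())` returns the sample rate, channel count, bit depth, and the sample bytes. Note that 8-bit .wav samples are unsigned, with 128 as the zero level, which matters when the bytes go to a converter.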
Volume -- This issue can be a nightmare, because it's impossible to
define what's loud enough. Everyone hears differently, the ambient environment is never
the same as in the demonstration lab, and the instrument case always has the speaker in
the worst possible location. In addition, getting sufficient volume out of a speaker
always seems to strain an instrument's 5V power budget. I've tried acoustic sound level
meters for setting a maximum volume, but in my opinion, the best approach is to make it
louder than you can ever imagine--just don't distort.
Settable Volume -- While you're figuring out how to generate
massive amounts of volume, don't forget you might have to tone it down for the quiet
sites. Do you let the user adjust it? If so, do you let the customer turn a pot or does
the system CPU control the volume? I've found that the choice depends on the application.
If you don't want people fiddling with the volume, put it into the system; if you do want
it adjustable, approach this feature as you would any other user-interface element and
shoot for ease of use.
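If the system CPU controls the volume, one possible approach (an assumption for illustration, not necessarily how any particular design does it) is to scale each sample in software before it reaches the converter. The sketch below assumes 8-bit unsigned PCM centered on 128 and clamps the result, so an overdriven signal saturates instead of wrapping around. The function name `scale_volume` and the percentage gain are hypothetical.

```python
def scale_volume(samples: bytes, gain_percent: int) -> bytes:
    """Scale 8-bit unsigned PCM around its 128 midpoint, clamping to 0..255."""
    out = bytearray(len(samples))
    for i, s in enumerate(samples):
        centered = s - 128                        # signed value, -128..127
        scaled = centered * gain_percent // 100   # integer math only
        out[i] = max(0, min(255, scaled + 128))   # saturate, never wrap
    return bytes(out)
```

A gain of 50 halves the signal swing, while a gain above 100 boosts it until samples hit the clamp. Clamped samples are still distortion, so counting how often the clamp triggers is one way to find the highest usable gain.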
Multiple Languages -- The world doesn't just speak English,
especially in an international marketplace. Your choices are to build in enough memory to
cover the multiple languages you might need, swap a language ROM depending on the intended
location, or offer easy system reprogramming with alternate language messages. Also, don't
forget that along with language changes, you'll have to adapt parts of the messages
themselves to use different units of measure.
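For the option of building in enough memory for every language, one way to organize the recordings is a directory keyed by language and message ID that gives each recording's offset and length in message memory. The sketch below is only an illustration; the languages, message ID, offsets, and names such as `DIRECTORY` and `find_clip` are all hypothetical.

```python
from typing import NamedTuple


class Clip(NamedTuple):
    offset: int  # byte offset of the recording in message memory
    length: int  # length of the recording in bytes


# Hypothetical directory: (language, message id) -> where the audio lives.
DIRECTORY = {
    ("en", "LOW_BATTERY"): Clip(0, 22050),
    ("de", "LOW_BATTERY"): Clip(22050, 33075),
}


def find_clip(language: str, message_id: str, fallback: str = "en") -> Clip:
    """Return the clip for a language, falling back to a default language."""
    try:
        return DIRECTORY[(language, message_id)]
    except KeyError:
        return DIRECTORY[(fallback, message_id)]


def total_bytes() -> int:
    """Memory needed to hold every recording in the directory."""
    return sum(clip.length for clip in DIRECTORY.values())
```

The same directory works for the other two options as well: swapping a ROM or reprogramming the system simply replaces the directory and the audio it points to.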
Fidelity -- The question really is what's good enough? Obviously, the
better the fidelity, the higher the cost. In an embedded system, fidelity comes from three
main areas: the sampling frequency, the speaker, and the enclosure.
In my designs, I use an 11.025 kHz sampling frequency for the simple reason that my
carefully calibrated ear can easily hear a difference between 8 kHz and
11.025 kHz. Above that level, though, I can't detect enough of a difference to
justify the expense. Granted, 8 kHz would probably work fine in most applications,
but given that fidelity is so subjective, I opt for the safer route and use the higher rate.
The quality of the speaker also has a major impact on sound fidelity--so forget simply
throwing a little 1" PM speaker into the system. To get decent fidelity you need a
speaker that can move some air. In my designs, I've been using 3" speakers with
fairly large magnets. Just be sure that the one you select has a good mounting system and
a rigid frame. I once used a speaker that worked fine sitting on the benchtop but
distorted like crazy when mounted in its case. I traced the problem to an uneven mounting
surface that twisted the speaker frame slightly, causing the voice coil to bind.
Finally, when considering the role the system enclosure plays in defining sound
fidelity, think about all the time and money speaker manufacturers invest in designing
baffles. In designing the acoustics for my projects, I've found a few tricks. For example,
a speaker lying in the open on a lab bench loses a lot of its bass response, but placing
the speaker magnet down on top of a roll of duct tape that's lying flat on the bench
improves the bass response considerably. The same technique (minus the duct tape) also
works inside an enclosure--you simply need a cover that seals the back of the speaker. You
can form this seal from vacuum-formed plastic, a PVC end cap or a piece of sheet metal. If
this inexpensive yet effective trick is still too much trouble, just placing the speaker
inside an enclosure also helps.
Message Content -- It's hard to give a lot of concrete advice on this
area because it's so dependent on the intended audience. Some designs might only need
short warnings ("wind shear"), while others might provide the user with
instructions ("Hit any key to continue"). Also, decide on a style guide for the
messages. Make sure they always refer to widgets by the same name or acronym. Consider
also their tone. Should they make polite requests or issue commands? In many ways the
effort you put into creating message content is no different than that for creating sales
documents and user manuals.
Message Length -- Whatever the message content, make sure the
developers are aware of the length constraint. The most basic limitation is that the message
memory must fit into the system's memory map along with everything else the program does.
Also be aware that if you're close to the limit, changing languages can easily push
the design over the edge. Finally, if you're using instructional messages, make sure
they're complete before the user follows the instruction--nobody likes waiting around
for a box to finish talking so he can move on to the next step.
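The memory cost of a message is simple arithmetic: seconds of speech times the sampling frequency times the bytes per sample. The sketch below assumes uncompressed mono PCM, defaulting to 8-bit samples at 11.025 kHz. The function names and the 64-KB budget in the usage note are hypothetical.

```python
import math


def message_bytes(seconds: float, sample_rate: int = 11025, bits: int = 8) -> int:
    """Bytes of uncompressed mono PCM needed for a message of this length."""
    return math.ceil(seconds * sample_rate * bits / 8)


def fits(message_seconds, budget_bytes: int,
         sample_rate: int = 11025, bits: int = 8) -> bool:
    """True if all the messages together fit in the memory budget."""
    total = sum(message_bytes(s, sample_rate, bits) for s in message_seconds)
    return total <= budget_bytes
```

Under these assumptions a two-second message takes 22,050 bytes at 11.025 kHz, or 16,000 bytes at 8 kHz. Two such messages fit in a 64-KB (65,536-byte) budget, but a third does not, which is how a longer translation can push a design over the edge.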
As you can see, adding voice capability drags a variety of new issues into a project.
In the next part of this series I'll go into the details of how I've implemented voice
hardware and software in an embedded design. PE&IN