SLTF Consulting
Technology with Business Sense



 

The road to loquacious instrumentation is rough but passable

Scott Rosenthal
October, 1996

Companies are constantly looking for ways to differentiate their wares, and an increasingly common feature vendors use toward that goal is making products talk. At first the task seems easy: "It can't be that hard," they reason. "Why, I can buy greeting cards that talk, picture frames that record and play back messages, even message recorders the size of a credit card!" The problem, though, is that adding a voice isn't always as easy and inexpensive as most people believe. If you’re contemplating such an addition to a product, this column and its sequel will cover many of the issues involved in implementing this technology.

Behind the Scenes

The real challenge is remembering that speech generation is more than a technical issue. Our eons of socialization have led us to associate speech with humans, so people are very sensitive to a device speaking to them. For instance, remember the auto warning announcements and the rebellions that occurred over their whiny, irritating pronouncements? However, a voice at the right time and in the right place can be effective. Recently, two of my sons got a tour of the cockpit of a 737-400 (I was peeking over their shoulders). The pilot went through all the controls, bells, whistles, lights, and voices. The voice and words sounded right in the cockpit, and I could imagine that they would really help in anxious situations. However, I also wondered how much time and money the manufacturer spent getting the voices just right for that application.

Therefore, before I add voice to a design, I try to find out as much as I can about the application so that the design fits the client's needs. The following are some of the questions for which I make sure I get answers:

  • How do I record the message?
  • How will I store the audio message in the product?
  • How loud should the audio message play back?
  • Should the volume be user adjustable?
  • How should the product handle multiple languages?
  • What fidelity or tonal quality should the audio have?
  • What mood or instructions should the message content convey?
  • What are appropriate message lengths?

You can see from this list that adding a voice to a product isn't so simple. Yet all of the answers impact the product's design and production cost. At my firm, talking products don't fall into the mass-production category. They typically sell in quantities anywhere from hundreds to thousands of units a year, and this volume really limits how we implement a voice solution. With this aspect in mind, let me answer a few of the previous questions with my observations.

Recording -- I've taken the tack that recording should be easy and also easily available. Most of my clients would like to record their own people speaking (low cost) rather than hire a professional announcer. Recording these voices in the Windows .wav PCM format allows people to record themselves on a PC or to use files from a recording studio. Also, decoding a .wav file in an embedded program is a straightforward exercise that allows easy programming of an audio message into the embedded system.
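To give a feel for how little decoding a PCM .wav file involves, here is a minimal sketch in Python of walking the RIFF chunks to find the format fields and the raw samples. It is an illustration rather than the code used in my designs: the function names `find_pcm_data` and `make_test_wav` are made up for this example, the test file is 8-bit mono by assumption, and an embedded version would do the same walk in the target's own language.

```python
import io
import struct
import wave


def find_pcm_data(wav_bytes):
    """Walk the RIFF chunks; return (sample_rate, bits_per_sample, channels, pcm_bytes)."""
    riff, _size, wave_id = struct.unpack_from("<4sI4s", wav_bytes, 0)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    fmt = None
    offset = 12  # the first chunk starts right after the 12-byte RIFF header
    while offset + 8 <= len(wav_bytes):
        chunk_id, chunk_size = struct.unpack_from("<4sI", wav_bytes, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits per sample
            fmt = struct.unpack_from("<HHIIHH", wav_bytes, body)
            if fmt[0] != 1:  # 1 means uncompressed PCM
                raise ValueError("only uncompressed PCM is handled here")
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk appeared before fmt chunk")
            return fmt[2], fmt[5], fmt[1], wav_bytes[body:body + chunk_size]
        # chunks are padded to an even number of bytes
        offset = body + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found")


def make_test_wav(samples, rate=11025):
    """Build a tiny 8-bit mono WAV in memory so the parser has something to read."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)  # 8-bit PCM, stored unsigned
        w.setframerate(rate)
        w.writeframes(bytes(samples))
    return buf.getvalue()


if __name__ == "__main__":
    rate, bits, channels, pcm = find_pcm_data(make_test_wav([128, 200, 128, 56]))
    print(rate, bits, channels, list(pcm))
```

Everything after the header fields is the sample stream itself, which is why the bytes returned here could go into message memory unchanged.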

Message Storage -- The .wav file is linearly arranged and uses PCM encoding, so the data need no manipulation before replay. I've been using flash memory for the storage, so we can easily reprogram the embedded system for other messages or other languages.

Volume -- This issue can be a nightmare, because it's impossible to define what's loud enough. Everyone hears differently, the ambient environment is never the same as in the demonstration lab, and the instrument case always has the speaker in the worst possible location. In addition, getting sufficient volume out of a speaker always seems to strain an instrument's 5V power budget. I've tried acoustic sound level meters for setting a maximum volume, but in my opinion, the best approach is to make it louder than you can ever imagine--just don't distort.

Settable Volume -- While you’re figuring out how to generate massive amounts of volume, don't forget you might have to tone it down for the quiet sites. Do you let the user adjust it? If so, do you let the customer turn a pot or does the system CPU control the volume? I've found that the choice depends on the application. If you don't want people fiddling with the volume, put it into the system; if you do want it adjustable, approach this feature as you would any other user-interface element and shoot for ease of use.
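If the CPU is to control the volume, one possible approach (an assumption for illustration; a digitally controlled pot or amplifier would be equally valid) is to scale the samples in software before they reach the converter. The sketch below assumes unsigned 8-bit PCM, whose silence level is 128, and counts how many samples had to be clipped, since clipping is exactly the distortion to avoid. The name `scale_volume` is hypothetical.

```python
def scale_volume(pcm, gain):
    """Scale unsigned 8-bit PCM around its 128 midpoint.

    Returns (scaled_bytes, clipped_count). A nonzero clipped_count means the
    gain is high enough to distort this message.
    """
    out = bytearray(len(pcm))
    clipped = 0
    for i, sample in enumerate(pcm):
        value = 128 + round((sample - 128) * gain)
        if value < 0 or value > 255:
            clipped += 1
            value = max(0, min(255, value))  # saturate instead of wrapping
        out[i] = value
    return bytes(out), clipped


if __name__ == "__main__":
    quiet, clipped = scale_volume(bytes([128, 192, 64]), 0.5)
    print(list(quiet), clipped)
```

Running each recorded message through a check like this at the highest gain setting shows, before any hardware is built, whether the loudest setting will clip.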

Multiple Languages -- The world doesn't just speak English, especially in an international marketplace. Your choices are to build in enough memory to cover the multiple languages you might need, swap a language ROM depending on the intended location, or offer easy system reprogramming with alternate language messages. Also, don't forget that along with language changes, you'll have to adapt parts of the messages themselves to different units of measure.

Fidelity -- The question really is what's good enough? Obviously, the better the fidelity, the higher the cost. In an embedded system, fidelity comes from three main areas: the sampling frequency, the speaker, and the enclosure.

In my designs, I use an 11.025 kHz sampling frequency for the simple reason that my carefully calibrated ear can easily hear a difference between 8 kHz and 11.025 kHz. Above that level, though, I can't detect enough of a difference to justify the expense. Granted, 8 kHz would probably work fine in most applications, but given that fidelity is so subjective, I opt for the safer route and use the higher sample rate.
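Much of the expense of a higher sample rate is memory, and the arithmetic is simple enough to sketch. The helper below, `pcm_bytes` (a name invented for this example), assumes uncompressed PCM and defaults to 8-bit mono, which is an assumption and not a statement about any particular design.

```python
def pcm_bytes(seconds, sample_rate_hz, bits_per_sample=8, channels=1):
    """Storage needed for uncompressed PCM: samples x bytes per sample x channels."""
    return int(seconds * sample_rate_hz) * (bits_per_sample // 8) * channels


if __name__ == "__main__":
    for rate in (8000, 11025):
        print(rate, "Hz:", pcm_bytes(10, rate), "bytes for 10 seconds of speech")
```

Under these assumptions, ten seconds of speech takes 80,000 bytes at 8 kHz and 110,250 bytes at 11.025 kHz, so the higher rate costs roughly 38% more message memory.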

The quality of the speaker also has a major impact on sound fidelity--so forget simply throwing a little 1" PM speaker into the system. To get decent fidelity you need a speaker that can move some air. In my designs, I've been using 3" speakers with fairly large magnets. Just be sure that the one you select has a good mounting system and a rigid frame. I once used a speaker that worked fine sitting on the benchtop but distorted like crazy when mounted in its case. I traced the problem to an uneven mounting surface that twisted the speaker frame slightly, causing the voice coil to bind.

Finally, when considering the role the system enclosure plays in defining sound fidelity, think about all the time and money speaker manufacturers invest in designing baffles. When designing the acoustics for a project, I've found a few tricks. For example, a speaker lying in the open on a lab bench loses a lot of its bass response, but placing the speaker magnet down on top of a roll of duct tape that's lying flat on the bench improves the bass response considerably. The same technique (minus the duct tape) also works inside an enclosure--you simply need a cover that seals the back of the speaker. You can form this seal from vacuum-formed plastic, a PVC end cap or a piece of sheet metal. If this inexpensive yet effective trick is still too much trouble, just placing the speaker inside an enclosure also helps.

Message Content -- It's hard to give a lot of concrete advice on this area because it's so dependent on the intended audience. Some designs might only need short warnings ("wind shear"), while others might provide the user with instructions ("Hit any key to continue"). Also, decide on a style guide for the messages. Make sure they always refer to widgets by the same name or acronym. Consider also their tone. Should they make polite requests or issue commands? In many ways the effort you put into creating message content is no different than that for creating sales documents and user manuals.

Message Length -- Whatever the message content, make sure the developers are aware of the limits on message length. The most basic limitation is that the message memory must fit into the system's memory map along with everything else the program does. Also be aware that if you’re close to the limit, changing languages can easily push the design over the edge. Finally, if you’re using instructional messages, make sure they’re complete before the user follows the instruction--nobody likes waiting around for a box to finish talking so he can move on to the next step.
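One way to catch a language that pushes the design over the edge is to check every language's message set against the memory budget before committing to hardware. The sketch below is only an illustration: `languages_over_budget` is a made-up name, the durations and the 64-KB budget are invented numbers, and the data is assumed to be 8-bit mono PCM at one byte per sample.

```python
def languages_over_budget(durations_s, budget_bytes, sample_rate_hz=11025):
    """Return {language: bytes_over} for each language whose messages exceed the budget.

    durations_s maps a language name to the lengths, in seconds, of its messages.
    Assumes 8-bit mono PCM, i.e. one byte per sample.
    """
    over = {}
    for language, lengths in durations_s.items():
        needed = sum(round(seconds * sample_rate_hz) for seconds in lengths)
        if needed > budget_bytes:
            over[language] = needed - budget_bytes
    return over


if __name__ == "__main__":
    # Hypothetical recordings: the same two messages run longer in German.
    recordings = {"english": [2.0, 3.0], "german": [3.0, 4.0]}
    print(languages_over_budget(recordings, budget_bytes=64 * 1024))
```

With these invented figures the English set fits in 64 KB while the German set does not, which is the kind of surprise that is cheaper to find on a PC than after the memory map is frozen.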

As you can see, adding voice capability drags into a project a variety of new issues. In the next part of this series I'll go into the details of how I've implemented voice hardware and software in an embedded design. PE&IN




Copyright © 1998-2014 SLTF Consulting, a division of SLTF Marine LLC. All rights reserved.