Off-the-shelf hardware and multimedia bring voice output within easy reach
Scott Rosenthal
December, 1996
As discussed in my last column (reference),
deciding to put a voice into a product involves many decisions, with almost none of them
involving the technology. After working your way through these implementation issues, it's
time to tackle the actual hardware and software. This column reviews how to implement
voice output in an embedded project.
Troubled specialization
Every embedded project places constraints on what you can design into the
device. For example, everyone ideally wants to avoid sole-source parts. Likewise, if at
all possible, it's smart to avoid spending money on specialized development tools.
Unfortunately, neither goal is possible if a designer decides to use commercial
speech-processor chips.
A number of companies make chips for voice applications. Some chips speak phonemes,
others record voices and store them into a memory device, while others speak from encoded
files in memory. Some chips sound very mechanical (phoneme speech) and some sound quite
good (encoded files in memory). However, as far as I can determine, none have second
sources. Also, because these parts are specialty items, there's no guarantee that they'll
be available even a year from now.
The other problem with these devices concerns their proprietary development systems.
For example, one manufacturer whose device at first seemed reasonable to use requires a PC
running the Japanese version of DOS to run its software. Other chip manufacturers require
that a designer buy (no renting allowed) their development systems for $10k or more. To me,
this amount seems like an awful lot of money to support one small facet of a product.
Still other companies don't offer development systems at all: you send them the audio
information, and they program it into their chips. The catch is the outrageous engineering
costs and the massive chip volumes they require.
After exploring these scenarios, I decided to implement a solution with truly
off-the-shelf parts that allowed for easy and inexpensive development and made for
low-cost second-source production.
Voice on a budget
The solution was to implement the voice system using a standard D/A, an audio amp and an
EPROM. To generate speech, the system fetches a byte of data from memory and writes it to
the D/A. The audio amp pumps up the converter's output to the point where it can drive a
speaker. This approach allows for easy second-sourcing of the components while keeping
down both parts and development-tool costs. For the remainder of this column I'll describe
how to implement this homebrew design, including some of the tradeoffs made to keep
production happy.
The first step was to generate data for the system to play. I decided to store data
using the .wav format for two reasons. First, it's easy to record audio using this format
with a multimedia PC. In fact, the standard recording application that comes with Windows
works just fine for simple voice applications. Second, the format uses PCM encoding, which
requires no data decompression. I recorded the 8-bit data file at 11.025 kHz.
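To get from a recorded .wav file to bytes you can burn into an EPROM, the file's header has to be skipped and its format checked. The following is a minimal host-side sketch, assuming the whole file has already been read into memory and follows the usual RIFF/WAVE chunk layout. The names `wav_info` and `wav_parse` are hypothetical, chosen for illustration; they are not part of any tool mentioned in this column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Result of scanning a .wav image held in memory (hypothetical type). */
typedef struct {
    uint16_t format_tag;       /* 1 means uncompressed PCM */
    uint16_t channels;
    uint32_t sample_rate_hz;
    uint16_t bits_per_sample;
    const uint8_t *data;       /* first audio byte */
    uint32_t data_len;         /* number of audio bytes */
} wav_info;

/* RIFF fields are little-endian regardless of the host. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is not a usable RIFF/WAVE image. */
int wav_parse(const uint8_t *buf, size_t len, wav_info *out)
{
    size_t pos = 12;           /* first chunk follows "RIFF", size, "WAVE" */
    int have_fmt = 0;

    assert(buf != NULL && out != NULL);
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    while (pos + 8 <= len) {
        const uint8_t *chunk = buf + pos;
        uint32_t size = rd32(chunk + 4);

        if (size > len - pos - 8)
            return -1;                          /* truncated chunk */

        if (memcmp(chunk, "fmt ", 4) == 0 && size >= 16) {
            out->format_tag      = rd16(chunk + 8);
            out->channels        = rd16(chunk + 10);
            out->sample_rate_hz  = rd32(chunk + 12);
            out->bits_per_sample = rd16(chunk + 22);
            have_fmt = 1;
        } else if (memcmp(chunk, "data", 4) == 0) {
            if (!have_fmt)
                return -1;                      /* format must come first */
            out->data = chunk + 8;
            out->data_len = size;
            return 0;
        }
        pos += 8 + (size_t)size + (size & 1u);  /* chunks are padded to even length */
    }
    return -1;
}
```

For the recording described above you would expect `format_tag` to be 1, `bits_per_sample` to be 8 and `sample_rate_hz` to be 11025; the `data_len` bytes starting at `data` are what would go into the EPROM image.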
This choice, though, also caused the first problem I had to overcome. Have you ever
tried to generate an 11.025-kHz clock? This frequency doesn't divide evenly into any common
microprocessor clock frequency. For example, assuming a standard 12-MHz clock,
you'd have to divide the system clock by 1088.4353 to get the right value. What's
more, if you use a custom clock frequency such as 11.995 MHz to derive the proper audio
clock, you'll need to add an oscillator to derive the standard serial-communication data
rates. In my opinion, the easiest solution is to simply forget the 0.4353 and divide the
standard clock by 1088. The result is an audio clock of 11.029 kHz with a frequency error
of 0.04%, an amount far too tiny for the human ear to detect.
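The divider arithmetic is easy to check on a PC before committing to a clock. This is a small sketch of that calculation; the function names are hypothetical and the code only reproduces the numbers given in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest integer divider that turns clock_hz into roughly sample_hz. */
uint32_t audio_divider(uint32_t clock_hz, uint32_t sample_hz)
{
    assert(sample_hz > 0);
    return (clock_hz + sample_hz / 2u) / sample_hz;   /* round to nearest */
}

/* Sample rate actually produced by an integer divider. */
double actual_rate_hz(uint32_t clock_hz, uint32_t divider)
{
    assert(divider > 0);
    return (double)clock_hz / (double)divider;
}

/* Signed error of the actual rate relative to the wanted rate, in percent. */
double rate_error_percent(double actual_hz, double wanted_hz)
{
    assert(wanted_hz > 0.0);
    return 100.0 * (actual_hz - wanted_hz) / wanted_hz;
}
```

With a 12-MHz clock and an 11,025-Hz target, `audio_divider` gives 1088, `actual_rate_hz` gives about 11,029.4 Hz, and the error works out to about 0.04%. Going the other way, 1088 x 11,025 Hz is 11.9952 MHz, which is where the custom clock frequency mentioned above comes from.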
The next item the speech circuitry needs is D/A conversion. This stage actually
consists of two parts. The first turns the digital audio into an analog waveform, whereas
the second controls the volume. When reconstructing the audio, an important point to
remember is that PCM encoding centers the recording around the D/A's halfway point, in
this case 80H. Hence, the audio signal effectively contains a DC bias equal to half the
converter's dynamic range. You must remove this bias before sending the analog signal to a
power amplifier; failure to do so results in a system that generates little speech but
lots of smoke! In my finished system, the audio-output circuitry consists of a simple
2-pole filter to remove sampling artifacts and a series capacitor to AC-couple the signal
into the power amplifier.
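To see the 80H midpoint in the data itself, it helps to look at the samples as signed deviations from midscale. The sketch below is a host-side illustration only; `PCM8_MIDSCALE`, `pcm8_to_signed` and `pcm8_mean` are hypothetical names, and the hardware in the text removes the bias with a capacitor rather than in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCM8_MIDSCALE 0x80   /* code that 8-bit PCM uses for silence */

/* Signed deviation from midscale, in the range -128..+127. */
int pcm8_to_signed(uint8_t sample)
{
    return (int)sample - PCM8_MIDSCALE;
}

/* Average code over a buffer; a symmetric recording should sit near 0x80. */
double pcm8_mean(const uint8_t *buf, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    assert(buf != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += buf[i];
    return (double)sum / (double)n;
}
```

A buffer such as 80H, C0H, 80H, 40H averages exactly 80H: the waveform swings equally above and below the midpoint, and that midpoint is the DC level the series capacitor has to block.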
The second part of this conversion is volume control. A simple way of accomplishing
this task is to add a pot at the power amp's input. Adjusting the pot changes the drive
level into the amplifier, thereby changing volume. However, if a computer is supposed to
control the output volume, this simple approach won't work. In such cases I build the
speech system using a dual D/A. The first stage works as just described, except that its
reference voltage comes from the second stage. As program code changes the reference, the
output volume changes with it.
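The dual-D/A arrangement can be modeled in a few lines. This sketch assumes ideal 8-bit converters whose output is the reference multiplied by code/256; real parts differ in detail, and the function names and the `vfull` parameter (the fixed reference feeding the volume stage) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Second stage: turns the volume code into the audio stage's reference. */
double volume_ref_volts(uint8_t volume, double vfull)
{
    assert(vfull > 0.0);
    return vfull * volume / 256.0;
}

/* First stage: audio code scaled by whatever reference it is handed. */
double audio_out_volts(uint8_t sample, uint8_t volume, double vfull)
{
    return volume_ref_volts(volume, vfull) * sample / 256.0;
}

/* Peak-to-peak output swing available at a given volume setting. */
double audio_swing_volts(uint8_t volume, double vfull)
{
    return audio_out_volts(0xFF, volume, vfull) -
           audio_out_volts(0x00, volume, vfull);
}
```

Under this model, halving the volume code halves the output swing, which is the behavior the text describes. The midscale output level also moves with the reference, so the AC coupling ahead of the power amplifier is still needed.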
Cranking it out
With all the support stuff in place, you need a way to send data from memory to the
D/A. One of my favorite techniques is DMA because the DMA controller continuously moves
the voice data to the converter at the sample clock frequency you're using without
computer intervention. For example, assume that the microprocessor integrates a DMA
controller, as does the 80C188EC. This chip's controller can transfer as many as 65,536
bytes without host intervention. To play a message, my program sets up a starting address
in memory as the source, an I/O port as the destination and the message length as the
transfer size. The program then starts the DMA controller and resumes its normal business.
At the data transfer's conclusion, the DMA controller generates an interrupt. The
associated ISR either terminates the controller or sets it up to play the next part of the
message. In this way, with minimal overhead, the microprocessor can easily play audio
messages you record using standard off-the-shelf hardware.
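The bookkeeping behind "play the next part of the message" can be sketched independently of the DMA hardware. The code below is an assumption-laden illustration, not the 80C188EC's register interface: `voice_player`, `voice_begin` and `voice_next_chunk` are hypothetical names, and actually programming the DMA channel is left to board-specific code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MAX_BYTES UINT32_C(65536)   /* largest single transfer, per the text */

/* Playback state shared by the main program and the DMA-complete ISR. */
typedef struct {
    const uint8_t *next;    /* next unplayed byte */
    uint32_t remaining;     /* bytes still to play */
    int busy;               /* nonzero while a message is in progress */
} voice_player;

/* Queue a message; the caller then asks for the first chunk. */
void voice_begin(voice_player *p, const uint8_t *msg, uint32_t len)
{
    assert(p != NULL && msg != NULL);
    p->next = msg;
    p->remaining = len;
    p->busy = (len > 0);
}

/* Returns 1 and fills src/count if another transfer should be started,
 * or 0 when the message is finished. Call once to start playback and
 * again from the DMA-complete interrupt. */
int voice_next_chunk(voice_player *p, const uint8_t **src, uint32_t *count)
{
    assert(p != NULL && src != NULL && count != NULL);
    if (p->remaining == 0) {
        p->busy = 0;
        return 0;
    }
    *count = (p->remaining > DMA_MAX_BYTES) ? DMA_MAX_BYTES : p->remaining;
    *src = p->next;
    p->next += *count;
    p->remaining -= *count;
    return 1;
}
```

When `voice_next_chunk` returns 1, the caller would load `*src` as the DMA source, the D/A port as the destination and `*count` as the transfer size, then start the channel. A 70,000-byte message, for example, comes out as one 65,536-byte transfer followed by one of 4,464 bytes.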
A DMA controller isn't mandatory, though. It's still possible to crank out audio data
with processors such as the 8051, for which DMA just doesn't exist. Although it's
trickier, I've found a technique that works beautifully. I set up a timer to interrupt the
processor every 90 µsec (roughly the period of the 11.025-kHz sample clock). In the ISR, which is
coded in assembly for speed, the processor dumps the next audio byte to the I/O port,
updates its pointer to the next data and exits. This technique ensures that data going out
the I/O port is spaced in time by 90 µsec regardless of the processing time within the
ISR. Using this method on a 12-MHz 8051, I've gotten ISR execution time down to less than
45 µsec, which then gives me approximately 50% of the processor's bandwidth for the rest
of that application. If that's not enough time, a few other options include increasing
clock frequency, changing to a processor that uses fewer states per instruction or adding
glue logic to decrease the ISR's responsibilities even further.
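The logic of the timer-driven approach fits in a few lines. The sketch below is written in portable C so it can be tried on a PC; it is not the assembly routine described above. The variable `dac_port` stands in for the real I/O port, all names are hypothetical, and idling the output at 80H once the message ends is an added assumption rather than something the text specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the D/A's I/O port; on real hardware this is a port address. */
static volatile uint8_t dac_port = 0x80;

static const uint8_t *play_ptr;   /* next sample to send */
static const uint8_t *play_end;   /* one past the last sample */

/* Point the ISR at a message; the timer interrupt does the rest. */
void audio_start(const uint8_t *samples, size_t len)
{
    assert(samples != NULL);
    play_ptr = samples;
    play_end = samples + len;
}

/* Body of the sample-rate timer interrupt: one byte out, pointer advanced. */
void audio_timer_isr(void)
{
    if (play_ptr != play_end)
        dac_port = *play_ptr++;
    else
        dac_port = 0x80;          /* assumed: rest at midscale when done */
}

/* Nonzero while samples remain to be played. */
int audio_busy(void)
{
    return play_ptr != play_end;
}

/* Share of CPU time, in percent, used by an ISR of isr_us every period_us. */
unsigned isr_load_percent(unsigned isr_us, unsigned period_us)
{
    assert(period_us > 0);
    return (100u * isr_us) / period_us;
}
```

Because the timer, not the code path, sets when each interrupt fires, the spacing between samples stays fixed even if the ISR's execution time varies. `isr_load_percent(45, 90)` gives 50, matching the roughly 50% of processor bandwidth left for the rest of the application.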
Rosenthal, S., "The road to loquacious instrumentation is
rough but passable," PE&IN, Oct. 1996, pp. 72-74.
Adapted from an article that appeared in Personal Engineering & Instrumentation
News.