February 1998, Issue 91
Low-Cost Voice Recognition
FUTURE TINY ENHANCEMENTS
Naturally, there are ways to improve the system. I was surprised by the HC05’s speed. I also wound up with at least 200 bytes of leftover ROM for more code. Tiny Voice’s code is modular, and updates can be easily added.
I can increase the EEPROM capacity to 1 or even 2 KB. The larger EEPROM would provide more template storage or allow more features per frame to better resolve differences in speech patterns.
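To put rough numbers on that trade-off, here is a small C sketch that divides EEPROM capacity by the size of one template. The template layout (a fixed number of frames per word, one byte per feature, no header) and the figures in the example are assumptions for illustration, not Tiny Voice’s actual format.

```c
#include <stddef.h>

/* Rough EEPROM budget: how many word templates fit.
 * Hypothetical layout: each template stores `frames` frames of
 * `features_per_frame` features, one byte per feature, with no header. */
size_t templates_that_fit(size_t eeprom_bytes, size_t frames,
                          size_t features_per_frame)
{
    size_t template_bytes = frames * features_per_frame;
    if (template_bytes == 0)
        return 0;                       /* avoid dividing by zero */
    return eeprom_bytes / template_bytes;
}
```

Under those assumptions, 16 frames of 4 features per word gives 16 templates in 1 KB. A 2-KB part could hold 32 such templates, or 16 templates with 8 features per frame.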
I’d also like to add some fuzzy logic to the pattern-matching algorithm to improve recognition accuracy and the rejection criteria.
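One way fuzzy logic could enter the matcher is as a rejection rule. This C sketch combines two fuzzy memberships with the minimum operator (a fuzzy AND): “the best distance is small” and “the best match is well ahead of the runner-up.” A word is accepted only when the combined degree reaches a threshold. The ramp limits, the threshold, and the function names are hypothetical, and floating point is used for readability; an HC05 version would need fixed-point arithmetic.

```c
#include <stddef.h>

/* Membership in "small": 1 at or below lo, 0 at or above hi, linear between. */
static double ramp_down(double x, double lo, double hi)
{
    if (x <= lo) return 1.0;
    if (x >= hi) return 0.0;
    return (hi - x) / (hi - lo);
}

/* Fuzzy AND, using the minimum operator. */
static double fuzzy_and(double a, double b)
{
    return a < b ? a : b;
}

/* dist[i] is the distance from the utterance to template i (n >= 2).
 * Rule: accept the best template IF its distance is small AND its margin
 * over the runner-up is large. Returns the template index, or -1 to reject. */
int fuzzy_match(const double *dist, size_t n,
                double d_lo, double d_hi,   /* "small distance" ramp */
                double margin_full,         /* margin counted as fully "large" */
                double accept_level)        /* minimum degree of truth */
{
    size_t best = 0, second = 1;
    if (n < 2)
        return -1;
    if (dist[1] < dist[0]) {
        best = 1;
        second = 0;
    }
    for (size_t i = 2; i < n; i++) {
        if (dist[i] < dist[best]) {
            second = best;
            best = i;
        } else if (dist[i] < dist[second]) {
            second = i;
        }
    }
    double small = ramp_down(dist[best], d_lo, d_hi);
    double large = 1.0 - ramp_down(dist[second] - dist[best], 0.0, margin_full);
    return fuzzy_and(small, large) >= accept_level ? (int)best : -1;
}
```

For example, with distances {10, 50, 80}, d_lo = 20, d_hi = 60, margin_full = 30, and accept_level = 0.5, template 0 is accepted. With {45, 50, 80}, the near tie pulls the combined degree down to about 0.17, so the word is rejected instead of being forced onto the closest template.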
Replacing the push buttons and LEDs with a serial port could reduce cost and add functionality: threshold values could be changed, templates uploaded and downloaded, and so on.
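With a serial link, a host could send short text commands. Here is a minimal C parser for two such commands. The command letters (R for the rejection threshold, S for the silence threshold), the 0 to 255 range, and the settings structure are all hypothetical. Uploading and downloading templates would need a block-transfer protocol that is not shown.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical run-time settings a host could change over the serial link. */
struct settings {
    int reject_threshold;   /* match distance above which a word is rejected */
    int silence_threshold;  /* input level treated as silence */
};

/* Parse one command line such as "R 120" (set rejection threshold) or
 * "S 35" (set silence threshold). The command letters and the 0..255
 * range are illustrative. Returns 0 on success, -1 on any error. */
int handle_command(struct settings *s, const char *line)
{
    char cmd = (char)toupper((unsigned char)line[0]);
    char *end;
    long value;

    if (cmd == '\0')
        return -1;                       /* empty line */
    value = strtol(line + 1, &end, 10);  /* skips leading spaces */
    if (end == line + 1)
        return -1;                       /* no number after the letter */
    while (isspace((unsigned char)*end))
        end++;                           /* allow trailing CR/LF */
    if (*end != '\0' || value < 0 || value > 255)
        return -1;                       /* junk or out of range */

    switch (cmd) {
    case 'R': s->reject_threshold = (int)value;  return 0;
    case 'S': s->silence_threshold = (int)value; return 0;
    default:  return -1;                 /* unknown command */
    }
}
```

Returning an error code rather than silently ignoring bad input lets the firmware echo a simple “?” back to the host, which is usually enough for a terminal-driven setup session.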
I want an MCU-controlled gain adjustment on the input to compensate for different microphone levels and background noise.
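A simple form of MCU-controlled gain is a step-wise automatic gain control: measure the peak level of each block of ADC samples and nudge a gain setting up or down by one step. This sketch assumes 8-bit samples centered on 128 and a 16-step gain element (for instance, a digital potentiometer); the thresholds are placeholders.

```c
#include <stdint.h>

#define GAIN_MIN  0     /* lowest gain step (assumed 16-step device) */
#define GAIN_MAX  15    /* highest gain step */
#define PEAK_HIGH 110   /* near clipping for an 8-bit ADC centered on 128 */
#define PEAK_LOW  40    /* too quiet to resolve features well (placeholder) */

/* Largest deviation from mid-scale (128) in a block of 8-bit samples. */
uint8_t block_peak(const uint8_t *samples, unsigned n)
{
    uint8_t peak = 0;
    for (unsigned i = 0; i < n; i++) {
        int d = (int)samples[i] - 128;
        if (d < 0)
            d = -d;
        if (d > peak)
            peak = (uint8_t)d;          /* d is at most 128 */
    }
    return peak;
}

/* One step of a simple AGC: move the gain setting by one step toward
 * keeping the block peak between PEAK_LOW and PEAK_HIGH. */
uint8_t agc_step(uint8_t gain, uint8_t peak)
{
    if (peak > PEAK_HIGH && gain > GAIN_MIN)
        return (uint8_t)(gain - 1);     /* too loud: back off */
    if (peak < PEAK_LOW && gain < GAIN_MAX)
        return (uint8_t)(gain + 1);     /* too quiet: boost */
    return gain;                        /* in range, or already at a limit */
}
```

Adapting on every block would also boost background noise between words. One option is to run agc_step only while speech is present and hold the setting during silence, or to use the silent stretches to set a noise floor instead.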
Another improvement would be to add a dynamic time warping (DTW) algorithm to the pattern-matching routine. DTW accounts for slight variations in how each word is pronounced, in particular variations in the lengths of phonemes.
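The core of DTW is a short dynamic-programming recurrence. This C sketch computes the classic DTW cost between an utterance and a template, each a sequence of feature frames, using city-block distance between frames and keeping only two rows of the cost table. The frame format (NFEAT byte-sized features, at most MAXF frames) and the distance measure are assumptions, not Tiny Voice’s template layout.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NFEAT 4   /* features per frame (assumed) */
#define MAXF  32  /* longest word, in frames (assumed) */

/* City-block (sum of absolute differences) distance between two frames. */
static unsigned frame_dist(const uint8_t *a, const uint8_t *b)
{
    unsigned d = 0;
    for (int k = 0; k < NFEAT; k++)
        d += (unsigned)abs((int)a[k] - (int)b[k]);
    return d;
}

/* Classic DTW cost of aligning utterance u (n frames) with template t
 * (m frames):  D(i,j) = d(i,j) + min(D(i-1,j-1), D(i-1,j), D(i,j-1)).
 * Only the previous and current rows of D are kept.
 * Returns UINT_MAX if either length is out of range. */
unsigned dtw(const uint8_t u[][NFEAT], int n,
             const uint8_t t[][NFEAT], int m)
{
    unsigned prev[MAXF], cur[MAXF];

    if (n < 1 || m < 1 || n > MAXF || m > MAXF)
        return UINT_MAX;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            unsigned best;
            if (i == 0 && j == 0)
                best = 0;                    /* path start */
            else if (i == 0)
                best = cur[j - 1];           /* first row: template advances */
            else if (j == 0)
                best = prev[0];              /* first column: utterance advances */
            else {
                best = prev[j - 1];          /* both advance (diagonal) */
                if (prev[j] < best)
                    best = prev[j];          /* utterance advances, template frame repeats */
                if (cur[j - 1] < best)
                    best = cur[j - 1];       /* template advances, utterance frame repeats */
            }
            cur[j] = best + frame_dist(u[i], t[j]);
        }
        memcpy(prev, cur, (size_t)m * sizeof cur[0]);
    }
    return prev[m - 1];
}
```

Because a frame may be matched more than once at no extra cost when it fits, an utterance whose middle sound is held twice as long still scores 0 against its template, where a frame-by-frame comparison would not. Even in this compact form, the nested loops and row buffers suggest why DTW would be a tight fit in 200 bytes.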
But with only 200 bytes of code space left over, adding a DTW would be challenging. A first-order approximation may be achievable, however.
I’d rather use C than assembly language. When I started this project, I knew squeezing this functionality into 1200 bytes would be tough. So, a high-level language was out of the question.
Since then, I’ve had the opportunity to try out a C compiler from Byte Craft. The good news: it generates small enough code. The bad news: I wish I’d used it earlier.
And as a final wish, I would like to use a different processor. Of all these improvements, this one is probably the best. You can now get equivalent MCUs with built-in ADCs, which would allow more elaborate signal processing and better noise rejection.
One of the best candidates for a low-cost system is the Sharp SM8500 8-bit MCU. It has almost everything you need for an embedded voice-command system, including a 10-bit ADC (8 channels) and an 8-bit DAC, which is useful for voice feedback and verification.
The SM8500 features SIO and UART ports to communicate with other system devices, 2 KB of internal RAM, internal ROM, and the ability to access external ROM or RAM. It also offers 80+ I/O pins for keypad and display interfacing, hardware multiply and divide, and a 250-ns instruction cycle time. And it costs under $3.
If you’re willing to spend a bit more, a new level of performance becomes possible. New 32-bit RISC MCUs are becoming available in the sub-$15 or even sub-$10 range.
For example, the Sharp ARM710M RISC processor, running at a conservative 16 MHz, performs a complete FFT-Mel-Cepstrum analysis using only 50% of the processor’s resources.
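For reference, the textbook mel-cepstrum computation maps FFT frequencies onto the mel scale, sums the power spectrum through a bank of K triangular filters spaced evenly in mel to obtain filter energies S_k, and takes a discrete cosine transform of their logarithms to get cepstral coefficients c_n, of which the first N are kept. The formulas below are the commonly used forms; the article does not say which variant Sharp’s demonstration uses.

```latex
\mathrm{mel}(f) = 2595 \, \log_{10}\!\left(1 + \frac{f}{700\ \mathrm{Hz}}\right)

c_n = \sum_{k=1}^{K} \left(\log S_k\right) \cos\!\left[\frac{n\,(k - \tfrac{1}{2})\,\pi}{K}\right],
\qquad n = 1, 2, \ldots, N
```

The FFT and the K filter sums dominate the cost, which is why a 32-bit core with a fast multiplier handles this comfortably where an 8-bit MCU could not.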
With the ability of RISC processors to address large amounts of memory, you have the ingredients to put together a dictation system like the one I’m using now. And it can run off a couple of penlight batteries!