Voice Recognition

Hi,

Has anyone ever used voice recognition on an AVR? My boss saw a toy with it, and now he wants me to incorporate it into our line of electronic locks.

Any ideas welcome

Thanks

James


oh poor poor you ;)
One idea would be to record the sounds and then try to compare the two (recorded and incoming). Only I somehow think that 20 MIPS isn't enough for that... an ARM would do it better. I saw an app note on the AVR page about how to record a sound message with a normal AVR; you might want to give it a look.


Not a comment on the technical issues, but I don't think I would trust anything valuable to a voice activated lock.

Usually when I'm showing off for new passengers and tell my car's navigation system to give me directions to drive "home" (a fairly easy word to recognize, I'd think, and in its vocabulary), it instead lights up icons for gas stations. I want to go home, and it wants to go out for a drink.

Unless you're going to plug your lock into a PC for an operator-specific training session, I'd guess what your boss is asking about will be an interesting and educational bag o' worms. But I'm just jealous - I'd like someone to pay me to play with it, as long as there were limited expectations of success.

Chuck Baird

"I wish I were dumber so I could be more certain about my opinions. It looks fun." -- Scott Adams

http://www.cbaird.org


Limited speech recognition of individual words for a particular speaker would be feasible using a bank of band-pass filters (a couple of op amp chips) and comparators. Early computers like the PET and TRS-80 could manage it with that sort of external hardware by comparing the input frequencies over time with stored templates previously created in a training session.

Leon

Leon Heller G1HSM


An external RAM on a mega64 or mega128 would let you grab one or two seconds of speech. A 128-point FFT runs pretty fast. I think the idea is to slide a window through the samples and keep taking FFTs, then try to correlate the stored spectra with the acquired ones. You could get the whole thing working on a PC, then try to shoehorn it into whatever would hold it all.

Imagecraft compiler user


First thought: job for a DSP ! ;)


Check out Sensory's Voice Extreme modules. They're obviously not AVR, but they can be interfaced to an AVR if necessary.

http://www.sensoryinc.com/html/s...


Quote:
Sensory's Voice Extreme modules

These look interesting - their site lists the toolkit at $129, but Digikey shows it as obsolete (Sensory shows Digikey as a distributor). Of course they may be talking about two different things.

Chuck Baird

zbaird wrote:
I want to go home, and it wants to go out for a drink.

If I was with you and you planned on taking me home and shoving me in the dark garage for the night, I might suggest we go out for a drink first, too =)

Clancy

Step 1: RTFM. Step 2: RTFF (Forums). Step 3: RTFG (Google). Step 4: Post.


I was looking at Sensory's voice modules, and Panasonic do an IC too, called the Hello IC. I just wanted to see if anyone had played with it before.

zbaird, I agree it can't be too secure; I think he just wants me to build a working prototype to show off our new locks. I need to get the base models done first, using a keypad, iButton, RFID and maybe a fingerprint reader.

I must say I do love this part of the job; it's the getting-ready-for-production part I hate.


I finally heard back from Sensory - the Voice Extreme is indeed obsolete. They recommend the VR Stamp - development kit $350, module by itself (a 40 pin chip) $40.

Chuck Baird

I will simply make some comments which are meant as "food for thought" rather than a specific answer.

In many, if not most, instances the only additional hardware required on older 8-bit computers like the PET, TRS-80, Apple II, Atari and Commodore 64 was a simple A-to-D converter. It is really not hard to do voice recognition, although to do so you need to understand one of the basic tenets of engineering, and that is the “art of compromise”. In short, you can build a basic speaker-independent VR system quite easily with only a little bit of memory (both code and working memory, i.e. RAM/EEPROM etc.), although you may suffer from false triggers, or false positives, as a consequence. The flip side of this is that you can also develop a highly “speaker-specific” voice recognition system with a light code/working overhead, although in this case you may get false negatives with the same speaker.

This is where the "art of compromise" needs to be applied. You need to decide whether you will place a higher importance on accurately identifying the individual (which may mean that the same person will trigger false reads if they have a cold, or simply say the word slightly differently), or whether you are more concerned with “what” they are saying, in which case you try to trigger on the words themselves rather than on any inflection or anything specific to the way they say them.

Ultimately it all comes down to the sample rate of the speech and how heavy handed you get with the comparison to the reference (stored) image of that speech. After this anything else is simply an exercise in academic wanking.
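The compromise ends up as one number: the acceptance threshold on whatever distance the comparison produces. The sketch below is hypothetical (the `Outcome` type, the integer distance scale and the idea of scoring a set of genuine and impostor test attempts are assumptions, not Brenton's method); it just counts false rejects and false accepts for a given threshold, which shows that tightening it to shut out near-misses also starts shutting out the right person with a cold.

```c
#include <assert.h>
#include <stddef.h>

/* Result of applying one acceptance threshold to a set of test attempts. */
typedef struct {
    size_t false_rejects;  /* should have opened, distance was above threshold */
    size_t false_accepts;  /* should not have opened, distance was within it */
} Outcome;

/* An attempt is accepted when its distance to the stored template is
 * <= threshold. `genuine` holds distances for attempts that should open the
 * lock (right person/word), `impostor` for attempts that should not. */
Outcome evaluate_threshold(const unsigned *genuine, size_t n_genuine,
                           const unsigned *impostor, size_t n_impostor,
                           unsigned threshold)
{
    assert(genuine != NULL || n_genuine == 0);
    assert(impostor != NULL || n_impostor == 0);
    Outcome o = {0, 0};
    for (size_t i = 0; i < n_genuine; i++)
        if (genuine[i] > threshold)
            o.false_rejects++;   /* e.g. the right speaker with a cold */
    for (size_t i = 0; i < n_impostor; i++)
        if (impostor[i] <= threshold)
            o.false_accepts++;   /* a close-enough wrong voice or word */
    return o;
}
```

Running this over a range of thresholds on real test recordings makes the trade-off a deliberate choice: a lock would usually sit at the tight end, a toy at the loose end.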

Explaining this in detail would require a great deal of time which I don’t have, although I do remember that there were a few articles printed in Byte and Circuit Cellar (eons ago - probably before half of you were born) which not only explained the basic theories but provided working examples. Thanks to Google, I have been able to locate a reference to one of the CC articles which from memory shows how you can do this in a lowly 6085 with very small amounts of ROM (1240 bytes) and RAM (64 bytes).

http://www.circuitcellar.com/pas...

Hopefully this link will steer you in a direction which may be of assistance.

Cheers, Brenton


6805... other than that, I agree totally

Imagecraft compiler user


Thanks for all your help, guys. Brenton_S15, I have found that web page very helpful and quite an interesting read. I'll let you all know how I get on with this project.


Hi,

Just a note about the Sensory Direct modules. I've used them (I've got one of the original ones that's now obsolete), and it did work pretty well. You can program them in either "speaker dependent" or "speaker independent" mode. The first is designed for use as access control, the second is designed for use as general voice-activated control.

They work surprisingly well, but they do sometimes require a few tries and lots of tweaking to get the microphone in the right spot. Still, they're fun to play with!

-Colin


I wonder exactly what kind of locks James was talking about in the first post? I have visions of someone stood on their front doorstep, in a howling gale, with driving rain, screaming "let me in, you bastard" but all to no avail :lol:


Quote:
I wonder exactly what kind of locks James was talking about in the first post? I have visions of someone stood on their front doorstep, in a howling gale, with driving rain, screaming "let me in, you bastard" but all to no avail :lol:

"I think you said "Banstead", is this correct?"

Four legs good, two legs bad, three legs stable.


"Open the pod bay door, Hal"

Chuck Baird

I have some old Antic and Analog magazines from my 8-bit Atari days, and I have found a lot of useful code in them (if it runs on a 2 MHz 6502, it will fly on the AVR). I seem to recall messing around with some voice recognition by reading the analog paddle port (8-bit ADC). If you want, I can look for the code for you. The 6502 only had two registers and an "Accumulator" that stored the result of operations. (Yes, I was a very odd 12-year-old, but the Hood I grew up in was a great incentive to stay indoors.)


Trollicus_Rex wrote:
... but the Hood I grew up in was a great incentive to stay indoors)

Is Hood a town? Or did your parents dress you oddly?

Ross McKenzie ValuSoft Melbourne Australia


goujam wrote:
Hi,

Has anyone ever used voice recognition on an AVR. My boss saw a toy with it ...

James,

I have read the whole thread... but you say your boss has already got it in a toy. Steal the toy from him and do an autopsy... evaluate its performance. Notwithstanding the good suggestions here already, surely checking out what is available now could be useful too.

Ross McKenzie ValuSoft Melbourne Australia


Sorry Ross, Hood as in Neighborhood.


I was looking at this code; it looks fairly simple.
http://www.eecg.toronto.edu/~aam...
It looks like it uses a lot of floating point. I have only used assembly with the AVR; what's the floating point support like with GCC?


Trollicus_Rex, I'll have a look through the code to see if I can borrow any of it.


Trollicus_Rex wrote:
what's the floating point support like with GCC?

Reasonably good (within the 32-bit 'float' limit), but it's imperative that the code links against libm.a. This will be the case if an Mfile-generated Makefile is being used, but it will NOT be the (default) case if a project inside AVR Studio is used; that would need fixing by moving libm.a from the left to the right pane under the library configuration.

Cliff


Clawson, I was looking at your pic, -2,-3 why so negative? Feeling down?
:(


Oh, if you follow the link at the bottom of the page from that last link I posted, you'll find someone made a voice recognition project with a mega32.
http://instruct1.cit.cornell.edu/courses/ee476/FinalProjects/s2006/XL76_SL362/XL76%20SL362/index.html


Trollicus_Rex wrote:
Clawson, I was looking at your pic, -2,-3 why so negative? Feeling down?
:(

That's a good point; maybe the X and Y axes should be inverted? Maybe the designers of the "test" projected their own political tendencies into the positive areas.

Four legs good, two legs bad, three legs stable.


Quote:
VOICE RECOGNITION SECURITY SYSTEM

ECE 476 Spring 2006


http://instruct1.cit.cornell.edu...

The approach seems to be the same as in an article I have filed somewhere that did recognition on a quite small microcontroller. Gotta dig that out...

Found it online. Search for "tiny voice" or "tinyvoice"; Brad Stewart, Moto HC705.
http://ca.geocities.com/xxxtoyte...

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Looks like a bunch of digital FIR low-pass filters... that's what they call a vocoder. Don't like their block diagram. Here's what I'd try: take 8000 8-bit samples at 8 kHz (a 1 s utterance); that's 8 samples per ms. If you took 64 time points (an 8 ms window) and did an FFT, you'd have 32 frequency points. Now slide the window down 8 ms and get another 32 frequency points. Repeat until the window has slid along the whole word. This is the 'trained' or 'learned' word. The cleverness and intelligence come in comparing the test word against this data... you have 125 sets of spectra (1 s / 8 ms), all 8 ms apart. MP3 throws out all spectra 10 dB below the loudest peak in a sample... this is evidently what's important in that 8 ms hunk... so you keep the loudest spectra in each window to compare against.

Imagecraft compiler user