Record voice from microphone: what is the quality?

I need to capture a voice message (speech) from a microphone and save it to non-volatile memory. The solution should minimize component count, PCB space and cost.

It's a speech application, so 8kHz/8bit is more than sufficient and the quality expected should be similar to a phone call.

I have already read AVR335: Digital Sound Recorder with AVR and DataFlash, and it's a very good starting point.
In that application note, the analog signal from the microphone is acquired with the internal 10-bit ADC and reduced to 8-bit samples by simple truncation.
My question is: what audio quality can I expect from a recording made with this approach?
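For reference, the truncation step can be sketched in C as below. This is only an illustration of "keep the top 8 of 10 bits", not the code from the application note; the function name `adc10_to_u8` is hypothetical.

```c
#include <stdint.h>

/* Reduce a 10-bit ADC reading (0..1023) to an 8-bit sample (0..255)
 * by discarding the two least significant bits (plain truncation). */
static inline uint8_t adc10_to_u8(uint16_t adc10)
{
    return (uint8_t)((adc10 & 0x03FFu) >> 2);
}
```

On an AVR the same result is usually obtained without any shifting, by left-adjusting the ADC result (ADLAR bit) and reading only the high byte.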

I know that more complex circuits with a microphone input use an AGC amplifier to compensate for the different distances and volume levels of different speakers. I don't need high-fidelity audio, but the played-back message should be understandable (similar to phone call quality).

The speaker will always talk from the same distance (15-20cm) and in a loud voice, but of course the level will still vary from one speaker to another.

Is the simple approach described in the Atmel application note suitable for my application?
Do you suggest other low-cost solutions?

Last Edited: Fri. Feb 9, 2018 - 10:54 AM

The signal to noise is 6dB per bit. The 10 bit sample is 0-1023, but it's really 0 to +-512, so it's 9 bits, 54dB signal to noise.

There is a sort of cool trick to doing a realtime automatic gain control. You have 9 bits, you want 7, so if 'some number of samples' in the buffer have an 8th bit set, shift all samples right one bit, remember that you have 6dB gain reduction. If he talks louder and you still have a 9th bit set, shift 2 bits.

The 8 bit samples are about 42dB signal to noise. Worse than AM radio. Worse than vinyl. Phones use ulaw and alaw: 13 or 14 bits in an ersatz floating point format (5 bits of fraction and a 3 bit exponent). Lots of range, but even less s/n.
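A rough C sketch of the block-wise shift trick, under some assumptions that are not in the post: samples have already been re-centred to signed values in -512..511, they are processed one block at a time, and the chosen shift is stored alongside each block so playback can undo it. The names `pick_shift` and `pack_block`, and the `limit` parameter standing in for "some number of samples", are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Choose a right shift (0, 1 or 2) for one block of signed samples
 * (-512..511). The shift is increased while at least `limit` samples
 * (limit >= 1) would still exceed the 7-bit magnitude range. */
static uint8_t pick_shift(const int16_t *blk, size_t n, size_t limit)
{
    uint8_t shift = 0;
    while (shift < 2) {
        size_t over = 0;
        for (size_t i = 0; i < n; i++) {
            int16_t mag = (int16_t)(blk[i] < 0 ? -blk[i] : blk[i]);
            if ((mag >> shift) > 127) {
                over++;               /* a bit above the 7th is still set */
            }
        }
        if (over < limit) {
            break;                    /* few enough overflows: keep this shift */
        }
        shift++;                      /* 6 dB more gain reduction */
    }
    return shift;
}

/* Scale one block by the chosen shift and clamp to signed 8 bits. */
static void pack_block(const int16_t *blk, size_t n, uint8_t shift, int8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        int16_t v = (int16_t)(blk[i] / (1 << shift));
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        out[i] = (int8_t)v;
    }
}
```

Each block then needs one extra byte (or two bits) to remember its shift, which is the "remember that you have 6dB gain reduction" part.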

Imagecraft compiler user

bobgardner wrote:
The signal to noise is 6dB per bit. The 10 bit sample is 0-1023, but it's really 0 to +-512, so it's 9 bits, 54dB signal to noise.

I don't understand why the sign bit would reduce the signal-to-noise ratio, but it's not important for this discussion.

bobgardner wrote:

There is a sort of cool trick to doing a realtime automatic gain control. You have 9 bits, you want 7, so if 'some number of samples' in the buffer have an 8th bit set, shift all samples right one bit

This won't be so simple, because I would need all the samples in order to check whether one or more of them have the MSB set. The message will be too long to keep in internal RAM, so I need to save it to an external Flash on the fly during recording.

I can use a trick: I start storing 8-bit samples until I encounter the first sample with the MSB set. From then on, I store the samples shifted to the right. At the end, I save the index of the sample, if any, at which the shifting started.

During playback, I output the 8-bit samples as they are in memory until I reach the index marked as the starting point of the shifting.

What do you think?

bobgardner wrote:
...remember that you have 6dB gain reduction. If he talks louder and you still have a 9th bit set, shift 2 bits. The 8 bit samples are about 42dB signal to noise. Worse than AM radio. Worse than vinyl. Phones use ulaw and alaw: 13 or 14 bits in an ersatz floating point format (5 bits of fraction and a 3 bit exponent). Lots of range, but even less s/n.

So I can expect, from this approach, a quality similar to a phone call... right?

You need to do this by incremental piecewise refinement. First get the serial port working so you can put up a menu:
1) generate a 10-bit sine in a buffer
2) play the 10-bit buffer
3) fill the buffer from the A/D with gain=1
3a) fill the buffer from the A/D with gain=10
4) fill and play in a loop using 2 buffers
5) record to SD card, double buffered, instead of playing
6) play back from SD card
7) AGC on/off
etc. Do you want 4K freq response or 8K freq response? (Use 8k or 16k sampling rate.)

As for the max to min dB calc, you don't compare the plus max of 1023 to the neg max of 0; you convert 0..1023 to -511..+511, then the max to min dB calc is 20log10(511).
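A quick check of that arithmetic, taking the ratio of peak amplitude to one LSB as the figure of merit (the convention used in this calculation, not the only possible one):

```latex
\begin{align*}
20\log_{10}(2)   &\approx 6.02\ \text{dB per bit} \\
20\log_{10}(511) &\approx 54.2\ \text{dB} \quad \text{(10-bit ADC, 9-bit magnitude)} \\
20\log_{10}(127) &\approx 42.1\ \text{dB} \quad \text{(8-bit sample, 7-bit magnitude)}
\end{align*}
```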

You can record on a pc, save to sd card to test the playback on the avr.

Imagecraft compiler user

bobgardner wrote:
You need to do this by incremental piecewise refinement.

If I start from 10-bit samples (ADC with a resolution of 10 bits), I have two index numbers (N and M) that describe the message:
    samples [0..N-1] are in the range 0..255;
    samples [N..M-1] are in the range 0..511 (sample N is at least 256);
    samples [M..] are in the range 0..1023 (sample M is at least 512).
Of course, N and M can take a magic value to indicate that the corresponding interval was never reached (N=M=-1 if all the samples are in the range 0..255).

During recording, samples in the first interval will be saved as 8-bit samples (adc & 0xFF); samples in the second interval will be saved as 8-bit samples after right shifting by 1 bit ((adc >> 1) & 0xFF); samples in the last interval will be saved as 8-bit samples after right shifting by 2 bits ((adc >> 2) & 0xFF).

If playback is performed with an 8-bit PWM, I can do the following:
- if ((N==-1) && (M==-1)), all the samples are output as they are;
- else if ((N!=-1) && (M==-1)), samples [0..N-1] are output right shifted by 1 bit and samples [N..] are output as they are;
- else if (M!=-1), samples [0..N-1] are output right shifted by 2 bits, samples [N..M-1] are output right shifted by 1 bit, samples [M..] are output as they are.
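The recording and playback rules above could be sketched in C roughly as follows. It is only an illustration of the N/M bookkeeping: the names `rec_state_t`, `rec_encode`, `play_decode` and `NOT_SEEN` are hypothetical, and one case the description leaves open is filled with an assumption (if the first large sample is already 512 or more, N is set equal to M so that the middle interval is empty).

```c
#include <stdint.h>

#define NOT_SEEN (-1L)   /* magic value: interval never reached */

typedef struct {
    long n;         /* index of first sample >= 256, or NOT_SEEN */
    long m;         /* index of first sample >= 512, or NOT_SEEN */
    long count;     /* number of samples encoded so far */
    uint8_t shift;  /* right shift currently applied: 0, 1 or 2 */
} rec_state_t;

static void rec_init(rec_state_t *s)
{
    s->n = NOT_SEEN;
    s->m = NOT_SEEN;
    s->count = 0;
    s->shift = 0;
}

/* Encode one 10-bit sample (0..1023); returns the byte to store. */
static uint8_t rec_encode(rec_state_t *s, uint16_t adc)
{
    if (adc >= 512 && s->shift < 2) {
        s->shift = 2;
        s->m = s->count;
        if (s->n == NOT_SEEN) {
            s->n = s->count;     /* assumption: empty middle interval */
        }
    } else if (adc >= 256 && s->shift < 1) {
        s->shift = 1;
        s->n = s->count;
    }
    s->count++;
    return (uint8_t)((adc >> s->shift) & 0xFF);
}

/* Bring stored byte number i to the scale of the last interval,
 * following the three playback cases listed above. */
static uint8_t play_decode(uint8_t b, long i, long n, long m)
{
    uint8_t final_shift  = (m != NOT_SEEN) ? 2 : (n != NOT_SEEN) ? 1 : 0;
    uint8_t stored_shift = (m != NOT_SEEN && i >= m) ? 2
                         : (n != NOT_SEEN && i >= n) ? 1 : 0;
    return (uint8_t)(b >> (final_shift - stored_shift));
}
```

With this, every decoded sample ends up on the same scale as a plain `adc >> final_shift`, so the only gain from the scheme is that the shift is chosen from the actual signal instead of being fixed in advance.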

bobgardner wrote:
First get the serial port working so you can put up a menu. 1) gen 8 bit sine in buf 2) play 8 buf 3) fill buff from a/d 4)fill and play in a loop using 2 buffers 5)record to sd card double bufferd instead of playing. etc. Do you want 4K freq resp or 8K freq resp? (use 8k or 16k sampling rate)

4k frequency response, so I'll use 8kHz sampling frequency.

bobgardner wrote:
As for the max to min db calc, you dont compare the plus max 0f 1023 to the neg max of 0, you cvt 0-1023 to -511 to 0 to 511, then the max to min db calc is 20log10(511).

Quote:

My question is: what could be the quality of the recorded audio signal with this approach?

I've basically done exactly what you ask about. With little more than an amplified mic (to about a 1Vpp signal into the ADC) and a directly connected 8 Ohm speaker on a PWM pin (yeah, I know Ohm's Law says that current is too high!) it gave pretty respectable results. Around "telephone quality", as an 8kHz sample rate (4kHz audio range) would suggest.

To know what 8kHz quality is like go here:

http://en.wikibooks.org/wiki/A-l...

In the "yellow bit" of that page are three samples of the same thing at 44kHz, 22kHz and 8kHz. Now I don't claim to have "bat ears" but try as I might I have a pretty hard time hearing much degradation in the 8kHz version there!

One key thing I do know is that on play back "audio chambers" are key. Start with a small speaker on the end of a wire and play back some audio into it. Now take a plastic cup (empty!) up-turn it and put it over the speaker and play the same sound - astronomically better. I worked for a company that used to make telephones (amongst many things) and the design of the plastics around the speaker for "hands free" was absolutely key for making the difference between "two bean cans on a piece of string" and pretty respectable audio quality.

If you want to record to an SD card, you want a 512 byte sector all ready to blast into the card. How fast does it write? 4ms for 512 bytes? That's 125K per sec. No problem. So you need 2 buffers for the input samples, then you process those into a couple of 512 byte output buffers. So 3K altogether. Use a mega1280. It has 8K of RAM.
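A minimal ping-pong buffer sketch in C for the sector-sized double buffering. It is simplified compared with the description above: it assumes the samples are already reduced to 8 bits when they are pushed, so only two 512-byte buffers are used. The names `pingpong_t`, `pp_push` and `pp_take` are hypothetical, and the actual SD card write is left out.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

typedef struct {
    uint8_t buf[2][SECTOR_SIZE];
    volatile uint8_t  fill;     /* buffer currently being filled (0 or 1) */
    volatile uint16_t pos;      /* next free byte in buf[fill] */
    volatile bool     ready;    /* the other buffer holds a full sector */
    volatile bool     overrun;  /* writer did not keep up with the sampler */
} pingpong_t;

static void pp_init(pingpong_t *p)
{
    p->fill = 0;
    p->pos = 0;
    p->ready = false;
    p->overrun = false;
}

/* Intended to be called from the sample-rate interrupt with each new sample. */
static void pp_push(pingpong_t *p, uint8_t sample)
{
    p->buf[p->fill][p->pos++] = sample;
    if (p->pos == SECTOR_SIZE) {
        if (p->ready) {
            p->overrun = true;  /* previous sector was never collected */
        }
        p->fill ^= 1u;          /* swap buffers */
        p->pos = 0;
        p->ready = true;
    }
}

/* Intended to be called from the main loop: returns a full sector to
 * write to the card, or NULL if none is waiting. On real hardware this
 * would need protection against the interrupt. */
static const uint8_t *pp_take(pingpong_t *p)
{
    if (!p->ready) {
        return NULL;
    }
    p->ready = false;
    return p->buf[p->fill ^ 1u];
}
```

At 8 kHz and one byte per sample a sector fills every 64 ms, so a write taking a few milliseconds would leave plenty of margin.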

Imagecraft compiler user

If you design your own analog front end, note that op-amps are generally easier to use than discrete transistors. You typically need a fair amount of gain, and an LPF or BPF as part of the front end.

The LPF is needed somewhere in the analog front end to attenuate the higher frequency components of the input signal before sampling. The mic is likely to have a bandwidth well above your desired 4 kHz bandwidth.

Laying out a decent quality analog amplifier on a digital PCB can sometimes be a challenge.

JC

Looks like this was awoken for spam that has then been deleted so I'll lock this.

Topic locked