AVR Freaks Forum Index

kniight
Posted: Jun 13, 2012 - 02:41 PM
Newbie


Joined: Feb 24, 2007
Posts: 6
Location: Budapest

Hi all,

I have a problem with ALSA and the AC97C controller.
I want to use it in a voice-over-IP project with custom software and custom hardware.

The problem is:

When the cpu is having continuous network traffic, like an ftp transfer,
long playbacks give the following message from the kernel:

Code:
Jun  1 16:07:39 avr32 user.debug kernel: ALSA sound/core/pcm_lib.c:1765: playback write error (DMA or IRQ trouble?)


After this message, playback is not possible, but I can still change the volume and switch GPIO pins on the codec chip.

Sometimes this message appears after 10 minutes of playback, sometimes after more than an hour.
Pinging or flood pinging during the voip transfer does not have this effect.

Tried it with different kernels from 2.6.35 to 3.2, both as module and compiled in, but the problem still exists.

The first hardware is an NGW100 v1 with CS4299 codec chip,
the second one is a custom made CPU board with AP7001 cpu, an SPI ethernet controller and the same codec.
The problem exists on both boards.

Code:
cat random10Mbyte.dat > /dev/dsp
also triggers this error within a few tens of minutes (with the long FTP transfer in the background).
Playing with aplay triggers the error too.
We have tried both synchronous and asynchronous playback algorithms,
every possible error check, and a lot of (pre)buffering and buffer level checking, but the problem still exists.
We think that the problem is somewhere in the kernel.

We noticed that the interrupt count for the AC97C in /proc/interrupts is always zero.
Is it normal to get no interrupts from the AC97C?

I played with the wait time around
Code:
tout = schedule_timeout(wait_time);
for two reasons:
a - perhaps to fix my problem
b - to force some interrupt from the AC97C to prove that the interrupt handling is OK.
I had no success, got no interrupts and the problem stayed.

I looked through the ac97c.c and pcm_lib.c sources and did not find any obvious errors leading to my problem.
The interrupt bit initialization seems to be good, and the interrupt handler seems to be initialized correctly.
Understanding ALSA's internal logic is far beyond my knowledge and capabilities.

Please help!
Thanks
 
hce
Posted: Jun 13, 2012 - 03:55 PM
Raving lunatic


Joined: Jan 07, 2003
Posts: 4580
Location: Oslo, Norway

The AC97C driver uses the DMAC, which is where the data is sunk and sourced, so you should see all the interrupts fire in the DMAC module.

Interrupts in the AC97C driver are only used if you hook up a WM971x AC97 codec with a resistive touch controller.

I don't know what locks up in the DMA engine, but I've seen issues in it before, though I thought they were addressed at some point before the 3.x kernels.
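
A quick way to confirm this on a board is to compare the interrupt counters of the two devices. The helper below is only a sketch: `irq_count` is a hypothetical name, and the labels `dw_dmac` and `atmel_ac97c` are assumptions based on the driver names, so check how the devices actually appear in /proc/interrupts on your kernel.

```shell
# Print the total interrupt count for all /proc/interrupts lines that
# contain NAME, summed over every CPU column.
# Usage: irq_count NAME [FILE]   (FILE defaults to /proc/interrupts)
irq_count() {
    name=$1
    file=${2:-/proc/interrupts}
    awk -v name="$name" '
        index($0, name) {
            # Field 1 is the "NN:" label; the purely numeric fields that
            # follow it are the per-CPU counters.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            found = 1
        }
        END { if (found) print total + 0; else print "not found" }
    ' "$file"
}

# Example (device labels are assumptions):
#   irq_count dw_dmac
#   irq_count atmel_ac97c
```

If the DMAC count keeps rising during playback while the AC97C count stays at zero, that matches the behaviour described above.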
 
kniight
Posted: Jun 18, 2012 - 11:56 AM
Newbie


Joined: Feb 24, 2007
Posts: 6
Location: Budapest

Hi,

I did some debugging on the dma engine:

First I enabled dynamic debugging for the dw_dmac module (verbose debug is compiled in too):
Code:
 echo 'file dw_dmac.c +p' > /sys/kernel/debug/dynamic_debug/control

(All the other debugs are disabled.)

During playback (with cat /dev/urandom > /dev/dsp) I get these messages regularly:
Code:
...
Jun 18 12:10:42 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
Jun 18 12:10:42 avr32 user.debug kernel: dma dma0chan1: new cyclic period llp 0x13ce7ae0
Jun 18 12:10:42 avr32 user.debug kernel: dw_dmac dw_dmac.0: interrupt: status=0x2
Jun 18 12:10:42 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
Jun 18 12:10:42 avr32 user.debug kernel: dma dma0chan1: new cyclic period llp 0x13ce7a80
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: interrupt: status=0x2
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
Jun 18 12:10:43 avr32 user.debug kernel: dma dma0chan1: new cyclic period llp 0x13ce7a20
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: interrupt: status=0x2
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
Jun 18 12:10:43 avr32 user.debug kernel: dma dma0chan1: new cyclic period llp 0x13ce7c00
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: interrupt: status=0x2
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
Jun 18 12:10:43 avr32 user.debug kernel: dma dma0chan1: new cyclic period llp 0x13ce7ba0
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: interrupt: status=0x2
Jun 18 12:10:43 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
Jun 18 12:10:43 avr32 user.debug kernel: dma dma0chan1: new cyclic period llp 0x13ce7b40
Jun 18 12:10:44 avr32 user.debug kernel: dw_dmac dw_dmac.0: interrupt: status=0x2
Jun 18 12:10:44 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
...

Then suddenly the noise stops...
The log shows nothing for a few seconds, then...
Code:
...
Jun 18 12:10:44 avr32 user.debug kernel: dw_dmac dw_dmac.0: interrupt: status=0x2
Jun 18 12:10:44 avr32 user.debug kernel: dw_dmac dw_dmac.0: tasklet: status_block=2 status_err=0
Jun 18 12:10:44 avr32 user.debug kernel: dma dma0chan1: new cyclic period llp 0x13ce7a20
***THIS IS WHEN THE SOUND STOPPED***
Jun 18 12:10:54 avr32 user.debug kernel: ALSA sound/core/pcm_lib.c:1765: playback write error (DMA or IRQ trouble?)
Jun 18 12:11:07 avr32 user.debug kernel: ALSA sound/core/pcm_lib.c:1765: playback write error (DMA or IRQ trouble?)
Jun 18 12:11:17 avr32 user.debug kernel: ALSA sound/core/pcm_lib.c:1765: playback write error (DMA or IRQ trouble?)
...
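
To measure how long the DMA controller had been silent before ALSA complained, a captured log can be piped through a small filter. This is a minimal sketch, not part of any kernel tooling: `dma_stall_gap` is a hypothetical helper, and it assumes the `Mon DD HH:MM:SS` syslog prefix shown above and a log that does not cross midnight.

```shell
# Read syslog lines on stdin and report the gap between the last dw_dmac
# message and the first ALSA "playback write error".
dma_stall_gap() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /dw_dmac|dma0chan/ { last = $3 }          # remember latest DMA activity
        /playback write error/ {
            if (last == "") { print "no DMA activity logged before the error"; exit }
            gap = secs($3) - secs(last)
            print "last DMA message " last ", first ALSA error " $3 ", gap " gap " s"
            exit
        }
    '
}

# Example (log path is an assumption):
#   dma_stall_gap < /var/log/messages
```

For the excerpt above this reports a gap of 10 s (12:10:44 to 12:10:54). That would be consistent with the write path giving up after the wait around `schedule_timeout(wait_time)`, although this is an inference from the timestamps rather than something the log states.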

Then I hit CTRL+C for the playback...
Code:
...
Jun 18 12:11:07 avr32 user.debug kernel: ALSA sound/core/pcm_lib.c:1765: playback write error (DMA or IRQ trouble?)
Jun 18 12:11:17 avr32 user.debug kernel: ALSA sound/core/pcm_lib.c:1765: playback write error (DMA or IRQ trouble?)
***THIS IS WHEN I HIT CTRL+C, BUT SOME TIME PASSES***
Jun 18 12:11:24 avr32 user.debug kernel: dma dma0chan1: cyclic free
Jun 18 12:11:24 avr32 user.debug kernel: dma dma0chan1: moving desc 93ce7c00 to freelist
Jun 18 12:11:24 avr32 user.debug kernel: dma dma0chan1: moving desc 93ce7ba0 to freelist
Jun 18 12:11:24 avr32 user.debug kernel: dma dma0chan1: moving desc 93ce7b40 to freelist
Jun 18 12:11:24 avr32 user.debug kernel: dma dma0chan1: moving desc 93ce7ae0 to freelist
Jun 18 12:11:24 avr32 user.debug kernel: dma dma0chan1: moving desc 93ce7a80 to freelist
Jun 18 12:11:24 avr32 user.debug kernel: dma dma0chan1: moving desc 93ce7a20 to freelist
...

Retrying playback makes no sound:
Code:
...
Jun 18 12:11:44 avr32 user.debug kernel: dma dma0chan1: scanned 0 descriptors on freelist
Jun 18 12:11:44 avr32 user.debug kernel: dma dma0chan1: scanned 0 descriptors on freelist
Jun 18 12:11:44 avr32 user.debug kernel: dma dma0chan1: scanned 0 descriptors on freelist
Jun 18 12:11:44 avr32 user.debug kernel: dma dma0chan1: scanned 0 descriptors on freelist
Jun 18 12:11:44 avr32 user.debug kernel: dma dma0chan1: scanned 0 descriptors on freelist
Jun 18 12:11:44 avr32 user.debug kernel: dma dma0chan1: scanned 0 descriptors on freelist
Jun 18 12:11:44 avr32 user.debug kernel: dma dma0chan1: cyclic prepared buf 0x13cf8000 len 24576 period 4096 periods 6
Jun 18 12:11:54 avr32 user.debug kernel: ALSA sound/core/pcm_lib.c:1765: playback write error (DMA or IRQ trouble?)
...

My problem is that there is no specific error message before the sound playback stops.
It just stops without any warning, and I cannot find the cause.
It just happens.

Hope somebody can help me!
 
Kevin1234
Posted: Jan 14, 2013 - 06:55 PM
Newbie


Joined: Jan 14, 2013
Posts: 1


Hi,

I ran into the same issue. Did you find a solution to this problem?

Thanks
 
hce
Posted: Jan 15, 2013 - 08:07 AM
Raving lunatic


Joined: Jan 07, 2003
Posts: 4580
Location: Oslo, Norway

AFAIK this has been fixed upstream now, tried the latest kernel release?
 
kniight
Posted: Jan 15, 2013 - 09:19 AM
Newbie


Joined: Feb 24, 2007
Posts: 6
Location: Budapest

Kevin1234 wrote:
Hi,
I ran into the same issue. Did you find a solution to this problem?


I am not yet 100 percent sure about the solution, but I have a theory.

I think this is a problem with clock instability.

We checked the 20 MHz crystal with a digital oscilloscope, and there is noise on the clock signal. Sometimes this noise is a big spike or glitch, in the same range as the clock signal, so it might trigger the clock circuitry with an extra pulse. This pulse unlocks the PLL, and until the PLL locks again the CPU is temporarily frozen and all internal clocks are disabled (see the PLL description in the Power Manager chapter of the datasheet).

According to the AP7001 documentation (doc32015.pdf, section 26.6.2 on page 446), the AC97 Controller requires its peripheral clock frequency to be higher than the AC97 codec's 12.288 MHz bit clock.
This is generally OK, because the controller is clocked from the PBB bus, which usually runs above 60 MHz.

During playback, when the PLL unlocks, there is a short period with no clock on the PBB bus or the Controller, while the Codec keeps running from its own clock source. This might lock up some hardware gates in the circuit which joins the two clock domains. (That's my theory.)

Unfortunately we did not find a software solution to restart or reinitialize the failing part. We tried every kind of hack in kernel and user space: closing and reopening the device, unloading the module, and dumping registers (of the DMACA, the AC97C and the codec) before and after playback, and before and after the lockup. The DMACA channels are OK; the AC97C was completely switched off, switched on and reconfigured; the codec was reset via a GPIO pin; but no luck. Every register has the same values whether playback is working or not. We have spent weeks (possibly months) of pure man-hours finding this out and trying to find a solution.

Low-impedance grounding is very important. There is an application note about the power supply for the oscillators and the PLL. The NGW100 design does not meet it! Power supply stability/noise might also be a problem. Our tests show that under heavy load (= more noise) or when using other external devices like IO extenders or SPI ethernet (= also more noise), the problem appears sooner!

A possible solution might be to replace the crystal with an external, stable oscillator (XCO).
We tried this method and are still testing; the results are promising. We have devices which previously went deaf after 5 minutes; after the crystal-to-XCO change they reached 70 hours of continuous playback. If we reach one week, I think we have found a solution. :)

Some words about the XCO.
IMPORTANT! The XIN clock input is NOT 3.3V tolerant!

First we tried a DIP14-compatible NSK 20.0000 MHz XCO with a 3.3 V supply, and we connected the XCO output to the CPU XIN through a 270 ohm resistor. It works fine.

For our production series we are trying "7C-20.000 MBB-T" from TXC, which can operate from 1.8V.

We first tried this XCO with a 1.8 V supply and the output connected directly to the CPU. We found that the amplitude is not large enough, and the noise can still unlock the PLL. Unlocking is less likely, though: we got hours of playback.

Now we are running the tests with this TXC part, with a 3.3 V supply and a 220 ohm resistor in series with the clock signal. This configuration is very promising. We also added a 15 pF capacitor between the XCO GND and the XCO output, but it does not make a difference.

It is important to connect the oscillator's GND close to the CPU analog ground.

All these modifications were done on our CPU board, which is based on the AP7001 CPU and was designed from the NGW schematics. So far we have not modified an NGW100, but we are planning to do so a bit later.

If you would like to play with the registers I can give you patches.

Are you doing a hobby project, or developing an application?
 
kniight
Posted: Jan 15, 2013 - 09:27 AM
Newbie


Joined: Feb 24, 2007
Posts: 6
Location: Budapest

hce wrote:
AFAIK this has been fixed upstream now, tried the latest kernel release?


I tried a fresh 3.x kernel as you suggested in June or July. We had no luck, just the same problem. NGWs have this pll unlock a bit less frequently.
 
kniight
Posted: Jan 23, 2013 - 09:53 AM
Newbie


Joined: Feb 24, 2007
Posts: 6
Location: Budapest

kniight wrote:
Kevin1234 wrote:
Hi,
I ran into the same issue. Did you find a solution to this problem?


I am not yet 100 percent sure about the solution, but I have a theory.

I think this is a problem with clock instability.

...

Now we are running the tests with this TXC part, with a 3.3 V supply and a 220 ohm resistor in series with the clock signal. This configuration is very promising. We also added a 15 pF capacitor between the XCO GND and the XCO output, but it does not make a difference.


Some update: two of our devices passed 4 days of continuous playback, and another two devices passed 2 days. All tests were stopped manually. The devices use crystal oscillators powered from 3.3 V, with the 220 ohm resistor and the 15 pF filter cap.
We are using kernel 2.6.35.
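
Long soak tests like these can be left unattended if something watches the kernel log for the stall message. The following is a sketch under stated assumptions: `count_stalls` and `wait_for_stall` are hypothetical helpers, the log path `/var/log/messages` and the 10 s polling interval are illustrative defaults, and log rotation during the run is not handled.

```shell
# Count ALSA stall messages in a log file (prints 0 if none, or if the
# file cannot be read).
count_stalls() {
    n=$(grep -c 'playback write error' "$1" 2>/dev/null || true)
    echo "${n:-0}"
}

# Block until a NEW stall message appears in the log, then print how many
# seconds have passed since this function was started.
# Usage: wait_for_stall [LOGFILE] [POLL_INTERVAL_SECONDS]
wait_for_stall() {
    log=${1:-/var/log/messages}
    interval=${2:-10}
    baseline=$(count_stalls "$log")   # ignore errors left over from earlier runs
    start=$(date +%s)
    while [ "$(count_stalls "$log")" -le "$baseline" ]; do
        sleep "$interval"
    done
    echo "stalled after $(( $(date +%s) - start )) s"
}

# Example: start playback in the background, then wait for the first stall.
#   cat /dev/urandom > /dev/dsp &
#   wait_for_stall /var/log/messages 10
```

A run that never prints anything before it is stopped by hand corresponds to the "passed" results reported above.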
 