Critical Exception sig: 9

Hi all,

I'm having a strange issue with one board out of 16. All of them run the same boot loader (u-boot 1.3.1) and the same kernel image (2.6.24.3.atmel.3). I'm booting the kernel over tftp and running the rootfs from an nfs share.

However, this one particular board gets Critical Exceptions on a semi-regular basis. I've already swapped out the SDRAM, thinking it could be a RAM-related issue, with no success.

In my mind it points to a hardware issue, since 15 other boards running the exact same setup are unaffected. What exactly does the critical exception mean? Does it mean the code has tried to step outside the acceptable memory space, perhaps because of a stuck address bit or something like that?
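To illustrate what a stuck address bit would look like to software, here is a toy sketch in Python. It is not how u-boot's mtest is implemented; `StuckBitMemory` and `find_stuck_address_bits` are hypothetical names made up for this example, and the model only covers a line stuck low. The idea is that a write to a power-of-two address lands on address 0 when that address line never toggles.

```python
class StuckBitMemory:
    """Toy memory model (hypothetical) where one address line may be stuck low."""

    def __init__(self, size, stuck_bit=None):
        self.cells = [0] * size
        # With a stuck-low line, that bit is always seen as 0 by the memory.
        self.mask = ~(1 << stuck_bit) if stuck_bit is not None else ~0

    def write(self, addr, value):
        self.cells[addr & self.mask] = value

    def read(self, addr):
        return self.cells[addr & self.mask]


def find_stuck_address_bits(mem, addr_bits):
    """Return address bits whose power-of-two address aliases address 0."""
    bad = []
    for bit in range(addr_bits):
        mem.write(0, 0xAA)           # known pattern at the base address
        mem.write(1 << bit, 0x55)    # different pattern at a power-of-two address
        if mem.read(0) != 0xAA:      # base was overwritten, so the two addresses alias
            bad.append(bit)
    return bad


good = find_stuck_address_bits(StuckBitMemory(256), 8)
faulty = find_stuck_address_bits(StuckBitMemory(256, stuck_bit=5), 8)
```

On the healthy model no bits are reported; on the faulty model only bit 5 is. A real board would need the test run from code that does not itself live in the memory under test.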

Anyway, here is a snippet of the error message. It's one of many, but this is a typical one (this is during the boot sequence):

Mounting virtual filesystems:
  /proc mounted
  /sys mounted
  /dev mounted
  /dev/pts directory made
  /dev/pts mounted
  /dev/shm directory made
Oops: Critical exception, sig: 9 [#1]
FRAME_POINTER chip: 0x01f:0x1e82 rev 2
pc : [<00041502>]    lr : [<00044778>]    Not tainted
sp : 7ff1eabc  r12: 00000000  r11: 0007c23c
r10: 00000000  r9 : 00000001  r8 : 0007a428
r7 : 0007c668  r6 : 0007556c  r5 : 00079d28  r4 : 0007c23c
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000005
Flags: qvnzC
Mode bits: hjmde....g
CPU Mode: Application
Process: S00mountvirtfs [227] (task: 93c11300 thread: 93d5a000)
Killed
  /config mounted
  /tmp mounted
  /var/run mounted
  /var/log mounted
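When there are many of these reports, it can help to pull the key fields out of each one and compare them. Below is a minimal sketch in Python that extracts the signal, pc, lr, and process from a dump in the format shown above. The function name `summarize_oops` is made up for this example, and the regular expressions are only matched against the lines in this particular report, so other kernels may need adjustments.

```python
import re

# Lines copied from the oops report above.
OOPS = """\
Oops: Critical exception, sig: 9 [#1]
pc : [<00041502>]    lr : [<00044778>]    Not tainted
Process: S00mountvirtfs [227] (task: 93c11300 thread: 93d5a000)
"""


def summarize_oops(text):
    """Extract reason, signal, pc, lr, process name and pid from an oops dump."""
    sig = re.search(r"Oops: (.+), sig: (\d+)", text)
    regs = re.search(r"pc : \[<([0-9a-f]+)>\]\s+lr : \[<([0-9a-f]+)>\]", text)
    proc = re.search(r"Process: (\S+) \[(\d+)\]", text)
    return {
        "reason": sig.group(1),
        "signal": int(sig.group(2)),
        "pc": int(regs.group(1), 16),   # addresses are printed in hex
        "lr": int(regs.group(2), 16),
        "process": proc.group(1),
        "pid": int(proc.group(2)),
    }


summary = summarize_oops(OOPS)
```

If the pc and the process differ from one report to the next, that would be consistent with random corruption rather than one specific bug, though it does not prove it.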

Some things seem to cause it to crap out easier than others, such as running some python scripts.

If anyone has run into similar issues in the past, I'd appreciate any tips. I'm running mtest in u-boot, and the next potential hardware step would be to replace the AVR32, which is not a small undertaking.

Thanks,

James

1) Bad routing to the SDRAM device, not capable of carrying high-speed signals?

2) Wrong settings for SDRAM timing in U-Boot?

3) Which kernel version are you running?

4) Which GCC and Binutils versions are you using?

Hans-Christian

A bit weird since signal 9 is SIGKILL - can't immediately think how that would be thrown there.
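For anyone who wants to double-check the number-to-name mapping, a quick sketch on a Linux host (assuming Python 3.5 or later is available there) is:

```python
import signal

# Look up the symbolic name for signal number 9 on this host.
sig = signal.Signals(9)
print(sig.name)
```

Signal numbers are defined per platform, so this only confirms the mapping on the machine where it is run.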

Can you enable verbose BUG() reporting in your kernel (it's in the kernel hacking menu in menuconfig)? You should then get a more verbose error report, including a full stack backtrace, which might help.

-S.

Thanks for the input guys!

We think we may have a solution. The ethernet jack seems to have had an intermittent connection issue: if you apply pressure on it at the right angle, the problems go away completely. It makes sense, as the issues only start showing up once the system starts booting from the nfs share; before that everything is OK. It turns out there is a mod on the underside of the jack that required a pin to be snipped, and despite this mod being in place, the nub of the snipped pin may have punched through the tape.
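One way to look for this kind of link trouble from software is to watch the interface error counters in /proc/net/dev. Here is a minimal sketch in Python that parses text in that format and flags interfaces with nonzero error counts. The function names are made up for this example, and the numbers in `SAMPLE` are invented for illustration, not taken from the affected board.

```python
# Invented sample in /proc/net/dev layout; the counter values are not real data.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth0: 1843200    1500   37    0    0    12          0         0   921600     900    4    0    0     0       9          0
"""


def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {iface: {counter: value}}."""
    stats = {}
    for line in text.splitlines()[2:]:        # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(x) for x in data.split()]    # 8 receive fields, then 8 transmit
        stats[name.strip()] = {
            "rx_errs": f[2], "rx_frame": f[5],
            "tx_errs": f[10], "tx_carrier": f[14],
        }
    return stats


def interfaces_with_errors(stats):
    """Return the names of interfaces that have any nonzero error counter."""
    return [name for name, c in stats.items() if any(c.values())]


stats = parse_net_dev(SAMPLE)
flagged = interfaces_with_errors(stats)
```

On a live system the input would come from reading /proc/net/dev twice a few seconds apart and comparing the counters; rising frame or carrier errors during nfs traffic would be a hint to check the jack and cabling.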

I ran an exhaustive mtest and everything looked good on the memory, and I probed all the SDRAM signals to make sure they looked good. It was this that helped me trace the problem to the ethernet jack: I was steadying my hand by resting a finger on top of the jack, and mysteriously the board would boot up problem-free whenever I was probing certain lines!

The kernel I'm using is 2.6.24.3.atmel.3 and the gcc is avr32-linux-gcc 4.2.1.

I'll definitely try out the verbose reporting if this fix doesn't work (I'm waiting on a new jack, since pulling the old one off kind of wrecked it), and it'll be useful for future debugging.

Thanks!