"BUG: soft lockup" using sockets

I have an NGW100 dev kit, and I'm trying to get the avahi daemon running.

Using the latest rc4 buildroot, it compiles without problem, but when it runs, I get a "BUG: soft lockup".

Below is the final part of the strace output.

Any ideas ?

$ strace /usr/sbin/avahi-daemon --debug

...

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 10
setsockopt(10, SOL_IP, IP_MULTICAST_TTL, "\377", 1) = 0
setsockopt(10, SOL_IP, IP_TTL, [255], 4) = 0
setsockopt(10, SOL_IP, IP_MULTICAST_LOOP, "\1", 1) = 0
bind(10, {sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
setsockopt(10, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(10, SOL_IP, IP_PKTINFO, [1], 4) = 0
setsockopt(10, SOL_IP, IP_RECVTTL, [1], 4) = 0
fcntl(10, F_GETFD)                      = 0
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
fcntl(10, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
write(5, "W", 1)                        = 1
write(5, "W", 1)                        = 1
uname({sys="Linux", node="mimc200-base.example.net", ...}) = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 11
bind(11, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
setsockopt(11, SOL_IP, IP_PKTINFO, [1], 4) = 0
setsockopt(11, SOL_IP, IP_RECVTTL, [1], 4) = 0
fcntl(11, F_GETFD)                      = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
fcntl(11, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(11, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
write(5, "W", 1)                        = 1
socket(PF_NETLINK, SOCK_DGRAM, 0)       = 12
getpid()                                = 383
bind(12, {sa_family=AF_NETLINK, pid=383, groups=00000111}, 12) = 0
setsockopt(12, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
brk(0x1f000)                            = 0x1f000
write(5, "W", 1)                        = 1
send(12, "\0\0\0\21\0\22\1\5\0\0\0\0\0\0\0\0\0", 17, 0BUG: soft lockup - CPU#0 stuck for 11s! [:383]
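For what it's worth, the payload of that last send() can be decoded by hand. The sketch below is only an illustration: it assumes the 17 bytes are a big-endian struct nlmsghdr (AVR32 is big-endian) followed by a one-byte struct rtgenmsg. The helper name decode_nlmsghdr and the two lookup tables are made up for this example; the constant values are taken from <linux/netlink.h> and <linux/rtnetlink.h>.

```python
import struct

# Payload copied from the strace line, with the octal escapes rewritten as hex:
# nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid, rtgen_family.
PAYLOAD = bytes.fromhex("00000011" "0012" "0105" "00000000" "00000000" "00")

# Subset of the flag and message type values from the kernel headers.
NLM_FLAGS = {0x001: "NLM_F_REQUEST", 0x004: "NLM_F_ACK",
             0x100: "NLM_F_ROOT", 0x200: "NLM_F_MATCH"}
RTM_TYPES = {18: "RTM_GETLINK", 22: "RTM_GETADDR"}


def decode_nlmsghdr(data, byteorder=">"):
    """Decode the 16-byte netlink header at the start of data."""
    length, msg_type, flags, seq, pid = struct.unpack(byteorder + "IHHII", data[:16])
    return {
        "len": length,
        "type": RTM_TYPES.get(msg_type, msg_type),
        "flags": [name for bit, name in sorted(NLM_FLAGS.items()) if flags & bit],
        "seq": seq,
        "pid": pid,
    }


header = decode_nlmsghdr(PAYLOAD)
print(header)
```

Under those assumptions the message is an RTM_GETLINK request with NLM_F_REQUEST, NLM_F_ACK and NLM_F_ROOT set, i.e. a request for the interface list sent over the rtnetlink socket.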

What flavour kernel are you running? Can you enable the verbose BUG reporting and post the full bug report including stack trace from there? If you aren't running a vanilla Atmel kernel can you try one (or is your hardware different)?

Thanks,
-S.


I'm using the Atmel buildroot rc4 distro on a plain NGW100 dev kit.

Verbose BUG reporting is enabled in the kernel, but all I get is the one-line:-

BUG: soft lockup - CPU#0 stuck for 11s! [:377]

I've even got gdb up and running ( :shock: ) and I only get:-

~ # gdbserver :1024 /usr/sbin/avahi-daemon
Process /usr/sbin/avahi-daemon created; pid = 377
Listening on port 1024
Remote debugging from host 10.0.0.105
Found user 'default' (UID 1000) and group 'default' (GID 1000).
Successfully dropped root privileges.
avahi-daemon 0.6.22 starting up.
WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
No service file found in /etc/avahi/services.
BUG: soft lockup - CPU#0 stuck for 11s! [:377]

And on the host side ...

r$ avr32-linux-gdb usr/sbin/avahi-daemon 
GNU gdb 6.7.1.atmel.1.0.3
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i386-pc-linux-gnu --target=avr32-linux-uclibc"...
(gdb) target remote 10.0.0.102:1024
Remote debugging using 10.0.0.102:1024
0x2aaab884 in _start () from /usr/local/dev/avr32/buildroot-cvs/build_avr32/staging_dir/lib/ld-uClibc.so.0
(gdb) list
1310	    }
1311	
1312	    /* If the initialization failed by some reason, we add the time to the seed*/
1313	    seed ^= (unsigned) time(NULL);
1314	
1315	    srand(seed);
1316	}
1317	
1318	int main(int argc, char *argv[]) {
1319	    int r = 255;
(gdb) break main
Breakpoint 1 at 0x642a: file main.c, line 1319.
(gdb) cont
Continuing.
warning: .dynamic section for "/lib/libpthread.so.0" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/libavahi-common.so.3" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/libavahi-core.so.5" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/libexpat.so.1" is not at the expected address (wrong library or version mismatch?)

Breakpoint 1, main (argc=1, argv=0x7fb75ec4) at main.c:1319
1319	    int r = 255;
(gdb) list
1314	
1315	    srand(seed);
1316	}
1317	
1318	int main(int argc, char *argv[]) {
1319	    int r = 255;
1320	    int wrote_pid_file = 0;
1321	
1322	    avahi_set_log_function(log_function);
1323	
(gdb) s
1320	    int wrote_pid_file = 0;
(gdb) s
1322	    avahi_set_log_function(log_function);
(gdb) s
1324	    init_rand_seed();
(gdb) s
init_rand_seed () at main.c:1302
1302	    unsigned seed = 0;
(gdb) cont
Continuing.

At this point, gdb just stops.

Any ideas ?


Do I need to fix the warning ?

warning: .dynamic section for "xyz" is not at the expected address (wrong library or version mismatch?)

Or is that not really the problem ?


squidgit wrote:
What flavour kernel are you running? Can you enable the verbose BUG reporting and post the full bug report including stack trace from there? If you aren't running a vanilla Atmel kernel can you try one (or is your hardware different)?

Thanks,
-S.


Okay ... I've gone back to a blank install of Atmel's buildroot v2.1.0, and I'll walk you through what I've found. This is all on a basic NGW100 dev kit.

$ make atngw100-base_defconfig
$ make

This all compiles, but has no avahi support. I've then added all the avahi options, and recompiled.

On the target, avahi-daemon segfaults, but this is a separate bug (see my previous post #66706). Adding -lpthread fixes the segfault.

I'm now missing the avahi-daemon.conf file, which I just copied from the avahi build directory.

Which brings us to this ...

$ avahi-daemon --debug
Found user 'default' (UID 1000) and group 'default' (GID 1000).
Successfully dropped root privileges.
avahi-daemon 0.6.21 starting up.
WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
Failed to open /etc/resolv.conf: No such file or directory
No service file found in /etc/avahi/services.

.. and there it stays with no BUG() output.

So I'll now run linux26-menuconfig and enable "Detect Soft Lockups" and "Verbose BUG() reporting".

Remake the kernel and reboot.

avahi-daemon still hangs, but no BUG() output at all !! :cry:
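One thing that could be ruled out at this point is that the two options never made it into the kernel .config that was actually built. A rough sketch of such a check follows. It assumes the menu entries "Detect Soft Lockups" and "Verbose BUG() reporting" correspond to CONFIG_DETECT_SOFTLOCKUP and CONFIG_DEBUG_BUGVERBOSE (their names in lib/Kconfig.debug for 2.6-era kernels). The function names are invented for this example, and the location of the .config inside the buildroot tree is left to the reader.

```python
import re

# Kconfig symbols assumed to sit behind the two menuconfig entries.
WANTED = ("CONFIG_DETECT_SOFTLOCKUP", "CONFIG_DEBUG_BUGVERBOSE")


def enabled_options(config_text):
    """Return the set of CONFIG_ symbols set to y or m in a kernel .config."""
    pattern = r"^(CONFIG_\w+)=[ym]$"
    return {m.group(1) for m in re.finditer(pattern, config_text, re.MULTILINE)}


def missing_options(config_text, wanted=WANTED):
    """Return the wanted symbols that are not enabled, in the order given."""
    enabled = enabled_options(config_text)
    return [name for name in wanted if name not in enabled]


# Example input in the format menuconfig writes out.
sample = (
    "CONFIG_DETECT_SOFTLOCKUP=y\n"
    "# CONFIG_DEBUG_BUGVERBOSE is not set\n"
)
print(missing_options(sample))
```

To use it on a real tree, read the kernel's .config into a string and pass it to missing_options(); an empty list means both options are set in that file.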

Where have I gone wrong ?

Any chance *you* could try avahi-daemon and see what happens ?

Thanks
Mark


squidgit wrote:
What flavour kernel are you running? Can you enable the verbose BUG reporting and post the full bug report including stack trace from there? If you aren't running a vanilla Atmel kernel can you try one (or is your hardware different)?

Right ... I've *finally* managed to get a more verbose BUG() output.

I'm now using the following setup:-

    RC5 buildroot
    atngw100_defconfig
    add full avahi package (with pthread fix)
    add "Mutex debugging: basic checks" via linux26-menuconfig

This throws up the following error (re-running it produces mostly the same output, but it does differ slightly sometimes):-

~ # avahi-daemon --debug
Found user 'default' (UID 1000) and group 'default' (GID 1000).
Successfully dropped root privileges.
avahi-daemon 0.6.22 starting up.
WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
No service file found in /etc/avahi/services.
BUG: soft lockup - CPU#0 stuck for 61s! [:352]
PC is at netlink_ack+0xf8/0x154
LR is at __alloc_skb+0x48/0xb0
pc : [<9010c2cc>]    lr : [<900fc954>]    Not tainted
sp : 91d2fd14  r12: 91e38000  r11: 91e37410
r10: 00000000  r9 : 91e37440  r8 : 00000024
r7 : 91d2fd28  r6 : 91e37400  r5 : 91e37424  r4 : 0000e448
r3 : 91e38000  r2 : 00000000  r1 : 91d76400  r0 : fffffff0
Flags: qvNzC
Mode bits: hjmde....g
CPU Mode: Supervisor
Process:  [352] (task: 91d48300 thread: 91d2e000)
Call trace:
 [<9010c378>] netlink_rcv_skb+0x50/0x8c
 [<9010799a>] rtnetlink_rcv+0x12/0x18
 [<9010c17e>] netlink_unicast+0x18e/0x1e4
 [<9010c590>] netlink_sendmsg+0x1dc/0x1e4
 [<900f953c>] sock_sendmsg+0x84/0x98
 [<900f9c2a>] sys_sendto+0x8a/0xa4
 [<900f9c50>] sys_send+0xc/0x10
 [<90013132>] syscall_return+0x0/0x12
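
The call trace runs from sys_send down into the netlink code, which suggests the hang is in the kernel's handling of the rtnetlink request rather than in avahi itself. One way to test that idea is to send the same request from a tiny standalone program. This is only a sketch: it assumes a Python interpreter is available on the target (a stock buildroot image may not have one), the function names are hypothetical, and the flag combination is simply copied from the bytes in the strace output earlier in the thread.

```python
import socket
import struct

# Values from <linux/netlink.h> and <linux/rtnetlink.h>.
RTM_GETLINK = 18
NLM_F_REQUEST = 0x001
NLM_F_ACK = 0x004
NLM_F_ROOT = 0x100


def build_getlink_request(byteorder="="):
    """Build the 17-byte message: struct nlmsghdr plus a 1-byte struct rtgenmsg.

    byteorder "=" means host byte order, which is what the kernel expects.
    """
    flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_ROOT
    return struct.pack(byteorder + "IHHIIB",
                       17,                # nlmsg_len
                       RTM_GETLINK,       # nlmsg_type
                       flags,             # nlmsg_flags
                       0,                 # nlmsg_seq
                       0,                 # nlmsg_pid
                       socket.AF_UNSPEC)  # rtgen_family


def send_getlink_request():
    """Send the request on a fresh rtnetlink socket (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, socket.NETLINK_ROUTE)
    try:
        sock.bind((0, 0))  # let the kernel pick the port id, no multicast groups
        return sock.send(build_getlink_request())
    finally:
        sock.close()
```

Calling send_getlink_request() on the target should return 17 on a healthy kernel. If it locks up in the same way, avahi is out of the picture.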

This also throws up the following when the kernel is booting (i.e. before we've got to the shell prompt):-

Enable NAT:
  IPv4 forwarding: done
  iptables postrouting: ------------[ cut here ]------------
Badness at kernel/mutex.c:134
PC is at __mutex_lock_slowpath+0x42/0x10a
LR is at __mutex_lock_slowpath+0x38/0x10a
pc : [<9017b8c2>]    lr : [<9017b8b8>]    Not tainted
sp : 91d69e2c  r12: 00000001  r11: 00000000
r10: 00000001  r9 : 901edfac  r8 : 00000000
r7 : 91d69e40  r6 : c08758b4  r5 : c0876130  r4 : 000a4008
r3 : 00400002  r2 : 91c2b600  r1 : c0876140  r0 : c0862754
Flags: qvnZc
Mode bits: hjmde....G
CPU Mode: Supervisor
Process: insmod [291] (task: 91c2b600 thread: 91d68000)
Call trace:
 [<9017b990>] mutex_lock+0x6/0xa
 [] nf_ct_extend_register+0xc/0x58 [nf_conntrack]
 [] nf_conntrack_helper_init+0x22/0x4c [nf_conntrack]
 [] nf_conntrack_init+0xa2/0x158 [nf_conntrack]
 [] nf_conntrack_standalone_init+0x8/0x9c [nf_conntrack]
 [<90036b5c>] sys_init_module+0xf8c/0x105c
 [<90013132>] syscall_return+0x0/0x12

done
  iptables incoming trafic: done
  iptables outgoung trafic: done

Enabling some of the other debug options also threw up the following:-

~ # avahi-daemon --debug
Found user 'default' (UID 1000) and group 'default' (GID 1000).
Successfully dropped root privileges.
avahi-daemon 0.6.22 starting up.
WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
No service file found in /etc/avahi/services.
BUG: soft lockup - CPU#0 stuck for 61s! [:341]
PC is at mutex_unlock+0x12/0x1c
LR is at __rtnl_unlock+0xa/0x10
pc : [<9018906a>]    lr : [<90111cfa>]    Tainted: G      D
sp : 91cefd00  r12: 9020b0ac  r11: 91d6b800
r10: 9021ce54  r9 : 00000001  r8 : 91c04a10
r7 : 91cefd00  r6 : 90111a14  r5 : 91d6b800  r4 : 0000e448
r3 : 00000002  r2 : 00000000  r1 : 91deb460  r0 : 9021c99c
Flags: qvnzc
Mode bits: hjmde....g
CPU Mode: Supervisor
Process:  [341] (task: 91c736c0 thread: 91cee000)
Call trace:
 [<90111cfa>] __rtnl_unlock+0xa/0x10
 [<90111db4>] rtnetlink_rcv_msg+0x78/0x148
 [<90116944>] netlink_rcv_skb+0x34/0x8c
 [<90111d36>] rtnetlink_rcv+0x12/0x18
 [<9011674c>] netlink_unicast+0x1b0/0x204
 [<90116b7c>] netlink_sendmsg+0x1e0/0x1e8
 [<90102f4c>] sock_sendmsg+0x84/0x98
 [<9010363a>] sys_sendto+0x8a/0xa4
 [<90103660>] sys_send+0xc/0x10
 [<90013132>] syscall_return+0x0/0x12

Just to update this thread.

This issue is *finally* fixed.

I needed to update the kernel from 2.6.25.10 to 2.6.26.

I'm not sure what the exact issue was, but that solves it for me.