BUG: soft lockup - CPU#3 stuck for 10s!

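Symptoms
On a newly deployed machine, the website became unreachable and SSH stopped responding as well.

/var/log/messages was full of errors like these: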
Aug 19 21:35:23 bora kernel: BUG: soft lockup - CPU#3 stuck for 10s! [php-cgi:23997]
Aug 19 21:35:23 bora kernel: CPU 3:
Aug 19 21:35:23 bora kernel: Modules linked in: ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink iptable_filter ip_tables deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc md5 sha256 sha512 des aes_generic testmgr_cipher testmgr crypto_blkcipher aes_x86_64 ipcomp6 ipcomp ah6 ah4 esp6 xfrm6_esp esp4 xfrm4_esp aead crypto_algapi xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_tunnel xfrm6_tunnel tunnel6 ipv6 xfrm_nalgo crypto_api af_key autofs4 hidp rfcomm l2cap bluetooth sunrpc ipt_REJECT xt_limit xt_tcpudp x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom serio_raw sg bnx2 pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp mptsas mptscsih mptbase sc
Aug 19 21:35:23 bora kernel: i_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Aug 19 21:35:23 bora kernel: Pid: 23997, comm: php-cgi Not tainted 2.6.18-128.el5 #1
Aug 19 21:35:23 bora kernel: RIP: 0010:[] [] .text.lock.spinlock+0x2/0x30
Aug 19 21:35:23 bora kernel: RSP: 0000:ffff81013399bc68 EFLAGS: 00000282
Aug 19 21:35:23 bora kernel: RAX: ffff810224273c80 RBX: ffff810223ca7980 RCX: ffff810224273c80
Aug 19 21:35:23 bora kernel: RDX: 0000000000000000 RSI: 00000000000001f4 RDI: ffff810224273cc0
Aug 19 21:35:23 bora kernel: RBP: ffff81013399bbe0 R08: 0000000000000002 R09: 0000000000000000
Aug 19 21:35:23 bora kernel: R10: ffff81022b310680 R11: 0000000000000000 R12: ffffffff8005dc8e
Aug 19 21:35:23 bora kernel: R13: ffff810224273c80 R14: ffffffff800774da R15: ffff81013399bbe0
Aug 19 21:35:23 bora kernel: FS: 00002b5378ee4c20(0000) GS:ffff81012fc4e6c0(0000) knlGS:0000000000000000
Aug 19 21:35:23 bora kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 19 21:35:23 bora kernel: CR2: 00002b5380d78000 CR3: 0000000215b15000 CR4: 00000000000006e0
Aug 19 21:35:23 bora kernel:
Aug 19 21:35:23 bora kernel: Call Trace:
Aug 19 21:35:23 bora kernel: [] udp_rcv+0x431/0x5d1
Aug 19 21:35:23 bora kernel: [] ip_local_deliver+0x19d/0x263
Aug 19 21:35:23 bora kernel: [] ip_rcv+0x53a/0x57d
Aug 19 21:35:23 bora kernel: [] netif_receive_skb+0x370/0x39c
Aug 19 21:35:23 bora kernel: [] :bnx2:bnx2_poll_work+0xf7d/0x10b5
Aug 19 21:35:23 bora kernel: [] sk_free+0xc3/0x105
Aug 19 21:35:23 bora kernel: [] sched_balance_self+0x154/0x2f0
Aug 19 21:35:23 bora kernel: [] ip_local_deliver_finish+0x0/0x1e9
Aug 19 21:35:23 bora kernel: [] nf_hook_slow+0x58/0xbc
Aug 19 21:35:23 bora kernel: [] ip_local_deliver+0x19d/0x263
Aug 19 21:35:23 bora kernel: [] ip_rcv+0x53a/0x57d
Aug 19 21:35:23 bora kernel: [] :bnx2:bnx2_poll_msix+0x2e/0xc5
Aug 19 21:35:24 bora kernel: [] net_rx_action+0xa4/0x1a4
Aug 19 21:35:24 bora kernel: [] __do_softirq+0x89/0x133
Aug 19 21:35:24 bora kernel: [] call_softirq+0x1c/0x28
Aug 19 21:35:24 bora kernel: [] do_softirq+0x2c/0x85
Aug 19 21:35:24 bora kernel: [] do_IRQ+0xec/0xf5
Aug 19 21:35:24 bora kernel: [] ret_from_intr+0x0/0xa
Aug 19 21:35:24 bora kernel:
Aug 19 21:35:26 bora kernel: BUG: soft lockup - CPU#6 stuck for 10s! [pluto:3927]
Aug 19 21:35:26 bora kernel: CPU 6:
Aug 19 21:35:26 bora kernel: Modules linked in: ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink iptable_filter ip_tables deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc md5 sha256 sha512 des aes_generic testmgr_cipher testmgr crypto_blkcipher aes_x86_64 ipcomp6 ipcomp ah6 ah4 esp6 xfrm6_esp esp4 xfrm4_esp aead crypto_algapi xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_tunnel xfrm6_tunnel tunnel6 ipv6 xfrm_nalgo crypto_api af_key autofs4 hidp rfcomm l2cap bluetooth sunrpc ipt_REJECT xt_limit xt_tcpudp x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom serio_raw sg bnx2 pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp mptsas mptscsih mptbase sc
Aug 19 21:35:26 bora kernel: i_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Aug 19 21:35:26 bora kernel: Pid: 3927, comm: pluto Not tainted 2.6.18-128.el5 #1
Aug 19 21:35:26 bora kernel: RIP: 0010:[] [] .text.lock.spinlock+0x2/0x30
Aug 19 21:35:26 bora kernel: RSP: 0018:ffff81012e5f5bb0 EFLAGS: 00000282
Aug 19 21:35:26 bora kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff810224273c80
Aug 19 21:35:26 bora kernel: RDX: 0000000000000000 RSI: 00000000000001e0 RDI: ffff810224273cc0
Aug 19 21:35:26 bora kernel: RBP: 0000000000000206 R08: ffff81012e5f5a38 R09: 0000000000000000
Aug 19 21:35:26 bora kernel: R10: ffff81012e5f5ab8 R11: 0000000000000048 R12: ffff810224273c80
Aug 19 21:35:26 bora kernel: R13: 0000100000000011 R14: 0000000400000000 R15: 0000000000000000
Aug 19 21:35:26 bora kernel: FS: 00002b4ce020bdb0(0000) GS:ffff81013397ce40(0000) knlGS:0000000000000000
Aug 19 21:35:26 bora kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 19 21:35:26 bora kernel: CR2: 00002b5380d78000 CR3: 00000002269a2000 CR4: 00000000000006e0
Aug 19 21:35:26 bora kernel:
Aug 19 21:35:26 bora kernel: Call Trace:
Aug 19 21:35:26 bora kernel: [] release_sock+0x6b/0xaa
Aug 19 21:35:26 bora kernel: [] udp_sendmsg+0x4de/0x5ce
Aug 19 21:35:26 bora kernel: [] sock_sendmsg+0xf3/0x110
Aug 19 21:35:26 bora kernel: [] inode_has_perm+0x56/0x63
Aug 19 21:35:26 bora kernel: [] autoremove_wake_function+0x0/0x2e
Aug 19 21:35:26 bora kernel: [] selinux_inode_getattr+0x50/0x5e
Aug 19 21:35:26 bora kernel: [] _atomic_dec_and_lock+0x39/0x57
Aug 19 21:35:26 bora kernel: [] sys_sendto+0x11c/0x14f
Aug 19 21:35:26 bora kernel: [] tracesys+0xd5/0xe0
Aug 19 21:35:26 bora kernel:
Aug 19 21:35:33 bora kernel: BUG: soft lockup - CPU#3 stuck for 10s! [php-cgi:23997]
Aug 19 21:35:33 bora kernel: CPU 3:
Aug 19 21:35:33 bora kernel: Modules linked in: ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink iptable_filter ip_tables deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc md5 sha256 sha512 des aes_generic testmgr_cipher testmgr crypto_blkcipher aes_x86_64 ipcomp6 ipcomp ah6 ah4 esp6 xfrm6_esp esp4 xfrm4_esp aead crypto_algapi xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_tunnel xfrm6_tunnel tunnel6 ipv6 xfrm_nalgo crypto_api af_key autofs4 hidp rfcomm l2cap bluetooth sunrpc ipt_REJECT xt_limit xt_tcpudp x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod cdrom serio_raw sg bnx2 pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp mptsas mptscsih mptbase sc
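With this many repeated reports, it helps to see at a glance which CPUs and processes keep getting stuck. The following is a minimal sketch, not part of the original troubleshooting: the function name `summarize_lockups` and the regular expression are my own assumptions, written against the message format shown in the log above. The dash between "soft lockup" and "CPU#" is matched loosely because it may appear as a plain hyphen or as a typographic dash.

```python
import re
from collections import Counter

# Matches lines such as:
#   BUG: soft lockup - CPU#3 stuck for 10s! [php-cgi:23997]
# The separator after "soft lockup" is matched as any non-space token,
# so both "-" and a typographic dash are accepted.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup\s+\S+\s+CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft-lockup reports per (cpu, command, pid).

    `lines` is any iterable of log lines (a list or an open file).
    """
    counts = Counter()
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            key = (int(m.group("cpu")), m.group("comm"), int(m.group("pid")))
            counts[key] += 1
    return counts


# Three report lines copied from the log above.
sample = [
    "Aug 19 21:35:23 bora kernel: BUG: soft lockup - CPU#3 stuck for 10s! [php-cgi:23997]",
    "Aug 19 21:35:26 bora kernel: BUG: soft lockup - CPU#6 stuck for 10s! [pluto:3927]",
    "Aug 19 21:35:33 bora kernel: BUG: soft lockup - CPU#3 stuck for 10s! [php-cgi:23997]",
]

for (cpu, comm, pid), n in summarize_lockups(sample).most_common():
    print(f"CPU#{cpu} {comm}[{pid}]: {n} report(s)")
```

To run it against the real log, pass an open file instead of the sample list, for example `summarize_lockups(open("/var/log/messages", errors="replace"))`.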

Hardware:
Dell R410
E5504 (quad-core) x2
4 GB RAM x2
SAS 15k 146 GB x2

System:
CentOS 5.2 64bit
nginx-0.7.61
php-5.2.10
mysql-5.1.37
eaccelerator-0.9.5.3

#top
top - 14:03:42 up 16:11, 1 user, load average: 0.30, 0.40, 0.43
Tasks: 260 total, 1 running, 259 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.4%us, 1.0%sy, 0.0%ni, 94.8%id, 0.5%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 8168412k total, 5068160k used, 3100252k free, 510276k buffers
Swap: 4096532k total, 0k used, 4096532k free, 3251992k cached

#cat /proc/version
Linux version 2.6.18-128.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Jan 21 10:41:14 EST 2009

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
stepping : 5
cpu MHz : 1596.000
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 16
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips : 3993.25
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual

#iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
PING icmp -- anywhere anywhere icmp echo-request state NEW
ACCEPT all -- anywhere anywhere
DROP all -- 127.0.0.0/8 anywhere
DROP all -- anywhere anywhere state INVALID
DROP tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/FIN,PSH,URG
DROP tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/FIN,SYN,RST,PSH,ACK,URG
DROP tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/FIN,SYN,RST,ACK,URG
DROP tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/NONE
DROP tcp -- anywhere anywhere tcp flags:SYN,RST/SYN,RST
DROP tcp -- anywhere anywhere tcp flags:FIN,SYN/FIN,SYN
ACCEPT tcp -- anywhere anywhere tcp dpt:smtp
ACCEPT tcp -- anywhere anywhere tcp dpt:http
ACCEPT tcp -- anywhere anywhere tcp dpt:mysql
ACCEPT tcp -- anywhere anywhere tcp dpt:webcache
ACCEPT tcp -- anywhere anywhere tcp dpt:15666
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh
DROP tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,ACK/SYN

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere

Chain PING (1 references)
target prot opt source destination
RETURN icmp -- anywhere anywhere icmp echo-request limit: avg 1/sec burst 5
REJECT icmp -- anywhere anywhere reject-with icmp-port-unreachable

Chain SYNFLOOD (0 references)
target prot opt source destination

===========================
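The cause appears to be a conflict (bug) in kernel-2.6.18-128. Both traces in the log show the CPU spinning in .text.lock.spinlock on a UDP code path (udp_rcv in one, udp_sendmsg in the other). Still under observation.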
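To find out whether other machines run the same suspect release, the string printed by `cat /proc/version` can be compared against it. This is only a sketch under assumptions: the helper names `kernel_release` and `is_suspect` are hypothetical, the comparison is an exact string match on `2.6.18-128.el5`, and it says nothing about which other releases are or are not affected.

```python
import re

# The release this post suspects; exact-match comparison is an assumption.
SUSPECT_RELEASE = "2.6.18-128.el5"


def kernel_release(proc_version):
    """Extract the release string from the contents of /proc/version."""
    m = re.match(r"Linux version (\S+)", proc_version)
    return m.group(1) if m else None


def is_suspect(proc_version, suspect=SUSPECT_RELEASE):
    """Return True when the release in `proc_version` equals `suspect`."""
    return kernel_release(proc_version) == suspect


# Shortened form of the /proc/version line shown earlier in this post.
example = "Linux version 2.6.18-128.el5 (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Jan 21 10:41:14 EST 2009"
print(kernel_release(example), is_suspect(example))
```

On a live system the input would come from reading `/proc/version` directly, for example `is_suspect(open("/proc/version").read())`.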

References
http://bugs.centos.org/view.php?id=3582
https://bugzilla.redhat.com/show_bug.cgi?id=484590
