System unresponsive after soft lockup


We have been seeing frequent soft lockups on Ubuntu 12.04 (kernel: 3.8.0-29-generic), after which the system stops responding. Below is the kern.log output from just before the soft lockups occurred. Any help would be appreciated.
Mar 29 00:12:01 HOST9016 kernel: [387780.959368] BUG: soft lockup - CPU#60 stuck for 23s! [java:113233]
Mar 29 00:12:01 HOST9016 kernel: [387781.007045] BUG: soft lockup - CPU#63 stuck for 23s! [java:113220]
Mar 29 00:12:01 HOST9016 kernel: [387781.007516] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)
Mar 29 00:12:01 HOST9016 kernel: [387781.007520] CPU 63
Mar 29 00:12:01 HOST9016 kernel: [387781.007521] Pid: 113220, comm: java Tainted: GF 3.8.0-29-generic #42~precise1-Ubuntu HP ProLiant DL580 Gen8
Mar 29 00:12:01 HOST9016 kernel: [387781.007530] RIP: 0010:[<ffffffff811674a5>] [<ffffffff811674a5>] change_pte_range+0x205/0x2d0
Mar 29 00:12:01 HOST9016 kernel: [387781.007532] RSP: 0018:ffff883dbc9ffca8 EFLAGS: 00000286
Mar 29 00:12:01 HOST9016 kernel: [387781.007533] RAX: ffffea00f1431600 RBX: ffff883dbc8d4958 RCX: 0600000000080068
Mar 29 00:12:01 HOST9016 kernel: [387781.007960] RDX: 0000000000000000 RSI: 00007f2769b6e000 RDI: 8000003c50c58166
Mar 29 00:12:01 HOST9016 kernel: [387781.007961] RBP: ffff883dbc9ffd48 R08: ffff883dbc8d4958 R09: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.007961] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004
Mar 29 00:12:01 HOST9016 kernel: [387781.007962] R13: 0000000000000202 R14: ffffffff81ce6fa0 R15: ffff883dbc9ffc98
Mar 29 00:12:01 HOST9016 kernel: [387781.007964] FS: 00007f22b1059700(0000) GS:ffff881fffa40000(0000) knlGS:0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.007965] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 29 00:12:01 HOST9016 kernel: [387781.007966] CR2: 00007f47ab783028 CR3: 0000001d8f9a3000 CR4: 00000000001407e0
Mar 29 00:12:01 HOST9016 kernel: [387781.007967] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.007968] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 29 00:12:01 HOST9016 kernel: [387781.007969] Process java (pid: 113220, threadinfo ffff883dbc9fe000, task ffff883dbc9045c0)
Mar 29 00:12:01 HOST9016 kernel: [387781.007970] Stack:
Mar 29 00:12:01 HOST9016 kernel: [387781.007985] ffff883dbc9ffd38 ffff881fd06ba940 ffff881fd06ba680 000000007a400000
Mar 29 00:12:01 HOST9016 kernel: [387781.008435] 00007f2689600000 0000000000000001 ffff883dbc8d4958 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.008445] ffffea017f48e570 8000000000000025 8000003c50c58166 00007f2769c00000
Mar 29 00:12:01 HOST9016 kernel: [387781.008445] Call Trace:
Mar 29 00:12:01 HOST9016 kernel: [387781.008452] [<ffffffff811677ea>] change_protection_range+0x27a/0x410
Mar 29 00:12:01 HOST9016 kernel: [387781.008875] [<ffffffff811679f5>] change_protection+0x75/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.008881] [<ffffffff8117baeb>] change_prot_numa+0x1b/0x30
Mar 29 00:12:01 HOST9016 kernel: [387781.008889] [<ffffffff8109544a>] task_numa_work+0x24a/0x320
Mar 29 00:12:01 HOST9016 kernel: [387781.008895] [<ffffffff8107bdc8>] task_work_run+0xc8/0xf0
Mar 29 00:12:01 HOST9016 kernel: [387781.009311] [<ffffffff81014d9a>] do_notify_resume+0xaa/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.009318] [<ffffffff816fcb9a>] int_signal+0x12/0x17
Mar 29 00:12:01 HOST9016 kernel: [387781.009738] Code: 0f 84 73 ff ff ff e9 69 ff ff ff 0f 1f 00 48 8b 7d 90 4c 89 f2 4c 89 ee e8 89 54 ff ff 31 d2 48 85 c0 0f 84 34 ff ff ff 48 8b 08 <48> c1 e9 3a 83 bd 7c ff ff ff ff 74 7e 39 8d 7c ff ff ff 0f b6
Mar 29 00:12:01 HOST9016 kernel: [387781.098867] BUG: soft lockup - CPU#69 stuck for 23s! [java:113232]
Mar 29 00:12:01 HOST9016 kernel: [387781.148120] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)
Mar 29 00:12:01 HOST9016 kernel: [387781.150284] CPU 69
Mar 29 00:12:01 HOST9016 kernel: [387781.150288] Pid: 113232, comm: java Tainted: GF 3.8.0-29-generic #42~precise1-Ubuntu HP ProLiant DL580 Gen8
Mar 29 00:12:01 HOST9016 kernel: [387781.150701] RIP: 0010:[<ffffffff811674a5>] [<ffffffff811674a5>] change_pte_range+0x205/0x2d0
Mar 29 00:12:01 HOST9016 kernel: [387781.150706] RSP: 0018:ffff887fcba19ca8 EFLAGS: 00000286
Mar 29 00:12:01 HOST9016 kernel: [387781.151137] RAX: ffffea00f71aee00 RBX: ffff883dbc8d4958 RCX: 0600000000080078
Mar 29 00:12:01 HOST9016 kernel: [387781.151139] RDX: 0000000000000000 RSI: 00007f2a3c820000 RDI: 8000003dc6bb8166
Mar 29 00:12:01 HOST9016 kernel: [387781.151141] RBP: ffff887fcba19d48 R08: ffff883dbc8d4958 R09: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.151143] R10: 0000000000000004 R11: 0000000000000293 R12: 0000000000000004
Mar 29 00:12:01 HOST9016 kernel: [387781.151145] R13: 0000000000000293 R14: ffffffff81ce6fa0 R15: ffff887fcba19c98
Mar 29 00:12:01 HOST9016 kernel: [387781.151148] FS: 00007f22829a7700(0000) GS:ffff881fffb00000(0000) knlGS:0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.151151] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 29 00:12:01 HOST9016 kernel: [387781.151153] CR2: 00007f60e5451720 CR3: 0000001d8f9a3000 CR4: 00000000001407e0
Mar 29 00:12:01 HOST9016 kernel: [387781.151154] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.151156] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 29 00:12:01 HOST9016 kernel: [387781.151599] Process java (pid: 113232, threadinfo ffff887fcba18000, task ffff887cfb0345c0)
Mar 29 00:12:01 HOST9016 kernel: [387781.151600] Stack:
Mar 29 00:12:01 HOST9016 kernel: [387781.151602] ffff887fcba19d38 ffff881fd06ba940 0000000000000293 0000000200000004
Mar 29 00:12:01 HOST9016 kernel: [387781.152476] 0000000000000000 ffff883dbc8d4958 ffff883dbc8d4958 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.152895] ffffea01723ae170 8000000000000025 8000003dc6bb8166 00007f2a3ca00000
Mar 29 00:12:01 HOST9016 kernel: [387781.153738] Call Trace:
Mar 29 00:12:01 HOST9016 kernel: [387781.154157] [<ffffffff811677ea>] change_protection_range+0x27a/0x410
Mar 29 00:12:01 HOST9016 kernel: [387781.154575] [<ffffffff811679f5>] change_protection+0x75/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.154992] [<ffffffff8117baeb>] change_prot_numa+0x1b/0x30
Mar 29 00:12:01 HOST9016 kernel: [387781.155001] [<ffffffff8109544a>] task_numa_work+0x24a/0x320
Mar 29 00:12:01 HOST9016 kernel: [387781.155009] [<ffffffff8107bdc8>] task_work_run+0xc8/0xf0
Mar 29 00:12:01 HOST9016 kernel: [387781.155015] [<ffffffff816f254b>] ? __schedule+0x3bb/0x6b0
Mar 29 00:12:01 HOST9016 kernel: [387781.155021] [<ffffffff81014d9a>] do_notify_resume+0xaa/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.155448] [<ffffffff816fcb9a>] int_signal+0x12/0x17
Mar 29 00:12:01 HOST9016 kernel: [387781.155450] Code: 0f 84 73 ff ff ff e9 69 ff ff ff 0f 1f 00 48 8b 7d 90 4c 89 f2 4c 89 ee e8 89 54 ff ff 31 d2 48 85 c0 0f 84 34 ff ff ff 48 8b 08 <48> c1 e9 3a 83 bd 7c ff ff ff ff 74 7e 39 8d 7c ff ff ff 0f b6
Mar 29 00:12:01 HOST9016 kernel: [387781.262831] BUG: soft lockup - CPU#79 stuck for 22s! [java:113234]
Mar 29 00:12:01 HOST9016 kernel: [387781.314646] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)
Mar 29 00:12:01 HOST9016 kernel: [387781.319281] CPU 79
Mar 29 00:12:01 HOST9016 kernel: [387781.319285] Pid: 113234, comm: java Tainted: GF 3.8.0-29-generic #42~precise1-Ubuntu HP ProLiant DL580 Gen8
Mar 29 00:12:01 HOST9016 kernel: [387781.319288] RIP: 0010:[<ffffffff8115c93f>] [<ffffffff8115c93f>] vm_normal_page+0x1f/0x80
Mar 29 00:12:01 HOST9016 kernel: [387781.320152] RSP: 0000:ffff887d8ede7c88 EFLAGS: 00000a06
Mar 29 00:12:01 HOST9016 kernel: [387781.320568] RAX: 0070bea105980000 RBX: ffff881fd06ba940 RCX: 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.320570] RDX: 8000001c2fa84166 RSI: 00007f2b98da6000 RDI: 8000001c2fa84166
Mar 29 00:12:01 HOST9016 kernel: [387781.320572] RBP: ffff887d8ede7c98 R08: ffff883dbc8d4958 R09: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.320573] R10: 0000000000000004 R11: 0000000000000202 R12: 000000000000004f
Mar 29 00:12:01 HOST9016 kernel: [387781.320998] R13: ffffffff8104e810 R14: 000000000000003c R15: 0000004fd2942458
Mar 29 00:12:01 HOST9016 kernel: [387781.321001] FS: 00007f22827a5700(0000) GS:ffff883fffa60000(0000) knlGS:0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.321002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 29 00:12:01 HOST9016 kernel: [387781.321004] CR2: 00007f47b2eb3000 CR3: 0000001d8f9a3000 CR4: 00000000001407e0
Mar 29 00:12:01 HOST9016 kernel: [387781.321419] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.321421] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 29 00:12:01 HOST9016 kernel: [387781.321819] Process java (pid: 113234, threadinfo ffff887d8ede6000, task ffff887df5799740)
Mar 29 00:12:01 HOST9016 kernel: [387781.321820] Stack:
Mar 29 00:12:01 HOST9016 kernel: [387781.321821] ffff887d8ede7c98 8000001c2fa84166 ffff887d8ede7d48 ffffffff81167497
Mar 29 00:12:01 HOST9016 kernel: [387781.322653] ffff887d8ede7d38 ffff881fd06ba940 0000000000000202 0000000200000004
Mar 29 00:12:01 HOST9016 kernel: [387781.323512] ffff887d8ede7e00 ffff883dbc8d4958 ffff883dbc8d4958 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.324377] Call Trace:
Mar 29 00:12:01 HOST9016 kernel: [387781.324796] [<ffffffff81167497>] change_pte_range+0x1f7/0x2d0
Mar 29 00:12:01 HOST9016 kernel: [387781.324802] [<ffffffff811677ea>] change_protection_range+0x27a/0x410
Mar 29 00:12:01 HOST9016 kernel: [387781.325225] [<ffffffff811679f5>] change_protection+0x75/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.325672] [<ffffffff8117baeb>] change_prot_numa+0x1b/0x30
Mar 29 00:12:01 HOST9016 kernel: [387781.326888] [<ffffffff8109544a>] task_numa_work+0x24a/0x320
Mar 29 00:12:01 HOST9016 kernel: [387781.326900] [<ffffffff8107bdc8>] task_work_run+0xc8/0xf0
Mar 29 00:12:01 HOST9016 kernel: [387781.326912] [<ffffffff81014d9a>] do_notify_resume+0xaa/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.327756] [<ffffffff816fcb9a>] int_signal+0x12/0x17
Mar 29 00:12:01 HOST9016 kernel: [387781.327758] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 f8 48 89 d7 48 89 e5 53 48 83 ec 08 48 89 f8 0f 1f 40 00 48 c1 e0 12 <48> c1 e8 1e f6 c6 02 75 27 48 39 05 19 cd b8 00 72 3f 48 89 c3
Mar 29 06:24:22 HOST9016 kernel: [410090.031877] BUG: soft lockup - CPU#103 stuck for 23s! [java:113233]
Mar 29 06:24:22 HOST9016 kernel: [410090.086169] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)

I don't think there is enough information here either; logs from the Java application would be more useful. The stack trace shows memory-management calls (change_pte_range, change_protection, task_numa_work). Possibly a memory leak? Java may be running without an explicit maximum memory setting, so check whether one is set.
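
One way to run that check on a live system is to look at the JVM's command line for a maximum-heap option. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/cmdline is readable and assuming the limit, if any, was passed as -Xmx or -XX:MaxHeapSize on the command line (options supplied through environment variables are not covered). The helper names max_heap_flags and read_cmdline are hypothetical.

```python
from pathlib import Path
from typing import List

# JVM options that set an explicit maximum heap size.
MAX_HEAP_PREFIXES = ("-Xmx", "-XX:MaxHeapSize=")


def max_heap_flags(cmdline: bytes) -> List[str]:
    """Return the max-heap options found in a NUL-separated /proc cmdline."""
    args = [a.decode("utf-8", "replace") for a in cmdline.split(b"\0") if a]
    return [a for a in args if a.startswith(MAX_HEAP_PREFIXES)]


def read_cmdline(pid: int) -> bytes:
    """Read the raw command line of a running process (Linux only)."""
    return Path(f"/proc/{pid}/cmdline").read_bytes()


# Example (the PID is taken from the log above and only illustrative):
#   flags = max_heap_flags(read_cmdline(113233))
#   print(flags or "no explicit max heap option on the command line")
```

An empty result only means no limit was given on the command line; the JVM then chooses a default maximum heap on its own.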

I don't think this question contains enough information to answer it fully, but looking at the logs I can see that the error is caused by the java application:
Mar 29 06:24:22 HOST9016 kernel: [410090.031877] BUG: soft lockup - CPU#103 stuck for 23s! [java:113233]

So I think the next step would be to look at that particular application and see what it is doing.
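
Before digging into the application, it can help to tally which process and which kernel function each soft lockup report names. The sketch below is only an illustration, assuming the kern.log line format shown in the question; the function name summarize and both regular expressions are hypothetical and may need adjusting for other kernel versions. Each "BUG: soft lockup" line is paired with the next RIP line that follows it.

```python
import re
from collections import Counter

# "BUG: soft lockup - CPU#60 stuck for 23s! [java:113233]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]:]+):(?P<pid>\d+)\]"
)
# "RIP: 0010:[<...>] [<...>] change_pte_range+0x205/0x2d0"
RIP_RE = re.compile(r"RIP: .*\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize(lines):
    """Count soft lockups per (process name, function at RIP)."""
    counts = Counter()
    pending = None  # process of a lockup still waiting for its RIP line
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            if pending is not None:
                # Previous report had no register dump in the log.
                counts[(pending, "unknown")] += 1
            pending = m.group("comm")
            continue
        m = RIP_RE.search(line)
        if m and pending is not None:
            counts[(pending, m.group("func"))] += 1
            pending = None
    if pending is not None:
        counts[(pending, "unknown")] += 1
    return counts


# Example:
#   with open("/var/log/kern.log") as fh:
#       for (comm, func), n in summarize(fh).most_common():
#           print(n, comm, func)
```

For the excerpt in the question, every report names java, and the recorded instruction pointers are in change_pte_range and vm_normal_page, both reached through task_numa_work in the call traces.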
