470193ffc9
This issue was detected after the kernel was updated from version 5.10.112 to 5.10.152. The offending commit is d83d886e69bd ("PCI/ERR: Recover from RCEC AER errors"), which comes from the linux-yocto 5.10 stable tree. It causes the board to hang after kdump is triggered. The issue can be reproduced on a Supermicro A2SDi-16C-TP8F board with BIOS version 1.4 (build date 01/29/2021).

We don't need PCI AER functionality enabled in the kdump kernel, and it causes some boards to hang in certain situations, as the kernel AER error log below shows. So we just disable it.

KERNEL AER ERROR LOG:
[ 7.409296] pcieport 0000:00:05.0: AER: Multiple Corrected error received: 0000:00:05.0
[ 7.417311] BUG: kernel NULL pointer dereference, address: 0000000000000028
[ 7.418296] #PF: supervisor read access in kernel mode
[ 7.418296] #PF: error_code(0x0000) - not-present page
[ 7.418296] PGD 0 P4D 0
[ 7.418296] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 7.418296] CPU: 0 PID: 93 Comm: irq/25-aerdrv Not tainted 5.10.0-6-amd64 #1 Debian 5.10.152-1.stx.25
[ 7.418296] Hardware name: Supermicro SYS-E300-9A-16CN8TP/A2SDi-16C-TP8F, BIOS 1.4 01/29/2021
[ 7.418296] RIP: 0010:pci_walk_bus+0x25/0x90
[ 7.418296] Code: 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 49 89 fd 48 c7 c7 20 37 9a 99 41 54 49 89 f4 55 48 89 d5 53 4c 89 eb e8 2b 5a 56 00 <49> 8b 7d 28 eb 1f 48 8b 47 18 48 85 c0 74 31 4c 8b 70 28 48 89 c3
[ 7.418296] RSP: 0000:ffffa60040173dc8 EFLAGS: 00010282
[ 7.418296] RAX: ffff8b553fded001 RBX: 0000000000000000 RCX: 0000000000000000
[ 7.418296] RDX: ffff8b553fded000 RSI: ffffffff9833c6e0 RDI: ffffffff999a3720
[ 7.418296] RBP: ffffa60040173e10 R08: 0000000000000002 R09: ffffa60040173d74
[ 7.418296] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9833c6e0
[ 7.418296] R13: 0000000000000000 R14: 0000000000000028 R15: ffff8b555e206328
[ 7.418296] FS: 0000000000000000(0000) GS:ffff8b55bec00000(0000) knlGS:0000000000000000
[ 7.418296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.418296] CR2: 0000000000000028 CR3: 000000087d80a000 CR4: 00000000003506f0
[ 7.418296] Call Trace:
[ 7.418296] find_source_device+0x34/0x5a
[ 7.418296] aer_isr.cold+0x89/0x9e
[ 7.418296] ? __set_cpus_allowed_ptr+0xb6/0x220
[ 7.418296] ? disable_irq_nosync+0x10/0x10
[ 7.418296] irq_thread_fn+0x20/0x60
[ 7.418296] irq_thread+0x104/0x1b0
[ 7.418296] ? irq_finalize_oneshot.part.0+0xe0/0xe0
[ 7.418296] ? irq_thread_check_affinity+0xa0/0xa0
[ 7.418296] kthread+0x133/0x150
[ 7.418296] ? __kthread_bind_mask+0x60/0x60
[ 7.418296] ret_from_fork+0x22/0x30
[ 7.418296] Modules linked in:
[ 7.418296] CR2: 0000000000000028

TEST PLAN:
PASS: build-pkgs -c -p kdump-tools
PASS: build-pkgs -c -p kdump-tools-rt
PASS: boot
PASS: on troublesome and non-troublesome platform
      systemctl enable kdump-tools.service
      systemctl start kdump-tools.service
      echo 1 >/proc/sysrq-trigger
      echo 'c' > /proc/sysrq-trigger
      vmcore has been created successfully
      system boots back up automatically

Closes-Bug: 1999646
Change-Id: I9ffc6e96d4b7fbd0b29a806d4d96dfc8e89dc4c6
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
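One way to confirm that the change took effect is to check a kernel command line for `pci=noaer`. The following is a minimal sketch, not part of kdump-tools: `has_kernel_param` is a hypothetical helper introduced here for illustration. It matches whole space-separated tokens only, so a combined form such as `pci=noaer,nomsi` would not be detected.

```shell
#!/bin/sh
# has_kernel_param CMDLINE PARAM  (hypothetical helper, for illustration only)
# Succeeds if PARAM appears as a whole space-separated token in CMDLINE.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Report on the kernel that is currently running. To check the capture
# kernel, this would have to run after the crash kernel has booted.
if [ -r /proc/cmdline ]; then
    if has_kernel_param "$(cat /proc/cmdline)" "pci=noaer"; then
        echo "pci=noaer is present on this kernel command line"
    else
        echo "pci=noaer is not present on this kernel command line"
    fi
fi
```

On a normally booted system the second message is expected, since the parameter is appended only to the kdump kernel's command line.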
30 lines
941 B
Diff
From 88e8f23536d60aa163c72ffdbe453315c5102d3c Mon Sep 17 00:00:00 2001
From: Peng Zhang <Peng.Zhang2@windriver.com>
Date: Thu, 15 Dec 2022 00:09:32 -0800
Subject: [PATCH] kdump-tools: disable AER to fix kdump hung issue

We don't need pci AER functionality enabled in the kdump kernel,
and it causes some boards to hang in certain situations. So just
disable it.

Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
---
rules | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/debian/rules b/debian/rules
index 72b7d6d..b428331 100755
--- a/debian/rules
+++ b/debian/rules
@@ -14,7 +14,7 @@ ifeq ($(DEB_HOST_ARCH),arm64)
else ifeq ($(DEB_HOST_ARCH),ppc64el)
KDUMP_CMDLINE_APPEND += maxcpus=1 irqpoll noirqdistrib nousb
else
- KDUMP_CMDLINE_APPEND += nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0
+ KDUMP_CMDLINE_APPEND += nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0 pci=noaer
endif

%:
--
2.34.1
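To make the effect of the one-line change explicit, the sketch below compares the old and new `KDUMP_CMDLINE_APPEND` values from the hunk above and prints the tokens that were added. `added_tokens` is a hypothetical helper written for this illustration and assumes the values contain no shell glob characters.

```shell
#!/bin/sh
# added_tokens OLD NEW  (hypothetical helper, for illustration only)
# Print each space-separated token of NEW that does not occur in OLD.
added_tokens() {
    for tok in $2; do
        case " $1 " in
            *" $tok "*) ;;                  # token already present in OLD
            *) printf '%s\n' "$tok" ;;      # token introduced by NEW
        esac
    done
}

# Values copied from the removed (-) and added (+) lines of the diff.
old='nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0'
new='nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0 pci=noaer'

added_tokens "$old" "$new"   # prints: pci=noaer
```

Only the default (non-arm64, non-ppc64el) branch is affected; the other two architectures keep their existing parameters.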