Problem:
We received a report of a workload that causes an xfs task to be blocked
for more than 120 seconds on log reservation via iomap_ioend completion
batching.
kernel: err [5636141.631454] INFO: task xfs-conv/dm-4:1788 blocked for
more than 122 seconds.
kernel: info [267022.728862] Workqueue: xfs-conv/dm-4 xfs_end_io [xfs]
kernel: info [267022.728864] Call Trace:
kernel: info [267022.728870] __schedule+0x340/0x810
kernel: info [267022.728876] schedule+0x51/0xc0
kernel: info [267022.728913] xlog_grant_head_wait+0xc7/0x200 [xfs]
kernel: info [267022.728950] xlog_grant_head_check+0xd0/0x110 [xfs]
kernel: info [267022.728985] xfs_log_reserve+0xc3/0x1e0 [xfs]
kernel: info [267022.729023] xfs_trans_reserve+0x156/0x1b0 [xfs]
kernel: info [267022.729184] xfs_trans_alloc+0xc6/0x190 [xfs]
kernel: info [267022.729317] xfs_iomap_write_unwritten+0xaa/0x2c0 [xfs]
kernel: info [267022.729333] ? stop_one_cpu+0x71/0xa0
kernel: info [267022.729347] ? set_cpus_allowed_ptr+0x10/0x10
kernel: info [267022.729396] xfs_end_ioend+0xc4/0x100 [xfs]
kernel: info [267022.729444] ? xfs_setfilesize_ioend+0x60/0x60 [xfs]
kernel: info [267022.729491] xfs_end_io+0xb9/0xe0 [xfs]
kernel: info [267022.729505] process_one_work+0x1a1/0x370
kernel: info [267022.729516] rescuer_thread+0x207/0x350
kernel: info [267022.729528] ? worker_thread+0x370/0x370
kernel: info [267022.729537] kthread+0x12e/0x150
kernel: info [267022.729548] ? __kthread_cancel_work+0x40/0x40
kernel: info [267022.729559] ret_from_fork+0x1f/0x30
After that, the ssh connection to the controller hangs. Pressing Ctrl+C
drops into a shell whose prompt is displayed as '-sh-4.2$'.
Solution:
Remove the preallocated transaction from xfs append ioends to avoid
the ioend completion batching log reservation deadlock.
Append ioend completions are still processed via the workqueue, but the
wq task now allocates the transaction itself, as for other ioend
types.
Backport the four patches from upstream (git://git.kernel.org/pub/scm/
linux/kernel/git/torvalds/linux.git) for Debian-based StarlingX.
For CentOS-based StarlingX, only
0034-xfs-use-current-journal_info-for-detecting-transacti.patch comes
from the stable tree (git://git.kernel.org/pub/scm/linux/kernel/git/
stable/linux.git, linux-5.10.y branch). The Debian-based kernel has
already been upgraded to v5.10.152, which includes this fix, so the
patch is applied only to the CentOS-based one.
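The version reasoning above can be sketched as a small check. This is
only an illustration: version_at_least is a hypothetical helper, not
part of this change, and it assumes a sort that supports -V (GNU
coreutils or busybox). The version string a build reports may not match
the upstream stable version, so pass the version explicitly.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on "sort -V" (version sort); the smaller version sorts first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: v5.10.152 is the version this message names as carrying the fix.
#   version_at_least "5.10.152" "5.10.152" && echo "stable fix present"
```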
TestPlan:
Pass: Run the bonnie++ test on an xfs filesystem; it completes without
a kernel panic or any xfs anomalies in the kernel logs.
$mkfs.xfs /dev/sdc1
$mount /dev/sdc1 ~/xfstests
$sudo bonnie++ -u root:root -d ~/xfstests
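One way to check the kernel logs for the anomaly described in the
Problem section is sketched below. count_xfs_hung_tasks is a
hypothetical helper, not part of this change; it assumes each
hung-task report appears on a single log line in the form shown above,
and it only counts xfs-conv workers.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the
# number of hung-task reports for xfs-conv workqueue tasks.
count_xfs_hung_tasks() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the helper's exit status at 0 in that case.
    grep -cE 'INFO: task xfs-conv/[^ ]+ blocked for more than [0-9]+ seconds' || true
}

# Usage (assumes the current user may read the kernel ring buffer):
#   dmesg | count_xfs_hung_tasks
```

A result of 0 after the bonnie++ run means no such report was logged.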
Debian:
Pass: build-pkgs -c -a
Pass: build-image
Pass: boot successfully with std/rt.
CentOS:
Pass: build-pkgs
Pass: build-iso
Pass: boot successfully with std/rt.
Closes-Bug: 1996269
Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: I1e5b85111b2b54cd249c116724b952042f9d781f