kernel/kernel-rt/centos/patches/0003-affine-compute-kernel-threads.patch
M. Vefa Bicakci 8fde1a8070 kthread_cpus: Avoid large stack allocation
Since commit 7b55e47e1d9f ("Restore CPU affinity patches'
functionality"), the function kthread_setup() has allocated a cpumask
structure on the stack, which uses 1024 bytes on the stack when
CONFIG_MAXSMP is enabled. This was detected after setting
CONFIG_FRAME_WARN to 1024 for testing purposes.

To clarify, with the following configuration:

  $ grep -e MAXSMP= -e NR_CPUS= -e CPUMASK_OFFSTACK= \
    -e FRAME_WARN= .config
  CONFIG_MAXSMP=y
  CONFIG_NR_CPUS=8192
  CONFIG_CPUMASK_OFFSTACK=y
  CONFIG_FRAME_WARN=1024

the following build warning is encountered:

    CC      kernel/cpu.o
  kernel/cpu.c: In function 'kthread_setup':
  kernel/cpu.c:2475:1: warning: the frame size of 1032 bytes is \
      larger than 1024 bytes [-Wframe-larger-than=]

This commit resolves this issue by allocating the cpumask structure
using the boot memory allocator via alloc_bootmem_cpumask_var().
A similar approach is taken for the housekeeping-related cpumask
structures in kernel/sched/isolation.c.

Please note that the issue fixed by this commit (large stack allocation)
only exists when CONFIG_CPUMASK_OFFSTACK is enabled. StarlingX currently
disables CONFIG_CPUMASK_OFFSTACK. Disabling CONFIG_CPUMASK_OFFSTACK sets
CONFIG_NR_CPUS' maximum value to 512 CPUs, which in turn corresponds to
64 bytes on the stack. In contrast, enabling CONFIG_CPUMASK_OFFSTACK
(for example, by setting CONFIG_MAXSMP=y) raises the upper limit for
CONFIG_NR_CPUS to 8192 CPUs, which translates to a 1024-byte bitmap.

No change in behaviour is intended.

While carrying out the code changes, space characters in the affected
function were converted to tab characters as a minor clean-up.

Testing:
* Monolithic ISO image build was successful.
* Installation and bootstrap of the built ISO was successful with a
  virtual machine in All-in-One simplex mode, using the low-latency
  profile.
* kthread_cpus argument's behaviour was confirmed to not have been
  negatively affected by checking that the CPU affinity of kthreadd
  (i.e., pid 2) matches the kthread_cpus= argument's value.
* kthread_cpus argument's error handling was confirmed to work as
  expected by booting up the installed system with an invalid argument
  ('kthread_cpus=abc'). In this scenario, the CPU affinity of kthreadd
  was observed to be correctly set to 0-7 (with an 8-CPU VM).

Change-Id: I31d2175d6084142e63d4b38d7b0c5677046fce4f
Closes-Bug: 1957188
Fixes: 7b55e47e1d9f ("Restore CPU affinity patches' functionality")
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
2022-01-12 16:48:18 -05:00

174 lines
6.7 KiB
Diff

From 5fba1536bd24bc77ef0d6b2516fefcff69d7cf59 Mon Sep 17 00:00:00 2001
From: Chris Friesen <chris.friesen@windriver.com>
Date: Tue, 24 Nov 2015 16:27:28 -0500
Subject: [PATCH] affine compute kernel threads
This is a kernel enhancement to configure the cpu affinity of kernel
threads via kernel boot option kthread_cpus=<cpulist>. The compute
kickstart file and compute-huge.sh scripts will update grub with the
new option.
With kthread_cpus specified, the cpumask is immediately applied upon
thread launch. This does not affect kernel threads that specify cpu
and node.
Note: this is based off of Christoph Lameter's patch at
https://lwn.net/Articles/565932/ with the only difference being
the kernel parameter changed from kthread to kthread_cpus.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
[VT: The existing "isolcpus"
kernel bootarg, cgroup/cpuset, and taskset might provide the some
way to have cpu isolation. However none of them satisfies the requirements.
Replacing spaces with tabs. Combine two calls of set_cpus_allowed_ptr()
in kernel_init_freeable() in init/main.c into one. Performed tests]
Signed-off-by: Vu Tran <vu.tran@windriver.com>
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Zhang Zhiguo <zhangzhg@neusoft.com>
Signed-off-by: Vefa Bicakci <vefa.bicakci@windriver.com>
[jm: Adapted the patch for context changes.]
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
---
.../admin-guide/kernel-parameters.txt | 10 ++++++++
include/linux/cpumask.h | 3 +++
init/main.c | 2 ++
kernel/cpu.c | 23 +++++++++++++++++++
kernel/kthread.c | 4 ++--
kernel/umh.c | 3 +++
6 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a19a3005b545..6f79c718ae4f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2223,6 +2223,16 @@
See also Documentation/trace/kprobetrace.rst "Kernel
Boot Parameter" section.
+ kthread_cpus= [KNL, SMP] Only run kernel threads on the specified
+ list of processors. The kernel will start threads
+ on the indicated processors only (unless there
+ are specific reasons to run a thread with
+ different affinities). This can be used to make
+ init start on certain processors and also to
+ control where kmod and other user space threads
+ are being spawned. Allows to keep kernel threads
+ away from certain cores unless absoluteluy necessary.
+
kpti= [ARM64] Control page table isolation of user
and kernel address spaces.
Default: enabled on cores which need mitigation.
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 383684e30f12..8fcc67ea3d8c 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -55,6 +55,7 @@ extern unsigned int nr_cpu_ids;
* cpu_present_mask - has bit 'cpu' set iff cpu is populated
* cpu_online_mask - has bit 'cpu' set iff cpu available to scheduler
* cpu_active_mask - has bit 'cpu' set iff cpu available to migration
+ * cpu_kthread_mask - has bit 'cpu' set iff general kernel threads allowed
*
* If !CONFIG_HOTPLUG_CPU, present == possible, and active == online.
*
@@ -91,10 +92,12 @@ extern struct cpumask __cpu_possible_mask;
extern struct cpumask __cpu_online_mask;
extern struct cpumask __cpu_present_mask;
extern struct cpumask __cpu_active_mask;
+extern struct cpumask __cpu_kthread_mask;
#define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
#define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)
#define cpu_present_mask ((const struct cpumask *)&__cpu_present_mask)
#define cpu_active_mask ((const struct cpumask *)&__cpu_active_mask)
+#define cpu_kthread_mask ((const struct cpumask *)&__cpu_kthread_mask)
extern atomic_t __num_online_cpus;
diff --git a/init/main.c b/init/main.c
index db693781a12f..4e7777bdab6e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1536,6 +1536,8 @@ static noinline void __init kernel_init_freeable(void)
do_basic_setup();
+ set_cpus_allowed_ptr(current, cpu_kthread_mask);
+
kunit_run_all_tests();
console_on_rootfs();
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 016f2d0686b6..6783db02e9ae 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2505,6 +2505,29 @@ EXPORT_SYMBOL(__cpu_active_mask);
atomic_t __num_online_cpus __read_mostly;
EXPORT_SYMBOL(__num_online_cpus);
+struct cpumask __cpu_kthread_mask __read_mostly
+ = {CPU_BITS_ALL};
+EXPORT_SYMBOL(__cpu_kthread_mask);
+
+static int __init kthread_setup(char *str)
+{
+ cpumask_var_t tmp_mask;
+ int err;
+
+ alloc_bootmem_cpumask_var(&tmp_mask);
+
+ err = cpulist_parse(str, tmp_mask);
+ if (!err)
+ cpumask_copy(&__cpu_kthread_mask, tmp_mask);
+ else
+ pr_err("Cannot parse 'kthread_cpus=%s'; error %d\n", str, err);
+
+ free_bootmem_cpumask_var(tmp_mask);
+
+ return 1;
+}
+__setup("kthread_cpus=", kthread_setup);
+
void init_cpu_present(const struct cpumask *src)
{
cpumask_copy(&__cpu_present_mask, src);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3ce6a31db7b4..683008e94fd4 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -300,7 +300,7 @@ static int kthread(void *_create)
* back to default in case they have been changed.
*/
sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
- set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_KTHREAD));
+ set_cpus_allowed_ptr(current, cpu_kthread_mask);
/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -655,7 +655,7 @@ int kthreadd(void *unused)
/* Setup a clean context for our children to inherit. */
set_task_comm(tsk, "kthreadd");
ignore_signals(tsk);
- set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_FLAG_KTHREAD));
+ set_cpus_allowed_ptr(tsk, cpu_kthread_mask);
set_mems_allowed(node_states[N_MEMORY]);
current->flags |= PF_NOFREEZE;
diff --git a/kernel/umh.c b/kernel/umh.c
index 3f646613a9d3..e5027cee43f7 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -80,6 +80,9 @@ static int call_usermodehelper_exec_async(void *data)
*/
current->fs->umask = 0022;
+ /* We can run only where init is allowed to run. */
+ set_cpus_allowed_ptr(current, cpu_kthread_mask);
+
/*
* Our parent (unbound workqueue) runs with elevated scheduling
* priority. Avoid propagating that into the userspace child.
--
2.29.2