Updates related to optimizing CPU scheduling sections

Deleted downstream-only topic.

Signed-off-by: egoncalv <elisamaraaoki.goncalves@windriver.com>
Change-Id: I28f8ed892f2828f5c3281cc311d3fdd3ac7ce79a
egoncalv 2021-08-17 14:49:32 -03:00
parent 1a2b9390b8
commit f58be1ae68
2 changed files with 0 additions and 240 deletions


@@ -13,7 +13,6 @@ StarlingX Kubernetes
:maxdepth: 1
about-the-admin-tutorials
installing-and-running-cpu-manager-for-kubernetes
----------------------
Application management


@@ -1,239 +0,0 @@
.. jme1561551450093
.. _installing-and-running-cpu-manager-for-kubernetes:
==========================================
Install and Run CPU Manager for Kubernetes
==========================================
You must install Helm charts and label worker nodes appropriately before using
CMK.
.. rubric:: |context|
Perform the following steps to enable CMK on a cluster.
.. rubric:: |proc|
#. Apply the **cmk-node** label to each worker node to be managed using CMK.
For example:
.. code-block:: none
~(keystone_admin)]$ system host-lock worker-0
~(keystone_admin)]$ system host-label-assign worker-0 cmk-node=enabled
+-------------+--------------------------------------+
| Property | Value |
+-------------+--------------------------------------+
| uuid | 2909d775-cd6c-4bc1-8268-27499fe38d5e |
| host_uuid | 1f00d8a4-f520-41ee-b608-1b50054b1cd8 |
| label_key | cmk-node |
| label_value | enabled |
+-------------+--------------------------------------+
~(keystone_admin)]$ system host-unlock worker-0
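As an optional check (an example only, not part of the original procedure), the assigned labels can be confirmed with :command:`system host-label-list`:
.. code-block:: none
~(keystone_admin)]$ system host-label-list worker-0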
#. Perform the following steps if you did not specify CMK in the
localhost.yml file at Ansible bootstrap:
#. On the active controller, run the following command to generate the
username and password to be used for Docker login.
.. code-block:: none
$ sudo python /usr/share/ansible/stx-ansible/playbooks/roles/common/push-docker-images/files/get_registry_auth.py 625619392498.dkr.ecr.us-west-2.amazonaws.com <Access_Key_ID_from_Wind_Share> <Secret_Access_Key_from_Wind_Share>
#. Run the Docker login command:
.. code-block:: none
~(keystone_admin)]$ sudo docker login 625619392498.dkr.ecr.us-west-2.amazonaws.com -u AWS -p <password_returned_from_first_cmd>
#. Pull the CMK image from the AWS registry.
.. code-block:: none
~(keystone_admin)]$ sudo docker pull 625619392498.dkr.ecr.us-west-2.amazonaws.com/docker.io/starlingx/master/latest_image_build
#. Tag the image by using the following command:
.. code-block:: none
~(keystone_admin)]$ sudo docker image tag 625619392498.dkr.ecr.us-west-2.amazonaws.com/docker.io/starlingx/master/latest_image_build
#. Authenticate with the local registry by using the following command:
.. code-block:: none
~(keystone_admin)]$ sudo docker login registry.local:9001 -u admin -p <admin_passwd>
#. Push the image by using the following command:
.. code-block:: none
~(keystone_admin)]$ sudo docker image push registry.local:9001/docker.io/wind-river/cmk:WRCP.20.01-v1.3.1-15-ge3df769-1
#. On all configurations with two controllers, after the CMK Docker image has
been pulled, tagged for the local registry, and pushed to the local
registry, the admin user should log in to the inactive controller and run
the following commands:
For example:
.. code-block:: none
~(keystone_admin)]$ sudo docker login registry.local:9001 -u admin -p <admin_passwd>
~(keystone_admin)]$ sudo docker image pull tis-lab-registry.cumulus.wrs.com:9001/wrcp-staging/docker.io/wind-river/cmk:WRCP.20.01-v1.3.1-15-ge3df769-1
#. Configure any isolated CPUs on worker nodes in order to reduce host OS
impacts on latency for tasks running on isolated CPUs.
Any container tasks running on isolated CPUs must explicitly manage their
own affinity; the process scheduler ignores them completely.
.. note::
The following commands are examples only; the admin user must specify
the number of CPUs per processor based on the node's CPU topology.
.. code-block:: none
~(keystone_admin)]$ system host-lock worker-1
~(keystone_admin)]$ system host-cpu-modify -f platform -p0 1 worker-1
~(keystone_admin)]$ system host-cpu-modify -f application-isolated -p0 15 worker-1
~(keystone_admin)]$ system host-cpu-modify -f application-isolated -p1 15 worker-1
~(keystone_admin)]$ system host-unlock worker-1
This sets one platform core and 15 application-isolated cores on NUMA node
0, and 15 application-isolated cores on NUMA node 1. At least one CPU must
be left unspecified, which will cause it to be an application CPU.
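As an optional example check (not part of the original procedure), the
resulting per-core function assignments can be reviewed with
:command:`system host-cpu-list`:
.. code-block:: none
~(keystone_admin)]$ system host-cpu-list worker-1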
#. Run the /opt/extracharts/cpu-manager-k8s-setup.sh helper script to install
the CMK Helm charts used to configure the system for CMK.
#. Before running the script, untar the chart files in /opt/extracharts.
.. code-block:: none
~(keystone_admin)]$ cd /opt/extracharts
~(keystone_admin)]$ sudo tar -xvf cpu-manager-k8s-init-1.3.1.tgz
~(keystone_admin)]$ sudo tar -xvf cpu-manager-k8s-webhook-1.3.1.tgz
~(keystone_admin)]$ sudo tar -xvf cpu-manager-k8s-1.3.1.tgz
#. Run the script.
The script is located in the /opt/extracharts directory of the active
controller.
For example:
.. code-block:: none
~(keystone_admin)]$ cd /opt/extracharts
~(keystone_admin)]$ ./cpu-manager-k8s-setup.sh
The following actions are performed:
- The **cpu-manager-k8s-init** chart is installed. This will create a
service account and set up role-based access control (RBAC).
- A webhook is created to insert the appropriate resources into pods
that request CMK resources. \(This will result in one pod running.\)
- A daemonset is created for the per-CMK-node pod that will handle
all CMK operations on that node.
- **cmk-webhook-deployment** is launched on the controller and
**cpu-manager-k8s-cmk-default** is launched on the worker.
By default, each node will have one available CPU allocated to the
shared pool, and all the rest allocated to the exclusive pool. The
platform CPUs will be ignored.
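As an example only (pod names can vary by release), the resulting CMK webhook
and per-node pods can be confirmed from the active controller:
.. code-block:: none
~(keystone_admin)]$ kubectl get pods --all-namespaces -o wide | grep -i cmk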
#. Add more CPUs to the shared pool.
#. Override the allocation via per-node Helm chart overrides on the
**cpu-manager-k8s** Helm chart.
.. code-block:: none
$ cat <<EOF > /home/sysadmin/worker-0-cmk-overrides.yml
# For NUM_EXCLUSIVE_CORES a value of -1 means
# "all available cores after infra and shared
# cores have been allocated".
# NUM_SHARED_CORES must be at least 1.
conf:
cmk:
NUM_EXCLUSIVE_CORES: -1
NUM_SHARED_CORES: 1
overrides:
cpu-manager-k8s_cmk:
hosts:
- name: worker-0
conf:
cmk:
NUM_SHARED_CORES: 2
EOF
#. Apply the override.
.. code-block:: none
$ helm upgrade cpu-manager cpu-manager-k8s --reuse-values -f /home/sysadmin/worker-0-cmk-overrides.yml
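To confirm that the per-node override took effect, the merged values for the
release can be inspected; this is an example only, using the cpu-manager
release name from the :command:`helm upgrade` command above:
.. code-block:: none
$ helm get values cpu-manager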
#. After CMK has been installed, run the following command, if required, to
patch the webhook deployment so that the image is pulled only when it is
not already present:
.. code-block:: none
~(keystone_admin)]$ kubectl -n kube-system patch deploy cmk-webhook-deployment \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"cmk-webhook",\
"imagePullPolicy":"IfNotPresent"}]}}}}'
.. rubric:: |postreq|
Once CMK is set up, you can run workloads as described at `https://github.com/intel/CPU-Manager-for-Kubernetes <https://github.com/intel/CPU-Manager-for-Kubernetes>`__,
with the following caveats:
- When using CMK, the application pods should not specify requests or limits
for the **cpu** resource.
When running a container with :command:`cmk isolate --pool=exclusive`, the
**cpu** resource should be superseded by the
:command:`cmk.intel.com/exclusive-cores` resource. A sketch of such a pod
specification is shown after this list.
When running a container with :command:`cmk isolate --pool=shared` or
:command:`cmk isolate --pool=infra`, the **cpu** resource has no meaning:
kubelet assumes the container has access to all the CPUs rather than just
the **infra** or **shared** ones, which confuses resource tracking.
- There is a known issue with resource tracking if a node with running
CMK-isolated applications suffers an uncontrolled reboot. The suggested
workaround is to wait for it to come back up, then lock/unlock the node.
- When using the :command:`cmk isolate --socket-id` command to run an
application on a particular socket, there can be complications with
scheduling because the Kubernetes scheduler is not NUMA-aware. A pod can be
scheduled to a Kubernetes node that has enough resources across all NUMA
nodes, but a container that then runs :command:`cmk isolate --socket-id=<X>`
fails at run time if there are not enough resources on that particular
NUMA node:
.. code-block:: none
~(keystone_admin)]$ kubectl logs cmk-isolate-pod
[6] Failed to execute script cmk
Traceback (most recent call last):
  File "cmk.py", line 162, in <module>
    main()
  File "cmk.py", line 127, in main
    args["--socket-id"])
  File "intel/isolate.py", line 57, in isolate
    .format(pool_name))
SystemError: Not enough free cpu lists in pool
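The following is a minimal sketch of a pod specification that requests an
exclusive core through the **cmk.intel.com/exclusive-cores** resource instead
of the **cpu** resource, as described in the first caveat above. The image
name, application command, and /opt/bin/cmk binary path are illustrative
assumptions only; see the upstream CPU Manager for Kubernetes documentation
for authoritative examples.
.. code-block:: none
# Example only: image, command, and cmk binary path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: cmk-exclusive-example
spec:
  containers:
  - name: app
    image: registry.local:9001/example/app:latest
    command: ["/opt/bin/cmk", "isolate", "--pool=exclusive", "/usr/local/bin/app"]
    resources:
      requests:
        cmk.intel.com/exclusive-cores: '1'
      limits:
        cmk.intel.com/exclusive-cores: '1'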
.. From step 1
.. xbooklink For more information on node labeling, see |node-doc|: :ref:`Configure Node Labels from the CLI <assigning-node-labels-from-the-cli>`.
.. From step 2
.. xreflink For more information, see |inst-doc|: :ref:`Bootstrap and Deploy Cloud Platform <bootstrapping-and-deploying-starlingx>`.