metal/mtce/debian/deb_folder/pmon.service
Jim Gauld d368475197 Configure systemd CPUShares/CPUQuota for pmon.service
This updates CPUShares and CPUQuota for pmon.service.
This gives pmon.service reduced shares and quota, since it has sporadic
CPU usage yet is not latency critical. Significant high-runner CPU usage
comes from various audits (unrelated to the pmon process itself) running
under the systemd pmon.service cgroup.

For example, the ceph health audit and the ceph osd audit can easily
require 100% CPU for several seconds, often taking 30% occupancy for
multiple seconds.

This reduces the pmon cgroup CPUShares from 1024 to 150 and sets
CPUQuota to 15%. This smooths out the behaviour of poorly behaved
audits; throttling effectively slows the audits down by a few seconds.
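To make these numbers concrete, here is a minimal Python sketch of the arithmetic under an idealized model (an assumption, not how the kernel scheduler is implemented): CPUShares only divides CPU time among sibling cgroups that are all runnable on the same CPU, and CPUQuota is a percentage of one CPU. The helper names `contended_fraction` and `min_wall_seconds` and the 1 CPU-second audit burst are hypothetical, for illustration only.

```python
from typing import Sequence

# Default CPUShares value (matches the kernel's default cpu.shares).
CPU_SHARES_DEFAULT = 1024


def contended_fraction(shares: int, competitor_shares: Sequence[int]) -> float:
    """Idealized share of contended CPU time: proportional to relative shares."""
    return shares / (shares + sum(competitor_shares))


def min_wall_seconds(cpu_seconds: float, quota_percent: float) -> float:
    """Lower bound on wall-clock time to consume cpu_seconds under CPUQuota=quota_percent%."""
    # CPUQuota=15% allows at most 0.15 CPU-seconds per wall-clock second.
    return cpu_seconds * 100 / quota_percent


if __name__ == "__main__":
    # pmon's cgroup competing against one default-weighted sibling cgroup.
    print(f"{contended_fraction(CPU_SHARES_DEFAULT, [CPU_SHARES_DEFAULT]):.1%}")  # 50.0%
    print(f"{contended_fraction(150, [CPU_SHARES_DEFAULT]):.1%}")  # 12.8%
    # A hypothetical audit burst needing 1 CPU-second, throttled to 15%.
    print(f"{min_wall_seconds(1.0, 15):.1f} s")  # 6.7 s
```

Note that the quota, not the shares, is what bounds the slowdown: shares only matter while the CPU is contended, whereas the 15% cap applies even on an otherwise idle CPU.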

This is part of an overall set of adjustments required for the systemd
cgroup CPUShares, CPUQuota, and AllowedCPUs settings of key system
services. This will improve the latency of Kubernetes critical
components and throttle less important services.

Partial-Bug: 2084714

TEST PLAN:
AIO-SX, AIO-DX, Standard, Storage, DC, AIO-DX with ceph:
- PASS: Fresh install
- PASS: verify systemd parameters for pmon

        Example:
        systemctl show pmon.service | grep -e CPUShares -e CPUQuota

AIO-SX, AIO-DX:
- PASS: B&R (backup and restore)

AIO-DX:
- PASS: K8S orchestrated upgrade 1.24 - 1.29
- TODO: controller swact
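The "verify systemd parameters for pmon" step can also be scripted. The following is a minimal Python sketch, assuming the `KEY=VALUE` output format of `systemctl show` and assuming that systemd renders `CPUQuota=15%` and `CPUQuotaPeriodSec=10ms` as the properties `CPUQuotaPerSecUSec=150ms` and `CPUQuotaPeriodUSec=10ms`; confirm the exact rendering on a known-good host, since it may differ between systemd versions. `parse_show`, `check_pmon` and `EXPECTED` are hypothetical names.

```python
from typing import Dict, Optional, Tuple


def parse_show(output: str) -> Dict[str, str]:
    """Parse `systemctl show` KEY=VALUE lines into a dict (values kept as strings)."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first '=' only; values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Assumed renderings of the pmon.service cgroup settings.
EXPECTED = {
    "CPUShares": "150",
    "CPUQuotaPerSecUSec": "150ms",
    "CPUQuotaPeriodUSec": "10ms",
}


def check_pmon(output: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {property: (expected, actual)} for each mismatch; empty means pass."""
    props = parse_show(output)
    return {
        key: (want, props.get(key))
        for key, want in EXPECTED.items()
        if props.get(key) != want
    }
```

On a target host, pass it the output of `systemctl show pmon.service -p CPUShares -p CPUQuotaPerSecUSec -p CPUQuotaPeriodUSec`; an empty result means every checked property matched.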

Change-Id: I6ee5c6029c2a5a0fae26e9231401e4d4f1c016df
Signed-off-by: Jim Gauld <James.Gauld@windriver.com>
2024-11-15 09:21:05 -05:00

[Unit]
Description=StarlingX Maintenance Process Monitor
After=config.service
# The following third-party service files are not modified by StarlingX,
# so add "After" clauses here rather than adding "Before=pmon.service"
# to those files.
After=sshd.service acpid.service syslog-ng.service
After=ntpd.service ptp4l.service phc2sys.service
Before=hostw.service
[Service]
Type=forking
ExecStart=/etc/init.d/pmon start
ExecStop=/etc/init.d/pmon stop
ExecReload=/etc/init.d/pmon reload
PIDFile=/var/run/pmond.pid
KillMode=process
# Failure handling
TimeoutStartSec=10s
TimeoutStopSec=10s
Restart=always
RestartSec=2
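# cgroup engineering
# Lower relative CPU weight than the default 1024, plus a hard cap of
# 15% of one CPU, enforced over 10ms periods.
CPUShares=150
CPUQuota=15%
CPUQuotaPeriodSec=10ms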
[Install]
WantedBy=multi-user.target
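The unit file also sets `CPUQuotaPeriodSec=10ms`, which the commit message does not mention. Together with `CPUQuota=15%`, it determines the per-period CPU budget the kernel enforces. The following minimal Python sketch shows that conversion, assuming cgroup v2, where the budget appears in the unit's `cpu.max` file as `<quota_usec> <period_usec>` (on cgroup v1 the same two numbers go to `cpu.cfs_quota_us` and `cpu.cfs_period_us`). The `cpu_max` helper is hypothetical. systemd has its own handling for budgets below the kernel's 1 ms minimum, which this sketch does not model and simply rejects.

```python
USEC_PER_SEC = 1_000_000
# The kernel's CFS bandwidth control rejects quotas below 1 ms.
MIN_QUOTA_USEC = 1_000


def cpu_max(quota_percent: int, period_usec: int) -> str:
    """Return a cgroup v2 cpu.max line for CPUQuota=<quota_percent>% and the given period."""
    # CPUQuota is a share of one CPU per second of wall-clock time.
    quota_per_sec_usec = quota_percent * USEC_PER_SEC // 100
    # Scale that per-second budget down to one enforcement period.
    quota_usec = quota_per_sec_usec * period_usec // USEC_PER_SEC
    if quota_usec < MIN_QUOTA_USEC:
        # systemd adjusts such settings itself; this sketch just refuses them.
        raise ValueError(f"per-period quota {quota_usec}us is below 1ms")
    return f"{quota_usec} {period_usec}"


if __name__ == "__main__":
    # pmon.service: CPUQuota=15%, CPUQuotaPeriodSec=10ms
    print(cpu_max(15, 10_000))  # 1500 10000
```

With the 10 ms period, the pmon cgroup may run for at most 1.5 ms in every 10 ms window, so throttling is applied in shorter slices than the kernel's default 100 ms period would give (15 ms of run time followed by up to 85 ms throttled).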