Power Metrics Enablement - vRAN Integration

Story: 2011065
Task:  50966

Change-Id: I6dc6d615018a8009ab564068197d3c2b61436503
Signed-off-by: Ngairangbam Mili <ngairangbam.mili@windriver.com>
Ngairangbam Mili 2024-08-28 04:53:30 +00:00
parent fa80e22983
commit cf43882618


@ -54,7 +54,7 @@ Uncore events can only be loaded from the following cpu models:
* - 0xCF
- Intel Emerald Rapids X
Source: https://github.com/influxdata/telegraf/issues/13098#issuecomment-1512585422
Source: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#supported-cpu-models
.. rubric:: |proc|
@ -126,120 +126,285 @@ You can activate the Intel |PMU| plugin with the following command:
| | |
+----------------+-------------------+
Override Input Plugins
----------------------
Override intel_powerstat plugin
-------------------------------
You can change the default input plugin parameters with an override.
You can change the default ``intel_powerstat`` plugin parameters with an override.
The plugin parameters include CPU and package metrics, and also the read method
of |MSR|.
The default plugin parameters include CPU and package metrics.
The list of available options for both CPU and package metrics can be found on
the powerstat documentation:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#configuration
Note that when overriding, you must specify both metric parameters (cpu and
package); otherwise, the plugin stops collecting the metrics that are
missing.
.. note::
The ``read_method`` parameter specifies how |MSR| values are read. It
accepts two values, ``concurrent`` or ``sequential``; the default is
``concurrent``. The concurrent method uses goroutines to read each |MSR|
value concurrently.
The sequential method reads each value in turn. This reduces latency
overhead when using a preempt-rt kernel with isolated cores, but might reduce
the precision of the metric calculations.
When overriding, you must specify both metric parameters (CPU and package);
otherwise, the plugin stops collecting the metrics that are missing.
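As a minimal sketch of that rule, the following Python helper reports which metric groups are missing from an override before it is applied. The function name ``missing_metric_groups`` is hypothetical (it is not part of the platform or of Telegraf), and the override is assumed to be already loaded into a dictionary whose ``inputs`` value is a list of single-plugin mappings.

```python
# Hypothetical pre-flight check; not part of StarlingX or Telegraf.
REQUIRED_GROUPS = ("cpu_metrics", "package_metrics")


def missing_metric_groups(overrides):
    """Return the metric groups absent from the intel_powerstat entry.

    Assumes overrides["config"]["inputs"] is a list of one-key mappings,
    for example [{"intel_powerstat": {...}}, {"linux_cpu": {...}}].
    """
    inputs = overrides.get("config", {}).get("inputs", [])
    for entry in inputs:
        if "intel_powerstat" in entry:
            settings = entry["intel_powerstat"] or {}
            return [group for group in REQUIRED_GROUPS if group not in settings]
    # No intel_powerstat entry at all: both groups are missing.
    return list(REQUIRED_GROUPS)
```

An empty result means both groups are present; any name in the result is a group the plugin would stop collecting.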
Example of overriding the powerstat plugin:
.. code-block:: none
.. rubric:: |proc|
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
config:
  inputs:
    intel_powerstat:
      read_method: "sequential"
      cpu_metrics: ["cpu_frequency","cpu_busy_frequency","cpu_temperature","cpu_c0_state_residency","cpu_c1_state_residency","cpu_c6_state_residency","cpu_busy_cycles"]
      package_metrics: ["current_power_consumption","current_dram_power_consumption","thermal_design_power","cpu_base_frequency"]
#. Update the input parameters.
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
.. code-block:: none
+----------------+--------------------------------------+
| Property | Value |
+----------------+--------------------------------------+
| name | telegraf |
| namespace | power-metrics |
| user_overrides | config: |
| | intel_powerstat: |
| | cpu_metrics: |
| | - cpu_frequency |
| | - cpu_busy_frequency |
| | - cpu_temperature |
| | - cpu_c0_state_residency |
| | - cpu_c1_state_residency |
| | - cpu_c6_state_residency |
| | - cpu_busy_cycles |
| | package_metrics: |
| | - current_power_consumption |
| | - current_dram_power_consumption |
| | - thermal_design_power |
| | - cpu_base_frequency |
| | read_method: sequential |
| | |
+----------------+--------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
config:
  inputs:
    # Default plugins to collect power-metrics data
    - intel_powerstat:
        cpu_metrics:
          - "cpu_frequency"
          - "cpu_busy_frequency"
          - "cpu_temperature"
          - "cpu_c0_state_residency"
          - "cpu_c1_state_residency"
          - "cpu_c6_state_residency"
          - "cpu_busy_cycles"
        package_metrics:
          - "current_power_consumption"
          - "current_dram_power_consumption"
          - "thermal_design_power"
          - "cpu_base_frequency"
          - "uncore_frequency"
    - intel_pmu:
        event_definitions:
          - "/etc/telegraf/events_definition.json"
        core_events:
          - events:
              - INST_RETIRED.ANY
    - linux_cpu:
        metrics: ["cpufreq"]
Re-apply the app.
#. Apply the override.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
+----------------+------------------------------------------------+
| Property | Value |
+----------------+------------------------------------------------+
| name | telegraf |
| namespace | power-metrics |
| user_overrides | config: |
| | inputs: |
| | - intel_powerstat: |
| | cpu_metrics: |
| | - cpu_frequency |
| | - cpu_busy_frequency |
| | - cpu_temperature |
| | - cpu_c0_state_residency |
| | - cpu_c1_state_residency |
| | - cpu_c6_state_residency |
| | - cpu_busy_cycles |
| | package_metrics: |
| | - current_power_consumption |
| | - current_dram_power_consumption |
| | - thermal_design_power |
| | - cpu_base_frequency |
| | - uncore_frequency |
| | - intel_pmu: |
| | event_definitions: |
| | - "/etc/telegraf/events_definition.json" |
| | core_events: |
| | - events: |
| | - INST_RETIRED.ANY |
| | - linux_cpu: |
| | metrics: ["cpufreq"] |
| | |
+----------------+------------------------------------------------+
#. Re-apply the application.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
.. note::
Power Metrics may increase the scheduling latency due to perf and |MSR|
readings. A latency impact of around 3 µs on average was observed, plus
spikes with significant increases in maximum latency values.
There was also an impact on the kernel processing time. Applications that
run with a priority of 50 or higher on isolated CPUs of a real-time kernel
should leave time for kernel services to run, to avoid unexpected system behavior.
Configuration Requirement for Power Metrics and linux_cpu
---------------------------------------------------------
If the BIOS is not configured to delegate control to the operating system, the
``linux_cpu`` metrics may not function as expected. In that case, remove
``linux_cpu`` so that power-metrics operates correctly; the metrics generated
by ``linux_cpu`` will then not be available.
When the BIOS is properly configured, a frequency driver is loaded in Linux.
To verify this, run the :command:`cpupower frequency-info` command.
Example:
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
sysadmin@controller-0:~$ cpupower frequency-info
analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 800 MHz - 3.60 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 800 MHz and 2.50 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 2.50 GHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
If there is no delegation from the BIOS to the operating system, the ``linux_cpu``
module may fail to function correctly. To enable power-metrics, it is necessary
to remove the ``linux_cpu`` module. In this scenario, the performance metrics
generated by the ``linux_cpu`` module will not be available.
Add input plugins
Example:
.. code-block:: none
sysadmin@compute-0:~$ cpupower frequency-info
analyzing CPU 0:
no or unknown cpufreq driver is active on this CPU
CPUs which run at the same hardware frequency: Not Available
CPUs which need to have their frequency coordinated by software: Not Available
maximum transition latency: Cannot determine or is not supported.
Not Available
available cpufreq governors: Not Available
Unable to determine current policy
current CPU frequency: Unable to call hardware
current CPU frequency: Unable to call to kernel
boost state support:
Supported: yes
Active: yes
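The check shown in the two outputs above can be scripted. The sketch below is an assumption-laden illustration rather than a supported tool: the function name ``active_cpufreq_driver`` is hypothetical, and it relies only on the ``driver:`` line that appears in the first example output and is absent from the second.

```python
import re


def active_cpufreq_driver(cpupower_output):
    """Return the cpufreq driver named in `cpupower frequency-info` output.

    Returns None when no "driver:" line is present, which is how the
    "no or unknown cpufreq driver is active" case shows up in the text.
    """
    match = re.search(r"^\s*driver:\s*(\S+)", cpupower_output, re.MULTILINE)
    return match.group(1) if match else None
```

A caller could capture the command output with ``subprocess.run(["cpupower", "frequency-info"], capture_output=True, text=True)`` and pass its ``stdout`` to the function. A ``None`` result suggests that ``linux_cpu`` should be removed from the inputs.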
Intel Power Stat Configuration Behavior
---------------------------------------
This section describes the expected behavior of the
``[[inputs.intel_powerstat]]`` input in different configuration scenarios:
- Empty configuration
When the ``platform_metrics`` parameter is set to an empty array, as shown
below, no metrics are returned.
[[inputs.intel_powerstat]]
platform_metrics = []
- Default configuration
With the default configuration, or when the ``[[inputs.intel_powerstat]]``
input is used without specifying ``platform_metrics``, only the following
metrics are enabled:
current_power_consumption
current_dram_power_consumption
thermal_design_power
This default behavior ensures that only the essential power consumption metrics
are collected.
- Specific platform metrics
If specific metrics are listed in the ``platform_metrics`` parameter, as shown
below, only those metrics are returned. No other metrics are included.
[[inputs.intel_powerstat]]
platform_metrics = ["cpu_base_frequency", ...]
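The three scenarios above can be summarized in a few lines of Python. This is only a model of the behavior as described in this section, not the plugin's implementation; the function name ``expected_metrics`` and the ``_UNSET`` sentinel are hypothetical.

```python
# Defaults as listed in the "Default configuration" scenario above.
DEFAULT_METRICS = [
    "current_power_consumption",
    "current_dram_power_consumption",
    "thermal_design_power",
]

# Sentinel that distinguishes "parameter not specified" from "empty array".
_UNSET = object()


def expected_metrics(platform_metrics=_UNSET):
    """Model which metrics are returned for a given platform_metrics value."""
    if platform_metrics is _UNSET:
        # Parameter omitted: only the default metrics are enabled.
        return list(DEFAULT_METRICS)
    # Empty array: nothing is returned. Otherwise: exactly the listed metrics.
    return list(platform_metrics)
```

The sentinel matters because an omitted parameter and an empty array behave differently: the first yields the defaults, the second yields nothing.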
Add Input Plugins
-----------------
You can add new plugins by overriding the plugins column.
You can add new plugins by overriding the inputs parameter.
#. Add the cgroups plugin.
Example of overriding the powerstat plugin:
#. Add the ``cpu_c3_state_residency`` metric to the ``intel_powerstat/cpu_metrics`` plugin.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-cgroups.yaml
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
config:
  inputs:
    - cgroup:
        paths: ["/sys/fs/cgroup/cpu","/sys/fs/cgroup/cpu/*","/sys/fs/cgroup/cpu/*/*",]
        files: ["cpuacct.usage", "cpuacct.usage_percpu", "cpu.cfs_period_us", "cpu.cfs_quota_us", "cpu.shares", "cpu.stat"]
    # Default plugins to collect power-metrics data
    - intel_powerstat:
        cpu_metrics:
          - "cpu_frequency"
          - "cpu_busy_frequency"
          - "cpu_temperature"
          - "cpu_c0_state_residency"
          - "cpu_c1_state_residency"
          - "cpu_c3_state_residency"
          - "cpu_c6_state_residency"
          - "cpu_busy_cycles"
        package_metrics:
          - "current_power_consumption"
          - "current_dram_power_consumption"
          - "thermal_design_power"
          - "cpu_base_frequency"
          - "uncore_frequency"
    - intel_pmu:
        event_definitions:
          - "/etc/telegraf/events_definition.json"
        core_events:
          - events:
              - INST_RETIRED.ANY
    - linux_cpu:
        metrics: ["cpufreq"]
#. Apply the override.
.. code-block:: none
system helm-override-update power-metrics telegraf power-metrics --values /path/to/file.yaml
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-cgroups.yaml
+----------------+--------------------------------+
| Property | Value |
+----------------+--------------------------------+
| name | telegraf |
| namespace | power-metrics |
| user_overrides | config: |
| | inputs: |
| | - cgroup: |
| | files: |
| | - cpuacct.usage |
| | - cpuacct.usage_percpu |
| | - cpu.cfs_period_us |
| | - cpu.cfs_quota_us |
| | - cpu.shares |
| | - cpu.stat |
| | paths: |
| | - /sys/fs/cgroup/cpu |
| | - /sys/fs/cgroup/cpu/* |
| | - /sys/fs/cgroup/cpu/*/* |
| | |
+----------------+--------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
+----------------+------------------------------------------------+
| Property | Value |
+----------------+------------------------------------------------+
| name | telegraf |
| namespace | power-metrics |
| user_overrides | config: |
| | inputs: |
| | - intel_powerstat: |
| | cpu_metrics: |
| | - cpu_frequency |
| | - cpu_busy_frequency |
| | - cpu_temperature |
| | - cpu_c0_state_residency |
| | - cpu_c1_state_residency |
| | - cpu_c3_state_residency |
| | - cpu_c6_state_residency |
| | - cpu_busy_cycles |
| | package_metrics: |
| | - current_power_consumption |
| | - current_dram_power_consumption |
| | - thermal_design_power |
| | - cpu_base_frequency |
| | - uncore_frequency |
| | - intel_pmu: |
| | event_definitions: |
| | - "/etc/telegraf/events_definition.json" |
| | core_events: |
| | - events: |
| | - INST_RETIRED.ANY |
| | - linux_cpu: |
| | metrics: ["cpufreq"] |
| | |
+----------------+------------------------------------------------+
#. Re-apply the application.
@ -247,57 +412,79 @@ You can add new plugins overriding the plugins column.
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
#. If required, add configmap and volumes via override.
.. code-block:: none
volumes:
- name: telegraf-example
configMap:
name: telegraf-example
mountPoints:
- name: telegraf-example
mountPath: /path/to/file.json
subPath: file.json
.. code-block:: none
system helm-override-update power-metrics telegraf power-metrics --values /path/to/file.yaml
For more information on Telegraf plugins, see
https://github.com/influxdata/telegraf#documentation.
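Because the override replaces the whole inputs list, adding one metric means restating every default entry. The following sketch shows one way to derive the new list from an existing one without editing it by hand. The helper name ``with_cpu_metric`` is hypothetical, and the inputs are assumed to be already loaded as a list of single-plugin mappings.

```python
import copy


def with_cpu_metric(inputs, metric):
    """Return a copy of inputs with metric added to intel_powerstat cpu_metrics.

    Assumes inputs is a list of one-key mappings such as
    [{"intel_powerstat": {"cpu_metrics": [...]}}, {"linux_cpu": {...}}].
    The original list is left untouched.
    """
    updated = copy.deepcopy(inputs)
    for entry in updated:
        if "intel_powerstat" in entry:
            metrics = entry["intel_powerstat"].setdefault("cpu_metrics", [])
            if metric not in metrics:
                metrics.append(metric)
    return updated
```

The result still has to be written to a values file and applied with :command:`system helm-override-update`, as in the steps above. Note that the helper appends the metric at the end of the list rather than in the position used in the example file.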
Remove input plugins
Remove Input Plugins
--------------------
You can remove plugins by setting their value to false in the plugins column.
You can remove plugins by overriding the inputs parameter.
#. Remove the cgroups plugin.
#. Remove the ``linux_cpu`` plugin.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-cgroups.yaml
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
config:
  inputs:
    - cgroup: false
    # Default plugins to collect power-metrics data
    - intel_powerstat:
        cpu_metrics:
          - "cpu_frequency"
          - "cpu_busy_frequency"
          - "cpu_temperature"
          - "cpu_c0_state_residency"
          - "cpu_c1_state_residency"
          - "cpu_c3_state_residency"
          - "cpu_c6_state_residency"
          - "cpu_busy_cycles"
        package_metrics:
          - "current_power_consumption"
          - "current_dram_power_consumption"
          - "thermal_design_power"
          - "cpu_base_frequency"
          - "uncore_frequency"
    - intel_pmu:
        event_definitions:
          - "/etc/telegraf/events_definition.json"
        core_events:
          - events:
              - INST_RETIRED.ANY
#. Apply the override.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values ./telegraf-cgroups.yaml
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
+----------------+-------------------+
| Property | Value |
+----------------+-------------------+
| name | telegraf |
| namespace | power-metrics |
| user_overrides | config: |
| | inputs: |
| | - cgroup: false |
| | |
+----------------+-------------------+
+----------------+------------------------------------------------+
| Property | Value |
+----------------+------------------------------------------------+
| name | telegraf |
| namespace | power-metrics |
| user_overrides | config: |
| | inputs: |
| | - intel_powerstat: |
| | cpu_metrics: |
| | - cpu_frequency |
| | - cpu_busy_frequency |
| | - cpu_temperature |
| | - cpu_c0_state_residency |
| | - cpu_c1_state_residency |
| | - cpu_c3_state_residency |
| | - cpu_c6_state_residency |
| | - cpu_busy_cycles |
| | package_metrics: |
| | - current_power_consumption |
| | - current_dram_power_consumption |
| | - thermal_design_power |
| | - cpu_base_frequency |
| | - uncore_frequency |
| | - intel_pmu: |
| | event_definitions: |
| | - "/etc/telegraf/events_definition.json" |
| | core_events: |
| | - events: |
| | - INST_RETIRED.ANY |
| | |
+----------------+------------------------------------------------+
#. Re-apply the application.
@ -305,7 +492,7 @@ You can remove plugins by setting their value to false in the plugins column.
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
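Removing a plugin in the list-based override amounts to restating the inputs without that entry. A minimal sketch of that filtering step follows; the helper name ``without_plugin`` is hypothetical, and the inputs are assumed to be a list of single-plugin mappings already loaded into Python.

```python
def without_plugin(inputs, plugin_name):
    """Return a new inputs list that omits every entry for plugin_name.

    Assumes each entry is a one-key mapping such as {"linux_cpu": {...}}.
    """
    return [entry for entry in inputs if plugin_name not in entry]
```

For example, ``without_plugin(inputs, "linux_cpu")`` yields the list used in the override file above, provided ``inputs`` holds the three default entries.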
Modify Telegraf data collection interval
Modify Telegraf Data Collection Interval
----------------------------------------
Telegraf reports its metrics every 10 seconds, but you can modify this time
@ -319,7 +506,30 @@ interval with the following command:
cAdvisor
--------
Enable and disable Perf Events on cAdvisor
Enable or Disable cAdvisor
--------------------------
To enable or disable cAdvisor, use the following command:
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics cadvisor power-metrics --set cadvisor_enabled=true
+----------------+------------------------+
| Property | Value |
+----------------+------------------------+
| name | cadvisor |
| namespace | power-metrics |
| user_overrides | cadvisor_enabled: true |
| | |
+----------------+------------------------+
Re-apply the power-metrics application and wait for the pod to restart.
.. code-block:: none
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
Enable and Disable Perf Events on cAdvisor
------------------------------------------
To enable or disable Perf Events on cAdvisor, use the following command: