Power Metrics Enablement - vRAN Integration
Story: 2011065
Task: 50966
Change-Id: I6dc6d615018a8009ab564068197d3c2b61436503
Signed-off-by: Ngairangbam Mili <ngairangbam.mili@windriver.com>
parent fa80e22983
commit cf43882618
@@ -54,7 +54,7 @@ Uncore events can only be loaded from the following cpu models:
|
||||
* - 0xCF
|
||||
- Intel Emerald Rapids X
|
||||
|
||||
Source: https://github.com/influxdata/telegraf/issues/13098#issuecomment-1512585422
|
||||
Source: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#supported-cpu-models
|
||||
|
||||
.. rubric:: |proc|
|
||||
|
||||
@@ -126,120 +126,285 @@ You can activate the Intel |PMU| plugin with the following command:
|
||||
| | |
|
||||
+----------------+-------------------+
|
||||
|
||||
Override Input Plugins
|
||||
----------------------
|
||||
|
||||
Override intel_powerstat plugin
|
||||
-------------------------------
|
||||
You can change the default input plugin parameters by using an override.
|
||||
|
||||
You can change the default ``intel_powerstat`` plugin parameters by using an override.
|
||||
|
||||
The plugin parameters include CPU and package metrics, and also the read method
|
||||
of |MSR|.
|
||||
The default plugin parameters include CPU and package metrics.
|
||||
|
||||
The list of available options for both CPU and package metrics can be found in
the powerstat documentation:
|
||||
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#configuration
|
||||
|
||||
Note that when overriding, you must specify both metrics parameters (CPU and
package); otherwise the plugin stops collecting the metrics that are omitted.
|
||||
.. note::
|
||||
|
||||
    The ``read_method`` parameter specifies how |MSR| values are read. It
    accepts two values, ``concurrent`` or ``sequential``; the default is
    ``concurrent``. The concurrent method uses goroutines to read each |MSR|
    value concurrently.

    The sequential method reads each value in turn. This reduces latency
    overhead when using a preempt-rt kernel with isolated cores, but might
    reduce the precision of metric calculations.
|
||||
When overriding, you must specify both metrics parameters (CPU and package);
otherwise the plugin stops collecting the metrics that are omitted.
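
As a hedged illustration of this requirement, the following sketch shows an
override that lists only ``cpu_metrics`` for ``intel_powerstat``. Assuming the
behavior described above, applying it would stop collection of the package
metrics, because ``package_metrics`` is omitted. The metric names are taken
from the examples in this document. Only the ``intel_powerstat`` entry is
shown; a real override file would also list any other plugins to keep.

.. code-block:: none

    config:
      inputs:
        - intel_powerstat:
            # Only cpu_metrics is given. package_metrics is missing, so
            # package metrics would no longer be collected.
            cpu_metrics:
              - "cpu_frequency"
              - "cpu_temperature"

Listing both ``cpu_metrics`` and ``package_metrics`` in the same entry avoids
this.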
|
||||
|
||||
Example of overriding the powerstat plugin:
|
||||
|
||||
.. code-block:: none
|
||||
.. rubric:: |proc|
|
||||
|
||||
    [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
    config:
      inputs:
        intel_powerstat:
          read_method: "sequential"
          cpu_metrics: ["cpu_frequency","cpu_busy_frequency","cpu_temperature","cpu_c0_state_residency","cpu_c1_state_residency","cpu_c6_state_residency","cpu_busy_cycles"]
          package_metrics: ["current_power_consumption","current_dram_power_consumption","thermal_design_power","cpu_base_frequency"]
|
||||
#. Update the input parameters.
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
|
||||
.. code-block:: none
|
||||
|
||||
+----------------+--------------------------------------+
|
||||
| Property | Value |
|
||||
+----------------+--------------------------------------+
|
||||
| name | telegraf |
|
||||
| namespace | power-metrics |
|
||||
| user_overrides | config: |
|
||||
| | intel_powerstat: |
|
||||
| | cpu_metrics: |
|
||||
| | - cpu_frequency |
|
||||
| | - cpu_busy_frequency |
|
||||
| | - cpu_temperature |
|
||||
| | - cpu_c0_state_residency |
|
||||
| | - cpu_c1_state_residency |
|
||||
| | - cpu_c6_state_residency |
|
||||
| | - cpu_busy_cycles |
|
||||
| | package_metrics: |
|
||||
| | - current_power_consumption |
|
||||
| | - current_dram_power_consumption |
|
||||
| | - thermal_design_power |
|
||||
| | - cpu_base_frequency |
|
||||
| | read_method: sequential |
|
||||
| | |
|
||||
+----------------+--------------------------------------+
|
||||
    [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
    config:
      inputs:
        # Default plugins to collect power-metrics data
        - intel_powerstat:
            cpu_metrics:
              - "cpu_frequency"
              - "cpu_busy_frequency"
              - "cpu_temperature"
              - "cpu_c0_state_residency"
              - "cpu_c1_state_residency"
              - "cpu_c6_state_residency"
              - "cpu_busy_cycles"
            package_metrics:
              - "current_power_consumption"
              - "current_dram_power_consumption"
              - "thermal_design_power"
              - "cpu_base_frequency"
              - "uncore_frequency"
        - intel_pmu:
            event_definitions:
              - "/etc/telegraf/events_definition.json"
            core_events:
              - events:
                  - INST_RETIRED.ANY
        - linux_cpu:
            metrics: ["cpufreq"]
|
||||
|
||||
Re-apply the app.
|
||||
#. Apply the override.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
|
||||
|
||||
+----------------+------------------------------------------------+
|
||||
| Property | Value |
|
||||
+----------------+------------------------------------------------+
|
||||
| name | telegraf |
|
||||
| namespace | power-metrics |
|
||||
| user_overrides | config: |
|
||||
| | inputs: |
|
||||
| | - intel_powerstat: |
|
||||
| | cpu_metrics: |
|
||||
| | - cpu_frequency |
|
||||
| | - cpu_busy_frequency |
|
||||
| | - cpu_temperature |
|
||||
| | - cpu_c0_state_residency |
|
||||
| | - cpu_c1_state_residency |
|
||||
| | - cpu_c6_state_residency |
|
||||
| | - cpu_busy_cycles |
|
||||
| | package_metrics: |
|
||||
| | - current_power_consumption |
|
||||
| | - current_dram_power_consumption |
|
||||
| | - thermal_design_power |
|
||||
| | - cpu_base_frequency |
|
||||
| | - uncore_frequency |
|
||||
| | - intel_pmu: |
|
||||
| | event_definitions: |
|
||||
| | - "/etc/telegraf/events_definition.json" |
|
||||
| | core_events: |
|
||||
| | - events: |
|
||||
| | - INST_RETIRED.ANY |
|
||||
| | - linux_cpu: |
|
||||
| | metrics: ["cpufreq"] |
|
||||
| | |
|
||||
+----------------+------------------------------------------------+
|
||||
|
||||
#. Re-apply the application.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
|
||||
|
||||
.. note::
|
||||
|
||||
    Power Metrics may increase scheduling latency because of perf and |MSR|
    readings. A latency impact of around 3 µs on average was observed, along
    with spikes that significantly increased the maximum latency values.
    There was also an impact on kernel processing time. Applications that
    run with a priority of 50 or higher on isolated CPUs of a real-time kernel
    should allow kernel services to run, to avoid unexpected system behavior.
|
||||
|
||||
|
||||
Configuration Requirement for Power Metrics and linux_cpu
|
||||
---------------------------------------------------------
|
||||
|
||||
If the BIOS is not configured to delegate control to the operating system, the
``linux_cpu`` metrics may not function as expected. In that case, remove the
``linux_cpu`` plugin so that power-metrics operates correctly. The metrics
generated by ``linux_cpu`` will then not be available.
|
||||
|
||||
To verify that the BIOS is properly configured, check that a frequency driver is
loaded in Linux by running the :command:`cpupower frequency-info` command.
|
||||
|
||||
Example:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
|
||||
sysadmin@controller-0:~$ cpupower frequency-info
|
||||
analyzing CPU 0:
|
||||
driver: intel_pstate
|
||||
CPUs which run at the same hardware frequency: 0
|
||||
CPUs which need to have their frequency coordinated by software: 0
|
||||
maximum transition latency: Cannot determine or is not supported.
|
||||
hardware limits: 800 MHz - 3.60 GHz
|
||||
available cpufreq governors: performance powersave
|
||||
current policy: frequency should be within 800 MHz and 2.50 GHz.
|
||||
The governor "performance" may decide which speed to use
|
||||
within this range.
|
||||
current CPU frequency: Unable to call hardware
|
||||
current CPU frequency: 2.50 GHz (asserted by call to kernel)
|
||||
boost state support:
|
||||
Supported: yes
|
||||
Active: yes
|
||||
|
||||
If there is no delegation from the BIOS to the operating system, the ``linux_cpu``
|
||||
module may fail to function correctly. To enable power-metrics, it is necessary
|
||||
to remove the ``linux_cpu`` module. In this scenario, the performance metrics
|
||||
generated by the ``linux_cpu`` module will not be available.
|
||||
|
||||
Add input plugins
|
||||
Example:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
sysadmin@compute-0:~$ cpupower frequency-info
|
||||
analyzing CPU 0:
|
||||
no or unknown cpufreq driver is active on this CPU
|
||||
CPUs which run at the same hardware frequency: Not Available
|
||||
CPUs which need to have their frequency coordinated by software: Not Available
|
||||
maximum transition latency: Cannot determine or is not supported.
|
||||
Not Available
|
||||
available cpufreq governors: Not Available
|
||||
Unable to determine current policy
|
||||
current CPU frequency: Unable to call hardware
|
||||
current CPU frequency: Unable to call to kernel
|
||||
boost state support:
|
||||
Supported: yes
|
||||
Active: yes
|
||||
|
||||
Intel Power Stat Configuration Behavior
|
||||
---------------------------------------
|
||||
|
||||
This section describes the expected behavior of the
``[[inputs.intel_powerstat]]`` configuration in different scenarios:
|
||||
|
||||
- Empty configuration
|
||||
|
||||
  When the ``platform_metrics`` parameter is set to an empty array, as shown
  below, no metrics are returned.
|
||||
|
||||
[[inputs.intel_powerstat]]
|
||||
platform_metrics = []
|
||||
|
||||
- Default configuration
|
||||
|
||||
  With the default configuration, or when the ``[[inputs.intel_powerstat]]``
  input is used without specifying ``platform_metrics``, only the following
  metrics are enabled:
|
||||
|
||||
current_power_consumption
|
||||
current_dram_power_consumption
|
||||
thermal_design_power
|
||||
|
||||
This default behavior ensures that only the essential power consumption metrics
|
||||
are collected.
|
||||
|
||||
- Specific platform metrics
|
||||
|
||||
  If specific metrics are listed in the ``platform_metrics`` parameter, as shown
  below, only those metrics are returned.
|
||||
|
||||
[[inputs.intel_powerstat]]
|
||||
platform_metrics = ["cpu_base_frequency", ...]
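
In the Helm override format used elsewhere in this document, the same
restriction might be written as in the sketch below. This assumes that keys
under ``inputs`` are rendered directly into the matching
``[[inputs.intel_powerstat]]`` options, which this section does not state
explicitly. Only the ``intel_powerstat`` entry is shown, with a single metric
as an example.

.. code-block:: none

    config:
      inputs:
        - intel_powerstat:
            # Assumed YAML form of: platform_metrics = ["cpu_base_frequency"]
            platform_metrics:
              - "cpu_base_frequency"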
|
||||
|
||||
Add Input Plugins
|
||||
-----------------
|
||||
|
||||
You can add new plugins overriding the plugins column.
|
||||
You can add new plugins by overriding the inputs parameter.
|
||||
|
||||
#. Add the cgroups plugin.
|
||||
Example of overriding the powerstat plugin:
|
||||
|
||||
#. Add the ``cpu_c3_state_residency`` metric to the ``intel_powerstat/cpu_metrics`` plugin.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-cgroups.yaml
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
|
||||
config:
|
||||
inputs:
|
||||
- cgroup:
|
||||
paths: ["/sys/fs/cgroup/cpu","/sys/fs/cgroup/cpu/*","/sys/fs/cgroup/cpu/*/*",]
|
||||
files: ["cpuacct.usage", "cpuacct.usage_percpu", "cpu.cfs_period_us", "cpu.cfs_quota_us", "cpu.shares", "cpu.stat"]
|
||||
# Default plugins to collect power-metrics data
|
||||
- intel_powerstat:
|
||||
cpu_metrics:
|
||||
- "cpu_frequency"
|
||||
- "cpu_busy_frequency"
|
||||
- "cpu_temperature"
|
||||
- "cpu_c0_state_residency"
|
||||
- "cpu_c1_state_residency"
|
||||
- "cpu_c3_state_residency"
|
||||
- "cpu_c6_state_residency"
|
||||
- "cpu_busy_cycles"
|
||||
package_metrics:
|
||||
- "current_power_consumption"
|
||||
- "current_dram_power_consumption"
|
||||
- "thermal_design_power"
|
||||
- "cpu_base_frequency"
|
||||
- "uncore_frequency"
|
||||
- intel_pmu:
|
||||
event_definitions:
|
||||
- "/etc/telegraf/events_definition.json"
|
||||
core_events:
|
||||
- events:
|
||||
- INST_RETIRED.ANY
|
||||
- linux_cpu:
|
||||
metrics: ["cpufreq"]
|
||||
|
||||
#. Apply the override.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
system helm-override-update power-metrics telegraf power-metrics --values /path/to/file.yaml
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-cgroups.yaml
|
||||
+----------------+--------------------------------+
|
||||
| Property | Value |
|
||||
+----------------+--------------------------------+
|
||||
| name | telegraf |
|
||||
| namespace | power-metrics |
|
||||
| user_overrides | config: |
|
||||
| | inputs: |
|
||||
| | - cgroup: |
|
||||
| | files: |
|
||||
| | - cpuacct.usage |
|
||||
| | - cpuacct.usage_percpu |
|
||||
| | - cpu.cfs_period_us |
|
||||
| | - cpu.cfs_quota_us |
|
||||
| | - cpu.shares |
|
||||
| | - cpu.stat |
|
||||
| | paths: |
|
||||
| | - /sys/fs/cgroup/cpu |
|
||||
| | - /sys/fs/cgroup/cpu/* |
|
||||
| | - /sys/fs/cgroup/cpu/*/* |
|
||||
| | |
|
||||
+----------------+--------------------------------+
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
|
||||
|
||||
+----------------+------------------------------------------------+
|
||||
| Property | Value |
|
||||
+----------------+------------------------------------------------+
|
||||
| name | telegraf |
|
||||
| namespace | power-metrics |
|
||||
| user_overrides | config: |
|
||||
| | inputs: |
|
||||
| | - intel_powerstat: |
|
||||
| | cpu_metrics: |
|
||||
| | - cpu_frequency |
|
||||
| | - cpu_busy_frequency |
|
||||
| | - cpu_temperature |
|
||||
| | - cpu_c0_state_residency |
|
||||
| | - cpu_c1_state_residency |
|
||||
| | - cpu_c3_state_residency |
|
||||
| | - cpu_c6_state_residency |
|
||||
| | - cpu_busy_cycles |
|
||||
| | package_metrics: |
|
||||
| | - current_power_consumption |
|
||||
| | - current_dram_power_consumption |
|
||||
| | - thermal_design_power |
|
||||
| | - cpu_base_frequency |
|
||||
| | - uncore_frequency |
|
||||
| | - intel_pmu: |
|
||||
| | event_definitions: |
|
||||
| | - "/etc/telegraf/events_definition.json" |
|
||||
| | core_events: |
|
||||
| | - events: |
|
||||
| | - INST_RETIRED.ANY |
|
||||
| | - linux_cpu: |
|
||||
| | metrics: ["cpufreq"] |
|
||||
| | |
|
||||
+----------------+------------------------------------------------+
|
||||
|
||||
#. Re-apply the application.
|
||||
|
||||
@@ -247,57 +412,79 @@ You can add new plugins overriding the plugins column.
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
|
||||
|
||||
#. If required, add configmap and volumes via override.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
    volumes:
      - name: telegraf-example
        configMap:
          name: telegraf-example
    mountPoints:
      - name: telegraf-example
        mountPath: /path/to/file.json
        subPath: file.json
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
system helm-override-update power-metrics telegraf power-metrics --values /path/to/file.yaml
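
The ``telegraf-example`` ConfigMap referenced by the volume above is expected
to exist before the application is re-applied. A minimal sketch of such a
manifest is shown below, assuming it is created in the ``power-metrics``
namespace used by the application. The ``file.json`` contents are a
hypothetical placeholder, not a real Telegraf file.

.. code-block:: none

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: telegraf-example
      namespace: power-metrics
    data:
      # The key matches the subPath used in mountPoints.
      file.json: |
        {"example": "placeholder"}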
|
||||
|
||||
For more information on Telegraf plugins, see
|
||||
https://github.com/influxdata/telegraf#documentation.
|
||||
|
||||
|
||||
Remove input plugins
|
||||
Remove Input Plugins
|
||||
--------------------
|
||||
|
||||
You can remove plugins by setting their value to false in the plugins column.
|
||||
You can remove plugins by overriding the inputs parameter.
|
||||
|
||||
#. Remove the cgroups plugin.
|
||||
#. Remove the ``linux_cpu`` plugin.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-cgroups.yaml
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
|
||||
config:
|
||||
inputs:
|
||||
- cgroup: false
|
||||
# Default plugins to collect power-metrics data
|
||||
- intel_powerstat:
|
||||
cpu_metrics:
|
||||
- "cpu_frequency"
|
||||
- "cpu_busy_frequency"
|
||||
- "cpu_temperature"
|
||||
- "cpu_c0_state_residency"
|
||||
- "cpu_c1_state_residency"
|
||||
- "cpu_c3_state_residency"
|
||||
- "cpu_c6_state_residency"
|
||||
- "cpu_busy_cycles"
|
||||
package_metrics:
|
||||
- "current_power_consumption"
|
||||
- "current_dram_power_consumption"
|
||||
- "thermal_design_power"
|
||||
- "cpu_base_frequency"
|
||||
- "uncore_frequency"
|
||||
- intel_pmu:
|
||||
event_definitions:
|
||||
- "/etc/telegraf/events_definition.json"
|
||||
core_events:
|
||||
- events:
|
||||
- INST_RETIRED.ANY
|
||||
|
||||
#. Apply the override.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values ./telegraf-cgroups.yaml
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
|
||||
|
||||
+----------------+-------------------+
|
||||
| Property | Value |
|
||||
+----------------+-------------------+
|
||||
| name | telegraf |
|
||||
| namespace | power-metrics |
|
||||
| user_overrides | config: |
|
||||
| | inputs: |
|
||||
| | - cgroup: false |
|
||||
| | |
|
||||
+----------------+-------------------+
|
||||
+----------------+------------------------------------------------+
|
||||
| Property | Value |
|
||||
+----------------+------------------------------------------------+
|
||||
| name | telegraf |
|
||||
| namespace | power-metrics |
|
||||
| user_overrides | config: |
|
||||
| | inputs: |
|
||||
| | - intel_powerstat: |
|
||||
| | cpu_metrics: |
|
||||
| | - cpu_frequency |
|
||||
| | - cpu_busy_frequency |
|
||||
| | - cpu_temperature |
|
||||
| | - cpu_c0_state_residency |
|
||||
| | - cpu_c1_state_residency |
|
||||
| | - cpu_c3_state_residency |
|
||||
| | - cpu_c6_state_residency |
|
||||
| | - cpu_busy_cycles |
|
||||
| | package_metrics: |
|
||||
| | - current_power_consumption |
|
||||
| | - current_dram_power_consumption |
|
||||
| | - thermal_design_power |
|
||||
| | - cpu_base_frequency |
|
||||
| | - uncore_frequency |
|
||||
| | - intel_pmu: |
|
||||
| | event_definitions: |
|
||||
| | - "/etc/telegraf/events_definition.json" |
|
||||
| | core_events: |
|
||||
| | - events: |
|
||||
| | - INST_RETIRED.ANY |
|
||||
| | |
|
||||
+----------------+------------------------------------------------+
|
||||
|
||||
#. Re-apply the application.
|
||||
|
||||
@@ -305,7 +492,7 @@ You can remove plugins by setting their value to false in the plugins column.
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
|
||||
|
||||
Modify Telegraf data collection interval
|
||||
Modify Telegraf Data Collection Interval
|
||||
----------------------------------------
|
||||
|
||||
Telegraf reports its metrics every 10 seconds, but you can modify this time
|
||||
@@ -319,7 +506,30 @@ interval with the following command:
|
||||
cAdvisor
|
||||
--------
|
||||
|
||||
Enable and disable Perf Events on cAdvisor
|
||||
Enable or Disable cAdvisor
|
||||
--------------------------
|
||||
|
||||
To enable or disable cAdvisor, use the following command:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics cadvisor power-metrics --set cadvisor_enabled=true
|
||||
+----------------+------------------------+
|
||||
| Property | Value |
|
||||
+----------------+------------------------+
|
||||
| name | cadvisor |
|
||||
| namespace | power-metrics |
|
||||
| user_overrides | cadvisor_enabled: true |
|
||||
| | |
|
||||
+----------------+------------------------+
|
||||
|
||||
Reapply the power-metrics application and wait for the pod to restart.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
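
The same override can also be kept in a values file and passed with
``--values``, as the Telegraf examples in this document do. The sketch below
assumes that setting ``cadvisor_enabled`` to ``false`` disables cAdvisor,
mirroring the ``true`` value shown above. The file name
``cadvisor-override.yaml`` is hypothetical.

.. code-block:: none

    # cadvisor-override.yaml (hypothetical file name)
    cadvisor_enabled: false

It would be applied with
``system helm-override-update power-metrics cadvisor power-metrics --values cadvisor-override.yaml``,
followed by re-applying the power-metrics application.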
|
||||
|
||||
Enable and Disable Perf Events on cAdvisor
|
||||
------------------------------------------
|
||||
|
||||
To enable or disable Perf Events on cAdvisor, use the following command:
|
||||
|