diff --git a/doc/source/node_management/kubernetes/install-power-metrics-application-a12de3db7478.rst b/doc/source/node_management/kubernetes/install-power-metrics-application-a12de3db7478.rst
index 9f7d42331..5c998ce3b 100644
--- a/doc/source/node_management/kubernetes/install-power-metrics-application-a12de3db7478.rst
+++ b/doc/source/node_management/kubernetes/install-power-metrics-application-a12de3db7478.rst
@@ -54,7 +54,7 @@ Uncore events can only be loaded from the following cpu models:
    * - 0xCF
      - Intel Emerald Rapids X
 
-Source: https://github.com/influxdata/telegraf/issues/13098#issuecomment-1512585422
+Source: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#supported-cpu-models
 
 .. rubric:: |proc|
 
@@ -126,120 +126,285 @@ You can activate the Intel |PMU| plugin with the following command:
     |                |                   |
     +----------------+-------------------+
 
+Override Input Plugins
+----------------------
 
-Override intel_powerstat plugin
--------------------------------
+You can change the default input plugins parameters by override.
-
-You can change the default ``intel_powerstat`` plugin parameters by override.
-
-The plugin parameters include CPU and package metrics, and also the read method
-of |MSR|.
+The default plugin parameters include CPU and package metrics.
 
 The list of available options for both CPU and package metrics can be found on
 the powerstat documentation:
 https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#configuration
 
-It is worth noting that when overriding, the user must inform both metrics
-parameters (cpu and package), otherwise the plugin would stop collecting the
-missing metrics.
+.. note::
 
-The ``read_method`` parameter specifies the reading method of |MSR|. This
-parameter accepts two values, concurrent or sequential. The default is
-concurrent. Concurrent method uses goroutines to read each |MSR| value
-concurrently.
-
-The sequential method reads each value sequentially. This reduces latency
-overhead when using preempt-rt kernel with isolated cores, but might cause loss
-of precision on metrics calculation.
+    When overriding, you must inform both metrics parameters (CPU and package),
+    otherwise the plugin would stop collecting the missing metrics.
 
 Example of overriding the powerstat plugin:
 
-.. code-block:: none
+.. rubric:: |proc|
 
-    [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
-    config:
-      inputs:
-        intel_powerstat:
-          read_method: "sequential"
-          cpu_metrics: ["cpu_frequency","cpu_busy_frequency","cpu_temperature","cpu_c0_state_residency","cpu_c1_state_residency","cpu_c6_state_residency","cpu_busy_cycles"]
-          package_metrics: ["current_power_consumption","current_dram_power_consumption","thermal_design_power","cpu_base_frequency"]
+#. Update the input parameters.
 
-    [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
+   .. code-block:: none
 
-    +----------------+--------------------------------------+
-    | Property       | Value                                |
-    +----------------+--------------------------------------+
-    | name           | telegraf                             |
-    | namespace      | power-metrics                        |
-    | user_overrides | config:                              |
-    |                |   intel_powerstat:                   |
-    |                |     cpu_metrics:                     |
-    |                |     - cpu_frequency                  |
-    |                |     - cpu_busy_frequency             |
-    |                |     - cpu_temperature                |
-    |                |     - cpu_c0_state_residency         |
-    |                |     - cpu_c1_state_residency         |
-    |                |     - cpu_c6_state_residency         |
-    |                |     - cpu_busy_cycles                |
-    |                |     package_metrics:                 |
-    |                |     - current_power_consumption      |
-    |                |     - current_dram_power_consumption |
-    |                |     - thermal_design_power           |
-    |                |     - cpu_base_frequency             |
-    |                |     read_method: sequential          |
-    |                |                                      |
-    +----------------+--------------------------------------+
+       [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
+       config:
+         inputs:
+           # Default plugins to collect power-metrics data
+           - intel_powerstat:
+               cpu_metrics:
+                 - "cpu_frequency"
+                 - "cpu_busy_frequency"
+                 - "cpu_temperature"
+                 - "cpu_c0_state_residency"
+                 - "cpu_c1_state_residency"
+                 - "cpu_c6_state_residency"
+                 - "cpu_busy_cycles"
+               package_metrics:
+                 - "current_power_consumption"
+                 - "current_dram_power_consumption"
+                 - "thermal_design_power"
+                 - "cpu_base_frequency"
+                 - "uncore_frequency"
+           - intel_pmu:
+               event_definitions:
+                 - "/etc/telegraf/events_definition.json"
+               core_events:
+                 - events:
+                     - INST_RETIRED.ANY
+           - linux_cpu:
+               metrics: ["cpufreq"]
 
-Re-apply the app.
+#. Apply the override.
+
+   .. code-block:: none
+
+       [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
+
+       +----------------+------------------------------------------------+
+       | Property       | Value                                          |
+       +----------------+------------------------------------------------+
+       | name           | telegraf                                       |
+       | namespace      | power-metrics                                  |
+       | user_overrides | config:                                        |
+       |                |   inputs:                                      |
+       |                |   - intel_powerstat:                           |
+       |                |       cpu_metrics:                             |
+       |                |       - cpu_frequency                          |
+       |                |       - cpu_busy_frequency                     |
+       |                |       - cpu_temperature                        |
+       |                |       - cpu_c0_state_residency                 |
+       |                |       - cpu_c1_state_residency                 |
+       |                |       - cpu_c6_state_residency                 |
+       |                |       - cpu_busy_cycles                        |
+       |                |       package_metrics:                         |
+       |                |       - current_power_consumption              |
+       |                |       - current_dram_power_consumption         |
+       |                |       - thermal_design_power                   |
+       |                |       - cpu_base_frequency                     |
+       |                |       - uncore_frequency                       |
+       |                |   - intel_pmu:                                 |
+       |                |       event_definitions:                       |
+       |                |       - "/etc/telegraf/events_definition.json" |
+       |                |       core_events:                             |
+       |                |       - events:                                |
+       |                |         - INST_RETIRED.ANY                     |
+       |                |   - linux_cpu:                                 |
+       |                |       metrics: ["cpufreq"]                     |
+       |                |                                                |
+       +----------------+------------------------------------------------+
+
+#. Re-apply the application.
+
+   .. code-block:: none
+
+       [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
+
+.. note::
+
+    Power Metrics may increase the scheduling latency due to perf and |MSR|
+    readings. It was observed that there was a latency impact of around 3 µs on
+    average, plus spikes with significant increases in maximum latency values.
+    There was also an impact on the kernel processing time. Applications that
+    run with priorities at or above 50 in real time kernel isolated CPUs should
+    allow kernel services to avoid unexpected system behavior.
+
+
+Configuration Requirement for Power Metrics and linux_cpu
+---------------------------------------------------------
+
+If the BIOS is not configured to delegate control to the operating system, the
+``linux_cpu`` metrics may not function as expected. Remove ``linux_cpu`` to ensure that
+power-metrics operate correctly. In this case, metrics generated by ``linux_cpu``
+will not be available.
+
+To verify that the BIOS is properly configured, a frequency driver should be
+loaded in Linux. You can check this by running the :command:`cpupower frequency-info` command.
+
+Example:
 
 .. code-block:: none
 
-    [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
+    sysadmin@controller-0:~$ cpupower frequency-info
+    analyzing CPU 0:
+      driver: intel_pstate
+      CPUs which run at the same hardware frequency: 0
+      CPUs which need to have their frequency coordinated by software: 0
+      maximum transition latency: Cannot determine or is not supported.
+      hardware limits: 800 MHz - 3.60 GHz
+      available cpufreq governors: performance powersave
+      current policy: frequency should be within 800 MHz and 2.50 GHz.
+                      The governor "performance" may decide which speed to use
+                      within this range.
+      current CPU frequency: Unable to call hardware
+      current CPU frequency: 2.50 GHz (asserted by call to kernel)
+      boost state support:
+        Supported: yes
+        Active: yes
 
+If there is no delegation from the BIOS to the operating system, the ``linux_cpu``
+module may fail to function correctly. To enable power-metrics, it is necessary
+to remove the ``linux_cpu`` module. In this scenario, the performance metrics
+generated by the ``linux_cpu`` module will not be available.
 
-Add input plugins
+Example:
+
+.. code-block:: none
+
+    sysadmin@compute-0:~$ cpupower frequency-info
+    analyzing CPU 0:
+      no or unknown cpufreq driver is active on this CPU
+      CPUs which run at the same hardware frequency: Not Available
+      CPUs which need to have their frequency coordinated by software: Not Available
+      maximum transition latency: Cannot determine or is not supported.
+      Not Available
+      available cpufreq governors: Not Available
+      Unable to determine current policy
+      current CPU frequency: Unable to call hardware
+      current CPU frequency: Unable to call to kernel
+      boost state support:
+        Supported: yes
+        Active: yes
+
+Intel Power Stat Configuration Behavior
+---------------------------------------
+
+This section describes the expected behavior for the [[inputs.intel_powerstat]]
+configuration for different configuration scenarios:
+
+- Empty configuration
+
+  When the ``platform_metrics`` parameter is set to an empty array, as shown
+  below, all the metrics should be restricted from being returned. This means, no
+  metrics will be provided in this configuration.
+
+    [[inputs.intel_powerstat]]
+      platform_metrics = []
+
+- Default configuration
+
+  With either the default configuration or when the [[inputs.intel_powerstat]]
+  input is used without specifying platform_metrics, only the following metrics
+  should be enabled:
+
+    current_power_consumption
+    current_dram_power_consumption
+    thermal_design_power
+
+  This default behavior ensures that only the essential power consumption metrics
+  are collected.
+
+- Specific platform metrics
+
+  If specific metrics are enabled using the following ``platform_metrics``
+  parameter, only the metrics specified in the ``platform_metrics`` array will be
+  returned. No other metrics will be included beyond the explicitly listed ones.
+
+    [[inputs.intel_powerstat]]
+      platform_metrics = ["cpu_base_frequency", ...]
+
+Add Input Plugins
 -----------------
 
-You can add new plugins overriding the plugins column.
+You can add new plugins by overriding the inputs parameter.
 
-#. Add the cgroups plugin.
+Example of overriding the powerstat plugin:
+
+#. Add the ``cpu_c3_state_residency`` metric to the ``intel_powerstat/cpu_metrics`` plugin.
 
    .. code-block:: none
 
-       [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-cgroups.yaml
+       [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
        config:
          inputs:
-           - cgroup:
-               paths: ["/sys/fs/cgroup/cpu","/sys/fs/cgroup/cpu/*","/sys/fs/cgroup/cpu/*/*",]
-               files: ["cpuacct.usage", "cpuacct.usage_percpu", "cpu.cfs_period_us", "cpu.cfs_quota_us", "cpu.shares", "cpu.stat"]
+           # Default plugins to collect power-metrics data
+           - intel_powerstat:
+               cpu_metrics:
+                 - "cpu_frequency"
+                 - "cpu_busy_frequency"
+                 - "cpu_temperature"
+                 - "cpu_c0_state_residency"
+                 - "cpu_c1_state_residency"
+                 - "cpu_c3_state_residency"
+                 - "cpu_c6_state_residency"
+                 - "cpu_busy_cycles"
+               package_metrics:
+                 - "current_power_consumption"
+                 - "current_dram_power_consumption"
+                 - "thermal_design_power"
+                 - "cpu_base_frequency"
+                 - "uncore_frequency"
+           - intel_pmu:
+               event_definitions:
+                 - "/etc/telegraf/events_definition.json"
+               core_events:
+                 - events:
+                     - INST_RETIRED.ANY
+           - linux_cpu:
+               metrics: ["cpufreq"]
 
 #. Apply the override.
 
    .. code-block:: none
 
-       system helm-override-update power-metrics telegraf power-metrics --values /path/to/file.yaml
-       [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-cgroups.yaml
-       +----------------+--------------------------------+
-       | Property       | Value                          |
-       +----------------+--------------------------------+
-       | name           | telegraf                       |
-       | namespace      | power-metrics                  |
-       | user_overrides | config:                        |
-       |                |   inputs:                      |
-       |                |   - cgroup:                    |
-       |                |       files:                   |
-       |                |       - cpuacct.usage          |
-       |                |       - cpuacct.usage_percpu   |
-       |                |       - cpu.cfs_period_us      |
-       |                |       - cpu.cfs_quota_us       |
-       |                |       - cpu.shares             |
-       |                |       - cpu.stat               |
-       |                |       paths:                   |
-       |                |       - /sys/fs/cgroup/cpu     |
-       |                |       - /sys/fs/cgroup/cpu/*   |
-       |                |       - /sys/fs/cgroup/cpu/*/* |
-       |                |                                |
-       +----------------+--------------------------------+
+       [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
+
+       +----------------+------------------------------------------------+
+       | Property       | Value                                          |
+       +----------------+------------------------------------------------+
+       | name           | telegraf                                       |
+       | namespace      | power-metrics                                  |
+       | user_overrides | config:                                        |
+       |                |   inputs:                                      |
+       |                |   - intel_powerstat:                           |
+       |                |       cpu_metrics:                             |
+       |                |       - cpu_frequency                          |
+       |                |       - cpu_busy_frequency                     |
+       |                |       - cpu_temperature                        |
+       |                |       - cpu_c0_state_residency                 |
+       |                |       - cpu_c1_state_residency                 |
+       |                |       - cpu_c3_state_residency                 |
+       |                |       - cpu_c6_state_residency                 |
+       |                |       - cpu_busy_cycles                        |
+       |                |       package_metrics:                         |
+       |                |       - current_power_consumption              |
+       |                |       - current_dram_power_consumption         |
+       |                |       - thermal_design_power                   |
+       |                |       - cpu_base_frequency                     |
+       |                |       - uncore_frequency                       |
+       |                |   - intel_pmu:                                 |
+       |                |       event_definitions:                       |
+       |                |       - "/etc/telegraf/events_definition.json" |
+       |                |       core_events:                             |
+       |                |       - events:                                |
+       |                |         - INST_RETIRED.ANY                     |
+       |                |   - linux_cpu:                                 |
+       |                |       metrics: ["cpufreq"]                     |
+       |                |                                                |
+       +----------------+------------------------------------------------+
 
 #. Re-apply the application.
 
@@ -247,57 +412,79 @@ You can add new plugins overriding the plugins column.
 
        [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
 
-#. If required, add configmap and volumes via override.
-
-   .. code-block:: none
-
-       volumes:
-         - name: telegraf-example
-           configMap:
-             name: telegraf-example
-       mountPoints:
-         - name: telegraf-example
-           mountPath: /path/to/file.json
-           subPath: file.json
-
-   .. code-block:: none
-
-       system helm-override-update power-metrics telegraf power-metrics --values /path/to/file.yaml
-
-For more information on Telegraf plugins, see
-https://github.com/influxdata/telegraf#documentation.
-
-
-Remove input plugins
+Remove Input Plugins
 --------------------
 
-You can remove plugins by setting their value to false in the plugins column.
+You can remove plugins by overriding the inputs parameter.
 
-#. Remove the cgroups plugin.
+#. Remove the ``linux_cpu`` plugin.
 
    .. code-block:: none
 
-       [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-cgroups.yaml
+       [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
        config:
          inputs:
-           - cgroup: false
+           # Default plugins to collect power-metrics data
+           - intel_powerstat:
+               cpu_metrics:
+                 - "cpu_frequency"
+                 - "cpu_busy_frequency"
+                 - "cpu_temperature"
+                 - "cpu_c0_state_residency"
+                 - "cpu_c1_state_residency"
+                 - "cpu_c3_state_residency"
+                 - "cpu_c6_state_residency"
+                 - "cpu_busy_cycles"
+               package_metrics:
+                 - "current_power_consumption"
+                 - "current_dram_power_consumption"
+                 - "thermal_design_power"
+                 - "cpu_base_frequency"
+                 - "uncore_frequency"
+           - intel_pmu:
+               event_definitions:
+                 - "/etc/telegraf/events_definition.json"
+               core_events:
+                 - events:
+                     - INST_RETIRED.ANY
 
 #. Apply the override.
 
    .. code-block:: none
 
-       [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values ./telegraf-cgroups.yaml
+       [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
 
-       +----------------+-------------------+
-       | Property       | Value             |
-       +----------------+-------------------+
-       | name           | telegraf          |
-       | namespace      | power-metrics     |
-       | user_overrides | config:           |
-       |                |   inputs:         |
-       |                |   - cgroup: false |
-       |                |                   |
-       +----------------+-------------------+
+       +----------------+------------------------------------------------+
+       | Property       | Value                                          |
+       +----------------+------------------------------------------------+
+       | name           | telegraf                                       |
+       | namespace      | power-metrics                                  |
+       | user_overrides | config:                                        |
+       |                |   inputs:                                      |
+       |                |   - intel_powerstat:                           |
+       |                |       cpu_metrics:                             |
+       |                |       - cpu_frequency                          |
+       |                |       - cpu_busy_frequency                     |
+       |                |       - cpu_temperature                        |
+       |                |       - cpu_c0_state_residency                 |
+       |                |       - cpu_c1_state_residency                 |
+       |                |       - cpu_c3_state_residency                 |
+       |                |       - cpu_c6_state_residency                 |
+       |                |       - cpu_busy_cycles                        |
+       |                |       package_metrics:                         |
+       |                |       - current_power_consumption              |
+       |                |       - current_dram_power_consumption         |
+       |                |       - thermal_design_power                   |
+       |                |       - cpu_base_frequency                     |
+       |                |       - uncore_frequency                       |
+       |                |   - intel_pmu:                                 |
+       |                |       event_definitions:                       |
+       |                |       - "/etc/telegraf/events_definition.json" |
+       |                |       core_events:                             |
+       |                |       - events:                                |
+       |                |         - INST_RETIRED.ANY                     |
+       |                |                                                |
+       +----------------+------------------------------------------------+
 
 #. Re-apply the application.
 
@@ -305,7 +492,7 @@ You can remove plugins by setting their value to false in the plugins column.
        [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
 
 
-Modify Telegraf data collection interval
+Modify Telegraf Data Collection Interval
 ----------------------------------------
 
 Telegraf report its metrics each 10 seconds, but you can modify this time
@@ -319,7 +506,30 @@ interval with the following command:
 cAdvisor
 --------
 
-Enable and disable Perf Events on cAdvisor
+Enable or Disable cAdvisor
+--------------------------
+
+To enable or disable cAdvisor, use the following command:
+
+.. code-block:: none
+
+    [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics cadvisor power-metrics --set cadvisor_enabled=true
+    +----------------+------------------------+
+    | Property       | Value                  |
+    +----------------+------------------------+
+    | name           | cadvisor               |
+    | namespace      | power-metrics          |
+    | user_overrides | cadvisor_enabled: true |
+    |                |                        |
+    +----------------+------------------------+
+
+Reapply the power-metrics application and wait for the pod to restart.
+
+.. code-block:: none
+
+    [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
+
+Enable and Disable Perf Events on cAdvisor
 ------------------------------------------
 
 To enable or disable Perf Events on cAdvisor, use the following command: