Merge "Follow-up to fix for power action failure"
commit a1cbfa5be0
@@ -48,8 +48,8 @@ IPMI configuration
 ~~~~~~~~~~~~~~~~~~
 
 If there are slow or unresponsive BMCs in the environment, the
-``retry_timeout`` configuration option in the ``[ipmi]`` section may need
-to be lowered. The default is fairly conservative, as setting this timeout
+``command_retry_timeout`` configuration option in the ``[ipmi]`` section may
+need to be lowered. The default is fairly conservative, as setting this timeout
 too low can cause older BMCs to crash and require a hard-reset.
 
 Collecting sensor data

@@ -1140,9 +1140,10 @@
 # Minimum value: 1
 #soft_power_off_timeout = 600
 
-# Number of seconds to wait for power operations to complete
-# on the baremetal node before declaring the power operation
-# has failed (integer value)
+# Number of seconds to wait for power operations to complete,
+# i.e., so that a baremetal node is in the desired power
+# state. If timed out, the power operation is considered a
+# failure. (integer value)
 # Minimum value: 2
 #power_state_change_timeout = 30
 
@@ -1900,13 +1901,14 @@
 # From ironic
 #
 
-# Maximum time in seconds to retry, retryable IPMI operations.
-# For example if the requested action fails because the BMC is
-# busy. There is a tradeoff when setting this value. Setting
-# this too low may cause older BMCs to crash and require a
-# hard reset. However, setting too high can cause the sync
-# power state periodic task to hang when there are slow or
-# unresponsive BMCs. (integer value)
+# Maximum time in seconds to retry retryable IPMI operations.
+# (An operation is retryable, for example, if the requested
+# operation fails because the BMC is busy.) There is a
+# tradeoff when setting this value. Setting this too low may
+# cause older BMCs to crash and require a hard reset. However,
+# setting too high can cause the sync power state periodic
+# task to hang when there are slow or unresponsive BMCs.
+# (integer value)
 #command_retry_timeout = 60
 
 # DEPRECATED: Maximum time in seconds to retry IPMI
@@ -1917,10 +1919,11 @@
 # slow or unresponsive BMCs. (integer value)
 # This option is deprecated for removal.
 # Its value may be silently ignored in the future.
-# Reason: Option ipmi.command_retry_timeout should be used to
-# define IPMI command retries and option
-# conductor.power_state_change_timeout should be use to define
-# timeout value for waiting for power operations to complete
+# Reason: Use option [ipmi]/command_retry_timeout to specify
+# the timeout value for IPMI command retries, and use option
+# [conductor]/power_state_change_timeout to specify the
+# timeout value for waiting for a power operation to complete
+# so that a baremetal node reaches the desired power state.
 #retry_timeout = <None>
 
 # Minimum time, in seconds, between IPMI operations sent to a

@@ -70,6 +70,15 @@ def node_set_boot_device(task, device, persistent=False):
 
 
 def node_wait_for_power_state(task, new_state, timeout=None):
+    """Wait for node to be in new power state.
+
+    :param task: a TaskManager instance.
+    :param new_state: the desired new power state, one of the power states
+        in :mod:`ironic.common.states`.
+    :param timeout: number of seconds to wait before giving up. If not
+        specified, uses the conductor.power_state_change_timeout config value.
+    :raises: PowerStateFailure if timed out
+    """
     retry_timeout = (timeout or CONF.conductor.power_state_change_timeout)
 
     def _wait():

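Note on the hunk above: the new docstring describes a wait that gives up after `timeout` seconds, falling back to `[conductor]/power_state_change_timeout` when no timeout is passed. The body of `_wait()` is not shown in this diff, so the following is only a rough, standalone sketch of that behaviour and not ironic's implementation. The names `wait_for_power_state`, `get_power_state`, `interval`, `sleep` and `clock` are hypothetical, and `PowerStateFailure` is a local stand-in for the exception named in the docstring.

```python
import time


class PowerStateFailure(Exception):
    """Local stand-in for the exception named in the docstring (assumption)."""


# Mirrors the documented default of [conductor]/power_state_change_timeout.
DEFAULT_POWER_STATE_CHANGE_TIMEOUT = 30


def wait_for_power_state(get_power_state, new_state, timeout=None,
                         interval=1.0, sleep=time.sleep,
                         clock=time.monotonic):
    """Poll get_power_state() until it returns new_state or time runs out."""
    # Same fallback shape as the diff: an explicit timeout wins, otherwise
    # the configured default applies. Note that `or` also treats 0 as unset.
    retry_timeout = timeout or DEFAULT_POWER_STATE_CHANGE_TIMEOUT
    deadline = clock() + retry_timeout
    while True:
        state = get_power_state()
        if state == new_state:
            return state
        if clock() >= deadline:
            raise PowerStateFailure(
                "node did not reach %r within %s seconds"
                % (new_state, retry_timeout))
        sleep(interval)
```

Passing `sleep` and `clock` as parameters is purely a convenience for exercising the sketch without real delays.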
@@ -148,8 +148,9 @@ opts = [
     cfg.IntOpt('power_state_change_timeout',
                min=2, default=30,
                help=_('Number of seconds to wait for power operations to '
-                      'complete on the baremetal node before declaring the '
-                      'power operation has failed')),
+                      'complete, i.e., so that a baremetal node is in the '
+                      'desired power state. If timed out, the power operation '
+                      'is considered a failure.')),
 ]
 
 

@@ -22,9 +22,10 @@ from ironic.common.i18n import _
 opts = [
     cfg.IntOpt('command_retry_timeout',
                default=60,
-               help=_('Maximum time in seconds to retry, retryable IPMI '
-                      'operations. For example if the requested action fails '
-                      'because the BMC is busy. There is a tradeoff when '
+               help=_('Maximum time in seconds to retry retryable IPMI '
+                      'operations. (An operation is retryable, for '
+                      'example, if the requested operation fails '
+                      'because the BMC is busy.) There is a tradeoff when '
                       'setting this value. Setting this too low may cause '
                       'older BMCs to crash and require a hard reset. However, '
                       'setting too high can cause the sync power state '
@@ -38,13 +39,14 @@ opts = [
                       'sync power state periodic task to hang when there are '
                       'slow or unresponsive BMCs.'),
                deprecated_for_removal=True,
-               deprecated_reason=_('Option ipmi.command_retry_timeout should '
-                                   'be used to define IPMI command retries '
-                                   'and option '
-                                   'conductor.power_state_change_timeout '
-                                   'should be use to define timeout value for '
-                                   'waiting for power operations to '
-                                   'complete')),
+               deprecated_reason=_('Use option [ipmi]/command_retry_timeout '
+                                   'to specify the timeout value for '
+                                   'IPMI command retries, and use option '
+                                   '[conductor]/power_state_change_timeout to '
+                                   'specify the timeout value for waiting for '
+                                   'a power operation to complete so that a '
+                                   'baremetal node reaches the desired '
+                                   'power state.')),
     cfg.IntOpt('min_command_interval',
                default=5,
                help=_('Minimum time, in seconds, between IPMI operations '

@@ -396,7 +396,7 @@ def _exec_ipmitool(driver_info, command, check_exit_code=None):
             args.append(option)
             args.append(driver_info[name])
 
-    # TODO(sambetts) Remove useage of ipmi.retry_timeout in Queens
+    # TODO(sambetts) Remove usage of ipmi.retry_timeout in Queens
     timeout = CONF.ipmi.retry_timeout or CONF.ipmi.command_retry_timeout
 
     # specify retry timing more precisely, if supported
@@ -473,7 +473,7 @@ def _set_and_wait(task, power_action, driver_info, timeout=None):
     :returns: one of ironic.common.states
 
     """
-    # TODO(sambetts) Remove useage of ipmi.retry_timeout in Queens
+    # TODO(sambetts) Remove usage of ipmi.retry_timeout in Queens
     default_timeout = CONF.ipmi.retry_timeout
 
     if power_action == states.POWER_ON:

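Note on the two hunks above: while the deprecated `[ipmi]/retry_timeout` option is still honoured, `_exec_ipmitool` resolves its timeout as `CONF.ipmi.retry_timeout or CONF.ipmi.command_retry_timeout`. Since the sample config shows `retry_timeout` defaulting to `<None>`, an unset deprecated option falls through to the new one. A minimal standalone sketch of that precedence, using a hypothetical helper `resolve_ipmi_timeout` and plain arguments in place of the real `CONF` object:

```python
def resolve_ipmi_timeout(retry_timeout, command_retry_timeout=60):
    """Return the deprecated value when it is set, else the new option.

    Mirrors the expression in the diff:
        CONF.ipmi.retry_timeout or CONF.ipmi.command_retry_timeout
    The default of 60 matches the documented command_retry_timeout default.
    """
    # `or` falls through on any falsy value, so both None (unset) and 0
    # select command_retry_timeout.
    return retry_timeout or command_retry_timeout
```

For example, `resolve_ipmi_timeout(None)` yields the new option's default, while `resolve_ipmi_timeout(120)` keeps an operator's existing deprecated setting in effect.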
@@ -1,14 +1,21 @@
 ---
 deprecations:
   - |
-    Configuration option IPMI.retry_timeout is deprecated in favor of new
-    options IPMI.command_retry_timeout, and
-    CONDUCTOR.power_state_change_timeout
+    Configuration option ``[ipmi]/retry_timeout`` is deprecated in favor of
+    these new options:
+
+    * ``[ipmi]/command_retry_timeout``: timeout value to wait for an IPMI
+      command to complete (be acknowledged by the baremetal node)
+    * ``[conductor]/power_state_change_timeout``: timeout value to wait for
+      a power operation to complete, so that the baremetal node is in the
+      desired new power state
 fixes:
   - |
-    Prevents the IPMI driver from needlessly checking status if the power
-    change action fails. Additionally stop retrying power actions and power
-    status polls if we receive a non-retryable error from ipmitool.
-    https//bugs.launchpad.net/ironic/+bug/1675529. New configuration option
-    `power_state_change_timeout` added to define how many seconds to wait for a
-    server to change power state when a power action is requested.
+    Prevents the IPMI driver from needlessly checking status of the baremetal
+    node if a power change action fails. Additionally, stops retrying power
+    actions and power status polls on receipt of a non-retryable error from
+    ipmitool. For more information, see
+    https//bugs.launchpad.net/ironic/+bug/1675529. A new configuration option
+    ``[conductor]/power_state_change_timeout`` can be used to specify how many
+    seconds to wait for a baremetal node to change power state when a power
+    action is requested.