Follow-up to fix for power action failure

This is a follow-up patch to the patch so that the power status
is not retried if a power action fails:
ee5d4942a1c33736ffe05ec01619142be400c2f4

It addresses the comments as well as adds more clarification
and updates the documentation to refer to the new
[ipmi]command_retry_timeout config option.

Change-Id: Ib21544da260565ae399e2d07b32af9bd8b810280
Related-Bug: #1692895
This commit is contained in:
Ruby Loo 2017-07-11 12:43:08 -04:00
parent bc5efdf459
commit 578f01678c
7 changed files with 61 additions and 39 deletions

View File

@ -48,8 +48,8 @@ IPMI configuration
~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~
If there are slow or unresponsive BMCs in the environment, the If there are slow or unresponsive BMCs in the environment, the
``retry_timeout`` configuration option in the ``[ipmi]`` section may need ``command_retry_timeout`` configuration option in the ``[ipmi]`` section may
to be lowered. The default is fairly conservative, as setting this timeout need to be lowered. The default is fairly conservative, as setting this timeout
too low can cause older BMCs to crash and require a hard-reset. too low can cause older BMCs to crash and require a hard-reset.
Collecting sensor data Collecting sensor data

View File

@ -1140,9 +1140,10 @@
# Minimum value: 1 # Minimum value: 1
#soft_power_off_timeout = 600 #soft_power_off_timeout = 600
# Number of seconds to wait for power operations to complete # Number of seconds to wait for power operations to complete,
# on the baremetal node before declaring the power operation # i.e., so that a baremetal node is in the desired power
# has failed (integer value) # state. If timed out, the power operation is considered a
# failure. (integer value)
# Minimum value: 2 # Minimum value: 2
#power_state_change_timeout = 30 #power_state_change_timeout = 30
@ -1900,13 +1901,14 @@
# From ironic # From ironic
# #
# Maximum time in seconds to retry, retryable IPMI operations. # Maximum time in seconds to retry retryable IPMI operations.
# For example if the requested action fails because the BMC is # (An operation is retryable, for example, if the requested
# busy. There is a tradeoff when setting this value. Setting # operation fails because the BMC is busy.) There is a
# this too low may cause older BMCs to crash and require a # tradeoff when setting this value. Setting this too low may
# hard reset. However, setting too high can cause the sync # cause older BMCs to crash and require a hard reset. However,
# power state periodic task to hang when there are slow or # setting too high can cause the sync power state periodic
# unresponsive BMCs. (integer value) # task to hang when there are slow or unresponsive BMCs.
# (integer value)
#command_retry_timeout = 60 #command_retry_timeout = 60
# DEPRECATED: Maximum time in seconds to retry IPMI # DEPRECATED: Maximum time in seconds to retry IPMI
@ -1917,10 +1919,11 @@
# slow or unresponsive BMCs. (integer value) # slow or unresponsive BMCs. (integer value)
# This option is deprecated for removal. # This option is deprecated for removal.
# Its value may be silently ignored in the future. # Its value may be silently ignored in the future.
# Reason: Option ipmi.command_retry_timeout should be used to # Reason: Use option [ipmi]/command_retry_timeout to specify
# define IPMI command retries and option # the timeout value for IPMI command retries, and use option
# conductor.power_state_change_timeout should be use to define # [conductor]/power_state_change_timeout to specify the
# timeout value for waiting for power operations to complete # timeout value for waiting for a power operation to complete
# so that a baremetal node reaches the desired power state.
#retry_timeout = <None> #retry_timeout = <None>
# Minimum time, in seconds, between IPMI operations sent to a # Minimum time, in seconds, between IPMI operations sent to a

View File

@ -70,6 +70,15 @@ def node_set_boot_device(task, device, persistent=False):
def node_wait_for_power_state(task, new_state, timeout=None): def node_wait_for_power_state(task, new_state, timeout=None):
"""Wait for node to be in new power state.
:param task: a TaskManager instance.
:param new_state: the desired new power state, one of the power states
in :mod:`ironic.common.states`.
:param timeout: number of seconds to wait before giving up. If not
specified, uses the conductor.power_state_change_timeout config value.
:raises: PowerStateFailure if timed out
"""
retry_timeout = (timeout or CONF.conductor.power_state_change_timeout) retry_timeout = (timeout or CONF.conductor.power_state_change_timeout)
def _wait(): def _wait():

View File

@ -148,8 +148,9 @@ opts = [
cfg.IntOpt('power_state_change_timeout', cfg.IntOpt('power_state_change_timeout',
min=2, default=30, min=2, default=30,
help=_('Number of seconds to wait for power operations to ' help=_('Number of seconds to wait for power operations to '
'complete on the baremetal node before declaring the ' 'complete, i.e., so that a baremetal node is in the '
'power operation has failed')), 'desired power state. If timed out, the power operation '
'is considered a failure.')),
] ]

View File

@ -22,9 +22,10 @@ from ironic.common.i18n import _
opts = [ opts = [
cfg.IntOpt('command_retry_timeout', cfg.IntOpt('command_retry_timeout',
default=60, default=60,
help=_('Maximum time in seconds to retry, retryable IPMI ' help=_('Maximum time in seconds to retry retryable IPMI '
'operations. For example if the requested action fails ' 'operations. (An operation is retryable, for '
'because the BMC is busy. There is a tradeoff when ' 'example, if the requested operation fails '
'because the BMC is busy.) There is a tradeoff when '
'setting this value. Setting this too low may cause ' 'setting this value. Setting this too low may cause '
'older BMCs to crash and require a hard reset. However, ' 'older BMCs to crash and require a hard reset. However, '
'setting too high can cause the sync power state ' 'setting too high can cause the sync power state '
@ -38,13 +39,14 @@ opts = [
'sync power state periodic task to hang when there are ' 'sync power state periodic task to hang when there are '
'slow or unresponsive BMCs.'), 'slow or unresponsive BMCs.'),
deprecated_for_removal=True, deprecated_for_removal=True,
deprecated_reason=_('Option ipmi.command_retry_timeout should ' deprecated_reason=_('Use option [ipmi]/command_retry_timeout '
'be used to define IPMI command retries ' 'to specify the timeout value for '
'and option ' 'IPMI command retries, and use option '
'conductor.power_state_change_timeout ' '[conductor]/power_state_change_timeout to '
'should be use to define timeout value for ' 'specify the timeout value for waiting for '
'waiting for power operations to ' 'a power operation to complete so that a '
'complete')), 'baremetal node reaches the desired '
'power state.')),
cfg.IntOpt('min_command_interval', cfg.IntOpt('min_command_interval',
default=5, default=5,
help=_('Minimum time, in seconds, between IPMI operations ' help=_('Minimum time, in seconds, between IPMI operations '

View File

@ -396,7 +396,7 @@ def _exec_ipmitool(driver_info, command, check_exit_code=None):
args.append(option) args.append(option)
args.append(driver_info[name]) args.append(driver_info[name])
# TODO(sambetts) Remove useage of ipmi.retry_timeout in Queens # TODO(sambetts) Remove usage of ipmi.retry_timeout in Queens
timeout = CONF.ipmi.retry_timeout or CONF.ipmi.command_retry_timeout timeout = CONF.ipmi.retry_timeout or CONF.ipmi.command_retry_timeout
# specify retry timing more precisely, if supported # specify retry timing more precisely, if supported
@ -473,7 +473,7 @@ def _set_and_wait(task, power_action, driver_info, timeout=None):
:returns: one of ironic.common.states :returns: one of ironic.common.states
""" """
# TODO(sambetts) Remove useage of ipmi.retry_timeout in Queens # TODO(sambetts) Remove usage of ipmi.retry_timeout in Queens
default_timeout = CONF.ipmi.retry_timeout default_timeout = CONF.ipmi.retry_timeout
if power_action == states.POWER_ON: if power_action == states.POWER_ON:

View File

@ -1,14 +1,21 @@
--- ---
deprecations: deprecations:
- | - |
Configuration option IPMI.retry_timeout is deprecated in favor of new Configuration option ``[ipmi]/retry_timeout`` is deprecated in favor of
options IPMI.command_retry_timeout, and these new options:
CONDUCTOR.power_state_change_timeout
* ``[ipmi]/command_retry_timeout``: timeout value to wait for an IPMI
command to complete (be acknowledged by the baremetal node)
* ``[conductor]/power_state_change_timeout``: timeout value to wait for
a power operation to complete, so that the baremetal node is in the
desired new power state
fixes: fixes:
- | - |
Prevents the IPMI driver from needlessly checking status if the power Prevents the IPMI driver from needlessly checking status of the baremetal
change action fails. Additionally stop retrying power actions and power node if a power change action fails. Additionally, stops retrying power
status polls if we receive a non-retryable error from ipmitool. actions and power status polls on receipt of a non-retryable error from
https//bugs.launchpad.net/ironic/+bug/1675529. New configuration option ipmitool. For more information, see
`power_state_change_timeout` added to define how many seconds to wait for a https//bugs.launchpad.net/ironic/+bug/1675529. A new configuration option
server to change power state when a power action is requested. ``[conductor]/power_state_change_timeout`` can be used to specify how many
seconds to wait for a baremetal node to change power state when a power
action is requested.