Follow-up to fix for power action failure
This is a follow-up patch to the patch so that the power status is not retried if a power action fails: ee5d4942a1c33736ffe05ec01619142be400c2f4 It addresses the comments as well as adds more clarification and updates the documentation to refer to the new [ipmi]command_retry_timeout config option. Change-Id: Ib21544da260565ae399e2d07b32af9bd8b810280 Related-Bug: #1692895
This commit is contained in:
parent
bc5efdf459
commit
578f01678c
@ -48,8 +48,8 @@ IPMI configuration
|
|||||||
~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
If there are slow or unresponsive BMCs in the environment, the
|
If there are slow or unresponsive BMCs in the environment, the
|
||||||
``retry_timeout`` configuration option in the ``[ipmi]`` section may need
|
``command_retry_timeout`` configuration option in the ``[ipmi]`` section may
|
||||||
to be lowered. The default is fairly conservative, as setting this timeout
|
need to be lowered. The default is fairly conservative, as setting this timeout
|
||||||
too low can cause older BMCs to crash and require a hard-reset.
|
too low can cause older BMCs to crash and require a hard-reset.
|
||||||
|
|
||||||
Collecting sensor data
|
Collecting sensor data
|
||||||
|
@ -1140,9 +1140,10 @@
|
|||||||
# Minimum value: 1
|
# Minimum value: 1
|
||||||
#soft_power_off_timeout = 600
|
#soft_power_off_timeout = 600
|
||||||
|
|
||||||
# Number of seconds to wait for power operations to complete
|
# Number of seconds to wait for power operations to complete,
|
||||||
# on the baremetal node before declaring the power operation
|
# i.e., so that a baremetal node is in the desired power
|
||||||
# has failed (integer value)
|
# state. If timed out, the power operation is considered a
|
||||||
|
# failure. (integer value)
|
||||||
# Minimum value: 2
|
# Minimum value: 2
|
||||||
#power_state_change_timeout = 30
|
#power_state_change_timeout = 30
|
||||||
|
|
||||||
@ -1900,13 +1901,14 @@
|
|||||||
# From ironic
|
# From ironic
|
||||||
#
|
#
|
||||||
|
|
||||||
# Maximum time in seconds to retry, retryable IPMI operations.
|
# Maximum time in seconds to retry retryable IPMI operations.
|
||||||
# For example if the requested action fails because the BMC is
|
# (An operation is retryable, for example, if the requested
|
||||||
# busy. There is a tradeoff when setting this value. Setting
|
# operation fails because the BMC is busy.) There is a
|
||||||
# this too low may cause older BMCs to crash and require a
|
# tradeoff when setting this value. Setting this too low may
|
||||||
# hard reset. However, setting too high can cause the sync
|
# cause older BMCs to crash and require a hard reset. However,
|
||||||
# power state periodic task to hang when there are slow or
|
# setting too high can cause the sync power state periodic
|
||||||
# unresponsive BMCs. (integer value)
|
# task to hang when there are slow or unresponsive BMCs.
|
||||||
|
# (integer value)
|
||||||
#command_retry_timeout = 60
|
#command_retry_timeout = 60
|
||||||
|
|
||||||
# DEPRECATED: Maximum time in seconds to retry IPMI
|
# DEPRECATED: Maximum time in seconds to retry IPMI
|
||||||
@ -1917,10 +1919,11 @@
|
|||||||
# slow or unresponsive BMCs. (integer value)
|
# slow or unresponsive BMCs. (integer value)
|
||||||
# This option is deprecated for removal.
|
# This option is deprecated for removal.
|
||||||
# Its value may be silently ignored in the future.
|
# Its value may be silently ignored in the future.
|
||||||
# Reason: Option ipmi.command_retry_timeout should be used to
|
# Reason: Use option [ipmi]/command_retry_timeout to specify
|
||||||
# define IPMI command retries and option
|
# the timeout value for IPMI command retries, and use option
|
||||||
# conductor.power_state_change_timeout should be use to define
|
# [conductor]/power_state_change_timeout to specify the
|
||||||
# timeout value for waiting for power operations to complete
|
# timeout value for waiting for a power operation to complete
|
||||||
|
# so that a baremetal node reaches the desired power state.
|
||||||
#retry_timeout = <None>
|
#retry_timeout = <None>
|
||||||
|
|
||||||
# Minimum time, in seconds, between IPMI operations sent to a
|
# Minimum time, in seconds, between IPMI operations sent to a
|
||||||
|
@ -70,6 +70,15 @@ def node_set_boot_device(task, device, persistent=False):
|
|||||||
|
|
||||||
|
|
||||||
def node_wait_for_power_state(task, new_state, timeout=None):
|
def node_wait_for_power_state(task, new_state, timeout=None):
|
||||||
|
"""Wait for node to be in new power state.
|
||||||
|
|
||||||
|
:param task: a TaskManager instance.
|
||||||
|
:param new_state: the desired new power state, one of the power states
|
||||||
|
in :mod:`ironic.common.states`.
|
||||||
|
:param timeout: number of seconds to wait before giving up. If not
|
||||||
|
specified, uses the conductor.power_state_change_timeout config value.
|
||||||
|
:raises: PowerStateFailure if timed out
|
||||||
|
"""
|
||||||
retry_timeout = (timeout or CONF.conductor.power_state_change_timeout)
|
retry_timeout = (timeout or CONF.conductor.power_state_change_timeout)
|
||||||
|
|
||||||
def _wait():
|
def _wait():
|
||||||
|
@ -148,8 +148,9 @@ opts = [
|
|||||||
cfg.IntOpt('power_state_change_timeout',
|
cfg.IntOpt('power_state_change_timeout',
|
||||||
min=2, default=30,
|
min=2, default=30,
|
||||||
help=_('Number of seconds to wait for power operations to '
|
help=_('Number of seconds to wait for power operations to '
|
||||||
'complete on the baremetal node before declaring the '
|
'complete, i.e., so that a baremetal node is in the '
|
||||||
'power operation has failed')),
|
'desired power state. If timed out, the power operation '
|
||||||
|
'is considered a failure.')),
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
|
@ -22,9 +22,10 @@ from ironic.common.i18n import _
|
|||||||
opts = [
|
opts = [
|
||||||
cfg.IntOpt('command_retry_timeout',
|
cfg.IntOpt('command_retry_timeout',
|
||||||
default=60,
|
default=60,
|
||||||
help=_('Maximum time in seconds to retry, retryable IPMI '
|
help=_('Maximum time in seconds to retry retryable IPMI '
|
||||||
'operations. For example if the requested action fails '
|
'operations. (An operation is retryable, for '
|
||||||
'because the BMC is busy. There is a tradeoff when '
|
'example, if the requested operation fails '
|
||||||
|
'because the BMC is busy.) There is a tradeoff when '
|
||||||
'setting this value. Setting this too low may cause '
|
'setting this value. Setting this too low may cause '
|
||||||
'older BMCs to crash and require a hard reset. However, '
|
'older BMCs to crash and require a hard reset. However, '
|
||||||
'setting too high can cause the sync power state '
|
'setting too high can cause the sync power state '
|
||||||
@ -38,13 +39,14 @@ opts = [
|
|||||||
'sync power state periodic task to hang when there are '
|
'sync power state periodic task to hang when there are '
|
||||||
'slow or unresponsive BMCs.'),
|
'slow or unresponsive BMCs.'),
|
||||||
deprecated_for_removal=True,
|
deprecated_for_removal=True,
|
||||||
deprecated_reason=_('Option ipmi.command_retry_timeout should '
|
deprecated_reason=_('Use option [ipmi]/command_retry_timeout '
|
||||||
'be used to define IPMI command retries '
|
'to specify the timeout value for '
|
||||||
'and option '
|
'IPMI command retries, and use option '
|
||||||
'conductor.power_state_change_timeout '
|
'[conductor]/power_state_change_timeout to '
|
||||||
'should be use to define timeout value for '
|
'specify the timeout value for waiting for '
|
||||||
'waiting for power operations to '
|
'a power operation to complete so that a '
|
||||||
'complete')),
|
'baremetal node reaches the desired '
|
||||||
|
'power state.')),
|
||||||
cfg.IntOpt('min_command_interval',
|
cfg.IntOpt('min_command_interval',
|
||||||
default=5,
|
default=5,
|
||||||
help=_('Minimum time, in seconds, between IPMI operations '
|
help=_('Minimum time, in seconds, between IPMI operations '
|
||||||
|
@ -396,7 +396,7 @@ def _exec_ipmitool(driver_info, command, check_exit_code=None):
|
|||||||
args.append(option)
|
args.append(option)
|
||||||
args.append(driver_info[name])
|
args.append(driver_info[name])
|
||||||
|
|
||||||
# TODO(sambetts) Remove useage of ipmi.retry_timeout in Queens
|
# TODO(sambetts) Remove usage of ipmi.retry_timeout in Queens
|
||||||
timeout = CONF.ipmi.retry_timeout or CONF.ipmi.command_retry_timeout
|
timeout = CONF.ipmi.retry_timeout or CONF.ipmi.command_retry_timeout
|
||||||
|
|
||||||
# specify retry timing more precisely, if supported
|
# specify retry timing more precisely, if supported
|
||||||
@ -473,7 +473,7 @@ def _set_and_wait(task, power_action, driver_info, timeout=None):
|
|||||||
:returns: one of ironic.common.states
|
:returns: one of ironic.common.states
|
||||||
|
|
||||||
"""
|
"""
|
||||||
# TODO(sambetts) Remove useage of ipmi.retry_timeout in Queens
|
# TODO(sambetts) Remove usage of ipmi.retry_timeout in Queens
|
||||||
default_timeout = CONF.ipmi.retry_timeout
|
default_timeout = CONF.ipmi.retry_timeout
|
||||||
|
|
||||||
if power_action == states.POWER_ON:
|
if power_action == states.POWER_ON:
|
||||||
|
@ -1,14 +1,21 @@
|
|||||||
---
|
---
|
||||||
deprecations:
|
deprecations:
|
||||||
- |
|
- |
|
||||||
Configuration option IPMI.retry_timeout is deprecated in favor of new
|
Configuration option ``[ipmi]/retry_timeout`` is deprecated in favor of
|
||||||
options IPMI.command_retry_timeout, and
|
these new options:
|
||||||
CONDUCTOR.power_state_change_timeout
|
|
||||||
|
* ``[ipmi]/command_retry_timeout``: timeout value to wait for an IPMI
|
||||||
|
command to complete (be acknowledged by the baremetal node)
|
||||||
|
* ``[conductor]/power_state_change_timeout``: timeout value to wait for
|
||||||
|
a power operation to complete, so that the baremetal node is in the
|
||||||
|
desired new power state
|
||||||
fixes:
|
fixes:
|
||||||
- |
|
- |
|
||||||
Prevents the IPMI driver from needlessly checking status if the power
|
Prevents the IPMI driver from needlessly checking status of the baremetal
|
||||||
change action fails. Additionally stop retrying power actions and power
|
node if a power change action fails. Additionally, stops retrying power
|
||||||
status polls if we receive a non-retryable error from ipmitool.
|
actions and power status polls on receipt of a non-retryable error from
|
||||||
https//bugs.launchpad.net/ironic/+bug/1675529. New configuration option
|
ipmitool. For more information, see
|
||||||
`power_state_change_timeout` added to define how many seconds to wait for a
|
https//bugs.launchpad.net/ironic/+bug/1675529. A new configuration option
|
||||||
server to change power state when a power action is requested.
|
``[conductor]/power_state_change_timeout`` can be used to specify how many
|
||||||
|
seconds to wait for a baremetal node to change power state when a power
|
||||||
|
action is requested.
|
||||||
|
Loading…
x
Reference in New Issue
Block a user