bfc2ad56d5
There are some Ironic execution workflows where there is not an easy way to retry, such as when attempting to hand off the processing of an async task to a conductor. Task handoff can require releasing a lock on the node, so the next entity processing the task can acquire the lock itself. However, this is vulnerable to race conditions, as there is no uniform retry mechanism built in to such handoffs. Consider the continue_node_deploy/clean logic, which does this: method = 'continue_node_%s' % operation # Need to release the lock to let the conductor take it task.release_resources() getattr(rpc, method)(task.context, uuid, topic=topic If another process obtains a lock between the releasing of resources and the acquiring of the lock during the continue_node_* operation, and holds the lock longer than the max attempt * interval window (which defaults to 3 seconds), then the handoff will never complete. Beyond that, because there is no proper queue for processes waiting on the lock, there is no fairness, so it's also possible that instead of one long lock being held, the lock is obtained and held for a short window several times by other competing processes. This manifests as nodes occasionally getting stuck in the "DEPLOYING" state during a deploy. For example, a user may attempt to open or access the serial console before the deploy is complete--the serial console process obtains a lock and starves the conductor of the lock, so the conductor cannot finish the deploy. It's also possible a long heartbeat or badly-timed sequence of heartbeats could do the same. To fix this, this commit introduces the concept of a "patient" lock, which will retry indefinitely until it doesn't encounter the NodeLocked exception. This overrides any retry behavior. .. note:: There may be other cases where such a lock is desired. Story: #2008323 Change-Id: I9937fab18a50111ec56a3fd023cdb9d510a1e990 |
||
---|---|---|
api-ref | ||
devstack | ||
doc | ||
etc | ||
ironic | ||
playbooks/ci-workarounds | ||
releasenotes | ||
tools | ||
zuul.d | ||
.gitignore | ||
.gitreview | ||
.mailmap | ||
.stestr.conf | ||
bindep.txt | ||
CONTRIBUTING.rst | ||
driver-requirements.txt | ||
LICENSE | ||
lower-constraints.txt | ||
README.rst | ||
reno.yaml | ||
requirements.txt | ||
setup.cfg | ||
setup.py | ||
test-requirements.txt | ||
tox.ini |
Ironic
Team and repository tags
Overview
Ironic consists of an API and plug-ins for managing and provisioning physical machines in a security-aware and fault-tolerant manner. It can be used with nova as a hypervisor driver, or standalone service using bifrost. By default, it will use PXE and IPMI to interact with bare metal machines. Ironic also supports vendor-specific plug-ins which may implement additional functionality.
Ironic is distributed under the terms of the Apache License, Version 2.0. The full terms and conditions of this license are detailed in the LICENSE file.
Project resources
- Documentation: https://docs.openstack.org/ironic/latest
- Source: https://opendev.org/openstack/ironic
- Bugs: https://storyboard.openstack.org/#!/project/943
- Wiki: https://wiki.openstack.org/wiki/Ironic
- APIs: https://docs.openstack.org/api-ref/baremetal/index.html
- Release Notes: https://docs.openstack.org/releasenotes/ironic/
- Design Specifications: https://specs.openstack.org/openstack/ironic-specs/
Project status, bugs, and requests for feature enhancements (RFEs) are tracked in StoryBoard: https://storyboard.openstack.org/#!/project/943
For information on how to contribute to ironic, see https://docs.openstack.org/ironic/latest/contributor