A service for managing and provisioning Bare Metal servers.
Jason Anderson bfc2ad56d5
Always retry locking when performing task handoff
There are some Ironic execution workflows where there is not an easy way
to retry, such as when attempting to hand off the processing of an async
task to a conductor. Task handoff can require releasing a lock on the
node, so the next entity processing the task can acquire the lock
itself. However, this is vulnerable to race conditions, as there is no
uniform retry mechanism built in to such handoffs. Consider the
continue_node_deploy/clean logic, which does this:

  method = 'continue_node_%s' % operation
  # Need to release the lock to let the conductor take it
  task.release_resources()
  getattr(rpc, method)(task.context, uuid, topic=topic)

If another process obtains the lock between the release of resources and
the acquisition of the lock during the continue_node_* operation, and
holds it longer than the max attempts * retry interval window (which
defaults to 3 seconds), then the handoff will never complete. Beyond
that, because there is no proper queue for processes waiting on the
lock, there is no fairness, so instead of one long lock being held, the
lock may also be obtained and held for a short window several times by
other competing processes, with the same result.
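
The failure mode above can be sketched as a bounded retry loop. This is a
minimal illustration, not Ironic's code: NodeLocked here is a stand-in
class, and acquire_with_retry, make_contended_lock, and the 3 attempts at
a 1 second interval are hypothetical names and values chosen only to
match the 3 second window mentioned above.

```python
import time


class NodeLocked(Exception):
    """Stand-in for the error raised when a node is already locked."""


def acquire_with_retry(try_lock, attempts=3, interval=1.0, sleep=time.sleep):
    """Call try_lock() up to `attempts` times, pausing between tries.

    Re-raises NodeLocked once the final attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return try_lock()
        except NodeLocked:
            if attempt == attempts:
                raise
            sleep(interval)


def make_contended_lock(busy_calls):
    """Return a try_lock() that fails for the first `busy_calls` calls.

    Hypothetical helper that simulates other processes holding the lock.
    """
    state = {"calls": 0}

    def try_lock():
        state["calls"] += 1
        if state["calls"] <= busy_calls:
            raise NodeLocked()
        return "locked"

    return try_lock
```

With this sketch, a lock that is busy for two attempts is still acquired
on the third, but a lock that is busy for all three attempts (whether
held once for a long time or re-taken repeatedly by competitors) makes
the caller give up.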

This manifests as nodes occasionally getting stuck in the "DEPLOYING"
state during a deploy. For example, a user may attempt to open or access
the serial console before the deploy is complete--the serial console
process obtains a lock and starves the conductor of the lock, so the
conductor cannot finish the deploy. It's also possible a long heartbeat
or badly-timed sequence of heartbeats could do the same.

To fix this, this commit introduces the concept of a "patient" lock,
which retries indefinitely until it no longer encounters the NodeLocked
exception. This overrides any other retry behavior.
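
A rough sketch of the "patient" idea follows. It is an illustration
under assumptions, not the change itself: NodeLocked is a stand-in
class, and the acquire function, its patient flag, and
make_contended_lock are hypothetical names used only for this example.

```python
import itertools
import time


class NodeLocked(Exception):
    """Stand-in for the error raised when a node is already locked."""


def acquire(try_lock, attempts=3, interval=1.0, patient=False,
            sleep=time.sleep):
    """Acquire a lock, retrying on NodeLocked.

    With patient=False, give up after `attempts` tries.
    With patient=True, ignore `attempts` and keep retrying until success.
    """
    tries = itertools.count(1) if patient else range(1, attempts + 1)
    for attempt in tries:
        try:
            return try_lock()
        except NodeLocked:
            if not patient and attempt == attempts:
                raise
            sleep(interval)


def make_contended_lock(busy_calls):
    """Return a try_lock() that fails for the first `busy_calls` calls.

    Hypothetical helper that simulates other processes holding the lock.
    """
    state = {"calls": 0}

    def try_lock():
        state["calls"] += 1
        if state["calls"] <= busy_calls:
            raise NodeLocked()
        return "locked"

    return try_lock
```

In this sketch a patient caller outlasts a lock that stays busy for far
more than the normal retry window, while a non-patient caller facing the
same contention still raises NodeLocked. The trade-off is that a patient
caller can wait forever if the lock is never released.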

  .. note::
     There may be other cases where such a lock is desired.

Story: #2008323
Change-Id: I9937fab18a50111ec56a3fd023cdb9d510a1e990
2020-11-24 09:41:38 -06:00
api-ref Add 'agent_token' to heartbeat request 2020-09-10 14:19:21 +00:00
devstack Merge "devstack: log all requests to sushy-emulator" 2020-11-02 12:36:31 +00:00
doc Merge "Allow passing rootfs_uuid for the standalone case" 2020-10-23 00:18:45 +00:00
etc Remove qemu-img rootwrap filter 2020-08-18 16:12:57 +02:00
ironic Always retry locking when performing task handoff 2020-11-24 09:41:38 -06:00
playbooks/ci-workarounds Native zuulv3 grenade multinode multitenant 2020-09-16 23:33:42 +02:00
releasenotes Always retry locking when performing task handoff 2020-11-24 09:41:38 -06:00
tools Update checking reno script to use python3 2020-10-11 22:13:21 +08:00
zuul.d Merge "CI: increase cleaning timeout and tie it to PXE boot timeout" 2020-10-30 19:19:20 +00:00
.gitignore Migrate to stestr as unit tests runner 2017-09-22 08:56:34 +00:00
.gitreview OpenDev Migration Patch 2019-04-19 19:40:53 +00:00
.mailmap Add my new address to .mailmap 2020-04-13 07:29:37 -07:00
.stestr.conf Migrate to stestr as unit tests runner 2017-09-22 08:56:34 +00:00
bindep.txt CI: update bindep for centos-8 py36 job changes 2020-10-03 18:22:36 -07:00
CONTRIBUTING.rst Project Contributing updates for Goal 2020-02-20 02:01:21 +00:00
driver-requirements.txt Add GPU reporting to idrac-wsman inspect interface 2020-09-30 18:33:53 -04:00
LICENSE Added project infrastructure needs. 2013-05-02 14:55:43 -04:00
lower-constraints.txt Fix handling OctetString for pysnmp 2020-09-29 13:00:52 +00:00
README.rst Add ironic-specs link to readme.rst 2019-08-30 17:16:09 +08:00
reno.yaml tell reno to ignore the kilo branch 2020-02-07 16:42:15 -05:00
requirements.txt Fix lower-constraints for Ubuntu Focal 2020-09-11 04:23:12 +00:00
setup.cfg Merge "Add Redfish BIOS interface to idrac HW type" 2020-09-25 02:32:22 +00:00
setup.py Cleanup Python 2.7 support 2020-04-03 17:49:23 +02:00
test-requirements.txt Update test requirements 2020-10-15 16:00:47 +02:00
tox.ini Merge "Enforce autospec in some api controllers modules" 2020-11-03 11:17:50 +00:00

Ironic

Team and repository tags

Overview

Ironic consists of an API and plug-ins for managing and provisioning physical machines in a security-aware and fault-tolerant manner. It can be used with nova as a hypervisor driver, or as a standalone service using bifrost. By default, it will use PXE and IPMI to interact with bare metal machines. Ironic also supports vendor-specific plug-ins which may implement additional functionality.

Ironic is distributed under the terms of the Apache License, Version 2.0. The full terms and conditions of this license are detailed in the LICENSE file.

Project resources

Project status, bugs, and requests for feature enhancements (RFEs) are tracked in StoryBoard: https://storyboard.openstack.org/#!/project/943

For information on how to contribute to ironic, see https://docs.openstack.org/ironic/latest/contributor