Ironic's State Machine
State Machine Diagram
The diagram below shows the provisioning states that an Ironic node goes through during its lifetime. The diagram also depicts the events that transition the node to different states.
Stable states are highlighted with a thicker border. All transitions from stable states are initiated by API requests. A few other API-initiated transitions are possible from non-stable states. The events for these API-initiated transitions are indicated with '(via API)'. Internally, the conductor initiates the other transitions (depicted in gray).
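As an illustration of an API-initiated transition, the sketch below builds (but does not send) the request that sets a node's provision state. The host name, node name and default API version are placeholders chosen for this example, and authentication is omitted; treat it as a rough outline rather than a complete client.

```python
import json
import urllib.request


def build_provision_request(base_url, node_ident, verb, api_version="1.11"):
    """Build, without sending, a request asking ironic to apply a verb.

    base_url and node_ident are placeholders supplied by the caller;
    no authentication header is added in this sketch.
    """
    url = f"{base_url.rstrip('/')}/v1/nodes/{node_ident}/states/provision"
    body = json.dumps({"target": verb}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-OpenStack-Ironic-API-Version": api_version,
        },
    )


# Hypothetical endpoint and node name, used only for illustration.
req = build_provision_request("http://ironic.example.com:6385", "node-1", "manage")
print(req.get_method(), req.full_url)
```

The verb is passed as the "target" of the request body, so the same helper covers any of the verbs used in the state descriptions.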
State Descriptions
- enroll (stable state)
  This is the state that all nodes start off in when created using API version 1.11 or newer. When a node is in the enroll state, the only thing ironic knows about it is that it exists, and ironic cannot take any further action by itself. Once a node has its driver/interfaces and their required information set in node.driver_info, the node can be transitioned to the verifying state by setting the node's provision state using the manage verb.
- verifying
  Ironic will validate that it can manage the node using the information given in node.driver_info, together with the driver or hardware type and interfaces it has been assigned. This involves going out and confirming that the credentials work to access whatever node control mechanism they talk to.
- manageable (stable state)
  Once ironic has verified that it can manage the node using the driver/interfaces and credentials passed in at node create time, the node will be transitioned to the manageable state. From manageable, nodes can transition to:
  - manageable (through cleaning) by setting the node's provision state using the clean verb.
  - manageable (through inspecting) by setting the node's provision state using the inspect verb.
  - available (through cleaning if automatic cleaning is enabled) by setting the node's provision state using the provide verb.
  - active (through adopting) by setting the node's provision state using the adopt verb.
  manageable is the state that a node should be moved into when any updates need to be made to it, such as changes to fields in driver_info and updates to networking information on ironic ports assigned to the node.
  manageable is also the only stable state that can be transitioned to from these failure states:
  - adopt failed
  - clean failed
  - inspect failed
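The transitions out of manageable can be written down as a small lookup table. The following is a minimal sketch transcribed from the list above; the names MANAGEABLE_TRANSITIONS, RECOVERABLE_FAILURES and describe are invented for this example and are not part of ironic.

```python
# Verb -> (transient state passed through, stable state reached on success).
# Illustrative table only; the names here are not ironic's own.
MANAGEABLE_TRANSITIONS = {
    "clean": ("cleaning", "manageable"),
    "inspect": ("inspecting", "manageable"),
    "provide": ("cleaning", "available"),
    "adopt": ("adopting", "active"),
}

# Failure states from which a node can be moved back to manageable.
RECOVERABLE_FAILURES = {"adopt failed", "clean failed", "inspect failed"}


def describe(verb, automated_clean=True):
    """Return the states a manageable node passes through for a verb."""
    if verb not in MANAGEABLE_TRANSITIONS:
        raise ValueError(f"verb {verb!r} is not valid from manageable")
    transient, target = MANAGEABLE_TRANSITIONS[verb]
    # provide only goes through cleaning when automatic cleaning is enabled.
    if verb == "provide" and not automated_clean:
        return ["manageable", target]
    return ["manageable", transient, target]


print(describe("provide"))
print(describe("provide", automated_clean=False))
```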
- inspecting
  inspecting will utilize node introspection to update hardware-derived node properties to reflect the current state of the hardware. Typically, the node will transition to manageable if inspection is synchronous, or inspect wait if asynchronous. The node will transition to inspect failed if an error occurs.
- inspect wait
  This is the provision state used when an asynchronous inspection is in progress. A successfully inspected node shall transition to the manageable state.
- inspect failed
  This is the state a node will move into when inspection of the node fails. From here the node can be transitioned to:
  - inspecting by setting the node's provision state using the inspect verb.
  - manageable by setting the node's provision state using the manage verb.
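The three inspection states can be summarized in a short sketch. It assumes that an inspect interface reports an in-progress asynchronous inspection by returning inspecting or inspect wait, that an exception means failure, and that a node left in inspect wait beyond a timeout is moved to inspect failed. The function names and the timeout parameter are hypothetical and do not mirror the conductor's actual code.

```python
import time

INSPECTING = "inspecting"
INSPECTWAIT = "inspect wait"
INSPECTFAIL = "inspect failed"
MANAGEABLE = "manageable"


def run_inspection(inspect_hardware):
    """Return the node's next state after starting inspection.

    inspect_hardware is a stand-in callable for an inspect interface.
    """
    try:
        returned = inspect_hardware()
    except Exception:
        return INSPECTFAIL  # an error occurred during inspection
    if returned in (INSPECTING, INSPECTWAIT):
        return INSPECTWAIT  # asynchronous inspection still in progress
    return MANAGEABLE  # synchronous inspection already finished


def check_inspect_wait_timeout(state, entered_at, timeout, now=None):
    """Periodic check: fail a node stuck in inspect wait for too long."""
    now = time.time() if now is None else now
    if state == INSPECTWAIT and now - entered_at > timeout:
        return INSPECTFAIL
    return state


print(run_inspection(lambda: INSPECTWAIT))
print(check_inspect_wait_timeout(INSPECTWAIT, entered_at=0, timeout=1800, now=1801))
```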
- cleaning
  Nodes in the cleaning state are being scrubbed and reprogrammed into a known configuration.
  When a node is in the cleaning state it means that the conductor is executing the clean step (for out-of-band clean steps) or preparing the environment (building PXE configuration files, configuring the DHCP, etc) to boot the ramdisk for running in-band clean steps.
- clean wait
  Just like the cleaning state, the nodes in the clean wait state are being scrubbed and reprogrammed. The difference is that in the clean wait state the conductor is waiting for the ramdisk to boot or the clean step which is running in-band to finish.
  The cleaning process of a node in the clean wait state can be interrupted by setting the node's provision state using the abort verb if the task that is running allows it.
- available (stable state)
  After nodes have been successfully preconfigured and cleaned, they are moved into the available state and are ready to be provisioned. From available, nodes can transition to:
  - active (through deploying) by setting the node's provision state using the active verb.
  - manageable by setting the node's provision state using the manage verb.
- deploying
  Nodes in deploying are being prepared to run a workload on them. This consists of running a series of tasks, such as:
  - Setting appropriate BIOS configurations.
  - Partitioning drives and laying down file systems.
  - Creating any additional resources (node-specific network config, a config drive partition, etc.) that may be required by additional subsystems.
- wait call-back
  Just like the deploying state, the nodes in wait call-back are being deployed. The difference is that in wait call-back the conductor is waiting for the ramdisk to boot or execute parts of the deployment which need to run in-band on the node (for example, installing the bootloader, or writing the image to the disk).
  The deployment of a node in wait call-back can be interrupted by setting the node's provision state using the deleted verb.
- deploy failed
  This is the state a node will move into when a deployment fails, for example a timeout waiting for the ramdisk to PXE boot. From here the node can be transitioned to:
  - active (through deploying) by setting the node's provision state using either the active or rebuild verbs.
  - available (through deleting and cleaning) by setting the node's provision state using the deleted verb.
- active (stable state)
  Nodes in active have a workload running on them. Ironic may collect out-of-band sensor information (including power state) on a regular basis. Nodes in active can transition to:
  - available (through deleting and cleaning) by setting the node's provision state using the deleted verb.
  - active (through deploying) by setting the node's provision state using the rebuild verb.
  - rescue (through rescuing) by setting the node's provision state using the rescue verb.
- deleting
  Nodes in the deleting state are being torn down from running an active workload. In deleting, ironic tears down and removes any configuration and resources it added in deploying or rescuing.
- error (stable state)
  This is the state a node will move into when deleting an active deployment fails. From error, nodes can transition to:
  - available (through deleting and cleaning) by setting the node's provision state using the deleted verb.
- adopting
  This state allows ironic to take over management of a baremetal node with an existing workload on it. Ordinarily when a baremetal node is enrolled and managed by ironic, it must transition through cleaning and deploying to reach the active state. However, baremetal nodes that have an existing workload on them do not need to be deployed or cleaned again, so this transition allows these nodes to move directly from manageable to active.
- rescuing
  Nodes in rescuing are being prepared to perform rescue operations. This consists of running a series of tasks, such as:
  - Setting appropriate BIOS configurations.
  - Creating any additional resources (node-specific network config, etc.) that may be required by additional subsystems.
- rescue wait
  Just like the rescuing state, the nodes in rescue wait are being rescued. The difference is that in rescue wait the conductor is waiting for the ramdisk to boot or execute parts of the rescue which need to run in-band on the node (for example, setting the password for the user named rescue).
  The rescue operation of a node in rescue wait can be aborted by setting the node's provision state using the abort verb.
- rescue failed
  This is the state a node will move into when a rescue operation fails, for example a timeout waiting for the ramdisk to PXE boot. From here the node can be transitioned to:
  - rescue (through rescuing) by setting the node's provision state using the rescue verb.
  - active (through unrescuing) by setting the node's provision state using the unrescue verb.
  - available (through deleting) by setting the node's provision state using the deleted verb.
- rescue (stable state)
  Nodes in rescue have a rescue ramdisk running on them. Ironic may collect out-of-band sensor information (including power state) on a regular basis. Nodes in rescue can transition to:
  - active (through unrescuing) by setting the node's provision state using the unrescue verb.
  - available (through deleting) by setting the node's provision state using the deleted verb.
- unrescuing
  Nodes in unrescuing are being prepared to transition from the rescue state to the active state. This consists of running a series of tasks, such as setting appropriate BIOS configurations (for example, changing the boot device).
- unrescue failed
  This is the state a node will move into when an unrescue operation fails. From here the node can be transitioned to:
  - rescue (through rescuing) by setting the node's provision state using the rescue verb.
  - active (through unrescuing) by setting the node's provision state using the unrescue verb.
  - available (through deleting) by setting the node's provision state using the deleted verb.
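Taken together, the stable states and the verbs that connect them form a small graph. The sketch below is one way to explore it, assuming every transient step (verifying, cleaning, deploying and so on) succeeds; failure states are left out, and the table and function names are invented for this example.

```python
from collections import deque

# Stable state -> {verb: stable state reached if all transient steps succeed}.
# Transcribed from the state descriptions; failure states are omitted.
STABLE_TRANSITIONS = {
    "enroll": {"manage": "manageable"},
    "manageable": {
        "clean": "manageable",
        "inspect": "manageable",
        "provide": "available",
        "adopt": "active",
    },
    "available": {"active": "active", "manage": "manageable"},
    "active": {"deleted": "available", "rebuild": "active", "rescue": "rescue"},
    "rescue": {"unrescue": "active", "deleted": "available"},
    "error": {"deleted": "available"},
}


def shortest_verb_path(start, goal):
    """Breadth-first search for the fewest verbs leading from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, verbs = queue.popleft()
        if state == goal:
            return verbs
        for verb, target in STABLE_TRANSITIONS.get(state, {}).items():
            if target not in seen:
                seen.add(target)
                queue.append((target, verbs + [verb]))
    return None  # goal cannot be reached through verbs alone


print(shortest_verb_path("enroll", "available"))  # ['manage', 'provide']
print(shortest_verb_path("enroll", "rescue"))     # ['manage', 'adopt', 'rescue']
```

Note that the search returns None for a goal of error: according to the descriptions above, a node only ends up there when deleting an active deployment fails, not through a verb.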