Rolling Upgrades
The ironic (ironic-api and ironic-conductor) services support rolling upgrades, starting with a rolling upgrade from the Ocata to the Pike release. This describes the design of rolling upgrades, followed by notes for developing new features or modifying an IronicObject.
Design
Rolling upgrades between releases
Ironic follows the release-cycle-with-intermediary release model. The releases are semantic-versioned, in the form <major>.<minor>.<patch>. We refer to a named release of ironic as the release associated with a development cycle like Pike.
In addition, ironic follows the standard deprecation policy, which says that the deprecation period must be at least three months and a cycle boundary. This means that there will never be anything that is both deprecated and removed between two named releases.
Rolling upgrades will be supported between:
- named release N to N+1 (starting with N == Ocata)
- any named release to its latest revision, containing backported bug fixes. Because those bug fixes can contain improvements to the upgrade process, the operator should patch the system before upgrading between named releases.
- most recent named release N (and semver releases newer than N) to master. As with the above bullet point, there may be a bug or a feature introduced on the master branch that we want to remove before publishing a named release. The deprecation policy allows this to be done within a 3-month time frame. If the feature was included and removed in intermediate releases, there should be a release note added, with instructions on how to do a rolling upgrade to master from an affected release or release span. This would typically instruct the operator to upgrade to a particular intermediate release before upgrading to master.
Rolling upgrade process
Ironic supports rolling upgrades as described in the upgrade guide <../admin/upgrade-guide>.
The upgrade process will cause the ironic services to be running the FromVer and ToVer releases in this order:
0. Upgrade ironic code and run database schema migrations via the ironic-dbsync upgrade command.
1. Upgrade code and restart ironic-conductor services, one at a time.
2. Upgrade code and restart ironic-api services, one at a time.
3. Unpin API, RPC and object versions so that the services can now use the latest versions in ToVer. This is done via updating the configuration option described below in API, RPC and object version pinning and then restarting the services. ironic-conductor services should be restarted first, followed by the ironic-api services. This is to ensure that when new functionality is exposed on the unpinned API service (via API micro version), it is available on the backend.
step | ironic-api | ironic-conductor |
---|---|---|
0 | all FromVer | all FromVer |
1.1 | all FromVer | some FromVer, some ToVer-pinned |
1.2 | all FromVer | all ToVer-pinned |
2.1 | some FromVer, some ToVer-pinned | all ToVer-pinned |
2.2 | all ToVer-pinned | all ToVer-pinned |
3.1 | all ToVer-pinned | some ToVer-pinned, some ToVer |
3.2 | all ToVer-pinned | all ToVer |
3.3 | some ToVer-pinned, some ToVer | all ToVer |
3.4 | all ToVer | all ToVer |
Policy for changes to the DB model
The policy for changes to the DB model is as follows:
- Adding new items to the DB model is supported.
- The dropping of columns or tables and corresponding objects' fields is subject to ironic's deprecation policy. But its alembic script has to wait one more deprecation period, otherwise an unknown column exception will be thrown when FromVer services access the DB. This is because ironic-dbsync upgrade upgrades the DB schema but FromVer services still contain the dropped field in their SQLAlchemy DB model.
- An alembic.op.alter_column() to rename or resize a column is not allowed. Instead, split it into multiple operations, with one operation per release cycle (to maintain compatibility with an old SQLAlchemy model). For example, to rename a column, add the new column in release N, then remove the old column in release N+1.
- Some implementations of SQL's ALTER TABLE, such as adding foreign keys in PostgreSQL, may impose table locks and cause downtime. If the change cannot be avoided and the impact is significant (e.g. the table can be frequently accessed and/or store a large dataset), these cases must be mentioned in the release notes.
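The reason a column drop has to wait can be reproduced in miniature. The sketch below uses the standard-library sqlite3 module instead of ironic's SQLAlchemy models and Alembic scripts, and the nodes table with its extra and meta columns is purely hypothetical. It only shows that a query written against the old schema fails once the column is gone.

```python
import sqlite3


def query_as_old_service(conn):
    """Run the query an old (FromVer) model would issue, including 'extra'."""
    return conn.execute("SELECT id, extra FROM nodes").fetchall()


# Schema before the upgrade: the column used by the old model still exists.
old_schema = sqlite3.connect(":memory:")
old_schema.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, extra TEXT)")
old_schema.execute("INSERT INTO nodes (id, extra) VALUES (1, 'a')")
rows_before = query_as_old_service(old_schema)

# Schema after a premature migration: 'extra' was dropped while old
# services are still running.
new_schema = sqlite3.connect(":memory:")
new_schema.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, meta TEXT)")
new_schema.execute("INSERT INTO nodes (id, meta) VALUES (1, 'a')")

try:
    query_as_old_service(new_schema)
    old_query_failed = False
except sqlite3.OperationalError:
    # SQLite reports a missing column as an OperationalError.
    old_query_failed = True
```

Delaying the drop by one more deprecation period means that no running service still issues the old query when the column disappears.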
API, RPC and object version pinning
For the ironic services to be running old and new releases at the same time during a rolling upgrade, the services need to be able to handle different API, RPC and object versions.
This versioning is handled via the configuration option [DEFAULT]/pin_release_version. It is used to pin the API, RPC and IronicObject (e.g., Node, Conductor, Chassis, Port, and Portgroup) versions for all the ironic services.
The default value of empty indicates that ironic-api and ironic-conductor will use the latest versions of API, RPC and IronicObjects. Its possible values are releases, named (e.g. ocata) or sem-versioned (e.g. 7.0).
Internally, in common/release_mappings.py, ironic maintains a mapping that indicates the API, RPC and IronicObject versions associated with each release. This mapping is maintained manually.
During a rolling upgrade, the services using the new release will set the configuration option value to be the name (or version) of the old release. This tells the services running the new release which API, RPC and object versions they should be compatible with, in order to communicate with the services using the old release.
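The lookup from a pinned release to the versions in use can be pictured with a small sketch. The dictionary layout, the master key, the ocata alias and every version number below are illustrative assumptions, not the contents of common/release_mappings.py.

```python
# Hypothetical mapping in the spirit of common/release_mappings.py.
# The release names come from the text; every version number is made up.
RELEASE_MAPPING = {
    "7.0": {"api": "1.31", "rpc": "1.40", "objects": {"Node": "1.14"}},
    "master": {"api": "1.34", "rpc": "1.42", "objects": {"Node": "1.15"}},
}
# Assumption: a named release is an alias of a sem-versioned release.
RELEASE_MAPPING["ocata"] = RELEASE_MAPPING["7.0"]


def versions_in_use(pin_release_version=""):
    """Return the API/RPC/object versions a service should use.

    An empty pin means "unpinned": use the latest ('master') versions.
    """
    release = pin_release_version or "master"
    try:
        return RELEASE_MAPPING[release]
    except KeyError:
        raise ValueError("unknown release: %s" % release)
```

With this layout, a ToVer service pinned to ocata would read versions_in_use("ocata") and therefore speak the versions the FromVer services understand.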
Handling API versions
When the (newer) service is pinned, the maximum API version it supports will be the pinned version -- which the older service supports (as described above at API, RPC and object version pinning). The ironic-api service returns HTTP status code 406 for any requests with API versions that are higher than this maximum version.
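As a rough illustration of that check, the following sketch compares a requested version with the maximum version. The function names are hypothetical and this is not ironic's request-handling code. Versions are compared as integer tuples, because comparing the strings directly would rank "1.9" above "1.31".

```python
def parse_version(text):
    """Turn 'major.minor' into a tuple of ints so comparison is numeric."""
    major, minor = text.split(".")
    return int(major), int(minor)


def check_api_version(requested, max_supported):
    """Return an HTTP status code for a request's API version."""
    if parse_version(requested) > parse_version(max_supported):
        return 406  # Not Acceptable: newer than the (pinned) maximum
    return 200
```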
Handling RPC versions
ConductorAPI.__init__() sets the version_cap variable to the desired (latest or pinned) RPC API version and passes it to the RPCClient as an initialization parameter. This variable is then used to determine the maximum requested message version that the RPCClient can send.
Each RPC call can customize the request according to this version_cap. The Ironic RPC versions section below has more details about this.
Handling IronicObject versions
Internally, ironic services deal with IronicObjects in their latest versions. Only at these boundaries, when the IronicObject enters or leaves the service, do we deal with object versioning:
- getting objects from the database: convert to latest version
- saving objects to the database: if pinned, save in pinned version; else save in latest version
- serializing objects (to send over RPC): if pinned, send pinned version; else send latest version
- deserializing objects (receiving objects from RPC): convert to latest version
The ironic-api service also has to handle API requests/responses based on whether or how a feature is supported by the API version and object versions. For example, when the ironic-api service is pinned, it can only allow actions that are available to the object's pinned version, and cannot allow actions that are only available for the latest version of that object.
To support this:
- All the database tables (SQLAlchemy models) of the IronicObjects have a column named version. The value is the version of the object that is saved in the database.
- The method IronicObject.get_target_version() returns the target version. If pinned, the pinned version is returned. Otherwise, the latest version is returned.
- The method IronicObject.convert_to_version() converts the object into the target version. The target version may be a newer or older version than the existing version of the object. The bulk of the work is done in the helper method IronicObject._convert_to_version(). Subclasses that have new versions redefine this to perform the actual conversions.
In the following,
- The old release is FromVer; it uses version 1.14 of a Node object.
- The new release is ToVer. It uses version 1.15 of a Node object, which has a deprecated extra field and a new meta field that replaces extra.
- db_obj['meta'] and db_obj['extra'] are the database representations of those node fields.
Getting objects from the database (API/conductor <-- DB)
Both ironic-api and ironic-conductor services read values from the database. These values are converted to IronicObjects via the method IronicObject._from_db_object(). This method always returns the IronicObject in its latest version, even if it was in an older version in the database. This is done regardless of the service being pinned or not.
Note that if an object is converted to a later version, that IronicObject will retain any changes (in its _changed_fields field) resulting from that conversion. This is needed in case the object gets saved later, in the latest version.
For example, if the node in the database is in version 1.14 and has db_obj['extra'] set:
- a FromVer service will get a Node with node.extra = db_obj['extra'] (and no knowledge of node.meta since it doesn't exist)
- a ToVer service (pinned or unpinned) will get a Node with:
  - node.meta = db_obj['extra']
  - node.extra = None
  - node._changed_fields = ['meta', 'extra']
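The ToVer case above can be sketched with a toy class. Node and node_from_db here are stand-ins written for this example, not ironic's Node object or IronicObject._from_db_object(), and the database row is modelled as a plain dict.

```python
LATEST_NODE_VERSION = "1.15"


class Node:
    """Toy stand-in for a ToVer Node object (not ironic's class)."""

    def __init__(self):
        self.version = LATEST_NODE_VERSION
        self.meta = None
        self.extra = None
        self._changed_fields = []


def node_from_db(db_obj):
    """Build a latest-version Node from a DB row given as a dict."""
    node = Node()
    if db_obj["version"] == "1.14":
        # The 1.14 row only knows 'extra'; move its value to 'meta'.
        node.meta = db_obj["extra"]
        node.extra = None
        # Record the conversion so a later save writes both fields.
        node._changed_fields = ["meta", "extra"]
    else:
        node.meta = db_obj["meta"]
        node.extra = db_obj["extra"]
    return node


node = node_from_db({"version": "1.14", "extra": {"rack": "r1"}, "meta": None})
```

The returned object is already in version 1.15 even though the row was stored as 1.14, and its changed fields reflect the conversion.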
Saving objects to the database (API/conductor --> DB)
The version used for saving IronicObjects to the database is determined as follows:
- For an unpinned service, the object is saved in its latest version. Since objects are always in their latest version, no conversions are needed.
- For a pinned service, the object is saved in its pinned version. Since objects are always in their latest version, the object needs to be converted to the pinned version before being saved.
The method IronicObject.do_version_changes_for_db() handles this logic, returning a dictionary of changed fields and their new values (similar to the existing oslo.versionedobjects.VersionedObject.obj_get_changes()). Since we do not keep track internally of the database version of an object, the object's version field will always be part of these changes.
The Rolling upgrade process (at step 3.1) ensures that by the time an object can be saved in its latest version, all services are running the newer release (although some may still be pinned) and can handle the latest object versions.
An interesting situation can occur when the services are as described in step 3.1. It is possible for an IronicObject to be saved in a newer version and subsequently get saved in an older version. For example, a ToVer unpinned conductor might save a node in version 1.15. A subsequent request may cause a ToVer pinned conductor to replace and save the same node in version 1.14!
Sending objects via RPC (API/conductor -> RPC)
When a service makes an RPC request, any IronicObjects that are sent as part of that request are serialized into entities or primitives via IronicObjectSerializer.serialize_entity(). The version used for objects being serialized is as follows:
- For an unpinned service, the object is serialized to its latest version. Since objects are always in their latest version, no conversions are needed.
- For a pinned service, the object is serialized to its pinned version. Since objects are always in their latest version, the object is converted to the pinned version before being serialized. The converted object includes changes that resulted from the conversion; this is needed so that the service at the other end of the RPC request has the necessary information if that object will be saved to the database.
Receiving objects via RPC (API/conductor <- RPC)
When a service receives an RPC request, any entities that are part of the request need to be deserialized (via oslo.versionedobjects.VersionedObjectSerializer.deserialize_entity()). For entities that represent IronicObjects, we want the deserialization process (via IronicObjectSerializer._process_object()) to result in IronicObjects that are in their latest version, regardless of the version they were sent in and regardless of whether the receiving service is pinned or not. Again, any objects that are converted will retain the changes that resulted from the conversion, useful if that object is later saved to the database.
For example, a FromVer ironic-api could issue an update_node() RPC request with a node in version 1.14, where node.extra was changed (so node._changed_fields = ['extra']). This node will be serialized in version 1.14. The receiving ToVer pinned ironic-conductor deserializes it and converts it to version 1.15. The resulting node will have node.meta set (to the changed value from node.extra in v1.14), node.extra = None, and node._changed_fields = ['meta', 'extra'].
When developing a new feature or modifying an IronicObject
When adding a new feature or changing an IronicObject, the change needs to be coded so that things work during a rolling upgrade.
The following sections describe areas where the code may need to be changed, as well as some points to keep in mind when developing code.
ironic-api
During a rolling upgrade, the new, pinned ironic-api is talking to a new conductor that might also be pinned. There may also be old ironic-api services. So the new, pinned ironic-api service needs to act like it was the older service:
- New features should not be made available, unless they are somehow totally supported in the old and new releases. Pinning the API version is in place to handle this.
- If, for whatever reason, the API version pinning doesn't prevent a request from being handled that cannot or should not be handled, it should be coded so that the response has HTTP status code 406 (Not Acceptable). This is the same response to requests that have an incorrect (old) version specified.
Ironic RPC versions
When the signature (arguments) of an RPC method is changed or new methods are added, the following needs to be considered:
- The RPC version must be incremented and be the same value for both the client (ironic/conductor/rpcapi.py, used by ironic-api) and the server (ironic/conductor/manager.py, used by ironic-conductor). It should also be updated in ironic/common/release_mappings.py.
- Until there is a major version bump, new arguments of an RPC method can only be added as optional. Existing arguments cannot be removed or changed in incompatible ways with the method in older RPC versions.
- ironic-api (client-side) sets a version cap (by passing the version cap to the constructor of oslo_messaging.RPCClient). This "pinning" is in place during a rolling upgrade when the [DEFAULT]/pin_release_version configuration option is set.
- New RPC methods are not available when the service is pinned to the older release version. In this case, the corresponding REST API function should return a server error or implement alternative behaviours.
- Methods which change arguments should run client.can_send_version() to see if the version of the request is compatible with the version cap of the RPC Client. Otherwise the request needs to be created to work with a previous version that is supported.
- ironic-conductor (server-side) should tolerate older versions of requests in order to keep working during the rolling upgrade process. The behaviour of ironic-conductor will depend on the input parameters passed from the client-side.
- Old methods can be removed only after they are no longer used by a previous named release.
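The client-side pattern for a method whose arguments changed might look like the sketch below. CappedClient is a minimal stand-in written for this example so that it runs without oslo.messaging. Its compatibility rule (same major version, minor version no higher than the cap) is an assumption modelled on how a version cap is generally applied. The update_thing method, its new_flag argument and the version numbers are hypothetical.

```python
class CappedClient:
    """Minimal stand-in for an RPC client with a version cap."""

    def __init__(self, version_cap):
        self.version_cap = version_cap

    def can_send_version(self, version):
        cap_major, cap_minor = map(int, self.version_cap.split("."))
        major, minor = map(int, version.split("."))
        return major == cap_major and minor <= cap_minor


def build_update_request(client, node_id, new_flag=None):
    """Build a request for a hypothetical update_thing() RPC method.

    'new_flag' is an optional argument assumed to be added in RPC 1.42.
    """
    kwargs = {"node_id": node_id}
    if new_flag is not None and client.can_send_version("1.42"):
        version = "1.42"
        kwargs["new_flag"] = new_flag
    else:
        # Fall back to the older signature that every service understands.
        version = "1.41"
    return {"method": "update_thing", "version": version, "args": kwargs}
```

A client capped at 1.41 silently drops the new argument, which is only acceptable when the feature is optional; otherwise the API layer should reject the request as described above.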
Object versions
When subclasses of ironic.objects.base.IronicObject are modified, the following needs to be considered:
- Any change of fields or change in signature of remotable methods needs a bump of the object version. The object versions are also maintained in ironic/common/release_mappings.py.
- New objects must be added to ironic/common/release_mappings.py. Also, for the first releases they should be excluded from the version check by adding their class names to the NEW_MODELS list in ironic/cmd/dbsync.py.
- The arguments of remotable methods (methods which are remoted to the conductor via RPC) can only be added as optional. They cannot be removed or changed in an incompatible way (to the previous release).
- Field types cannot be changed. Instead, create a new field and deprecate the old one.
- There is a unit test that generates the hash of an object using its fields and the signatures of its remotable methods. Objects that have a version bump need to be updated in the expected_object_fingerprints dictionary; otherwise this test will fail. A failed test can also indicate to the developer that their change(s) to an object require a version bump.
- When new version objects communicate with old version objects and when reading or writing to the database, ironic.objects.base.IronicObject._convert_to_version() will be called to convert objects to the target version. Objects should implement their own ._convert_to_version() to remove or alter fields which were added or changed after the target version:

      def _convert_to_version(self, target_version,
                              remove_unavailable_fields=True):
          """Convert to the target version.

          Subclasses should redefine this method, to do the conversion of the
          object to the target version.

          Convert the object to the target version. The target version may be
          the same, older, or newer than the version of the object. This is
          used for DB interactions as well as for serialization/deserialization.

          The remove_unavailable_fields flag is used to distinguish these two
          cases:

          1) For serialization/deserialization, we need to remove the unavailable
             fields, because the service receiving the object may not know about
             these fields. remove_unavailable_fields is set to True in this case.

          2) For DB interactions, we need to set the unavailable fields to their
             appropriate values so that these fields are saved in the DB. (If
             they are not set, the VersionedObject magic will not know to
             save/update them to the DB.) remove_unavailable_fields is set to
             False in this case.

          :param target_version: the desired version of the object
          :param remove_unavailable_fields: True to remove fields that are
              unavailable in the target version; set this to True when
              (de)serializing. False to set the unavailable fields to appropriate
              values; set this to False for DB interactions.
          """

This method must handle:
- converting from an older version to a newer version
- converting from a newer version to an older version
- making sure, when converting, that you take into consideration other object fields that may have been affected by a field (value) only available in a newer version. For example, if field 'new' is only available in Node version 1.5 and Node.affected = Node.new+3, when converting to 1.4 (an older version), you may need to change the value of Node.affected too.
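To make the two directions and the remove_unavailable_fields flag concrete, here is a minimal sketch based on the extra/meta example used earlier in this document (Node 1.14 and 1.15). It operates on a plain dict rather than an IronicObject, and convert_node is a hypothetical helper, not ironic code.

```python
def convert_node(node, target_version, remove_unavailable_fields=True):
    """Convert a node (a plain dict here) between versions 1.14 and 1.15.

    Version 1.15 replaces the 'extra' field with 'meta'.
    """
    current = node["version"]
    if current == "1.14" and target_version == "1.15":
        # Older -> newer: carry the value over to the new field.
        node["meta"] = node.get("extra")
        node["extra"] = None
    elif current == "1.15" and target_version == "1.14":
        # Newer -> older: move the value back to the field 1.14 knows.
        node["extra"] = node.get("meta")
        if remove_unavailable_fields:
            # (De)serialization: the receiver does not know 'meta'.
            node.pop("meta", None)
        else:
            # DB interaction: set it explicitly so the column is updated.
            node["meta"] = None
    node["version"] = target_version
    return node


rpc_node = convert_node({"version": "1.15", "meta": "m", "extra": None}, "1.14")
db_node = convert_node({"version": "1.15", "meta": "m", "extra": None}, "1.14",
                       remove_unavailable_fields=False)
```

The two downgrade results differ only in whether meta is absent (for RPC) or explicitly None (for the database), which is the distinction the docstring draws.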
Online data migrations
The ironic-dbsync online_data_migrations command will perform online data migrations.
Keep in mind the Policy for changes to the DB model. Future incompatible changes in SQLAlchemy models, like removing or renaming columns and tables, can break rolling upgrades (when ironic services are run with different release versions simultaneously). It is forbidden to remove these database resources when they may still be used by the previous named release.
When creating new Alembic migrations which modify existing models, make sure that any new columns default to NULL. Test the migration out on a non-empty database to make sure that any new constraints don't cause the database to be locked out for normal operations.
You can find an overview of which DDL operations may cause downtime in https://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-overview.html. (You should also check older, widely deployed InnoDB versions for issues.) In the case of PostgreSQL, adding a foreign key may lock a whole table for writes.
Make sure to add a release note if there are any downtime-related concerns.
Backfilling default values, and migrating data between columns or between tables must be implemented inside an online migration script. A script is a database API method (added to ironic/db/api.py and ironic/db/sqlalchemy/api.py) which takes two arguments:
- context: an admin context
- max_count: this is used to limit the query. It is the maximum number of objects to migrate; >= 0. If zero, all the objects will be migrated.
It returns a two-tuple:
- the total number of objects that need to be migrated, at the start of the method, and
- the number of migrated objects.
In this method, the version column can be used to select and update old objects.
The method name should be added to the list of ONLINE_MIGRATIONS in ironic/cmd/dbsync.py.
The method should be removed in the next named release after this one.
After online data migrations are completed and the SQLAlchemy models no longer contain old fields, old columns can be removed from the database. This takes at least 3 releases, since we have to wait until the previous named release no longer contains references to the old schema. Before removing any resources from the database by modifying the schema, make sure that your implementation checks that all objects in the affected tables have been migrated. This check can be implemented using the version column.
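The contract of an online migration method (arguments, batching with max_count, and the returned two-tuple) can be sketched as follows. This is not ironic's DB API: the rows parameter is an extra, hypothetical argument standing in for the database so that the example is self-contained, and the extra-to-meta move reuses the Node 1.14/1.15 example from earlier in this document.

```python
def migrate_to_meta(context, max_count, rows):
    """Hypothetical online migration over an in-memory list of node rows.

    'rows' stands in for the database; a real method would query it
    through the DB API instead of taking it as an argument.

    :returns: (total needing migration at start, number migrated)
    """
    # The version column selects the rows that are still in the old format.
    old_rows = [r for r in rows if r["version"] != "1.15"]
    total = len(old_rows)
    # max_count == 0 means "migrate everything".
    batch = old_rows if max_count == 0 else old_rows[:max_count]
    for row in batch:
        row["meta"] = row["extra"]
        row["extra"] = None
        row["version"] = "1.15"
    return total, len(batch)
```

Calling the method repeatedly with a small max_count migrates the table in batches; a result of (0, 0) indicates that nothing is left to migrate.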
"ironic-dbsync upgrade" command
The ironic-dbsync upgrade command first checks that the versions of the objects are compatible with the (new) release of ironic, before it will make any DB schema changes. If one or more objects are not compatible, the upgrade will not be performed.
This check is done by comparing the objects' version field in the database with the expected (or supported) versions of these objects. The supported versions are the versions specified in ironic.common.release_mappings.RELEASE_MAPPING. The newly created tables cannot pass this check and thus have to be excluded by adding their object class names (e.g. Node) to ironic.cmd.dbsync.NEW_MODELS.
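The shape of this pre-upgrade check can be sketched as below. The data, the NewThing object and the function names are illustrative assumptions; the real command reads the supported versions from RELEASE_MAPPING and the stored versions from the database.

```python
# Illustrative data only; the real supported versions come from
# ironic.common.release_mappings.RELEASE_MAPPING.
SUPPORTED_VERSIONS = {
    "Node": {"1.14", "1.15"},
    "Port": {"1.6"},
    "NewThing": {"1.0"},  # hypothetical object added in this release
}
NEW_MODELS = ["NewThing"]  # its table does not exist before the upgrade


def find_incompatible(versions_in_db):
    """Return {object name: [unsupported versions]} for existing tables.

    'versions_in_db' maps an object name to the set of values found in
    that table's version column.
    """
    problems = {}
    for name, supported in SUPPORTED_VERSIONS.items():
        if name in NEW_MODELS:
            continue  # nothing to check: the table is created by the upgrade
        unsupported = versions_in_db.get(name, set()) - supported
        if unsupported:
            problems[name] = sorted(unsupported)
    return problems


def may_upgrade(versions_in_db):
    """The schema migration runs only if no incompatibility was found."""
    return not find_incompatible(versions_in_db)
```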