Konrad Gube 82a67d7e19 Re-propose extend volume completion action

Previously-approved: 2023.1, 2023.2, 2024.1

For previous discussion see the change at:
https://review.opendev.org/c/openstack/cinder-specs/+/866718

Blueprint: extend-volume-completion-action
Change-Id: I1cb685683d317afbbb82d6556046db731852572b

2024-06-20 11:04:49 +02:00

Add extend volume completion action

https://blueprints.launchpad.net/cinder/+spec/extend-volume-completion-action

This blueprint proposes a new volume action that Nova can use to notify Cinder of success or failure when handling volume-extended external server events. The new volume action is used to add support for extending attached volumes to the NFS, NetApp NFS, Powerstore NFS, and Quobyte volume drivers.

Problem description

Many remotefs-based volume drivers in Cinder use the qemu-img resize command to extend volume files. However, when the volume is attached to a guest, QEMU will lock the file and qemu-img will be unable to resize it.

In this case, only the QEMU process holding the lock can resize the volume, which can be triggered through the QEMU monitor command block-resize.

There is currently no adequate way for Cinder to use this feature, so the NFS, NetApp NFS, Powerstore NFS, and Quobyte volume drivers all disable extending attached volumes.

Use Cases

As a user, I want to extend an NFS/NetApp NFS/Powerstore NFS/Quobyte volume while it is attached to an instance, and I want the volume size and status to reflect the success or failure of the operation.

Proposed change

Nova's libvirt driver uses the block-resize command when handling the volume-extended external server event, to inform QEMU that the size of an attached volume has changed. It is in principle also capable of extending a volume file, but is currently unable to provide feedback to Cinder on the success of the operation.

Currently, Cinder will send the volume-extended external server event to Nova only after it has finalized the extend operation and reset the volume status from extending back to in-use.

This spec proposes to give volume drivers a mechanism to hold off finalizing the extend operation until after the volume-extended event has been sent and Cinder has received feedback from Nova that it was handled successfully.

This spec also proposes a new volume action that Nova will use to provide this feedback to Cinder.

API

A new API microversion is introduced, adding the new os-extend_volume_completion volume action.

The volume action takes a boolean error argument, indicating success or failure to extend the attached volume. It is intended to be used exclusively by Nova to notify Cinder, and an appropriate policy will be added to enforce this.

API.extend_volume_completion

The new volume action will be handled by a new method in the volume API:

def extend_volume_completion(self,
                             context: context.RequestContext,
                             volume: objects.Volume,
                             error: bool) -> None:

The new method expects the volume to have status extending, and to have the keys extend_reservations and extend_new_size in its admin metadata. The first should hold a list of quota reservations, and the second should contain an integer larger than the volume's current size, representing the new size after extending.

If these conditions are not met, then an InvalidVolume exception will be raised, resulting in an HTTP response of 400 Bad Request.

If the conditions are met, it will remove size and reservations from the admin metadata and call VolumeManager.extend_volume_completion() via RPC, passing both as arguments.
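
The precondition check described above can be sketched as follows. This is only an illustration under stated assumptions: FakeVolume and pop_extend_state are hypothetical names, InvalidVolume is a local stand-in for Cinder's exception, and the reservations are kept as a plain list because this spec does not say how they are serialized in the admin metadata.

```python
from dataclasses import dataclass, field


class InvalidVolume(Exception):
    """Local stand-in for Cinder's InvalidVolume exception."""


@dataclass
class FakeVolume:
    # Hypothetical stand-in for objects.Volume.
    status: str
    size: int
    admin_metadata: dict = field(default_factory=dict)


def pop_extend_state(volume):
    """Validate the volume and return (new_size, reservations)."""
    if volume.status != "extending":
        raise InvalidVolume("volume is not in status 'extending'")
    meta = volume.admin_metadata
    if "extend_new_size" not in meta or "extend_reservations" not in meta:
        raise InvalidVolume("no extend completion is pending")
    try:
        new_size = int(meta["extend_new_size"])
    except (TypeError, ValueError):
        raise InvalidVolume("extend_new_size is not an integer") from None
    if new_size <= volume.size:
        raise InvalidVolume("extend_new_size must exceed the current size")
    # Remove the keys only after every check has passed.
    reservations = meta.pop("extend_reservations")
    del meta["extend_new_size"]
    return new_size, reservations
```

In this sketch, any InvalidVolume raised would correspond to the 400 Bad Request response, and the returned pair is what would be passed on to the volume manager via RPC.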

VolumeManager.extend_volume_completion

def extend_volume_completion(self,
                             context: context.RequestContext,
                             volume: objects.Volume,
                             new_size: int,
                             reservations: list[str],
                             error: bool) -> None:

The behavior of this method depends heavily on the error argument:

- If error is True, the method will roll back the quota reservations, set the volume status to error_extending, and log the error.
- If error is False, it will finalize the quota reservation, update the size field of the volume to the new size, and reset the volume status to available or in-use, depending on the presence of attachments. It will also update the pool stats and send a resize.end notification with the new volume size.

This is identical to how VolumeManager.extend_volume() currently handles success and failure of the volume driver's extend_volume() method, except that this method will not notify Nova with the volume-extended external server event.
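
A minimal sketch of the two branches, assuming hypothetical stand-ins (FakeVolume, FakeQuota, complete_extend) rather than Cinder's real objects. Logging, the pool stats update, and the resize.end notification are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVolume:
    # Hypothetical stand-in for objects.Volume.
    status: str
    size: int
    attachments: list = field(default_factory=list)


class FakeQuota:
    """Hypothetical quota engine that records what it was asked to do."""

    def __init__(self):
        self.committed = []
        self.rolled_back = []

    def commit(self, reservations):
        self.committed.extend(reservations)

    def rollback(self, reservations):
        self.rolled_back.extend(reservations)


def complete_extend(volume, new_size, reservations, error, quota):
    """Finish an extend operation according to the error flag."""
    if error:
        quota.rollback(reservations)
        volume.status = "error_extending"
        return
    quota.commit(reservations)
    volume.size = new_size
    # The final status depends on whether the volume has attachments.
    volume.status = "in-use" if volume.attachments else "available"
```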

VolumeDriver.extend_volume

A mechanism will be introduced by which the driver's extend_volume() method can signal to the volume manager that it has to wait for a response from Nova before finishing the extend operation. This could take the form of a return value or a new exception that the volume manager will have to catch.

The NFS, NetApp NFS, Powerstore NFS, and Quobyte volume drivers currently have checks in their respective extend_volume methods that raise an exception if the volume to be resized is attached, causing the operation to fail. Those checks will be removed.

Instead, the drivers will catch any exceptions resulting from the volume files being locked (see the proposed change to nfs.py in¹ for an example of how to do that), and notify the volume manager that feedback from Nova is required.
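
Since the spec leaves the signalling mechanism open, the following is only a sketch of the exception-based variant. ExtendRequiresNova, FileLocked, and driver_extend_volume are hypothetical names, and resize_file stands in for whatever the driver uses to run qemu-img resize.

```python
class FileLocked(Exception):
    """Stand-in for the error a driver sees when the volume file is locked."""


class ExtendRequiresNova(Exception):
    """Hypothetical signal that the extend must be completed by Nova."""


def driver_extend_volume(resize_file, path, new_size):
    """Try an offline resize; ask for Nova's help if the file is locked."""
    try:
        resize_file(path, new_size)
    except FileLocked:
        # The file is held by a running QEMU process, so only Nova can
        # trigger the resize. Tell the volume manager to wait for feedback.
        raise ExtendRequiresNova(path) from None
```

A return value would work equally well. The exception variant is shown here only because it leaves the existing success path of extend_volume() unchanged.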

VolumeManager.extend_volume

The call to the volume driver's extend_volume() method will be handled as follows:

- If the call fails, extend_volume_completion will be called with error=True.
- If the call succeeds, but the volume is not attached, extend_volume_completion will be called with error=False.
- If the call succeeds, and the volume is attached, extend_volume_completion will be called with error=False and Nova will be notified with the external server event.

This matches the current inline behavior of the method, and covers offline extend for all drivers, as well as online extend for the drivers that previously supported it.

To support remotefs-based drivers that have to rely on Nova for online extend, two additional cases will be handled:

- If the driver notifies the volume manager that a response from Nova is required, but the volume is not attached, or the volume is attached to more than one instance, it will be handled as a failure and extend_volume_completion will be called with error=True.

  QEMU cannot resize shared volume files, because they are locked read-only, so adding multi-attach support for this feature is currently not worthwhile. However, support may be added later if other drivers require it, e.g. by enabling Cinder to handle multiple completion actions for the same volume.
- If the driver notifies the volume manager that a response from Nova is required, and the volume is attached to exactly one instance, then Cinder will store the quota reservations and the target size in the admin metadata with the keys extend_reservations and extend_new_size.

  It will then attempt to send the volume-extended external server event with the new Nova API microversion proposed in², making sure that Nova supports using the os-extend_volume_completion action.

  - If the volume-extended event has been submitted to Nova successfully, this method will just return normally. The volume will now be left in status extending, which will signal to Nova that it should respond with the os-extend_volume_completion action, as described in the Nova subsection.
  - If the volume-extended event could not be submitted, the operation will be rolled back by calling extend_volume_completion with error=True.

    This can happen if Nova doesn't support the required microversion yet, or if the external event API responded with an error code such as 403 or 404.
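
The cases in this subsection can be summarized as a small decision function. This is a sketch, not the proposed implementation: Outcome, decide, and the three driver_result labels are hypothetical, and submit_event stands in for sending the volume-extended event to Nova.

```python
from enum import Enum


class Outcome(Enum):
    COMPLETE_OK = "complete with error=False"
    COMPLETE_OK_AND_NOTIFY = "complete with error=False, then notify Nova"
    COMPLETE_ERROR = "complete with error=True"
    WAIT_FOR_NOVA = "stay in status extending until Nova responds"


def decide(driver_result, attachment_count, submit_event):
    """Map the driver's result onto the cases described above.

    driver_result is "ok", "failed", or "needs_nova" (hypothetical labels).
    submit_event is a callable returning True if Nova accepted the event.
    """
    if driver_result == "failed":
        return Outcome.COMPLETE_ERROR
    if driver_result == "ok":
        if attachment_count == 0:
            return Outcome.COMPLETE_OK
        return Outcome.COMPLETE_OK_AND_NOTIFY
    # The driver needs Nova; this only works with exactly one attachment.
    if attachment_count != 1:
        return Outcome.COMPLETE_ERROR
    # At this point Cinder would store extend_reservations and
    # extend_new_size in the admin metadata before sending the event.
    if submit_event():
        return Outcome.WAIT_FOR_NOVA
    return Outcome.COMPLETE_ERROR
```

Note that submit_event is only called in the single-attachment case, so no event is sent for operations that are already known to have failed.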

Visible Admin Metadata

extend_new_size has to be stored in the admin metadata, because the regular volume metadata is editable by users. A malicious user could otherwise edit the target size during the operation to bypass their quota.

Admin metadata of volumes is not visible to clients, but Cinder supports mapping select keys to the regular metadata, shadowing any user-set values of the same key.

The key extend_new_size will be added to the list of visible admin metadata in cinder/api/api_utils.py, so that Nova is able to read the target size of the extend operation.
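
The shadowing behavior can be illustrated with a short sketch. The function name and the one-element key tuple are assumptions made for this example; the actual list in cinder/api/api_utils.py contains other keys as well.

```python
# Assumption: illustrative subset of the visible admin metadata keys.
VISIBLE_ADMIN_METADATA_KEYS = ("extend_new_size",)


def visible_metadata(user_metadata, admin_metadata):
    """Return the metadata a client sees, with visible admin keys on top."""
    merged = dict(user_metadata)
    for key in VISIBLE_ADMIN_METADATA_KEYS:
        if key in admin_metadata:
            # The admin value shadows any user-set value of the same key.
            merged[key] = admin_metadata[key]
    return merged
```

With this merge order, a user who sets their own extend_new_size cannot influence the value Nova reads while the admin key is present.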

OpenStack SDK

Support for the new volume action will be added to the OpenStack SDK, which Nova will use to call it.

Nova

When the Nova API receives a volume-extended external server event, and the call used the new microversion proposed in³, it will check the target compute service version. If a target compute agent is too old to support the feature, the API will discard the event and call the os-extend_volume_completion volume action with "error": true.

Otherwise, the event will be forwarded to the compute agent. When handling the volume-extended external server event, compute will check the volume status:

- If the volume status is extending, then compute will attempt to read extend_new_size from the volume's metadata and use this value as the new size of the volume, instead of the volume size field.

  After successfully extending the volume, it will call the extend volume completion action of the volume, with "error": false.

  If anything goes wrong, including extend_new_size being missing from the metadata, or being smaller than the current size of the volume, compute will log the error and call the extend volume completion action with "error": true.
- For any other volume status the event will be handled as before.

The changes in Nova are detailed in the current version of the Nova spec at ⁴.
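
The compute-side handling described in this subsection might look roughly like the sketch below. It does not reflect Nova's actual code: handle_volume_extended is a hypothetical name, the volume is a plain dict, and resize and complete are callables standing in for the block resize and the os-extend_volume_completion call.

```python
def handle_volume_extended(volume, resize, complete):
    """Handle a volume-extended event for one attached volume.

    volume is a dict with "status", "size", and "metadata" keys.
    """
    if volume["status"] != "extending":
        # Previous behavior: the size field already holds the new size.
        resize(volume["size"])
        return
    try:
        new_size = int(volume["metadata"]["extend_new_size"])
        if new_size < volume["size"]:
            raise ValueError("target size is smaller than the current size")
        resize(new_size)
    except Exception:
        # Anything going wrong is reported back to Cinder as a failure.
        complete(error=True)
        return
    complete(error=False)
```

The success callback is made outside the try block so that a failure while reporting success is not followed by a second, contradictory callback.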

os-reset_status

When resetting from status extending, the os-reset_status volume action will check for the extend_reservations key in the admin metadata. If it finds quota reservation keys, it will try to roll them back.

This is done to avoid a pile-up of quota reservations in case communication between Cinder and Nova was lost and the status has to be reset to retry the resize.

The keys extend_reservations and extend_new_size will then be removed from the admin metadata.
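
A sketch of this cleanup, assuming a hypothetical helper cleanup_on_reset and a rollback callable that stands in for the quota engine:

```python
def cleanup_on_reset(old_status, admin_metadata, rollback):
    """Drop pending extend state when resetting from status extending."""
    if old_status != "extending":
        return
    reservations = admin_metadata.pop("extend_reservations", None)
    if reservations:
        try:
            rollback(reservations)
        except Exception:
            # Best effort: a failed rollback should not block the reset.
            pass
    admin_metadata.pop("extend_new_size", None)
```

Swallowing rollback errors is a design choice of this sketch, based on the wording that the action will "try" to roll the reservations back.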

Alternatives

- A previous change tried to use the volume-extended external server event to support online extend for the NFS driver⁵, but did not rely on feedback from Nova to Cinder at all. Instead, it would just set the new size of the volume, change the status back to in-use, notify Nova, and hope for the best.

  If anything went wrong on Nova's side, this would still result in a volume state indicating that the operation was successful, which is not acceptable.
- The specs at⁶ and⁷ proposed a new synchronous API in Nova that can be used to trigger an assisted resize operation. This API would provide a single mechanism to trigger the resize operation, communicate the new size to Nova, and get feedback on the success of the operation.

  The problem with a synchronous API is that RPC and API timeouts limit the maximum time an extend operation can take. For QEMU, this seemed to be acceptable, because storage preallocation is hard disabled for the block-resize command, and because all currently plausible file systems support sparse file operations.

  However, as reviewers in⁸ have pointed out, this may not be true for other volume or virt drivers that might require this API in the future. It would also break with the established pattern of asynchronous coordination between Nova and Cinder, which includes the assisted snapshot and volume migration features.
- Following this pattern, we could make the proposed API asynchronous and use a new callback in Cinder, similar to Nova's os-assisted-volume-snapshots API, which uses the os-update_snapshot_status snapshot action to provide feedback to Cinder.

  The function of the new Nova API would then just be to trigger the operation and to communicate the new size. The question is then whether that warrants adding a new API to Nova, since there are existing mechanisms that could be used for either.
- The existing mechanism for triggering the extend operation in Nova is, of course, the volume-extended external server event. Using it for this purpose, as this spec proposes, requires the target size to be transferred separately, because external server events only have a single text field that is freely usable, which for volume-extended is already used for the volume ID.

  Besides storing it in the admin metadata, as this spec proposes, there is also the option of updating the size field of the volume, as⁹ was essentially doing.

  This would require the volume size field to be reset on a failure. If an error response from Nova was lost, the volume would just keep the new size. We would need to extend os-reset_status to allow a size reset, or something similar to clean up volumes like this. This would be possible, but updating the size field only after the volume was successfully extended seems like a cleaner solution.
- We could also extend the external server event API to accept additional data for events, and use this to communicate the new size to Nova.

  This option was judged favorably by reviewers on the previous version of this spec¹⁰, but it would be a more complex change to the Nova API.

  However, if additional data fields become available in a future version of the external server event API, it would be a relatively minor change to use those instead of the volume metadata.

Data model impact

None

REST API impact

Starting with the new microversion, the POST /v3/{project_id}/volumes/{volume_id}/action API will accept request bodies of the following form:

{
    "os-extend_volume_completion": {
        "error": false
    }
}

with error indicating success or failure of the resize operation.

If the volume does not exist, the return code will be 404 Not Found.

If the volume status and admin metadata do not indicate that Cinder was waiting for an extend volume completion action, the return code will be 400 Bad Request.

Otherwise the return code will be 202 Accepted.

The new volume action is intended to only be used by Nova and will require the caller to have admin permissions.
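
For illustration, a request of this form could be built as shown below. This is a sketch under assumptions: build_completion_request is a hypothetical helper, the endpoint, token, and IDs are supplied by the caller, and the microversion is a parameter because its number is not stated in this spec. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_completion_request(endpoint, project_id, volume_id, token,
                             microversion, error):
    """Build (but do not send) an os-extend_volume_completion request."""
    url = f"{endpoint}/v3/{project_id}/volumes/{volume_id}/action"
    body = json.dumps({"os-extend_volume_completion": {"error": bool(error)}})
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,
        # The microversion that adds the action must be requested explicitly.
        "OpenStack-API-Version": f"volume {microversion}",
    }
    return urllib.request.Request(url, data=body.encode("utf-8"),
                                  headers=headers, method="POST")
```

In practice Nova would go through the OpenStack SDK rather than building raw HTTP requests; the sketch only shows what ends up on the wire.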

Security impact

None

Active/Active HA impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee: kgube

Work Items

Move extend completion code from VolumeManager.extend_volume to new method and add tests.
Create new volume action and add unit tests.
Add a new microversion for the new os-extend_volume_completion action.
Add OpenStack SDK support.
Add Nova support.
Update drivers to use the feature.
Adapt the devstack-plugin-nfs-tempest CI jobs to also test online volume extend.

Dependencies

Nova support of the callback¹¹.

Testing

Unit tests for the volume action will test the conditions for all possible API responses.
Unit tests for VolumeManager.extend_volume will test all the code paths described in VolumeManager.extend_volume.
The new volume action cannot be independently tested by Tempest, because it requires the volume to be in a state that cannot be reproduced externally. It is, however, covered by the existing tests for online volume extend when they are run with one of the volume drivers that use this feature. The devstack-plugin-nfs-tempest jobs that run as part of the Cinder and Nova CI gates will be configured to enable online volume extend tests.

Documentation Impact

The Block Storage API reference will be updated to include the new volume action.

The volume driver support matrix will be updated to show online resize support for the affected drivers.
