Temporary Resource Tracking
https://blueprints.launchpad.net/cinder/+spec/temp-resources
Improve Cinder's temporary resource tracking to prevent related quota issues.
Problem description
Cinder doesn't currently have a consistent way of tracking temporary resources, which leads to quota bugs.
In some cases temporary volumes are marked with the temporary key in the admin metadata table, in other cases we determine that a volume is temporary based on its migration_status field, and there are even cases where volumes are not marked as temporary at all.
This roundabout way of marking temporary volumes, with multiple mechanisms in play, makes the Cinder code error prone, as is clear from the number of bugs around it.
As for temporary snapshots, Cinder doesn't currently have any way of reliably tracking them, so the code creating temporary resources assumes that everything will run smoothly and that the deletion code in the method will be called after the operation completes successfully. Sometimes that is not true: the operation can fail and leave the temporary resource behind, forcing users to delete it manually, which throws the quota out of sync, since the REST API delete call doesn't know it shouldn't touch the quota.
When we say that we don't have a reliable way of tracking snapshots, we refer to the fact that even though temporary snapshots have names that help identify them, such as "[revert] volume %s backup snapshot" and "backup-snap-%s", these are also valid snapshot names that a user can assign, so we cannot rely on them to differentiate temporary snapshots.
Use Cases
There are several cases where this feature will be useful:
- Revert to snapshot is configured to use a temporary snapshot, but either the revert fails or the deletion of the temporary snapshot fails, so the user ends up manually deleting the snapshot and wants the quota to be kept in sync with reality.
- Creating a backup of an in-use volume when backup_use_temp_snapshot is enabled fails, or the deletion of the temporary resource fails, forcing the user to manually delete the snapshot, and the user wants the quota to be kept in sync with reality.
- A driver may have some slow code that gets triggered when cloning or creating a snapshot for performance reasons, but that would not be reasonable to execute for temporary volumes. An example would be the flattening of cloned volumes on the RBD driver.
Proposed change
The proposed solution is to have an explicit DB field that indicates whether a resource should be counted towards quota or not.
The field would be named use_quota and it would be added to the volumes and snapshots DB tables. We currently don't have temporary backups, so no field would be added to the backups DB table.
This would replace the temporary admin metadata entry and the migration_status based checks over 2 cycles, since we need to keep supporting rolling upgrades where we could be running code that doesn't know about the new use_quota field.
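As a rough, self-contained illustration of where the new field would live, the change amounts to a single boolean column on each table. This is a toy SQLAlchemy model written for this spec, not Cinder's actual models module:

    # Toy sketch of the new column; Cinder's real models have many more
    # columns and use the project's own base classes.
    from sqlalchemy import Boolean, Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()


    class Volume(Base):
        __tablename__ = 'volumes'
        id = Column(String(36), primary_key=True)
        project_id = Column(String(255))
        size = Column(Integer)
        # True for normal user-facing volumes, False for temporary volumes
        # that must not be counted towards the project's quota.
        use_quota = Column(Boolean, default=True)


    class Snapshot(Base):
        __tablename__ = 'snapshots'
        id = Column(String(36), primary_key=True)
        volume_id = Column(String(36))
        use_quota = Column(Boolean, default=True)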
Alternatives
An alternative solution would be to use the temporary key in the volumes' admin metadata table, as we are already doing in some cases, and create one such table for snapshots as well.
With that alternative, DB queries could become more complex, unlike with the proposed solution, where they would become simpler, as the sketch below illustrates.
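To make the difference concrete, here is a hedged comparison reusing the toy models from the previous sketch plus a made-up VolumeAdminMetadata table; these are illustrative queries, not Cinder's actual DB API code:

    from sqlalchemy import and_, or_, true, Column, String
    from sqlalchemy.orm import Session


    class VolumeAdminMetadata(Base):  # Base and Volume come from the sketch above
        __tablename__ = 'volume_admin_metadata'
        id = Column(String(36), primary_key=True)
        volume_id = Column(String(36))
        key = Column(String(255))
        value = Column(String(255))


    def quota_volumes_proposed(session: Session):
        # Proposed solution: a single filter on the new column.
        return session.query(Volume).filter(Volume.use_quota == true())


    def quota_volumes_alternative(session: Session):
        # Alternative: outer join against the admin metadata table and
        # exclude volumes explicitly marked with the 'temporary' key.
        return (
            session.query(Volume)
            .outerjoin(
                VolumeAdminMetadata,
                and_(VolumeAdminMetadata.volume_id == Volume.id,
                     VolumeAdminMetadata.key == 'temporary'),
            )
            .filter(or_(VolumeAdminMetadata.value.is_(None),
                        VolumeAdminMetadata.value != 'True'))
        )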
Data model impact
Adds a use_quota DB field of type Boolean to both the volumes and snapshots tables.
There will be an online data migration to set the use_quota field for existing volumes, as well as an updated save method for the Volume and Snapshot OVOs that sets this field whenever they are saved.
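A minimal sketch of the kind of online data migration described above, assuming a hypothetical get_session helper and Cinder's models module; the exact rules used to flag pre-existing temporary volumes belong to the implementation patches, not this spec:

    def volume_use_quota_online_data_migration(context, max_count):
        """Backfill use_quota on rows created before the column existed."""
        session = get_session()  # hypothetical helper
        with session.begin():
            volumes = (session.query(models.Volume)
                       .filter(models.Volume.use_quota.is_(None))
                       .limit(max_count)
                       .all())
            for volume in volumes:
                # Old-style detection: a migration_status marking the volume
                # as a migration target (the real migration would also
                # consult the 'temporary' admin metadata key).
                is_temp = (volume.migration_status or '').startswith('target:')
                volume.use_quota = not is_temp
        # Illustrative return value: (rows found, rows migrated).
        return len(volumes), len(volumes)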
REST API impact
There won't be any new REST API endpoint, since the use_quota field is an internal field and we don't want users or administrators modifying it.
But since this is useful information, we will add this field to the volume and snapshot JSON responses for all endpoints that return them, although with a more user oriented name, consumes_quota (see the sketch after this list):
- Create volume
- Show volume
- Update volume
- List detailed volumes
- Create snapshot
- Show snapshot
- Update snapshot
- List detailed snapshots
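A hedged sketch of how a response view could expose the internal value under the user-facing name; the function and the selection of keys are illustrative, not the actual cinder.api.views code:

    def _volume_detail_view(volume):
        # Only a few representative keys are shown here.
        return {
            'id': volume['id'],
            'name': volume['display_name'],
            'status': volume['status'],
            'size': volume['size'],
            # The internal use_quota DB field is exposed to users under the
            # more user oriented name consumes_quota.
            'consumes_quota': volume['use_quota'],
        }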
Security impact
None.
Active/Active HA impact
None, since this mostly just affects whether quota code is called or not when receiving REST API delete requests.
Notifications impact
None.
Other end user impact
The change requires a patch on python-cinderclient to show the new returned field consumes_quota.
Performance Impact
There should be no performance detriment with this change, since the field would be added at creation time and would not require additional DB queries.
Moreover, performance improvements should be possible in the future once we remove the compatibility code for the current temporary volume checks, for example no longer requiring writes to the admin metadata table, making quota sync calculations directly on the DB, etc.
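For example, once the field is authoritative, a quota sync style calculation could in principle be pushed down to the DB with a single filtered aggregate. This is a hedged sketch against the toy Volume model from the Proposed change section, not existing Cinder code:

    from sqlalchemy import func, true
    from sqlalchemy.orm import Session


    def volume_data_get_for_project(session: Session, project_id: str):
        """Return (volume_count, total_gigabytes) counting only quota users."""
        count, total_size = (
            session.query(func.count(Volume.id),
                          func.coalesce(func.sum(Volume.size), 0))
            .filter(Volume.project_id == project_id)
            .filter(Volume.use_quota == true())
            .one()
        )
        return count, total_size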
Other deployer impact
None.
Developer impact
By default, Volume and Snapshot OVOs will use quota on creation (use_quota set to True). When developers want to create temporary resources that don't consume quota on creation or release it on deletion, they will need to pass use_quota=False at creation time.
Also, when doing quota operations (adding or removing), new code will have to check this field on Volumes and Snapshots.
It will no longer be necessary to add additional admin metadata or check the migration_status, which should make coding easier and reduce the number of related bugs.
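A hedged illustration of the developer-facing pattern; the objects namespace mirrors Cinder's OVOs, but the helper functions and the snapshot name are made up for this example:

    def _create_temp_snapshot(context, volume):
        snapshot = objects.Snapshot(
            context=context,
            volume_id=volume.id,
            status='creating',
            display_name='temp snapshot for %s' % volume.id,  # illustrative
            # Temporary resources are created with use_quota=False so they
            # never touch the quota on creation or deletion.
            use_quota=False,
        )
        snapshot.create()
        return snapshot


    def _delete_snapshot(context, snapshot):
        do_quota = snapshot.use_quota       # new code checks the explicit field
        _driver_delete_snapshot(snapshot)   # hypothetical driver call
        snapshot.destroy()
        if do_quota:
            _release_snapshot_quota(context, snapshot)  # hypothetical helper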
Implementation
Assignee(s)
- Primary assignee: Gorka Eguileor (geguileo)
Work Items
- DB schema changes.
- DB online migration and OVO changes.
- Update existing operations that mark volumes as temporary to use the new use_quota field.
- Update operations that are not currently marking resources as temporary to do so with the new use_quota field.
- REST API changes to return the use_quota field as consumes_quota.
- Cinderclient changes.
Dependencies
None.
Testing
No new tempest tests will be added, since the cases we want to fix are mostly error situations that we cannot force in tempest.
Unit tests will be provided as with any other patch.
Documentation Impact
The API reference documentation will be updated.
References
Proposed Cinder code implementation:
- https://review.opendev.org/c/openstack/cinder/+/786385
- https://review.opendev.org/c/openstack/cinder/+/786386
Proposed python-cinderclient code implementation:
Proposed code to leverage this new functionality in the RBD driver to not flatten temporary resources: