Provisioning Improvements
Spec to add support for standardized over-provisioning calculations and make all drivers' status reports consistent between different backends. Change-Id: Ie709333d75fb8c1c4693eae7d09d7f4925dc3fca

specs/queens/provisioning-improvements.rst (new file, 490 lines)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Provisioning Improvements
=========================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/cinder/+spec/provisioning-improvements

Cinder provisioning is still a source of pain for everyone: end users, admins,
and developers. Multiple factors have contributed to our current situation,
such as the information being dispersed across different specs, the developer
reference, and even the code, but we also have cases of misinterpreted
documentation, documentation that has not kept up with the project's
evolution, and even misleading or incorrect documentation.

This spec will build on those that preceded it on the same topic [1]_ [2]_ and
others related to the subject [3]_ to bring a consolidated and updated view on
the matter, as well as add some minor improvements and fix some issues, in
hopes that we can provide a better experience for all involved.


Problem description
===================

Our current situation is quite chaotic: we have volume creations that fail,
based on an incorrect capacity calculation, when they would have succeeded if
we had just received an update on the backend's stats; we have drivers that
are reporting incorrect data in their stats; and we have volumes that cannot
be created when they should be allowed to.

Before going any further, we first need to define the terms we'll be using to
ensure they hold the same meaning for all of us, as this is the source of
some of our current issues.

Disagreement on the mapping of these terms and their descriptions is
understandable, but for the sake of understanding each other we'll hold the
descriptions below as true, since they were defined as such in our specs and
most of our code.

Improvements to the words used for the terms and field names can be discussed
at another time and updated in the specs, documentation, and code accordingly.

For the sake of completeness, and to remove any misunderstandings that could
lead to different implementations in the drivers, which we currently have, the
descriptions will also include some clarifications and examples that may
reference current Cinder code.

Terminology
-----------

GB:
  Even though this is formally known as the symbol for the gigabyte (a
  decimal unit of measurement), we will be using it throughout the spec and
  our code as the symbol for the gibibyte (a binary unit of measurement as
  defined by the International Electrotechnical Commission (IEC), with symbol
  GiB), so when we talk about 1GB we are talking about 1024MB, and the same
  applies to TB and MB.

Total capacity:
  It is the total physical capacity that would be available in the storage
  array's pool being used by Cinder if no volumes were present.

  This is currently being reported by the drivers as `total_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space but the pool used by Cinder is
  limited to 1TB, then the driver should be reporting a `total_capacity_gb`
  of 1024GB.

Volume size:
  It is the maximum physical size that a volume can take in the storage
  array.

  This is referenced throughout the code as `volume_size`.

  For a thick volume the `volume_size` will be the same as the free capacity
  we lost when it was provisioned, whereas for a thin volume it will be
  greater than the space used by the volume in the storage array until the
  volume gets completely full.

Free capacity:
  It is the current physical capacity available in the storage array's pool
  being used by Cinder. The number and volume sizes of the thin and thick
  volumes that have been provisioned by Cinder or directly in the storage
  array are irrelevant here.

  This is currently being reported by the drivers as `free_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space with a total of 3TB available for all
  its pools, but Cinder is using a pool that has a limit of 1TB of which it
  has already used 400GB, and someone has manually created volumes outside of
  Cinder that are currently using 124GB of space, then the driver should be
  reporting a `free_capacity_gb` of 500GB (1TB = 1024GB = 400GB + 124GB +
  500GB).

Provisioned capacity:
  The amount of capacity that would be used in the storage array's pool being
  used by Cinder if all the volumes present there were completely full.

  This is currently being reported by the drivers as
  `provisioned_capacity_gb` and, as the name indicates, should be reported in
  GB and with a precision no greater than 2 decimals. This is a required
  field and *must always be present*.

  This includes not only volumes created by Cinder but also all other
  existing volumes in that backend, but *does not include snapshots*.

  Let's expand the earlier example from "free capacity", where 524GB of the
  available 1TB had already been used, and say that the 124GB that were
  externally created were all used by 1GB thick volumes, and that Cinder was
  using the 400GB with 400 thick volumes of 1GB and 20 empty thin volumes of
  20GB each. In this situation our reported `provisioned_capacity_gb` value
  should be 924GB ((124 * 1GB) + (400 * 1GB) + (20 * 20GB)).

  If a driver does not report the `provisioned_capacity_gb` data we'll use
  the automatically calculated `allocated_capacity_gb` as described below.

Allocated capacity:
  Contrary to what the name may suggest, this is not referring to the
  "allocated" space on the storage array, but to the provisioned volumes
  created by this specific Cinder Volume backend process on the storage
  array's pool being used by Cinder and that are still present.

  It is important to notice that this refers to a specific service backend,
  so if you are running a multi-backend Cinder service or multiple Cinder
  Volume services where you have more than one backend configured to use the
  same storage array's pool, then each one of these backends will only be
  reporting the sum of the `volume_size` of the volumes it created, and not
  the sum of the `volume_size` of all the volumes that have been created by a
  Cinder service.

  This is currently being reported by the Volume service as
  `allocated_capacity_gb` and, as the name indicates, should be reported in
  GB.

  If two volumes had been created, one thick and one thin, each one of 1GB,
  then you'll be reporting 2GB as `allocated_capacity_gb`; but if you were to
  unmanage one of those volumes then you would only be reporting 1GB, even if
  the volume is still there and will still be counted in the
  `provisioned_capacity_gb`.

  This field is calculated directly by the Cinder core code, and drivers
  should not calculate or report this information in their
  `get_volume_stats` method.

Over subscription ratio:
  It is the maximum ratio between the "provisioned capacity" and the "total
  capacity", represented as a real number. A ratio of 1.0 means that the
  "provisioned capacity" cannot exceed the "total capacity", whereas a value
  of 5.0 means that the Cinder backend is allowed to create as much as 5
  times the "total capacity" of the storage array's pool in volumes.

  This will only have an effect when a thin provisioned volume is being
  created, and will be ignored for thick provisioned volumes.

  This is currently being reported by the drivers as
  `max_over_subscription_ratio`, with a value greater than or equal to 1.0,
  preferably with no more than 2 decimals of precision.

  This value is optional, and when missing from the driver's status report
  the value defined in the `[DEFAULT]` section on the Cinder scheduler
  receiving the request will be used. So vendors should make sure that they
  are correctly returning this value in their drivers if they support thin
  provisioning, and admins should make sure they have a consistent default
  value for `max_over_subscription_ratio` across all scheduler nodes.

  Note that this ratio is per backend or per pool, depending on the driver
  implementation.

Reserved percentage:
  Represents the percentage of the storage array's "total capacity" that is
  reserved and should not be used for calculations. It is represented by an
  integer value going from 0 up to 100.

  This is currently being reported by the drivers as `reserved_percentage`,
  and, consistently with the above definition, it should be an integer value.

  The default value is 0 if the field is missing in the status report from
  the backend or if the user has not defined it in the backend's Cinder
  configuration. This is per backend or per pool, depending on the driver
  implementation.

Provisioning support:
  Cinder backends may support up to two different types of provisioning,
  *thin* and *thick*, and drivers are expected to report being capable of at
  least one of them in their capabilities report.

  The way to report support for these is by setting the boolean fields
  `thin_provisioning_support` and/or `thick_provisioning_support` to true.
  Non-reported provisioning types will default to false.

  A Cinder backend may support both provisioning types at the same time.

Volume provisioning type:
  For Cinder backends that only support one of the provisioning types, all
  volumes created on them will be of that type, and we can use the volume
  type's extra specs to make the scheduler filter out backends that don't
  support a specific provisioning type:

  - 'thin_provisioning_support': '<is> True' or '<is> False'
  - 'thick_provisioning_support': '<is> True' or '<is> False'

  But if our deployment is using a backend that supports both provisioning
  types simultaneously, we need to be explicit about the type of provisioning
  we want for a volume, using the volume type's extra spec
  `provisioning:type` and setting it to `thin` or `thick`.

  If no `provisioning:type` is defined for a volume it will default to thin
  if the backend is capable of it, and the driver is expected to honor this
  assumption.

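Putting the capacity fields together, a driver's stats report for the pool in
the running example could look like the following sketch (the capacity values
come from the examples above; `vendor_name` and the ratio and reservation
values are merely illustrative)::

    stats = {
        'vendor_name': 'Example Vendor',       # illustrative
        'total_capacity_gb': 1024,             # the 1TB pool
        'free_capacity_gb': 500,               # 1024 - 400 - 124
        'provisioned_capacity_gb': 924,        # 124 + 400 + (20 * 20)
        'max_over_subscription_ratio': 20.0,   # illustrative
        'reserved_percentage': 5,              # illustrative
        'thin_provisioning_support': True,
        'thick_provisioning_support': True,
        # allocated_capacity_gb is deliberately absent: the core code
        # computes it from the volumes this backend created (400GB here).
    }

And, as an illustration of the `provisioning:type` extra spec, a volume type
requesting thick volumes could be created as follows (the type name is
hypothetical and the commands assume the standard cinder client)::

    cinder type-create thick-volumes
    cinder type-key thick-volumes set provisioning:type=thick
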
Incorrect reports
-----------------

Given the above terms, which were originally defined in their corresponding
specs, even if there may be additional comments in this one, we can determine
that there are a good number of Cinder drivers that do not follow these
definitions and are reporting what would be incorrect values.

Reporting incorrect values means that on a heterogeneous cloud you'll have
inconsistent scheduling, and an admin will not be able to make sense of the
volume stats.

To illustrate this, here are some of the interpretations we can see across
different drivers for the `provisioned_capacity_gb`:

* Sum of all the volumes' max sizes, which is correct.
* Sum of all the volumes' physical disk usage, which is wrong.
* Sum of the Cinder volumes' physical disk usage, which is wrong.

And something similar happens with the `allocated_capacity_gb`, where drivers
go and report the value directly instead of letting the Cinder core code take
care of it. Drivers have been known to report here the following information:

* Sum of the Cinder volumes' max sizes, which is correct.
* Sum of the physical disk usage, which is wrong.
* Sum of all the volumes' max sizes, which is wrong.

Provisioning calculations
-------------------------

Some of the creation failures are based on the `provisioned_capacity_gb` value
being wrong, but there are other cases where Cinder's calculations for over
provisioning do not match the industry's standard definition, which creates
confusion and undesired behavior for some admins.

The standard provisioning calculation to check whether a volume of
`volume_size` fits is::

    ((provisioned_capacity_gb + volume_size) <=
     (total_capacity_gb
      x (1 - (reserved_percentage / 100.0))
      x max_over_subscription_ratio))

Whereas the Cinder calculations, which were agreed on as the best calculations
for being considered safer, are::

    (volume_size <=
     (free_capacity_gb
      - (total_capacity_gb x reserved_percentage / 100.0))
     x max_over_subscription_ratio)

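To make the difference concrete, here is a minimal sketch of both checks; it
only restates the two formulas above and is not the actual `CapacityFilter`
code::

    def fits_standard(volume_size, total, provisioned, reserved_pct, mosr):
        # Standard: virtual (provisioned) capacity must stay below the
        # usable total capacity scaled by the over subscription ratio.
        usable = total * (1 - reserved_pct / 100.0)
        return provisioned + volume_size <= usable * mosr

    def fits_cinder(volume_size, total, free, reserved_pct, mosr):
        # Cinder: the new volume must fit in the remaining free (physical)
        # space, minus the reservation, scaled by the ratio.
        adjusted_free = free - (total * reserved_pct / 100.0)
        return volume_size <= adjusted_free * mosr

With the running example (1024GB total, 500GB free, 924GB provisioned), a
ratio of 2.0, and no reservation, an 1100GB volume passes the standard check
(2024 <= 2048) but fails Cinder's (1100 > 1000).
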
Calculating max over subscription ratio
---------------------------------------

Most deployments have very dynamic workloads, each with different physical
storage requirements, which means that one month we may require many volumes
of which we barely use any space, and the next month we may require fewer
volumes but use most of the provisioned capacity.

This makes it almost impossible to accurately model our storage requirements
at deployment time, which is precisely when we have to set the
`max_over_subscription_ratio` for our Cinder backends.

As requirements change, one option would be to change the configuration and
restart our Cinder Volume services, but since Cinder is also in the data path
the restart may take a long time and will have a considerable impact on our
cloud users.

Not being able to determine the best `max_over_subscription_ratio` beforehand,
and not being able to easily restart the Cinder service, is a common pain that
most operators have with backends supporting thin provisioning.

Use Cases
=========

The basic case for fixing the status reports is that we would like to have
consistent reporting from our backends for the admins to see in the logs and
for the scheduler to use.

Automatic ratio calculation will help any operator using thin provisioned
storage who wants to optimize their storage usage and dynamically adjust to
the changing requirements of their cloud.

As for the alternative calculations, they would greatly benefit any backend
that is close to its full capacity, or one that is creating huge volumes that
usually never get filled.

Proposed change
===============

Incorrect reports
-----------------

Since we have consolidated all the documentation in one place, this spec,
where we clearly state the expected driver behavior, all driver maintainers
will be urged to make their drivers compliant with it. This will mean
adapting their drivers to follow this document's definition of
`provisioned_capacity_gb` and to stop reporting the `allocated_capacity_gb`
field.

Automatic over subscription ratio calculation
---------------------------------------------

To allow automatic over subscription ratio calculation, we will add a new
configuration option named `auto_max_over_subscription_ratio` that will
instruct Cinder to use the configured `max_over_subscription_ratio` as a
starting reference when the backend is empty and then, once there is data,
calculate the current value on each driver stats report with the following
formula::

    adjusted_total = total_capacity_gb x (1 - (reserved_percentage / 100.0))
    ratio = provisioned_capacity_gb / (adjusted_total - free_capacity_gb)

If the driver is not reporting `provisioned_capacity_gb` then we'll proceed to
use the `allocated_capacity_gb` instead::

    adjusted_total = total_capacity_gb x (1 - (reserved_percentage / 100.0))
    ratio = allocated_capacity_gb / (adjusted_total - free_capacity_gb)

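A sketch of that calculation (the helper name is hypothetical; the real code
would live in the stats-handling path of Cinder's core code)::

    def auto_max_over_subscription_ratio(stats, configured_ratio):
        # Usable capacity after removing the reservation.
        adjusted_total = stats['total_capacity_gb'] * (
            1 - stats.get('reserved_percentage', 0) / 100.0)
        # Physical space currently in use in the pool.
        used = adjusted_total - stats['free_capacity_gb']
        if used <= 0:
            # Empty backend: use the configured value as the starting point.
            return configured_ratio
        # Prefer provisioned_capacity_gb; fall back to allocated_capacity_gb.
        provisioned = stats.get('provisioned_capacity_gb',
                                stats.get('allocated_capacity_gb', 0))
        return provisioned / used
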
This new configuration option will be independent of the drivers and will be
part of Cinder's core code, so if `auto_max_over_subscription_ratio` is not
defined, or is set to `False`, then Cinder will continue behaving as it does
now (returning the `max_over_subscription_ratio` that is reported by the
driver, or the one configured by default if not present). But if it is set to
`True` it will always return the calculated ratio as explained.

There are a couple of drivers that are already doing this, Pure and
Kaminario's K2, but with different configuration options,
`pure_automatic_max_oversubscription_ratio` and
`auto_calc_max_oversubscription_ratio` respectively, so we'll deprecate those
configuration options and remove the code within those drivers when we add
the generic code to Cinder.

Provisioning calculations
-------------------------

Instead of continuing to fight with admins and developers over which of the
approaches is best (standard calculation or Cinder's), we will be adding a
new configuration option called `over_provisioning_calculation`, which will
take the values `standard` and `cinder`, defaulting to `cinder` for backward
compatibility, and which will be used by the `CapacityFilter` to determine
which of the mechanisms to use.

This configuration option will also affect the `CapacityWeigher`, as it will
need to do the free space calculation according to the standard definition as
well.

As one can assume, thick provisioning will see no modifications to its
behavior.

Alternatives
------------

* Don't support standard over-provisioning calculations.
* Instead of modifying `CapacityFilter` and `CapacityWeigher`, create 2 new
  classes.
* Instead of adding the `over_provisioning_calculation` configuration option,
  make the filter use the options JSON file provided by
  `scheduler_json_config_location`. This data seems to be currently missing
  on some of the operations, like migrate and extend, so that would need to
  be changed.

Data model impact
-----------------

N/A

REST API impact
---------------

The only affected API will be the `get_pools` API, which will be able to
return the 2 new fields, `total_used_capacity_gb` and
`cinder_used_capacity_gb`, when they are being reported by the driver's
`get_volume_stats` method. The fields will not be present if the drivers are
not reporting them.

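For illustration, and assuming the fields report the pool's total physical
usage and the physical usage of Cinder-created volumes respectively (the spec
does not define them further), a pool entry in a detailed `get_pools` response
could look like this fragment, with other capability keys elided::

    {
        "name": "backend1@driver1#pool1",
        "capabilities": {
            "total_capacity_gb": 1024,
            "free_capacity_gb": 500,
            "provisioned_capacity_gb": 924,
            "total_used_capacity_gb": 524,
            "cinder_used_capacity_gb": 400
        }
    }
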
Security impact
---------------

N/A

Notifications impact
--------------------

N/A

Other end user impact
---------------------

The user may see the new fields when calling cinderclient's `get_pools`.

Performance Impact
------------------

Depending on the driver and the storage array, performance could increase or
decrease, since getting provisioned sizes instead of physical sizes could be
faster or slower.

Other deployer impact
---------------------

With the change in values returned by Cinder backends for
`allocated_capacity_gb` and `provisioned_capacity_gb`, we may experience
failures when creating volumes until we correct the values of
`reserved_percentage` and `max_over_subscription_ratio` in our cloud, since
we may have been using incorrect ones.

Two new configuration options will be added (see the sample configuration
after this list):

* `auto_max_over_subscription_ratio`: Boolean value that will instruct Cinder
  to automatically calculate the over subscription ratio based on current
  usage instead of using a fixed value.

* `over_provisioning_calculation`: Will allow selecting what kind of
  calculation the `CapacityFilter` does to determine if there is space for a
  volume in a backend. Acceptable values are `standard` and `cinder`. The
  default value will be `cinder`.

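For illustration, enabling both new behaviors could look like this in
`cinder.conf` (the backend section name is hypothetical, and the placement of
the options is an assumption)::

    [DEFAULT]
    over_provisioning_calculation = standard

    [backend-1]
    volume_backend_name = backend-1
    auto_max_over_subscription_ratio = True
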
Developer impact
----------------

Driver maintainers will need to verify, and fix if necessary, their stats
reports for `allocated_capacity_gb` and `provisioned_capacity_gb`, unless
they start using the new `auto_max_over_subscription_ratio` configuration
option.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  None

Other contributors:
  None

Work Items
----------

* File bugs for drivers that are not in compliance.
* Fix drivers' stats reporting.
* Add support for the 3 new fields, `provisioned_capacity_precision`,
  `total_used_capacity_gb`, and `cinder_used_capacity_gb`, in the scheduler,
  the `get_pools` API, and the client.
* Modify `CapacityFilter` to support the standard over-provisioning
  calculation.
* Modify `CapacityWeigher` to support standard over-provisioning calculations.
* Add to the volume manager the estimation mechanism for drivers that don't
  report `provisioned_capacity_gb`.
* Update all the developer reference docs to ensure that there is no more
  confusion about what the report stats need to return, and make sure that
  the wiki page on how to contribute a driver links to that documentation,
  explaining the importance of following it when writing the driver.

Dependencies
============

N/A

Testing
=======

New unit tests will be added to test the changed code.

Documentation Impact
====================

Since our current documentation is lacking in this respect, we will add to it
and update it to reflect what's expected of the drivers in the stats reports.

End user documentation should also be updated.

References
==========

.. [1] https://specs.openstack.org/openstack/cinder-specs/specs/kilo/over-subscription-in-thin-provisioning.html
.. [2] https://specs.openstack.org/openstack/cinder-specs/specs/newton/differentiate-thick-thin-in-scheduler.html
.. [3] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/standard-capabilities.html