..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Provisioning Improvements
=========================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/cinder/+spec/provisioning-improvements

Cinder provisioning is still a source of pain for everyone: end users, admins,
and developers. Multiple factors have contributed to our current situation,
like the information being dispersed in different specs, the developer's
reference, and even the code, but we also have cases of misinterpreted
documentation, documentation not keeping up with the project's evolution, and
even misleading or incorrect documentation.

This spec will build on those that preceded it on the same topic [1]_ [2]_ and
others related to the subject [3]_ to bring a consolidated and updated view on
the matter, as well as add some minor improvements and fix some issues, in
hopes that we can provide a better experience for all involved.


Problem description
===================

Our current situation is quite chaotic: we have volume creations that fail
based on an incorrect capacity calculation but that would have succeeded if we
had just received an update on the stats of the backend, we have drivers that
are reporting incorrect data on the stats, and we have volumes that cannot be
created when they should be allowed to.

Before going any further we first need to define the terms we'll be using to
ensure they hold the same meaning for all of us, as this is the source of some
of our current issues.

Disagreement on the mapping of these terms and their descriptions is
understandable, but for the sake of understanding each other we'll hold the
descriptions below as true, since they were defined as such in our specs and
most of our code.

Improvements on the words used for the terms and field names can be discussed
at another time and updated in the specs, documentation, and code accordingly.

For the sake of completeness, and to remove any misunderstandings that could
lead to the different driver implementations we currently have, the
descriptions will also include some clarifications and examples that may
reference current Cinder code.

Terminology
-----------

GB:
  Even though this is formally known as the symbol representation of a
  gigabyte (a decimal unit of measurement), we will be using it throughout the
  spec and our code as the symbol for the gibibyte (a binary unit of
  measurement as defined by the International Electrotechnical Commission
  (IEC), with symbol GiB), so when we talk about 1GB we are talking about
  1024MB, and the same applies to TB and MB.

Total capacity:
  It is the total physical capacity that would be available in the storage
  array's pool being used by Cinder if no volumes were present.

  This is currently being reported by the drivers as `total_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space but the pool used in Cinder is limited
  to 1TB, then the driver should be reporting a `total_capacity_gb` of 1024GB.

Volume size:
  It is the maximum physical size that a volume can take in the storage array.

  This is referenced throughout the code as `volume_size`.

  For a thick volume the `volume_size` will be the same as the free capacity
  we lost when it was provisioned, whereas for a thin volume it will be
  greater than the space used for the volume in the storage array until the
  volume gets completely full.

Free capacity:
  It is the current physical capacity available in the storage array's pool
  being used by Cinder. The number and volume sizes of the thin and thick
  volumes that have been provisioned by Cinder or directly in the storage
  array are irrelevant here.

  This is currently being reported by the drivers as `free_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space with a total of 3TB available for all
  its pools, but Cinder is using a pool that has a limit of 1TB of which it
  has already used 400GB, and someone has manually created volumes outside of
  Cinder that are currently using 124GB of space, then the driver should be
  reporting a `free_capacity_gb` of 500GB (1TB = 1024GB = 400GB + 124GB +
  500GB).

Provisioned capacity:
  The amount of capacity that would be used in the storage array's pool being
  used by Cinder if all the volumes present in there were completely full.

  This is currently being reported by the drivers as `provisioned_capacity_gb`
  and, as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals. This is a required field and *must always be
  present*.

  This includes not only volumes created by Cinder but also all other existing
  volumes in that backend, but *does not include snapshots*.

  Let's expand the earlier example from "free capacity" where 524GB of the
  available 1TB had already been used, and say that the 124GB that were
  externally created were all used by 1GB thick volumes, and that Cinder was
  using the 400GB with 400 thick volumes of 1GB and 20 empty thin volumes of
  20GB each. In this situation our reported `provisioned_capacity_gb` value
  should be 924GB ((124 * 1GB) + (400 * 1GB) + (20 * 20GB)).
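
  The same arithmetic, as a minimal Python sketch::

    # Numbers from the example above: provisioned capacity sums the maximum
    # sizes of all volumes in the pool, thick or thin, created by Cinder or
    # not, and excludes snapshots.
    external_thick = 124 * 1  # 124 thick 1GB volumes created outside Cinder
    cinder_thick = 400 * 1    # 400 thick 1GB volumes created by Cinder
    cinder_thin = 20 * 20     # 20 empty thin 20GB volumes created by Cinder

    provisioned_capacity_gb = external_thick + cinder_thick + cinder_thin
    assert provisioned_capacity_gb == 924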

  If a driver does not report the `provisioned_capacity_gb` data we'll use the
  automatically calculated `allocated_capacity_gb` as described below.

Allocated capacity:
  Contrary to what the name may suggest, this is not referring to the
  "allocated" space on the storage array, but to the provisioned volumes that
  were created by this specific Cinder Volume backend process on the storage
  array's pool being used by Cinder and that are still present.

  It is important to notice that this refers to a specific service backend, so
  if you are running a multi-backend Cinder service or multiple Cinder Volume
  services where you have more than one backend configured to use the same
  storage array's pool, then each one of these backends will only be reporting
  the sum of the `volume_size` of the volumes it created, and not the sum of
  all the `volume_size` of the volumes that have been created by a Cinder
  service.

  This is currently being reported by the Volume service as
  `allocated_capacity_gb` and, as the name indicates, should be reported in
  GB.

  If two volumes had been created, one thick and one thin, each one of 1GB,
  then you'll be reporting 2GB as `allocated_capacity_gb`, but if you were to
  unmanage one of those volumes then you would only be reporting 1GB, even if
  the volume is still there and will still be counted in the
  `provisioned_capacity_gb`.

  This field is calculated directly by the Cinder core code, and drivers
  should not calculate or report this information on their `get_volume_stats`
  method.
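
  As a minimal sketch of the idea (illustrative only, not the actual Cinder
  core code), the calculation boils down to summing the sizes of the volumes
  that belong to one specific backend::

    def allocated_capacity_gb(volumes, backend_host):
        # Thin and thick volumes count the same: their full volume_size,
        # not their physical usage. Unmanaged volumes are no longer in
        # `volumes` and therefore stop being counted.
        return sum(vol['size'] for vol in volumes
                   if vol['host'] == backend_host)

    volumes = [{'host': 'node1@backend1#pool', 'size': 1},
               {'host': 'node1@backend1#pool', 'size': 1},
               {'host': 'node1@backend2#pool', 'size': 5}]
    assert allocated_capacity_gb(volumes, 'node1@backend1#pool') == 2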

Over subscription ratio:
  It is the maximum ratio between the "provisioned capacity" and the "total
  capacity", represented as a real number. A ratio of 1.0 means that the
  "provisioned capacity" cannot exceed the "total capacity", whereas a value
  of 5.0 means that the Cinder backend is allowed to create as much as 5 times
  the "total capacity" of the storage array's pool in volumes.

  This will only have effect when a thin provisioned volume is being created,
  and will be ignored for thick provisioned volumes.

  This is currently being reported by the drivers as
  `max_over_subscription_ratio` with a value greater than or equal to 1.0,
  preferably with no more than 2 decimals of precision.

  This value is optional, and when missing from the driver's stats report the
  value defined in the `[DEFAULT]` section on the Cinder scheduler receiving
  the request will be used. So vendors should make sure that they are
  correctly returning this value in their drivers if they support thin
  provisioning, and admins should make sure they have a consistent default
  value for the `max_over_subscription_ratio` across all scheduler nodes.

  Note that this ratio is per backend or per pool depending on the driver
  implementation.

Reserved percentage:
  Represents the percentage of the storage array's "total capacity" that is
  reserved and should not be used for calculations. It is represented by an
  integer value going from 0 up to 100.

  This is currently being reported by the drivers as `reserved_percentage`
  with an integer value from 0 to 100.

  The default value is 0 if the field is missing in the stats report from the
  backend or if the user has not defined it in the backend's Cinder
  configuration. This is per backend or per pool depending on the driver
  implementation.

Provisioning support:
  Cinder backends may support up to two different types of provisioning,
  *thin* and *thick*, and drivers are expected to indicate that they are
  capable of at least one of them in their capabilities report.

  The way to report support for these is setting the boolean fields
  `thin_provisioning_support` and/or `thick_provisioning_support` to true.
  Any provisioning type that is not reported will default to false.

  A Cinder backend may support both provisioning types at the same time.

Volume provisioning type:
  For Cinder backends that only support one of the provisioning types, all
  volumes created on them will be of that type, and we can use the volume
  type's extra specs to make the scheduler filter out backends not supporting
  a specific provisioning type:

  - 'thin_provisioning_support': '<is> True' or '<is> False'
  - 'thick_provisioning_support': '<is> True' or '<is> False'

  But if our deployment is using a backend that supports both provisioning
  types simultaneously, we need to be explicit about the type of provisioning
  we want for a volume using the volume type's extra spec `provisioning:type`
  and setting it to `thin` or `thick`.

  If no `provisioning:type` is defined for a volume it will default to thin if
  the backend is capable of it, and the driver is expected to honor this
  assumption.
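
  As an illustration, two hypothetical volume types expressing these extra
  specs could look like this (the type names are made up; the extra spec keys
  are the ones defined above)::

    thin_type = {
        'name': 'thin-volumes',
        'extra_specs': {'thin_provisioning_support': '<is> True',
                        'provisioning:type': 'thin'},
    }
    thick_type = {
        'name': 'thick-volumes',
        'extra_specs': {'thick_provisioning_support': '<is> True',
                        'provisioning:type': 'thick'},
    }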

Incorrect reports
-----------------

Given the above terms, which were originally defined in their corresponding
specs, even if there may be additional comments in this one, we can determine
that there are a good number of Cinder drivers that do not follow these
definitions and are reporting what would be incorrect values.

Reporting incorrect values means that on a heterogeneous cloud you'll have
inconsistent scheduling and an admin will not be able to make sense of the
stats from the backends.

To illustrate this, here are some of the interpretations we can see across
different drivers for the `provisioned_capacity_gb`:

* Sum of all the volumes' max sizes, which is correct.
* Sum of all the volumes' physical disk usage, which is wrong.
* Sum of the Cinder volumes' physical disk usage, which is wrong.

And something similar happens with the `allocated_capacity_gb`, where drivers
go and report the value directly instead of letting the Cinder core code take
care of it. Drivers have been known to report here the following information:

* Sum of the Cinder volumes' max sizes, which is correct.
* Sum of the physical disk usage, which is wrong.
* Sum of all the volumes' max sizes, which is wrong.

Provisioning calculations
-------------------------

Some of the creation failures are based on the `provisioned_capacity_gb` value
being wrong, but there are other cases where Cinder's calculations for over
provisioning do not match the industry's standard definition, which for some
admins creates confusion and undesired behavior.

The standard provisioning calculation to check if a volume of `volume_size`
fits is::

  ((provisioned_capacity_gb + volume_size) <=
   (total_capacity_gb
    x (1 - (reserved_percentage / 100.0))
    x max_over_subscription_ratio))

Whereas the Cinder calculation, which was agreed on as the best one for being
considered safer, is::

  (volume_size <=
   (free_capacity_gb
    - (total_capacity_gb x reserved_percentage / 100.0))
   x max_over_subscription_ratio)
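
A minimal Python sketch of both checks, with variable names mirroring the
stats fields (this ignores details the real `CapacityFilter` handles, such as
infinite or unknown capacities and thick provisioning)::

  def fits_standard(volume_size, total, provisioned, reserved_pct, ratio):
      return (provisioned + volume_size) <= (
          total * (1 - reserved_pct / 100.0) * ratio)

  def fits_cinder(volume_size, total, free, reserved_pct, ratio):
      return volume_size <= (
          (free - total * reserved_pct / 100.0) * ratio)

  # Example: 1024GB pool, 100GB free, 2000GB provisioned, 5% reserved,
  # ratio 10.0.
  print(fits_standard(200, 1024, 2000, 5, 10.0))  # True: 2200 <= 9728
  print(fits_cinder(200, 1024, 100, 5, 10.0))     # True: 200 <= 488

Note how the Cinder check is the more conservative of the two when physical
space runs low: it multiplies the remaining physical space, not the total
capacity, by the over subscription ratio.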

Calculating max over subscription ratio
---------------------------------------

Most deployments have very dynamic workloads, each with different physical
storage requirements, which means that one month we may require many volumes
of which we barely use any space, and the next month we may require fewer
volumes but use most of the provisioned capacity.

This makes it almost impossible to accurately model our storage requirements
at deployment time, which is precisely when we have to set the
`max_over_subscription_ratio` for our Cinder backends.

As requirements change, one option would be to change the configuration and
restart our Cinder Volume services, but since Cinder is also in the data path
the restart may take a long time and will have a considerable impact on our
cloud users.

Not being able to determine beforehand the best `max_over_subscription_ratio`
and not being able to easily restart the Cinder service is a common pain that
most operators have with backends supporting thin provisioning.


Use Cases
=========

The basic case for fixing the stats reports is where we would like to have
consistent reporting from our backends for the admins to see in the logs and
for the scheduler to use.

Any operator using thin provisioning storage who wants to optimize their
storage usage will benefit from being able to dynamically adjust to the
changing requirements of their cloud.

As for the alternative calculation, it would greatly benefit any backend that
is close to its full capacity, or one that is creating huge volumes that
usually never get filled.


Proposed change
===============

Incorrect reports
-----------------

Since we have consolidated all the documentation in one place, this spec,
where we clearly state the expected driver behavior, all driver maintainers
will be urged to make their drivers compliant with it. This will mean adapting
their drivers to follow this document's definition of
`provisioned_capacity_gb` and to stop reporting the `allocated_capacity_gb`
field.

Automatic over subscription ratio calculation
---------------------------------------------

To allow automatic over subscription ratio calculation we will turn the
existing configuration option `max_over_subscription_ratio` into a polymorphic
option that accepts not only integers and floats, but also a new string value,
`auto`, that will instruct Cinder to use our default value of 20 as the
starting reference when the backend is empty and then, once there is data,
calculate the current value on each driver stats report with the following
formula::

  ratio = 1 + (provisioned_capacity_gb /
               (total_capacity_gb - free_capacity_gb + 1))

If the driver is not reporting `provisioned_capacity_gb` then we'll proceed to
use the `allocated_capacity_gb` instead::

  ratio = 1 + (allocated_capacity_gb /
               (total_capacity_gb - free_capacity_gb + 1))
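
As a minimal Python sketch of this calculation (illustrative only)::

  def auto_max_over_subscription_ratio(stats):
      # Prefer provisioned_capacity_gb, falling back to the automatically
      # calculated allocated_capacity_gb when the driver doesn't report it.
      provisioned = stats.get('provisioned_capacity_gb',
                              stats.get('allocated_capacity_gb', 0))
      used = stats['total_capacity_gb'] - stats['free_capacity_gb']
      # The +1 avoids division by zero when nothing is physically used yet.
      return 1 + (provisioned / (used + 1))

  # 2000GB provisioned on a 1024GB pool that still has 924GB free:
  stats = {'total_capacity_gb': 1024, 'free_capacity_gb': 924,
           'provisioned_capacity_gb': 2000}
  print(auto_max_over_subscription_ratio(stats))  # 1 + 2000/101, about 20.8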

We are not considering the reserved capacity in this formula because the
scheduler's capacity filter already takes it into consideration when
calculating the virtual free capacity.

There are a couple of drivers that are already doing this, Pure and
Kaminario's K2, but with different configuration options,
`pure_automatic_max_oversubscription_ratio` and
`auto_calc_max_oversubscription_ratio` respectively, so those configuration
options will take precedence, if configured, over this new feature, but we'll
encourage vendors to deprecate those options.

Drivers that are making use of the `max_over_subscription_ratio` in an
incompatible way will raise an error and fail to start, logging an appropriate
message, if configured with this new feature.


Provisioning calculations
-------------------------

Instead of continuing to fight with admins and developers on which one of the
approaches is best (standard calculation or Cinder's), we will be adding a new
configuration option called `over_provisioning_calculation` which will take
the values `standard` and `cinder`, will default to `cinder` for backward
compatibility, and will be used by the `CapacityFilter` to determine which one
of the mechanisms to use.

This configuration option will also affect the `CapacityWeigher`, as it will
need to do the free space calculation according to the standard definition as
well.
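
As a sketch of how these options could be defined with oslo.config (the help
strings and exact defaults here are illustrative, not final code)::

  from oslo_config import cfg

  opts = [
      # Changed from FloatOpt to StrOpt so it can also accept 'auto'.
      cfg.StrOpt('max_over_subscription_ratio',
                 default='20.0',
                 help='A number, or "auto" to calculate the ratio from the '
                      'backend stats on every stats report.'),
      cfg.StrOpt('over_provisioning_calculation',
                 default='cinder',
                 choices=['standard', 'cinder'],
                 help='Over provisioning calculation used by the '
                      'CapacityFilter and CapacityWeigher.'),
  ]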

As one can assume, thick provisioning will have no modifications to its
behavior.

Alternatives
------------

* Don't support standard over-provisioning calculations.
* Instead of modifying `CapacityFilter` and `CapacityWeigher`, create 2 new
  classes.
* Instead of adding the `over_provisioning_calculation` configuration option,
  make the filter use the options JSON file provided by
  `scheduler_json_config_location`. This data seems to be currently missing
  on some of the operations, like migrate and extend, so that would need to
  be changed.

Data model impact
-----------------

N/A

REST API impact
---------------

The only affected API will be the `get_pools` API, which will be able to
return the 2 new fields, `total_used_capacity_gb` and
`cinder_used_capacity_gb`, when they are being reported by the driver's
`get_volume_stats` method. The fields will not be present if the drivers are
not reporting them.
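
As a hypothetical example of what a pool entry could look like once a driver
reports them (field meanings inferred from their names; values illustrative)::

  pool = {
      'name': 'backend1#pool',
      'capabilities': {
          'total_capacity_gb': 1024,
          'free_capacity_gb': 500,
          'provisioned_capacity_gb': 924,
          # New optional fields reported by the driver:
          'total_used_capacity_gb': 524,   # physical space used in the pool
          'cinder_used_capacity_gb': 400,  # physical space used by Cinder
      },
  }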

Security impact
---------------

N/A

Notifications impact
--------------------

N/A

Other end user impact
---------------------

The user may see new fields when calling cinderclient's `get_pools`.

Performance Impact
------------------

Depending on the driver and the storage array, performance could increase or
decrease, since getting provisioned sizes instead of physical sizes could be
faster or slower.

Other deployer impact
---------------------

With the change of values returned by Cinder backends for
`allocated_capacity_gb` and `provisioned_capacity_gb` we may experience
failures creating volumes until we correct the values of
`reserved_percentage` and `max_over_subscription_ratio` in our cloud, since
we may have been using incorrect ones.

A configuration option will be modified:

* `max_over_subscription_ratio`: Changes its type from Float to String and
  adds a new possible value, `auto`, that will enable this new feature.

A new configuration option will be added:

* `over_provisioning_calculation`: Will allow selecting what kind of
  calculation the `CapacityFilter` does to determine if there is space for a
  volume in a backend. Acceptable values are `standard` and `cinder`. The
  default value will be `cinder`.


Developer impact
----------------

Driver maintainers will need to verify, and fix if necessary, their stats
reports for `allocated_capacity_gb` and `provisioned_capacity_gb`, and if
they are using `max_over_subscription_ratio` in an incompatible way they
should fix their driver so that it supports the feature.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  None

Other contributors:
  None

Work Items
----------

* File bugs for drivers that are not in compliance.
* Fix drivers' stats reporting.
* Add support for the 3 new fields, `provisioned_capacity_precission`,
  `total_used_capacity_gb`, and `cinder_used_capacity_gb`, in the scheduler,
  the `get_pools` API, and the client.
* Modify `CapacityFilter` to support the standard over-provisioning
  calculation.
* Modify `CapacityWeigher` to support standard over-provisioning calculations.
* Add to the volume manager the estimation mechanism for drivers that don't
  report `provisioned_capacity_gb`.
* Update all the developer reference docs to ensure that there is no more
  confusion on what the stats reports need to return, and make sure that the
  wiki page on how to contribute a driver links to that documentation,
  explaining the importance of following it when writing the driver.


Dependencies
============

N/A


Testing
=======

New unit tests will be added to test the changed code.


Documentation Impact
====================

Since our current documentation is lacking in this respect, this will add to
and update it to reflect what's expected of the drivers in their stats
reports.

End user documentation should also be updated.


References
==========

.. [1] https://specs.openstack.org/openstack/cinder-specs/specs/kilo/over-subscription-in-thin-provisioning.html
.. [2] https://specs.openstack.org/openstack/cinder-specs/specs/newton/differentiate-thick-thin-in-scheduler.html
.. [3] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/standard-capabilities.html