..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Provisioning Improvements
=========================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/cinder/+spec/provisioning-improvements

Cinder provisioning is still a source of pain for everyone: end users, admins,
and developers. Multiple factors have contributed to our current situation,
like the information being dispersed in different specs, the developer's
reference, and even the code, but we also have cases of misinterpreted
documentation, documentation not keeping up with the project's evolution, and
even misleading or incorrect documentation.

This spec will build on those that preceded it on the same topic [1]_ [2]_ and
others related to the subject [3]_ to bring a consolidated and updated view on
the matter, as well as add some minor improvements and fix some issues, in
hopes that we can provide a better experience for all involved.


Problem description
===================

Our current situation is quite chaotic: we have volume creations that fail
based on an incorrect capacity calculation but that would have succeeded if we
had just received an update on the stats of the backend, we have drivers that
are reporting incorrect data on the stats, and we have volumes that cannot be
created when they should be allowed to.

Before going any further we first need to define the terms we'll be using to
ensure they hold the same meaning for all of us, as this is the source of some
of our current issues.

Disagreement on the mapping of these terms and their descriptions is
understandable, but for the sake of understanding each other we'll hold the
descriptions below as true, since they were defined as such in our specs and
most of our code.

Improvements on the words used for the terms and field names can be discussed
at another time and updated in the specs, documentation, and code accordingly.

For the sake of completeness, and to remove any misunderstandings that could
lead to the different driver implementations we currently have, the
descriptions will also include some clarifications and examples that may
reference current Cinder code.

Terminology
-----------

GB:
  Even though this is formally known as the symbol representation of a
  gigabyte (a decimal unit of measurement), we will be using it throughout the
  spec and our code as the symbol for the gibibyte (a binary unit of
  measurement as defined by the International Electrotechnical Commission
  (IEC), with symbol GiB), so when we talk about 1GB we are talking about
  1024MB, and the same applies to TB and MB.

Total capacity:
  It is the total physical capacity that would be available in the storage
  array's pool being used by Cinder if no volumes were present.

  This is currently being reported by the drivers as `total_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space but the pool used in Cinder is limited
  to 1TB, then the driver should be reporting a `total_capacity_gb` of 1024GB.

Volume size:
  It is the maximum physical size that a volume can take in the storage array.

  This is referenced throughout the code as `volume_size`.

  For a thick volume the `volume_size` will be the same as the free capacity
  we lost when it was provisioned, whereas for a thin volume it will be
  greater than the space used for the volume in the storage array until the
  volume gets completely full.

Free capacity:
  It is the current physical capacity available in the storage array's pool
  being used by Cinder. The number and volume sizes of the thin and thick
  volumes that have been provisioned by Cinder or directly in the storage
  array are irrelevant here.

  This is currently being reported by the drivers as `free_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space with a total of 3TB available for all
  its pools, but Cinder is using a pool that has a limit of 1TB of which it
  has already used 400GB, and someone has manually created volumes outside of
  Cinder that are currently using 124GB of space, then the driver should be
  reporting a `free_capacity_gb` of 500GB (1TB = 1024GB = 400GB + 124GB +
  500GB).

Provisioned capacity:
  The amount of capacity that would be used in the storage array's pool being
  used by Cinder if all the volumes present in there were completely full.

  This is currently being reported by the drivers as `provisioned_capacity_gb`
  and, as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals. This is a required field and *must always be
  present*.

  This includes not only volumes created by Cinder but also all other existing
  volumes in that backend, but *does not include snapshots*.

  Let's expand the earlier example from "free capacity" where 524GB of the
  available 1TB had already been used, and say that the 124GB that were
  externally created were all used by 1GB thick volumes, and that Cinder was
  using the 400GB with 400 thick volumes of 1GB and 20 empty thin volumes of
  20GB each. In this situation our reported `provisioned_capacity_gb` value
  should be 924GB ((124 * 1GB) + (400 * 1GB) + (20 * 20GB)).
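
  The same arithmetic, as a minimal Python sketch::

    # Numbers from the example above: provisioned capacity sums the maximum
    # sizes of all volumes in the pool, thick or thin, created by Cinder or
    # not, and excludes snapshots.
    external_thick = 124 * 1  # 124 thick 1GB volumes created outside Cinder
    cinder_thick = 400 * 1    # 400 thick 1GB volumes created by Cinder
    cinder_thin = 20 * 20     # 20 empty thin 20GB volumes created by Cinder

    provisioned_capacity_gb = external_thick + cinder_thick + cinder_thin
    assert provisioned_capacity_gb == 924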

  If a driver does not report the `provisioned_capacity_gb` data we'll use the
  automatically calculated `allocated_capacity_gb` as described below.

Allocated capacity:
  Contrary to what the name may suggest, this is not referring to the
  "allocated" space on the storage array, but to the provisioned volumes that
  were created by this specific Cinder Volume backend process on the storage
  array's pool being used by Cinder and that are still present.

  It is important to notice that this refers to a specific service backend, so
  if you are running a multi-backend Cinder service or multiple Cinder Volume
  services where you have more than one backend configured to use the same
  storage array's pool, then each one of these backends will only be reporting
  the sum of the `volume_size` of the volumes it created, and not the sum of
  all the `volume_size` of the volumes that have been created by a Cinder
  service.

  This is currently being reported by the Volume service as
  `allocated_capacity_gb` and, as the name indicates, should be reported in
  GB.

  If two volumes had been created, one thick and one thin, each one of 1GB,
  then you'll be reporting 2GB as `allocated_capacity_gb`, but if you were to
  unmanage one of those volumes then you would only be reporting 1GB, even if
  the volume is still there and will still be counted in the
  `provisioned_capacity_gb`.

  This field is calculated directly by the Cinder core code, and drivers
  should not calculate or report this information on their `get_volume_stats`
  method.
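
  As a minimal sketch of the idea (illustrative only, not the actual Cinder
  core code), the calculation boils down to summing the sizes of the volumes
  that belong to one specific backend::

    def allocated_capacity_gb(volumes, backend_host):
        # Thin and thick volumes count the same: their full volume_size,
        # not their physical usage. Unmanaged volumes are no longer in
        # `volumes` and therefore stop being counted.
        return sum(vol['size'] for vol in volumes
                   if vol['host'] == backend_host)

    volumes = [{'host': 'node1@backend1#pool', 'size': 1},
               {'host': 'node1@backend1#pool', 'size': 1},
               {'host': 'node1@backend2#pool', 'size': 5}]
    assert allocated_capacity_gb(volumes, 'node1@backend1#pool') == 2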

Over subscription ratio:
  It is the maximum ratio between the "provisioned capacity" and the "total
  capacity", represented as a real number. A ratio of 1.0 means that the
  "provisioned capacity" cannot exceed the "total capacity", whereas a value
  of 5.0 means that the Cinder backend is allowed to create as much as 5 times
  the "total capacity" of the storage array's pool in volumes.

  This will only have effect when a thin provisioned volume is being created,
  and will be ignored for thick provisioned volumes.

  This is currently being reported by the drivers as
  `max_over_subscription_ratio` with a value greater than or equal to 1.0,
  preferably with no more than 2 decimals of precision.

  This value is optional, and when missing from the driver's stats report the
  value defined in the `[DEFAULT]` section on the Cinder scheduler receiving
  the request will be used. So vendors should make sure that they are
  correctly returning this value in their drivers if they support thin
  provisioning, and admins should make sure they have a consistent default
  value for the `max_over_subscription_ratio` across all scheduler nodes.

  Note that this ratio is per backend or per pool depending on the driver
  implementation.

Reserved percentage:
  Represents the percentage of the storage array's "total capacity" that is
  reserved and should not be used for calculations. It is represented by an
  integer value going from 0 up to 100.

  This is currently being reported by the drivers as `reserved_percentage`
  with an integer value from 0 to 100.

  The default value is 0 if the field is missing in the stats report from the
  backend or if the user has not defined it in the backend's Cinder
  configuration. This is per backend or per pool depending on the driver
  implementation.

Provisioning support:
  Cinder backends may support up to two different types of provisioning,
  *thin* and *thick*, and drivers are expected to indicate that they are
  capable of at least one of them in their capabilities report.

  The way to report support for these is setting the boolean fields
  `thin_provisioning_support` and/or `thick_provisioning_support` to true.
  Any provisioning type that is not reported will default to false.

  A Cinder backend may support both provisioning types at the same time.

Volume provisioning type:
  For Cinder backends that only support one of the provisioning types, all
  volumes created on them will be of that type, and we can use the volume
  type's extra specs to make the scheduler filter out backends not supporting
  a specific provisioning type:

  - 'thin_provisioning_support': '<is> True' or '<is> False'
  - 'thick_provisioning_support': '<is> True' or '<is> False'

  But if our deployment is using a backend that supports both provisioning
  types simultaneously, we need to be explicit about the type of provisioning
  we want for a volume using the volume type's extra spec `provisioning:type`
  and setting it to `thin` or `thick`.

  If no `provisioning:type` is defined for a volume it will default to thin if
  the backend is capable of it, and the driver is expected to honor this
  assumption.
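
  As an illustration, two hypothetical volume types expressing these extra
  specs could look like this (the type names are made up; the extra spec keys
  are the ones defined above)::

    thin_type = {
        'name': 'thin-volumes',
        'extra_specs': {'thin_provisioning_support': '<is> True',
                        'provisioning:type': 'thin'},
    }
    thick_type = {
        'name': 'thick-volumes',
        'extra_specs': {'thick_provisioning_support': '<is> True',
                        'provisioning:type': 'thick'},
    }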

Incorrect reports
-----------------

Given the above terms, which were originally defined in their corresponding
specs, even if there may be additional comments in this one, we can determine
that there are a good number of Cinder drivers that do not follow these
definitions and are reporting what would be incorrect values.

Reporting incorrect values means that on a heterogeneous cloud you'll have
inconsistent scheduling and an admin will not be able to make sense of the
stats from the backends.

To illustrate this, here are some of the interpretations we can see across
different drivers for the `provisioned_capacity_gb`:

* Sum of all the volumes' max sizes, which is correct.
* Sum of all the volumes' physical disk usage, which is wrong.
* Sum of the Cinder volumes' physical disk usage, which is wrong.

And something similar happens with the `allocated_capacity_gb`, where drivers
go and report the value directly instead of letting the Cinder core code take
care of it. Drivers have been known to report here the following information:

* Sum of the Cinder volumes' max sizes, which is correct.
* Sum of the physical disk usage, which is wrong.
* Sum of all the volumes' max sizes, which is wrong.

Provisioning calculations
-------------------------

Some of the creation failures are based on the `provisioned_capacity_gb` value
being wrong, but there are other cases where Cinder's calculations for over
provisioning do not match the industry's standard definition, which for some
admins creates confusion and undesired behavior.

The standard provisioning calculation to check if a volume of `volume_size`
fits is::

  ((provisioned_capacity_gb + volume_size) <=
   (total_capacity_gb
    x (1 - (reserved_percentage / 100.0))
    x max_over_subscription_ratio))

Whereas the Cinder calculation, which was agreed on as the best one for being
considered safer, is::

  (volume_size <=
   (free_capacity_gb
    - (total_capacity_gb x reserved_percentage / 100.0))
   x max_over_subscription_ratio)
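
A minimal Python sketch of both checks, with variable names mirroring the
stats fields (this ignores details the real `CapacityFilter` handles, such as
infinite or unknown capacities and thick provisioning)::

  def fits_standard(volume_size, total, provisioned, reserved_pct, ratio):
      return (provisioned + volume_size) <= (
          total * (1 - reserved_pct / 100.0) * ratio)

  def fits_cinder(volume_size, total, free, reserved_pct, ratio):
      return volume_size <= (
          (free - total * reserved_pct / 100.0) * ratio)

  # Example: 1024GB pool, 100GB free, 2000GB provisioned, 5% reserved,
  # ratio 10.0.
  print(fits_standard(200, 1024, 2000, 5, 10.0))  # True: 2200 <= 9728
  print(fits_cinder(200, 1024, 100, 5, 10.0))     # True: 200 <= 488

Note how the Cinder check is the more conservative of the two when physical
space runs low: it multiplies the remaining physical space, not the total
capacity, by the over subscription ratio.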

Calculating max over subscription ratio
---------------------------------------

Most deployments have very dynamic workloads, each with different physical
storage requirements, which means that one month we may require many volumes
of which we barely use any space, and the next month we may require fewer
volumes but use most of the provisioned capacity.

This makes it almost impossible to accurately model our storage requirements
at deployment time, which is precisely when we have to set the
`max_over_subscription_ratio` for our Cinder backends.

As requirements change, one option would be to change the configuration and
restart our Cinder Volume services, but since Cinder is also in the data path
the restart may take a long time and will have a considerable impact on our
cloud users.

Not being able to determine beforehand the best `max_over_subscription_ratio`
and not being able to easily restart the Cinder service is a common pain that
most operators have with backends supporting thin provisioning.


Use Cases
=========

The basic case for fixing the stats reports is where we would like to have
consistent reporting from our backends for the admins to see in the logs and
for the scheduler to use.

Any operator using thin provisioning storage who wants to optimize their
storage usage will benefit from being able to dynamically adjust to the
changing requirements of their cloud.

As for the alternative calculation, it would greatly benefit any backend that
is close to its full capacity, or one that is creating huge volumes that
usually never get filled.


Proposed change
===============

Incorrect reports
-----------------

Since we have consolidated all the documentation in one place, this spec,
where we clearly state the expected driver behavior, all driver maintainers
will be urged to make their drivers compliant with it. This will mean adapting
their drivers to follow this document's definition of
`provisioned_capacity_gb` and to stop reporting the `allocated_capacity_gb`
field.

Automatic over subscription ratio calculation
---------------------------------------------

To allow automatic over subscription ratio calculation we will turn the
existing configuration option `max_over_subscription_ratio` into a polymorphic
option that accepts not only integers and floats, but also a new string value,
`auto`, that will instruct Cinder to use our default value of 20 as the
starting reference when the backend is empty and then, once there is data,
calculate the current value on each driver stats report with the following
formula::

  ratio = 1 + (provisioned_capacity_gb /
               (total_capacity_gb - free_capacity_gb + 1))

If the driver is not reporting `provisioned_capacity_gb` then we'll proceed to
use the `allocated_capacity_gb` instead::

  ratio = 1 + (allocated_capacity_gb /
               (total_capacity_gb - free_capacity_gb + 1))
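
As a minimal Python sketch of this calculation (illustrative only)::

  def auto_max_over_subscription_ratio(stats):
      # Prefer provisioned_capacity_gb, falling back to the automatically
      # calculated allocated_capacity_gb when the driver doesn't report it.
      provisioned = stats.get('provisioned_capacity_gb',
                              stats.get('allocated_capacity_gb', 0))
      used = stats['total_capacity_gb'] - stats['free_capacity_gb']
      # The +1 avoids division by zero when nothing is physically used yet.
      return 1 + (provisioned / (used + 1))

  # 2000GB provisioned on a 1024GB pool that still has 924GB free:
  stats = {'total_capacity_gb': 1024, 'free_capacity_gb': 924,
           'provisioned_capacity_gb': 2000}
  print(auto_max_over_subscription_ratio(stats))  # 1 + 2000/101, about 20.8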

We are not considering the reserved capacity in this formula because the
scheduler's capacity filter already takes it into consideration when
calculating the virtual free capacity.

There are a couple of drivers that are already doing this, Pure and
Kaminario's K2, but with different configuration options,
`pure_automatic_max_oversubscription_ratio` and
`auto_calc_max_oversubscription_ratio` respectively, so those configuration
options will take precedence, if configured, over this new feature, but we'll
encourage vendors to deprecate those options.

Drivers that are making use of the `max_over_subscription_ratio` in an
incompatible way will raise an error and fail to start, logging an appropriate
message, if configured with this new feature.


Provisioning calculations
-------------------------

Instead of continuing to fight with admins and developers on which one of the
approaches is best (standard calculation or Cinder's), we will be adding a new
configuration option called `over_provisioning_calculation` which will take
the values `standard` and `cinder`, will default to `cinder` for backward
compatibility, and will be used by the `CapacityFilter` to determine which one
of the mechanisms to use.

This configuration option will also affect the `CapacityWeigher`, as it will
need to do the free space calculation according to the standard definition as
well.
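
As a sketch of how these options could be defined with oslo.config (the help
strings and exact defaults here are illustrative, not final code)::

  from oslo_config import cfg

  opts = [
      # Changed from FloatOpt to StrOpt so it can also accept 'auto'.
      cfg.StrOpt('max_over_subscription_ratio',
                 default='20.0',
                 help='A number, or "auto" to calculate the ratio from the '
                      'backend stats on every stats report.'),
      cfg.StrOpt('over_provisioning_calculation',
                 default='cinder',
                 choices=['standard', 'cinder'],
                 help='Over provisioning calculation used by the '
                      'CapacityFilter and CapacityWeigher.'),
  ]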

As one can assume, thick provisioning will have no modifications to its
behavior.

Alternatives
------------

* Don't support standard over-provisioning calculations.
* Instead of modifying `CapacityFilter` and `CapacityWeigher`, create 2 new
  classes.
* Instead of adding the `over_provisioning_calculation` configuration option,
  make the filter use the options JSON file provided by
  `scheduler_json_config_location`. This data seems to be currently missing
  on some of the operations, like migrate and extend, so that would need to
  be changed.

Data model impact
-----------------

N/A

REST API impact
---------------

The only affected API will be the `get_pools` API, which will be able to
return the 2 new fields, `total_used_capacity_gb` and
`cinder_used_capacity_gb`, when they are being reported by the driver's
`get_volume_stats` method. The fields will not be present if the drivers are
not reporting them.
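
As a hypothetical example of what a pool entry could look like once a driver
reports them (field meanings inferred from their names; values illustrative)::

  pool = {
      'name': 'backend1#pool',
      'capabilities': {
          'total_capacity_gb': 1024,
          'free_capacity_gb': 500,
          'provisioned_capacity_gb': 924,
          # New optional fields reported by the driver:
          'total_used_capacity_gb': 524,   # physical space used in the pool
          'cinder_used_capacity_gb': 400,  # physical space used by Cinder
      },
  }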

Security impact
---------------

N/A

Notifications impact
--------------------

N/A

Other end user impact
---------------------

The user may see new fields when calling cinderclient's `get_pools`.

Performance Impact
------------------

Depending on the driver and the storage array, performance could increase or
decrease, since getting provisioned sizes instead of physical sizes could be
faster or slower.

Other deployer impact
---------------------

With the change of values returned by Cinder backends for
`allocated_capacity_gb` and `provisioned_capacity_gb` we may experience
failures creating volumes until we correct the values of
`reserved_percentage` and `max_over_subscription_ratio` in our cloud, since
we may have been using incorrect ones.

A configuration option will be modified:

* `max_over_subscription_ratio`: Changes its type from Float to String and
  adds a new possible value, `auto`, that will enable this new feature.

A new configuration option will be added:

* `over_provisioning_calculation`: Will allow selecting what kind of
  calculation the `CapacityFilter` does to determine if there is space for a
  volume in a backend. Acceptable values are `standard` and `cinder`. The
  default value will be `cinder`.


Developer impact
----------------

Driver maintainers will need to verify, and fix if necessary, their stats
reports for `allocated_capacity_gb` and `provisioned_capacity_gb`, and if
they are using `max_over_subscription_ratio` in an incompatible way they
should fix their driver so that it supports the feature.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  None

Other contributors:
  None

Work Items
----------

* File bugs for drivers that are not in compliance.
* Fix drivers' stats reporting.
* Add support for the 3 new fields, `provisioned_capacity_precission`,
  `total_used_capacity_gb`, and `cinder_used_capacity_gb`, in the scheduler,
  the `get_pools` API, and the client.
* Modify `CapacityFilter` to support the standard over-provisioning
  calculation.
* Modify `CapacityWeigher` to support standard over-provisioning calculations.
* Add to the volume manager the estimation mechanism for drivers that don't
  report `provisioned_capacity_gb`.
* Update all the developer reference docs to ensure that there is no more
  confusion on what the stats reports need to return, and make sure that the
  wiki page on how to contribute a driver links to that documentation,
  explaining the importance of following it when writing the driver.


Dependencies
============

N/A


Testing
=======

New unit tests will be added to test the changed code.


Documentation Impact
====================

Since our current documentation is lacking in this respect, this will add to
and update it to reflect what's expected of the drivers in their stats
reports.

End user documentation should also be updated.


References
==========

.. [1] https://specs.openstack.org/openstack/cinder-specs/specs/kilo/over-subscription-in-thin-provisioning.html
.. [2] https://specs.openstack.org/openstack/cinder-specs/specs/newton/differentiate-thick-thin-in-scheduler.html
.. [3] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/standard-capabilities.html