Provisioning Improvements
Spec to add support for standardized over-provisioning calculations and make all drivers' status reports consistent between different backends. Change-Id: Ie709333d75fb8c1c4693eae7d09d7f4925dc3fca

specs/queens/provisioning-improvements.rst (new file, 490 lines)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Provisioning Improvements
=========================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/cinder/+spec/provisioning-improvements

Cinder provisioning is still a source of pain for everyone: end users, admins,
and developers. Multiple factors have contributed to our current situation,
such as the information being dispersed across different specs, the developer
reference, and even the code, but we also have cases of misinterpreted
documentation, documentation that has not kept up with the project's
evolution, and even misleading or incorrect documentation.

This spec will build on those that preceded it on the same topic [1]_ [2]_ and
others related to the subject [3]_ to bring a consolidated and updated view on
the matter, as well as add some minor improvements and fix some issues, in
hopes that we can provide a better experience for all involved.


Problem description
===================

Our current situation is quite chaotic: we have volume creations that fail,
based on an incorrect capacity calculation, when they would have succeeded if
we had just received an update on the backend's stats; we have drivers that
are reporting incorrect data in their stats; and we have volumes that cannot
be created when they should be allowed to.

Before going any further, we first need to define the terms we'll be using to
ensure they hold the same meaning for all of us, as this is the source of
some of our current issues.

Disagreement on the mapping of these terms and their descriptions is
understandable, but for the sake of understanding each other we'll hold the
descriptions below as true, since they were defined as such in our specs and
most of our code.

Improvements to the words used for the terms and field names can be discussed
at another time and updated in the specs, documentation, and code accordingly.

For the sake of completeness, and to remove any misunderstandings that could
lead to different implementations in the drivers, which we currently have, the
descriptions will also include some clarifications and examples that may
reference current Cinder code.

Terminology
-----------

GB:
  Even though this is formally known as the symbol for the gigabyte (a
  decimal unit of measurement), we will be using it throughout the spec and
  our code as the symbol for the gibibyte (a binary unit of measurement as
  defined by the International Electrotechnical Commission (IEC), with symbol
  GiB), so when we talk about 1GB we are talking about 1024MB, and the same
  applies to TB and MB.

Total capacity:
  It is the total physical capacity that would be available in the storage
  array's pool being used by Cinder if no volumes were present.

  This is currently being reported by the drivers as `total_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space but the pool used by Cinder is
  limited to 1TB, then the driver should be reporting a `total_capacity_gb`
  of 1024GB.

Volume size:
  It is the maximum physical size that a volume can take in the storage
  array.

  This is referenced throughout the code as `volume_size`.

  For a thick volume the `volume_size` will be the same as the free capacity
  we lost when it was provisioned, whereas for a thin volume it will be
  greater than the space used by the volume in the storage array until the
  volume gets completely full.

Free capacity:
  It is the current physical capacity available in the storage array's pool
  being used by Cinder. The number and volume sizes of the thin and thick
  volumes that have been provisioned by Cinder or directly in the storage
  array are irrelevant here.

  This is currently being reported by the drivers as `free_capacity_gb` and,
  as the name indicates, should be reported in GB and with a precision no
  greater than 2 decimals.

  If the storage array has 5TB of space with a total of 3TB available for all
  its pools, but Cinder is using a pool that has a limit of 1TB of which it
  has already used 400GB, and someone has manually created volumes outside of
  Cinder that are currently using 124GB of space, then the driver should be
  reporting a `free_capacity_gb` of 500GB (1TB = 1024GB = 400GB + 124GB +
  500GB).

Provisioned capacity:
  The amount of capacity that would be used in the storage array's pool being
  used by Cinder if all the volumes present there were completely full.

  This is currently being reported by the drivers as
  `provisioned_capacity_gb` and, as the name indicates, should be reported in
  GB and with a precision no greater than 2 decimals. This is a required
  field and *must always be present*.

  This includes not only volumes created by Cinder but also all other
  existing volumes in that backend, but *does not include snapshots*.

  Let's expand the earlier example from "free capacity", where 524GB of the
  available 1TB had already been used, and say that the 124GB that were
  externally created were all used by 1GB thick volumes, and that Cinder was
  using the 400GB with 400 thick volumes of 1GB and 20 empty thin volumes of
  20GB each. In this situation our reported `provisioned_capacity_gb` value
  should be 924GB ((124 * 1GB) + (400 * 1GB) + (20 * 20GB)).

  If a driver does not report the `provisioned_capacity_gb` data we'll use
  the automatically calculated `allocated_capacity_gb` as described below.

Allocated capacity:
  Contrary to what the name may suggest, this is not referring to the
  "allocated" space on the storage array, but to the provisioned volumes
  created by this specific Cinder Volume backend process on the storage
  array's pool being used by Cinder and that are still present.

  It is important to notice that this refers to a specific service backend,
  so if you are running a multi-backend Cinder service or multiple Cinder
  Volume services where you have more than one backend configured to use the
  same storage array's pool, then each one of these backends will only be
  reporting the sum of the `volume_size` of the volumes it created, and not
  the sum of the `volume_size` of all the volumes that have been created by a
  Cinder service.

  This is currently being reported by the Volume service as
  `allocated_capacity_gb` and, as the name indicates, should be reported in
  GB.

  If two volumes had been created, one thick and one thin, each one of 1GB,
  then you'll be reporting 2GB as `allocated_capacity_gb`; but if you were to
  unmanage one of those volumes then you would only be reporting 1GB, even if
  the volume is still there and will still be counted in the
  `provisioned_capacity_gb`.

  This field is calculated directly by the Cinder core code, and drivers
  should not calculate or report this information in their
  `get_volume_stats` method.

Over subscription ratio:
  It is the maximum ratio between the "provisioned capacity" and the "total
  capacity", represented as a real number. A ratio of 1.0 means that the
  "provisioned capacity" cannot exceed the "total capacity", whereas a value
  of 5.0 means that the Cinder backend is allowed to create as much as 5
  times the "total capacity" of the storage array's pool in volumes.

  This will only have an effect when a thin provisioned volume is being
  created, and will be ignored for thick provisioned volumes.

  This is currently being reported by the drivers as
  `max_over_subscription_ratio`, with a value greater than or equal to 1.0,
  preferably with no more than 2 decimals of precision.

  This value is optional, and when missing from the driver's status report
  the value defined in the `[DEFAULT]` section on the Cinder scheduler
  receiving the request will be used. So vendors should make sure that they
  are correctly returning this value in their drivers if they support thin
  provisioning, and admins should make sure they have a consistent default
  value for `max_over_subscription_ratio` across all scheduler nodes.

  Note that this ratio is per backend or per pool, depending on the driver
  implementation.

Reserved percentage:
  Represents the percentage of the storage array's "total capacity" that is
  reserved and should not be used for calculations. It is represented by an
  integer value going from 0 up to 100.

  This is currently being reported by the drivers as `reserved_percentage`,
  and, consistently with the above definition, it should be an integer value.

  The default value is 0 if the field is missing in the status report from
  the backend or if the user has not defined it in the backend's Cinder
  configuration. This is per backend or per pool, depending on the driver
  implementation.

Provisioning support:
  Cinder backends may support up to two different types of provisioning,
  *thin* and *thick*, and drivers are expected to report being capable of at
  least one of them in their capabilities report.

  The way to report support for these is by setting the boolean fields
  `thin_provisioning_support` and/or `thick_provisioning_support` to true.
  Non-reported provisioning types will default to false.

  A Cinder backend may support both provisioning types at the same time.

Volume provisioning type:
  For Cinder backends that only support one of the provisioning types, all
  volumes created on them will be of that type, and we can use the volume
  type's extra specs to make the scheduler filter out backends that don't
  support a specific provisioning type:

  - 'thin_provisioning_support': '<is> True' or '<is> False'
  - 'thick_provisioning_support': '<is> True' or '<is> False'

  But if our deployment is using a backend that supports both provisioning
  types simultaneously, we need to be explicit about the type of provisioning
  we want for a volume, using the volume type's extra spec
  `provisioning:type` and setting it to `thin` or `thick`.

  If no `provisioning:type` is defined for a volume it will default to thin
  if the backend is capable of it, and the driver is expected to honor this
  assumption.

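Putting the capacity fields together, a driver's stats report for the pool in
the running example could look like the following sketch (the capacity values
come from the examples above; `vendor_name` and the ratio and reservation
values are merely illustrative)::

    stats = {
        'vendor_name': 'Example Vendor',       # illustrative
        'total_capacity_gb': 1024,             # the 1TB pool
        'free_capacity_gb': 500,               # 1024 - 400 - 124
        'provisioned_capacity_gb': 924,        # 124 + 400 + (20 * 20)
        'max_over_subscription_ratio': 20.0,   # illustrative
        'reserved_percentage': 5,              # illustrative
        'thin_provisioning_support': True,
        'thick_provisioning_support': True,
        # allocated_capacity_gb is deliberately absent: the core code
        # computes it from the volumes this backend created (400GB here).
    }

And, as an illustration of the `provisioning:type` extra spec, a volume type
requesting thick volumes could be created as follows (the type name is
hypothetical and the commands assume the standard cinder client)::

    cinder type-create thick-volumes
    cinder type-key thick-volumes set provisioning:type=thick
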
Incorrect reports
-----------------

Given the above terms, which were originally defined in their corresponding
specs, even if there may be additional comments in this one, we can determine
that there are a good number of Cinder drivers that do not follow these
definitions and are reporting what would be incorrect values.

Reporting incorrect values means that on a heterogeneous cloud you'll have
inconsistent scheduling, and an admin will not be able to make sense of the
volume stats.

To illustrate this, here are some of the interpretations we can see across
different drivers for the `provisioned_capacity_gb`:

* Sum of all the volumes' max sizes, which is correct.
* Sum of all the volumes' physical disk usage, which is wrong.
* Sum of the Cinder volumes' physical disk usage, which is wrong.

And something similar happens with the `allocated_capacity_gb`, where drivers
go and report the value directly instead of letting the Cinder core code take
care of it. Drivers have been known to report here the following information:

* Sum of the Cinder volumes' max sizes, which is correct.
* Sum of the physical disk usage, which is wrong.
* Sum of all the volumes' max sizes, which is wrong.

Provisioning calculations
-------------------------

Some of the creation failures are based on the `provisioned_capacity_gb` value
being wrong, but there are other cases where Cinder's calculations for over
provisioning do not match the industry's standard definition, which creates
confusion and undesired behavior for some admins.

The standard provisioning calculation to check whether a volume of
`volume_size` fits is::

    ((provisioned_capacity_gb + volume_size) <=
     (total_capacity_gb
      x (1 - (reserved_percentage / 100.0))
      x max_over_subscription_ratio))

Whereas the Cinder calculations, which were agreed on as the best calculations
for being considered safer, are::

    (volume_size <=
     (free_capacity_gb
      - (total_capacity_gb x reserved_percentage / 100.0))
     x max_over_subscription_ratio)

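To make the difference concrete, here is a minimal sketch of both checks; it
only restates the two formulas above and is not the actual `CapacityFilter`
code::

    def fits_standard(volume_size, total, provisioned, reserved_pct, mosr):
        # Standard: virtual (provisioned) capacity must stay below the
        # usable total capacity scaled by the over subscription ratio.
        usable = total * (1 - reserved_pct / 100.0)
        return provisioned + volume_size <= usable * mosr

    def fits_cinder(volume_size, total, free, reserved_pct, mosr):
        # Cinder: the new volume must fit in the remaining free (physical)
        # space, minus the reservation, scaled by the ratio.
        adjusted_free = free - (total * reserved_pct / 100.0)
        return volume_size <= adjusted_free * mosr

With the running example (1024GB total, 500GB free, 924GB provisioned), a
ratio of 2.0, and no reservation, an 1100GB volume passes the standard check
(2024 <= 2048) but fails Cinder's (1100 > 1000).
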
Calculating max over subscription ratio
---------------------------------------

Most deployments have very dynamic workloads, each with different physical
storage requirements, which means that one month we may require many volumes
of which we barely use any space, and the next month we may require fewer
volumes but use most of the provisioned capacity.

This makes it almost impossible to accurately model our storage requirements
at deployment time, which is precisely when we have to set the
`max_over_subscription_ratio` for our Cinder backends.

As requirements change, one option would be to change the configuration and
restart our Cinder Volume services, but since Cinder is also in the data path
the restart may take a long time and will have a considerable impact on our
cloud users.

Not being able to determine the best `max_over_subscription_ratio` beforehand,
and not being able to easily restart the Cinder service, is a common pain that
most operators have with backends supporting thin provisioning.

Use Cases
=========

The basic case for fixing the status reports is that we would like to have
consistent reporting from our backends for the admins to see in the logs and
for the scheduler to use.

Automatic ratio calculation will help any operator using thin provisioned
storage who wants to optimize their storage usage and dynamically adjust to
the changing requirements of their cloud.

As for the alternative calculations, they would greatly benefit any backend
that is close to its full capacity, or one that is creating huge volumes that
usually never get filled.

Proposed change
===============

Incorrect reports
-----------------

Since we have consolidated all the documentation in one place, this spec,
where we clearly state the expected driver behavior, all driver maintainers
will be urged to make their drivers compliant with it. This will mean
adapting their drivers to follow this document's definition of
`provisioned_capacity_gb` and to stop reporting the `allocated_capacity_gb`
field.

Automatic over subscription ratio calculation
---------------------------------------------

To allow automatic over subscription ratio calculation, we will add a new
configuration option named `auto_max_over_subscription_ratio` that will
instruct Cinder to use the configured `max_over_subscription_ratio` as a
starting reference when the backend is empty and then, once there is data,
calculate the current value on each driver stats report with the following
formula::

    adjusted_total = total_capacity_gb x (1 - (reserved_percentage / 100.0))
    ratio = provisioned_capacity_gb / (adjusted_total - free_capacity_gb)

If the driver is not reporting `provisioned_capacity_gb` then we'll proceed to
use the `allocated_capacity_gb` instead::

    adjusted_total = total_capacity_gb x (1 - (reserved_percentage / 100.0))
    ratio = allocated_capacity_gb / (adjusted_total - free_capacity_gb)

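A sketch of that calculation (the helper name is hypothetical; the real code
would live in the stats-handling path of Cinder's core code)::

    def auto_max_over_subscription_ratio(stats, configured_ratio):
        # Usable capacity after removing the reservation.
        adjusted_total = stats['total_capacity_gb'] * (
            1 - stats.get('reserved_percentage', 0) / 100.0)
        # Physical space currently in use in the pool.
        used = adjusted_total - stats['free_capacity_gb']
        if used <= 0:
            # Empty backend: use the configured value as the starting point.
            return configured_ratio
        # Prefer provisioned_capacity_gb; fall back to allocated_capacity_gb.
        provisioned = stats.get('provisioned_capacity_gb',
                                stats.get('allocated_capacity_gb', 0))
        return provisioned / used
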
This new configuration option will be independent of the drivers and will be
part of Cinder's core code, so if `auto_max_over_subscription_ratio` is not
defined, or is set to `False`, then Cinder will continue behaving as it does
now (returning the `max_over_subscription_ratio` that is reported by the
driver, or the one configured by default if not present). But if it is set to
`True` it will always return the calculated ratio as explained.

There are a couple of drivers that are already doing this, Pure and
Kaminario's K2, but with different configuration options,
`pure_automatic_max_oversubscription_ratio` and
`auto_calc_max_oversubscription_ratio` respectively, so we'll deprecate those
configuration options and remove the code within those drivers when we add
the generic code to Cinder.

Provisioning calculations
-------------------------

Instead of continuing to fight with admins and developers over which of the
approaches is best (standard calculation or Cinder's), we will be adding a
new configuration option called `over_provisioning_calculation`, which will
take the values `standard` and `cinder`, defaulting to `cinder` for backward
compatibility, and which will be used by the `CapacityFilter` to determine
which of the mechanisms to use.

This configuration option will also affect the `CapacityWeigher`, as it will
need to do the free space calculation according to the standard definition as
well.

As one can assume, thick provisioning will see no modifications to its
behavior.

Alternatives
------------

* Don't support standard over-provisioning calculations.
* Instead of modifying `CapacityFilter` and `CapacityWeigher`, create 2 new
  classes.
* Instead of adding the `over_provisioning_calculation` configuration option,
  make the filter use the options JSON file provided by
  `scheduler_json_config_location`. This data seems to be currently missing
  on some of the operations, like migrate and extend, so that would need to
  be changed.

Data model impact
-----------------

N/A

REST API impact
---------------

The only affected API will be the `get_pools` API, which will be able to
return the 2 new fields, `total_used_capacity_gb` and
`cinder_used_capacity_gb`, when they are being reported by the driver's
`get_volume_stats` method. The fields will not be present if the drivers are
not reporting them.

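For illustration, and assuming the fields report the pool's total physical
usage and the physical usage of Cinder-created volumes respectively (the spec
does not define them further), a pool entry in a detailed `get_pools` response
could look like this fragment, with other capability keys elided::

    {
        "name": "backend1@driver1#pool1",
        "capabilities": {
            "total_capacity_gb": 1024,
            "free_capacity_gb": 500,
            "provisioned_capacity_gb": 924,
            "total_used_capacity_gb": 524,
            "cinder_used_capacity_gb": 400
        }
    }
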
Security impact
---------------

N/A

Notifications impact
--------------------

N/A

Other end user impact
---------------------

The user may see the new fields when calling cinderclient's `get_pools`.

Performance Impact
------------------

Depending on the driver and the storage array, performance could increase or
decrease, since getting provisioned sizes instead of physical sizes could be
faster or slower.

Other deployer impact
---------------------

With the change in values returned by Cinder backends for
`allocated_capacity_gb` and `provisioned_capacity_gb`, we may experience
failures when creating volumes until we correct the values of
`reserved_percentage` and `max_over_subscription_ratio` in our cloud, since
we may have been using incorrect ones.

Two new configuration options will be added (see the sample configuration
after this list):

* `auto_max_over_subscription_ratio`: Boolean value that will instruct Cinder
  to automatically calculate the over subscription ratio based on current
  usage instead of using a fixed value.

* `over_provisioning_calculation`: Will allow selecting what kind of
  calculation the `CapacityFilter` does to determine if there is space for a
  volume in a backend. Acceptable values are `standard` and `cinder`. The
  default value will be `cinder`.

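For illustration, enabling both new behaviors could look like this in
`cinder.conf` (the backend section name is hypothetical, and the placement of
the options is an assumption)::

    [DEFAULT]
    over_provisioning_calculation = standard

    [backend-1]
    volume_backend_name = backend-1
    auto_max_over_subscription_ratio = True
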
Developer impact
----------------

Driver maintainers will need to verify, and fix if necessary, their stats
reports for `allocated_capacity_gb` and `provisioned_capacity_gb`, unless
they start using the new `auto_max_over_subscription_ratio` configuration
option.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  None

Other contributors:
  None

Work Items
----------

* File bugs for drivers that are not in compliance.
* Fix drivers' stats reporting.
* Add support for the 3 new fields, `provisioned_capacity_precision`,
  `total_used_capacity_gb`, and `cinder_used_capacity_gb`, in the scheduler,
  the `get_pools` API, and the client.
* Modify `CapacityFilter` to support the standard over-provisioning
  calculation.
* Modify `CapacityWeigher` to support standard over-provisioning calculations.
* Add to the volume manager the estimation mechanism for drivers that don't
  report `provisioned_capacity_gb`.
* Update all the developer reference docs to ensure that there is no more
  confusion about what the report stats need to return, and make sure that
  the wiki page on how to contribute a driver links to that documentation,
  explaining the importance of following it when writing the driver.

Dependencies
============

N/A

Testing
=======

New unit tests will be added to test the changed code.

Documentation Impact
====================

Since our current documentation is lacking in this respect, we will add to it
and update it to reflect what's expected of the drivers in the stats reports.

End user documentation should also be updated.

References
==========

.. [1] https://specs.openstack.org/openstack/cinder-specs/specs/kilo/over-subscription-in-thin-provisioning.html
.. [2] https://specs.openstack.org/openstack/cinder-specs/specs/newton/differentiate-thick-thin-in-scheduler.html
.. [3] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/standard-capabilities.html