From a4fe12b14ea0faa0a5e1e8d9d4a405c59ce9149d Mon Sep 17 00:00:00 2001 From: Alexander Chadin Date: Thu, 13 Oct 2016 12:10:01 +0300 Subject: [PATCH] Add improvements to the planner and workflow mechanisms This specification is designed to start improvements of the action plan execution process regarding to the Watcher Applier. Change-Id: If6b4d78282b8f59e78d0de272783042c2e168c7d Partially Implements: blueprint planner-storage-action-plan --- .../approved/planner-storage-action-plan.rst | 278 ++++++++++++++++++ 1 file changed, 278 insertions(+) create mode 100644 specs/ocata/approved/planner-storage-action-plan.rst diff --git a/specs/ocata/approved/planner-storage-action-plan.rst b/specs/ocata/approved/planner-storage-action-plan.rst new file mode 100644 index 0000000..4eefae5 --- /dev/null +++ b/specs/ocata/approved/planner-storage-action-plan.rst @@ -0,0 +1,278 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +===================================== +Improve the storage of the ActionPlan +===================================== + +https://blueprints.launchpad.net/watcher/+spec/planner-storage-action-plan + +Problem description +=================== + +`Action plan`_ is built by the default `planner`_ of the `decision-engine`_ +service. Every `Action`_ is linked to the next one so we have linked list +of Actions as result of default planner's work. `Applier`_ service executes +the action plan. Launching actions in a sequence has negative influence +on performance, especially if there are a lot of actions. Every action +must be completed before next action will be run, so it will take +a lot of time to launch migration actions in sequence. + +Use Cases +---------- + +As an administrator, I would like to be able to run Action plan that will +launch actions for every node (compute, network, storage) in parallel. + +Proposed change +=============== + +We can launch several concurrent actions with the same action weight in +parallel. + +This specification defines two different planner with the same input/output +dataflow: weight planner and workload stabilization planner. Each planner +should use new ``parents`` attribute that will store list of actions the +action is linked to, but there are different ways to fill up this attribute. + +Weight planner is designed to get unified way to parallelize execution +of actions. It knows nothing about structure of specified actions and how they +are related to each other. The main goal is to build sets of actions ordered by +their weights. Administrator can specify each action type's weight and its +parallelization ability. +Let's show example of acyclic directed graph for this planner: + +:: + + ---sleep2 ---migration1 ---resize1 + / \ / \ / \ + I----sleep1-------migration2------resize2----resize3----E + \ / + ---migration3 + +This graph is built with the following settings: + ++-----------+--------+-----------------+ +| | Weight | Parallelization | ++===========+========+=================+ +| Sleep | 70 | 2 | ++-----------+--------+-----------------+ +| Migration | 60 | 3 | ++-----------+--------+-----------------+ +| Resize | 50 | 2 | ++-----------+--------+-----------------+ + +Since there are three resize actions and only two are allowed to be executed +at the same time, the third resize action will be executed only when +the previous actions are done. + +Weight planner is easy to configure and has to be considered +as new default planner. + +Workload Stabilization planner respects features of action types by allowing +to affect on building of ``parents``. Some action types like resizing +the instance should be run only before/after migration of the same instance, +not in parallel. There are to be specified some constraints: + +* Unlinked actions are to be performed in parallel. +* Only one action can be performed per instance at one time +* If there are more than one action per instance, linked actions must be + executed in sequence with respect to action weights. + +So we can add new attribute ``parents`` that will store list of actions +the current action is linked to. It can be shown using +the acyclic directed graph: + +:: + + ----------------a1(t=dis)----------- + / \ + | \ + | ----------------a2(t=mig)-------------\ + |/ \ + I-----a3(t=mig) -- a31(t=res)---------------E + |\ / + | ----------a4(t=resize)----------------/ + | / + \ / + a5(resource=123, -- a51(resource=123, + t=mig) t=resize) + +Here we can see the following links: + +* a1 action disables the compute node and is not linked with another action. + It can be run in parallel with other action (there is no constraint). +* a2 action migrates the instance and is not linked with another action. + It can be run in parallel with other action (there is no constraint). +* a3 action migrates the instance and is parent to a31 action. + It can be run in parallel with other action. +* a31 action resizes an instance after this last one had been migrated + by action a3. +* a4 action resizes the instance and is not linked with another action. + It can be run in parallel with other action (there is no constraint). +* a5 action migrates the instance and is parent to a51 action. + It can be run in parallel with other action. +* a51 action resizes the same instance after this last one had been migrated + by action a5. + +As we can see, all actions that are independent to each other can be performed +in parallel. Meanwhile, if some actions are linked to the same resource then +they are to be performed with respecting to action weights. Currently, we can +define the following weights: + +* End of graph - 0 +* Disable Compute Node - 1 +* Resize instance - 2 +* Migrate instance - 3 +* Initial point - 4 + +There is estimated pseudocode to show part of workload stabilization +planner's workflow:: + + action = [uuid, type, resource_id, metadata] + + init = Flow() + + action_weights = { + 'turn_host_to_acpi_s3_state': 0, + 'resize': 1, + 'migrate': 2, + 'sleep': 3, + 'change_nova_service_state': 4, + 'nop': 5, + } + + actions = sorted_by_weights(descended) + + for action in actions: + a_type = action['action_type'] + if a_type != 'turn_host_to_acpi_s3_state': + db_action = self._create_action(context, action) + plugin_action = self.load_child_class( + db_action.action_type) + parents = plugin_action.validate_parents( + resource_action_map, action) + if parents: + db_action.parents = parents + db_action.save() + else: + # if we have an action that will make host unreachable, we need to + complete all actions (resize and migration type) related to the + host. + parent_actions = get_actions(metadata=action[metadata][host]) + resize_actions = [x for x in parent_actions if x[type] == resize] + migration_actions = [x for x in parent_actions if x[type] == mig] + resize_migration_parents = [x[parents] for x in resize_actions] + # Since resize actions have less weight than migration, they may + have migration actions as parents and must be connected to the + turn_host_to_acpi_s3_state action firstly. + action_parents = [] + action_parents.extend([x[uuid] for x in resize_actions]) + # Add migrations that aren't linked to resize type actions + action_parents.extend([x[uuid] for x in migration_actions + if [x[uuid]] not in resize_migration_parents) + db_action = create_action(action, parents=action_parents) + +This spec is limited to simple chained list of actions as action plan. The +second part of modifying action plan's executing will contain graph for +parallel executing of action plans. + + +Alternatives +------------ +None + +Data model impact +----------------- +* ``next`` column should be removed from the Action table. +* ``parents`` column should be added to the Action table. Type: JSON. +* ``first_action_id`` column should be removed from Action Plan table. +* ActionPlan object major version should probably be updated to 2.0 + +REST API impact +--------------- +None + +Security impact +--------------- +None + +Notifications impact +-------------------- +None + + +Other end user impact +--------------------- +None + +Performance Impact +------------------ +None + + +Other deployer impact +--------------------- +None + + +Developer impact +---------------- +None + + +Implementation +============== + +Assignee(s) +----------- +Primary assignee: +Alexander Chadin + +Other contributors: +Vincent Francoise