Add 2025.1 spec for prometheus as datasource in watcher

Also remove redundant .gitkeep

Co-Authored-By: Dan Smith <dms@danplanet.com>
Co-Authored-By: Sean Mooney <smooney@redhat.com>

Change-Id: Idc9bf7d218a55b3f1847e5579d55bd2f1acd7d7c
Marios Andreou 2024-10-24 15:01:51 +03:00 committed by m
parent e7cc93a76c
commit fe6895c7c2
6 changed files with 218 additions and 0 deletions

@@ -35,6 +35,7 @@ Here you can find the specs, and spec template, for each release:
   specs/ocata/index
   specs/newton/index
   specs/mitaka/index
   specs/2025.1/index

There are also some approved backlog specifications that are looking for
owners:

@@ -0,0 +1 @@
../../../../specs/2025.1/approved

@@ -0,0 +1,18 @@
=============================
Watcher 2025.1 Specifications
=============================

Template:

.. toctree::
   :maxdepth: 1

   Specification Template (2025.1 release) <template>

2025.1 approved (but not implemented) specs:

.. toctree::
   :glob:
   :maxdepth: 1

   approved/*

@@ -0,0 +1 @@
../../../../specs/2025.1-template.rst

@@ -0,0 +1,197 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Add Prometheus as a Watcher Data Source
==========================================

launchpad blueprint: https://blueprints.launchpad.net/watcher/+spec/example

Watcher currently supports a small number of data sources for collecting
metrics: Ceilometer, Gnocchi and Grafana. Prometheus is a widely adopted
time-series metric collection system that can collect any type of custom
metric an operator may be interested in for their cloud VMs or containers.

Besides its use in OpenStack deployments, Prometheus is considered Kubernetes
'native': both are CNCF projects, and Prometheus is included in many
Kubernetes distributions.

Adding the ability for Watcher to consume metrics from a Prometheus data
source will broaden Watcher's potential user base, especially among operators
who are already familiar with or using Prometheus.

Problem description
===================

Watcher currently supports a small number of data sources for collecting
metrics: Ceilometer, Gnocchi and Grafana. Some of these are no longer actively
developed or integrated into OpenStack distributions, which can prevent
Watcher from being deployed at all.

As Prometheus becomes the de facto standard metrics store in the Kubernetes
ecosystem, and OpenStack is increasingly deployed on Kubernetes, Watcher's
inability to consume metrics from Prometheus limits the project's reach.

Use Cases
----------

By coupling the efficient and highly customizable Prometheus collector with
Watcher, operators can build a powerful optimization solution for their
OpenStack deployments. There is currently no way to use Prometheus as a data
source for Watcher.

As an operator with existing knowledge of Prometheus, I would like to
leverage the power of Watcher as an optimization engine by using Prometheus as
its data source.

As an operator with existing Kubernetes infrastructure, I would like to reuse
the same metrics storage solution across my OpenStack and Kubernetes
deployments.

As a developer of Watcher, I want it to be deployable in more OpenStack
clouds, leveraging popular open-source tools to increase the project's reach
and adoption.

Proposed change
===============

A new Prometheus module will be added to watcher.decision_engine.datasources.
It will leverage python-observabilityclient
(https://opendev.org/openstack/python-observabilityclient), which AODH already
uses to retrieve metrics from Prometheus; see
https://github.com/openstack/aodh/commit/f932265290a4e923eac6111eb28578489c7dce33

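
The spec proposes python-observabilityclient for talking to Prometheus. To keep
the sketch below self-contained, it instead calls the Prometheus HTTP API
instant-query endpoint (/api/v1/query) directly with urllib and parses the
documented JSON response shape. The function names are hypothetical; this is an
illustration of the data flow, not the planned implementation.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return "%s/api/v1/query?%s" % (base_url.rstrip("/"), params)


def parse_instant_vector(payload):
    """Map each sample's label set to its float value.

    payload is the JSON body of an instant query, shaped like
    {"status": "success", "data": {"resultType": "vector", "result": [...]}}.
    """
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed: %s" % body.get("error"))
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got %s" % data["resultType"])
    samples = {}
    for sample in data["result"]:
        # Each sample is {"metric": {labels...}, "value": [timestamp, "value"]};
        # Prometheus encodes the sample value as a string.
        labels = tuple(sorted(sample["metric"].items()))
        samples[labels] = float(sample["value"][1])
    return samples


def instant_query(base_url, promql, timeout=10):
    """Run an instant query against a live Prometheus (requires network access)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(resp.read().decode("utf-8"))
```

For example, instant_query("http://prometheus.example:9090", "up") would return
one entry per scrape target; the host name there is a placeholder.
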
As a first implementation, we do not expect to extend the DataSource
METRIC_MAP beyond the existing set (host/instance CPU/RAM, etc.). That could
be considered as future work, depending on the success of this proposal.

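
As a rough illustration of what METRIC_MAP entries could look like for
Prometheus, the sketch below maps two existing Watcher metric names to PromQL
templates built on real node_exporter metrics (node_cpu_seconds_total,
node_memory_MemTotal_bytes, node_memory_MemAvailable_bytes). The dictionary
name, the templates, and the use of the instance label to select a host are
assumptions, not the mapping this spec commits to.

```python
# Hypothetical mapping from Watcher metric names to PromQL templates.
# The node_exporter metric names are real; the templates and the "instance"
# label used to select a host are assumptions for illustration.
PROMETHEUS_METRIC_MAP = {
    # Percentage of CPU time not spent idle, averaged over the period.
    "host_cpu_usage": (
        "100 - (avg by (instance) "
        "(rate(node_cpu_seconds_total{{mode='idle',instance='{instance}'}}[{period}s]))"
        " * 100)"
    ),
    # Used memory in bytes (total minus available), averaged over the period.
    "host_ram_usage": (
        "avg_over_time(node_memory_MemTotal_bytes{{instance='{instance}'}}[{period}s])"
        " - avg_over_time(node_memory_MemAvailable_bytes{{instance='{instance}'}}[{period}s])"
    ),
    # Instance metrics depend on which libvirt exporter is deployed; unmapped here.
    "instance_cpu_usage": None,
    "instance_ram_usage": None,
}


def build_query(meter_name, instance, period):
    """Render the PromQL query for a Watcher metric, host and period (seconds)."""
    template = PROMETHEUS_METRIC_MAP.get(meter_name)
    if template is None:
        raise NotImplementedError("%s is not mapped to a PromQL query" % meter_name)
    # Doubled braces in the templates become literal PromQL label-selector braces.
    return template.format(instance=instance, period=int(period))
```

Keeping unmapped metrics as explicit None entries makes it obvious which
strategies cannot yet run against this data source.
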
The new Prometheus client will provide a default set of mappings that enables
a subset of strategies and goals to function, by normalising Prometheus metric
names and units to match the values supported by the other data sources.

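
Unit normalisation could be as simple as a conversion table applied to each
returned sample. In the hypothetical sketch below, the target units (for
example KB for host RAM) are assumptions for illustration; the real targets
would be whatever the existing data sources report.

```python
# Hypothetical conversion factors from the unit Prometheus reports to the unit
# a Watcher strategy expects. The target units are illustrative assumptions.
UNIT_FACTORS = {
    ("bytes", "KB"): 1.0 / 1024,
    ("bytes", "MB"): 1.0 / (1024 * 1024),
    ("ratio", "percent"): 100.0,
}


def normalise(value, from_unit, to_unit):
    """Convert value from from_unit to to_unit using UNIT_FACTORS."""
    if from_unit == to_unit:
        return value
    try:
        factor = UNIT_FACTORS[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(
            "no conversion from %s to %s" % (from_unit, to_unit)) from None
    return value * factor
```

For example, normalise(8 * 1024 ** 3, "bytes", "KB") returns 8388608.0.
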
This initial work will not use Prometheus alerts to trigger audits; instead,
it will build on AODH's existing integration to fulfil that use case.

Alternatives
------------

Watcher cannot currently use Prometheus as a metrics source. The only
alternative is to use one of the currently supported data sources, which
restricts Watcher's potential user base.

Data model impact
-----------------

No changes to the data model are expected as part of this proposal.
Given the extensibility of Prometheus as a collector, future work could
propose extending the Watcher metrics beyond the current set (host/instance
CPU or RAM usage, temperature, etc.). However, that is out of scope for this
proposal.

REST API impact
---------------

This proposal is not expected to impact the REST API.

Security impact
---------------

None expected.

Notifications impact
--------------------

None expected.

Other end user impact
---------------------

None expected.

Performance Impact
------------------

No performance impact is expected from using a Prometheus data source
compared to any of the currently supported sources.

Other deployer impact
---------------------

No impact is anticipated beyond the ability to integrate with a new data
source. Deployers will have to provide the required configuration values,
such as the Prometheus authentication credentials needed for the integration.

A new optional dependency on python-observabilityclient will be introduced,
which may require changes to packaging and installers.

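
Because python-observabilityclient is an optional dependency, the data source
would need to degrade gracefully when it is missing. A common Python pattern is
a guarded import plus an availability check, sketched below. The module path
observabilityclient.prometheus_client, the check_availability function and the
host/port settings are assumptions for illustration.

```python
try:
    # Assumed module path within python-observabilityclient.
    from observabilityclient import prometheus_client
except ImportError:
    # The optional dependency is not installed; the data source is unusable.
    prometheus_client = None


def check_availability(client_module, host, port):
    """Return "available" or a "not available: <reason>" string.

    client_module is the imported client module (or None); host and port stand
    in for hypothetical deployer-provided configuration values.
    """
    if client_module is None:
        return "not available: python-observabilityclient is not installed"
    if not host or not port:
        return "not available: Prometheus host/port not configured"
    return "available"
```

A caller would pass the guarded import, e.g.
check_availability(prometheus_client, "prometheus.example", 9090), so that
Watcher can skip this data source instead of failing at import time.
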
Developer impact
----------------

The Watcher devstack plugin will be extended to allow developers to use
Prometheus instead of the default Gnocchi/Ceilometer collectors.

Implementation
==============

Assignee(s)
-----------

Sean Mooney, Marios Andreou

Reviewers
-----------

Dan Smith

Work Items
----------

We will need:

* A new prometheus.py subclass of base.DataSourceBase in
  `datasources <https://github.com/openstack/watcher/tree/master/watcher/decision_engine/datasources>`_.
* A prometheus_client.py under
  `conf <https://github.com/openstack/watcher/tree/master/watcher/conf>`_
  to handle authentication with, and transport of metrics from, the
  Prometheus instance.
* An extension of the Zuul CI testing for the Prometheus integration, that is,
  a new devstack job similar to the existing
  `watcher-tempest-strategies <https://zuul.opendev.org/t/openstack/builds?job_name=watcher-tempest-strategies&project=openstack/watcher>`_
  job that deploys Watcher with a Prometheus collector.
* An extension of the Watcher devstack plugin to support deployment with
  Prometheus instead of the default Gnocchi/Ceilometer.

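
The shape of the new data source class can be sketched as below. Watcher's real
base.DataSourceBase is not imported here, so a minimal stand-in abstract class
is used; the PrometheusHelper name, the injected query_fn and the
statistic_aggregation signature are hypothetical. The *_over_time functions and
the subquery syntax (expr)[300s:] are standard PromQL.

```python
import abc


class DataSourceBase(abc.ABC):
    """Minimal stand-in for watcher's base.DataSourceBase (not the real class)."""

    NAME = ""
    METRIC_MAP = {}

    @abc.abstractmethod
    def statistic_aggregation(self, resource_id, meter_name, period,
                              aggregate="mean"):
        """Return one aggregated value for a resource and Watcher metric."""


class PrometheusHelper(DataSourceBase):
    """Hypothetical Prometheus data source built on an injected query function."""

    NAME = "prometheus"
    # Watcher metric name -> PromQL template (node_exporter metrics, assumed labels).
    METRIC_MAP = {
        "host_ram_usage": (
            "node_memory_MemTotal_bytes{{instance='{resource}'}}"
            " - node_memory_MemAvailable_bytes{{instance='{resource}'}}"
        ),
    }
    # Watcher aggregate name -> PromQL range-vector function.
    AGGREGATES = {"mean": "avg_over_time", "max": "max_over_time",
                  "min": "min_over_time"}

    def __init__(self, query_fn):
        # query_fn(promql) -> list of floats; a real client would call Prometheus.
        self._query_fn = query_fn

    def statistic_aggregation(self, resource_id, meter_name, period,
                              aggregate="mean"):
        expr = self.METRIC_MAP[meter_name].format(resource=resource_id)
        func = self.AGGREGATES[aggregate]
        # PromQL subquery: evaluate expr over the last `period` seconds, then
        # aggregate those points over time with the chosen function.
        promql = "%s((%s)[%ds:])" % (func, expr, period)
        values = self._query_fn(promql)
        return values[0] if values else None
```

Injecting the query function keeps transport and authentication (the
prometheus_client.py work item) separate from the metric-mapping logic, which
also makes the class easy to unit test.
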
Dependencies
============

This proposal requires that the OpenStack deployment monitored by the
Prometheus instance used as a data source has the appropriate exporters
deployed (the actual collection functions and API endpoints), so that the
metrics they expose can be mapped to the metrics Watcher expects
(host_cpu_usage, host_ram_usage, instance_cpu_usage, etc.).

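
One way to confirm that the required exporters are in place is to compare the
metric names Prometheus knows about, as returned by its
/api/v1/label/__name__/values endpoint, against what each Watcher metric needs.
The sketch below assumes node_exporter metric names for the host metrics; the
requirement table and the function name are hypothetical.

```python
import json

# Hypothetical: exporter metrics each Watcher metric would be derived from.
REQUIRED_EXPORTER_METRICS = {
    "host_cpu_usage": {"node_cpu_seconds_total"},
    "host_ram_usage": {"node_memory_MemTotal_bytes",
                       "node_memory_MemAvailable_bytes"},
}


def missing_exporter_metrics(label_values_payload):
    """Return {watcher_metric: [missing exporter metrics]} for unmet metrics.

    label_values_payload is the JSON body of GET /api/v1/label/__name__/values,
    shaped like {"status": "success", "data": ["metric_a", "metric_b", ...]}.
    """
    available = set(json.loads(label_values_payload)["data"])
    missing = {}
    for watcher_metric, required in REQUIRED_EXPORTER_METRICS.items():
        absent = required - available
        if absent:
            missing[watcher_metric] = sorted(absent)
    return missing
```

An empty result means every Watcher metric in the table can be derived from
the exporters currently scraped by that Prometheus instance.
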
Testing
=======

As mentioned under Work Items, this work will also add a new CI job to the
Watcher code repository. Beyond verifying the integration point (e.g. that
communication with Prometheus works and that metrics are received and
processed correctly), this job should ideally include functional testing that
executes strategies, similar to the existing watcher-tempest-strategies job.

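
Integration-point checks can also be covered by unit tests that do not need a
live Prometheus. The sketch below uses unittest.mock to replace
urllib.request.urlopen with a canned instant-query response; get_first_value is
a hypothetical helper, not part of Watcher.

```python
import io
import json
import unittest
import urllib.parse
import urllib.request
from unittest import mock


def get_first_value(base_url, promql):
    """Hypothetical helper: run an instant query and return the first sample."""
    url = "%s/api/v1/query?%s" % (
        base_url, urllib.parse.urlencode({"query": promql}))
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    result = body["data"]["result"]
    # Prometheus returns sample values as strings, e.g. [1730000000.0, "12.5"].
    return float(result[0]["value"][1]) if result else None


def _fake_response(result):
    """Build a file-like object holding an instant-query JSON body."""
    body = {"status": "success",
            "data": {"resultType": "vector", "result": result}}
    return io.BytesIO(json.dumps(body).encode("utf-8"))


class GetFirstValueTest(unittest.TestCase):

    @mock.patch("urllib.request.urlopen")
    def test_parses_first_sample(self, mock_urlopen):
        # urlopen is used as a context manager, so stub what __enter__ returns.
        mock_urlopen.return_value.__enter__.return_value = _fake_response(
            [{"metric": {"instance": "compute-0"}, "value": [0, "12.5"]}])
        self.assertEqual(get_first_value("http://prom:9090", "up"), 12.5)
        called_url = mock_urlopen.call_args[0][0]
        self.assertTrue(called_url.endswith("/api/v1/query?query=up"))

    @mock.patch("urllib.request.urlopen")
    def test_empty_result_returns_none(self, mock_urlopen):
        mock_urlopen.return_value.__enter__.return_value = _fake_response([])
        self.assertIsNone(get_first_value("http://prom:9090", "up"))
```

Such tests run with python -m unittest like any other unit test module, while
the devstack job remains responsible for end-to-end strategy execution.
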
Documentation Impact
====================

The documentation will need to be extended to cover setup considerations,
for example setting up the appropriate exporters on the Prometheus side and
best practices around authentication and certificates.

References
==========

This proposal was first raised by S Mooney during the
`October 2024 Watcher PTG session <https://etherpad.opendev.org/p/oct2024-ptg-watcher>`_.