
After STX-Openstack was upversioned to Antelope, it was no longer possible to create VMs from a volume: they would get stuck in ERROR status. The first proposed solution was a patch containing [1] and [2], because, as specified in [3], Nova now requires a service token in order to manipulate Cinder volumes. Unfortunately, this did not solve the issue by itself. The nova-conductor pods now logged the following error (only the relevant part is shown):

"nova.exception.RescheduledException: Build of instance 2f32c7ea-1720-4f61-bce8-dbe970c40b0c was re-scheduled: Secret not found: no secret with matching uuid 'a7f3ae2e-cee7-4f04-9402-a78047747654'"

This UUID was not the one listed by `virsh secret-list` on the Cinder, Nova and Libvirt containers. It turns out that openstack-helm and openstack-helm-infra have a Ceph UUID hardcoded in their Cinder [4], Nova [5] [6] and Libvirt [7] values. Changing these values to the UUID that libvirt was trying to find (a7f3ae2e-cee7-4f04-9402-a78047747654) solved the issue. However, hardcoding values is not good practice, and tracing where this UUID came from showed that it is defined by the platform's Ceph configuration in `/etc/ceph/ceph.conf`.

This still leaves the question: why did this work on Ussuri and stop working on Antelope? The official Ceph documentation [8] [9] on using Ceph with OpenStack explains how to add a secret to libvirt to store the Ceph admin keyring. There, the secret UUID is generated "on the fly", and both docs mention the old hardcoded value (i.e., 457eb676-33da-42ec-9a8c-9293d545c337). This is why it used to work until our upversion to OpenStack Antelope/2023.1: the UUID itself does not really matter, as long as Nova and libvirt use the same value for it.
This UUID is simply the one given to the libvirt secret that stores the Ceph keyring [10], and the keyring ensures proper communication between our services and the platform Ceph. What changed between Ussuri and Antelope (2023.1) is that there is now a specific method [11] that sets a default value (the Ceph cluster UUID) for this UUID when it is not specified in the driver configuration.

This change dynamically reads `/etc/ceph/ceph.conf` to find the UUID value and uses it to override the values in [4], [5], [6] and [7]. It also adds the patch that includes the Nova service token configuration. Together, these two changes allow VMs to be created from volumes.

[1] 91c8a5baf2
[2] 7d39af25fd
[3] https://docs.openstack.org/releasenotes/cinder/2023.1.html#upgrade-notes
[4] https://opendev.org/openstack/openstack-helm/src/branch/master/cinder/values.yaml#L942
[5] https://opendev.org/openstack/openstack-helm/src/branch/master/nova/values.yaml#L594
[6] https://opendev.org/openstack/openstack-helm/src/branch/master/nova/values.yaml#L1432
[7] https://opendev.org/openstack/openstack-helm-infra/src/branch/master/libvirt/values.yaml#L100
[8] https://github.com/ceph/ceph/blob/main/doc/rbd/rbd-openstack.rst
[9] https://docs.huihoo.com/ceph/v0.80.5/rbd/rbd-openstack/index.html
[10] https://opendev.org/starlingx/openstack-armada-app/src/branch/master/python3-k8sapp-openstack/k8sapp_openstack/k8sapp_openstack/helm/libvirt.py#L60
[11] 6464d37d16 (diff-9b122c182b4333b747e7fd7e07f73d68ff30256a)

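As an illustration of the lookup described above, here is a minimal sketch of reading the cluster UUID from the contents of `/etc/ceph/ceph.conf`. It is not the code of this change: the function name `read_ceph_fsid` and the sample configuration are hypothetical, and the sketch assumes the UUID is stored under the `fsid` key of the `[global]` section, which is the usual ceph.conf layout.

```python
import configparser
import uuid
from typing import Optional


def read_ceph_fsid(conf_text: str) -> Optional[str]:
    """Return the cluster UUID (fsid) found in ceph.conf text, or None.

    Hypothetical helper for illustration only; assumes the UUID lives
    under the `fsid` key of the `[global]` section.
    """
    # ceph.conf is INI-like; disable interpolation so '%' is kept literally
    # and allow duplicate sections/keys instead of failing on them.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    try:
        parser.read_string(conf_text)
    except configparser.Error:
        return None

    fsid = parser.get("global", "fsid", fallback=None)
    if fsid is None:
        return None
    try:
        # uuid.UUID validates the value and normalises its formatting.
        return str(uuid.UUID(fsid.strip()))
    except ValueError:
        return None


# Sample contents (assumed layout), using the UUID from the error message.
SAMPLE_CONF = """
[global]
fsid = a7f3ae2e-cee7-4f04-9402-a78047747654
"""

if __name__ == "__main__":
    print(read_ceph_fsid(SAMPLE_CONF))
```

A caller would pass the value returned here as the secret UUID override for the Cinder, Nova and Libvirt charts, and keep the chart default when the function returns None.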
Test Plan:
PASS: Build openstack-helm, python3-k8sapp-openstack and stx-openstack-helm-fluxcd
PASS: Upload / apply / remove STX-Openstack
PASS: Create a VM by an image
PASS: Create a volume and launch a VM from it
PASS: Create a VM using the `boot-from-volume` flag
PASS: Delete a VM created by a volume

Closes-Bug: 2037463
Change-Id: Ia00bb8dbe3460ce817d69049f97f56a96ad6a298
Signed-off-by: Lucas de Ataides <lucas.deataidesbarreto@windriver.com>

This repo is for https://github.com/openstack/openstack-helm
Changes to this repo are needed for StarlingX and those changes are not yet merged. Rather than clone and diverge the repo, the repo is extracted at a particular git SHA, and patches are applied on top.
As those patches are merged, the SHA can be updated and the local patches removed.