nodepool/releasenotes/notes/aws-dedicated-hosts-5b68f1174d8f242c.yaml
James E. Blair 6713bba9b7 Add support for AWS dedicated hosts
This adds limited support for dedicated hosts in AWS.  This is
primarily to enable users to launch certain instance types (such
as those used for macOS) which can only be launched on dedicated hosts.

While AWS allows multiple instances to run on a dedicated host,
this change to nodepool does not implement that.  Instead, it
launches only a single instance on a dedicated host.  The lifecycle
of the two are bound, so that when the nodepool node is deleted,
both are removed from AWS.  Supporting multiple instances on
a dedicated host would be significantly more complicated.

This change introduces a new way of tracking multiple resources for
a single node.  Since we need to create two resources (the host and
the instance), we need to handle the case where the host allocation
fails before the instance is created.  Nodepool relies on the
"external id" to handle cleanup in case of failure, but there are
two external ids here.  We could store the id of the host first, then
switch to the id of the instance later (AWS makes this easy by
prefixing their ids by type, e.g. "h-..." and "i-...").  However, this
change implements a more generalized solution:

This change updates the external_id for AWS nodes from a string to
a dictionary.  The dict holds up to two keys: 'host' and 'instance'.
The ZK code path handles this transparently.  The status commands
coerce the value to a string before printing it (and a test is
added for this).  Zuul will need an update to stringify the value
on the Nodes page.
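
For illustration only, the stored value might look like the following
(shown in YAML for readability; the key names are those described
above, and the id values are made-up placeholders):

  external_id:
    host: h-0123456789abcdef0
    instance: i-0abcdef0123456789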

Here are some alternatives:
* Store the host value first, then the instance later
  (This only works for this specific case since we can disambiguate them)
* Serialize the value to JSON inside the adapter
  (This is unnecessary work, but simple)
* Rely on the leaked resource cleanup
  (This will clean up resources more slowly)

How quotas apply to dedicated hosts is not entirely clear from the
AWS documentation at the moment.  We know that there are quotas for
specific dedicated host types.  There are also vCPU and other quotas.
What is not clear is whether vCPU quotas apply to dedicated hosts, and
whether instance quotas apply to instances launched on dedicated hosts.

For the moment, we assume that only dedicated host quotas apply
(since that is the least likely to cause us to under-provision).  So
we count a node request against the dedicated host quota if it
involves a dedicated host.

If we later find out that we should also count the instance that we
launch on the dedicated host against the account's instance quota,
it should be a simple matter to merge these quota calculations rather
than use one or the other as we do now.  The points where this should
happen are noted in code comments.

Dedicated hosts require an availability-zone setting, so this is now
added to the pool configuration in general (it can be used for on-demand
instances as well).
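
As a sketch only, a pool using these options might look roughly like
the following (the availability-zone attribute name, the value type of
dedicated-host, and the other label values are illustrative assumptions
and should be checked against the driver documentation):

  pools:
    - name: main
      availability-zone: us-west-2a
      labels:
        - name: macos-arm64
          cloud-image: macos-image
          instance-type: mac2.metal
          dedicated-host: true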

References to "run_instance" are updated to "run_instances" in the aws
driver tests for consistency and clarity.

The statemachine driver now logs the error from the cloud provider when
it encounters a quota-related error.  In the case of AWS, this provides
useful information about which quota was hit.

Change-Id: I651327a8ace3b8921588a5ec9490f02fb7e685f7
Depends-On: https://review.opendev.org/c/zuul/zuul/+/921346
2024-06-26 17:21:26 -07:00

---
features:
  - |
    Limited support for AWS dedicated hosts is now available using the
    :attr:`providers.[aws].pools.labels.dedicated-host` option. This
    allows for a dedicated host to be allocated to serve a single
    instance (the dedicated host can not be used for multiple
    instances). This enables Nodepool to launch certain instance
    types which require dedicated hosts.