Added lessons learnt document
The lessions learnt document has been created on etherpad, this patch took all the content from that document, edited, reformatted, and also added more few items I learnt from my own experiences of developing these workload scripts. Change-Id: I3ab9cacd6369c3cbcfc912707a9e844ae7c4fc2d
This commit is contained in:
parent
4e8f7cb7e2
commit
bdaa492657
112
doc/source/lessonslearnt.rst
Executable file
112
doc/source/lessonslearnt.rst
Executable file
@ -0,0 +1,112 @@
|
||||
Tooling
|
||||
-------
|
||||
|
||||
For interoperable automated deployment, Ansible + Ansible OpenStack cloud
|
||||
modules (based on OpenStack Shade) provided the best results.
|
||||
|
||||
Terraform and its OpenStack cloud modules (based on OpenStack Shade) has
|
||||
also been tried, but various issues such as not supporting multiple same
|
||||
service endpoints renders many clouds which support multiple endpoints for
|
||||
Nova, Neutron versions rendered failed deployment on these clouds, but
|
||||
supporting multiple endpoints is necessary for different versions of
|
||||
OpenStack client applications. Terraform also does not allow apply (creation)
|
||||
and destroy (remove) action to be used at the same time. But it is often
|
||||
needed, for example, during a deployment, you may need a floating IP (that is
|
||||
the apply action in Terraform) but at the end of the deployment you may want
|
||||
to remove that floating IP (that is the destroy action in Terraform), so the
|
||||
floating IP which is a resource in Terraform only existed in a short period
|
||||
of time, Terraform can not really handle the situation. This is probably the
|
||||
most unforgiven restriction of the Terraform. The Interop Challenge working
|
||||
group can not seem find a work around to overcome the restriction. Also it
|
||||
appears that these issues have been identified but have not been actively
|
||||
addressed by Terraform community.
|
||||
|
||||
OpenStack Heat has also been discussed but since the adoption of HEAT is
|
||||
still not wide spread, this tool was not used. Similar reasons for other
|
||||
tools like Murano and Juju.
|
||||
|
||||
It's perhaps worth noting that both the Ansible OpenStack cloud modules and
|
||||
Terraform OpenStack cloud modules based on OpenStack Shade, which is
|
||||
a library that was written explicitlly to work around some Interop
|
||||
problems. So we can essentially have some degree of interop as long as
|
||||
there is an interop layer between us and the cloud (the aim should be not
|
||||
to need this library), tooling in interop challenge is a very important
|
||||
subject.
|
||||
|
||||
Shade seems to be missing AZ parameter for create_keypair (Ansible's
|
||||
os_keypair) and other functions which can cause problems on clouds with
|
||||
multi-AZs per region.
|
||||
|
||||
|
||||
Networking
|
||||
----------
|
||||
|
||||
Network virtualization features are where most interoperability issues become
|
||||
visible. OpenStack Neutron support very large number of plugins, these plugins
|
||||
can behave very differently. For example, private IP and floating IP
|
||||
supporting can vary, some clouds make public accessable IP address as private
|
||||
IP address when returned from client library, some clouds make the same thing
|
||||
as public IP address, the later seems to be the right behavior, but clouds
|
||||
implement them differently. Layer 2 and layer 3 functions can be also
|
||||
challenge, some clouds won't expose the functions for customers to create
|
||||
routers, or networks. Releasing the alocated floating IPs is completely
|
||||
missing from all OpenStack cloud modules tools like Ansible and Terraform.
|
||||
This problem results in the alocated floating IPs hanging around, it is
|
||||
especially bad for clouds which do not have small public IP address segment.
|
||||
|
||||
Not all clouds provide tenant networks by default. Be prepared to have to
|
||||
configure your own if netnant network can be had.
|
||||
|
||||
Can not assume the first NIC on the guest is going to be eth0 (this is common
|
||||
on older guest OS's prior to the arrival of Predictable Network Interface
|
||||
Names and systemd, and likely isn't true on newer guest OS's). Instead, allow
|
||||
the user to set those as parameters to the workload or try to detect these
|
||||
names in the workload when the network nic is needed.
|
||||
|
||||
Not all clouds support floating IP or private IP. You may want to structure
|
||||
your workload so that it can adapt to either attach instances to a routeable
|
||||
network or use floating IP's based on the parameters it's given.
|
||||
|
||||
The tenant network has its advantages when the communications are server to
|
||||
server on the same network. For example, when your deployment scenario
|
||||
involves multiple backend servers such as database and application servers,
|
||||
the commuincation between these servers can be placed on the tenant network
|
||||
to improve security and performance.
|
||||
|
||||
|
||||
Provisiong
|
||||
----------
|
||||
|
||||
It makes a real difference not only the HW that the cloud is running on but
|
||||
also if the backend is ceph or something else, if it is co-located, if the
|
||||
images have any sort of overhead checks, etc.
|
||||
|
||||
If you don't assume a particular guest OS image, be careful with
|
||||
storage/networking. We encountered one example in which a particular
|
||||
guestOS/virtual adapter pair needed to rescan the SCSI bus before it would
|
||||
recognize a newly attached Cinder volume. Rescanning the bus is generally
|
||||
harmless if not needed and ensures that images built with adapter types that
|
||||
need it run successfully, so it's an example of something you can do to make
|
||||
your workloads more interoperable.
|
||||
|
||||
Parameterize things that are likely to change across different cloud/guest
|
||||
OS setups. For example: don't assume the first volume attached to a guest
|
||||
will always be /dev/vdb (this is common but not guaranteed on libvirt, often
|
||||
untrue on other hypervisors).
|
||||
|
||||
|
||||
Metadata
|
||||
--------
|
||||
|
||||
Not all cloud support cloud-init, when develop workloads which heavily rely
|
||||
on metadata services, the clouds without metadata support will fail.
|
||||
|
||||
|
||||
Conclusion
|
||||
----------
|
||||
|
||||
With best practices it is possible to create enterprise applications (with
|
||||
enterprise characteristics such as load balancer, multiple web application
|
||||
servers, distributed database, security groups, block storage to provide
|
||||
enterprise level networking safeguards) can be created such that they are
|
||||
portable to numerous (over 18) private and public OpenStack Clouds.
|
Loading…
x
Reference in New Issue
Block a user