Merge "Add cross-pod L2 Networking spec file"
This commit is contained in:
commit
828f55287e
563
specs/cross-pod-l2-networking.rst
Normal file
563
specs/cross-pod-l2-networking.rst
Normal file
@ -0,0 +1,563 @@
|
|||||||
|
======================================
Cross pod L2 networking in Tricircle
======================================

Background
==========

The Tricircle provides a unified OpenStack API gateway and networking
automation functionality. These functionalities allow cloud operators to
manage multiple OpenStack instances, running in one site or in multiple
sites, as a single OpenStack cloud.

Each bottom OpenStack instance which is managed by the Tricircle is also
called a pod.

The Tricircle has the following components:

* Nova API-GW
* Cinder API-GW
* Neutron API Server with Neutron Tricircle plugin
* Admin API
* XJob
* DB

Nova API-GW provides the functionality to trigger automatic networking
creation when new VMs are being provisioned. The Neutron Tricircle plug-in
provides the functionality to create cross OpenStack L2/L3 networking for new
VMs. After the binding of tenant-id and pod is finished in the Tricircle,
Cinder API-GW and Nova API-GW will forward Cinder API or Nova API requests to
the appropriate bottom OpenStack instance.

Please refer to the Tricircle design blueprint[1], especially
'7. Stateless Architecture Proposal', for a detailed description of each
component.

Problem Description
===================

When a user wants to create a network through the Neutron API Server, the
user can specify 'availability_zone_hints' (AZ or az will be used as
shorthand for availability zone) during network creation[5]. In the
Tricircle, 'az_hints' means which AZs the network should be spread into,
which is slightly different from the meaning of 'az_hints' in Neutron[5]. If
no 'az_hints' is specified during network creation, the created network can
be spread into any AZ. If a list of 'az_hints' is given during network
creation, the network should be able to be spread into the AZs suggested by
that list.

When a user creates a VM or Volume, there is also one parameter called
availability zone. The AZ parameter is used for Volume and VM co-location, so
that the Volume and VM will be created in the same bottom OpenStack instance.

When a VM is being attached to a network, the Tricircle will check whether
the VM's AZ is inside the network's AZ scope. If the VM is not in the
network's AZ scope, the VM creation will be rejected. A minimal sketch of
this check is shown below.
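
A minimal sketch of the AZ scope check described above, in plain Python; the
function name is illustrative, not an actual Tricircle helper:

.. code-block:: python

    def is_az_allowed(vm_az, network_az_hints):
        """Check whether a VM's AZ falls inside a network's AZ scope.

        An empty az_hints list means the network can be spread into any
        AZ, so any VM AZ is acceptable.
        """
        if not network_az_hints:
            return True
        return vm_az in network_az_hints

    # Example: a network restricted to AZ1 and AZ2 rejects a VM in AZ3.
    assert is_az_allowed('AZ1', ['AZ1', 'AZ2'])
    assert not is_az_allowed('AZ3', ['AZ1', 'AZ2'])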

Currently, the Tricircle only supports one pod in one AZ, and only supports a
network associated with one AZ. That means a tenant's network will currently
be presented in only one bottom OpenStack instance, and therefore all VMs
connected to the network will be located in one bottom OpenStack instance. If
there is more than one pod in one AZ, refer to the dynamic pod binding[6].

There are many use cases where a tenant needs a network that can be spread
out into multiple bottom OpenStack instances in one AZ or multiple AZs.

* Capacity expansion: as a tenant adds more and more VMs, the capacity of one
  OpenStack instance may not be enough, and a new OpenStack instance has to
  be added to the cloud. But the tenant still wants to add new VMs to the
  same network.

* Cross OpenStack network service chaining: service chaining is based on
  port-pairs. Leveraging the cross pod L2 networking capability provided by
  the Tricircle, the chaining could also be done across sites. For example,
  vRouter1 in pod1 and vRouter2 in pod2 could be chained.

* Applications are often required to run in different availability zones to
  achieve high availability. An application needs to be designed as
  Active-Standby/Active-Active/N-Way to achieve high availability, and some
  components inside one application are designed to work as a distributed
  cluster. This design typically leads to state replication or heartbeat
  among application components (directly, via replicated database services,
  or via a privately designed message format). When this kind of application
  is deployed across multiple OpenStack instances, cross OpenStack L2
  networking is needed to support the heartbeat or state replication.

* When a tenant's VMs are provisioned in different OpenStack instances, there
  is E-W (East-West) traffic between these VMs. The E-W traffic should be
  visible only to the tenant, so isolation is needed. If the traffic goes
  through N-S (North-South) paths via tenant-level VPNs, the overhead is too
  high, and the orchestration of multiple site-to-site VPN connections is
  also complicated. Therefore cross OpenStack L2 networking that bridges the
  tenant's routers in different OpenStack instances can provide more
  lightweight isolation.

* In a hybrid cloud, there is a cross L2 networking requirement between the
  private OpenStack and the public OpenStack. Cross pod L2 networking will
  help VM migration in this case, since it's not necessary to change the
  IP/MAC/Security Group configuration during VM migration.

The spec[6] explains how one AZ can support more than one pod, and how to
schedule a proper pod during VM or Volume creation.

This spec deals with the cross OpenStack L2 networking automation in the
Tricircle.

The simplest way to spread L2 networking out to multiple OpenStack instances
is to use the same VLAN. But there are a lot of limitations: (1) the number
of VLAN segments is limited, and (2) the VLAN network itself is not well
suited to spanning multiple sites, although gateways can be used to achieve
the same thing.

So flexible tenant-level L2 networking across multiple OpenStack instances in
one site or in multiple sites is needed.

Proposed Change
===============

Cross pod L2 networking can be divided into three categories:
``Shared VLAN``, ``Shared VxLAN`` and ``Mixed VLAN/VxLAN``.

* Shared VLAN

  The network in each bottom OpenStack instance is of VLAN type and has the
  same VLAN ID. If we want shared VLAN L2 networking to work in the
  multi-site scenario, i.e., multiple OpenStack instances in multiple sites,
  a physical gateway needs to be manually configured to extend one VLAN
  network to the other sites.

  *Manual setup of the physical gateway is out of the scope of this spec.*

* Shared VxLAN

  The network in each bottom OpenStack instance is of VxLAN type and has the
  same VxLAN ID.

  Leverage L2GW[2][3] to implement this type of L2 networking.

* Mixed VLAN/VxLAN

  The networks in the bottom OpenStack instances may have different types
  and/or different segment IDs.

  Leverage L2GW[2][3] to implement this type of L2 networking.

There is another network type called ``Local Network``. For ``Local
Network``, the network will be presented in only one bottom OpenStack
instance, and will not be presented in different bottom OpenStack instances.
If a VM in another pod tries to attach to a ``Local Network``, the attachment
should fail. This use case is quite useful for the scenario in which cross
pod L2 networking is not required and one AZ will not include more than one
bottom OpenStack instance.

Cross pod L2 networking will be established dynamically while a tenant's VM
is being provisioned.

There is an assumption here that only one type of L2 networking will work in
one cloud deployment.

A Cross Pod L2 Networking Creation
----------------------------------

A cross pod L2 networking creation will be done through the az_hints
attribute of the network. If az_hints includes one or more AZs, the network
will be presented only in this AZ or these AZs; if there is no AZ in
az_hints, the network can be extended to any bottom OpenStack instance. A
minimal sketch of such a network creation request follows.
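
A minimal sketch of creating a network with az_hints through the standard
Neutron 'availability_zone_hints' attribute, using python-neutronclient; the
credentials and endpoint URL are placeholders:

.. code-block:: python

    from neutronclient.v2_0 import client

    # Client pointing at the Tricircle Neutron API server
    # (credentials and auth URL are placeholders).
    neutron = client.Client(username='admin',
                            password='password',
                            tenant_name='demo',
                            auth_url='http://keystone:5000/v2.0')

    # Ask for this network to be spread into AZ1 and AZ2 only.
    net_body = {
        'network': {
            'name': 'Net1',
            'availability_zone_hints': ['AZ1', 'AZ2'],
        }
    }
    network = neutron.create_network(net_body)['network']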

There is a special use case for external network creation. For external
network creation, you need to specify the pod_id but not the AZ in the
az_hints, so that the external network will be created in only one specified
pod per AZ.

*Support of external networks in multiple OpenStack instances in one AZ is
out of the scope of this spec.*

A pluggable L2 networking framework is proposed to deal with the three types
of cross pod L2 networking, and it should be compatible with
``Local Network``.

1. Type Driver under Tricircle Plugin in Neutron API server

* Type driver to distinguish different types of cross pod L2 networking, so
  the Tricircle plugin needs to load a type driver according to the
  configuration. The Tricircle can reuse the type driver of ML2 with updates.

* Type driver to allocate a VLAN segment id for shared VLAN L2 networking.

* Type driver to allocate a VxLAN segment id for shared VxLAN L2 networking.

* Type driver for mixed VLAN/VxLAN to allocate a VxLAN segment id for the
  network connecting L2GWs[2][3].

* Type driver for Local Network, only updating ``network_type`` for the
  network in the Tricircle Neutron DB.

When a network creation request is received by the Neutron API Server in the
Tricircle, the type driver will be called based on the configured network
type, as sketched below.
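
A minimal sketch of the type driver dispatch described above; the class and
registry names are illustrative, not the actual Tricircle implementation:

.. code-block:: python

    class TypeDriver(object):
        """Base class for cross pod L2 network type drivers (sketch)."""

        def reserve_segment(self, network):
            """Return segmentation info for a network being created."""
            raise NotImplementedError

    class SharedVLANTypeDriver(TypeDriver):
        """Allocates one VLAN id shared by all bottom pods."""

        def __init__(self, vlan_pool):
            self._pool = vlan_pool          # e.g. iter(range(1000, 2000))

        def reserve_segment(self, network):
            return {'network_type': 'vlan',
                    'segmentation_id': next(self._pool)}

    class LocalNetworkTypeDriver(TypeDriver):
        """Local Network only records its type in the Tricircle DB."""

        def reserve_segment(self, network):
            return {'network_type': 'local_network'}

    # The plugin loads one driver according to the configured
    # tenant_network_type and calls it on each network creation request.
    DRIVERS = {
        'shared_vlan': SharedVLANTypeDriver(iter(range(1000, 2000))),
        'local_network': LocalNetworkTypeDriver(),
    }

    def on_create_network(network, tenant_network_type):
        return DRIVERS[tenant_network_type].reserve_segment(network)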

2. Nova API-GW to trigger the bottom networking automation

Nova API-GW is aware that a new VM is being provisioned when the boot VM API
request is received; therefore Nova API-GW is responsible for the network
creation in the bottom OpenStack instances.

Nova API-GW needs to get the network type from the Neutron API server in the
Tricircle, and deal with the networking automation based on the network type
(a sketch of the bottom network creation follows the list below):

* Shared VLAN
  Nova API-GW creates the network in the bottom OpenStack instance in which
  the VM will run, using the VLAN segment id, network name and type that are
  retrieved from the Neutron API server in the Tricircle.

* Shared VxLAN
  Nova API-GW creates the network in the bottom OpenStack instance in which
  the VM will run, using the VxLAN segment id, network name and type that are
  retrieved from the Tricircle Neutron API server. After the network in the
  bottom OpenStack instance is created successfully, Nova API-GW needs to add
  this bottom network as one of the segments of the network in the Tricircle.

* Mixed VLAN/VxLAN
  Nova API-GW creates the network in each bottom OpenStack instance in which
  a VM will run, using the VLAN or VxLAN segment id respectively, with the
  network name and type retrieved from the Tricircle Neutron API server.
  After the network in a bottom OpenStack instance is created successfully,
  Nova API-GW needs to update the network in the Tricircle with the
  segmentation information of the bottom networks.
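
A minimal sketch of the bottom network creation for the Shared VLAN case,
using the standard Neutron provider network attributes; the segment values
are placeholders standing in for the values retrieved from the Tricircle
Neutron API server:

.. code-block:: python

    from neutronclient.v2_0 import client

    # Client pointing at the bottom OpenStack instance where the VM
    # will be booted (credentials and auth URL are placeholders).
    bottom_neutron = client.Client(username='admin',
                                   password='password',
                                   tenant_name='demo',
                                   auth_url='http://bottom-keystone:5000/v2.0')

    net_body = {
        'network': {
            'name': 'Net1',
            'provider:network_type': 'vlan',
            'provider:physical_network': 'physnet1',
            'provider:segmentation_id': 1001,
        }
    }
    bottom_net = bottom_neutron.create_network(net_body)['network']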

3. L2GW driver under Tricircle Plugin in Neutron API server

The Tricircle plugin needs to support the multi-segment network extension[4].

For the Shared VxLAN or Mixed VLAN/VxLAN L2 network types, the L2GW driver
will utilize the multi-segment network extension in the Neutron API server to
build the L2 network in the Tricircle. Each network in a bottom OpenStack
instance will be one segment of the whole cross pod L2 network in the
Tricircle.

After the network in the bottom OpenStack instance is created successfully,
Nova API-GW will call the Neutron server API to update the network in the
Tricircle with a new segment from the network in the bottom OpenStack
instance.

If the network in the bottom OpenStack instance is removed successfully, Nova
API-GW will call the Neutron server API to remove the segment of that bottom
OpenStack instance from the network in the Tricircle. A sketch of the segment
update follows.
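
A minimal sketch of adding a bottom network as a new segment of the top
network through the multi-provider extension[4]. The segment values are
placeholders, and whether segments can be changed through a plain network
update depends on the plugin, so this is an assumption for illustration:

.. code-block:: python

    # 'top_neutron' is a neutronclient Client built as in the earlier
    # sketches, pointing at the Tricircle Neutron API server; 'net_id'
    # is the top network to be updated.
    new_segment = {
        'provider:network_type': 'vxlan',
        'provider:physical_network': None,
        'provider:segmentation_id': 2001,
    }
    existing = top_neutron.show_network(net_id)['network'].get('segments', [])
    top_neutron.update_network(
        net_id, {'network': {'segments': existing + [new_segment]}})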

When the L2GW driver under the Tricircle plugin in the Neutron API server
receives the segment update request, the L2GW driver will start an async job
to orchestrate the L2GW API for L2 networking automation[2][3].

Data model impact
-----------------

In the database, we are considering setting ``physical_network`` in the top
OpenStack instance to ``bottom_physical_network#bottom_pod_id`` to
distinguish the segmentation information of different bottom OpenStack
instances, as sketched below.
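
A minimal sketch of the proposed encoding of ``physical_network`` in the top
database; the helper names are illustrative only:

.. code-block:: python

    def encode_physical_network(bottom_physical_network, bottom_pod_id):
        """Combine bottom physical network and pod id into one field."""
        return '%s#%s' % (bottom_physical_network, bottom_pod_id)

    def decode_physical_network(top_physical_network):
        """Split the stored field back into its two parts."""
        bottom_physical_network, bottom_pod_id = \
            top_physical_network.split('#', 1)
        return bottom_physical_network, bottom_pod_id

    # Example: 'physnet1' in pod 'pod-uuid-1' is stored as
    # 'physnet1#pod-uuid-1' in the top Neutron database.
    assert decode_physical_network('physnet1#pod-uuid-1') == \
        ('physnet1', 'pod-uuid-1')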

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

**Local Network Implementation**

For Local Network, L2GW is not required. In this scenario, no cross pod L2/L3
networking is required.

A user creates network ``Net1`` with a single AZ1 in az_hints. The Tricircle
plugin checks the configuration; if ``tenant_network_type`` equals
``local_network``, it will invoke the Local Network type driver. The Local
Network driver under the Tricircle plugin will update ``network_type`` in the
database.

For example, a user creates VM1 in AZ1, which has only one pod ``POD1``, and
connects it to network ``Net1``. ``Nova API-GW`` will send the network
creation request to ``POD1`` and the VM will be booted in AZ1 (there should
be only one pod in AZ1).

If a user wants to create VM2 in AZ2, or in ``POD2`` of AZ1, and connect it
to network ``Net1`` in the Tricircle, it will fail, because ``Net1`` is a
local_network type network and is limited to be present in ``POD1`` in AZ1
only.

**Shared VLAN Implementation**

For Shared VLAN, L2GW is not required. This is the simplest cross pod L2
networking, for limited scenarios. For example, with a small number of
networks, all VLANs are extended through a physical gateway to support cross
site VLAN networking, or all pods under the same core switch, with the same
visible VLAN ranges supported by the core switch, are connected by the core
switch.

When a user creates a network called ``Net1``, the Tricircle plugin checks
the configuration. If ``tenant_network_type`` equals ``shared_vlan``, the
Tricircle will invoke the Shared VLAN type driver. The Shared VLAN driver
will create a ``segment``, assign ``network_type`` as VLAN, and update the
network with ``segment``, ``network_type`` and ``physical_network`` in the
DB, as sketched below.
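
A minimal sketch of the VLAN segment allocation performed by the Shared VLAN
type driver; the VLAN range and the in-use tracking are placeholders, as the
real driver would record allocations in the Tricircle Neutron DB:

.. code-block:: python

    # VLAN ids usable for shared VLAN networks, e.g. from configuration.
    VLAN_RANGE = range(1000, 2000)

    def allocate_shared_vlan(allocated_ids):
        """Pick the first free VLAN id, shared by all bottom pods."""
        for vlan_id in VLAN_RANGE:
            if vlan_id not in allocated_ids:
                allocated_ids.add(vlan_id)
                return {'network_type': 'vlan',
                        'physical_network': 'physnet1',
                        'segmentation_id': vlan_id}
        raise Exception('No VLAN id available for shared VLAN network')

    # Example: the first two networks get VLAN 1000 and 1001.
    in_use = set()
    seg1 = allocate_shared_vlan(in_use)   # segmentation_id == 1000
    seg2 = allocate_shared_vlan(in_use)   # segmentation_id == 1001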

A user creates VM1 in AZ1 and connects it to network ``Net1``. If VM1 will be
booted in ``POD1``, ``Nova API-GW`` needs to get the network information and
send a network creation message to ``POD1``. The network creation message
includes ``network_type``, ``segment`` and ``physical_network``.

Then the user creates VM2 in AZ2 and connects it to network ``Net1``. If VM2
will be booted in ``POD2``, ``Nova API-GW`` needs to get the network
information and send a network creation message to ``POD2``. The network
creation message includes ``network_type``, ``segment`` and
``physical_network``.

**Shared VxLAN Implementation**

A user creates network ``Net1``. The Tricircle plugin checks the
configuration; if ``tenant_network_type`` equals ``shared_vxlan``, it will
invoke the Shared VxLAN driver. The Shared VxLAN driver will allocate a
``segment``, assign ``network_type`` as VxLAN, and update the network with
``segment`` and ``network_type`` in the DB.

A user creates VM1 in AZ1 and connects it to network ``Net1``. If VM1 will be
booted in ``POD1``, ``Nova API-GW`` needs to get the network information and
send a network creation message to ``POD1``. The network creation message
includes ``network_type`` and ``segment``.

``Nova API-GW`` should update ``Net1`` in the Tricircle with the segment
information got from ``POD1``.

Then the user creates VM2 in AZ2 and connects it to network ``Net1``. If VM2
will be booted in ``POD2``, ``Nova API-GW`` needs to get the network
information and send a network creation message to ``POD2``. The network
creation message includes ``network_type`` and ``segment``.

``Nova API-GW`` should update ``Net1`` in the Tricircle with the segment
information got from ``POD2``.

The Tricircle plugin detects that the network includes more than one segment
network, and calls the L2GW driver to start an async job for cross pod
networking for ``Net1``. The L2GW driver will create L2GW1 in ``POD1`` and
L2GW2 in ``POD2``. In ``POD1``, L2GW1 will connect the local ``Net1`` and
create an L2GW remote connection to L2GW2, then populate the remote MAC/IP
information which resides in ``POD2`` in L2GW1. In ``POD2``, L2GW2 will
connect the local ``Net1`` and create an L2GW remote connection to L2GW1,
then populate the remote MAC/IP information which resides in ``POD1`` in
L2GW2. A sketch of this orchestration follows.
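
A minimal sketch of one step of this orchestration against the L2 Gateway
API[3], with the remote gateway connection handled as proposed in [2]. The
endpoint path and resource body are assumptions based on those specs rather
than a verified client API, so they are expressed as plain REST calls:

.. code-block:: python

    import json

    import requests

    def create_l2gw_connection(neutron_url, token, l2gw_id, net_id):
        """Connect an L2GW to the local network (sketch, per [3])."""
        body = {'l2_gateway_connection': {'l2_gateway_id': l2gw_id,
                                          'network_id': net_id}}
        resp = requests.post(
            neutron_url + '/v2.0/l2-gateway-connections',
            headers={'X-Auth-Token': token,
                     'Content-Type': 'application/json'},
            data=json.dumps(body))
        return resp.json()

    # The async job would call this once per pod: connect L2GW1 to Net1
    # in POD1 and L2GW2 to Net1 in POD2, then create the remote
    # connections and populate remote MAC/IP entries as proposed in [2].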

The L2GW driver in the Tricircle will also detect new port creation/deletion
API requests. If a port (MAC/IP) is created or deleted in ``POD1`` or
``POD2``, it needs to refresh the MAC/IP information in the L2GW of the other
pod.

Whether to populate the port (MAC/IP) information should be configurable
according to the L2GW capability, and MAC/IP information should be populated
only for the ports that do not reside in the same pod.

**Mixed VLAN/VxLAN**

To achieve cross pod L2 networking, L2GW will be used to connect the L2
networks in different pods. Using L2GW should work for both the Shared VxLAN
and the Mixed VLAN/VxLAN scenarios.

When L2GW is connected with a local network in the same OpenStack instance,
no matter whether it's VLAN, VxLAN or GRE, the L2GW should be able to connect
to the local network, and because L2GW is an extension of Neutron, the
network UUID should be enough for L2GW to connect to the local network.

When an admin user creates a network in the Tricircle, he/she specifies the
network type as one of the network types discussed above. In the phase of
creating the network in the Tricircle, only one record is saved in the
database; no network will be created in the bottom OpenStack instances.

After the network in a bottom pod is created successfully, Nova API-GW needs
to retrieve the network information such as segment id, network name and
network type, and make this network in the bottom pod one of the segments of
the network in the Tricircle.

In the Tricircle, a network can be created by a tenant or an admin. A tenant
has no way to specify the network type and segment id, so the default network
type will be used instead. When a user uses the network to boot a VM,
``Nova API-GW`` checks the network type. For a Mixed VLAN/VxLAN network,
``Nova API-GW`` first creates the network in the bottom OpenStack instance
without specifying the network type and segment id, then updates the top
network with the bottom network segmentation information returned by the
bottom OpenStack instance, as sketched below.
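
A minimal sketch of the create-then-query flow for the Mixed VLAN/VxLAN case,
using the standard Neutron provider attributes; an admin role is assumed so
that the provider attributes are visible in the query:

.. code-block:: python

    # 'bottom_neutron' is a neutronclient Client built as in the earlier
    # sketches. Create the bottom network without specifying type or
    # segment id; the bottom OpenStack instance allocates them itself.
    bottom_net = bottom_neutron.create_network(
        {'network': {'name': 'Net1'}})['network']

    # Query the allocated segmentation details with the admin role.
    detail = bottom_neutron.show_network(bottom_net['id'])['network']
    new_segment = {
        'provider:network_type': detail['provider:network_type'],
        'provider:physical_network': detail['provider:physical_network'],
        'provider:segmentation_id': detail['provider:segmentation_id'],
    }
    # 'new_segment' is then added to Net1 in the Tricircle through the
    # multi-provider extension, as sketched earlier.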

A user creates network ``Net1``. The plugin checks the configuration; if
``tenant_network_type`` equals ``mixed_vlan_vxlan``, it will invoke the Mixed
VLAN and VxLAN driver. The driver needs to do nothing, since the segment is
allocated in the bottom pod.

A user creates VM1 in AZ1 and connects it to the network ``Net1``. The VM is
booted in the bottom pod ``POD1``, and ``Nova API-GW`` creates the network in
``POD1``, queries the detailed network segmentation information (using the
admin role), gets the network type and segment id, then updates this new
segment of ``Net1`` in the Tricircle ``Neutron API Server``.

Then the user creates another VM2 with AZ info AZ2, so the VM should be
booted in the bottom pod ``POD2``, which is located in AZ2. ``Nova API-GW``
also creates a network in ``POD2``, queries the network information including
segment and network type, and updates this new segment of ``Net1`` in the
Tricircle ``Neutron API Server``.

The Tricircle plugin detects that ``Net1`` includes more than one network
segment, and calls the L2GW driver to start an async job for cross pod
networking for ``Net1``. The L2GW driver will create L2GW1 in ``POD1`` and
L2GW2 in ``POD2``. In ``POD1``, L2GW1 will connect the local ``Net1`` and
create an L2GW remote connection to L2GW2, then populate the remote MAC/IP
information which resides in ``POD2`` in L2GW1. In ``POD2``, L2GW2 will
connect the local ``Net1`` and create an L2GW remote connection to L2GW1,
then populate the remote MAC/IP information which resides in ``POD1`` in
L2GW2.

The L2GW driver in the Tricircle will also detect new port creation/deletion
API calls. If a port (MAC/IP) is created or deleted in ``POD1``, it needs to
refresh the L2GW2 MAC/IP information; if a port (MAC/IP) is created or
deleted in ``POD2``, it needs to refresh the L2GW1 MAC/IP information.

Whether to populate MAC/IP information should be configurable according to
the L2GW capability, and MAC/IP information should be populated only for the
ports that do not reside in the same pod.

**L3 bridge network**

Current implementation without cross pod L2 networking:

* A special bridge network is created and connected to the routers in
  different bottom OpenStack instances. We configure the extra routes of the
  routers to route the packets from one OpenStack instance to another (see
  the sketch after this list). In the current implementation, we create this
  special bridge network in each bottom OpenStack instance with the same
  ``VLAN ID``, so we have an L2 network to connect the routers.
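
A minimal sketch of configuring the extra routes mentioned above, using the
standard Neutron router 'routes' attribute; the CIDR and next hop are
placeholders:

.. code-block:: python

    # 'bottom_neutron' is a neutronclient Client built as in the earlier
    # sketches. Route traffic destined to the other pod's tenant subnet
    # through the peer router's interface on the shared bridge network.
    route_body = {
        'router': {
            'routes': [
                {'destination': '10.0.2.0/24',   # subnet in the other pod
                 'nexthop': '172.16.0.3'},       # peer router on the bridge
            ]
        }
    }
    bottom_neutron.update_router(router_id, route_body)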

Differences between the L2 networking for tenants' VMs and for the L3
bridging network:

* The creation of the bridge network is triggered during router interface
  attachment and router external gateway addition.

* The L2 network for VMs is triggered by ``Nova API-GW`` when a VM is to be
  created in one pod and it finds that there is no network there; the network
  will then be created before the VM is booted, since a network or port
  parameter is required to boot a VM. The IP/MAC for the VM is allocated in
  the ``Tricircle`` top layer, to avoid IP/MAC collisions that could occur if
  they were allocated separately in the bottom pods.

After cross pod L2 networking is introduced, the L3 bridge network should be
updated too.

L3 bridge network N-S (North-South):

* For each tenant, one cross pod N-S bridge network should be created for
  router N-S inter-connection. Just replace the current shared VLAN N-S
  bridge network with the corresponding Shared VxLAN or Mixed VLAN/VxLAN
  network.

L3 bridge network E-W (East-West):

* When router interface attachment happens, for Shared VLAN, the current
  process to establish the E-W bridge network is kept. For Shared VxLAN and
  Mixed VLAN/VxLAN, if an L2 network is able to expand to the current pod,
  then just expand the L2 network to the pod; all E-W traffic will go out
  from the local L2 network, and no bridge network is needed.

* For example, with (Net1, Router1) in ``Pod1`` and (Net2, Router1) in
  ``Pod2``, if ``Net1`` is a cross pod L2 network and can be expanded to
  ``Pod2``, then just expand ``Net1`` to ``Pod2``. After the ``Net1``
  expansion (just like cross pod L2 networking spreading one network into
  multiple pods), it'll look like (Net1, Router1) in ``Pod1`` and (Net1,
  Net2, Router1) in ``Pod2``. In ``Pod2`` there is no VM in ``Net1``; it is
  only for E-W traffic. Now the E-W traffic will look like this:

from Net2 to Net1:

Net2 in Pod2 -> Router1 in Pod2 -> Net1 in Pod2 -> L2GW in Pod2 ---> L2GW in
Pod1 -> Net1 in Pod1.

Note: The traffic from ``Net1`` in ``Pod2`` to ``Net1`` in ``Pod1`` can
bypass the L2GW in ``Pod2``; that means outbound traffic can bypass the local
L2GW if the remote VTEP of the L2GW is known to the local compute node and
the packet from the local compute node with VxLAN encapsulation could be
routed to the remote L2GW directly. It's up to the L2GW implementation. With
the inbound traffic going through the L2GW, the inbound traffic to the VM
will not be impacted by VM migration from one host to another.

If ``Net2`` is a cross pod L2 network and can be expanded to ``Pod1`` too,
then just expand ``Net2`` to ``Pod1``. After the ``Net2`` expansion (just
like cross pod L2 networking spreading one network into multiple pods), it'll
look like (Net2, Net1, Router1) in ``Pod1`` and (Net1, Net2, Router1) in
``Pod2``. In ``Pod1`` there is no VM in ``Net2``; it is only for E-W traffic.
Now the E-W traffic will look like this:

from Net1 to Net2:

Net1 in Pod1 -> Router1 in Pod1 -> Net2 in Pod1 -> L2GW in Pod1 ---> L2GW in
Pod2 -> Net2 in Pod2.

To limit the complexity, a network's az_hints can only be specified at
creation, and no update is allowed. If the az_hints need to be updated, you
have to delete the network and create it again.

If the network can't be expanded, then an E-W bridge network is needed. For
example: Net1(AZ1, AZ2, AZ3), Router1; Net2(AZ4, AZ5, AZ6), Router1. Then a
cross pod L2 bridge network has to be established:

Net1(AZ1, AZ2, AZ3), Router1 --> E-W bridge network ---> Router1,
Net2(AZ4, AZ5, AZ6).

Assignee(s)
------------

Primary assignee:

Other contributors:

Work Items
------------

Dependencies
============

None

Testing
=======

None

Documentation Impact
====================

None

References
==========

[1] https://docs.google.com/document/d/18kZZ1snMOCD9IQvUKI5NVDzSASpw-QKj7l2zNqMEd3g/

[2] https://review.openstack.org/#/c/270786/

[3] https://github.com/openstack/networking-l2gw/blob/master/specs/kilo/l2-gateway-api.rst

[4] http://developer.openstack.org/api-ref-networking-v2-ext.html#networks-multi-provider-ext

[5] http://docs.openstack.org/mitaka/networking-guide/adv-config-availability-zone.html

[6] https://review.openstack.org/#/c/306224/