Scenario - Using Open vSwitch w/ ASAP² (Direct Mode)
Overview
With appropriate hardware, operators can choose to utilize ASAP²-accelerated Open vSwitch instead of unaccelerated Open vSwitch for the Neutron virtual network infrastructure. ASAP² technology offloads packet processing onto hardware built into the NIC rather than using the CPU of the host. It requires careful consideration and planning before implementation. This document outlines how to set it up in your environment.
Note
ASAP² is a proprietary feature provided with certain Mellanox NICs, including the ConnectX-5 and ConnectX-6. Future support is not guaranteed. This feature is considered EXPERIMENTAL and should not be used for production workloads. There is no guarantee of upgradability or backwards compatibility.
Note
Hardware offloading is not yet compatible with the openvswitch firewall driver. To ensure flows are offloaded, port security must be disabled. Information on disabling port security is discussed later in this document.
Recommended reading
This guide is a variation of the standard Open vSwitch and SR-IOV deployment guides available at:
- https://docs.openstack.org/openstack-ansible-os_neutron/latest/app-openvswitch.html
- https://docs.openstack.org/openstack-ansible-os_neutron/latest/configure-network-services.html#sr-iov-support-optional
The following resources may also be helpful:
- https://docs.openstack.org/neutron/latest/admin/config-sriov.html
- https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html
- https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf
- https://docs.nvidia.com/networking/pages/viewpage.action?pageId=61869597
Prerequisites
To enable SR-IOV and PCI passthrough capabilities on a Linux platform, ensure that VT-d/VT-x are enabled for Intel processors and AMD-V/AMD-Vi are enabled for AMD processors. Such features are typically enabled in the BIOS.
On an Intel platform, the following kernel parameters are required and can be added to the GRUB configuration:
GRUB_CMDLINE_LINUX="... iommu=pt intel_iommu=on"
On an AMD platform, use these parameters instead:
GRUB_CMDLINE_LINUX="... iommu=pt amd_iommu=on"
Update GRUB and reboot the host(s).
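For example, on an Ubuntu or Debian host the change can be applied and activated as follows (on Red Hat based hosts, use grub2-mkconfig -o /boot/grub2/grub.cfg instead of update-grub):
# update-grub
# reboot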
SR-IOV provides virtual functions (VFs) that can be presented to instances as network interfaces and are used in lieu of tuntap interfaces. Configuration of VFs is outside the scope of this guide. The following links may be helpful:
- https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2
- https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x
Deployment
Configure your networking according to the Open vSwitch implementation docs:
Note
At this time, only a single (non-bonded) interface is supported.
An example provider network configuration has been provided below:
- network:
    container_bridge: "br-provider"
    container_type: "veth"
    type: "vlan"
    range: "700:709"
    net_name: "physnet1"
    network_interface: "ens4f0"
    group_binds:
      - neutron_openvswitch_agent
Add a nova_pci_passthrough_whitelist entry to user_variables.yml, where devname is the name of the interface connected to the provider bridge and physical_network is the name of the provider network.
nova_pci_passthrough_whitelist: '{"devname":"ens4f0","physical_network":"physnet1"}'
Note
In the respective network block configured in openstack_user_config.yml, devname corresponds to network_interface and physical_network corresponds to net_name.
To enable the openvswitch firewall driver rather than the default iptables_hybrid firewall driver, add the following overrides to user_variables.yml:
neutron_ml2_conf_ini_overrides:
  securitygroup:
    firewall_driver: openvswitch
neutron_openvswitch_agent_ini_overrides:
  securitygroup:
    firewall_driver: openvswitch
Note
Hardware-offloaded flows are not activated for ports utilizing security groups or port security. Be sure to disable port security and security groups on individual ports or networks when hardware offloading is required.
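For reference, port security and security groups can be disabled on an existing port, or port security on an entire network, with commands such as the following (the port and network names are placeholders):
# openstack port set --no-security-group --disable-port-security <port>
# openstack network set --disable-port-security <network>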
Once the OpenStack cluster is configured, start the OpenStack deployment as listed in the OpenStack-Ansible Install guide by running all playbooks in sequence on the deployment host.
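For reference, the standard playbook sequence from the Install guide looks like the following:
# cd /opt/openstack-ansible/playbooks
# openstack-ansible setup-hosts.yml
# openstack-ansible setup-infrastructure.yml
# openstack-ansible setup-openstack.yml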
Post-Deployment
Once the deployment is complete, create the VFs that will be used for SR-IOV. In this example, the physical function (PF) is ens4f0. It will simultaneously be connected to the Neutron provider bridge br-provider.
- On each compute node, determine the maximum number of VFs a PF can support:
# cat /sys/class/net/ens4f0/device/sriov_totalvfs
Note
To adjust sriov_totalvfs, please refer to the Mellanox documentation.
- On each compute node, create the VFs:
# echo '8' > /sys/class/net/ens4f0/device/sriov_numvfs
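The newly created VFs should now appear in the PF's link output, one vf line per virtual function:
# ip link show ens4f0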
Configure Open vSwitch hardware offloading
- Unbind the VFs from the Mellanox driver:
# for vf in `grep PCI_SLOT_NAME /sys/class/net/ens4f0/device/virtfn*/uevent | cut -d'=' -f2`
do
echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind
done
- Enable the NIC's embedded switch by setting it to switchdev mode:
# PCI_ADDR=`grep PCI_SLOT_NAME /sys/class/net/ens4f0/device/uevent | sed 's:.*PCI_SLOT_NAME=::'`
# devlink dev eswitch set pci/$PCI_ADDR mode switchdev
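The mode change can be confirmed with devlink; the device should now report mode switchdev:
# devlink dev eswitch show pci/$PCI_ADDR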
- Enable hardware offload filters with TC:
# ethtool -K ens4f0 hw-tc-offload on
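To confirm the feature took effect, check the offload features reported by ethtool:
# ethtool -k ens4f0 | grep hw-tc-offload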
- Rebind the VFs to the Mellanox driver:
# for vf in `grep PCI_SLOT_NAME /sys/class/net/ens4f0/device/virtfn*/uevent | cut -d'=' -f2`
do
echo $vf > /sys/bus/pci/drivers/mlx5_core/bind
done
- Enable hardware offloading in OVS:
# ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# ovs-vsctl set Open_vSwitch . other_config:max-idle=30000
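The hw-offload flag enables offloading of datapath flows to the NIC, while max-idle raises the time (in milliseconds) that idle flows remain cached in the datapath before being evicted. The settings can be verified with:
# ovs-vsctl get Open_vSwitch . other_config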
- Restart Open vSwitch:
# systemctl restart openvswitch-switch
- Restart the Open vSwitch agent:
# systemctl restart neutron-openvswitch-agent
- Restart the Nova compute service:
# systemctl restart nova-compute
Warning
Changes to sriov_numvfs as well as the built-in NIC switch will not persist across a reboot and must be reapplied every time the server is started. A minimal approach to automating this is sketched below.
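One way to reapply the settings automatically is a systemd oneshot unit that runs before Open vSwitch starts, such as /etc/systemd/system/asap2-offload.service (the unit and script names here are hypothetical; the script would contain the echo, devlink, and ethtool commands shown above):
[Unit]
Description=Configure SR-IOV VFs and switchdev mode for ens4f0
After=network-pre.target
Before=openvswitch-switch.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/asap2-offload.sh

[Install]
WantedBy=multi-user.target
Enable the unit with:
# systemctl enable asap2-offload.service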
Verify operation
To verify operation of hardware-offloaded Open vSwitch, you must create a virtual machine instance using an image with working network drivers for the NIC; for Mellanox ConnectX-5/ConnectX-6 cards this means the mlx5_core driver, which is included in recent Linux distribution kernels.
Before creating an instance, a Neutron port must be created that has the following characteristics:
--vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}'
To ensure flows are offloaded, disable port security with the --disable-port-security argument.
An example of the full command can be seen here:
# openstack port create \
--network <network> \
--vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' \
--disable-port-security \
<name>
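A port created this way can be passed to a new instance at boot, for example (the flavor and image names are placeholders):
# openstack server create \
--flavor <flavor> \
--image <image> \
--port <name> \
<instance-name>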
Once the instance boots, the port will be updated to reflect the PCI address of the corresponding virtual function:
root@aio1-utility-container-8c0b0916:~# openstack port show -c binding_profile testport2
+-----------------+------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------+------------------------------------------------------------------------------------------------------------------+
| binding_profile | capabilities='[u'switchdev']', pci_slot='0000:21:00.6', pci_vendor_info='15b3:1016', physical_network='physnet1' |
+-----------------+------------------------------------------------------------------------------------------------------------------+
Observing traffic
From the compute node, perform a packet capture on the representor port that corresponds to the virtual function attached to the instance. In this example, the interface is eth1.
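If it is unclear which interface is the representor for a given VF, the phys_port_name attribute exposed by switchdev-capable devices can help; representors typically report names such as pf0vf0. A minimal sketch:
# for dev in /sys/class/net/*/phys_port_name
do
echo "$dev: `cat $dev 2>/dev/null`"
done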
root@compute1:~# tcpdump -nnn -i eth1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
Perform a ping from another host and observe the traffic at the representor port:
root@infra2:~# ping 192.168.88.151 -c5
PING 192.168.88.151 (192.168.88.151) 56(84) bytes of data.
64 bytes from 192.168.88.151: icmp_seq=1 ttl=64 time=48.3 ms
64 bytes from 192.168.88.151: icmp_seq=2 ttl=64 time=1.52 ms
64 bytes from 192.168.88.151: icmp_seq=3 ttl=64 time=0.586 ms
64 bytes from 192.168.88.151: icmp_seq=4 ttl=64 time=0.688 ms
64 bytes from 192.168.88.151: icmp_seq=5 ttl=64 time=0.775 ms
--- 192.168.88.151 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4045ms
rtt min/avg/max/mdev = 0.586/10.381/48.335/18.979 ms
root@compute1:~# tcpdump -nnn -i eth1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
19:51:09.684957 IP 192.168.88.254 > 192.168.88.151: ICMP echo request, id 11168, seq 1, length 64
19:51:09.685448 IP 192.168.88.151 > 192.168.88.254: ICMP echo reply, id 11168, seq 1, length 64
When offloading is handled in the NIC, only the first packet(s) of the flow will be visible in the packet capture.
The following command can be used to dump flows in the kernel datapath:
# ovs-appctl dpctl/dump-flows type=ovs
The following command can be used to dump flows that are offloaded:
# ovs-appctl dpctl/dump-flows type=offloaded