From 3fe2242707044c6ca8ce748f92a6e5f63cf51c96 Mon Sep 17 00:00:00 2001 From: Jakub Jursa Date: Mon, 23 Apr 2018 11:30:19 +0200 Subject: [PATCH] [docs] Add Open vSwitch Hardware Offloading (ASAP^2) Deployment Scenario This patch adds documentation that can assist an operator in deploying ASAP^2-accelerated[1] Open vSwitch - AKA Open vSwitch Hardware Offloading[2]. This feature requires Mellanox ConnectX-4 Lx or ConnectX-5 NICs, and can be deployed using existing OVS and SR-IOV-related tasks in OpenStack-Ansible. [1] http://www.mellanox.com/page/asap2?mtag=asap2 [2] https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html Co-Authored-By: James Denton Change-Id: I0c93aca73f2b809ff1e525f75496982942ea3785 --- doc/source/app-openvswitch-asap.rst | 317 ++++++++++++++++++++++++++++ doc/source/index.rst | 1 + 2 files changed, 318 insertions(+) create mode 100644 doc/source/app-openvswitch-asap.rst diff --git a/doc/source/app-openvswitch-asap.rst b/doc/source/app-openvswitch-asap.rst new file mode 100644 index 00000000..b8afd7f3 --- /dev/null +++ b/doc/source/app-openvswitch-asap.rst @@ -0,0 +1,317 @@ +============================================================ +Scenario - Using Open vSwitch w/ ASAP :sup:`2` (Direct Mode) +============================================================ + +Overview +~~~~~~~~ + +With appropriate hardware, operators can choose to utilize +ASAP :sup:`2`-accelerated Open vSwitch instead of unaccelerated Open vSwitch +for the Neutron virtual network infrastructure. ASAP :sup:`2` technology +offloads packet processing onto hardware built into the NIC rather than using +the CPU of the host. It requires careful consideration and planning before +implementing. This document outlines how to set it up in your environment. + +.. note:: + + ASAP :sup:`2` is a proprietary feature provided with certain Mellanox NICs, + including the ConnectX-4 Lx and ConnectX-5. Future support is not + guaranteed. 
This feature is considered *EXPERIMENTAL* and should not + be used for production workloads. There is no guarantee of upgradability + or backwards compatibility. + +.. note:: + + Hardware offloading is not compatible with the ``openvswitch`` firewall + driver. To ensure flows are offloaded, port security must be disabled. + Information on disabling port security is discussed later in this document. + +Recommended reading +~~~~~~~~~~~~~~~~~~~ + +This guide is a variation of the standard Open vSwitch and SR-IOV deployment +guides available at: + +* ``_ + +* ``_ + +The following resources may also be helpful: + +* ``_ + +* ``_ + +* ``_ + +Prerequisites +~~~~~~~~~~~~~ + +To enable SR-IOV and PCI passthrough capabilities on a Linux platform, +ensure that VT-d/VT-x are enabled for Intel processors and AMD-V/AMD-Vi are +enabled for AMD processors. Such features are typically enabled in the BIOS. + +On an Intel platform, the following kernel parameters are required and can be +added to the GRUB configuration: + +.. code-block:: console + + GRUB_CMDLINE_LINUX="... iommu=pt intel_iommu=on" + +On an AMD platform, use these parameters instead: + +.. code-block:: console + + GRUB_CMDLINE_LINUX="... iommu=pt amd_iommu=on" + +Update GRUB and reboot the host(s). + +SR-IOV provides virtual functions (VFs) that can be presented to instances as +network interfaces and are used in lieu of tuntap interfaces. Configuration +of VFs is outside the scope of this guide. The following links may be helpful: + +* ``_ + +* ``_ + +Deployment +~~~~~~~~~~ + +Configure your networking according to the Open vSwitch implementation docs: + +* `Scenario - Using Open vSwitch + `_ + +.. note:: + + At this time, only a single (non-bonded) interface is supported. + +An example provider network configuration has been provided below: + +.. 
code-block:: console + + - network: + container_bridge: "br-provider" + container_type: "veth" + type: "vlan" + range: "700:709" + net_name: "physnet1" + network_interface: "ens4f0" + group_binds: + - neutron_openvswitch_agent + +Add a ``nova_pci_passthrough_whitelist`` entry to ``user_variables.yml``, where +``devname`` is the name of the interface connected to the provider bridge and +``physical_network`` is the name of the provider network. + +.. code-block:: console + + nova_pci_passthrough_whitelist: '{"devname":"ens4f0","physical_network":"physnet1"}' + +.. note:: + + In the respective network block configured in ``openstack_user_config.yml``, + ``devname`` corresponds to ``network_interface`` and ``physical_network`` + corresponds to ``net_name``. + +To enable the ``openvswitch`` firewall driver rather than the default +``iptables_hybrid`` firewall driver, add the following overrides to +``user_variables.yml``: + +.. code-block:: console + + neutron_ml2_conf_ini_overrides: + securitygroup: + firewall_driver: openvswitch + neutron_openvswitch_agent_ini_overrides: + securitygroup: + firewall_driver: openvswitch + +.. note:: + + Hardware-offloaded flows are **not** activated for ports utilizing security + groups or port security. Be sure to disable port security *and* security + groups on individual ports or networks when hardware offloading is required. + +Once the OpenStack cluster is configured, start the OpenStack deployment as +listed in the OpenStack-Ansible Install guide by running all playbooks in +sequence on the deployment host. + +Post-Deployment +~~~~~~~~~~~~~~~ + +Once the deployment is complete, create the VFs that will be used for SR-IOV. +In this example, the physical function (PF) is ``ens4f0``. It will +simultaneously be connected to the Neutron provider bridge ``br-provider``. + +1. On each compute node, determine the maximum number of VFs a PF can support: + +.. code-block:: console + + # cat /sys/class/net/ens4f0/device/sriov_totalvfs + +.. 
note:: + + To adjust ``sriov_totalvfs``, refer to the Mellanox documentation. + +2. On each compute node, create the VFs: + +.. code-block:: console + + # echo '8' > /sys/class/net/ens4f0/device/sriov_numvfs + +Configure Open vSwitch hardware offloading +------------------------------------------ + +1. Unbind the VFs from the Mellanox driver: + +.. code-block:: console + + # for vf in `ls -ld /sys/class/net/ens4f0/device/virt* | cut -f 11 -d ' ' | cut -b 4-` + do + echo $vf > /sys/bus/pci/drivers/mlx5_core/unbind + done + +2. Enable the switch in the NIC: + +.. code-block:: console + + # PCI_ADDR=`grep PCI_SLOT_NAME /sys/class/net/ens4f0/device/uevent | sed 's:.*PCI_SLOT_NAME=::'` + # devlink dev eswitch set pci/$PCI_ADDR mode switchdev + +3. Enable hardware offload filters with TC: + +.. code-block:: console + + # ethtool -K ens4f0 hw-tc-offload on + +4. Rebind the VFs to the Mellanox driver: + +.. code-block:: console + + # for vf in `ls -ld /sys/class/net/ens4f0/device/virt* | cut -f 11 -d ' ' | cut -b 4-` + do + echo $vf > /sys/bus/pci/drivers/mlx5_core/bind + done + +5. Enable hardware offloading in OVS: + +.. code-block:: console + + # ovs-vsctl set Open_vSwitch . other_config:hw-offload=true + # ovs-vsctl set Open_vSwitch . other_config:max-idle=30000 + +6. Restart Open vSwitch: + +.. code-block:: console + + # systemctl restart openvswitch-switch + +7. Restart the Open vSwitch agent: + +.. code-block:: console + + # systemctl restart neutron-openvswitch-agent + +8. Restart the Nova compute service: + +.. code-block:: console + + # systemctl restart nova-compute + +.. warning:: + + Changes to ``sriov_numvfs`` and to the built-in NIC switch do not persist + across a reboot and must be reapplied every time the server is started. + +Verify operation +~~~~~~~~~~~~~~~~ + +To verify operation of hardware-offloaded Open vSwitch, you must create +a virtual machine instance using an image with the proper network drivers. 
+ +The following images are known to contain working drivers: + +* `Fedora 24 `_ + +* `Ubuntu 18.04 LTS (Bionic) `_ + +* `Centos 7 (1901) `_ + +Before creating an instance, a Neutron port must be created that has the +following characteristics: + +:code:`--vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}'` + +To ensure flows are offloaded, disable port security with the +``--disable-port-security`` argument. + +An example of the full command can be seen here: + +.. code-block:: console + + # openstack port create \ + --network \ + --vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' \ + --disable-port-security \ + + +The port can then be attached to the instance at boot. Once booted, the port +will be updated to reflect the PCI address of the corresponding virtual +function: + +.. code-block:: console + + root@aio1-utility-container-8c0b0916:~# openstack port show -c binding_profile testport2 + +-----------------+------------------------------------------------------------------------------------------------------------------+ + | Field | Value | + +-----------------+------------------------------------------------------------------------------------------------------------------+ + | binding_profile | capabilities='[u'switchdev']', pci_slot='0000:21:00.6', pci_vendor_info='15b3:1016', physical_network='physnet1' | + +-----------------+------------------------------------------------------------------------------------------------------------------+ + +Observing traffic +----------------- + +From the compute node, perform a packet capture on the representor port +that corresponds to the virtual function attached to the instance. In this +example, the interface is ``eth1``. + +.. 
code-block:: console + + root@compute1:~# tcpdump -nnn -i eth1 icmp + tcpdump: verbose output suppressed, use -v or -vv for full protocol decode + listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes + +Perform a ping from another host and observe the traffic at the representor +port: + +.. code-block:: console + + root@infra2:~# ping 192.168.88.151 -c5 + PING 192.168.88.151 (192.168.88.151) 56(84) bytes of data. + 64 bytes from 192.168.88.151: icmp_seq=1 ttl=64 time=48.3 ms + 64 bytes from 192.168.88.151: icmp_seq=2 ttl=64 time=1.52 ms + 64 bytes from 192.168.88.151: icmp_seq=3 ttl=64 time=0.586 ms + 64 bytes from 192.168.88.151: icmp_seq=4 ttl=64 time=0.688 ms + 64 bytes from 192.168.88.151: icmp_seq=5 ttl=64 time=0.775 ms + + --- 192.168.88.151 ping statistics --- + 5 packets transmitted, 5 received, 0% packet loss, time 4045ms + rtt min/avg/max/mdev = 0.586/10.381/48.335/18.979 ms + + root@compute1:~# tcpdump -nnn -i eth1 icmp + tcpdump: verbose output suppressed, use -v or -vv for full protocol decode + listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes + 19:51:09.684957 IP 192.168.88.254 > 192.168.88.151: ICMP echo request, id 11168, seq 1, length 64 + 19:51:09.685448 IP 192.168.88.151 > 192.168.88.254: ICMP echo reply, id 11168, seq 1, length 64 + +When offloading is handled in the NIC, only the first packet(s) of the +flow will be visible in the packet capture. + +The following command can be used to dump flows in the kernel datapath: + +:code:`# ovs-dpctl dump-flows type=ovs` + +The following command can be used to dump flows that are offloaded: + +:code:`# ovs-dpctl dump-flows type=offloaded` diff --git a/doc/source/index.rst b/doc/source/index.rst index ac0e36fd..e6bf2eb5 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -7,6 +7,7 @@ Neutron role for OpenStack-Ansible configure-network-services.rst app-openvswitch.rst + app-openvswitch-asap.rst app-ovn.rst app-nuage.rst app-calico.rst