diff --git a/doc/source/index.rst b/doc/source/index.rst index 839de9c694..8f045cfb18 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -86,6 +86,7 @@ Administrator Documentation admin_guide replication_network logs + ops_runbook/index Object Storage v1 REST API Documentation ======================================== diff --git a/doc/source/ops_runbook/diagnose.rst b/doc/source/ops_runbook/diagnose.rst new file mode 100644 index 0000000000..d34b38c52b --- /dev/null +++ b/doc/source/ops_runbook/diagnose.rst @@ -0,0 +1,1031 @@ +================================== +Identifying issues and resolutions +================================== + +Diagnose: General approach +-------------------------- + +- Look at service status in your monitoring system. + +- In addition to system monitoring tools and issue logging by users, + swift errors will often result in log entries in the ``/var/log/swift`` + files: ``proxy.log``, ``server.log`` and ``background.log`` (see:``Swift + logs``). + +- Look at any logs your deployment tool produces. + +- Log files should be reviewed for error signatures (see below) that + may point to a known issue, or root cause issues reported by the + diagnostics tools, prior to escalation. + +Dependencies +^^^^^^^^^^^^ + +The Swift software is dependent on overall system health. Operating +system level issues with network connectivity, domain name resolution, +user management, hardware and system configuration and capacity in terms +of memory and free disk space, may result is secondary Swift issues. +System level issues should be resolved prior to diagnosis of swift +issues. + + +Diagnose: Swift-dispersion-report +--------------------------------- + +The swift-dispersion-report is a useful tool to gauge the general +health of the system. Configure the ``swift-dispersion`` report for +100% coverage. The dispersion report regularly monitors +these and gives a report of the amount of objects/containers are still +available as well as how many copies of them are also there. + +The dispersion-report output is logged on the first proxy of the first +AZ or each system (proxy with the monitoring role) under +``/var/log/swift/swift-dispersion-report.log``. + +Diagnose: Is swift running? +--------------------------- + +When you want to establish if a swift endpoint is running, run ``curl -k`` +against either: https://*[REPLACEABLE]*./healthcheck OR +https:*[REPLACEABLE]*.crossdomain.xml + + +Diagnose: Interpreting messages in ``/var/log/swift/`` files +------------------------------------------------------------ + +.. note:: + + In the Hewlett Packard Enterprise Helion Public Cloud we send logs to + ``proxy.log`` (proxy-server logs), ``server.log`` (object-server, + account-server, container-server logs), ``background.log`` (all + other servers [object-replicator, etc]). + +The following table lists known issues: + +.. list-table:: + :widths: 25 25 25 25 + :header-rows: 1 + + * - **Logfile** + - **Signature** + - **Issue** + - **Steps to take** + * - /var/log/syslog + - kernel: [] hpsa .... .... .... has check condition: unknown type: + Sense: 0x5, ASC: 0x20, ASC Q: 0x0 .... + - An unsupported command was issued to the storage hardware + - Understood to be a benign monitoring issue, ignore + * - /var/log/syslog + - kernel: [] sd .... [csbu:sd...] Sense Key: Medium Error + - Suggests disk surface issues + - Run swift diagnostics on the target node to check for disk errors, + repair disk errors + * - /var/log/syslog + - kernel: [] sd .... [csbu:sd...] 
Sense Key: Hardware Error
+     - Suggests storage hardware issues
+     - Run swift diagnostics on the target node to check for disk failures,
+       replace failed disks
+   * - /var/log/syslog
+     - kernel: [] .... I/O error, dev sd.... ,sector ....
+     -
+     - Run swift diagnostics on the target node to check for disk errors
+   * - /var/log/syslog
+     - pound: NULL get_thr_arg
+     - Multiple threads woke up
+     - Noise, safe to ignore
+   * - /var/log/swift/proxy.log
+     - .... ERROR .... ConnectionTimeout ....
+     - A storage node is not responding in a timely fashion
+     - Run swift diagnostics on the target node to check for node down,
+       node unconfigured, storage off-line or network issues between the
+       proxy and the non-responding node
+   * - /var/log/swift/proxy.log
+     - proxy-server .... HTTP/1.0 500 ....
+     - A proxy server has reported an internal server error
+     - Run swift diagnostics on the target node to check for issues
+   * - /var/log/swift/server.log
+     - .... ERROR .... ConnectionTimeout ....
+     - A storage server is not responding in a timely fashion
+     - Run swift diagnostics on the target node to check for a node or
+       service down, unconfigured, storage off-line or network issues
+       between the two nodes
+   * - /var/log/swift/server.log
+     - .... ERROR .... Remote I/O error: '/srv/node/disk....
+     - A storage device is not responding as expected
+     - Run swift diagnostics and check the filesystem named in the error
+       for corruption (unmount & xfs_repair)
+   * - /var/log/swift/background.log
+     - object-server ERROR container update failed .... Connection refused
+     - Peer node is not responding
+     - Check status of the network and peer node
+   * - /var/log/swift/background.log
+     - object-updater ERROR with remote .... ConnectionTimeout
+     -
+     - Check status of the network and peer node
+   * - /var/log/swift/background.log
+     - account-reaper STDOUT: .... error: ECONNREFUSED
+     - Network connectivity issue
+     - Resolve network issue and re-run diagnostics
+   * - /var/log/swift/background.log
+     - .... ERROR .... ConnectionTimeout
+     - A storage server is not responding in a timely fashion
+     - Run swift diagnostics on the target node to check for a node or
+       service down, unconfigured, storage off-line or network issues
+       between the two nodes
+   * - /var/log/swift/background.log
+     - .... ERROR syncing .... Timeout
+     - A storage server is not responding in a timely fashion
+     - Run swift diagnostics on the target node to check for a node or
+       service down, unconfigured, storage off-line or network issues
+       between the two nodes
+   * - /var/log/swift/background.log
+     - .... ERROR Remote drive not mounted ....
+     - A storage server disk is unavailable
+     - Run swift diagnostics on the target node to check for a node or
+       service, failed or unmounted disk on the target, or a network issue
+   * - /var/log/swift/background.log
+     - object-replicator .... responded as unmounted
+     - A storage server disk is unavailable
+     - Run swift diagnostics on the target node to check for a node or
+       service, failed or unmounted disk on the target, or a network issue
+   * - /var/log/swift/\*.log
+     - STDOUT: EXCEPTION IN
+     - An unexpected error occurred
+     - Read the Traceback details; if it matches known issues
+       (e.g. active network/disk issues), check for re-occurrences
+       after the primary issues have been resolved
+   * - /var/log/rsyncd.log
+     - rsync: mkdir "/disk....failed: No such file or directory....
+ - A local storage server disk is unavailable + - Run swift diagnostics on the node to check for a failed or + unmounted disk + * - /var/log/swift* + - Exception: Could not bind to 0.0.0.0:600xxx + - Possible Swift process restart issue. This indicates an old swift + process is still running. + - Run swift diagnostics, if some swift services are reported down, + check if they left residual process behind. + * - /var/log/rsyncd.log + - rsync: recv_generator: failed to stat "/disk....." (in object) + failed: Not a directory (20) + - Swift directory structure issues + - Run swift diagnostics on the node to check for issues + +Diagnose: Parted reports the backup GPT table is corrupt +-------------------------------------------------------- + +- If a GPT table is broken, a message like the following should be + observed when the following command is run: + + .. code:: + + $ sudo parted -l + + .. code:: + + Error: The backup GPT table is corrupt, but the primary appears OK, + so that will be used. + + OK/Cancel? + +To fix, go to: Fix broken GPT table (broken disk partition) + + +Diagnose: Drives diagnostic reports a FS label is not acceptable +---------------------------------------------------------------- + +If diagnostics reports something like "FS label: obj001dsk011 is not +acceptable", it indicates that a partition has a valid disk label, but an +invalid filesystem label. In such cases proceed as follows: + +#. Verify that the disk labels are correct: + + .. code:: + + FS=/dev/sd#1 + + sudo parted -l | grep object + +#. If partition labels are inconsistent then, resolve the disk label issues + before proceeding: + + .. code:: + + sudo parted -s ${FS} name ${PART_NO} ${PART_NAME} #Partition Label + #PART_NO is 1 for object disks and 3 for OS disks + #PART_NAME follows the convention seen in "sudo parted -l | grep object" + +#. If the Filesystem label is missing then create it with care: + + .. code:: + + sudo xfs_admin -l ${FS} #Filesystem label (12 Char limit) + + #Check for the existence of a FS label + + OBJNO=<3 Length Object No.> + + #I.E OBJNO for sw-stbaz3-object0007 would be 007 + + DISKNO=<3 Length Disk No.> + + #I.E DISKNO for /dev/sdb would be 001, /dev/sdc would be 002 etc. + + sudo xfs_admin -L "obj${OBJNO}dsk${DISKNO}" ${FS} + + #Create a FS Label + +Diagnose: Failed LUNs +--------------------- + +.. note:: + + The HPE Helion Public Cloud uses direct attach SmartArry + controllers/drives. The information here is specific to that + environment. + +The ``swift_diagnostics`` mount checks may return a warning that a LUN has +failed, typically accompanied by DriveAudit check failures and device +errors. + +Such cases are typically caused by a drive failure, and if drive check +also reports a failed status for the underlying drive, then follow +the procedure to replace the disk. + +Otherwise the lun can be re-enabled as follows: + +#. Generate a hpssacli diagnostic report. This report allows the swift + team to troubleshoot potential cabling or hardware issues so it is + imperative that you run it immediately when troubleshooting a failed + LUN. You will come back later and grep this file for more details, but + just generate it for now. + + .. code:: + + sudo hpssacli controller all diag file=/tmp/hpacu.diag ris=on \ + xml=off zip=off + +Export the following variables using the below instructions before +proceeding further. + +#. 
Print a list of logical drives and their numbers and take note of the + failed drive's number and array value (example output: "array A + logicaldrive 1..." would be exported as LDRIVE=1): + + .. code:: + + sudo hpssacli controller slot=1 ld all show + +#. Export the number of the logical drive that was retrieved from the + previous command into the LDRIVE variable: + + .. code:: + + export LDRIVE= + +#. Print the array value and Port:Box:Bay for all drives and take note of + the Port:Box:Bay for the failed drive (example output: " array A + physicaldrive 2C:1:1..." would be exported as PBOX=2C:1:1). Match the + array value of this output with the array value obtained from the + previous command to be sure you are working on the same drive. Also, + the array value usually matches the device name (For example, /dev/sdc + in the case of "array c"), but we will run a different command to be sure + we are operating on the correct device. + + .. code:: + + sudo hpssacli controller slot=1 pd all show + +.. note:: + + Sometimes a LUN may appear to be failed as it is not and cannot + be mounted but the hpssacli/parted commands may show no problems with + the LUNS/drives. In this case, the filesystem may be corrupt and may be + necessary to run ``sudo xfs_check /dev/sd[a-l][1-2]`` to see if there is + an xfs issue. The results of running this command may require that + ``xfs_repair`` is run. + +#. Export the Port:Box:Bay for the failed drive into the PBOX variable: + + .. code:: + + export PBOX= + +#. Print the physical device information and take note of the Disk Name + (example output: "Disk Name: /dev/sdk" would be exported as + DEV=/dev/sdk): + + .. code:: + + sudo hpssacli controller slot=1 ld ${LDRIVE} show detail \ + grep -i "Disk Name" + +#. Export the device name variable from the preceding command (example: + /dev/sdk): + + .. code:: + + export DEV= + +#. Export the filesystem variable. Disks that are split between the + operating system and data storage, typically sda and sdb, should only + have repairs done on their data filesystem, usually /dev/sda2 and + /dev/sdb2, Other data only disks have just one partition on the device, + so the filesystem will be 1. In any case you should verify the data + filesystem by running ``df -h | grep /srv/node`` and using the listed + data filesystem for the device in question as the export. For example: + /dev/sdk1. + + .. code:: + + export FS= + +#. Verify the LUN is failed, and the device is not: + + .. code:: + + sudo hpssacli controller slot=1 ld all show + sudo hpssacli controller slot=1 pd all show + sudo hpssacli controller slot=1 ld ${LDRIVE} show detail + sudo hpssacli controller slot=1 pd ${PBOX} show detail + +#. Stop the swift and rsync service: + + .. code:: + + sudo service rsync stop + sudo swift-init shutdown all + +#. Unmount the problem drive, fix the LUN and the filesystem: + + .. code:: + + sudo umount ${FS} + +#. If umount fails, you should run lsof search for the mountpoint and + kill any lingering processes before repeating the unpount: + + .. code:: + + sudo hpacucli controller slot=1 ld ${LDRIVE} modify reenable + sudo xfs_repair ${FS} + +#. If the ``xfs_repair`` complains about possible journal data, use the + ``xfs_repair -L`` option to zeroise the journal log. + +#. Once complete test-mount the filesystem, and tidy up its lost and + found area. + + .. code:: + + sudo mount ${FS} /mnt + sudo rm -rf /mnt/lost+found/ + sudo umount /mnt + +#. Mount the filesystem and restart swift and rsync. + +#. 
Run the following to determine if a DC ticket is needed to check the + cables on the node: + + .. code:: + + grep -y media.exchanged /tmp/hpacu.diag + grep -y hot.plug.count /tmp/hpacu.diag + +#. If the output reports any non 0x00 values, it suggests that the cables + should be checked. For example, log a DC ticket to check the sas cables + between the drive and the expander. + +Diagnose: Slow disk devices +--------------------------- + +.. note:: + + collectl is an open-source performance gathering/analysis tool. + +If the diagnostics report a message such as ``sda: drive is slow``, you +should log onto the node and run the following comand: + +.. code:: + + $ /usr/bin/collectl -s D -c 1 + waiting for 1 second sample... + # DISK STATISTICS (/sec) + # <---------reads---------><---------writes---------><--------averages--------> Pct + #Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util + sdb 204 0 33 6 43 0 4 11 6 1 7 6 23 + sda 84 0 13 6 108 21 6 18 10 1 7 7 13 + sdc 100 0 16 6 0 0 0 0 6 1 7 6 9 + sdd 140 0 22 6 22 0 2 11 6 1 9 9 22 + sde 76 0 12 6 255 0 52 5 5 1 2 1 10 + sdf 276 0 44 6 0 0 0 0 6 1 11 8 38 + sdg 112 0 17 7 18 0 2 9 6 1 7 7 13 + sdh 3552 0 73 49 0 0 0 0 48 1 9 8 62 + sdi 72 0 12 6 0 0 0 0 6 1 8 8 10 + sdj 112 0 17 7 22 0 2 11 7 1 10 9 18 + sdk 120 0 19 6 21 0 2 11 6 1 8 8 16 + sdl 144 0 22 7 18 0 2 9 6 1 9 7 18 + dm-0 0 0 0 0 0 0 0 0 0 0 0 0 0 + dm-1 0 0 0 0 60 0 15 4 4 0 0 0 0 + dm-2 0 0 0 0 48 0 12 4 4 0 0 0 0 + dm-3 0 0 0 0 0 0 0 0 0 0 0 0 0 + dm-4 0 0 0 0 0 0 0 0 0 0 0 0 0 + dm-5 0 0 0 0 0 0 0 0 0 0 0 0 0 + ... + (repeats -- type Ctrl/C to stop) + +Look at the ``Wait`` and ``SvcTime`` values. It is not normal for +these values to exceed 50msec. This is known to impact customer +performance (upload/download. For a controller problem, many/all drives +will show how wait and service times. A reboot may correct the prblem; +otherwise hardware replacement is needed. + +Another way to look at the data is as follows: + +.. code:: + + $ /opt/hp/syseng/disk-anal.pl -d + Disk: sda Wait: 54580 371 65 25 12 6 6 0 1 2 0 46 + Disk: sdb Wait: 54532 374 96 36 16 7 4 1 0 2 0 46 + Disk: sdc Wait: 54345 554 105 29 15 4 7 1 4 4 0 46 + Disk: sdd Wait: 54175 553 254 31 20 11 6 6 2 2 1 53 + Disk: sde Wait: 54923 66 56 15 8 7 7 0 1 0 2 29 + Disk: sdf Wait: 50952 941 565 403 426 366 442 447 338 99 38 97 + Disk: sdg Wait: 50711 689 808 562 642 675 696 185 43 14 7 82 + Disk: sdh Wait: 51018 668 688 483 575 542 692 275 55 22 9 87 + Disk: sdi Wait: 51012 1011 849 672 568 240 344 280 38 13 6 81 + Disk: sdj Wait: 50724 743 770 586 662 509 684 283 46 17 11 79 + Disk: sdk Wait: 50886 700 585 517 633 511 729 352 89 23 8 81 + Disk: sdl Wait: 50106 617 794 553 604 504 532 501 288 234 165 216 + Disk: sda Time: 55040 22 16 6 1 1 13 0 0 0 3 12 + + Disk: sdb Time: 55014 41 19 8 3 1 8 0 0 0 3 17 + Disk: sdc Time: 55032 23 14 8 9 2 6 1 0 0 0 19 + Disk: sdd Time: 55022 29 17 12 6 2 11 0 0 0 1 14 + Disk: sde Time: 55018 34 15 11 12 1 9 0 0 0 2 12 + Disk: sdf Time: 54809 250 45 7 1 0 0 0 0 0 1 1 + Disk: sdg Time: 55070 36 6 2 0 0 0 0 0 0 0 0 + Disk: sdh Time: 55079 33 2 0 0 0 0 0 0 0 0 0 + Disk: sdi Time: 55074 28 7 2 0 0 2 0 0 0 0 1 + Disk: sdj Time: 55067 35 10 0 1 0 0 0 0 0 0 1 + Disk: sdk Time: 55068 31 10 3 0 0 1 0 0 0 0 1 + Disk: sdl Time: 54905 130 61 7 3 4 1 0 0 0 0 3 + +This shows the historical distribution of the wait and service times +over a day. 
This is how you read it: + +- sda did 54580 operations with a short wait time, 371 operations with + a longer wait time and 65 with an even longer wait time. + +- sdl did 50106 operations with a short wait time, but as you can see + many took longer. + +There is a clear pattern that sdf to sdl have a problem. Actually, sda +to sde would more normally have lots of zeros in their data. But maybe +this is a busy system. In this example it is worth changing the +controller as the individual drives may be ok. + +After the controller is changed, use collectl -s D as described above to +see if the problem has cleared. disk-anal.pl will continue to show +historical data. You can look at recent data as follows. It only looks +at data from 13:15 to 14:15. As you can see, this is a relatively clean +system (few if any long wait or service times): + +.. code:: + + $ /opt/hp/syseng/disk-anal.pl -d -t 13:15-14:15 + Disk: sda Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdb Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdc Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdd Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sde Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdf Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdg Wait: 3594 6 0 0 0 0 0 0 0 0 0 0 + Disk: sdh Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdi Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdj Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdk Wait: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdl Wait: 3599 1 0 0 0 0 0 0 0 0 0 0 + Disk: sda Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdb Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdc Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdd Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sde Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdf Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdg Time: 3594 6 0 0 0 0 0 0 0 0 0 0 + Disk: sdh Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdi Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdj Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdk Time: 3600 0 0 0 0 0 0 0 0 0 0 0 + Disk: sdl Time: 3599 1 0 0 0 0 0 0 0 0 0 0 + +For long wait times, where the service time appears normal is to check +the logical drive cache status. While the cache may be enabled, it can +be disabled on a per-drive basis. + +Diagnose: Slow network link - Measuring network performance +----------------------------------------------------------- + +Network faults can cause performance between Swift nodes to degrade. The +following tests are recommended. Other methods (such as copying large +files) may also work, but can produce inconclusive results. + +Use netperf on all production systems. Install on all systems if not +already installed. And the UFW rules for its control port are in place. +However, there are no pre-opened ports for netperf's data connection. Pick a +port number. In this example, 12866 is used because it is one higher +than netperf's default control port number, 12865. If you get very +strange results including zero values, you may not have gotten the data +port opened in UFW at the target or may have gotten the netperf +command-line wrong. + +Pick a ``source`` and ``target`` node. The source is often a proxy node +and the target is often an object node. Using the same source proxy you +can test communication to different object nodes in different AZs to +identity possible bottlekecks. + +Running tests +^^^^^^^^^^^^^ + +#. Prepare the ``target`` node as follows: + + .. code:: + + sudo iptables -I INPUT -p tcp -j ACCEPT + + Or, do: + + .. code:: + + sudo ufw allow 12866/tcp + +#. On the ``source`` node, run the following command to check + throughput. 
Note the double-dash before the -P option. + The command takes 10 seconds to complete. + + .. code:: + + $ netperf -H .72.4 + MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12866 AF_INET to + .72.4 (.72.4) port 12866 AF_INET : demo + Recv Send Send + Socket Socket Message Elapsed + Size Size Size Time Throughput + bytes bytes bytes secs. 10^6bits/sec + 87380 16384 16384 10.02 923.69 + +#. On the ``source`` node, run the following command to check latency: + + .. code:: + + $ netperf -H .72.4 -t TCP_RR -- -P 12866 + MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 12866 + AF_INET to .72.4 (.72.4) port 12866 AF_INET : demo + : first burst 0 + Local Remote Socket Size Request Resp. Elapsed Trans. + Send Recv Size Size Time Rate + bytes Bytes bytes bytes secs. per sec + 16384 87380 1 1 10.00 11753.37 + 16384 87380 + +Expected results +^^^^^^^^^^^^^^^^ + +Faults will show up as differences between different pairs of nodes. +However, for reference, here are some expected numbers: + +- For throughput, proxy to proxy, expect ~9300 Mbit/sec (proxies have + a 10Ge link). + +- For throughout, proxy to object, expect ~920 Mbit/sec (at time of + writing this, object nodes have a 1Ge link). + +- For throughput, object to object, expect ~920 Mbit/sec. + +- For latency (all types), expect ~11000 transactions/sec. + +Diagnose: Remapping sectors experiencing UREs +--------------------------------------------- + +#. Find the bad sector, device, and filesystem in ``kern.log``. + +#. Set the environment variables SEC, DEV & FS, for example: + + .. code:: + + SEC=2930954256 + DEV=/dev/sdi + FS=/dev/sdi1 + +#. Verify that the sector is bad: + + .. code:: + + sudo dd if=${DEV} of=/dev/null bs=512 count=1 skip=${SEC} + +#. If the sector is bad this command will output an input/output error: + + .. code:: + + dd: reading `/dev/sdi`: Input/output error + 0+0 records in + 0+0 records out + +#. Prevent chef from attempting to re-mount the filesystem while the + repair is in progress: + + .. code:: + + sudo mv /etc/chef/client.pem /etc/chef/xx-client.xx-pem + +#. Stop the swift and rsync service: + + .. code:: + + sudo service rsync stop + sudo swift-init shutdown all + +#. Unmount the problem drive: + + .. code:: + + sudo umount ${FS} + +#. Overwrite/remap the bad sector: + + .. code:: + + sudo dd_rescue -d -A -m8b -s ${SEC}b ${DEV} ${DEV} + +#. This command should report an input/output error the first time + it is run. Run the command a second time, if it successfully remapped + the bad sector it should not report an input/output error. + +#. Verify the sector is now readable: + + .. code:: + + sudo dd if=${DEV} of=/dev/null bs=512 count=1 skip=${SEC} + +#. If the sector is now readable this command should not report an + input/output error. + +#. If more than one problem sector is listed, set the SEC environment + variable to the next sector in the list: + + .. code:: + + SEC=123456789 + +#. Repeat from step 8. + +#. Repair the filesystem: + + .. code:: + + sudo xfs_repair ${FS} + +#. If ``xfs_repair`` reports that the filesystem has valuable filesystem + changes: + + .. code:: + + sudo xfs_repair ${FS} + Phase 1 - find and verify superblock... + Phase 2 - using internal log + - zero log... + ERROR: The filesystem has valuable metadata changes in a log which + needs to be replayed. + Mount the filesystem to replay the log, and unmount it before + re-running xfs_repair. + If you are unable to mount the filesystem, then use the -L option to + destroy the log and attempt a repair. 
Note that destroying the log may
+      cause corruption -- please attempt a mount of the filesystem before
+      doing this.
+
+#. You should attempt to mount the filesystem, and clear the lost+found
+   area:
+
+   .. code::
+
+      sudo mount $FS /mnt
+      sudo rm -rf /mnt/lost+found/*
+      sudo umount /mnt
+
+#. If the filesystem fails to mount then you will need to use the
+   ``xfs_repair -L`` option to force log zeroing.
+   Repeat step 11.
+
+#. If ``xfs_repair`` reports that an additional input/output error has been
+   encountered, get the sector details as follows:
+
+   .. code::
+
+      sudo grep "I/O error" /var/log/kern.log | grep sector | tail -1
+
+#. If a new input/output error is reported, set the SEC environment
+   variable to the problem sector number:
+
+   .. code::
+
+      SEC=234567890
+
+#. Repeat from step 8.
+
+#. Remount the filesystem and restart swift and rsync.
+
+   - If all UREs in the kern.log have been fixed and you are still unable
+     to repair the disk with ``xfs_repair``, it is possible that the UREs
+     have corrupted the filesystem or possibly destroyed the drive
+     altogether. In this case, the first step is to re-format the
+     filesystem; if this fails, get the disk replaced.
+
+
+Diagnose: High system latency
+-----------------------------
+
+.. note::
+
+   The latency measurements described here are specific to the HPE
+   Helion Public Cloud.
+
+A number of issues can cause high latency:
+
+- A bad NIC on a proxy server. However, as explained above, this
+  usually causes the peak to rise, but the average should remain near
+  normal parameters. A quick fix is to shut down the proxy.
+
+- A stuck memcache server. It accepts connections, but then does not
+  respond. Expect to see timeout messages in ``/var/log/proxy.log``
+  (port 11211). Swift Diags will also report this as a failed node/port.
+  A quick fix is to shut down the proxy server.
+
+- A bad/broken object server can also cause problems if the accounts
+  used by the monitor program happen to live on the bad object server.
+
+- A general network problem within the data center. Compare the results
+  with the Pingdom monitors to see if they also have a problem.
+
+Diagnose: Interface reports errors
+----------------------------------
+
+Should a network interface on a Swift node begin reporting network
+errors, it may well indicate a cable, switch, or network issue.
+
+Get an overview of the interface with:
+
+.. code::
+
+   sudo ifconfig eth{n}
+   sudo ethtool eth{n}
+
+The ``Link Detected:`` indicator will read ``yes`` if the NIC is
+cabled.
+
+Establish the adapter type with:
+
+.. code::
+
+   sudo ethtool -i eth{n}
+
+Gather the interface statistics with:
+
+.. code::
+
+   sudo ethtool -S eth{n}
+
+If the NIC supports a self-test, this can be performed with:
+
+.. code::
+
+   sudo ethtool -t eth{n}
+
+Self-tests should read ``PASS`` if the NIC is operating correctly.
+
+NIC module drivers can be re-initialised by carefully removing and
+re-installing the modules. A case in point is the Mellanox drivers on
+Swift proxy servers, which use a two-part driver: mlx4_en and
+mlx4_core. To reload these you must carefully remove the mlx4_en
+(ethernet) module, then the mlx4_core module, and reinstall them in the
+reverse order.
+
+As the interface will be disabled while the modules are unloaded, you
+must be very careful not to lock the interface out. The following
+script can be used to reload the Mellanox drivers; as a side effect,
+this resets the error counts on the interface.
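+
+A minimal sketch of such a reload sequence is shown below. It assumes the
+``mlx4_en``/``mlx4_core`` module names described above and an illustrative
+interface name of ``eth{n}``; adjust both for your hardware, and run it from
+the server console (or over a different interface) rather than over the
+interface being reset:
+
+.. code::
+
+   #!/bin/bash
+   # Sketch only: reload the Mellanox NIC driver modules.
+   # Removing mlx4_en takes the interface down, so do not run this
+   # over the interface you are about to reset.
+   set -e
+   sudo rmmod mlx4_en          # remove the ethernet driver first
+   sudo rmmod mlx4_core        # then the core driver
+   sudo modprobe mlx4_core     # reinstall in the reverse order
+   sudo modprobe mlx4_en
+   sudo ethtool -S eth{n}      # confirm the error counters have been reset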
+
+
+Diagnose: CorruptDir diagnostic reports corrupt directories
+-----------------------------------------------------------
+
+From time to time Swift data structures may become corrupted by
+misplaced files in filesystem locations where swift would normally place
+a directory. This causes issues when swift attempts to create a directory
+at that location: the creation may fail because of the pre-existing file.
+If the CorruptDir diagnostic reports corrupt directories, they should be
+checked to see if they exist.
+
+Checking existence of entries
+-----------------------------
+
+Swift data filesystems are located under the ``/srv/node/disk``
+mountpoints and contain accounts, containers and objects
+subdirectories which in turn contain partition number subdirectories.
+The partition number directories contain md5 hash subdirectories. md5
+hash directories contain md5sum subdirectories. md5sum directories
+contain the Swift data payload as either a database (.db), for
+accounts and containers, or a data file (.data) for objects.
+If the entries reported in diagnostics correspond to a partition
+number, md5 hash or md5sum directory, check the entry with ``ls
+-ld *entry*``.
+If it turns out to be a file rather than a directory, it should be
+carefully removed.
+
+.. note::
+
+   Please do not ``ls`` the partition-level directory contents, as
+   this (especially for objects) may take a lot of time and system
+   resources; if you need to check the contents, use:
+
+   .. code::
+
+      echo /srv/node/disk#/type/partition#/
+
+Diagnose: Hung swift object replicator
+--------------------------------------
+
+The swift diagnostic message ``Object replicator: remaining exceeds
+100hrs:`` may indicate that the swift ``object-replicator`` is stuck and not
+making progress. Another useful way to check this is with the
+``swift-recon -r`` command on a swift proxy server:
+
+.. code::
+
+   sudo swift-recon -r
+   ===============================================================================
+
+   --> Starting reconnaissance on 384 hosts
+   ===============================================================================
+   [2013-07-17 12:56:19] Checking on replication
+   http://.72.63:6000/recon/replication:
+   [replication_time] low: 2, high: 80, avg: 28.8, total: 11037, Failed: 0.0%, no_result: 0, reported: 383
+   Oldest completion was 2013-06-12 22:46:50 (12 days ago) by .31:6000.
+   Most recent completion was 2013-07-17 12:56:19 (5 seconds ago) by .204.113:6000.
+   ===============================================================================
+
+The ``Oldest completion`` line in this example indicates that the
+object-replicator on swift object server .31 has not completed
+the replication cycle in 12 days. This replicator is stuck. The object
+replicator cycle is generally less than 1 hour, though a replicator
+cycle of 15-20 hours can occur if nodes are added to the system and a
+new ring has been deployed.
+
+You can further check if the object replicator is stuck by logging on
+to the object server and checking the object replicator's progress with
+the following command:
+
+..
code:: + + # sudo grep object-rep /var/log/swift/background.log | grep -e "Starting object replication" -e "Object replication complete" -e "partitions rep" + Jul 16 06:25:46 object-replicator 15344/16450 (93.28%) partitions replicated in 69018.48s (0.22/sec, 22h remaining) + Jul 16 06:30:46 object-replicator 15344/16450 (93.28%) partitions replicated in 69318.58s (0.22/sec, 22h remaining) + Jul 16 06:35:46 object-replicator 15344/16450 (93.28%) partitions replicated in 69618.63s (0.22/sec, 23h remaining) + Jul 16 06:40:46 object-replicator 15344/16450 (93.28%) partitions replicated in 69918.73s (0.22/sec, 23h remaining) + Jul 16 06:45:46 object-replicator 15348/16450 (93.30%) partitions replicated in 70218.75s (0.22/sec, 24h remaining) + Jul 16 06:50:47 object-replicator 15348/16450 (93.30%) partitions replicated in 70518.85s (0.22/sec, 24h remaining) + Jul 16 06:55:47 object-replicator 15348/16450 (93.30%) partitions replicated in 70818.95s (0.22/sec, 25h remaining) + Jul 16 07:00:47 object-replicator 15348/16450 (93.30%) partitions replicated in 71119.05s (0.22/sec, 25h remaining) + Jul 16 07:05:47 object-replicator 15348/16450 (93.30%) partitions replicated in 71419.15s (0.21/sec, 26h remaining) + Jul 16 07:10:47 object-replicator 15348/16450 (93.30%) partitions replicated in 71719.25s (0.21/sec, 26h remaining) + Jul 16 07:15:47 object-replicator 15348/16450 (93.30%) partitions replicated in 72019.27s (0.21/sec, 27h remaining) + Jul 16 07:20:47 object-replicator 15348/16450 (93.30%) partitions replicated in 72319.37s (0.21/sec, 27h remaining) + Jul 16 07:25:47 object-replicator 15348/16450 (93.30%) partitions replicated in 72619.47s (0.21/sec, 28h remaining) + Jul 16 07:30:47 object-replicator 15348/16450 (93.30%) partitions replicated in 72919.56s (0.21/sec, 28h remaining) + Jul 16 07:35:47 object-replicator 15348/16450 (93.30%) partitions replicated in 73219.67s (0.21/sec, 29h remaining) + Jul 16 07:40:47 object-replicator 15348/16450 (93.30%) partitions replicated in 73519.76s (0.21/sec, 29h remaining) + +The above status is output every 5 minutes to ``/var/log/swift/background.log``. + +.. note:: + + The 'remaining' time is increasing as time goes on, normally the + time remaining should be decreasing. Also note the partition number. For example, + 15344 remains the same for several status lines. Eventually the object + replicator detects the hang and attempts to make progress by killing the + problem thread. The replicator then progresses to the next partition but + quite often it again gets stuck on the same partition. + +One of the reasons for the object replicator hanging like this is +filesystem corruption on the drive. The following is a typical log entry +of a corrupted filesystem detected by the object replicator: + +.. code:: + + # sudo bzgrep "Remote I/O error" /var/log/swift/background.log* |grep srv | - tail -1 + Jul 12 03:33:30 object-replicator STDOUT: ERROR:root:Error hashing suffix#012Traceback (most recent call last):#012 File + "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 199, in get_hashes#012 hashes[suffix] = hash_suffix(suffix_dir, + reclaim_age)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 84, in hash_suffix#012 path_contents = + sorted(os.listdir(path))#012OSError: [Errno 121] Remote I/O error: '/srv/node/disk4/objects/1643763/b51' + +An ``ls`` of the problem file or directory usually shows something like the following: + +.. 
code:: + + # ls -l /srv/node/disk4/objects/1643763/b51 + ls: cannot access /srv/node/disk4/objects/1643763/b51: Remote I/O error + +If no entry with ``Remote I/O error`` occurs in the ``background.log`` it is +not possible to determine why the object-replicator is hung. It may be +that the ``Remote I/O error`` entry is older than 7 days and so has been +rotated out of the logs. In this scenario it may be best to simply +restart the object-replicator. + +#. Stop the object-replicator: + + .. code:: + + # sudo swift-init object-replicator stop + +#. Make sure the object replicator has stopped, if it has hung, the stop + command will not stop the hung process: + + .. code:: + + # ps auxww | - grep swift-object-replicator + +#. If the previous ps shows the object-replicator is still running, kill + the process: + + .. code:: + + # kill -9 + +#. Start the object-replicator: + + .. code:: + + # sudo swift-init object-replicator start + +If the above grep did find an ``Remote I/O error`` then it may be possible +to repair the problem filesystem. + +#. Stop swift and rsync: + + .. code:: + + # sudo swift-init all shutdown + # sudo service rsync stop + +#. Make sure all swift process have stopped: + + .. code:: + + # ps auxww | grep swift | grep python + +#. Kill any swift processes still running. + +#. Unmount the problem filesystem: + + .. code:: + + # sudo umount /srv/node/disk4 + +#. Repair the filesystem: + + .. code:: + + # sudo xfs_repair -P /dev/sde1 + +#. If the ``xfs_repair`` fails then it may be necessary to re-format the + filesystem. See Procedure: fix broken XFS filesystem. If the + ``xfs_repair`` is successful, re-enable chef using the following command + and replication should commence again. + + +Diagnose: High CPU load +----------------------- + +The CPU load average on an object server, as shown with the +'uptime' command, is typically under 10 when the server is +lightly-moderately loaded: + +.. code:: + + $ uptime + 07:59:26 up 99 days, 5:57, 1 user, load average: 8.59, 8.39, 8.32 + +During times of increased activity, due to user transactions or object +replication, the CPU load average can increase to to around 30. + +However, sometimes the CPU load average can increase significantly. The +following is an example of an object server that has extremely high CPU +load: + +.. code:: + + $ uptime + 07:44:02 up 18:22, 1 user, load average: 407.12, 406.36, 404.59 + +.. toctree:: + :maxdepth: 2 + + sec-furtherdiagnose.rst diff --git a/doc/source/ops_runbook/general.rst b/doc/source/ops_runbook/general.rst new file mode 100644 index 0000000000..60d19badee --- /dev/null +++ b/doc/source/ops_runbook/general.rst @@ -0,0 +1,36 @@ +================== +General Procedures +================== + +Getting a swift account stats +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note:: + + ``swift-direct`` is specific to the HPE Helion Public Cloud. Go look at + ``swifty`` for an alternate, this is an example. + +This procedure describes how you determine the swift usage for a given +swift account, that is the number of containers, number of objects and +total bytes used. To do this you will need the project ID. + +Log onto one of the swift proxy servers. + +Use swift-direct to show this accounts usage: + +.. 
code:: + + $ sudo -u swift /opt/hp/swift/bin/swift-direct show AUTH_redacted-9a11-45f8-aa1c-9e7b1c7904c8 + Status: 200 + Content-Length: 0 + Accept-Ranges: bytes + X-Timestamp: 1379698586.88364 + X-Account-Bytes-Used: 67440225625994 + X-Account-Container-Count: 1 + Content-Type: text/plain; charset=utf-8 + X-Account-Object-Count: 8436776 + Status: 200 + name: my_container count: 8436776 bytes: 67440225625994 + +This account has 1 container. That container has 8436776 objects. The +total bytes used is 67440225625994. \ No newline at end of file diff --git a/doc/source/ops_runbook/index.rst b/doc/source/ops_runbook/index.rst new file mode 100644 index 0000000000..6fdb9c8c90 --- /dev/null +++ b/doc/source/ops_runbook/index.rst @@ -0,0 +1,79 @@ +================= +Swift Ops Runbook +================= + +This document contains operational procedures that Hewlett Packard Enterprise (HPE) uses to operate +and monitor the Swift system within the HPE Helion Public Cloud. This +document is an excerpt of a larger product-specific handbook. As such, +the material may appear incomplete. The suggestions and recommendations +made in this document are for our particular environment, and may not be +suitable for your environment or situation. We make no representations +concerning the accuracy, adequacy, completeness or suitability of the +information, suggestions or recommendations. This document are provided +for reference only. We are not responsible for your use of any +information, suggestions or recommendations contained herein. + +This document also contains references to certain tools that we use to +operate the Swift system within the HPE Helion Public Cloud. +Descriptions of these tools are provided for reference only, as the tools themselves +are not publically available at this time. + +- ``swift-direct``: This is similar to the ``swiftly`` tool. + + +.. toctree:: + :maxdepth: 2 + + general.rst + diagnose.rst + procedures.rst + maintenance.rst + troubleshooting.rst + +Is the system up? +~~~~~~~~~~~~~~~~~ + +If you have a report that Swift is down, perform the following basic checks: + +#. Run swift functional tests. + +#. From a server in your data center, use ``curl`` to check ``/healthcheck``. + +#. If you have a monitoring system, check your monitoring system. + +#. Check on your hardware load balancers infrastructure. + +#. Run swift-recon on a proxy node. + +Run swift function tests +------------------------ + +We would recommend that you set up your function tests against your production +system. + +A script for running the function tests is located in ``swift/.functests``. + + +External monitoring +------------------- + +- We use pingdom.com to monitor the external Swift API. We suggest the + following: + + - Do a GET on ``/healthcheck`` + + - Create a container, make it public (x-container-read: + .r\*,.rlistings), create a small file in the container; do a GET + on the object + +Reference information +~~~~~~~~~~~~~~~~~~~~~ + +Reference: Swift startup/shutdown +--------------------------------- + +- Use reload - not stop/start/restart. + +- Try to roll sets of servers (especially proxy) in groups of less + than 20% of your servers. 
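+
+For example, a minimal sketch of a rolling reload of the proxy tier is shown
+below. The host names, batch size, and healthcheck URL are illustrative only;
+adapt them to your deployment and load balancer configuration:
+
+.. code::
+
+   # Reload proxies a few at a time (hypothetical host names), verifying
+   # each one answers /healthcheck before moving on.
+   for host in proxy01 proxy02 proxy03; do
+       ssh ${host} "sudo swift-init proxy-server reload"
+       sleep 30
+       curl -skf "https://${host}/healthcheck" || echo "${host} failed healthcheck"
+   done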
+ diff --git a/doc/source/ops_runbook/maintenance.rst b/doc/source/ops_runbook/maintenance.rst new file mode 100644 index 0000000000..b3c9e582ac --- /dev/null +++ b/doc/source/ops_runbook/maintenance.rst @@ -0,0 +1,322 @@ +================== +Server maintenance +================== + +General assumptions +~~~~~~~~~~~~~~~~~~~ + +- It is assumed that anyone attempting to replace hardware components + will have already read and understood the appropriate maintenance and + service guides. + +- It is assumed that where servers need to be taken off-line for + hardware replacement, that this will be done in series, bringing the + server back on-line before taking the next off-line. + +- It is assumed that the operations directed procedure will be used for + identifying hardware for replacement. + +Assessing the health of swift +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can run the swift-recon tool on a Swift proxy node to get a quick +check of how Swift is doing. Please note that the numbers below are +necessarily somewhat subjective. Sometimes parameters for which we +say 'low values are good' will have pretty high values for a time. Often +if you wait a while things get better. + +For example: + +.. code:: + + sudo swift-recon -rla + =============================================================================== + [2012-03-10 12:57:21] Checking async pendings on 384 hosts... + Async stats: low: 0, high: 1, avg: 0, total: 1 + =============================================================================== + + [2012-03-10 12:57:22] Checking replication times on 384 hosts... + [Replication Times] shortest: 1.4113877813, longest: 36.8293570836, avg: 4.86278064749 + =============================================================================== + + [2012-03-10 12:57:22] Checking load avg's on 384 hosts... + [5m load average] lowest: 2.22, highest: 9.5, avg: 4.59578125 + [15m load average] lowest: 2.36, highest: 9.45, avg: 4.62622395833 + [1m load average] lowest: 1.84, highest: 9.57, avg: 4.5696875 + =============================================================================== + +In the example above we ask for information on replication times (-r), +load averages (-l) and async pendings (-a). This is a healthy Swift +system. Rules-of-thumb for 'good' recon output are: + +- Nodes that respond are up and running Swift. If all nodes respond, + that is a good sign. But some nodes may time out. For example: + + .. code:: + + \-> [http://.29:6000/recon/load:] + \-> [http://.31:6000/recon/load:] + +- That could be okay or could require investigation. + +- Low values (say < 10 for high and average) for async pendings are + good. Higher values occur when disks are down and/or when the system + is heavily loaded. Many simultaneous PUTs to the same container can + drive async pendings up. This may be normal, and may resolve itself + after a while. If it persists, one way to track down the problem is + to find a node with high async pendings (with ``swift-recon -av | sort + -n -k4``), then check its Swift logs, Often async pendings are high + because a node cannot write to a container on another node. Often + this is because the node or disk is offline or bad. This may be okay + if we know about it. + +- Low values for replication times are good. These values rise when new + rings are pushed, and when nodes and devices are brought back on + line. + +- Our 'high' load average values are typically in the 9-15 range. If + they are a lot bigger it is worth having a look at the systems + pushing the average up. 
Run ``swift-recon -av`` to get the individual + averages. To sort the entries with the highest at the end, + run ``swift-recon -av | sort -n -k4``. + +For comparison here is the recon output for the same system above when +two entire racks of Swift are down: + +.. code:: + + [2012-03-10 16:56:33] Checking async pendings on 384 hosts... + -> http://.22:6000/recon/async: + -> http://.18:6000/recon/async: + -> http://.16:6000/recon/async: + -> http://.13:6000/recon/async: + -> http://.30:6000/recon/async: + -> http://.6:6000/recon/async: + ......... + -> http://.5:6000/recon/async: + -> http://.15:6000/recon/async: + -> http://.9:6000/recon/async: + -> http://.27:6000/recon/async: + -> http://.4:6000/recon/async: + -> http://.8:6000/recon/async: + Async stats: low: 243, high: 659, avg: 413, total: 132275 + =============================================================================== + [2012-03-10 16:57:48] Checking replication times on 384 hosts... + -> http://.22:6000/recon/replication: + -> http://.18:6000/recon/replication: + -> http://.16:6000/recon/replication: + -> http://.13:6000/recon/replication: + -> http://.30:6000/recon/replication: + -> http://.6:6000/recon/replication: + ............ + -> http://.5:6000/recon/replication: + -> http://.15:6000/recon/replication: + -> http://.9:6000/recon/replication: + -> http://.27:6000/recon/replication: + -> http://.4:6000/recon/replication: + -> http://.8:6000/recon/replication: + [Replication Times] shortest: 1.38144306739, longest: 112.620954418, avg: 10.285 + 9475361 + =============================================================================== + [2012-03-10 16:59:03] Checking load avg's on 384 hosts... + -> http://.22:6000/recon/load: + -> http://.18:6000/recon/load: + -> http://.16:6000/recon/load: + -> http://.13:6000/recon/load: + -> http://.30:6000/recon/load: + -> http://.6:6000/recon/load: + ............ + -> http://.15:6000/recon/load: + -> http://.9:6000/recon/load: + -> http://.27:6000/recon/load: + -> http://.4:6000/recon/load: + -> http://.8:6000/recon/load: + [5m load average] lowest: 1.71, highest: 4.91, avg: 2.486375 + [15m load average] lowest: 1.79, highest: 5.04, avg: 2.506125 + [1m load average] lowest: 1.46, highest: 4.55, avg: 2.4929375 + =============================================================================== + +.. note:: + + The replication times and load averages are within reasonable + parameters, even with 80 object stores down. Async pendings, however is + quite high. This is due to the fact that the containers on the servers + which are down cannot be updated. When those servers come back up, async + pendings should drop. If async pendings were at this level without an + explanation, we have a problem. + +Recon examples +~~~~~~~~~~~~~~ + +Here is an example of noting and tracking down a problem with recon. + +Running reccon shows some async pendings: + +.. code:: + + bob@notso:~/swift-1.4.4/swift$ ssh \\-q .132.7 sudo swift-recon \\-alr + =============================================================================== + \[2012-03-14 17:25:55\\] Checking async pendings on 384 hosts... + Async stats: low: 0, high: 23, avg: 8, total: 3356 + =============================================================================== + \[2012-03-14 17:25:55\\] Checking replication times on 384 hosts... 
+ \[Replication Times\\] shortest: 1.49303831657, longest: 39.6982825994, avg: 4.2418222066 + =============================================================================== + \[2012-03-14 17:25:56\\] Checking load avg's on 384 hosts... + \[5m load average\\] lowest: 2.35, highest: 8.88, avg: 4.45911458333 + \[15m load average\\] lowest: 2.41, highest: 9.11, avg: 4.504765625 + \[1m load average\\] lowest: 1.95, highest: 8.56, avg: 4.40588541667 + =============================================================================== + +Why? Running recon again with -av swift (not shown here) tells us that +the node with the highest (23) is .72.61. Looking at the log +files on .72.61 we see: + +.. code:: + + souzab@:~$ sudo tail -f /var/log/swift/background.log | - grep -i ERROR + Mar 14 17:28:06 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6001} + Mar 14 17:28:06 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6001} + Mar 14 17:28:09 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:11 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:13 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6001} + Mar 14 17:28:13 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6001} + Mar 14 17:28:15 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:15 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:19 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:19 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:20 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.119', 'id': 5481, 'meta': '', 'device': 'disk6', 'port': 6001} + Mar 14 17:28:21 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:21 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + Mar 14 17:28:22 container-replicator ERROR Remote drive not mounted + {'zone': 5, 'weight': 1952.0, 'ip': '.204.20', 'id': 2311, 'meta': '', 'device': 'disk5', 'port': 6001} + +That is why this node has a lot of async pendings: a bunch of disks that +are not mounted on and . There may be other issues, +but clearing this up will likely drop the async pendings a fair bit, as +other nodes will be having the same problem. 
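+
+To confirm how widespread the unmounted-drive problem is before chasing
+individual nodes, swift-recon can also report unmounted drives directly.
+A sketch, run from a proxy node as with the other recon checks (shown here
+for the container servers, since the errors above came from the
+container-replicator):
+
+.. code::
+
+   # List the drives each container server reports as unmounted.
+   sudo swift-recon container -u
+
+   # The same check against the object servers.
+   sudo swift-recon -u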
+
+Assessing the availability risk when multiple storage servers are down
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. note::
+
+   This procedure will tell you if you have a problem; in practice,
+   however, you will find that you do not need to use it frequently.
+
+If three storage nodes (or, more precisely, three disks on three
+different storage nodes) are down, there is a small but nonzero
+probability that user objects, containers, or accounts will not be
+available.
+
+Procedure
+---------
+
+.. note::
+
+   swift has three rings: one each for objects, containers and accounts.
+   This procedure should be run three times, each time specifying the
+   appropriate ``*.builder`` file.
+
+#. Determine whether all three nodes are in different Swift zones by
+   running the ring builder on a proxy node to determine which zones
+   the storage nodes are in. For example:
+
+   .. code::
+
+      % sudo swift-ring-builder /etc/swift/object.builder
+      /etc/swift/object.builder, build version 1467
+      2097152 partitions, 3 replicas, 5 zones, 1320 devices, 0.02 balance
+      The minimum number of hours before a partition can be reassigned is 24
+      Devices: id  zone  ip address  port  name  weight   partitions  balance  meta
+                0     1          .4  6000  disk0  1708.00        4259    -0.00
+                1     1          .4  6000  disk1  1708.00        4260     0.02
+                2     1          .4  6000  disk2  1952.00        4868     0.01
+                3     1          .4  6000  disk3  1952.00        4868     0.01
+                4     1          .4  6000  disk4  1952.00        4867    -0.01
+
+#. Here, node .4 is in zone 1. If two or more of the three
+   nodes under consideration are in the same Swift zone, they do not
+   have any ring partitions in common; there is little/no data
+   availability risk if all three nodes are down.
+
+#. If the nodes are in three distinct Swift zones, it is necessary to
+   determine whether the nodes have ring partitions in common. Run
+   ``swift-ring-builder`` again, this time with the ``list_parts`` option,
+   and specify the nodes under consideration. For example (all on one line):
+
+   .. code::
+
+      % sudo swift-ring-builder /etc/swift/object.builder list_parts .8 .15 .72.2
+      Partition   Matches
+             91         2
+            729         2
+           3754         2
+           3769         2
+           3947         2
+           5818         2
+           7918         2
+           8733         2
+           9509         2
+          10233         2
+
+#. The ``list_parts`` option to the ring builder indicates how many ring
+   partitions the nodes have in common. If, as in this case, the
+   first entry in the list has a 'Matches' column of 2 or less, there
+   is no data availability risk if all three nodes are down.
+
+#. If the 'Matches' column has entries equal to 3, there is some data
+   availability risk if all three nodes are down. The risk is generally
+   small, and is proportional to the number of entries that have a 3 in
+   the Matches column. For example:
+
+   .. code::
+
+      Partition   Matches
+          26865         3
+         362367         3
+         745940         3
+         778715         3
+         797559         3
+         820295         3
+         822118         3
+         839603         3
+         852332         3
+         855965         3
+         858016         3
+
+#. A quick way to count the number of rows with 3 matches is:
+
+   .. code::
+
+      % sudo swift-ring-builder /etc/swift/object.builder list_parts .8 .15 .72.2 | grep "3$" | wc -l
+
+      30
+
+#. In this case the nodes have 30 out of a total of 2097152 partitions
+   in common; about 0.001%. In this case the risk is small but nonzero.
+   Recall that a partition is simply a portion of the ring mapping
+   space, not actual data. So having partitions in common is a necessary
+   but not sufficient condition for data unavailability.
+
+   .. note::
+
+      We should not bring down a node for repair if it shows
+      Matches entries of 3 with other nodes that are also down.
+ + If three nodes that have 3 partitions in common are all down, there is + a nonzero probability that data are unavailable and we should work to + bring some or all of the nodes up ASAP. diff --git a/doc/source/ops_runbook/procedures.rst b/doc/source/ops_runbook/procedures.rst new file mode 100644 index 0000000000..899df6d694 --- /dev/null +++ b/doc/source/ops_runbook/procedures.rst @@ -0,0 +1,367 @@ +================================= +Software configuration procedures +================================= + +Fix broken GPT table (broken disk partition) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- If a GPT table is broken, a message like the following should be + observed when the command... + + .. code:: + + $ sudo parted -l + +- ... is run. + + .. code:: + + ... + Error: The backup GPT table is corrupt, but the primary appears OK, so that will + be used. + OK/Cancel? + +#. To fix this, firstly install the ``gdisk`` program to fix this: + + .. code:: + + $ sudo aptitude install gdisk + +#. Run ``gdisk`` for the particular drive with the damaged partition: + + .. code: + + $ sudo gdisk /dev/sd*a-l* + GPT fdisk (gdisk) version 0.6.14 + + Caution: invalid backup GPT header, but valid main header; regenerating + backup header from main header. + + Warning! One or more CRCs don't match. You should repair the disk! + + Partition table scan: + MBR: protective + BSD: not present + APM: not present + GPT: damaged + /dev/sd + ***************************************************************************** + Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk + verification and recovery are STRONGLY recommended. + ***************************************************************************** + +#. On the command prompt, type ``r`` (recovery and transformation + options), followed by ``d`` (use main GPT header) , ``v`` (verify disk) + and finally ``w`` (write table to disk and exit). Will also need to + enter ``Y`` when prompted in order to confirm actions. + + .. code:: + + Command (? for help): r + + Recovery/transformation command (? for help): d + + Recovery/transformation command (? for help): v + + Caution: The CRC for the backup partition table is invalid. This table may + be corrupt. This program will automatically create a new backup partition + table when you save your partitions. + + Caution: Partition 1 doesn't begin on a 8-sector boundary. This may + result in degraded performance on some modern (2009 and later) hard disks. + + Caution: Partition 2 doesn't begin on a 8-sector boundary. This may + result in degraded performance on some modern (2009 and later) hard disks. + + Caution: Partition 3 doesn't begin on a 8-sector boundary. This may + result in degraded performance on some modern (2009 and later) hard disks. + + Identified 1 problems! + + Recovery/transformation command (? for help): w + + Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING + PARTITIONS!! + + Do you want to proceed, possibly destroying your data? (Y/N): Y + + OK; writing new GUID partition table (GPT). + The operation has completed successfully. + +#. Running the command: + + .. code:: + + $ sudo parted /dev/sd# + +#. Should now show that the partition is recovered and healthy again. + +#. Finally, uninstall ``gdisk`` from the node: + + .. code:: + + $ sudo aptitude remove gdisk + +Procedure: Fix broken XFS filesystem +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +#. 
+
+Procedure: Fix broken XFS filesystem
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+#. A filesystem may be corrupt or broken if the following output is
+   observed when checking its label:
+
+   .. code::
+
+      $ sudo xfs_admin -l /dev/sd#
+        cache_node_purge: refcount was 1, not zero (node=0x25d5ee0)
+        xfs_admin: cannot read root inode (117)
+        cache_node_purge: refcount was 1, not zero (node=0x25d92b0)
+        xfs_admin: cannot read realtime bitmap inode (117)
+        bad sb magic # 0 in AG 1
+        failed to read label in AG 1
+
+#. Run the following commands to remove the broken/corrupt filesystem and
+   replace it (this example uses ``/dev/sdb2``). First, replace the partition:
+
+   .. code::
+
+      $ sudo parted
+      GNU Parted 2.3
+      Using /dev/sda
+      Welcome to GNU Parted! Type 'help' to view a list of commands.
+      (parted) select /dev/sdb
+      Using /dev/sdb
+      (parted) p
+      Model: HP LOGICAL VOLUME (scsi)
+      Disk /dev/sdb: 2000GB
+      Sector size (logical/physical): 512B/512B
+      Partition Table: gpt
+
+      Number  Start   End     Size    File system  Name                       Flags
+       1      17.4kB  1024MB  1024MB  ext3                                    boot
+       2      1024MB  1751GB  1750GB  xfs          sw-aw2az1-object045-disk1
+       3      1751GB  2000GB  249GB                                           lvm
+
+      (parted) rm 2
+      (parted) mkpart primary 2 -1
+      Warning: You requested a partition from 2000kB to 2000GB.
+      The closest location we can manage is 1024MB to 1751GB.
+      Is this still acceptable to you?
+      Yes/No? Yes
+      Warning: The resulting partition is not properly aligned for best performance.
+      Ignore/Cancel? Ignore
+      (parted) p
+      Model: HP LOGICAL VOLUME (scsi)
+      Disk /dev/sdb: 2000GB
+      Sector size (logical/physical): 512B/512B
+      Partition Table: gpt
+
+      Number  Start   End     Size    File system  Name     Flags
+       1      17.4kB  1024MB  1024MB  ext3                  boot
+       2      1024MB  1751GB  1750GB  xfs          primary
+       3      1751GB  2000GB  249GB                         lvm
+
+      (parted) quit
+
+#. The next step is to scrub the filesystem and format it:
+
+   .. code::
+
+      $ sudo dd if=/dev/zero of=/dev/sdb2 bs=$((1024*1024)) count=1
+      1+0 records in
+      1+0 records out
+      1048576 bytes (1.0 MB) copied, 0.00480617 s, 218 MB/s
+      $ sudo /sbin/mkfs.xfs -f -i size=1024 /dev/sdb2
+      meta-data=/dev/sdb2          isize=1024   agcount=4, agsize=106811524 blks
+               =                   sectsz=512   attr=2, projid32bit=0
+      data     =                   bsize=4096   blocks=427246093, imaxpct=5
+               =                   sunit=0      swidth=0 blks
+      naming   =version 2          bsize=4096   ascii-ci=0
+      log      =internal log       bsize=4096   blocks=208616, version=2
+               =                   sectsz=512   sunit=0 blks, lazy-count=1
+      realtime =none               extsz=4096   blocks=0, rtextents=0
+
+#. You should now label and mount your filesystem.
+
+#. You can now check whether the filesystem is mounted using the command:
+
+   .. code::
+
+      $ mount
+
+Procedure: Checking if an account is okay
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. note::
+
+   ``swift-direct`` is only available in the HPE Helion Public Cloud.
+   Use ``swiftly`` as an alternative.
+
+If you have a tenant ID, you can check that the account is okay as follows from a proxy:
+
+.. code::
+
+   $ sudo -u swift /opt/hp/swift/bin/swift-direct show
+
+The response will either be similar to a swift list of the account
+containers, or an error indicating that the resource could not be found.
+
+In the latter case you can establish whether a backend database exists for
+the tenant ID by running the following on a proxy:
+
+.. code::
+
+   $ sudo -u swift swift-get-nodes /etc/swift/account.ring.gz
+
+The response will list ssh commands that will list the replicated
+account databases, if they exist.
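+
+As an illustration only, with a hypothetical tenant ID (and assuming the
+account name is simply the tenant ID prefixed with ``AUTH_``), the two checks
+above might look like:
+
+.. code::
+
+   # Hypothetical tenant ID, for illustration only.
+   $ sudo -u swift /opt/hp/swift/bin/swift-direct show 132345678912345
+
+   # If the account is reported as not found, locate any backend databases.
+   $ sudo -u swift swift-get-nodes /etc/swift/account.ring.gz AUTH_132345678912345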
+
+Procedure: Revive a deleted account
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Swift accounts are normally not recreated. If a tenant unsubscribes from
+Swift, the account is deleted. To re-subscribe to Swift, you can create
+a new tenant (new tenant ID), and subscribe to Swift. This creates a
+new Swift account with the new tenant ID.
+
+However, until the unsubscribe/new tenant process is supported, you may
+hit a situation where a Swift account is deleted and the user is locked
+out of Swift.
+
+Deleting the account database files
+-----------------------------------
+
+Here is one possible solution. The containers and objects may be lost
+forever. The solution is to delete the account database files and
+re-create the account. This may only be done once the containers and
+objects have been completely deleted. This process is untested, but could
+work as follows:
+
+#. Use ``swift-get-nodes`` to locate the account's database files (on three
+   servers).
+
+#. Rename the database files (on three servers).
+
+#. Use ``swiftly`` to create the account (use the original name).
+
+Renaming account database so it can be revived
+----------------------------------------------
+
+Get the locations of the database files that hold the account data:
+
+.. code::
+
+   sudo swift-get-nodes /etc/swift/account.ring.gz AUTH_redacted-1856-44ae-97db-31242f7ad7a1
+
+   Account    AUTH_redacted-1856-44ae-97db-31242f7ad7a1
+   Container  None
+   Object     None
+
+   Partition  18914
+   Hash       93c41ef56dd69173a9524193ab813e78
+
+   Server:Port Device  15.184.9.126:6002 disk7
+   Server:Port Device  15.184.9.94:6002 disk11
+   Server:Port Device  15.184.9.103:6002 disk10
+   Server:Port Device  15.184.9.80:6002 disk2   [Handoff]
+   Server:Port Device  15.184.9.120:6002 disk2  [Handoff]
+   Server:Port Device  15.184.9.98:6002 disk2   [Handoff]
+
+   curl -I -XHEAD "http://15.184.9.126:6002/disk7/18914/AUTH_redacted-1856-44ae-97db-31242f7ad7a1"
+   curl -I -XHEAD "http://15.184.9.94:6002/disk11/18914/AUTH_redacted-1856-44ae-97db-31242f7ad7a1"
+   curl -I -XHEAD "http://15.184.9.103:6002/disk10/18914/AUTH_redacted-1856-44ae-97db-31242f7ad7a1"
+   curl -I -XHEAD "http://15.184.9.80:6002/disk2/18914/AUTH_redacted-1856-44ae-97db-31242f7ad7a1"   # [Handoff]
+   curl -I -XHEAD "http://15.184.9.120:6002/disk2/18914/AUTH_redacted-1856-44ae-97db-31242f7ad7a1"  # [Handoff]
+   curl -I -XHEAD "http://15.184.9.98:6002/disk2/18914/AUTH_redacted-1856-44ae-97db-31242f7ad7a1"   # [Handoff]
+
+   ssh 15.184.9.126 "ls -lah /srv/node/disk7/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/"
+   ssh 15.184.9.94 "ls -lah /srv/node/disk11/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/"
+   ssh 15.184.9.103 "ls -lah /srv/node/disk10/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/"
+   ssh 15.184.9.80 "ls -lah /srv/node/disk2/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/"   # [Handoff]
+   ssh 15.184.9.120 "ls -lah /srv/node/disk2/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/"  # [Handoff]
+   ssh 15.184.9.98 "ls -lah /srv/node/disk2/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/"   # [Handoff]
+
+Check that the handoff nodes do not have account databases:
+
+.. code::
+
+   $ ssh 15.184.9.80 "ls -lah /srv/node/disk2/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/"
+   ls: cannot access /srv/node/disk2/accounts/18914/e78/93c41ef56dd69173a9524193ab813e78/: No such file or directory
+
+If the handoff node has a database, wait for rebalancing to occur.
+
+Procedure: Temporarily stop load balancers from directing traffic to a proxy server
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can stop the load balancers sending requests to a proxy server as
+follows. This can be useful when a proxy is misbehaving but you need
+Swift running to help diagnose the problem. By removing the proxy from the load
+balancers, customers are not impacted by the misbehaving proxy.
+
+#. Ensure that the ``disable_path`` variable in the proxy server configuration
+   file (``proxy-server.conf``) is set to ``/etc/swift/disabled-by-file``.
+
+#. Log onto the proxy node.
+
+#. Shut down Swift as follows:
+
+   .. code::
+
+      sudo swift-init proxy shutdown
+
+   .. note::
+
+      Shutdown, not stop.
+
+#. Create the ``/etc/swift/disabled-by-file`` file. For example:
+
+   .. code::
+
+      sudo touch /etc/swift/disabled-by-file
+
+#. Optionally, restart Swift:
+
+   .. code::
+
+      sudo swift-init proxy start
+
+This works because the healthcheck middleware looks for this file. If it
+finds the file, it returns a 503 error instead of 200/OK, and the load balancer
+should stop sending traffic to the proxy.
+
+``/healthcheck`` will report
+``FAIL: disabled by file`` if the ``disabled-by-file`` file exists.
+
+Procedure: Ad-Hoc disk performance test
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can get an idea of whether a disk drive is performing as expected as follows:
+
+.. code::
+
+   sudo dd bs=1M count=256 if=/dev/zero conv=fdatasync of=/srv/node/disk11/remember-to-delete-this-later
+
+You can expect ~600MB/sec. If you get a low number, repeat the test several times,
+as Swift itself may also be reading from or writing to the disk, which can give a
+lower number.
diff --git a/doc/source/ops_runbook/sec-furtherdiagnose.rst b/doc/source/ops_runbook/sec-furtherdiagnose.rst
new file mode 100644
index 0000000000..dd8154a3d9
--- /dev/null
+++ b/doc/source/ops_runbook/sec-furtherdiagnose.rst
@@ -0,0 +1,177 @@
+==============================
+Further issues and resolutions
+==============================
+
+.. note::
+
+   The urgency level in each **Action** column indicates whether you
+   need to take immediate action, or whether the problem can be worked
+   on during business hours.
+
+.. list-table::
+   :widths: 33 33 33
+   :header-rows: 1
+
+   * - **Scenario**
+     - **Description**
+     - **Action**
+   * - ``/healthcheck`` latency is high.
+     - The ``/healthcheck`` test does not tax the proxy very much so any drop in value is probably related to
+       network issues, rather than the proxies being very busy. A very slow proxy might impact the average
+       number, but it would need to be very slow to shift the number that much.
+     - Check networks. Run ``curl https://<ip-address>/healthcheck``, where ``<ip-address>`` is the IP address
+       of an individual proxy, to see if you can pinpoint a problem in the network.
+
+       Urgency: If there are other indications that your system is slow, you should treat
+       this as an urgent problem.
+   * - Swift process is not running.
+     - You can use ``swift-init`` to check whether the swift processes are running on any
+       given server.
+     - Run this command:
+
+       .. code::
+
+          sudo swift-init all start
+
+       Examine messages in the swift log files to see if there are any
+       error messages related to any of the swift processes since the time you
+       ran the ``swift-init`` command.
+
+       Take any corrective actions that seem necessary.
+
+       Urgency: If this only affects one server, and you have more than one,
+       identifying and fixing the problem can wait until business hours.
+       If this same problem affects many servers, then you need to take corrective
+       action immediately.
+   * - ntpd is not running.
+     - NTP is not running.
+     - Configure and start NTP.
+
+       Urgency: For proxy servers, this is vital.
+
+   * - Host clock is not synced to an NTP server.
+     - The node's time settings do not match the NTP server time.
+       This may take some time to sync after a reboot.
+     - Assuming NTP is configured and running, you have to wait until the times sync.
+   * - A swift process has hundreds to thousands of open file descriptors.
+     - May happen to any of the swift processes.
+       Known to have happened after an ``rsyslogd`` restart and where ``/tmp`` was hanging.
+
+     - Restart the swift processes on the affected node:
+
+       .. code::
+
+          % sudo swift-init all reload
+
+       Urgency:
+       If known performance problem: Immediate
+
+       If system seems fine: Medium
+   * - A swift process is not owned by the swift user.
+     - If the UID of the swift user has changed, then the processes might not be
+       owned by that UID.
+     - Urgency: If this only affects one server, and you have more than one,
+       identifying and fixing the problem can wait until business hours.
+       If this same problem affects many servers, then you need to take corrective
+       action immediately.
+   * - Object, account or container files not owned by swift.
+     - This typically happens if, during a reinstall or a re-image of a server, the UID
+       of the swift user was changed. The data files in the object, account and container
+       directories are owned by the original swift UID. As a result, the current swift
+       user does not own these files.
+     - Correct the UID of the swift user to reflect that of the original UID. An alternative
+       is to change the ownership of every file on all file systems. This alternative
+       is often impractical and will take considerable time.
+
+       Urgency: If this only affects one server, and you have more than one,
+       identifying and fixing the problem can wait until business hours.
+       If this same problem affects many servers, then you need to take corrective
+       action immediately.
+
+   * - A disk drive has a high IO wait or service time.
+     - If high IO wait times are seen for a single disk, then the disk drive is the problem.
+       If most/all devices are slow, the controller is probably the source of the problem.
+       The controller cache may also be misconfigured, which will cause similar long
+       wait or service times.
+     - As a first step, if your controllers have a cache, check that it is enabled and that its
+       battery/capacitor is working.
+
+       Second, reboot the server.
+       If the problem persists, file a DC ticket to have the drive or controller replaced.
+       See `Diagnose: Slow disk devices` on how to check the drive wait or service times.
+
+       Urgency: Medium
+   * - The network interface is not up.
+     - Use the ``ifconfig`` and ``ethtool`` commands to determine the network state.
+     - You can try restarting the interface. However, generally the interface
+       (or cable) is probably broken, especially if the interface is flapping.
+
+       Urgency: If this only affects one server, and you have more than one,
+       identifying and fixing the problem can wait until business hours.
+       If this same problem affects many servers, then you need to take corrective
+       action immediately.
+   * - Network interface card (NIC) is not operating at the expected speed.
+     - The NIC is running at a slower speed than its nominal rated speed.
+       For example, it is running at 100 Mb/s and the NIC is a 1 Gb/s NIC.
+     - 1. Try resetting the interface with:
+
+          .. code::
+
+             sudo ethtool -s eth0 speed 1000
+
+          ... and then run:
+
+          .. code::
+
+             sudo lshw -class
+
+          See if the speed goes to the expected value. Failing
+          that, check the hardware (NIC cable/switch port).
+
+       2. If the problem is persistent, consider shutting down the server (especially if a proxy)
+          until the problem is identified and resolved. If you leave this server
+          running it can have a large impact on overall performance.
+
+       Urgency: High
+   * - The interface RX/TX error count is non-zero.
+     - A value of 0 is typical, but counts of 1 or 2 do not indicate a problem.
+     - 1. For low numbers (for example, 1 or 2), you can simply ignore them. Numbers in the range
+          3-30 probably indicate that the error count has crept up slowly over a long time.
+          Consider rebooting the server to remove the report from the noise.
+
+          Typically, when a cable or interface is bad, the error count goes to 400+, so
+          it stands out. There may be other symptoms such as the interface going up and down or
+          not running at the correct speed. A server with a high error count should be watched.
+
+       2. If the error count continues to climb, consider taking the server down until
+          it can be properly investigated. In any case, a reboot should be done to clear
+          the error count.
+
+       Urgency: High, if the error count is increasing.
+
+   * - In a swift log you see a message that a process has not replicated in over 24 hours.
+     - The replicator has not successfully completed a run in the last 24 hours.
+       This indicates that the replicator has probably hung.
+     - Use ``swift-init`` to stop and then restart the replicator process.
+
+       Urgency: Low. However, if you recently added or replaced disk drives,
+       then you should treat this urgently.
+   * - Container Updater has not run in 4 hour(s).
+     - The service may appear to be running; however, it may be hung. Examine the swift
+       logs to see if there are any error messages relating to the container updater. This
+       may potentially explain why the container updater is not running.
+     - Urgency: Medium
+
+       This may have been triggered by a recent restart of the rsyslog daemon.
+       Restart the service with:
+
+       .. code::
+
+          sudo swift-init reload
+   * - Object replicator: Reports the remaining time and that time is more than 100 hours.
+     - During each replication cycle the object replicator writes a log message
+       reporting statistics about the current cycle. This includes an estimate for the
+       remaining time needed to replicate all objects. If this time is longer than
+       100 hours, there is a problem with the replication process.
+     - Urgency: Medium
+
+       Restart the service with:
+
+       .. code::
+
+          sudo swift-init object-replicator reload
+
+       Check that the remaining replication time is going down.
diff --git a/doc/source/ops_runbook/troubleshooting.rst b/doc/source/ops_runbook/troubleshooting.rst
new file mode 100644
index 0000000000..d097ce0673
--- /dev/null
+++ b/doc/source/ops_runbook/troubleshooting.rst
@@ -0,0 +1,264 @@
+====================
+Troubleshooting tips
+====================
+
+Diagnose: Customer complains they receive an HTTP status 500 when trying to browse containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This entry is prompted by a real customer issue and is focused exclusively on
+how that problem was identified.
+There are many reasons why an HTTP status of 500 could be returned. If
+there are no obvious problems with the swift object store, then it may
+be necessary to take a closer look at the user's transactions.
+After finding the user's swift account, you can
+search the swift proxy logs on each swift proxy server for
+transactions from this user. The Linux ``bzgrep`` command can be used to
+search all the proxy log files on a node, including the ``.bz2`` compressed
+files. For example:
+
+.. code::
+
+   $ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l -R ssh
+     -w .68.[4-11,132-139 4-11,132-139],.132.[4-11,132-139
+     4-11,132-139] 'sudo bzgrep -w AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log\*' | dshbak -c
+   .
+   .
+   \---------------\-
+   .132.6
+   \---------------\-
+   Feb 29 08:51:57 sw-aw2az2-proxy011 proxy-server .16.132
+   .66.8 29/Feb/2012/08/51/57 GET /v1.0/AUTH_redacted-4962-4692-98fb-52ddda82a5af
+   /%3Fformat%3Djson HTTP/1.0 404 - - _4f4d50c5e4b064d88bd7ab82 - - -
+   tx429fc3be354f434ab7f9c6c4206c1dc3 - 0.0130
+
+This shows a ``GET`` operation on the user's account.
+
+.. note::
+
+   The HTTP status returned is 404, not found, rather than 500 as reported by the user.
+
+Using the transaction ID, ``tx429fc3be354f434ab7f9c6c4206c1dc3``, you can
+search the swift object server log files for this transaction ID:
+
+.. code::
+
+   $ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l
+     -R ssh
+     -w .72.[4-67|4-67],.[4-67|4-67],.[4-67|4-67],.204.[4-131| 4-131]
+     'sudo bzgrep tx429fc3be354f434ab7f9c6c4206c1dc3 /var/log/swift/server.log*'
+     | dshbak -c
+   .
+   .
+ \---------------\- + .72.16 + \---------------\- + Feb 29 08:51:57 sw-aw2az1-object013 account-server .132.6 - - + + [29/Feb/2012:08:51:57 +0000|] "GET /disk9/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" + 404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" + + 0.0016 "" + \---------------\- + .31 + \---------------\- + Feb 29 08:51:57 node-az2-object060 account-server .132.6 - - + [29/Feb/2012:08:51:57 +0000|] "GET /disk6/198875/AUTH_redacted-4962- + 4692-98fb-52ddda82a5af" 404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0011 "" + \---------------\- + .204.70 + \---------------\- + + Feb 29 08:51:57 sw-aw2az3-object0067 account-server .132.6 - - + [29/Feb/2012:08:51:57 +0000|] "GET /disk6/198875/AUTH_redacted-4962- + 4692-98fb-52ddda82a5af" 404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0014 "" + +.. note:: + + The 3 GET operations to 3 different object servers that hold the 3 + replicas of this users account. Each ``GET`` returns a HTTP status of 404, + not found. + +Next, use the ``swift-get-nodes`` command to determine exactly where the +users account data is stored: + +.. code:: + + $ sudo swift-get-nodes /etc/swift/account.ring.gz AUTH_redacted-4962-4692-98fb-52ddda82a5af + Account AUTH_redacted-4962-4692-98fb-52ddda82a5af + Container None + Object None + + Partition 198875 + Hash 1846d99185f8a0edaf65cfbf37439696 + + Server:Port Device .31:6002 disk6 + Server:Port Device .204.70:6002 disk6 + Server:Port Device .72.16:6002 disk9 + Server:Port Device .204.64:6002 disk11 [Handoff] + Server:Port Device .26:6002 disk11 [Handoff] + Server:Port Device .72.27:6002 disk11 [Handoff] + + curl -I -XHEAD "`http://.31:6002/disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" + `_ + curl -I -XHEAD "`http://.204.70:6002/disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" + `_ + curl -I -XHEAD "`http://.72.16:6002/disk9/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" + `_ + curl -I -XHEAD "`http://.204.64:6002/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" + `_ # [Handoff] + curl -I -XHEAD "`http://.26:6002/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" + `_ # [Handoff] + curl -I -XHEAD "`http://.72.27:6002/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" + `_ # [Handoff] + + ssh .31 "ls \-lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" + ssh .204.70 "ls \-lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" + ssh .72.16 "ls \-lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" + ssh .204.64 "ls \-lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff] + ssh .26 "ls \-lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff] + ssh .72.27 "ls \-lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff] + +Check each of the primary servers, .31, .204.70 and .72.16, for +this users account. For example on .72.16: + +.. code:: + + $ ls \\-lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/ + total 1.0M + drwxrwxrwx 2 swift swift 98 2012-02-23 14:49 . + drwxrwxrwx 3 swift swift 45 2012-02-03 23:28 .. + -rw-\\-----\\- 1 swift swift 15K 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db + -rw-rw-rw- 1 swift swift 0 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db.pending + +So this users account db, an sqlite db is present. Use sqlite to +checkout the account: + +.. 
code::
+
+   $ sudo cp /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/1846d99185f8a0edaf65cfbf37439696.db /tmp
+   $ sudo sqlite3 /tmp/1846d99185f8a0edaf65cfbf37439696.db
+   sqlite> .mode line
+   sqlite> select * from account_stat;
+   account = AUTH_redacted-4962-4692-98fb-52ddda82a5af
+   created_at = 1328311738.42190
+   put_timestamp = 1330000873.61411
+   delete_timestamp = 1330001026.00514
+   container_count = 0
+   object_count = 0
+   bytes_used = 0
+   hash = eb7e5d0ea3544d9def940b19114e8b43
+   id = 2de8c8a8-cef9-4a94-a421-2f845802fe90
+   status = DELETED
+   status_changed_at = 1330001026.00514
+   metadata =
+
+.. note::
+
+   The status is ``DELETED``, so this account was deleted. This explains
+   why the GET operations are returning 404, not found. Check the account
+   delete date/time:
+
+   .. code::
+
+      $ python
+
+      >>> import time
+      >>> time.ctime(1330001026.00514)
+      'Thu Feb 23 12:43:46 2012'
+
+Next, try to find the ``DELETE`` operation for this account in the proxy
+server logs:
+
+.. code::
+
+   $ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l -R ssh -w .68.[4-11,132-139 4-11,132-
+     139],.132.[4-11,132-139|4-11,132-139] 'sudo bzgrep AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log\* | grep -w
+     DELETE |awk "{print \\$3,\\$10,\\$12}"' | dshbak -c
+   .
+   .
+   Feb 23 12:43:46 sw-aw2az2-proxy001 proxy-server 15.203.233.76 .66.7 23/Feb/2012/12/43/46 DELETE /v1.0/AUTH_redacted-4962-4692-98fb-
+   52ddda82a5af/ HTTP/1.0 204 - Apache-HttpClient/4.1.2%20%28java%201.5%29 _4f458ee4e4b02a869c3aad02 - - -
+
+   tx4471188b0b87406899973d297c55ab53 - 0.0086
+
+From this you can see the operation that resulted in the account being deleted.
+
+Procedure: Deleting objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Simple case - deleting small number of objects and containers
+-------------------------------------------------------------
+
+.. note::
+
+   ``swift-direct`` is specific to the Hewlett Packard Enterprise Helion Public Cloud.
+   Use ``swiftly`` as an alternative.
+
+.. note::
+
+   Object and container names are in UTF8. Swift direct accepts UTF8
+   directly, not URL-encoded UTF8 (the REST API expects UTF8 and then
+   URL-encoded). In practice, cutting and pasting foreign language strings into
+   a terminal window will produce the right result.
+
+   Hint: Use the ``head`` command before any destructive commands.
+
+To delete a small number of objects, log into any proxy node and proceed
+as follows:
+
+Examine the object in question:
+
+.. code::
+
+   $ sudo -u swift /opt/hp/swift/bin/swift-direct head 132345678912345 container_name obj_name
+
+If ``X-Object-Manifest`` or ``X-Static-Large-Object`` is set, then this is
+a manifest object and the segment objects may be in another container.
+
+If the ``X-Object-Manifest`` attribute is set, the object is a DLO and you
+need to find the names of the segment objects. For example,
+if ``X-Object-Manifest`` is ``container2/seg-blah``, list the contents
+of the container ``container2`` as follows:
+
+.. code::
+
+   $ sudo -u swift /opt/hp/swift/bin/swift-direct show 132345678912345 container2
+
+Pick out the objects whose names start with ``seg-blah``.
+Delete the segment objects as follows:
+
+.. code::
+
+   $ sudo -u swift /opt/hp/swift/bin/swift-direct delete 132345678912345 container2 seg-blah01
+   $ sudo -u swift /opt/hp/swift/bin/swift-direct delete 132345678912345 container2 seg-blah02
+   etc
+
+If ``X-Static-Large-Object`` is set, you need to read the manifest contents. Do this by:
+
+- Using ``swift-get-nodes`` to get the details of the object's location.
+- Changing the ``-X HEAD`` to ``-X GET`` and running ``curl`` against one copy
+  (see the sketch below).
+- This returns a JSON body listing the container and object names of the segments.
+- Deleting the objects as described above for DLO segments.
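+
+As an illustration only, the manifest fetch described in the list above might
+look like the following, where the URL is one of the placeholders printed by
+``swift-get-nodes`` for this object (drop the ``-I`` so that the response body
+is shown):
+
+.. code::
+
+   # Placeholder URL: substitute the server, port, device, partition, account,
+   # container and object reported by swift-get-nodes for this object.
+   $ curl -X GET \
+     "http://<object-server-ip>:<port>/<device>/<partition>/<account>/<container>/<object>" \
+     | python -m json.tool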
+
+Once the segments are deleted, you can delete the object using
+``swift-direct`` as described above.
+
+Finally, use ``swift-direct`` to delete the container.
+
+Procedure: Decommissioning swift nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If Swift nodes need to be decommissioned (for example, because they are being
+re-purposed), it is very important to follow these steps:
+
+#. In the case of object servers, follow the procedure for removing
+   the node from the rings (a verification sketch follows this list).
+#. In the case of swift proxy servers, have the network team remove
+   the node from the load balancers.
+#. Open a network ticket to have the node removed from network
+   firewalls.
+#. Make sure that you remove the ``/etc/swift`` directory and everything in it.
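+
+As a final check, you can confirm that the decommissioned node is no longer
+referenced by any ring. This is a sketch only, run from a node that holds the
+builder files (for example, a proxy node); ``<node-ip>`` is a placeholder for
+the decommissioned node's IP address:
+
+.. code::
+
+   # The node's IP address should no longer appear in any ring.
+   for ring in object container account; do
+       sudo swift-ring-builder /etc/swift/${ring}.builder | grep "<node-ip>"
+   done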