52a4fe37aa
* Get rid of a bunch of accidental blockquote formatting * Always declare a lexer to use for ``.. code::`` blocks Change-Id: I8940e75b094843e542e815dde6b6be4740751813
260 lines
11 KiB
ReStructuredText
260 lines
11 KiB
ReStructuredText
====================
|
|
Troubleshooting tips
|
|
====================
|
|
|
|
Diagnose: Customer complains they receive a HTTP status 500 when trying to browse containers
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
This entry is prompted by a real customer issue and exclusively focused on how
|
|
that problem was identified.
|
|
There are many reasons why a http status of 500 could be returned. If
|
|
there are no obvious problems with the swift object store, then it may
|
|
be necessary to take a closer look at the users transactions.
|
|
After finding the users swift account, you can
|
|
search the swift proxy logs on each swift proxy server for
|
|
transactions from this user. The linux ``bzgrep`` command can be used to
|
|
search all the proxy log files on a node including the ``.bz2`` compressed
|
|
files. For example:
|
|
|
|
.. code:: console
|
|
|
|
$ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
|
|
-w <redacted>.68.[4-11,132-139 4-11,132-139],<redacted>.132.[4-11,132-139] \
|
|
'sudo bzgrep -w AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log*' | dshbak -c
|
|
.
|
|
.
|
|
----------------
|
|
<redacted>.132.6
|
|
----------------
|
|
Feb 29 08:51:57 sw-aw2az2-proxy011 proxy-server <redacted>.16.132
|
|
<redacted>.66.8 29/Feb/2012/08/51/57 GET /v1.0/AUTH_redacted-4962-4692-98fb-52ddda82a5af
|
|
/%3Fformat%3Djson HTTP/1.0 404 - - <REDACTED>_4f4d50c5e4b064d88bd7ab82 - - -
|
|
tx429fc3be354f434ab7f9c6c4206c1dc3 - 0.0130
|
|
|
|
This shows a ``GET`` operation on the users account.
|
|
|
|
.. note::
|
|
|
|
The HTTP status returned is 404, Not found, rather than 500 as reported by the user.
|
|
|
|
Using the transaction ID, ``tx429fc3be354f434ab7f9c6c4206c1dc3`` you can
|
|
search the swift object servers log files for this transaction ID:
|
|
|
|
.. code:: console
|
|
|
|
$ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
|
|
-w <redacted>.72.[4-67|4-67],<redacted>.[4-67|4-67],<redacted>.[4-67|4-67],<redacted>.204.[4-131] \
|
|
'sudo bzgrep tx429fc3be354f434ab7f9c6c4206c1dc3 /var/log/swift/server.log*' | dshbak -c
|
|
.
|
|
.
|
|
----------------
|
|
<redacted>.72.16
|
|
----------------
|
|
Feb 29 08:51:57 sw-aw2az1-object013 account-server <redacted>.132.6 - -
|
|
|
|
[29/Feb/2012:08:51:57 +0000|] "GET /disk9/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
|
|
404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-"
|
|
|
|
0.0016 ""
|
|
----------------
|
|
<redacted>.31
|
|
----------------
|
|
Feb 29 08:51:57 node-az2-object060 account-server <redacted>.132.6 - -
|
|
[29/Feb/2012:08:51:57 +0000|] "GET /disk6/198875/AUTH_redacted-4962-
|
|
4692-98fb-52ddda82a5af" 404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0011 ""
|
|
----------------
|
|
<redacted>.204.70
|
|
----------------
|
|
|
|
Feb 29 08:51:57 sw-aw2az3-object0067 account-server <redacted>.132.6 - -
|
|
[29/Feb/2012:08:51:57 +0000|] "GET /disk6/198875/AUTH_redacted-4962-
|
|
4692-98fb-52ddda82a5af" 404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0014 ""
|
|
|
|
.. note::
|
|
|
|
The 3 GET operations to 3 different object servers that hold the 3
|
|
replicas of this users account. Each ``GET`` returns a HTTP status of 404,
|
|
Not found.
|
|
|
|
Next, use the ``swift-get-nodes`` command to determine exactly where the
|
|
user's account data is stored:
|
|
|
|
.. code:: console
|
|
|
|
$ sudo swift-get-nodes /etc/swift/account.ring.gz AUTH_redacted-4962-4692-98fb-52ddda82a5af
|
|
Account AUTH_redacted-4962-4692-98fb-52ddda82a5af
|
|
Container None
|
|
Object None
|
|
|
|
Partition 198875
|
|
Hash 1846d99185f8a0edaf65cfbf37439696
|
|
|
|
Server:Port Device <redacted>.31:6202 disk6
|
|
Server:Port Device <redacted>.204.70:6202 disk6
|
|
Server:Port Device <redacted>.72.16:6202 disk9
|
|
Server:Port Device <redacted>.204.64:6202 disk11 [Handoff]
|
|
Server:Port Device <redacted>.26:6202 disk11 [Handoff]
|
|
Server:Port Device <redacted>.72.27:6202 disk11 [Handoff]
|
|
|
|
curl -I -XHEAD "`http://<redacted>.31:6202/disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
|
|
<http://15.185.138.31:6202/disk6/198875/AUTH_db0050ad-4962-4692-98fb-52ddda82a5af>`_
|
|
curl -I -XHEAD "`http://<redacted>.204.70:6202/disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
|
|
<http://15.185.204.70:6202/disk6/198875/AUTH_db0050ad-4962-4692-98fb-52ddda82a5af>`_
|
|
curl -I -XHEAD "`http://<redacted>.72.16:6202/disk9/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
|
|
<http://15.185.72.16:6202/disk9/198875/AUTH_db0050ad-4962-4692-98fb-52ddda82a5af>`_
|
|
curl -I -XHEAD "`http://<redacted>.204.64:6202/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
|
|
<http://15.185.204.64:6202/disk11/198875/AUTH_db0050ad-4962-4692-98fb-52ddda82a5af>`_ # [Handoff]
|
|
curl -I -XHEAD "`http://<redacted>.26:6202/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
|
|
<http://15.185.136.26:6202/disk11/198875/AUTH_db0050ad-4962-4692-98fb-52ddda82a5af>`_ # [Handoff]
|
|
curl -I -XHEAD "`http://<redacted>.72.27:6202/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
|
|
<http://15.185.72.27:6202/disk11/198875/AUTH_db0050ad-4962-4692-98fb-52ddda82a5af>`_ # [Handoff]
|
|
|
|
ssh <redacted>.31 "ls -lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
|
|
ssh <redacted>.204.70 "ls -lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
|
|
ssh <redacted>.72.16 "ls -lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
|
|
ssh <redacted>.204.64 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
|
|
ssh <redacted>.26 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
|
|
ssh <redacted>.72.27 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
|
|
|
|
Check each of the primary servers, <redacted>.31, <redacted>.204.70 and <redacted>.72.16, for
|
|
this users account. For example on <redacted>.72.16:
|
|
|
|
.. code:: console
|
|
|
|
$ ls -lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/
|
|
total 1.0M
|
|
drwxrwxrwx 2 swift swift 98 2012-02-23 14:49 .
|
|
drwxrwxrwx 3 swift swift 45 2012-02-03 23:28 ..
|
|
-rw------- 1 swift swift 15K 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db
|
|
-rw-rw-rw- 1 swift swift 0 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db.pending
|
|
|
|
So this users account db, an sqlite db is present. Use sqlite to
|
|
checkout the account:
|
|
|
|
.. code:: console
|
|
|
|
$ sudo cp /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/1846d99185f8a0edaf65cfbf37439696.db /tmp
|
|
$ sudo sqlite3 /tmp/1846d99185f8a0edaf65cfbf37439696.db
|
|
sqlite> .mode line
|
|
sqlite> select * from account_stat;
|
|
account = AUTH_redacted-4962-4692-98fb-52ddda82a5af
|
|
created_at = 1328311738.42190
|
|
put_timestamp = 1330000873.61411
|
|
delete_timestamp = 1330001026.00514
|
|
container_count = 0
|
|
object_count = 0
|
|
bytes_used = 0
|
|
hash = eb7e5d0ea3544d9def940b19114e8b43
|
|
id = 2de8c8a8-cef9-4a94-a421-2f845802fe90
|
|
status = DELETED
|
|
status_changed_at = 1330001026.00514
|
|
metadata =
|
|
|
|
.. note:
|
|
|
|
The status is ``DELETED``. So this account was deleted. This explains
|
|
why the GET operations are returning 404, not found. Check the account
|
|
delete date/time:
|
|
|
|
.. code:: console
|
|
|
|
$ python
|
|
|
|
>>> import time
|
|
>>> time.ctime(1330001026.00514)
|
|
'Thu Feb 23 12:43:46 2012'
|
|
|
|
Next try and find the ``DELETE`` operation for this account in the proxy
|
|
server logs:
|
|
|
|
.. code:: console
|
|
|
|
$ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
|
|
-w <redacted>.68.[4-11,132-139 4-11,132-139],<redacted>.132.[4-11,132-139|4-11,132-139] \
|
|
'sudo bzgrep AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log* \
|
|
| grep -w DELETE | awk "{print $3,$10,$12}"' |- dshbak -c
|
|
.
|
|
.
|
|
Feb 23 12:43:46 sw-aw2az2-proxy001 proxy-server <redacted> <redacted>.66.7 23/Feb/2012/12/43/46 DELETE /v1.0/AUTH_redacted-4962-4692-98fb-
|
|
52ddda82a5af/ HTTP/1.0 204 - Apache-HttpClient/4.1.2%20%28java%201.5%29 <REDACTED>_4f458ee4e4b02a869c3aad02 - - -
|
|
tx4471188b0b87406899973d297c55ab53 - 0.0086
|
|
|
|
From this you can see the operation that resulted in the account being deleted.
|
|
|
|
Procedure: Deleting objects
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Simple case - deleting small number of objects and containers
|
|
-------------------------------------------------------------
|
|
|
|
.. note::
|
|
|
|
``swift-direct`` is specific to the Hewlett Packard Enterprise Helion Public Cloud.
|
|
Use ``swiftly`` as an alternative.
|
|
|
|
.. note::
|
|
|
|
Object and container names are in UTF8. Swift direct accepts UTF8
|
|
directly, not URL-encoded UTF8 (the REST API expects UTF8 and then
|
|
URL-encoded). In practice cut and paste of foreign language strings to
|
|
a terminal window will produce the right result.
|
|
|
|
Hint: Use the ``head`` command before any destructive commands.
|
|
|
|
To delete a small number of objects, log into any proxy node and proceed
|
|
as follows:
|
|
|
|
Examine the object in question:
|
|
|
|
.. code:: console
|
|
|
|
$ sudo -u swift /opt/hp/swift/bin/swift-direct head 132345678912345 container_name obj_name
|
|
|
|
See if ``X-Object-Manifest`` or ``X-Static-Large-Object`` is set,
|
|
then this is the manifest object and segment objects may be in another
|
|
container.
|
|
|
|
If the ``X-Object-Manifest`` attribute is set, you need to find the
|
|
name of the objects this means it is a DLO. For example,
|
|
if ``X-Object-Manifest`` is ``container2/seg-blah``, list the contents
|
|
of the container container2 as follows:
|
|
|
|
.. code:: console
|
|
|
|
$ sudo -u swift /opt/hp/swift/bin/swift-direct show 132345678912345 container2
|
|
|
|
Pick out the objects whose names start with ``seg-blah``.
|
|
Delete the segment objects as follows:
|
|
|
|
.. code:: console
|
|
|
|
$ sudo -u swift /opt/hp/swift/bin/swift-direct delete 132345678912345 container2 seg-blah01
|
|
$ sudo -u swift /opt/hp/swift/bin/swift-direct delete 132345678912345 container2 seg-blah02
|
|
etc
|
|
|
|
If ``X-Static-Large-Object`` is set, you need to read the contents. Do this by:
|
|
|
|
- Using swift-get-nodes to get the details of the object's location.
|
|
- Change the ``-X HEAD`` to ``-X GET`` and run ``curl`` against one copy.
|
|
- This lists a JSON body listing containers and object names
|
|
- Delete the objects as described above for DLO segments
|
|
|
|
Once the segments are deleted, you can delete the object using
|
|
``swift-direct`` as described above.
|
|
|
|
Finally, use ``swift-direct`` to delete the container.
|
|
|
|
Procedure: Decommissioning swift nodes
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Should Swift nodes need to be decommissioned (e.g.,, where they are being
|
|
re-purposed), it is very important to follow the following steps.
|
|
|
|
#. In the case of object servers, follow the procedure for removing
|
|
the node from the rings.
|
|
#. In the case of swift proxy servers, have the network team remove
|
|
the node from the load balancers.
|
|
#. Open a network ticket to have the node removed from network
|
|
firewalls.
|
|
#. Make sure that you remove the ``/etc/swift`` directory and everything in it.
|