Improve path handling in proxy_logging

In the access log, proxy_logging places the URL path,
double-URL-encoded (e.g. " " -> %20 -> %2520).

In the WSGI environment, env['PATH_INFO'] is the unquoted path sent by
the client, e.g. "/v1/a/c/o" or "/v1/a/c/obj with spaces" or
"/v1/a/c/u-\xf0\x9f\xa4\x94". On the request, we have the "path"
attribute, which returns the quoted form of env["PATH_INFO"].

Note that, under Python 2, these are always str objects, not unicode.

Swift used to log this:

    quote(quote(unquote(
        req.path.decode('utf-8', 'replace').encode('utf-8')), "/:"))

There's a lot going on here. Let's look at the innermost expression:

    req.path.decode('utf-8', 'replace').encode('utf-8')

This appears to be trying to replace invalid UTF-8 sequences with the
Unicode replacement character. However, since we start from
"req.path", which quotes its input, we don't have any non-ASCII bytes,
so the decode/encode is a no-op. That is, if env["PATH_INFO"] had
"\xff\xff" in it, req.path would have "%FF%FF", so there's no invalid
byte sequences to escape.

So, without changing the output, we can simplify the above code to:

    quote(quote(unquote(req.path), "/:"))

The second call to quote() has a second argument of "/:", so neither
slashes or colons are quoted. This was introduced in commit cd8af3f8f1
as part of a change to log client IPv6 addresses (from
env["REMOTE_ADDR") without escaping the colons. However, cd8af3f8f1
also applied that to the logging of env["PATH_INFO"]. This seems
unhelpful since IPv6 addresses in URL paths are not particularly
common, so I've removed it. The change is backwards-compatible since a
double-unquote of the logged value will still return the original
PATH_INFO.

With that in mind, let's simplify further. req.path quotes
env["PATH_INFO"], so that cancels out one unquote() call, leaving us
with this:

    quote(req.path_info)

which produces basically the same result as before, but faster.

Change-Id: I7e665eb55a9524aa49e2578f2084d8354f6c1e31
This commit is contained in:
Samuel Merritt 2018-04-20 16:36:19 -07:00
parent 57a4545e51
commit f8419bf68a

View File

@ -75,10 +75,10 @@ import sys
import time import time
import six import six
from six.moves.urllib.parse import quote, unquote from six.moves.urllib.parse import quote
from swift.common.swob import Request from swift.common.swob import Request
from swift.common.utils import (get_logger, get_remote_client, from swift.common.utils import (get_logger, get_remote_client,
get_valid_utf8_str, config_true_value, config_true_value,
InputProxy, list_from_csv, get_policy_index) InputProxy, list_from_csv, get_policy_index)
from swift.common.storage_policy import POLICIES from swift.common.storage_policy import POLICIES
@ -152,8 +152,7 @@ class ProxyLoggingMiddleware(object):
:param resp_headers: dict of the response headers :param resp_headers: dict of the response headers
""" """
resp_headers = resp_headers or {} resp_headers = resp_headers or {}
req_path = get_valid_utf8_str(req.path) the_request = req.path
the_request = quote(unquote(req_path), QUOTE_SAFE)
if req.query_string: if req.query_string:
the_request = the_request + '?' + req.query_string the_request = the_request + '?' + req.query_string
logged_headers = None logged_headers = None