[HBase] Improve uniqueness for row in meter table

Currently, the row key in the meter table consists of the reversed
timestamp, the meter name and an md5 of user+resource+project for
purposes of uniqueness. However, the md5 of user+resource+project is
identical for all samples of a meter from the same user, resource and
project, so only the reversed timestamp ("rts") tells those rows apart.
In tests, or under high load, different samples of the same meter from
the same user, resource and project may share a timestamp, and their
row keys then collide.
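
A minimal Python 3 sketch of the collision described above. Only the key layout (`<counter_name>_<rts>_<md5>`) is taken from this commit. The reversed-timestamp formula (a large constant minus milliseconds since the epoch) and the helper names `reversed_ts` and `old_row_key` are assumptions for illustration, not the backend's real code. Unlike the Python 2 code in the diff, `hashlib.md5().update()` needs bytes here, hence the `.encode('utf-8')`.

```python
import datetime
import hashlib

# Assumption: rts = large constant minus milliseconds since the epoch,
# so newer samples get smaller values and sort first.
MAX_TS = 0x7fffffffffffffff
EPOCH = datetime.datetime(1970, 1, 1)


def reversed_ts(dt):
    """Hypothetical helper: reversed millisecond timestamp for a naive UTC datetime."""
    ms = int((dt - EPOCH).total_seconds() * 1000)
    return MAX_TS - ms


def old_row_key(sample):
    """Old scheme: meter name, reversed timestamp, md5(user+resource+project)."""
    m = hashlib.md5()
    m.update(("%s%s%s" % (sample['user_id'], sample['resource_id'],
                          sample['project_id'])).encode('utf-8'))
    return "%s_%d_%s" % (sample['counter_name'],
                         reversed_ts(sample['timestamp']), m.hexdigest())


ts = datetime.datetime(2014, 8, 1, 13, 30, 30)
base = {'counter_name': 'cpu_util', 'user_id': 'u1', 'resource_id': 'r1',
        'project_id': 'p1', 'timestamp': ts}
# Two distinct samples of the same meter that differ only in volume.
s1 = dict(base, counter_volume=10.0)
s2 = dict(base, counter_volume=42.0)

# Same meter, user, resource, project and timestamp: the keys are identical,
# so both samples map to the same HBase row.
print(old_row_key(s1) == old_row_key(s2))  # True
```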
This patch set puts the message signature in the row key instead of
the md5 of user+resource+project. Because the message signature is
computed from all fields of the sample, samples that differ in any
field get distinct row keys.
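
A sketch of why a signature over all fields separates such samples. `compute_signature` below is a hypothetical stand-in (HMAC-SHA256 over the sorted fields, skipping the signature itself); the real `message_signature` is produced by Ceilometer's publisher, and its exact algorithm is not part of this commit. `new_row_key` mirrors the new `"%s_%d_%s"` layout from the diff.

```python
import hashlib
import hmac


def compute_signature(sample, secret):
    """Hypothetical signer: HMAC-SHA256 over every field except the signature."""
    digest = hmac.new(secret.encode('utf-8'), digestmod=hashlib.sha256)
    # Sort by field name so the result does not depend on dict ordering.
    for name, value in sorted(sample.items()):
        if name == 'message_signature':
            continue
        digest.update(("%s=%s" % (name, value)).encode('utf-8'))
    return digest.hexdigest()


def new_row_key(sample, rts):
    """New scheme: meter name, reversed timestamp, message signature."""
    return "%s_%d_%s" % (sample['counter_name'], rts,
                         sample['message_signature'])


rts = 9223370629954945807  # any fixed value: both samples share the timestamp
base = {'counter_name': 'cpu_util', 'user_id': 'u1', 'resource_id': 'r1',
        'project_id': 'p1', 'timestamp': '2014-08-01T13:30:30'}
s1 = dict(base, counter_volume=10.0)
s2 = dict(base, counter_volume=42.0)
for s in (s1, s2):
    s['message_signature'] = compute_signature(s, 'secret')

# The volumes differ, so the signatures and therefore the row keys differ.
print(new_row_key(s1, rts) != new_row_key(s2, rts))  # True
```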

Change-Id: Ic767b257176e05c0b35664968e7f529f507eab11
Ilya Tyaptin 2014-08-01 17:30:30 +04:00
parent b6a597804f
commit bf1b9937f7


@@ -13,7 +13,6 @@
 """HBase storage backend
 """
 import datetime
-import hashlib
 import operator
 import os
 import time
@@ -60,8 +59,8 @@ class Connection(base.Connection):
 - meter (describes sample actually):
-- row-key: consists of reversed timestamp, meter and an md5 of
-  user+resource+project for purposes of uniqueness
+- row-key: consists of reversed timestamp, meter and a message signature
+  for purposes of uniqueness
 - Column Families:
 f: contains the following qualifiers:
@@ -250,13 +249,10 @@ class Connection(base.Connection):
 ts = int(time.mktime(data['timestamp'].timetuple()) * 1000)
 resource_table.put(data['resource_id'], resource, ts)
-# TODO(nprivalova): improve uniqueness
-# Rowkey consists of reversed timestamp, meter and an md5 of
-# user+resource+project for purposes of uniqueness
-m = hashlib.md5()
-m.update("%s%s%s" % (data['user_id'], data['resource_id'],
-                     data['project_id']))
-row = "%s_%d_%s" % (data['counter_name'], rts, m.hexdigest())
+# Rowkey consists of reversed timestamp, meter and a
+# message signature for purposes of uniqueness
+row = "%s_%d_%s" % (data['counter_name'], rts,
+                    data['message_signature'])
 record = hbase_utils.serialize_entry(
     data, **{'source': data['source'], 'rts': rts,
              'message': data, 'recorded_at': timeutils.utcnow()})