vmware-nsx/neutron
Kevin Benton 029e9f7c5a BSN: Optimistic locking strategy for consistency
Summary:
  Adds an optimistic locking strategy for the Big Switch
  server manager so that multiple Neutron servers communicating
  with the backend cannot obtain the consistency hash for use
  at the same time.

  The bsn-rest-call semaphore is removed because serialization
  is now provided by the new locking scheme.

  A new DB engine is added because the consistency hashes
  need a life-cycle with rollbacks and other DB operations
  that cannot impact or be impacted by database operations
  happening on the regular Neutron objects.

  Unit tests are included for each of the new branches
  introduced.

Problem Statement:
  Requests to the Big Switch controllers must contain the
  consistency hash value received from the previous update.
  Otherwise, an inconsistency error will be triggered which
  will force a synchronization. Essentially, a new backend
  call must be prevented from reading from the consistency
  hash table in the DB until the previous call has updated
  the table with the hash from the server response.

  This can be addressed by a semaphore around the rest_call
  function for the single server use case and by a table lock
  on the consistency table for multiple Neutron servers.
  However, both solutions are inadequate because a single
  Neutron server does not scale and a table lock is not
  supported by common SQL HA deployments (e.g. Galera).

  This issue was previously addressed by deploying servers
  in an active-standby configuration. However, that only
  prevented the problem for HTTP API calls. All Neutron
  servers would respond to RPC messages, some of which would
  result in a port update and possible backend call which
  would trigger a conflict if it happened at the same time
  as a backend call from another server. These unnecessary
  syncs are unsustainable as the topology increases beyond
  ~3k VMs.

  Any solution needs to be back-portable to Icehouse so new
  database tables, new requirements, etc. are all out of the
  question.

Solution:
  This patch stores the lock for the consistency hash as a part
  of the DB record. The guarantees the database offers around
  atomic insertion and constrained atomic updates provide the
  primitives necessary to ensure that only one process/thread
  can lock the record at once.

  The read_for_update method is modified to not return the hash
  in the database until an identifier is inserted into the
  current record or added as a new record. By using an UPDATE
  query with a WHERE clause restricting to the current state,
  only one of many concurrent callers to the DB will successfully
  update the rows. If a caller sees that it didn't update any
  rows, it will restart the process of trying to get the
  lock.
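
  The constrained UPDATE described above can be illustrated with a
  minimal sketch. It uses an in-memory SQLite database rather than
  the plugin's real DB engine; the table layout, the try_lock
  helper and the LOCKED_BY[...] marker format are assumptions made
  for illustration only.

```python
import sqlite3

# Autocommit connection to a throwaway in-memory database.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Assumed layout: one row per hash, no dedicated lock column.
conn.execute(
    "CREATE TABLE consistencyhashes (hash_id TEXT PRIMARY KEY, hash TEXT)")
conn.execute("INSERT INTO consistencyhashes VALUES ('1', 'abc')")


def try_lock(conn, marker, observed):
    """Write the lock marker only if the row still holds `observed`."""
    cur = conn.execute(
        "UPDATE consistencyhashes SET hash = ? "
        "WHERE hash_id = '1' AND hash = ?",
        (marker + observed, observed))
    # rowcount is 0 when another caller changed the row first.
    return cur.rowcount == 1


# Two callers both read 'abc' before either of them writes.
observed = "abc"
first = try_lock(conn, "LOCKED_BY[a]", observed)
second = try_lock(conn, "LOCKED_BY[b]", observed)
```

  The second caller matches no rows because the WHERE clause no
  longer describes the current state, so it knows it lost the race
  and must retry.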

  If a caller observes that the same ID has held the lock for
  more than 60 seconds, it will assume the holder has
  died and will attempt to take the lock. This is also done
  in a concurrency-safe UPDATE call, since many other callers
  may attempt to do the same thing. If it fails because the
  lock was taken by someone else, the process will start over.

  Some pseudo-code resembling the logic:
    read_current_lock
    if no_record:
      insert_lock
      sleep_and_retry if constraint_violation else return
    if current_is_locked and not timer_exceeded:
      sleep_and_retry
    if update_record_with_lock:
      return
    else:
      sleep_and_retry
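
  The pseudo-code above could be fleshed out roughly as follows.
  This is a sketch, not the code in this patch: it runs against
  SQLite, and the table name, the marker format, the put_hash
  helper and the choice to store the marker as a prefix of the
  hash column are all assumptions. The 60 second limit comes from
  the description above.

```python
import sqlite3
import time
import uuid

LOCK_PREFIX = "LOCKED_BY["  # hypothetical marker format
MAX_LOCK_WAIT = 60          # seconds before a holder is presumed dead


def read_for_update(conn, hash_id="1", retry_sleep=0.25,
                    max_wait=MAX_LOCK_WAIT):
    """Return (marker, hash) once this caller holds the lock."""
    marker = LOCK_PREFIX + uuid.uuid4().hex + "]"
    seen_holder, seen_since = None, None
    while True:
        row = conn.execute(
            "SELECT hash FROM consistencyhashes WHERE hash_id = ?",
            (hash_id,)).fetchone()
        if row is None:
            try:
                # Atomic insert: the primary key lets only one caller win.
                conn.execute(
                    "INSERT INTO consistencyhashes (hash_id, hash) "
                    "VALUES (?, ?)", (hash_id, marker))
                return marker, ""
            except sqlite3.IntegrityError:
                time.sleep(retry_sleep)
                continue
        current = row[0]
        new_value = marker + current
        if current.startswith(LOCK_PREFIX):
            holder, _, bare_hash = current.partition("]")
            if holder != seen_holder:
                # New holder observed: restart this caller's timer.
                seen_holder, seen_since = holder, time.monotonic()
            if time.monotonic() - seen_since < max_wait:
                time.sleep(retry_sleep)
                continue
            # Holder presumed dead: replace its marker with ours.
            new_value = marker + bare_hash
        cur = conn.execute(
            "UPDATE consistencyhashes SET hash = ? "
            "WHERE hash_id = ? AND hash = ?",
            (new_value, hash_id, current))
        if cur.rowcount == 1:
            return marker, new_value[len(marker):]
        time.sleep(retry_sleep)  # lost the race, start over


def put_hash(conn, marker, new_hash, hash_id="1"):
    """Store the new hash, which releases the lock if still held."""
    cur = conn.execute(
        "UPDATE consistencyhashes SET hash = ? "
        "WHERE hash_id = ? AND substr(hash, 1, ?) = ?",
        (new_hash, hash_id, len(marker), marker))
    return cur.rowcount == 1
```

  A caller would invoke read_for_update before the backend call,
  send the returned hash with the request, and pass the hash from
  the server response to put_hash. A False result from put_hash
  means another caller took over the lock in the meantime.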

Closes-Bug: #1374261
Change-Id: Ifa5a7c9749952bc2785a9bf3fed69ad55bf21acc
2014-11-19 05:43:18 +00:00
agent Drop RpcProxy usage from MetadataPluginApi 2014-11-18 16:57:46 +00:00
api switch to oslo.serialization 2014-11-14 09:28:12 +00:00
cmd Update i18n translation for neutron.cmd log msg's 2014-11-15 00:14:42 -08:00
common Merge "Drop neutron.common.rpc.MessagingTimeout" 2014-11-14 22:03:44 +00:00
db Merge "Update i18n translation for neutron.db log msg's" 2014-11-18 20:29:57 +00:00
debug Purge use of "PRED and A or B" poor-mans-ternary 2014-11-08 00:17:12 +11:00
extensions Update i18n translation for neutron.extension log msg's 2014-11-15 00:42:43 -08:00
hacking Update i18n translation for neutron.extension log msg's 2014-11-15 00:42:43 -08:00
locale Imported Translations from Transifex 2014-11-09 06:08:09 +00:00
notifiers fix event_send for re-assign floating ip 2014-11-10 18:20:10 -08:00
openstack switch to oslo.serialization 2014-11-14 09:28:12 +00:00
plugins BSN: Optimistic locking strategy for consistency 2014-11-19 05:43:18 +00:00
scheduler Empty files should not contain copyright or license 2014-10-20 00:50:32 +00:00
server Configure agents using neutron.common.config.init (formerly .parse) 2014-06-17 21:56:24 +02:00
services Merge "enable F812 check for flake8" 2014-11-18 00:56:20 +00:00
tests BSN: Optimistic locking strategy for consistency 2014-11-19 05:43:18 +00:00
__init__.py Remove the useless vim modelines 2014-06-21 15:07:31 +08:00
auth.py add auth token to context 2014-08-12 11:17:21 +09:00
context.py Add advsvc role to neutron policy file 2014-10-27 12:49:27 +00:00
hooks.py Remove the useless vim modelines 2014-06-21 15:07:31 +08:00
manager.py Moved rpc_compat.py code back into rpc.py 2014-06-24 10:35:39 +02:00
neutron_plugin_base_v2.py Throw exception instances instead of classes 2014-09-07 12:56:30 +04:00
policy.py Decrease policy logging verbosity 2014-11-14 21:10:04 +04:00
quota.py Remove the useless vim modelines 2014-06-21 15:07:31 +08:00
service.py Use stop() method on MessageHandlingServer 2014-10-29 10:02:10 +01:00
version.py Remove the useless vim modelines 2014-06-21 15:07:31 +08:00
wsgi.py switch to oslo.serialization 2014-11-14 09:28:12 +00:00