029e9f7c5a
Summary: Adds an optimistic locking strategy for the Big Switch server manager
so multiple Neutron servers communicating with the backend do not use the
consistency hash simultaneously. The bsn-rest-call semaphore is removed
because serialization is now provided by the new locking scheme. A new DB
engine is added because the consistency hashes need a life-cycle with
rollbacks and other DB operations that cannot impact, or be impacted by,
database operations happening on the regular Neutron objects. Unit tests are
included for each of the new branches introduced.

Problem Statement:
Requests to the Big Switch controllers must contain the consistency hash value
received from the previous update; otherwise, an inconsistency error will be
triggered, forcing a synchronization. Essentially, a new backend call must be
prevented from reading the consistency hash table in the DB until the previous
call has updated the table with the hash from the server response.

This can be addressed by a semaphore around the rest_call function in the
single-server case, and by a table lock on the consistency table for multiple
Neutron servers. However, both solutions are inadequate: a single Neutron
server does not scale, and a table lock is not supported by common SQL HA
deployments (e.g. Galera).

This issue was previously addressed by deploying servers in an active-standby
configuration. However, that only prevented the problem for HTTP API calls.
All Neutron servers would respond to RPC messages, some of which would result
in a port update and a possible backend call, which would trigger a conflict
if it happened at the same time as a backend call from another server. These
unnecessary syncs are unsustainable as the topology grows beyond ~3k VMs.

Any solution needs to be back-portable to Icehouse, so new database tables,
new requirements, etc. are all out of the question.

Solution:
This patch stores the lock for the consistency hash as part of the DB record.
The guarantees the database offers around atomic insertion and constrained
atomic updates provide the primitives necessary to ensure that only one
process/thread can lock the record at once.

The read_for_update method is modified to not return the hash in the database
until an identifier is inserted into the current record or added as a new
record. By using an UPDATE query with a WHERE clause restricting to the
current state, only one of many concurrent callers to the DB will successfully
update the rows. If a caller sees that it didn't update any rows, it will
start the lock acquisition process over.

If a caller observes that the same ID has held the lock for more than 60
seconds, it will assume the holder has died and will attempt to take the
lock. This is also done in a concurrency-safe UPDATE call, since many other
callers may attempt to do the same thing. If it fails because the lock was
taken by someone else, the process starts over.

Some pseudo-code resembling the logic:

    read_current_lock
    if no_record:
        insert_lock
        sleep_and_retry if constraint_violation else return
    if current_is_locked and not timer_exceeded:
        sleep_and_retry
    if update_record_with_lock:
        return
    else:
        sleep_and_retry

Closes-Bug: #1374261
Change-Id: Ifa5a7c9749952bc2785a9bf3fed69ad55bf21acc
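The scheme above can be sketched in SQLAlchemy. This is a minimal,
self-contained illustration, not the patch's actual code: the ConsistencyHash
model, the LOCKED_BY[...] marker format, the retry interval, and the put_hash
helper are all illustrative assumptions.

    import time
    import uuid

    import sqlalchemy as sa
    from sqlalchemy.exc import IntegrityError
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()
    LOCK_PREFIX = 'LOCKED_BY['   # marker prepended to the hash while locked
    LOCK_TIMEOUT = 60            # seconds before a holder is presumed dead
    RETRY_INTERVAL = 0.25        # seconds to sleep before retrying


    class ConsistencyHash(Base):
        __tablename__ = 'consistencyhashes'
        hash_id = sa.Column(sa.String(255), primary_key=True)
        hash = sa.Column(sa.String(255), nullable=False)


    def _strip_lock(stored):
        """Recover the underlying hash from a possibly locked stored value."""
        if stored.startswith(LOCK_PREFIX):
            return stored.split(']', 1)[1]
        return stored


    def read_for_update(session, hash_id='1'):
        """Block until this caller holds the lock, then return the hash."""
        lock_marker = '%s%s]' % (LOCK_PREFIX, uuid.uuid4().hex)
        observed, observed_at = None, None
        while True:
            rec = session.query(ConsistencyHash).filter_by(
                hash_id=hash_id).first()
            if rec is None:
                # No record yet: an atomic INSERT is the lock primitive. A
                # concurrent inserter hits the primary-key constraint,
                # rolls back, and retries.
                try:
                    session.add(ConsistencyHash(hash_id=hash_id,
                                                hash=lock_marker))
                    session.commit()
                    return ''  # nothing to send on the first backend call
                except IntegrityError:
                    session.rollback()
                    time.sleep(RETRY_INTERVAL)
                    continue
            # Capture the value now; commits below expire the ORM instance.
            current = rec.hash
            if current.startswith(LOCK_PREFIX):
                # Track how long this exact lock value has been in place.
                if current != observed:
                    observed, observed_at = current, time.time()
                if time.time() - observed_at < LOCK_TIMEOUT:
                    # End the read transaction so the next query sees
                    # fresh data, then wait for the holder to finish.
                    session.rollback()
                    time.sleep(RETRY_INTERVAL)
                    continue
                # Holder presumed dead; fall through and try to take over.
            # Conditional UPDATE: the WHERE clause pins the exact current
            # value, so only one of many concurrent callers can match and
            # update the row.
            result = session.execute(
                ConsistencyHash.__table__.update()
                .where(ConsistencyHash.hash_id == hash_id)
                .where(ConsistencyHash.hash == current)
                .values(hash=lock_marker + _strip_lock(current)))
            session.commit()
            if result.rowcount == 1:
                return _strip_lock(current)
            # Another caller won the race; start over.
            time.sleep(RETRY_INTERVAL)


    def put_hash(session, new_hash, hash_id='1'):
        """Store the hash from the backend response, releasing the lock."""
        session.execute(
            ConsistencyHash.__table__.update()
            .where(ConsistencyHash.hash_id == hash_id)
            .values(hash=new_hash))
        session.commit()

Note that nothing here relies on table locks or SELECT ... FOR UPDATE, only
on atomic single-row INSERT and UPDATE, which is why the approach works on
HA deployments such as Galera where table locks are unavailable.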
__init__.py
consistency_db.py
porttracker_db.py