Version Stamp : Technique of conflict resolution — 8


NoSQL has faced criticism for not supporting transactions and ACID, hence it’s believed to be unable to meet consistency requirements. However, this is not entirely correct because, as we saw in previous articles, NoSQL does support atomic updates at the aggregate level. But, having said that, transactions are still necessary, which is why modern solutions are polyglot, and architects carefully choose databases for specific requirements; the one-size-fits-all era ended long ago.

Transactions do have limitations; even within transactional systems, human interventions are sometimes required to resolve conflicts, and holding a single session for a long transaction is not possible. Version stamps are a good option to handle such situations.

A transaction cannot stay open while a client or user is making edits; it is usually opened only for a short window when the updates are committed. For the NoSQL environment, an optimistic offline lock is a useful technique for this situation. It works by having the database keep a version stamp on each record. Version stamps are commonly used in NoSQL databases and other distributed systems. They help maintain data consistency and prevent conflicting modifications.

The version stamp changes every time the record is updated. When a record is read, the client notes its version stamp; when the client writes, it checks that the stamp has not changed in the meantime, and the stamp is updated as part of the write.

A version stamp is associated with each data item, such as a record or document, in the database. It helps determine whether an item has been updated since it was last read. Whenever an item is updated (modified, inserted, or deleted), its version stamp is also updated. For example, when a user edits a record, the version stamp changes to reflect the new state of that record. Version stamps play a crucial role in concurrency control. When multiple processes or users access the same data concurrently, version stamps allow them to detect conflicts. Before updating an item, a process can check its version stamp to see if it has changed since it was last read. If the version stamp differs, it indicates that another update occurred in the meantime, and conflict resolution strategies can be applied.
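The read-check-write cycle described above can be sketched in a few lines. This is a minimal illustration, assuming a toy in-memory store with a counter as the version stamp; `VersionedStore`, `StaleVersionError`, and the `cart:42` key are hypothetical names invented for the example, not the API of any particular database.

```python
class StaleVersionError(Exception):
    """Raised when the record changed since the caller last read it."""


class VersionedStore:
    """Toy in-memory store (hypothetical); each key maps to (version, value)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Return the version stamp seen at read time together with the value.
        return self._data.get(key, (0, None))

    def write(self, key, new_value, expected_version):
        # Refuse the write if someone else updated the record in the meantime.
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise StaleVersionError(
                f"{key}: expected version {expected_version}, found {current_version}"
            )
        self._data[key] = (current_version + 1, new_value)
        return current_version + 1


store = VersionedStore()
store.write("cart:42", {"items": 1}, expected_version=0)

# Two clients read the same version of the record.
version_a, _ = store.read("cart:42")
version_b, _ = store.read("cart:42")

# The first write succeeds and bumps the version stamp.
store.write("cart:42", {"items": 2}, expected_version=version_a)

# The second write is based on a stale read and is rejected.
try:
    store.write("cart:42", {"items": 5}, expected_version=version_b)
    conflict = None
except StaleVersionError as err:
    conflict = str(err)
```

What happens after the rejection (retry, merge, or ask the user) is a separate decision; the stamp only detects that a conflict exists.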

There are various ways to maintain this version stamp; let’s explore some of them below:

  1. Counter: A counter is incremented each time a record is updated. Counters make it very easy to determine which version of the record is the latest. The downside is that the server needs to generate the counter, and a single primary is required to ensure that counter values are not duplicated.

  2. GUID: GUIDs are large random numbers that are, for practical purposes, unique. They can be built from a combination of dates, hardware info, and other sources of randomness. The advantage of GUIDs is that anyone can create them, and their randomness makes collisions extremely unlikely. The downside is that they are large and carry no ordering, so two GUIDs cannot be compared to find the most recent record.

  3. Hash: With a big enough hash key size, a hash of the record's content (such as SHA-1 or MD5) can be globally unique. The advantage is that any node will create the same hash for the same content, so identical content never gets two different stamps. However, like GUIDs, hashes are large and cannot be compared to find the most recent version.

  4. Timestamp: The timestamp of the last update is used as the stamp. Like counters, timestamps are short and easy to compare for recentness. There is no need for a single primary node to generate them; multiple nodes can create timestamps, but their clocks need to be in sync for this to work properly. One node with an out-of-sync clock can create bad data. Duplicates are also possible if the timestamp resolution is too coarse: millisecond timestamps, for example, are not good enough if several updates can happen within the same millisecond.

We can use a combination of these techniques. For example, a counter and hash key can be combined to find unique and recent records. Remember that the choice of version stamp technique depends on factors like system requirements, scalability, and consistency needs. Each method has its trade-offs, so choose wisely!
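As a rough sketch of the four stamp types and of the counter-plus-hash combination, the helpers below use only the Python standard library. All function names are hypothetical and chosen for this example. The hash uses SHA-256 rather than the SHA-1 or MD5 mentioned above; that is a substitution made here, and the idea is the same.

```python
import hashlib
import json
import time
import uuid


def counter_stamp(previous: int) -> int:
    # Easy to compare, but something must own the counter (a single primary).
    return previous + 1


def guid_stamp() -> str:
    # Any node can generate one, but two GUIDs say nothing about ordering.
    return str(uuid.uuid4())


def hash_stamp(content: dict) -> str:
    # Serialize with sorted keys so every node hashes the same bytes.
    canonical = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def timestamp_stamp() -> int:
    # Nanoseconds since the epoch; still relies on synchronized clocks.
    return time.time_ns()


def combined_stamp(previous_counter: int, content: dict) -> str:
    # Counter gives recentness, hash distinguishes different content
    # written under the same counter value on different nodes.
    return f"{counter_stamp(previous_counter)}-{hash_stamp(content)}"


def counter_of(stamp: str) -> int:
    # Extract the counter part of a combined stamp for comparison.
    return int(stamp.split("-", 1)[0])


old = combined_stamp(3, {"name": "Ann", "city": "Pune"})
new = combined_stamp(4, {"name": "Ann", "city": "Goa"})
newer_is_new = counter_of(new) > counter_of(old)
```

With a combined stamp, two peers that both produce counter value 5 for different content end up with different stamps, which makes the conflict visible instead of hiding it.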

Version stamping on multiple nodes

Version stamping works well in a single-node or primary and secondary system, where the primary is responsible for maintaining the version stamp. Secondary nodes simply follow the primary node. In the case of a peer-to-peer system, this system of version stamping needs to be enhanced because version stamping is not happening in one place; every node in a peer-to-peer system is stamping the version, which requires coordination and reconciliation.

If you read the same record from two nodes and each returns a different version, you can take the more recent version and discard the older one, assuming that the node holding the older version has not yet received the update because of a network issue or some other delay. The other scenario is that the two nodes hold inconsistent updates, where neither is simply an older copy of the other; in that case, you need a mechanism for reconciliation and conflict resolution.

A commonly used technique in peer-to-peer systems is using a special version stamp called a vector stamp. Simply put, a vector stamp is a set of counters, one for each node. For example, a vector stamp for three nodes (node1, node2, node3) would be [node1: 54, node2: 65, node3: 71]. Each time a node goes through an update, it updates its counter. For example, if node1 goes through an update, the resulting vector stamp would be [node1: 55, node2: 65, node3: 71]. Whenever two nodes communicate, they synchronize their vector stamps. There are several variations of how this synchronization happens.
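The update and synchronization steps can be sketched with plain dictionaries. This is one possible variation, assuming that synchronizing means taking the larger counter for each node; `record_update` and `synchronize` are hypothetical helper names for this example.

```python
def record_update(stamp: dict, node: str) -> dict:
    """Return a copy of the stamp with the given node's counter incremented."""
    updated = dict(stamp)
    updated[node] = updated.get(node, 0) + 1
    return updated


def synchronize(a: dict, b: dict) -> dict:
    """Merge two stamps by keeping the larger counter for every node."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


stamp = {"node1": 54, "node2": 65, "node3": 71}

# node1 performs an update and bumps only its own counter.
on_node1 = record_update(stamp, "node1")

# Meanwhile node3 performs its own update on its copy.
on_node3 = record_update(stamp, "node3")

# When node1 and node3 communicate, both end up with the merged stamp.
merged = synchronize(on_node1, on_node3)
```

Here `on_node1` matches the example in the text, and `merged` carries the highest counter seen for each node.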

"Vector stamp" is used here as a general term; you will also come across the closely related terms vector clock and version vector.

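Which version is more recent

One stamp is newer than another if at least one of its counters is greater and none is smaller. Below, the first stamp is newer: its node2 counter is greater and the rest are the same.

[node1: 20, node2: 21, node3: 23]

[node1: 20, node2: 20, node3: 23]

Write-write conflict

If each stamp has a counter that is greater than the other's, the updates were made independently and you have a write-write conflict.

[node1: 20, node2: 21, node3: 23]

[node1: 21, node2: 20, node3: 23]

Missing value detection

A missing counter is treated as 0, so the two stamps below are equivalent.

[node1: 20, node3: 23]

[node1: 20, node2: 0, node3: 23]

Because of this, a new node can easily be added without invalidating the existing vector stamps.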
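The three cases above fit into one small comparison function. This is a minimal sketch using dictionaries as vector stamps; `compare` and its return strings are hypothetical, chosen for this example.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector stamps; a missing counter counts as 0."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        # Each side has seen an update the other has not.
        return "conflict"
    if a_ahead:
        return "a is newer"
    if b_ahead:
        return "b is newer"
    return "equal"


# Which version is more recent: only node2 differs, and a is ahead.
recent = compare(
    {"node1": 20, "node2": 21, "node3": 23},
    {"node1": 20, "node2": 20, "node3": 23},
)

# Write-write conflict: a is ahead on node2, b is ahead on node1.
conflict = compare(
    {"node1": 20, "node2": 21, "node3": 23},
    {"node1": 21, "node2": 20, "node3": 23},
)

# Missing value: an absent node2 is the same as node2 = 0.
missing = compare(
    {"node1": 20, "node3": 23},
    {"node1": 20, "node2": 0, "node3": 23},
)
```

The function only classifies the relationship between two stamps. When it reports a conflict, deciding which data wins is still up to the application.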

A vector stamp is an important tool that helps you spot inconsistencies, but it doesn't resolve them for you. Conflict resolution depends on the domain and the business decisions you are working with. This is part of the consistency/latency trade-off: you have to decide whether a network partition may make your system unavailable, or whether you would rather stay available and detect and deal with the resulting inconsistencies.

I’m diving deeper into Designing Data-Intensive Applications and will be sharing insights on specific whitepapers, concepts, and design patterns that capture my attention. If you’d like to join me on this exploration, consider following me to receive automatic notifications about my next article!
