https://en.wikipedia.org/wiki/Riak
Riak implements the principles from Amazon's Dynamo paper[3] with heavy influence from the CAP Theorem. Written in Erlang, Riak has fault tolerance data replication and automatic data distribution across the cluster for performance and resilience.
Riak has a pluggable backend for its core storage, with the default storage backend being Bitcask. LevelDB is also supported.
Multi-datacenter replication
In multi-datacenter replication, one cluster acts as a "primary cluster." The primary cluster handles replication requests from one or more "secondary clusters" (generally located in other regions or countries). If the datacenter with the primary cluster goes down, a second cluster can take over as the primary cluster. There are two primary modes of operation: fullsync and realtime. In fullsync mode, a complete synchronization occurs between primary and secondary cluster(s), by default every six hours. In real-time mode,replication to the secondary data center(s) is triggered by updates to the primary data center. All multi-datacenter replication occurs over multiple concurrent TCP connections to maximize performance and network utilization. Note that multi-datacenter replication is not a part of open source Riak.
MariaDB is a community-developed fork of the MySQL relational database management system intended to remain free under the GNU GPL.
It includes the XtraDB storage engine for replacing InnoDB,[8] as well as a new storage engine, Aria, that intends to be both a transactional and non-transactional engine perhaps even included in future versions of MySQL.
http://basho.com/posts/technical/distributed-data-types-riak-2-0/
Riak knows what is stored in a counter key, and how to increment and decrement it through the counter API. It isn’t necessary to fetch, mutate, or put a counter. Instead you just incremented by 5 or decremented by 100. Vector Clocks, as discussed in the post entitled Clocks Are Bad, or, Welcome to the Wonderful World of Distributed Systems, as Riak knew how to merge concurrent writes there was never a sibling created.
This availability is achieved with mechanisms like sloppy quorum writes to fallback nodes. However, even without partitions and many nodes, interleaved or concurrent writes can lead to conflicts. Traditionally, Riak keeps all values and presents them to the user to resolve. The client application must have a deterministic way to resolve conflicts. It might be to pick the highest timestamp, or union all the values in a list, or something more complex. Whatever approach is chosen, it is ad-hoc, and created specifically for the data model and application at hand.
https://docs.basho.com/riak/kv/2.2.3/learn/concepts/crdts/
http://basho.com/posts/technical/clocks-are-bad-or-welcome-to-distributed-systems/
http://basho.com/posts/technical/riaks-config-behaviors-epilogue/
http://basho.com/posts/technical/
Riak implements the principles from Amazon's Dynamo paper[3] with heavy influence from the CAP Theorem. Written in Erlang, Riak has fault tolerance data replication and automatic data distribution across the cluster for performance and resilience.
Riak has a pluggable backend for its core storage, with the default storage backend being Bitcask. LevelDB is also supported.
MariaDB is a community-developed fork of the MySQL relational database management system intended to remain free under the GNU GPL.
It includes the XtraDB storage engine for replacing InnoDB,[8] as well as a new storage engine, Aria, that intends to be both a transactional and non-transactional engine perhaps even included in future versions of MySQL.
http://basho.com/posts/technical/distributed-data-types-riak-2-0/
Riak knows what is stored in a counter key, and how to increment and decrement it through the counter API. It isn’t necessary to fetch, mutate, or put a counter. Instead you just incremented by 5 or decremented by 100. Vector Clocks, as discussed in the post entitled Clocks Are Bad, or, Welcome to the Wonderful World of Distributed Systems, as Riak knew how to merge concurrent writes there was never a sibling created.
CRDT stands for (variously) Conflict-free Replicated Data Type, Convergent Replicated Data Type, Commutative Replicated Data Type, and others. The key, repeated, phrase is “Replicated Data Types”.
Conflict Free, or “Opaque No More”
Riak is an eventually consistent system. It leans, very much, towards the AP end of the CAP spectrum
The data types for Riak 2.0 converge automatically, at write and read time, on the server. If a client application can model its data using the data types provided, no sibling values will be seen and there is no longer a need to write ad-hoc, custom merge functions.
When the Data Types API is leveraged, Riak “knows” what type of thing is being stored and is able to perform the merge automatically.
- Counters: as in Riak 1.4
- Flags: enabled/disabled
- Sets: collections of binary values
- Registers: named Binary values with values also binary
- Maps: a collection of fields that supports the nesting of multiple Data Types
Data Type | Use Cases | Conflict Resolution Rule |
---|---|---|
Counters (v1.4) |
| Each actor keeps and independent count for increments and decrements. Upon merge, the pairwise maximum of any two actors will win (e.g. if one actor holds 172 and other holds 173, 173 will win upon merge) |
Flags |
| Enable wins over disable |
Sets |
| If an element is concurrent added and removed the add will win |
Registers |
| The most chronologically recent value wins, based on timestamps |
Maps |
| If a field is concurrently added, or updated and removed, the addd / update will win |
https://docs.basho.com/riak/kv/2.2.3/learn/concepts/crdts/
http://basho.com/posts/technical/clocks-are-bad-or-welcome-to-distributed-systems/
In a database that runs on a single server, setting aside any complications introduced by transactions or locks, the second of two updates to the same record will overwrite the first. Last write wins.
With Riak’s simplest conflict resolution behavior, the second of two updates to the same object may or may not overwrite the first, even if those two updates are spaced far apart. Last write wins, except when it doesn’t, but even then it does.
Confused yet?
The problem is simple: there is no reliable definition of “last write”; because system clocks across multiple servers are going to drift.
On a single server, there’s one canonical clock, regardless of accuracy. The system can always tell which write occurred in which order (assuming that the clock is always increasing; setting a clock backwards can cause all sorts of bad behavior).
So, back to our original problem with lost updates:
The nodes were a bit out of synch (up to 30 seconds… looking into why ntp wasn’t working!). So far it appears this was the issue.
If two updates to the same object occur within 30 seconds in such an environment, the end result is unpredictable.
Vector Clocks
One approach that should generally be employed when writing Riak applications is to supply vector clocks with each update. It’s not clear in this particular scenario that it would have helped, but it certainly can’t hurt. Giving Riak more information to track causal history is never a bad thing.
It’s almost always better to bite the bullet: instruct Riak to retain all conflicting updates as siblings (via
allow_mult=true
) and write your application to deal with them appropriately.
Also with 2.0, Riak will include the option of designating certain data as strongly consistent, meaning that the servers that hold a piece of data will have to agree on any updates to that data.
As appealing as that may sound, it is impossible to guarantee strong consistency without introducing coordination overhead and constraining Riak’s ability to continue to allow for requests when servers or networks have failed.
If your distributed system isn’t explicitly dealing with data conflicts, any correct behavior it exhibits is more a matter of good luck than of good design.
If your distributed database relies on clocks to pick a winner, you’d better have rock-solid time synchronization, and even then, it’s unlikely your business needs are served well by blindly selecting the last write that happens to arrive.
http://basho.com/posts/technical/riaks-config-behaviors-epilogue/
http://basho.com/posts/technical/