Massive Technical Interviews Tips: Distributed Application Techniques

Monday, September 28, 2015

Distributed Application Techniques

https://gigadom.wordpress.com/2011/05/13/design-principles-of-scalable-distributed-systems/

Vector Clocks
An obvious issue in distributed systems with hundreds of servers is that each server will have its own clock running at a slightly different rate. It is difficult to get a view of a global time considering that each system has slightly different clock speeds. How does one determine causality in such a distributed system? The solution to this problem is provided by Vector Clocks devised by Leslie Lamport. Vector Clocks provide a way of determining the causal ordering of events. Each system maintains an array of timestamps based on its own internal clock which it keeps incrementing. When a system needs to send an event to another system it sends the message with the timestamp generated from its internal array. When the receiving system receives the message at a timestamp that is less than the sender’s timestamp it increments its own timestamp by 1 and continues to increments its internal array through its own internal clock. In the figure the event sent from System 1 to System 2 is assumed to be fine since the timestamp of the sender “2” < “15. However when System 3 sends an event with timestamp 40 to System 2 which received it timestamp 35, to ensure a causal ordering where System 2 knows that it received the event after it was sent from System the vector clock is incremented by 1 i.e 40 + 1 = 41 and System 2 increments at it did before, This ensures that partial ordering of events is maintained across systems.

Vector clocks have been used in Amazon’s e-retail website to reconcile updates. The use of vector clocks to manage consistency has been mentioned in Amazon’s Dynamo Architecture

Distributed Hash Table (DHT)
The Distributed Hash Table uses a 128 bit hash mechanism to distribute keys over several nodes that can be conceptually assumed to reside on the circumference of a circle. The hash of the largest key coincides with the hash of the smallest key. There are several algorithms that are used to distribute the keys over this conceptual circle. One such algorithm is the Chord System. These algorithms try to get to the exact node in the smallest number of hops by storing a small amount of local data at each node.
The Chord System maintains a finger table that allows it to get to the destination node in O (log n) number of hops. Other algorithms try to get to the desired node in O (1) number of hops.
Databases like Cassandra, Big Table, and Amazon use a consistent hashing technique. Cassandra spreads the keys of records over distributed servers by using a 128 bit hash key.

Quorum Protocol
Gossip Protocol
This is the most preferred protocol to allow the servers in the distributed system to become aware of server crashes or new servers joining into the system, Membership changes and failure detection are performed by propagating the changes to a set of randomly chosen neighbors, who in turn propagate to another set of neighbors. This ensures that after a certain period of time the view becomes consistent.

Hinted Handoff and Merkle trees: To handle server failures replicas are sometimes sent to a healthy node if the node to which it was destined was temporarily down. For e.g. data destined for Node A is delivered to Node D which maintains a hint in its metadata that the data is to be eventually handed off to Node A when it is healthy. Merkle trees are used to synchronize replicas amongst nodes. Merkle trees minimize the amount of data that needs to be transferred for synchronization.

https://gigadom.wordpress.com/2011/05/04/designing-a-scalable-architecture-for-the-cloud/

Monday, September 28, 2015

Distributed Application Techniques

Labels

Popular Posts