Thursday, March 28, 2019

Gossip



Gossip MessagesStates are just collection of versioned key/value, so if there's newer version, which means the value is changed. Cassandra Gossip won't send out all the states out for syncing. It sends out version digests and only exchange the states that are out of date. Here is the process:
Gossip Process
  1. DigestSynMessage:
    • HostA sends out GossipeDigestSynMessage every second. It contains the largest version of local states.
    • HostB receives this message, it sort and compare the versions with local ones. Then it knows which local states are out of date, which ones are newer than remote status (HostA). Then it build following message, tells HostA these information:
  2. DigestAckMessage:
    • Basically digest_list contains status that HostB need from HostA. state_map contains status information that NodeA need. In this example, HostB doesn't have the latest info from HostA, HostA doesn't have the latest info from HostB, HostC status are up to date for both hosts.
    • When HostA gets the message, it apply the state_map to local states. And then build following message with the list from digest_list.
  3. DigestAck2Message
    • HostB receives the message, and simply update local state with the information from state_map.


Each host contains a state_map, which contains state information for each host in the cluster. Each host has the same data structure, Gossip protocol is to sync these states across the cluster. Each state is an versioned key/value. Every host keeps it's own AtomicInteger version variable, and get bumped whenever a state is changed.

HeartBeat and Failure Detection

Generation

Generation never change after Cassandra process starts. It's actually the current time when Cassandra starts.

Version

  • HeartBeat Version is the most frequently changing state. Each host increases it's local heartbeat version before sending out GossipDigestSyn. And this state needs to be spread out to all nodes with Gossip. So lots of Gossip traffic is for HeartBeat.
  • Cassandra failure detection is implemented with HeartBeat information. Each host keeps new heartbeat version arriving time windows for any other nodes. For example, here is how HostA detects HostB down:
Failure Detection
  • For every second, HostA bumps the version number and send it out to one host:
    • It could be sent (or pull) to HostB directly (like version 1, 3, 6)
    • It could be sent (or pull) to other node and then sent to NodeB (like version 4, 5)
    • Or very likely, the heartbeat version is already up to date (like version 2). It will be dropped.
    • The heartbeat version is increasing but not continuously. Because other state value might also change and bump the globle version too.
    • Currently, Cassandra keeps the last 1000 time windows. For a normal environment without network issue, the mean() for time windows interval is about 1 ~ 1.1 seconds. 1 second means every heartbeat is synced. 1.1 means about 1 in 10 heartbeat isn't synced before new heartbeat version arrived.
    • When no update time interval is bigger than mean() * ln(10) * phi_convict_threshold, that host will be marked as dead. Default phi_convict_threshold is 8, ln(10) x 8 = 2.3 x 8 = 18.4. Which means even in best case, the failure detection delay is 18seconds, typically it's about 20 seconds 18.4 x 1.1 = 20.3.


https://www.ibm.com/developerworks/cn/opensource/os-cn-apache-cassandra3x3/index.html






Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts