Massive Technical Interviews Tips: System Misc

Thursday, October 5, 2017

System Misc

https://community.hortonworks.com/articles/43057/rack-awareness-1.html

Rack awareness means knowing the cluster topology, or more specifically how the data nodes are distributed across the racks of a Hadoop cluster. This knowledge matters because of one assumption: data nodes collocated in the same rack have more bandwidth and lower latency between them, whereas two data nodes in separate racks have comparatively less bandwidth and higher latency.

The main purposes of rack awareness are:

Increasing the availability of data blocks
Better cluster performance
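
To make the topology knowledge concrete: Hadoop can learn each node's rack from a user-supplied topology script (configured, as far as I know, through the net.topology.script.file.name property), which receives host addresses as arguments and prints one rack path per address. The following is a minimal sketch of such a script in Python. The addresses, rack paths, and the resolve_racks helper are hypothetical and only illustrate the mapping; a real cluster would load the table from its own inventory.

```python
#!/usr/bin/env python3
import sys

# Hypothetical host-to-rack table (assumption for illustration only).
RACK_TABLE = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}

# Rack reported for hosts that are missing from the table.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts, table=RACK_TABLE):
    """Return one rack path per host, in the same order as the input."""
    return [table.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print the rack paths for the addresses passed as arguments,
    # separated by whitespace.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Two nodes that resolve to the same rack path are treated as collocated, which is the information the cluster needs to place replicas on more than one rack.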

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html#allocation-awareness

https://community.hortonworks.com/questions/71458/can-anyone-explain-kafka-rack-awareness-feature.html

As you pointed out, Kafka 0.10.0.0 supports rack awareness. KAFKA-1215 added a rack ID to the Kafka broker config. You can specify that a broker belongs to a particular rack by adding a property to the broker config: broker.rack=my-rack-id.

The rack awareness feature spreads replicas of the same partition across different racks. This extends the guarantees Kafka provides for broker-failure to cover rack-failure, limiting the risk of data loss should all the brokers on a rack fail at once. The feature can also be applied to other broker groupings such as availability zones in EC2.
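
The idea of spreading a partition's replicas across racks can be sketched in a few lines of Python. This is a simplified illustration, not Kafka's actual assignment algorithm: the function names, broker IDs, and rack labels are all hypothetical. It orders brokers so that neighbours alternate between racks, then walks that list for each partition, preferring racks that do not yet hold a replica.

```python
from itertools import zip_longest


def interleave_by_rack(broker_racks):
    """Order brokers so that consecutive entries alternate between racks."""
    by_rack = {}
    for broker, rack in sorted(broker_racks.items()):
        by_rack.setdefault(rack, []).append(broker)
    columns = zip_longest(*(by_rack[rack] for rack in sorted(by_rack)))
    return [b for column in columns for b in column if b is not None]


def assign_replicas(broker_racks, num_partitions, replication_factor):
    """Map each partition to a replica list that covers as many racks as possible."""
    if replication_factor > len(broker_racks):
        raise ValueError("replication factor exceeds the number of brokers")
    ordered = interleave_by_rack(broker_racks)
    num_racks = len(set(broker_racks.values()))
    assignment = {}
    for partition in range(num_partitions):
        replicas, racks_used = [], set()
        i = partition  # shift the starting broker per partition
        while len(replicas) < replication_factor:
            broker = ordered[i % len(ordered)]
            i += 1
            rack = broker_racks[broker]
            if broker in replicas:
                continue
            # Skip already-used racks until every rack holds one replica.
            if rack in racks_used and len(racks_used) < num_racks:
                continue
            replicas.append(broker)
            racks_used.add(rack)
        assignment[partition] = replicas
    return assignment


# Hypothetical cluster: six brokers over three racks (or availability zones).
brokers = {0: "r1", 1: "r1", 2: "r2", 3: "r2", 4: "r3", 5: "r3"}
plan = assign_replicas(brokers, num_partitions=4, replication_factor=3)
```

With three racks and a replication factor of three, every partition in plan ends up with one replica per rack, so losing all brokers on a single rack still leaves two replicas of each partition.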
