[Design] Big Data Storage - Shuatiblog.com
ref
Question
Given 1 trillion messages on Facebook, each with at most 10 words, how do you build the index table, and how many machines do you need in the cluster to store it?
- Trillion (short scale): 1,000,000,000,000 (one million million; 10^12; SI prefix: tera-), the current meaning in both American and British English
One possible answer
Total data = 1 trillion * 10 words * 6 bytes/word = 60 TB, which would fit on one small NetApp box.
Index by hashed user ID, which distributes traffic evenly across servers; cache active users' recent messages in memory.
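As a rough illustration of indexing by hashed user ID while caching recent messages in memory, here is a minimal Python sketch. The names `shard_for_user`, `RecentMessageCache`, `NUM_SERVERS`, and `RECENT_LIMIT` are hypothetical, and SHA-256 is just one reasonable choice of stable hash; none of this is taken from Facebook's actual design.

```python
import hashlib
from collections import defaultdict, deque

NUM_SERVERS = 1000   # assumed cluster size (hypothetical)
RECENT_LIMIT = 50    # assumed number of recent messages cached per user (hypothetical)


def shard_for_user(user_id: str, num_servers: int = NUM_SERVERS) -> int:
    """Map a user ID to a server index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash()
    because hash() of a str varies between interpreter runs.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


class RecentMessageCache:
    """Keeps only the most recent messages of each user in memory."""

    def __init__(self, limit: int = RECENT_LIMIT) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._recent = defaultdict(lambda: deque(maxlen=limit))

    def add(self, user_id: str, message: str) -> None:
        self._recent[user_id].append(message)

    def recent(self, user_id: str) -> list:
        # Use .get so that a lookup does not create an empty entry.
        return list(self._recent.get(user_id, ()))
```

A plain modulo mapping like this reshuffles most users whenever the server count changes; consistent hashing is a common alternative if the cluster is expected to grow.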
A NetApp box cannot be used, though. From what I read on the Facebook engineering blog, they keep all of this data in the servers' main memory.
Total data = 1 trillion * 10 words * 6 bytes/word = 60 TB, plus 1 TB for indexes, for 61 TB in all.
Assuming each server has 64 GB of RAM, of which 61 GB is usable for storage, about 1,000 servers are needed (61 TB / 61 GB per server).
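The back-of-the-envelope estimate above can be checked with a short Python sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and reuses the figures from the answer; the function and constant names are hypothetical.

```python
MESSAGES = 10**12                     # 1 trillion messages
WORDS_PER_MESSAGE = 10                # at most 10 words each
BYTES_PER_WORD = 6                    # assumed average word size, from the answer
INDEX_BYTES = 10**12                  # 1 TB reserved for indexes, from the answer
USABLE_RAM_PER_SERVER = 61 * 10**9    # 61 GB usable out of 64 GB, from the answer


def total_bytes() -> int:
    """Raw message data plus index overhead, in bytes."""
    return MESSAGES * WORDS_PER_MESSAGE * BYTES_PER_WORD + INDEX_BYTES


def servers_needed(usable_ram: int = USABLE_RAM_PER_SERVER) -> int:
    """Number of servers needed to hold everything in RAM, rounded up."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bytes() // usable_ram)


print(total_bytes() // 10**12, "TB total")   # prints: 61 TB total
print(servers_needed(), "servers")           # prints: 1000 servers
```

This count covers a single in-memory copy only; replication or headroom for growth would multiply it accordingly.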