Thursday, July 23, 2015

[Design] Big Data Storage - Shuatiblog.com

Question

Given 1 trillion messages on Facebook, where each message has at most 10 words:
How do you build the index table, and how many machines do you need in the cluster to store it?
1 TB = 1,000,000,000,000 bytes = 10^12 bytes = 1000 gigabytes.
  • Trillion (short scale): 1,000,000,000,000; one million million; 10^12; SI prefix: tera-. This is the current meaning in both American and British English.

One possible answer

Total data = 1 trillion messages * 10 words * 6 bytes/word = 60 TB, which fits in one small NetApp box.
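The sizing arithmetic can be checked in a few lines of Python. This is a sketch that simply restates the post's own assumptions (at most 10 words per message, an assumed average of 6 bytes per word) in decimal units, so 1 TB = 10^12 bytes; the names are illustrative.

```python
# Back-of-envelope sizing from the figures above, in decimal (SI) units.
MESSAGES = 10**12          # 1 trillion messages (short scale)
WORDS_PER_MESSAGE = 10     # upper bound per message, from the question
BYTES_PER_WORD = 6         # assumed average, per the estimate above

TB = 10**12                # 1 TB = 10^12 bytes


def total_bytes(messages: int, words: int, bytes_per_word: int) -> int:
    """Worst-case raw message storage in bytes (every message at the word limit)."""
    return messages * words * bytes_per_word


raw = total_bytes(MESSAGES, WORDS_PER_MESSAGE, BYTES_PER_WORD)
print(f"{raw // TB} TB")   # prints "60 TB"
```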
Index by hashed user ID, which distributes traffic evenly across servers, and cache active users' recent messages in memory.
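A minimal sketch of partitioning by hashed user ID, assuming a hypothetical in-memory `Cluster` made of `Shard` objects. Each shard holds the messages of the users that hash to it, plus a word-to-message-ID index over those messages, so a per-user lookup touches only one server. The MD5-based `shard_for` helper is an illustrative choice of stable hash, not something the post specifies.

```python
import hashlib
from collections import defaultdict


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard with a stable hash.

    Python's built-in hash() is salted per process for str, so it would
    route the same user differently across restarts; MD5 does not.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Shard:
    """Hypothetical in-memory shard: messages, per-user message lists, word index."""

    def __init__(self) -> None:
        self.messages: dict[int, str] = {}
        self.by_user: dict[str, list[int]] = defaultdict(list)
        self.index: dict[str, set[int]] = defaultdict(set)

    def add(self, user_id: str, msg_id: int, text: str) -> None:
        self.messages[msg_id] = text
        self.by_user[user_id].append(msg_id)
        for word in text.lower().split():
            self.index[word].add(msg_id)

    def search(self, user_id: str, word: str) -> list[int]:
        # All of this user's messages live on this shard, so no fan-out is needed.
        hits = self.index.get(word.lower(), set())
        return [m for m in self.by_user.get(user_id, []) if m in hits]


class Cluster:
    """Routes every operation for a user to the shard chosen by shard_for."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard(self, user_id: str) -> Shard:
        return self.shards[shard_for(user_id, len(self.shards))]

    def add(self, user_id: str, msg_id: int, text: str) -> None:
        self._shard(user_id).add(user_id, msg_id, text)

    def search(self, user_id: str, word: str) -> list[int]:
        return self._shard(user_id).search(user_id, word)


cluster = Cluster(num_shards=4)
cluster.add("alice", 1, "see you at lunch")
cluster.add("alice", 2, "lunch moved to noon")
cluster.add("bob", 3, "lunch tomorrow?")
print(cluster.search("alice", "lunch"))  # prints [1, 2]
```

One design caveat: plain modulo routing remaps most users when the shard count changes, which is why consistent hashing is a common alternative when servers are added or removed.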
You cannot use a NetApp box. From what I read on the Facebook engineering blog, they keep all of this data in the main memory of their servers.
Total data = 1 trillion messages * 10 words * 6 bytes/word = 60 TB, plus 1 TB for indexes, for 61 TB in total.
Assuming each server has 64 GB of RAM with 61 GB usable for storing the data and index, that is 61 TB / 61 GB = 1000 servers.
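The server count is the total in-memory footprint divided by the usable RAM per server. A sketch using the post's figures (60 TB data plus 1 TB index, 61 GB usable per machine), with integer ceiling division so any remainder rounds up to one more server. It counts a single copy of the data and does not model replication.

```python
TB = 10**12  # decimal units, as in the post
GB = 10**9


def servers_needed(total_bytes: int, usable_per_server: int) -> int:
    """Ceiling division: a partial server's worth of data still needs a whole server."""
    return (total_bytes + usable_per_server - 1) // usable_per_server


data = 60 * TB
index = 1 * TB
usable = 61 * GB  # of 64 GB RAM; the ~3 GB remainder is assumed to go to OS/process overhead
print(servers_needed(data + index, usable))  # prints 1000
```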

Read full article from [Design] Big Data Storage - Shuatiblog.com