Monday, August 24, 2015

[NineChap Sys] System Design Class 1 - Shuatiblog.com



[NineChap Sys] System Design Class 1 - Shuatiblog.com
the process of defining the architecture, components, modules, interfaces and data to satisfy specified requirements.
  1. conceptual design (macro)
  2. logical design
  3. physical design (micro)

Top-down design

Eg. MS Office, Huawei Security System

Bottom-up design

Most start-up use, MVP first using Medetor + MongoDb.

5 steps (SNAKE Principle):

  1. Scenario: case/interface – input
  2. Necessary: constrain/hypothesis – input
  3. Application: service/algorithm – output
  4. Kilobit: data – output
  5. Evolve – solution
Now let's start by one example: design a radio

Step One, Scenario

  1. brain storm
    1. register/log in
    2. play music
    3. recommendation
  2. prioritize
    1. play music
      1. Get channel
      2. select a channel
      3. play

Step Two, Necessary

  1. ask
    1. total user – 100,000,000
    2. daily users – 1,000,000
  2. predict
    1. user analysis
    2. Traffic analysis
    3. Memory analysis
    4. QPS
Details:
  1. user analysis
    Avg Concurrent users = daily users / 5 = 200,000
    Peak Concurrent users = concurrent user * 3 = 600,000
    considering your product may grow in the next 3 month:
    Max Peak users in 3 month = Peak users * 10 = 6,000,000
  2. Traffic analysis
    Request of new music per user: 1 music/min
    Music size = 3MB
    Max peak traffic (in 3 months) = 6,000,000 * 3MB / 60 = 300GB/s
  3. Memory analysis
    Memory per user (metadata) = 10KB
    Max daily memory = 1,000,000 * 10 * 10 = 100 million KB = 100GB
    (10 times of avg daily user)

Step Three, Application

  1. Replay the case, one service for each
  2. Merge the services

Step Four, Kilobit: data

  1. Append 1 dataset for each service
    Eg. User service: stability, more addition, less modify and deletion.
    Eg. Channel Service: high concurrency, MongoDB
    Eg. Music Service: MP3 File Systems

Last Step, Evolve

  1. Better: constrains
    eg. able to handle 300GB/s traffic?
  2. Broader: new cases
    share music? delete user account?
  3. Deeper: details design
From views of Performance, Scalability and Robustness.

Exercise

design user recommendation module

Simple algo:

u1={m3,m5,m7,m11}  u2={m1,m2,m3,m4,m5,m6,m7,m8,m9}  Similarity( u1, u2 ) = 3  
m – music
u – user
Similarity = # of same music for different users

Adv algo:

find his top-1 similar user. Stay tuned for future posts.

Use the 5 Steps (SNAKE)

  1. Step One, Scenario
  2. Step Two, Necessary
  3. Step Three, Application
  4. Step Four, Kilobit: data
  5. Last Step, Evolve
Because this question is relatively easy, we will not do case-analysis (Macro).
Instead, we do micro design by starting at the interface.

Step One, Scenario

Interface
class Recommender {      public int findSimilarUser(int userId) {          //      }  }  

Step Two, Necessary

  1. ask
    1. total users = 100,000,000
    2. total music = 10,000,000
    3. peak users in 3 month = 6,000,000
    However, not everyone is logged in. Thus we won't need to recommend for everybody. On average, the logged-in ratio is 1% – 30%. Let's assume 5%.
    1. participation percentage = 5%
    And user's interest won't change every minute. Let's recalculate only after 10 minutes.
    1. calculation frequency = 1 update/10min/user
  2. predict
    1. user analysis (skip)
    2. Traffic analysis (skip)
    3. Memory analysis (skip)
    4. QPS
    Peak QPS = 6,000,000 * 5% / (10 * 60) = 500/s

Step Three, Application

The simpliest algorithm: BF compare. The complexity is O(m n) for each user, where m is # of music a person likes, and n is # of total users. For k users, it takes O(k m n) time (k can be = peak concurrent users).
This is roughly 0.2s per user. Thus Max QPS = 5.
One word about complexity-to-seconds estimation.
O(n ^ 3) –> 1s
O(n ^ 2) –> 0.2s
O(n) –> 20ms
O(k) –> k ms

Step Four, Kilobit: data

Very simple:

Last Step, Evolve

Read on.

How to go from Level 0 to Level 1

Refer to the previous question. How can we improve???
  1. performance
  2. scalability
  3. robustness

performance

A better algo: Inverted Index

Avg performance increase to ~ 20ns (with some optimization of MapReduce procedure, discuss later).
Max QPS increase to 50.

scalability

Use a dispatcher to re-direct the requests to multiple machines.

How many machines do we need then?

Well we need 500 QPS. The algo above achieves ~ 50 QPS. Should we need 10 machines?
The answer is know. A machine with 8 (or 16) core CPU could be able to handle.
We can also have a hot-standby, to be safe.
hot standby is used as a failover mechanism to provide reliability in system configurations.
When a key component fails, the hot spare is switched into operation.

robustness

Tips about system design for senior engineers:
Draw 1 machine first. This machine can contains multiple datasets and run multiple processes.
On top of this machine, the interface layer is one single Manager process. The Manager is in charge of almost everything: handling data lost, handle high concurrency, copy multiple instance of itself…
Like this:

Back-end

Now we need a cluster of datasets (which has Manager on top of it), and a cluster of Recommenders. Manager is in charge of copying multiple instances.
Dataset can be put in different physical locations. Recommender don't really need, cuz it's only do calculation job.

Receiving requests

Just now we used Receptionist (or Dispatcher) to handle request. Now we use a Web service (eg. Apache). It's not necessary to make it a cluster.

Big Brother

We need a monitor system to oversee everything.
Also, Big Brother is in charge of heart-beat. If not received, Big Brother have some double-check machanism.

Connecting the dots

Dispatcher is used to connect the 4 components. It's like a messaging queue that collects and distributes jobs among everybody (eg. control and distributed info).
It can be stateful or stateless.
Keep in mind the connection between Dataset and Recommender remains. It's slow going thru Dispatcher.

Distribute it

During development, the 5 components can be put on same machine. When we deploy distributely, we use Socket connection (keep alive) to connect them.
Notice the Web Service is connection heavy, which consume large CPU and RAM resource. It's better to seperate to one machine.
Big brother is read/write heavy, so it's OKAY to put on same machine with Dispatcher.
Since Dataset and Recommender have data exchange, it's a good idea to put on same machine.

Additional questions

Implement Dispatcher with consumer-producer model.
https://en.wikipedia.org/wiki/Queries_per_second
Queries Per Second (QPS) is a common measure of the amount of search traffic an information retrieval system, such as a search engine or a database, receives during one second.[1]
High-traffic systems must watch their QPS in order to know when to scale the system to handle more load.
Read full article from [NineChap Sys] System Design Class 1 - Shuatiblog.com

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts