Thursday, August 27, 2015

[NineChap Sys] System Design Class 2 - Shuatiblog.com



[NineChap Sys] System Design Class 2 - Shuatiblog.com
This class covers database design:
  1. design data with class and inheritance
  2. design a user system (Netflix 2015)
  3. design a payment system (Yelp, BigCommerce 2015)
Example One
design account (login/out) system for our radio.

Step one, scenario

  1. register, update, remove account
  2. login/out
  3. user balance, VIP services

Step Two, necessary

  1. Ask
    1. total user: 100 million
    2. daily user: 1 million
  2. predict
    1. daily active user in 3 month: 10 million
    2. register percentage: 1%
    3. daily new register: 100 thousand
  3. more predict
    1. login percentage: 15%
    2. average login frequency: 1.2 (ppl may input wrong password 20% of time)
    3. daily login attempts: 10 million * 15% * 1.2 = 1.8 million
    4. average login frequency: 1.8 million / 24hr = 21 login/sec
    5. normal login frequency: 21 * 2 = 42 login/sec
    6. peak login frequency: 42 * 3 = 126 login/sec

Step Four, Kilobit

Data – User table should contain name and password. What else?
class User {
    int userId; (primary key)
    String name;
    String password;
}
Data – User Table
class UserTable {
    list<User> table;

    public insert(){}
    public delete(User){}
    public update(User){}
    public User select(){}
}
CRUD, (Sometimes called SCRUD with an “S” for Search) are the four basic functions of persistent storage.

Step Five, Evolve

advanced 2: verification and forbidden accounts

We have to know the concept of Account State Lifecycle Graph:
  1. ban: fake user, advertising users… bannned by the management
  2. inactive: user choose to suspend his own account, voluntarily.
  3. delete account: normally we won’t remove all related data (just make userId as “deleted”). Otherwise a lot of data can be violated. All your chatting history actually remains.

redesign User Table

Old User table:
class User {
    int userId; (primary key)
    String name;
    String password;
}
Modified User table:
class User {
    int userId;
    char name[10];
    char hiddenPassword[10];
    int state;
}
  1. We added state, to support Account life cycle.
  2. We changed username to fixed size, for better performance on searching and storing. Can prevent certain attacks, too.
  3. save encrypted password.

advanced 3: login/out process

  1. User account auto logged out after a certain period of time.
  2. multiple account logged in at same time.

Session

Session is a conversation between user and server.
  1. User can have >1 session, if he log in from different devices.
  2. Session must be verified, thus we have to keep sessionId.
Session status: “iPad online”, “PC online”…
Modify User table:
class User {
    int userId;
    char name[10];
    char hiddenPassword[10];
    int state;
    List<session> sessionList;
}
Important in Session table:
  1. device ID
  2. time-out period
    class Session { private sessionId; int userId;
     int deviceCode;
     long timeOut;
    
    }
User table would include a session list.

further improvement: session

  1. we update sessionList very frequently.
  2. size of sessionList is dynamic.
Solution: seperate the table.
Question: When to clean up the session data (considering huge amount of data and frequent calculation)?
Answer: every end of day. Or store sessions in-memory, so it lose all the data when machine restarts (it is used in Gaming platforms). Or we can clean up one user’s session list whenever the user did a new log-in.
We do not remove session whenever it expires. It’s too much calculation.

further improvement: inheritance

Apply inheritance to UserTable and SessionTable:
class Table {
    list<Row*> table;

    public insert(){}
    public delete(){}
    public update(){}
    public User select(){}
}

class UserTable extends Table {
}

class SessionTable extends Table {
}
As for the Row class:
class Row {
    List<Attributes> row;
}

class User extends Row {
}

class Session extends Row {
}

advanced 4: search in table

  1. find my userId
  2. find my userId range
Solution 1: add HashMap in the table. Can do search in O(1), but can’t find range.
Solution 2: BST data structure. Can do search range and search in O(log2 n) time.

Best solution: B+ tree

B+ tree – everything in O(logb n) which is close to constant time.
Plus, B+ tree is hard disk friendly. Read more on a future post.

advanced 5: VIP services

User could buy VIP services using his acccount balance.
class ProService {
    int userId;
    double money;
    long endTime;

    public addMoney(){}
    public buyPro(){}
}

Q5.1: System crashes before adding time

Solution: transaction with log
WriteLOG
Transaction_1123: BEGIN
money=20; endTime=16_07_2016
If system crash happens here, system will read the log, recover and roll back all original data. Try not to complete the transaction – just fail it.
WriteLOG
Transaction_1123: BEGIN
money=20; endTime=16_07_2016
WriteLOG
Transaction_1123: END
money=10; endTime=16_08_2016
What happens if system crash during writing the log? or during the rollback?

Q5.2: check list

  1. one user id have 2 corresponding pro-services information.
  2. Shallow user: a pro-services info does not have corresponding user.
Solution: have a checker class.

Q5.3: 2 simutaneous purchase.

Solution: lock.
  1. first process lock money & endTime.
  2. Read money = 20
  3. another process try to lock, but end up waiting (sleeping).
  4. first process do the job, and release the lock.
  5. another process wakes up.
lock have time-out settings. It can be applied in distributed system as well.
Question: does lock make your execution slow?
  1. If another process is sleeping, CPU will be fully consumed by other process. So it won’t impact.
  2. You can do some async processing, too.
  3. When you lock, try to lock only a small piece of code, not the entire method. In DB, lock only a row, not a table.
  4. Java CAS (Compare and swap )

Q5.4: machine crash

Solution: duplication.
  1. How many copies?
    Google did 3. Normally 2 in same data center, and 1 in another location.
    Backup data normally is on another machine. But there’s also RAID (redundant array of independent disks) which:
    combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy, performance improvement, or both.
  2. When does the copy happen?
    Option 1 is doing everyday nightly. This is called a ‘check point’.
    Option 2 is use another server to support Shadow Redundancy. All data from Machine 0 will be copied to Machine 1 WHILE it happens. The result is, Machine 1 is identical to Machine 0. If Machine 0 crashes, Machine 1 may be behind less than 1 second.
    The way to duplicate is either re-play all the actions, or to read Machine 0’s log and apply the new data.
  3. How to copy?
    User send data to 1 server, and from that server, pipeline. This ensures good usage of server bandwith, and serial transfer of data ensures low latency (several ms).
    It’s also possible to do tree-like transfer, but the above method is preferred cuz all machine consume same bandwidth.
  4. What is log?
    It is actually ‘checkpoint’ + log. It allows you to rollback.
Summary:

Final note: Data consistency

Main sources of inconsistency comes from:
  1. network fault
  2. disk error
The disk eror is solved by checksum (compare during disk writing).

Summary

ACID is a set of properties that guarantee that database transactions are processed reliably.
  1. Atomicity: all or nothing
    Q5.1: System crashes before adding time
  2. Consistency: validate according to defined rules
    Q5.2: check list
  3. Isolation: independency between transactions (lock)
    Q5.3: 2 simutaneous purchase.
  4. Durability: stored permanently
    Q5.4: machine crash
Question:
  1. design a user system (Netflix 2015)
Hint: table design, register, login/out, password check, find password.
  1. Design payment system (Yelp, BigCommerce 2015)
Hint: the question does not ask about table/ds design itself, but rather the problems associated with payment. Read about ACID principle.
Read full article from [NineChap Sys] System Design Class 2 - Shuatiblog.com

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts