Tuesday, April 5, 2016

System Design Misc Part 2

Term performance of web application is used to mean several things. Most developers are primarily concerned with response time and scalability.
  • Response Time
    Is the time taken by web application to process request and return response. Applications should respond to requests (response time) within acceptable duration. If application is taking beyond the acceptable time, it is said to be non-performing or degraded.
  • Scalability
    Web application is said to be scalable if by adding more hardware, application can linearly take more requests than before. Two ways of adding more hardware are
    • Scaling Up (vertical scaling) :– increasing the number CPUs or adding faster CPUs on a single box.
    • Scaling Out (horizontal scaling) :– increasing the number of boxes.
CAP theorem has shown that is not possible to get Consistency, Availability and Partition tolerance simultaneously. NoSql databases usually compromise on consistency to get high availability and partition.
Splitting database
Database can be split vertically (Partitioning) or horizontally (Sharding).
  • Vertically splitting (Partitioning) :– Database can be split into multiple loosely coupled sub-databases based of domain concepts. Eg:– Customer database, Product Database etc. Another way to split database is by moving few columns of an entity to one database and few other columns to another database. Eg:– Customer database , Customer contact Info database, Customer Orders database etc.
  • Horizontally splitting (Sharding) :– Database can be horizontally split into multiple database based on some discrete attribute. Eg:– American Customers database, European Customers database.
Architecture bottlenecks
Scaling bottlenecks are formed due to two issues
  • Centralised component A component in application architecture which can not be scaled out adds an upper limit on number of requests that entire architecture or request pipeline can handle.
  • High latency component A slow component in request pipeline puts lower limit on the response time of the application. Usual solution to fix this issue is to make high latency components into background jobs or executing them asynchronously with queuing.
CPU Bound Application
An application is said to be CPU bound if application throughput is limited by its CPU. By increasing CPU speed application response time can be reduced.
Few scenarios where applications could be CPU Bound
  • Applications which are computing or processing data with out performing IO operations. (Finance or Trading Applications)
  • Applications which use cache heavily and don’t perform any IO operations
  • Applications which are asynchronous (i.e. Non Blocking), don’t wait on external resources. (Reactive Pattern Applications, NodeJS application)
In the above scenarios application is already working in efficiently but in few instances applications with badly written or inefficient code which perform unnecessary heavy calculations or looping on every request tend to show high CPU usage. By profiling application it is easy to figure out the inefficiencies and fix them.
These issues can be fixed by
  • Caching precomputed values
  • Performing the computation in separate background job.
Different ways of caching, how caching can reduce load and improve performance and scalability of web applications is covered in post Caching To Scale Web Applications
IO Bound Application
An application is said to be IO bound if application throughput is limited by its IO or network operations and increasing CPU speed does not bring down application response times. Most applications are IO bound due to the CRUD operation in most applications Performance tuning or scaling IO bound applications is a difficult job due to its dependency on other systems downstream.
Few scenarios where applications could be IO Bound
  • Applications which are depended on database and perform CRUD operations
  • Applications which consume drown stream web services for performing its operations
Resist creating new systems. See if an old system will suffice
Old systems have already been debugged, and new systems will always bring new problems

When we say "if it ain't broke, don't fix it," we don't mean you should ignore things like new business opportunities, new media, changing culture, etc. However, if you're going to plunge into creating a new system, focus on building modular systems that perform the minimum tasks necessary and that interact with other sub-systems. You'll gift your descendants with more options to stick with the old parts that still work, while only having to replace the few pieces that they must.

Don't try to fix a broken system by adding more parts
  1. Eliminate points of self-reference
    Avoid systems that modify their own source code or data structures (including saving SQL in table cells), that give themselves their own "Partner ID", that invoke their own API in the process of providing that API, etc. These risk a positive feedback loop that make the system unstable. You are not building an AI. Distinguish between hierarchical/stack designs and these strange loops
  2. Design a system to run downhill
  1. Don't reject conflicting facts, store and report them
    Say your system gets a notification that a package has shipped, but the order is marked as cancelled. If you reject the notification because it conflicts with internal state, then you'll lose information. If you think it's okay because you fired off an error report to somebody, what happens when it gets lost? The system's state will never agree with reality. Store conflicting data and provide tools for discovering it and fixing it. There is no such thing as a single version of the truth
  2. Automate the mundane, not the exceptional (Don't Build Airtight Systems)
  1. Every input to the system should be logged in a way that can be played back like a tape to recreate the stock
  2. Never save state in a file format that only one program can understand
    You need to be able to inspect, tweak and re-process data with a variety of tools, not just the original document creator. This is why XML is so popular, in spite of itself
  3. Be zealous with the scalpel
    Customer databases, product catalogs, accounting, inventory, Orders-to-Cash (order management) and Point of Sale should all be cut out into separate systems with separate stores, developed independently. Make them talk to each other SOA-style. If it looks like it's a separate concern, then it is. If it turns out you were wrong, and the service shouldn't stand on its own, then at least that component is already modularized and easy to integrate into wherever it belongs
  4. Choosing the right name is everything
    Having intuitive, accurate, descriptive names for a system's parts is equally as important as what those parts do. But there's more: choosing a name can make the thing. The wrong name can lead you to develop the wrong solution by making you picture the wrong idea
  5. Know the difference between experience and speculationFocusing on the long term implies that you can predict the future. Engineering for speculated needs, rather than needs you know about from experience, is hubris: systems invent their own needs and you cannot know what they'll be until the system needs them. It is better to pay the cost of refactoring and re-engineering later than it is to pay for the wrong optimizations now
  6. Do not turn the system's artifacts into assets
    It must be possible to jettison all or part of a system's pieces without incurring a loss. Do not try to convert any part of it into a product. Your people--after they have come to understand the problem and ways of solving it--are the real asset
  1. Systems in general work poorly, or not at all
    Someone who knows what they're doing will always beat an organization (or an ERP system) that's following fixed rules
  2. All complex systems that work invariably evolved from simple systems that worked
    Never build a large and complex system from scratch. Never assume you can make a broken system work by making it more complex
  3. Complex systems always present unexpected behavior
    This is the reason why you must not build an airtight system. If the system is sealed, then you won't be able to deal with the unexpected
  4. A "complex" system has 2 or more stocks
    Think of complexity as an exponential effect. Each new part increases complexity a little, but it's the stocks that double it. A database is simple. A database with a cache is complex
  5. New systems that replace old systems always bring new problems
    Supporting Unicode in URLs made web addresses friendlier to other languages, but gave scammers the ability to spoof "paypal.com" by registering the domain name using a Cyrillic letter "a"
  6. Fail-Safe systems fail by failing to fail safe
    The fail-safe valve on the top of the reactor vessel at Three Mile Island failed in the safe (open) state, but the sensor which told the control room it was open failed by reporting that it was shut. Hilarity ensued
  7. The divergence between the outputs of a system, and what the system was intended to output, increase with complexity
    Supermarket Breakfast Cereal is not cereal, it's baked pieces of injection-molded paste. Supermarket Bread is not bread, it's baked starch foam. A fast-food milkshake is not a milk shake, it's whipped corn sugar. Cinnamon isn't cinnamon, it's a cheaper spice called Cinnamomum burmanni. Do not assume that just because the System calls it a "Table", or a "Transaction", or a "Customer", or a "Required Ship Date", that it actually is one
  8. Complex systems tend to oppose their own intended function (Le Chatelier's Principle)
    The widespread use of antibiotics has resulted in the creation of new antibiotic-resistant germs. Nasal sprays, when overused, cause a rebound effect that's worse than the original stuffy nose. Your accounting system will inevitably make it impossible to record certain kinds of transactions
  9. Reality, to a system, is whatever is reported to it
    Don't pass data through an interpretation phase before presenting it to the system. Give the system tools to interpret data after it's been encoded and stored, and don't change that data under its nose without telling it why at the same time
  10. Systems attract Systems People
    A Systems Person won't see how absurd a procedure is as long as it happens to agree with the rules of the System. When a person sitting within arm's reach of you refuses to answer a question until you've put it through the ticketing system, then you are sitting next to a Systems Person. The IRS still makes people mail them checks for as little as $0.10, even though it costs them ten times that much to pay someone to open the envelope and process it
  11. All systems are an attempt to create an artificial intelligence
    The human brain works because it throws away most of its work. Systems fail because they never throw away any of their work

Further Reading

看多了以后 你的最终目标应该是心里有了一个大框架 一个基本的distributed system
是怎么搭起来的 然后心里有很多if condition 如果要是满足这个条件 我应该用什么
技术 比如如果read heavy那么用cache会提升performance之类的 同时知道应该避免什
么东西 比如避免single point of failure 再比如时间和空间的tradeoff在read
heavy的时候应该倾向于时间 Write heavy的时候倾向于空间等等

你总结出来的和我总结出来的大框架和if conditions肯定不完全一样 但因为system
design本来就是一个open ended question 所以不用害怕 能够自圆其说 就不会有问题

然后可以讲一下high level architecture,就是分成哪几个component,互相之间如果
之后面试官可能会让你深入某个component detail讨论;

另外,f家还喜欢让你估算机器之类的,做一些back-of-envelopme calculation。所以


3) behavior,建议大家了解一下fb的culture,准备一下常见的behavior questions,

System design设计手机上读取photo feeds的app。
功能: 读取好友的最近图片
要求: 满足功能的同时减少对手机的能耗。

1) 给你10g文件,1g内存,数总共有多少个不同的数,答案是用bit来记录数字,总共
4b个interger,最多用0.5gb来记录,follow up是如果只有400m怎么办,答案是把数字

System design
问如何实现 Facebook messenger
给出了一个hierarchical infrascture.

Design timeline的group权限,比如说user发一条status可以选择对某个group的好
graph的存储。感觉system design只要讲个大概思路就行,面试官不会去纠结太细节的

 一些和facebook相关的system design.网页上用markup language define了一些
object, how to store these objects, how to define relationship between
objects and users, how to search for relationship,
how to find recently listened song by one user, one song may be listend by the same user in
multiple times. etc.

 美国人, 系统设计,设计各系统能返回 top 10 listened songs from your

Given a location (a coordinate), return top 100 nearest places.
Follow up, given a location, return top 100 events within x months in 
nearest places。follow up 其实就是多加一个时间维度。



follow up的话就是多加一个dimension代表时间。

backup, cache等瞎说。边引导边回答。当面给的feedback都还算positive。


第一面system design. 先问怎么求submatrix的和,回答说先预先计算好 (0, 0), (i,
回答先确定bottleneck是cpu, disk io, 还是 network io. 然后针对每项他都详细问
给我加了另外一面system design. 最后还是挂在这一面上。

system design. 问的是 shorten url. 因为之前准备过这个题,所以回答应该


第二轮系统设计:让设计分布式的large scale的producer和consumer问题。就是有一
机的producer和consumer,实现produce和consume函数,其实就是相当于fix size的

个index,每个单词对应一个status update id的列表,查询结果是取列表的交集。我

个数的分布,也就是balls and bins的问题。不用coding。

Design a key-value store.
2. Design Google search
3. Architect a world-wide video distribution system
4. Build Facebook chat
5. Design News Feed

第一轮system design是一位很客气很nice的国人大哥:有很多台机器,设计一个类似
web crawler的一个系统。然后让我把每台机器上面跑的software的module diagram画

题目很简单就是设计news feed,上来我就想按系统设计的思路来讲,结果那哥跟我说
不管系统上的问题,就问我怎么拿Json来传数据生成一个news feed。然后画了一个
news feed, 就跟我们看到的facebook的news feed一样,然后问我news feed上面的各
比如 {
up, 但是我确定我写的很快而且一次下来bug free的但是还是只出了一个问题,这是


1. 入门级的news feed
一般的followup question是估算需要多少server

2. facebook chat,这个也算是挺常问的

3. typeahead search/search suggestion,这个也常见

4. Facebook Messaging System(有提到inbox search, which has been asked before)
messaging system就是一个把所有chat/sms/email之类的都结合起来的一个系统

5. 任给一个手机的位置信号(经纬度),需要返回附近5mile 的POI

6. Implement second/minute/hour/day counters
这题真不觉得是system design,但万一问道,还是要有准备,貌似在总部面试会被问

7. facebook photo storage,这个不太会被问起,但是知道也不错

8. facebook timeline,这个也不太是个考题,看看就行了

implement memcache

implement tinyurl(以及distribute across multiple servers)

determine trending topics(twitter)

copy one file to multiple servers

稍微知道一下dynamo key value store,以及google的gfs和big table

这个high scalability上有很多讲system design的东西,不光是facebook的,没空的
facebook engineering blog



engineering blog看起,把人家用的technology stack/product都搞清楚,然后在把能

1. product spec/usage scenario 和面试者confirm这个东西到底是做什么的
可以先列出来几个major functionality,然后有时间的话,再补充一些不重要的

2. define some major components

以上是question specific的东西,

一般的方法有:Write back cache。大概的意思就是 Client 只负责写给cache,cache自己去负责写给数据库,但不是每条都写回数据库,隔一段时间写回去。
其实对同一条记录发生大量并发写的情况是很少的,即便Facebook这样的级别,最挑战的并不是短时间很多写,而是短时间很多读,因为只要是给人用的东西,一定是读多于写。短时间很多人读,而此时正好cache里又被没有这个数据,这才是最头疼的。Facebook为了解决这个问题,拓展了memcached,增加了一个叫做 lease-get 的方法。具体原理和实现细节,Google搜索 “Facebook Memcached” :

No comments:

Post a Comment


Review (561) System Design (304) System Design - Review (196) Java (179) Coding (75) Interview-System Design (65) Interview (60) Book Notes (59) Coding - Review (59) to-do (45) Linux (40) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (29) Product Architecture (28) Big Data (27) Soft Skills (27) Concurrency (26) MultiThread (26) Miscs (25) Cracking Code Interview (24) Distributed (24) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) OOD Design (20) System Design - Practice (19) How to Ace Interview (16) Security (16) Algorithm (15) Brain Teaser (14) Google (14) Redis (14) Linux - Shell (13) Spark (13) Spring (13) Code Quality (12) How to (12) Interview-Database (12) Interview-Operating System (12) Tools (12) Architecture Principles (11) Company - LinkedIn (11) Solr (11) Testing (11) Resource (10) Search (10) Amazon (9) Cache (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Company - Uber (8) Interview - MultiThread (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Scalability (8) Trouble Shooting (8) Cassandra (7) Company - Facebook (7) Design (7) Git (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Machine Learning (7) NoSQL (7) C++ (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) API Design (4) Be Architect (4) Big Fata (4) C (4) Company Product Architecture (4) Data structures (4) Design Principles (4) Facebook (4) GeeksforGeeks (4) Generics (4) Google Interview (4) Hardware (4) JDK8 (4) Optimization (4) Product + Framework (4) Puzzles (4) Python (4) Shopping System (4) Source Code (4) Web Service (4) node.js (4) Back-of-Envelope (3) Chrome (3) Company - Pinterest (3) Company - Twiiter (3) Company - Twitter (3) Consistent Hash (3) Elasticsearch (3) GOF (3) Game Design (3) GeoHash (3) Growth (3) Guava (3) Html (3) Interview-Big Data (3) Interview-Linux (3) Interview-Network (3) Java EE Patterns (3) Javarevisited (3) Map Reduce (3) Math - Probabilities (3) Performance (3) RateLimiter (3) Resource-System Desgin (3) Scala (3) UML (3) ZooKeeper (3) geeksquiz (3) AI (2) Advanced data structures (2) AngularJS (2) Behavior Question (2) Bugs (2) Coding Interview (2) Company - Netflix (2) Crawler (2) Cross Data Center (2) Data Structure Design (2) Database-Shard (2) Debugging (2) Docker (2) Garbage Collection (2) Go (2) Hadoop (2) Interview - Soft Skills (2) Interview-Miscs (2) Interview-Web (2) JDK (2) Logging (2) POI (2) Papers (2) Programming (2) Project Practice (2) Random (2) Software Desgin (2) System Design - Feed (2) Thread Synchronization (2) Video (2) reddit (2) Ads (1) Algorithm - Review (1) Android (1) Approximate Algorithms (1) Base X (1) Bash (1) Books (1) C# (1) CSS (1) Client-Side (1) Cloud (1) CodingHorror (1) Company - Yelp (1) Counter (1) DSL (1) Dead Lock (1) Difficult Puzzles (1) Distributed ALgorithm (1) Eclipse (1) Facebook Interview (1) Function Design (1) Functional (1) GoLang (1) How to Solve Problems (1) ID Generation (1) IO (1) Important (1) Internals (1) Interview - Dropbox (1) Interview - Project Experience (1) Interview Stories (1) Interview Tips (1) Interview-Brain Teaser (1) Interview-How (1) Interview-Mics (1) Interview-Process (1) Java Review (1) Jeff Dean (1) Joda (1) LeetCode - Review (1) Library (1) LinkedIn (1) LintCode (1) Mac (1) Micro-Services (1) Mini System (1) MySQL (1) Nigix (1) NonBlock (1) Process (1) Productivity (1) Program Output (1) Programcreek (1) Quora (1) RPC (1) Raft (1) Reactive (1) Reading (1) Reading Code (1) Refactoring (1) Resource-Java (1) Resource-System Design (1) Resume (1) SQL (1) Sampling (1) Shuffle (1) Slide Window (1) Spotify (1) Stability (1) Storm (1) Summary (1) System Design - TODO (1) Tic Tac Toe (1) Time Management (1) Web Tools (1) algolist (1) corejavainterviewquestions (1) martin fowler (1) mitbbs (1)

Popular Posts