Tuesday, April 5, 2016

System Design Misc Part 2



http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html
http://venkateshcm.com/2014/05/Architecture-Issues-Scaling-Web-Applications/
Performance
Term performance of web application is used to mean several things. Most developers are primarily concerned with response time and scalability.
  • Response Time
    Is the time taken by web application to process request and return response. Applications should respond to requests (response time) within acceptable duration. If application is taking beyond the acceptable time, it is said to be non-performing or degraded.
  • Scalability
    Web application is said to be scalable if by adding more hardware, application can linearly take more requests than before. Two ways of adding more hardware are
    • Scaling Up (vertical scaling) :– increasing the number CPUs or adding faster CPUs on a single box.
    • Scaling Out (horizontal scaling) :– increasing the number of boxes.
NoSQL
CAP theorem has shown that is not possible to get Consistency, Availability and Partition tolerance simultaneously. NoSql databases usually compromise on consistency to get high availability and partition.
Splitting database
Database can be split vertically (Partitioning) or horizontally (Sharding).
  • Vertically splitting (Partitioning) :– Database can be split into multiple loosely coupled sub-databases based of domain concepts. Eg:– Customer database, Product Database etc. Another way to split database is by moving few columns of an entity to one database and few other columns to another database. Eg:– Customer database , Customer contact Info database, Customer Orders database etc.
  • Horizontally splitting (Sharding) :– Database can be horizontally split into multiple database based on some discrete attribute. Eg:– American Customers database, European Customers database.
Architecture bottlenecks
Scaling bottlenecks are formed due to two issues
  • Centralised component A component in application architecture which can not be scaled out adds an upper limit on number of requests that entire architecture or request pipeline can handle.
  • High latency component A slow component in request pipeline puts lower limit on the response time of the application. Usual solution to fix this issue is to make high latency components into background jobs or executing them asynchronously with queuing.
CPU Bound Application
An application is said to be CPU bound if application throughput is limited by its CPU. By increasing CPU speed application response time can be reduced.
Few scenarios where applications could be CPU Bound
  • Applications which are computing or processing data with out performing IO operations. (Finance or Trading Applications)
  • Applications which use cache heavily and don’t perform any IO operations
  • Applications which are asynchronous (i.e. Non Blocking), don’t wait on external resources. (Reactive Pattern Applications, NodeJS application)
In the above scenarios application is already working in efficiently but in few instances applications with badly written or inefficient code which perform unnecessary heavy calculations or looping on every request tend to show high CPU usage. By profiling application it is easy to figure out the inefficiencies and fix them.
These issues can be fixed by
  • Caching precomputed values
  • Performing the computation in separate background job.
Different ways of caching, how caching can reduce load and improve performance and scalability of web applications is covered in post Caching To Scale Web Applications
IO Bound Application
An application is said to be IO bound if application throughput is limited by its IO or network operations and increasing CPU speed does not bring down application response times. Most applications are IO bound due to the CRUD operation in most applications Performance tuning or scaling IO bound applications is a difficult job due to its dependency on other systems downstream.
Few scenarios where applications could be IO Bound
  • Applications which are depended on database and perform CRUD operations
  • Applications which consume drown stream web services for performing its operations
http://www.yacoset.com/Home/system-architecture-tips
Resist creating new systems. See if an old system will suffice
Old systems have already been debugged, and new systems will always bring new problems

When we say "if it ain't broke, don't fix it," we don't mean you should ignore things like new business opportunities, new media, changing culture, etc. However, if you're going to plunge into creating a new system, focus on building modular systems that perform the minimum tasks necessary and that interact with other sub-systems. You'll gift your descendants with more options to stick with the old parts that still work, while only having to replace the few pieces that they must.

Don't try to fix a broken system by adding more parts
  1. Eliminate points of self-reference
    Avoid systems that modify their own source code or data structures (including saving SQL in table cells), that give themselves their own "Partner ID", that invoke their own API in the process of providing that API, etc. These risk a positive feedback loop that make the system unstable. You are not building an AI. Distinguish between hierarchical/stack designs and these strange loops
  2. Design a system to run downhill
  1. Don't reject conflicting facts, store and report them
    Say your system gets a notification that a package has shipped, but the order is marked as cancelled. If you reject the notification because it conflicts with internal state, then you'll lose information. If you think it's okay because you fired off an error report to somebody, what happens when it gets lost? The system's state will never agree with reality. Store conflicting data and provide tools for discovering it and fixing it. There is no such thing as a single version of the truth
  2. Automate the mundane, not the exceptional (Don't Build Airtight Systems)
  1. Every input to the system should be logged in a way that can be played back like a tape to recreate the stock
  2. Never save state in a file format that only one program can understand
    You need to be able to inspect, tweak and re-process data with a variety of tools, not just the original document creator. This is why XML is so popular, in spite of itself
  3. Be zealous with the scalpel
    Customer databases, product catalogs, accounting, inventory, Orders-to-Cash (order management) and Point of Sale should all be cut out into separate systems with separate stores, developed independently. Make them talk to each other SOA-style. If it looks like it's a separate concern, then it is. If it turns out you were wrong, and the service shouldn't stand on its own, then at least that component is already modularized and easy to integrate into wherever it belongs
  4. Choosing the right name is everything
    Having intuitive, accurate, descriptive names for a system's parts is equally as important as what those parts do. But there's more: choosing a name can make the thing. The wrong name can lead you to develop the wrong solution by making you picture the wrong idea
  5. Know the difference between experience and speculationFocusing on the long term implies that you can predict the future. Engineering for speculated needs, rather than needs you know about from experience, is hubris: systems invent their own needs and you cannot know what they'll be until the system needs them. It is better to pay the cost of refactoring and re-engineering later than it is to pay for the wrong optimizations now
  6. Do not turn the system's artifacts into assets
    It must be possible to jettison all or part of a system's pieces without incurring a loss. Do not try to convert any part of it into a product. Your people--after they have come to understand the problem and ways of solving it--are the real asset
  1. Systems in general work poorly, or not at all
    Someone who knows what they're doing will always beat an organization (or an ERP system) that's following fixed rules
  2. All complex systems that work invariably evolved from simple systems that worked
    Never build a large and complex system from scratch. Never assume you can make a broken system work by making it more complex
  3. Complex systems always present unexpected behavior
    This is the reason why you must not build an airtight system. If the system is sealed, then you won't be able to deal with the unexpected
  4. A "complex" system has 2 or more stocks
    Think of complexity as an exponential effect. Each new part increases complexity a little, but it's the stocks that double it. A database is simple. A database with a cache is complex
  5. New systems that replace old systems always bring new problems
    Supporting Unicode in URLs made web addresses friendlier to other languages, but gave scammers the ability to spoof "paypal.com" by registering the domain name using a Cyrillic letter "a"
  6. Fail-Safe systems fail by failing to fail safe
    The fail-safe valve on the top of the reactor vessel at Three Mile Island failed in the safe (open) state, but the sensor which told the control room it was open failed by reporting that it was shut. Hilarity ensued
  7. The divergence between the outputs of a system, and what the system was intended to output, increase with complexity
    Supermarket Breakfast Cereal is not cereal, it's baked pieces of injection-molded paste. Supermarket Bread is not bread, it's baked starch foam. A fast-food milkshake is not a milk shake, it's whipped corn sugar. Cinnamon isn't cinnamon, it's a cheaper spice called Cinnamomum burmanni. Do not assume that just because the System calls it a "Table", or a "Transaction", or a "Customer", or a "Required Ship Date", that it actually is one
  8. Complex systems tend to oppose their own intended function (Le Chatelier's Principle)
    The widespread use of antibiotics has resulted in the creation of new antibiotic-resistant germs. Nasal sprays, when overused, cause a rebound effect that's worse than the original stuffy nose. Your accounting system will inevitably make it impossible to record certain kinds of transactions
  9. Reality, to a system, is whatever is reported to it
    Don't pass data through an interpretation phase before presenting it to the system. Give the system tools to interpret data after it's been encoded and stored, and don't change that data under its nose without telling it why at the same time
  10. Systems attract Systems People
    A Systems Person won't see how absurd a procedure is as long as it happens to agree with the rules of the System. When a person sitting within arm's reach of you refuses to answer a question until you've put it through the ticketing system, then you are sitting next to a Systems Person. The IRS still makes people mail them checks for as little as $0.10, even though it costs them ten times that much to pay someone to open the envelope and process it
  11. All systems are an attempt to create an artificial intelligence
    The human brain works because it throws away most of its work. Systems fail because they never throw away any of their work

Further Reading

https://github.com/WeizhengZhou/leetcode3/blob/master/systemDesign/mySystemDesignSummary.txt
看多了以后 你的最终目标应该是心里有了一个大框架 一个基本的distributed system
是怎么搭起来的 然后心里有很多if condition 如果要是满足这个条件 我应该用什么
技术 比如如果read heavy那么用cache会提升performance之类的 同时知道应该避免什
么东西 比如避免single point of failure 再比如时间和空间的tradeoff在read
heavy的时候应该倾向于时间 Write heavy的时候倾向于空间等等

你总结出来的和我总结出来的大框架和if conditions肯定不完全一样 但因为system
design本来就是一个open ended question 所以不用害怕 能够自圆其说 就不会有问题

https://github.com/WeizhengZhou/leetcode3/blob/master/systemDesign/facebookSystemDesign.txt
解这种题是个*交流*的过程,或者说是给出方案然后获取反馈的不断循环的过程。
一般的流程:
首先你要问清楚requirement;
然后可以讲一下high level architecture,就是分成哪几个component,互相之间如果
interact,在白板上画一画;
之后面试官可能会让你深入某个component detail讨论;
也有可能变换requirement让你重新设计

另外,f家还喜欢让你估算机器之类的,做一些back-of-envelopme calculation。所以
最好对一些计算机相关的基本常数,fb的用户量等等有个大概的了解。

准备的时候建议看看fb的design高频题。一方面有可能面试的时候刚好碰到这几个
topic,另一方面其实很多design都是相通的。

3) behavior,建议大家了解一下fb的culture,准备一下常见的behavior questions,
面试之前rehearsal一下。

System design设计手机上读取photo feeds的app。
功能: 读取好友的最近图片
阅览好友的相册
要求: 满足功能的同时减少对手机的能耗。

1) 给你10g文件,1g内存,数总共有多少个不同的数,答案是用bit来记录数字,总共
4b个interger,最多用0.5gb来记录,follow up是如果只有400m怎么办,答案是把数字
hash一下或者说scan文件多次,每次取尾数bits不一样的数,不用code

System design
问如何实现 Facebook messenger
给出了一个hierarchical infrascture.

Design timeline的group权限,比如说user发一条status可以选择对某个group的好
友可见。题目很简单,但是会讨论到facebook用户规模的估算,服务器估算,social 
graph的存储。感觉system design只要讲个大概思路就行,面试官不会去纠结太细节的
东西。

 一些和facebook相关的system design.网页上用markup language define了一些
object, how to store these objects, how to define relationship between
objects and users, how to search for relationship,
how to find recently listened song by one user, one song may be listend by the same user in
multiple times. etc.

 美国人, 系统设计,设计各系统能返回 top 10 listened songs from your
friends.

Given a location (a coordinate), return top 100 nearest places.
Follow up, given a location, return top 100 events within x months in 
nearest places。follow up 其实就是多加一个时间维度。

提出的方案就是对平面坐标系做grid,每个grid里的locations放到一台机器上。搜索
的时候就是针对input的location找到候选的grids(以某个半径画个圆),再从中通过
map-reduce找到前100个location。 

可以根据grid里location的密度或者访问量决定是不是要再做partition以提高
scalability。

follow up的话就是多加一个dimension代表时间。


白人,设计一个在线图片编辑系统,完全没有经验,只能按版上的partition,
backup, cache等瞎说。边引导边回答。当面给的feedback都还算positive。


第四轮:design,设计手机系统,可以查看周围的好友,饭店,电影院等等

第一面system design. 先问怎么求submatrix的和,回答说先预先计算好 (0, 0), (i,
j)的和,然后可以用这个和求其它的和。以为他会顺着这个问数据大了怎么design
system,结果没有,问了个跟这个题毫不相关的,怎么检测一个程序为什么慢。然后就
回答先确定bottleneck是cpu, disk io, 还是 network io. 然后针对每项他都详细问
怎么做。交流的过程中有时候没太明白他的问题吧。反正这一面的结果很不好,当天就
给我加了另外一面system design. 最后还是挂在这一面上。

system design. 问的是 shorten url. 因为之前准备过这个题,所以回答应该
是非常好的。面试官没问我是否见过这个题,我也就没说我准备过了。

电面一面:
给一堆F的用户,以及朋友关系,朋友之间的关系是双向的。问能否将朋友的关系图分
成两个partition。使得任何有直接朋友关系的两个人必须处在不同的partition里。


第二轮系统设计:让设计分布式的large scale的producer和consumer问题。就是有一
堆机器是producer,一堆机器是consumer。后来顺便写了一道coding题,范围变成是单
机的producer和consumer,实现produce和consume函数,其实就是相当于fix size的
cache的add和pop问题,不用考虑多线程


 一个看上去很强壮的老美,广告组的,问设计题。FB用户每天发非常多的status
update,要求设计一个系统,能够对最近几天内的update进行关键字搜索。我回答建一
个index,每个单词对应一个status update id的列表,查询结果是取列表的交集。我
对大数据处理完全没经验,不清楚这轮会被鄙视到什么程度,反正从结果看是pass了……

含有大量URL的文件里查找频率最高的K个URL。先给单机哈希表的解法,内存不够的
情况,给了按哈希值把大文件拆成小文件的解法。接着被问并行化,给了MapReduce的
解法。接着被问哈希表相关的计算题,M个slots的哈希表(哈希值范围是1~M,用链表
处理冲突),往里放了N个元素,假设他们的哈希值是随机的均匀分布,问slots里元素
个数的分布,也就是balls and bins的问题。不用coding。


Design a key-value store.
2. Design Google search
3. Architect a world-wide video distribution system
4. Build Facebook chat
5. Design News Feed


第一轮system design是一位很客气很nice的国人大哥:有很多台机器,设计一个类似
web crawler的一个系统。然后让我把每台机器上面跑的software的module diagram画
出来。然后还给了上下行带宽,让估算时间。


题目很简单就是设计news feed,上来我就想按系统设计的思路来讲,结果那哥跟我说
不管系统上的问题,就问我怎么拿Json来传数据生成一个news feed。然后画了一个
news feed, 就跟我们看到的facebook的news feed一样,然后问我news feed上面的各
个组件要在客户端生成json里面的数据是怎么样的。
比如 {
    Name:
    Image:
    Like:
    Comment:
    .
    .
    .
}
然后整个过程就是讨论JSON里面的具体数据应该是什么样的。
另外还有一个问题就是coding的那几轮都只出了一个问题其中有一轮甚至都没follow
up, 但是我确定我写的很快而且一次下来bug free的但是还是只出了一个问题,这是
正常还是想黑我的节奏。。。

稍微总结一下

1. 入门级的news feed
http://www.quora.com/What-are-best-practices-for-building-somet
http://www.infoq.com/presentations/Scale-at-Facebook
http://www.infoq.com/presentations/Facebook-Software-Stack
一般的followup question是估算需要多少server
另外这个帖子有讨论
http://www.mitbbs.ca/article_t/JobHunting/32463885.html
这篇文章稍微提到要怎么approach这种题,可以稍微看看
http://book.douban.com/reading/23757677/


2. facebook chat,这个也算是挺常问的
http://www.erlang-factory.com/upload/presentations/31/EugeneLet
https://www.facebook.com/note.php?note_id=14218138919
http://www.cnblogs.com/piaoger/archive/2012/08/19/2646530.html
http://essay.utwente.nl/59204/1/scriptie_J_Schipers.pdf

3. typeahead search/search suggestion,这个也常见
https://www.facebook.com/video/video.php?v=432864835468
问题在这个帖子里被讨论到,基本上每个问题,在视频里都有回答
http://www.mitbbs.com/article_t/JobHunting/32438927.html


4. Facebook Messaging System(有提到inbox search, which has been asked before)
messaging system就是一个把所有chat/sms/email之类的都结合起来的一个系统
http://www.infoq.com/presentations/HBase-at-Facebook
http://sites.computer.org/debull/A12june/facebook.pdf
http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase/
https://www.youtube.com/watch?v=UaGINWPK068


5. 任给一个手机的位置信号(经纬度),需要返回附近5mile 的POI
这个这里有讨论,这题貌似nyc很爱考...
http://www.mitbbs.ca/article0/JobHunting/32476139_0.html


6. Implement second/minute/hour/day counters
这题真不觉得是system design,但万一问道,还是要有准备,貌似在总部面试会被问
道....
这个帖子有讨论
http://www.mitbbs.com/article_t/JobHunting/32458451.html


7. facebook photo storage,这个不太会被问起,但是知道也不错
https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf
https://www.facebook.com/note.php?note_id=76191543919


8. facebook timeline,这个也不太是个考题,看看就行了
https://www.facebook.com/note.php?note_id=10150468255628920
http://highscalability.com/blog/2012/1/23/facebook-timeline-bro


除了这些,准备一下这些题目
implement memcache
http://www.adayinthelifeof.nl/2011/02/06/memcache-internals/

implement tinyurl(以及distribute across multiple servers)
http://stackoverflow.com/questions/742013/how-to-code-a-url-sho

determine trending topics(twitter)
http://www.americanscientist.org/issues/pub/the-britney-spears-
http://www.michael-noll.com/blog/2013/01/18/implementing-real-t

copy one file to multiple servers
http://vimeo.com/11280885

稍微知道一下dynamo key value store,以及google的gfs和big table


另外推荐一些网站
http://highscalability.com/blog/category/facebook
这个high scalability上有很多讲system design的东西,不光是facebook的,没空的
话,就光看你要面试的那家就好了..
facebook engineering blog
http://www.quora.com/Facebook-Engineering/What-is-Facebooks-arc
http://stackoverflow.com/questions/3533948/facebook-architectur

其他家的
http://www.quora.com/What-are-the-top-startup-engineering-blogs


==================================================================
在说说怎么准备这样的面试
首先如果你连availability/scalability/consistency/partition之类的都不是太有概
念的话,我建议先去wikipedia或者找一个某个大学讲这门课的网站稍微看一下,别一
点都不知道
这个链接也不错
http://www.aosabook.org/en/distsys.html

如果你这些基本的东西都还知道,那么我觉得你就和大部分毫无实际经验的人差不多一
个水平...
能做的就是一点一点去准备,如果你还有充足的时间的话,建议从你面试的那家公司的
engineering blog看起,把人家用的technology stack/product都搞清楚,然后在把能
找到的面试题都做一遍呗....我们做coding题说白了不也是题海战术...而且你如果坚
持看下去,真的会看出心得,你会发现很多地方都有相同之处,看多了就也能照葫芦画
瓢了...

再有就是面试的时候应该怎么去approach这种题,我说说我的做法
1. product spec/usage scenario 和面试者confirm这个东西到底是做什么的
可以先列出来几个major functionality,然后有时间的话,再补充一些不重要的
把你想的都写下来

2. define some major components
就是画几个圈圈框框的,每个发表一番您的高见....然后讲他们之间怎么interact

以上是question specific的东西,
这个讲完了,我们可以讲一些每道题都是用的,比如说
怎么scale/怎么partition/怎么实现consistency,这些东西,可以套用到任何题上

http://www.jiuzhang.com/qa/645/
设计课上讲的数据库中针对同一记录发生大量并发写的问题,不太明白,一般有哪些优化方法呢?
一般的方法有:Write back cache。大概的意思就是 Client 只负责写给cache,cache自己去负责写给数据库,但不是每条都写回数据库,隔一段时间写回去。
其实对同一条记录发生大量并发写的情况是很少的,即便Facebook这样的级别,最挑战的并不是短时间很多写,而是短时间很多读,因为只要是给人用的东西,一定是读多于写。短时间很多人读,而此时正好cache里又被没有这个数据,这才是最头疼的。Facebook为了解决这个问题,拓展了memcached,增加了一个叫做 lease-get 的方法。具体原理和实现细节,Google搜索 “Facebook Memcached” :

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts