Thursday, July 23, 2015

Design Interview Questions Miscs



http://dongxicheng.org/search-engine/system-designing-in-finging-jobs/
(1) 要求设计一个DNS的Cache结构,要求能够满足每秒5000以上的查询,满足IP数据的快速插入,查询的速度要快。(题目还给出了一系列的数据,比如:站点数总共为5000万,IP地址有1000万,等等)
(2) 有N台机器,M个文件,文件可以以任意方式存放到任意机器上,文件可任意分割成若干块。假设这N台机器的宕机率小于1/3,想在宕机时可以从其他未宕机的机器中完整导出这M个文件,求最好的存放与分割策略。
(3) 假设有三十台服务器,每个上面都存有上百亿条数据(有可能重复),如何找出这三十台机器中,根据某关键字,重复出现次数最多的前100条?要求用Hadoop来做。
(4) 设计一个系统,要求写速度尽可能高,说明设计原理。
(5) 设计一个高并发系统,说明架构和关键技术要点。
(6) 有25T的log(query->queryinfo),log在不段的增长,设计一个方案,给出一个query能快速反回queryinfo

  1) 关于高并发系统设计。主要有以下几个关键技术点:缓存,索引,数据分片,锁粒度尽可能小。
(2) 问题2涉及到现在通用的分布式文件系统的副本存放策略。一般是将大文件切分成小的block(如64MB)后,以block为单位存放三份到不同的节点上,这三份数据的位置需根据网络拓扑结构配置,一般而言,如果不考虑跨数据中心,可以这样存放:两个副本存放在同一个机架的不同节点上,而另外一个副本存放在另一个机架上,这样从效率和可靠性上,都是最优的(这个google公布的文档中有专门的证明,有兴趣的可参阅一下。)。如果考虑跨数据中心,可将两份存在一个数据中心的不同机架上,另一份放到另一个数据中心。
(3)问题4涉及到BigTable的模型。主要思想是将随机写转化为顺序写,进而大大提高写速度。具体是:由于磁盘物理结构的独特设计,其并发的随机写(主要是因为磁盘寻道时间长)非常慢,考虑到这一点,在BigTable模型中,首先会将并发写的大批数据放到一个内存表(称为“memtable”)中,当该表大到一定程度后,会顺序写到一个磁盘表(称为“SSTable”)中,这种写是顺序写,效率极高。说到这,可能有读者问,随机读可不可以这样优化?答案是:看情况。通常而言,如果读并发度不高,则不可以这么做,因为如果将多个读重新排列组合后再执行,系统的响应时间太慢,用户可能接受不了,而如果读并发度极高,也许可以采用类似机制。
http://www.shuatiblog.com/blog/2015/02/05/winning-game-rank-pagerank/
We have a history of match result of pingpong games, assume each match is <player1, player2, result, timestamp>, player1 and player2 are long type, result is a bit value (1 means player1 win).

design a algorithm to sort the players by their possibility to win future games.

Solution
For each play,
winning score = score_diff * {delay factorcurrent time – winning time};

Multithreading Async Increment Problem
link
If two threads are incrementing a variable 100 times each without synchronization, what would be the possible min and maximum value.

Solution
Well, max is init+200, and min is init+2.
P1 & P2 copy var
P1 increments 99 times. so var becomes var + 99
P2 increments once. so var becomes var + 1
P1 copies var (value is var + 1)
P2 increments 99 times. so var becomes var + 100
P1 increments once. so var becomes var + 2
http://stackoverflow.com/questions/16056462/minimum-and-maximum-value-of-a-global-variable-with-2-thread
if you translate increment code to assembly it's pseudocode will be like:
1-mov ax,mem[c]
2-inc ax
3-mov mem[c],ax
if we have 2 thread consider this scenario:
thread 1: line 1
thread 2: line (1-2-3) for 99 times
thread 1: line (2-3)
thread 2: line 1
thread 1: line (1-2-3) for remaining 99 times
thread 2: line (2-3) fro the last time
now the value of c is 2 so minimum is 2
https://hellosmallworld123.wordpress.com/2014/10/10/google/
6. system design,设计一个键盘的输入法,不需要一个一个按,而是快速的滑动,怎样设计,怎样给suggestion
给suggestion应该涉及到用trie的数据结构。此外如果用trie的话还要注意排序的问题,如果trie要排序的话还要记下每个char的频率。
https://hellosmallworld123.wordpress.com/2014/09/09/f-onsite-interview/
Design a system that find the places near given coordinates. What if there are billions of places.
The billionth query int a day will win a car
https://hellosmallworld123.wordpress.com/2014/05/13/object-oriented-design-problems/
There are three basic relations, inheritance (arrow), association(line) and aggregation (diamond solid or shallow).
Here are some problem from the Crack the coding interview.
1. Design the data structures for a generic deck of cards. Explain how you would subclass it to implement particular card games.
First we decide what classes we will have. Say if we are implementing a blackjack game.
We need to have a card class, which will have the suit and the value. suit will come from a enum class and value is just an integer value (notice that the Ace and King Queen and Jack might need specific treatment). Then we will have a Deck class which will aggregate with the card class. Then we will need a hand class that will handle the distribution of cards and count the scores.
2. Imagine you have a call center with three levels of employees: fresher, technical lead (TL), product manager (PM). There can be multiple employees, but only one TL or PM. An incoming telephone call must be allocated to a fresher who is free. If a fresher can’t handle the call, he or she must escalate the call to technical lead. If the TL is not free or not able to handle it, then the call should be escalated to PM. Design the classes and data structures for this problem. Implement a method getCallHandler().
First the fresher, TL and PM have common entries such as name, salary, address, that can be put into a common subclass, each of them will also have a rank which differ them from each other. The callHandler will take the call and put all the call in a queue, and distribute the call according to the rank to different employee.
3.Design a musical juke box (MP3 player like ipod) using object oriented principles
First lets break down the components:
Player itself.
Flash that stores songs.
Display that display the info about the song
Then we further break down the components to functions
PlayList that queue up the song will be played
4. Design a chess game using object oriented principles
We need a chess board to record the name, and a handler to handle the input of the users, Then each chess piece will have some common attributes so they can be put into a basic class while each of them can inherit the base class and add their own attributes.
设计文件系统

2. 数据结构for spreadsheet
3. 一个app需要用cache,怎么实现thread safe
4. social network, billions id, every id has about 100 friends roughly, what is
max connections between any two ppls. write algorithm to return min
connections between two ids: int min_connection(id1, id2)
you can call following functions
expand(id) return friends list of id
expandall(list) return friends union of all the ids in the list
intersection(list1, list2) return intersection
removeintersection(list1, list2)
5. Open google.com, you type some words in the edit box to search something, it will return lots of search results. Among these returning results (that is, website link), you may only CLICK some results that are interesting to you. The system will record the “CLICK”action. Finally, you will have the search results (i.e. url) and “CLICK” informatin at hand.
Question: how do you find the similarity of these searching?
6. 如何找出最热门的话题(根据tweets)。如果一个话题一直热门,我们不想考虑怎么办
7. Discuss design challenges of a distributed web crawler running on commercial PCs. How to utilize network bandwidth of those PCs efficiently?
8. Design a site similar to tinyurl.com
9. large log file,含有 customer id, product id, time stamp想得到在某一天中某个custom看网页的次数1. 足够memory 2. limited memory
10. 设计一个actor和movie的数据库的schema, 支持从movie得到它的actors和从actor得到ta出现过的moive (Google, phone, 2006)
11. 某建筑有五十层高,打算装俩电梯,设计该电梯系统
12. how to design facebook’s newsfeed?
13. 一个文件里n行m列,每行是一个record,每列一个feature,你时不时要按不同feature排序和查找。不能用数据库,文件大小内存能装下,数据结构和算法不限,语言不限,给出你最好的办法。
14. Design online game
15. static 变量用来在整个class中共享数据.基于此,各种synchronization技术, 以及busy waiting的优缺点,啥时候要用基于busy waiting的 spinlock主要是基于性能的探讨。 如果有一个应用程序运行时没有达到timing constraint, 你如何去分析问题出在哪儿, 可以用什么工具或者技术。
16. 设计题, 有一个多台机器构成的cluster。 现在有大量公司的数据文件 (并有多个备份)。 如果设计一个算法,使得每台机器尽量均衡的使用,并且 每个公司文件的不同copy不能存在于同一台机器上。主要的Idea就是用round-robin的方式assign每个公司的原数据文件到一台机器,再结合使用hashtable。 Interviewer提到我的解法正是他现在在使用的解法。
17. Design a class providing lock function which provide lock only if it sees there are no possible deadlocks.
18. 设计一个分布式文件系统,给定path name,可以读写文件。具体的system design这里就不提了。其中一个细节是,给定path name,怎么知道哪个node拥有这个文件。我提出需要实现一个lookup function,它可以是一个hash function,也可以是一个lookup table。如果是lookup table,为了让所有client sync,可以考虑额外做一个lookup cluster。然后Interviewer很纠结,既然可以用hash function,为什么还搞得那么复杂。我就告诉他hash function的缺点。假定一开始有N个node,hash function把M个文件uniformly distribute到N个node上。某天发现capacity不够,加了一个node。首先,要通知所有的client machine,configuration 改变了。如果不想重启client machine的process,这不是一个trivial job。其次,文件到node的mapping也变了。比如,本来按照hash function,一个文件是放在node 1。加了一个node 后,它可能就map到node 2了。平均来说,N/(N
+1)的文件需要move到新的node。这个data migration还是很大的。然后我就提出一些hash function的design,可以减少data migration。最后他提了一个问题,说要实现一个function,要统计distributed file system所有目录的大小。前提是,一个目录下的文件可能放在不同的node上。我说这个不就是在每个node上统计,然后发到一个merge吗。他说对,但是又问用什么data structure来表示。我说这就是hash table,key就是directory name,value就是大小。因为directory本身是树结构,这个hash table的key可以用tree来组织。最后让我实现一个function,把我说得这个data structure serialize成byte array。因为这个byte array就是网络传输的data。我用了depth first traverse。不过等我程序写完,才发现,用breath first traverse会更方便,code也会很简洁
19. 超大图的存储问题
20. 给个Lock w/ two atomic method lock() and unlock(),请用lock实现一个文件读写的系统,要求:
1: reader blocks writer;
2: writer blocks reader;
3: writer blocks writer;
21。设计一个web cache server,假设存储网页数量是10个billion,打算怎么设计
22.你可以得到网站访问记录,每条记录有user IP, 写一个程序,要随时能算出过去5分钟内访问次数最多的1000个IP. 这个好像跟着这个rolling window 的precision 有关,所以我们暂且定为5秒钟update 一次window
23. Design free and malloc.
24. how to design data structures for a facebook network and how to design an algorithm to find connection? How to optimize it if data is distributed into multiple computers?
25. design a deck class and member function to randomly select a card from those cards which haven’t been selected before. (You can assume the number of this function call will never be larger than the number of cards) For example, we have a deck of four card: 1,2,3,4. First it may select 3, then next time it should randomly select one from 1,2,4… And design a member function to reset.
26. google search design problem. How to distribute data and how to design backup system
27. 设计一个online chat system
28. design bit.ly url shortening web service。算法设计,后端存储,中间层cache,前端load balance,最后是web analytics。
29. Design and implement an algorithm that would correct typos: for example, if an extra letter is added, what would you do?
30. Suppose there are 2 persons A and B on FB . A should be able to view the pictures of B only if either A is friend of B or A and B have at least one common friend . The interviewer discussed it for nearly 30 minutes . The discussion mainly included following points:
1. How are you going to store the list of friends for a given user?
2. File system vs DB
3. Given list of friends of 2 users, how are you going to find common friends?
4. If you are going to store the friends in DB then how will the table look like?
5. How many servers do you need?
6. How are you going to allocate work to servers?
7. How many copies of data will you need?
8. What problems will you face if you are maintaining multiple copies of data.
31. design structure for auto completion
32. 如何实现search suggestions。
33. 设计fb的系统支持like那个button
34. design 股票#,time,price;
-设计一个client side显示股票信息,给出尽可能多的user case
-在给出的user case里面,怎么设计客户端,使得客户段性能提高
-怎么设计server端
-数据如何传输
-server端如何保存数据
-怎么设计database table保存数据
-不用index怎么提高数据查询速度
-database是怎么实现数据查询的(要求从database implementation角度解释)


Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts