Tuesday, July 5, 2016

[CareerCup] 10.1 Client-facing Service - Grandyang - 博客园
10.1 Imagine you are building some sort of service that will be called by up to 1000 client applications to get simple end-of-day stock price information (open, close, high, low). You may assume that you already have the data, and you can store it in any format you wish. How would you design the client-facing service which provides the information to client applications? You are responsible for the development, rollout, and ongoing monitoring and maintenance of the feed. Describe the different methods you considered and why you would recommend your approach. Your service can use any technologies you wish, and can distribute the information to the client applications in any mechanism you choose.

This is a design question: we have some end-of-day stock data that up to 1000 client applications will access, and we need to design the client-facing service that provides it. As the book describes, the service should be easy for clients to use, easy for us to build and maintain, flexible enough to accommodate future changes, and efficient and scalable. There are three main approaches:
1. Plain text files. This is the simplest option: clients download a text file from an FTP server. It simplifies maintenance to some extent, since text files are easy to view and back up, but the files are awkward to access and parse, especially once new data is added.
2. A SQL database that clients query directly. The advantages: we can use the database's powerful query facilities to answer conditional searches, the data can be rolled back and backed up easily, and it is easy for clients to integrate into their existing applications. The drawbacks: it is probably overkill, we now have a whole SQL database to maintain, we need to implement additional layers to view and maintain the data, and although databases are fairly secure, we must make sure clients cannot access data they should not touch.
3. XML. If our data has a fixed format and fixed size, e.g. company_name, open, high, low, closing price, then the XML might look like this:
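```xml
<!-- A reconstruction of the sample feed described above; tag names are illustrative. -->
<root>
  <date value="2016-07-05">
    <company name="foo">
      <open>126.23</open>
      <high>130.27</high>
      <low>122.83</low>
      <closingPrice>127.30</closingPrice>
    </company>
    <company name="bar">
      <open>52.73</open>
      <high>53.04</high>
      <low>51.90</low>
      <closingPrice>52.85</closingPrice>
    </company>
  </date>
</root>
```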

The advantages of XML: it is easy to distribute and can be read by both machines and humans; most programming languages have XML parsers; new data can be added to the file without breaking existing parsers; and we can back it up with existing tools. The drawbacks: clients receive the entire dataset even if they only want part of it, which is inefficient, and any query over the data requires parsing the whole file.

Client Ease of Use: We want the service to be easy for the clients to implement and useful for them.
Ease for Ourselves: This service should be as easy as possible for us to implement, as we shouldn't impose unnecessary work on ourselves. We need to consider in this not only the cost of implementing, but also the cost of maintenance.
Flexibility for Future Demands: This problem is stated in a "what would you do in the real world" way, so we should think like we would in a real-world problem. Ideally, we do not want to overly constrain ourselves in the implementation, such that we can't be flexible if the requirements or demands change.
Scalability and Efficiency: We should be mindful of the efficiency of our solution, so as not to overly burden our service.
https://github.com/filipegoncalves/interview-questions/blob/master/systems_design/StockData.md

Step 1: Scope the Problem

We are specifically told that our system should support about 1000 users, which is good (most questions are not so specific), but this doesn't mean that we can be sloppy with the design: our service can get popular overnight; it should still be able to scale horizontally in a cost-effective manner.
There aren't particularly many features or use cases for this system. As a user, we use the service to get stock price information. So, we can define 4 basic operations, sketched in code below:
  • GetOpenPrice(sID): gets the open price of the stock with id sID.
  • GetClosePrice(sID): gets the close price of the stock with id sID.
  • GetHighPrice(sID): gets the highest price for the day of the stock with id sID.
  • GetLowPrice(sID): gets the lowest price for the day of the stock with id sID.
We can also define a higher level convenience function to get all prices at once:
  • GetInfo(sID): gets the open, close, high and low prices of the stock with id sID.
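A minimal sketch of these operations in Python, assuming the day's prices have already been loaded into an in-memory dictionary keyed by stock ID (the actual storage choice comes later):

```python
# Hypothetical in-memory store; in the real service this would be backed
# by whatever storage we settle on in step 3.
PRICES = {"GOOG": {"open": 719.4, "close": 722.3, "high": 724.0, "low": 716.5}}

def GetOpenPrice(sID):  return PRICES[sID]["open"]
def GetClosePrice(sID): return PRICES[sID]["close"]
def GetHighPrice(sID):  return PRICES[sID]["high"]
def GetLowPrice(sID):   return PRICES[sID]["low"]

def GetInfo(sID):
    # Convenience wrapper: all four prices in one call.
    return dict(PRICES[sID])
```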
We assume that in this first version of the system users can only get stock price information for the current day, but business needs dictate that we should store all past information for analytics purposes. To be more specific, we are trying to get rich by developing new models to predict the stock market trends, so it will be useful to have a long historical database of the evolution of stock prices over time. For this reason, we should expect to accumulate quite a large database over time.

Step 2: Make Reasonable Assumptions

Let's make some back-of-the-envelope calculations to get a feel for how big and how busy our system might be. Our expected userbase is not large - 1,000 users/day. That's about 40 users/hour. However, we should think about peak traffic and when it occurs. It is likely that by the end of the day, users will be more interested in querying the system. When developing services over the web, a good heuristic for peak traffic is to predict that approximately 15% to 20% of the total traffic for the day occurs in a single hour. In other words, the peak hour eats up between 15% and 20% of the total traffic. Let's err on the safe side and assume 20% - that gives us a peak traffic of 200 users/hour = 3.33 users/minute, or roughly 0.06 users/second. Not exactly large scale.
There are over 100K publicly traded companies in the world. Again, just to be safe, let's place an upper bound of 500K. An alphanumeric, case-sensitive ID can be used to identify stocks. This implies that the IDs need log_62(500K) = log_2(500K) / log_2(62) ≈ 19/6 ≈ 3.2 characters, so we round up and use IDs of length 4.
For the purposes of our analytics needs, it is also important to predict how large our dataset can grow. Let's consider a time window of 20 years. Analytics don't have to be very precise, so we are OK with storing stock prices once an hour. Each price is stored as a 64-bit float, so 8 bytes. We assumed 500K publicly traded companies, so that is roughly 4 MB of data per hour, or 96 MB/day = 2.88 GB/month, approximately 35 GB per year. Over a period of 20 years, this adds up to 700 GB.
So, if we are to register every stock price of every possible traded company every hour over 20 years, we would mostly be good with 1 or 2 servers. That's good to know. We don't need a super powerful cluster of machines for analytics. A 1 TB hard drive is enough!
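These estimates are easy to double-check with a quick script (the inputs are exactly the assumptions stated above):

```python
import math

# Peak traffic: 20% of the day's 1000 users arrive in a single hour.
users_per_day = 1000
peak_per_hour = 0.20 * users_per_day
print(peak_per_hour / 3600)                 # ~0.056 requests/second

# ID length: alphanumeric, case-sensitive -> 62 symbols per character.
companies = 500_000
print(math.ceil(math.log(companies, 62)))   # 3.18 -> 4 characters

# Storage: one 8-byte price per company per hour, for 20 years.
per_hour = companies * 8                    # ~4 MB/hour
per_year = per_hour * 24 * 365              # ~35 GB/year
print(per_year * 20 / 1e9)                  # ~700 GB over 20 years
```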

Step 3: Start Designing

We start off small (and that is probably going to be enough for our userbase) with a single server to answer requests. This specific question seems to place some emphasis on how we choose to store and present the data, so we will be a little more detailed than usual.
One possibility is to run an HTTP server that uses a MySQL database in the backend to store and retrieve prices. Each request is served by querying the DB with the given stock ID, which fetches the price for the client. The result is written into a dynamically generated webpage, perhaps using PHP, and sent back to the client. This works and is relatively simple to set up, but there are some drawbacks: MySQL is a complex beast. Users' requests only read from the database - do we really need a full MySQL engine behind it offering ACID guarantees (Atomicity, Consistency, Isolation, Durability)? If users only read from the database, we shouldn't really be worried about ACID properties. Furthermore, the whole process to satisfy a request sounds overly complicated: connect to the database, issue a query, retrieve the results, generate an HTML page dynamically, send it back to the user. Remember that each component in an architecture is another piece that costs money and needs maintenance and upgrades. MySQL is probably not a very good choice here.
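To make the objection concrete, here is roughly what that per-request flow looks like, sketched with Python's built-in sqlite3 standing in for MySQL (the schema and names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE prices (
                    stock_id TEXT PRIMARY KEY,
                    open REAL, close REAL, high REAL, low REAL)""")
conn.execute("INSERT INTO prices VALUES ('GOOG', 719.4, 722.3, 724.0, 716.5)")
conn.commit()

def handle_request(sid):
    # Every page view pays the full connect / query / render cycle,
    # even though the underlying data changes only once a day.
    row = conn.execute("SELECT open, close, high, low FROM prices"
                       " WHERE stock_id = ?", (sid,)).fetchone()
    o, c, h, l = row
    return f"<html><body>{sid}: open={o} high={h} low={l} close={c}</body></html>"

print(handle_request("GOOG"))
```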
A different approach might be FTP. We could set up an FTP server with anonymous logins and allow users to freely browse the publicly available data. We could store everything in good old text files, et voila - our system is up and running. While conceptually simple, this is suboptimal for many reasons. For one, FTP is not a secure protocol. Sure, we allow anonymous logins, and stock information is not exactly top-secret, but what if some day we decide to give private accounts to each user? Then usernames and passwords would be sent in the clear - not ideal in the 21st century. We could use SFTP instead, but really, how familiar is the average internet user with FTP? The average user will be confused. What about mobile users? How many Android / iOS users have an FTP client installed on their device? It's not a very usable system. Also, if we were to start giving out private user accounts in the future, our servers would become cluttered with zillions of user accounts. Is that really necessary for a stock price service?
What about web services? A simple SOAP-based web service where stock prices are all stored in XML files on disk and retrieved through the web service API. This has the advantage that we map our use cases directly into the implementation: the web services implement the 5 functions mentioned in step 1. Everything is XML based, so there are plenty of parsers available for pretty much any language that other developers might be using to implement a client. That's another great point: as long as the API is documented, other developers can easily build Android or iOS apps that serve as clients. We can also implement a webpage on top of the API to provide service over HTTP. Corporate / enterprise clients can have their own software that communicates with the web services API. What's not to love? This looks like a reasonable approach; what are the downsides? Well, XML does incur some overhead - a textual representation of data is more expensive - it takes up more disk space, uses more bandwidth, and takes longer to transmit. This could be a problem for users behind slow connections, but most users today have access to high-speed connections. Perhaps a more serious issue is how to integrate this with our long-term dataset for analytics. Data used for analytics is usually kept in OLAP database engines, so we might need to do some conversion work when transferring our stock prices to the analytics data warehouse. Is this acceptable? Maybe. If anything, we will always have to do some sort of conversion and transfer work to get data into the analytics system, so this looks like something we can't avoid.
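A minimal sketch of the read path of such a service, using only Python's standard library; the URL shape (/GetInfo?sID=...) and the XML tags are assumptions for illustration, not a fixed API:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

PRICES = {"GOOG": {"open": 719.4, "close": 722.3, "high": 724.0, "low": 716.5}}

class StockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        sid = parse_qs(urlparse(self.path).query).get("sID", [""])[0]
        info = PRICES.get(sid)
        if info is None:
            self.send_error(404, "unknown stock ID")
            return
        body = (f'<stock id="{sid}"><open>{info["open"]}</open>'
                f'<high>{info["high"]}</high><low>{info["low"]}</low>'
                f'<close>{info["close"]}</close></stock>').encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), StockHandler).serve_forever()
```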
Overall, the web services approach looks promising and reasonable.
This question doesn't seem to be so much about scalability and large scale issues, so we will omit steps 4 and 5. Either way, as usual, one of the key issues is growth. What if we get popular? 1000 clients is a very small userbase - should we really assume that we won't grow beyond that? If we do want to scale, we can leverage the fact that this system has an extremely high read-to-write ratio (prices are written once a day and read constantly), so: cache, cache, cache. (Ab)use caching.
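Because the data changes only once a day, caching is almost free: an entry keyed on (stock ID, date) never goes stale within the day and expires naturally when the date rolls over. A sketch, where fetch_from_store is a hypothetical stand-in for the backing store:

```python
from datetime import date
from functools import lru_cache

def fetch_from_store(sid):
    # Hypothetical: in the real system this would hit the backing store.
    return {"open": 0.0, "close": 0.0, "high": 0.0, "low": 0.0}

@lru_cache(maxsize=100_000)
def _get_info_cached(sid, day):
    return fetch_from_store(sid)

def get_info(sid):
    # Keying on today's date means cached entries expire at the day rollover.
    return _get_info_cached(sid, date.today())
```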
http://petercrushcode.blogspot.com/2016/06/design-service-providing-end-of-day.html
Could a traditional SQL store be a good fit? As stated earlier, we don't need rich search capabilities, so probably not.
We may want to go with a NoSQL store such as Redis or Memcached. From personal experience, Redis can provide both low response times and persistence.
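A sketch of how the day's prices might live in Redis, using the redis-py client and one hash per stock (the key naming scheme is an assumption):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# The nightly feed job writes one hash per stock...
r.hset("stock:GOOG", mapping={"open": 719.4, "close": 722.3,
                              "high": 724.0, "low": 716.5})

# ...and the web tier reads it back per request.
print(r.hgetall("stock:GOOG"))   # {'open': '719.4', 'close': '722.3', ...}
```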

With the web service in front of the Redis server, the first design is complete.

Figure 1. Initial design
Redis can easily handle over 80,000 requests per second, and a typical web server can handle around 3,000 requests per second. The design is simple and easy to understand. Is it enough?

Not quite yet. We still have two potential single points of failure: the web server and Redis.

If either component fails, our stock service goes down.

The design in Figure 2 addresses these single points of failure.

Figure 2. Better design with redundancy
In this new design, we put a reverse proxy / load balancer in front of the web servers; Nginx is a good candidate (a dedicated load-balancing switch would also work). A standby Nginx instance can take over if the primary load balancer fails.

This way, the requests from 1000 clients can be distributed across multiple web servers. 

Second, Redis replication improves the redundancy of the store: it ensures the data is duplicated across multiple Redis servers.
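Replication itself is configured on the Redis servers (e.g. with a REPLICAOF directive), but the application-side read/write split is simple. A hedged sketch with redis-py; the host names are hypothetical:

```python
import redis

primary = redis.Redis(host="redis-primary", port=6379)
replica = redis.Redis(host="redis-replica", port=6379)

# The nightly feed job writes to the primary...
primary.hset("stock:AAPL", mapping={"open": 95.1, "close": 95.9,
                                    "high": 96.4, "low": 94.8})

# ...while the web servers read from a replica, so reads are spread out
# and the service survives the loss of a single Redis node.
print(replica.hgetall("stock:AAPL"))
```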

https://prezi.com/wocrwpssvcw-/system-design/
