Sunday, November 27, 2016

Self Introduction



http://blog.gainlo.co/index.php/2016/10/14/8-secretes-software-engineer-self-introduction/
First of all, your self introduction is a critical part of making a good first impression. It's hard to overstate how important it is, because it shapes the interviewer's judgment unconsciously, just as we inevitably judge people by their appearance before we even talk to them.

What is a good software engineer self introduction?

First and foremost, it must be concise. I'm surprised by how many people ignore this point. Take Google and Facebook interviews as an example: interviewers usually expect to spend 5-10 minutes discussing the candidate's background. A verbose introduction not only bores interviewers, it also makes the valuable points less likely to be remembered. Keep in mind that a self introduction is not a presentation; cut to the point with zero junk information.
In addition, a good introduction highlights the most valuable points in your experience. You don't need to walk through every project, but anything that shows your technical or leadership skills should be pointed out. Be clear about which points you want to mention and fit them into a 5-10 minute time frame.
Lastly, an introduction is not only about content, but also about communication. Poor communication can make great content boring.

Show your relevance to the job

In essence, a self introduction serves two purposes. First, it gives interviewers a general idea of your background – whether you are an experienced engineer, a new grad, a serial entrepreneur and so on. More importantly, it explains why you are relevant to the job.
Most people ignore the second point, and that's why their introductions carry zero useful information. Let me give you an example. Suppose you are applying to a company that does a lot of AI and machine learning. A good introduction could be: "I've been working on data analysis for the past 2 years, covering both infrastructure and algorithms. For example, I improved our ranking system on XYZ metrics. I'm also taking a bunch of online courses about deep learning."
Obviously, the candidate is not a machine learning expert; however, he's trying hard to show why he's relevant. He may also have done some awesome front-end work, but mentioning it would be even less relevant here than his last point (taking courses).
With that in mind, you should prepare a different introduction for each job.

I’m an experienced engineer

If you've been working for a couple of years, you probably have quite a few projects to talk about, which makes conciseness and relevance even more important.
Select projects and experiences carefully and ask yourself why you chose each one. It might be because the project is technically challenging and shows your leadership skills at the same time, or because it is highly relevant to what the new company is doing. In short, you should know the reason behind everything you plan to discuss; otherwise you don't really know what you're talking about.
It's also important to keep a balance between high-level ideas and details, a balance experienced engineers often get wrong. A statement like "I built the whole infrastructure of our crawling system" is too vague unless you come from a well-known company like Google, where the scale justifies the complexity. Otherwise, you'd better explain why the point matters: for instance, mention that the system crawled 1B URLs with 20 machines.
However, the opposite extreme is equally bad. Too many technical details only confuse interviewers. Again, a good test is to ask yourself: does this detail add information that significantly demonstrates my relevance and strength? If not, don't bother mentioning it. Another approach is to find someone who knows very little about your technical work, introduce your project, and ask him to tell you what he remembers. Check whether all the important points are covered or whether any of them are drowned out by irrelevant details.

Confident communication

As mentioned above, communication is also part of the introduction. In the extreme case, you read your introduction off a piece of paper, which is terrible even if the content is great.
Be confident about yourself. If you aren't confident about your projects, don't expect anyone else to like them. This isn't something to fake; deep down you should genuinely believe that what you have done matters. If the projects you prepared don't feel challenging to you, switch to something you are more confident about.
Don't rush. The goal is not to cover as much information as possible. Instead, your job is to make sure interviewers get the 2-3 points you want to convey. If your content is concise enough, speak slowly and confidently. At the end of the day, what matters is how much information the interviewer has absorbed, not how much you have output.
Always prepare. No matter how confident you are, prepare your introduction before every interview. Prepare specific content for each company and position you are targeting, prepare for every question interviewers may ask, and prepare the details of everything you'll mention, even if they may never come up.



Friday, November 25, 2016

How To Design Google Docs



http://blog.gainlo.co/index.php/2016/03/22/system-design-interview-question-how-to-design-google-docs/
The question looks quite general at first glance, and it indeed is. Google Docs is a huge system with tons of features. If you spend a few minutes thinking about this problem, you may realize that Google Docs is much more complex than it seems.
It's recommended to provide high-level solutions when the question is this big, and one way to abstract your solution is to divide the big system into smaller components.
Google Docs has a bunch of features, including doc storage, doc sharing, formatting, editing and so on. In fact, I can barely address such a big problem without separating it into sub-problems.
  • File storage. Since Google Docs is part of Google Drive, I include the storage feature as well. The system lets users group files (docs) into folders and supports operations like create/edit/remove. It works much like a file system.
  • Online editing and formatting. There's no doubt that online editing is one of the core features of Google Docs. It supports almost everything Microsoft Office does, and maybe more.
  • Collaboration. It's truly amazing that Google Docs allows multiple people to edit a single doc simultaneously. This is a technical challenge for sure.
  • Access control. You can share docs with your friends and grant different permissions (owner, read-only, comment-allowed, etc.).
A bunch of less important features are ignored here, such as add-ons, spell checking, publishing to the web and so on.

Storage and Formatting

Also, storage and formatting can be regarded as the back end and front end to some extent.
IMHO, the storage system of Google Docs (or Google Drive) is very close to a file system in an operating system. It has notions like folders, files, owners, etc.
Therefore, the basic building block of such a system is a file object, which contains the content, a parent, an owner and other metadata such as the creation date. The parent captures the folder relation, and the root directory has an empty parent. I won't discuss how to scale the system, as building a distributed file system can be extremely complicated; there are tons of things to consider, like consistency and replication.
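To make this concrete, here is a minimal sketch of such a file object backed by an in-memory map. The field names (parentId, ownerId, etc.) are illustrative choices for this sketch, not Google's actual schema.
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;

    public class FileStoreSketch {
        static class FileObject {
            String id;
            String name;
            String ownerId;
            String parentId;      // null for the root directory
            boolean isFolder;
            String content;       // only set for docs, not folders
            Instant createdAt = Instant.now();
        }

        public static void main(String[] args) {
            Map<String, FileObject> store = new HashMap<>();

            FileObject root = new FileObject();
            root.id = "root"; root.name = "/"; root.ownerId = "alice"; root.isFolder = true;

            FileObject doc = new FileObject();
            doc.id = "doc-1"; doc.name = "design.doc"; doc.ownerId = "alice";
            doc.parentId = root.id; doc.content = "abc";

            store.put(root.id, root);
            store.put(doc.id, doc);
            System.out.println(doc.name + " lives under " + store.get(doc.parentId).name);
        }
    }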
For the front-end formatting, an interesting question is how you would store documents together with their formatting. If you know Markdown, it's definitely one of the candidate solutions. Although Google Docs is more complicated, the basic idea behind Markdown still applies.

Concurrency

This is not an easy problem, to be honest. You can't just let each person work on his own copy and then merge everyone's changes, or let the last writer win. If you have tried the collaborative editing feature, you know that you can actually see what the other person is doing and get instant feedback.
If you have used Git for version control, some of the ideas here will feel familiar. First, let's consider the simplest case: only 2 people are editing the same doc. Assume the doc is "abc".
Basically, the server keeps a copy of the doc for each person and tracks the full revision history as well. When A edits the doc by adding "x" at the beginning, this change is sent to the server together with the last revision A has seen. Suppose at the same time B deletes the last character "c", and this change is sent to the server as well.
Since the server knows which revision each change was made against, it adjusts the changes accordingly. More specifically, B's change is deleting the 3rd character "c", which is transformed into deleting the 4th character because A added "x" at the beginning.
This is called operational transformation. It's okay if you have never heard of it. The basic idea is to transform each collaborator's mutation based on the revision it was made against and the revisions from other collaborators.
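To see the transformation on the "abc" example above, here is a minimal sketch that only handles single-character inserts and deletes of one collaborator against another; a real OT engine covers many more cases, and the class and method names here are made up for illustration.
    public class OtDemo {
        enum Type { INSERT, DELETE }

        static class Op {
            Type type;
            int pos;   // 0-based position in the document
            char ch;   // character to insert (unused for DELETE)

            Op(Type type, int pos, char ch) { this.type = type; this.pos = pos; this.ch = ch; }

            // Rewrite this operation so it still makes sense after `other` has been applied.
            Op transformAgainst(Op other) {
                int newPos = pos;
                if (other.type == Type.INSERT && other.pos <= pos) {
                    newPos = pos + 1;  // everything at or after the insertion point shifts right
                } else if (other.type == Type.DELETE && other.pos < pos) {
                    newPos = pos - 1;  // everything after the deleted character shifts left
                }
                return new Op(type, newPos, ch);
            }
        }

        public static void main(String[] args) {
            // The doc is "abc". A inserts "x" at position 0; B deletes position 2 (the "c").
            Op aInsert = new Op(Type.INSERT, 0, 'x');
            Op bDelete = new Op(Type.DELETE, 2, ' ');
            // The server transforms B's delete against A's insert: position 2 becomes 3.
            Op bTransformed = bDelete.transformAgainst(aInsert);
            System.out.println("B's delete now targets position " + bTransformed.pos); // prints 3
        }
    }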

Access Control

Google Docs allows you to invite collaborators to each doc with different levels of permission.
A naive solution isn't hard: for each file, keep a list of collaborators with their corresponding permissions (read-only, owner, etc.). When someone wants to perform a specific action, the system checks his permission.
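A minimal sketch of that naive per-file collaborator list; the permission levels and method names are made up for illustration.
    import java.util.HashMap;
    import java.util.Map;

    public class AclSketch {
        enum Permission { OWNER, EDIT, COMMENT, READ_ONLY }

        // docId -> (userId -> permission)
        static Map<String, Map<String, Permission>> acl = new HashMap<>();

        static boolean canEdit(String docId, String userId) {
            Permission p = acl.getOrDefault(docId, Map.of()).get(userId);
            return p == Permission.OWNER || p == Permission.EDIT;
        }

        public static void main(String[] args) {
            acl.put("doc-1", new HashMap<>(Map.of("alice", Permission.OWNER,
                                                  "bob", Permission.READ_ONLY)));
            System.out.println(canEdit("doc-1", "alice")); // true
            System.out.println(canEdit("doc-1", "bob"));   // false
        }
    }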
Usually, I like to ask what the challenges are in scaling such an access control system.
As everyone knows, scaling a system to millions of users raises a lot of issues. A few things I'd like to mention here:
  • Speed. When the owner updates the permission of a folder (e.g. removes a specific viewer), this update should be propagated to all of its children, so speed can be a concern.
  • Consistency. When there are multiple replicas, it's non-trivial to keep every replica consistent, especially when multiple people update permissions at the same time.
  • Propagation. There are many propagation cases. Besides a folder's permission update having to propagate to all its children, if you give someone read permission on a doc D, he may need read permission on all of D's parents as well. And if someone deletes doc D, we may revoke his read permission on D's parents (or maybe not; this is more of a product decision).
Designing a complex system like Google Docs can be intimidating. But once you divide the system into smaller components, it becomes much simpler.

How to Design a Trending Algorithm



http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/

http://blog.gainlo.co/index.php/2016/05/03/how-to-design-a-trending-algorithm-for-twitter/
To design a trending algorithm for Twitter, the system should provide a list of topics that are popular at the current moment. I like to keep the problem general and a little bit vague, since I want to see how a candidate approaches this kind of open-ended question. For instance, there's no clear definition of what popular means here.
A general idea is to use a term (or a word) to represent a topic; it can be a hashtag (like #gainlo) or just a phrase. If a term has a huge volume in recent tweets compared to the past, it should be identified as popular. For example, if millions of people are talking about #gainlo today but in the past only hundreds of people talked about it, #gainlo should definitely be a hot topic at the current moment.
The reason we compare against past volume is that common terms like "Monday" or "weather" have a huge volume at any time and shouldn't be selected as trending in most cases.
To sum up, the basic idea is that for each term, if the ratio of its volume within the last few hours to its volume over the last X days is high, it is regarded as a trending topic.

Infrastructure

Of course the trending topics should be displayed instantly, which means we can't ask users to wait an hour while the system calculates and ranks all the terms. So what does the underlying infrastructure look like?
Obviously, the calculation can be costly given the huge volume of tweets every day. In this case, we can consider using offline pipelines.
More specifically, we can keep several pipelines running offline that calculate the ratio for each term and output the results to some storage system. The pipelines may refresh every few hours, assuming there's no big difference within a short period of time. So when a user checks the trending topics from the front end, we can just serve this user pre-computed results.
This basic approach is quite naive. Do you have any ideas to improve it? It can be from any perspective, like reducing the system cost, or improving the data quality, etc. I'll illustrate a few ideas in the rest of the post.

Absolute term volume

If you simply calculate the ratio as explained above, I'm pretty sure some very weird terms will get selected. Think about the following scenario: suppose only 300 people tweeted about an obscure topic "#gainlo-mock-interview" and in the past no one ever talked about it. The ratio (volume within the last few hours / volume within the last X days) is 1, which can rank at the top of the list.
Apparently, this is not something we want to show to users, and you've probably already identified the problem: if the absolute volume is not big enough, unpopular terms may get picked. You can calculate the ratio as volume within the last few hours / (volume within the last X days + 10000) so that small volumes get diluted, or you can use a separate absolute-volume score and combine it with the ratio.
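A small sketch of the smoothed score described above; the 10000 smoothing constant is just the example value from the text, and in practice it would be tuned.
    public class TrendingScore {
        static final double SMOOTHING = 10_000;

        // volumeRecent: tweets containing the term in the last few hours
        // volumePast:   tweets containing the term in the last X days
        static double score(long volumeRecent, long volumePast) {
            return volumeRecent / (volumePast + SMOOTHING);
        }

        public static void main(String[] args) {
            // 300 tweets now, 0 in the past: score is tiny instead of 1.0
            System.out.println(score(300, 0));
            // 2,000,000 tweets now, 500 in the past: clearly trending (~190)
            System.out.println(score(2_000_000, 500));
        }
    }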

Influencers

Another idea is that if a topic is discussed by high-profile people, it might be more likely to be interesting and popular.
There are many ways to design this. One approach is to first identify who the high-profile users are. We can simply use follower count (although there are lots of fake influencers who bought followers). If a topic was tweeted by an influencer, we can count it as if it had been tweeted multiple times, i.e. multiply the tweet count by a weight based on the influencer's popularity.
One may argue that we shouldn't give influencers more weight, since if a topic is trending there must already be a huge number of normal users talking about it. That might be true. The point here is that you never know the result until you give it a try, so I'm definitely not saying this is the right thing to do, but it may be worth an experiment.

Personalization

Different people have different tastes and interests, so we can adjust the trends list for each user. This can be quite complicated, since there are so many things you can do to personalize it.
For instance, you can calculate a relevance score between each topic and the user based on signals including his previous tweets, who he follows and what tweets he has favorited. The relevance score can then be used together with the trending ratio.
In addition, location should also be a valuable signal. We may even calculate trending topics for each location (maybe at the city level).
A couple of follow-up questions: if similar topics show up, how does the system dedup them? Are there other signals, like a user's search history, that can be integrated?

Thursday, November 24, 2016

Design eCommerce Website



https://mp.weixin.qq.com/s/1AwHJ7CGxf8DQxSjYUeTvg
  • (Option 2) If stock is reserved when the user submits the order, then only 1 user will be able to submit an order for the 1,000 units successfully; everyone else will see "insufficient stock, order submission failed".
  • (Option 3) If stock is reserved only when the order is submitted and payment succeeds, then all 1,000 users can create orders, but only 1 of them can pay successfully and the other orders are cancelled automatically.
JD Daojia (京东到家) currently uses option 2, for the following reasons:
  • Adding an item to the cart may only be a temporary action; it doesn't mean the user will eventually submit the order and pay.
  • So checking and reserving stock at add-to-cart time would block other users who genuinely want to buy from adding the item to their carts, while the users who added it earlier never pay; in the end the company takes the loss.
  • Option 3 would generate 1,000 orders. Whether stock is checked before payment or after payment succeeds, 99.9% of the users who are ready to pay would see the system cancel their orders, i.e. 99.9% of users get a bad experience.
  • Data shows that the proportion of users who submit an order but never pay is very small (compared with adding to cart without buying). Currently JD Daojia gives users at most 30 minutes to pay; after 30 minutes the order is cancelled automatically and the reserved stock is released.
In summary, option 2 can still leave stock unusable by other users for up to 30 minutes when a user reserves it by placing an order but never pays, but compared with options 1 and 3, option 2 is clearly the best compromise.

The duplicate order submission problem

Duplicate order submissions cause duplicate stock deductions, and the consequences are fairly serious. For example, a merchant may configure 1,000 units of a product, yet in reality the system may tell users it is out of stock after only 900 have been sold, causing an invisible loss to the merchant.
Situations in which duplicate order submissions can occur:
  • (1. Well-intentioned user) On the app, the user taps the "Submit Order" button, the backend API does not respond, and the user, thinking the action failed, taps "Submit Order" again.
  • (2. Malicious user) A hacker calls the order-submission API directly, bypassing the app's duplicate-submission protection.
  • (3. Order-system retry) For example, to improve availability, the order-submission system retries the stock-deduction request after the first call to the inventory system times out.
Now that the root causes are clear, let's address them one by one:
  • (1. Well-intentioned user) The app greys out the "Submit Order" button after the first tap, so the order cannot be submitted again.
  • (2. Malicious user) Use a token mechanism. Every time the user enters the checkout page, the order-submission system issues a (globally unique) token ID. When the user taps "Submit Order", the request carries this token ID, and the order-submission system validates it first: only if the token ID exists and has been accessed exactly once does it proceed with the remaining logic; otherwise it returns immediately.
  • (3. Order-system retry) In this case the backend (e.g. the inventory system) must make its API idempotent. Each call to the inventory system carries the order ID, and the inventory system takes a distributed lock keyed by the order ID. Pseudocode below:
    
    
    int ret = redis.incr(orderId);
    redis.expire(orderId, 5, TimeUnit.MINUTES);
    if (ret == 1) { // first time this order ID is seen (or the previous attempt was more than 5 minutes ago)
        boolean alreadySuccess = alreadySuccessDoOrder(orderProductRequest);
        if (!alreadySuccess) {
            doOrder(orderProductRequest);
        } else {
            return "Operation failed: duplicate submission";
        }
    } else {
        return "Operation failed: duplicate submission";
    }

How to roll back stock data

There are quite a few scenarios that require rolling stock back, for example:
  • (1. User never pays) The user places an order and then changes his mind.
  • (2. User cancels after paying) The user places an order, pays, and then changes his mind.
  • (3. Risk-control cancellation) Risk control detects abnormal behavior and force-cancels the order.
  • (4. Failure in a coupled system) For example, when an order is submitted, the order-submission system T1 calls the loyalty-points deduction system X1, the stock deduction system X2 and the coupon system X3 at the same time. If X1 and X2 succeed but the call to X3 fails, the user's points and the merchant's stock must be rolled back.
Scenarios 1, 2 and 3 are similar: they all result in the order being cancelled. After the order center cancels the order it publishes an MQ message, and each system only needs to make sure it consumes the order-cancellation message correctly. Scenario 4 is more complicated because the order has not actually been created yet. As mentioned above, the order-submission system T1 must actively issue rollback requests to the stock system X2 and the coupon system X3 (the requests must carry the order ID), and the rollback APIs of X2 and X3 must be idempotent.
For scenario 4 there is an even more extreme case: if T1 itself crashes just as it is about to roll back, then X2 and X3 must complete the rollback on their own, i.e. they need the ability to check the health of their own data. How can this be done?
We can take advantage of the fact that the order for that order ID was never created: a worker periodically fetches deduction records from more than 40 minutes ago (the 40 here must be longer than the payment time we tolerate), calls the order center to query the order's status, and makes sure the order exists and is not cancelled; otherwise it rolls back its own data.
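A minimal, self-contained sketch of such a self-healing worker, using in-memory stand-ins for the inventory records and the order center; all names here are hypothetical.
    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;

    public class StockRollbackWorker {
        // Must be longer than the longest time we allow a user to pay (30 minutes above).
        static final Duration SCAN_WINDOW = Duration.ofMinutes(40);

        // orderId -> time the stock was reserved (stand-in for the inventory DB)
        static Map<String, Instant> reservations = new HashMap<>();
        // orderId -> status known to the order center ("CREATED", "CANCELLED", or absent)
        static Map<String, String> orderCenter = new HashMap<>();

        static void runOnce() {
            Instant cutoff = Instant.now().minus(SCAN_WINDOW);
            for (Map.Entry<String, Instant> e : reservations.entrySet()) {
                if (e.getValue().isBefore(cutoff)) {
                    String status = orderCenter.get(e.getKey());
                    // Order never created or cancelled: the reservation is orphaned, release it.
                    if (status == null || "CANCELLED".equals(status)) {
                        System.out.println("Rolling back stock reserved for order " + e.getKey());
                    }
                }
            }
        }

        public static void main(String[] args) {
            reservations.put("order-1", Instant.now().minus(Duration.ofMinutes(50)));
            orderCenter.put("order-1", "CANCELLED");
            runOnce(); // rolls back order-1
        }
    }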

How to deduct stock safely when many people buy the same item at the same time

In reality, many people may try to buy the same item at the same time. How do we make the stock deduction safe under concurrency?
Pseudocode snippet 1:
    synchronized (this) {
        long stockNum = getProductStockNum(productId);
        if (stockNum >= requestBuyNum) { // ">=" (the original used ">") so the last units can still be sold
            int ret = updateSQL("update stock_main set stockNum=stockNum-" + requestBuyNum + " where productId=" + productId);
            if (ret == 1) {
                return "Deduction succeeded";
            } else {
                return "Deduction failed";
            }
        }
    }
The idea behind snippet 1 is to acquire a lock for every incoming request and force them to be processed serially, so its throughput is clearly low.
Pseudocode snippet 2:
    int ret = updateSQL("update stock_main set stockNum=stockNum-" + requestBuyNum + " where productId=" + productId + " and stockNum>=" + requestBuyNum);
    if (ret == 1) {
        return "Deduction succeeded";
    } else {
        return "Deduction failed";
    }
This code only adds and stockNum>=requestBuyNum to the WHERE clause, which is enough to prevent overselling and achieves the same effect as pseudocode snippet 1 without an application-level lock.
If the item is a promotional one (for example part of a flash sale), concurrent deductions are far more likely and the database comes under much more pressure. What else can we do then? A flood of flash-sale requests is essentially an ordering problem: first come, first served. With so many requests, some users are bound to miss out, so we can add a counter before the DAO layer above to throttle traffic, for example telling 50% of the requests directly that the purchase failed.
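A minimal sketch of such a counter-based gate. The 50% ratio is just the example from the text, and in a real deployment the counter would live in a shared store such as Redis rather than a single JVM.
    import java.util.concurrent.atomic.AtomicLong;

    public class FlashSaleGate {
        static final AtomicLong counter = new AtomicLong();

        // Let every other request through; reject the rest before touching the database.
        static boolean admit() {
            return counter.incrementAndGet() % 2 == 0;
        }

        public static void main(String[] args) {
            for (int i = 0; i < 4; i++) {
                System.out.println(admit() ? "proceed to stock deduction" : "purchase failed");
            }
        }
    }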
In addition, how do we prevent the same user from buying the same flash-sale item more than once?
    public String doBuy(User user, long productId, int productNum) {
        // The counter is 1 only the first time this user requests this product; afterwards it is greater than 1.
        int tmp = redis.incr(user.getUid() + productId);
        if (tmp == 1) {
            redis.expire(user.getUid() + productId, 3600); // the key expires automatically after 1 hour
            return doBuy1(user, productId, productNum);
        } else {
            return "Purchase failed: duplicate attempt";
        }
    }
http://blog.gainlo.co/index.php/2016/08/22/design-ecommerce-website-part/
First of all, building an eCommerce website requires thinking about things like database design, system availability, concurrency and so on, all of which are extremely important in today's distributed systems.

A common strategy in system design interviews is to start with simple, basic things instead of jumping into details directly. So how would you design the basic data model of an eCommerce website? And what about the database schema?
I'll skip the data model for users, as it should be quite similar to other applications. Let's focus on the products. In the simplest scenario, we need three major objects: Product, User and Order.
Product defines the basic model for a product. Some important fields include price, the amount left, name, description, and the category. Category can be tricky here. Of course you can make it a string field in the SQL database, but a better approach is to have a Category table that contains a category ID, a name and maybe other information, so that each product only keeps a category ID.
Order stores information about all the orders made by users; each row contains the product ID, user ID, amount, timestamp, status and so on. So when a user proceeds to checkout, we aggregate all the entries associated with this user to display in the shopping cart (of course we should filter out items that were bought in the past).
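A minimal sketch of these models plus the Category table; the field names are illustrative, and a production schema would also need indexes, currency handling and so on.
    import java.time.Instant;

    public class EcommerceModels {
        static class Category {
            long id;
            String name;
        }

        static class Product {
            long id;
            String name;
            String description;
            long priceCents;
            int amountLeft;
            long categoryId;   // foreign key into Category instead of a free-form string
        }

        static class Order {
            long id;
            long userId;
            long productId;
            int amount;
            Instant createdAt;
            String status;     // e.g. IN_CART, PAID, CANCELLED
        }

        public static void main(String[] args) {
            Product p = new Product();
            p.id = 1; p.name = "SICP"; p.priceCents = 4999; p.amountLeft = 3; p.categoryId = 10;
            System.out.println(p.name + " costs " + p.priceCents / 100.0);
        }
    }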

NoSQL in eCommerce

In the above analysis, we used a relational database like MySQL. In reality, a NoSQL database can sometimes be a better choice for an eCommerce website.
In case you are not familiar with NoSQL: in layman's terms, a NoSQL database stores a bunch of related things in a single row instead of across multiple tables. For instance, instead of having a separate Order table, we could store all the items a user has bought in the same row of the User table. As a result, when fetching a user, we get not only all the personal information but also his purchase history.
Why can NoSQL be (slightly) better in this case? Let's use the Product model as an example. Suppose we are selling books. A product has category book and tons of attributes like author, publish date, edition, number of pages etc., so this SQL table may have 20 columns. That's fine.
Now we also want to sell laptops, so a product should also store laptop attributes such as brand name, size, color etc. As you can imagine, with more categories introduced, the Product table can end up with tons of columns. If each category has 10 attributes on average, that's 100 columns with only 10 categories supported!
However, a NoSQL database like MongoDB handles a huge number of "columns" like this well. Each row can have a large number of fields, but not all of them need to be set. It's like storing a JSON object as a row (in fact, MongoDB uses something very similar called BSON). As a result, we can store all the attributes (fields) of a product in a single row, which is exactly what NoSQL databases are good at.
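A small sketch of why a document model fits this: each product is just a map of fields, so a book and a laptop can live in the same collection while sharing almost no attributes (the attribute names below are made up).
    import java.util.Map;

    public class ProductDocuments {
        public static void main(String[] args) {
            Map<String, Object> book = Map.of(
                    "name", "SICP",
                    "category", "book",
                    "author", "Abelson & Sussman",
                    "pages", 657);

            Map<String, Object> laptop = Map.of(
                    "name", "ThinkPad X1",
                    "category", "laptop",
                    "brand", "Lenovo",
                    "screenInches", 14.0);

            // Both documents can sit in the same collection despite different fields.
            System.out.println(book.keySet());
            System.out.println(laptop.keySet());
        }
    }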

Concurrency

Take concurrency as an example: let's say there's only one book left in the store and two people buy it simultaneously. Without any concurrency mechanism, it's entirely possible that both purchases succeed. How do you handle concurrency in an eCommerce website?
Let's analyze this step by step. From OS classes we know that a lock is the most common technique for protecting a shared resource. Suppose both user A and user B want to buy the same book. What we can do is place a lock on the row when A fetches the data about this book, so that no one else can access it. Once A finishes the purchase (decreasing the amount left), we release the lock so that B can access the data. The same approach applies to all resources, and it solves the problem completely.
The above solution is called pessimistic concurrency control. Although it prevents all conflicts caused by concurrency, the downside is that it's costly: for every data access we need to acquire and release a lock, which may be unnecessary most of the time.
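A minimal in-memory sketch of pessimistic concurrency control using one lock per product; in a SQL database the equivalent would be a row lock (e.g. select ... for update) inside a transaction.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    public class PessimisticStock {
        static Map<Long, Integer> stock = new ConcurrentHashMap<>(Map.of(42L, 1)); // one book left
        static Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

        static boolean buy(long productId) {
            ReentrantLock lock = locks.computeIfAbsent(productId, id -> new ReentrantLock());
            lock.lock();                       // no one else can touch this product until we unlock
            try {
                int left = stock.getOrDefault(productId, 0);
                if (left < 1) return false;    // sold out
                stock.put(productId, left - 1);
                return true;
            } finally {
                lock.unlock();
            }
        }

        public static void main(String[] args) {
            System.out.println(buy(42)); // true: A gets the last book
            System.out.println(buy(42)); // false: B sees it is sold out
        }
    }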
Can we solve the problem without a lock?
Although this approach prevents two people from accessing the same data simultaneously, placing locks is quite costly. How would you solve the problem more efficiently?
Optimistic concurrency control is another way to support concurrency. The idea is very simple: instead of using locks, each process/thread can access the data freely. However, before committing its changes, each transaction checks whether the data is still in the same state as when it was last read. In other words, you record the state of the data at the beginning of the transaction and check again before committing to see whether it has changed.
If the data hasn't been modified, you can safely commit; otherwise, roll back and try again until there's no conflict. It's worth comparing the two approaches. OCC (optimistic concurrency control) makes reads and writes more efficient as long as there are few conflicts, so for systems unlikely to have conflicts, OCC is the better option. However, when resources are frequently contended by multiple clients, restarting transactions becomes costly under OCC, and it's better to place locks in each transaction (PCC).
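A minimal sketch of optimistic concurrency control using a version number and compare-and-set; in SQL the same idea is usually an update guarded by "where id = ? and version = ?" that affects zero rows when someone else committed first.
    import java.util.concurrent.atomic.AtomicReference;

    public class OptimisticStock {
        static class Row {
            final int amountLeft;
            final long version;
            Row(int amountLeft, long version) { this.amountLeft = amountLeft; this.version = version; }
        }

        static final AtomicReference<Row> row = new AtomicReference<>(new Row(1, 0)); // one book, version 0

        static boolean buy() {
            while (true) {
                Row current = row.get();                    // read value + version ("begin transaction")
                if (current.amountLeft < 1) return false;   // sold out, give up
                Row updated = new Row(current.amountLeft - 1, current.version + 1);
                if (row.compareAndSet(current, updated)) {  // commit only if nobody changed it in between
                    return true;
                }
                // someone else committed first: loop and retry with the fresh value
            }
        }

        public static void main(String[] args) {
            System.out.println(buy()); // true: the purchase commits
            System.out.println(buy()); // false: out of stock
        }
    }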
In applications like Amazon, there are so many products that it’s not that frequent to have multiple people accessing the same product simultaneously. As a result, OCC is a better choice.

Availability in eCommerce

It's a big loss if the Amazon website is down for even one minute. To achieve high availability in a distributed system, the usual approach is to have hundreds or thousands of replicas so that the system can tolerate many failures. However, it's worth noting that availability and consistency have to be considered together, and they pull in opposite directions.
If you have many replicas, it's definitely harder to guarantee that every replica holds the same data. On the flip side, if you want high consistency, you'd better have fewer replicas, which makes the system more prone to failure.
The point is not to try to achieve both. Instead, based on the nature of the product, you should be able to make the trade-off. For an eCommerce website, latency and downtime usually mean losing revenue, which can be a big number, so we might care more about availability than consistency; the latter can be improved through other approaches.

Strong consistency

One approach is to force all updates to happen atomically in the same order. More specifically, when someone is updating a resource, it is locked across all servers until all of them share the same value (after the update). As a result, an application built on top of a system with strong consistency behaves exactly as if it were running on a single machine. Apparently, this is the most costly approach: not only is placing locks expensive, it also blocks the system for every update.

Weak consistency

At the other extreme, we can provide only a minimal guarantee. Every replica will see every update; however, the updates may arrive in different orders. This makes the update operation extremely lightweight, but the downside is that it provides only a minimal guarantee of consistency.
Note: we are not giving the precise definitions of these consistency models. Instead, we'd like to illustrate the ideas with examples, which is more helpful for preparing for system design interviews.

Eventual consistency

As you can imagine, a more practical approach lies somewhere in between. In a nutshell, the system only guarantees that every replica will have the same value eventually. At a given moment the data might be inconsistent, but in the long term the system resolves the conflicts.
Let's take Amazon's Dynamo as an example. Basically, it's possible that each replica holds a different version of the data at a particular time, so when a client reads the data it may get multiple versions. At that point, the client (instead of the database) is responsible for resolving the conflicts and writing the result back to the server.
You might wonder how the client resolves those conflicts. It's mostly a product decision. Take the shopping cart as an example: it's very important not to lose any additions, since losing additions means losing revenue. So when it sees different values, the client can just choose the one with the most items.
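A tiny sketch of that client-side resolution rule (keep the version with the most items); the function name is made up, and a real client might instead merge the versions.
    import java.util.List;
    import java.util.Set;

    public class CartMerge {
        // Given the versions returned by different replicas, keep the largest cart.
        static Set<String> resolve(List<Set<String>> versions) {
            return versions.stream()
                    .max((a, b) -> Integer.compare(a.size(), b.size()))
                    .orElse(Set.of());
        }

        public static void main(String[] args) {
            Set<String> v1 = Set.of("book", "laptop");
            Set<String> v2 = Set.of("book");
            System.out.println(resolve(List.of(v1, v2))); // keeps [book, laptop]
        }
    }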
