Massive Technical Interviews Tips: Design Cache System

Thursday, November 12, 2015

Design Cache System

Related: http://massivetechinterview.blogspot.com/2015/11/cache-system-misc.html
http://myprogrammingpractices.blogspot.com/2015/06/from-mitbbs-sumo-logiccache-system.html
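Cache design seems to come up fairly often in design interviews, so let's discuss it. Design a cache system: give pseudocode, the storage structure, the API, and so on. A replacement policy such as LRU is not required, but concurrency must be handled. The design should reflect real usage, i.e. the cache system should be convenient for engineers to use. My answer was a traditional hashtable API, plus handling of misses: when a value has to be loaded from disk or a database, take care that the same item is not loaded more than once.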
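A minimal Java sketch of that answer, assuming a caller-supplied loader function that stands in for the disk or database read. The class name `SimpleLoadingCache` and its methods are hypothetical (not Guava's `LoadingCache`). Each key maps to a `CompletableFuture`; the thread whose `putIfAbsent` succeeds runs the loader, and every other thread that misses on the same key waits on that same future instead of loading again. There is no eviction, matching the "no LRU required" constraint.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical hashtable-style cache: get/put/invalidate, where a miss triggers
// a load that runs at most once per key even under concurrent misses.
class SimpleLoadingCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // assumed slow: disk or DB read

    SimpleLoadingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        CompletableFuture<V> f = map.get(key);
        if (f == null) {
            CompletableFuture<V> placeholder = new CompletableFuture<>();
            f = map.putIfAbsent(key, placeholder);
            if (f == null) { // this thread won the race, so it performs the load
                f = placeholder;
                try {
                    placeholder.complete(loader.apply(key));
                } catch (Throwable t) {
                    map.remove(key, placeholder); // allow a later call to retry
                    placeholder.completeExceptionally(t);
                }
            }
        }
        return f.join(); // losers of the race block here until the load finishes
    }

    void put(K key, V value) {
        map.put(key, CompletableFuture.completedFuture(value));
    }

    void invalidate(K key) {
        map.remove(key);
    }
}
```

Storing futures rather than values keeps the slow load outside any map lock. A simpler variant is `ConcurrentHashMap.computeIfAbsent`, which also applies the function at most once per key, but its Javadoc warns that other updates to the map may block while the computation runs.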

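Fully open-ended design questions like this are the scariest kind. From how you work through the problem, the interviewer can easily judge how seasoned you are: your problem-solving approach, your grasp of basic system design principles, how you weigh pros, cons, and tradeoffs, how quickly you can express ideas in code, and so on.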

http://programmers.stackexchange.com/questions/136254/what-data-structure-should-i-use-for-this-caching-strategy

If you want to use an LRU (Least Recently Used) eviction cache, a good combination of data structures is:

Circular linked list (as a priority queue)
Dictionary

This is why:

The linked list has O(1) insertion and removal time.
The dictionary maps each key to its list node, so finding a key's node is also O(1).
List nodes can be reused when the list is full and no extra allocations need to be performed.

This is how the basic algorithm should work:

The data structures

LinkedList<Node<KeyValuePair<Input,Double>>> list;
Dictionary<Input,Node<KeyValuePair<Input,Double>>> dict;

When an input is received:
If the dictionary contains the key
- return the value stored in the node and move the node to the beginning of the list
If the dictionary does not contain the key
- compute the value
- if the last node of the list already holds an entry, remove that entry's key from the dictionary
- store the new key and value in the last node
- move the last node to the first position
- store the (input, node) key-value pair in the dictionary
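The steps above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the answer's exact code: it uses a doubly linked list with head/tail pointers instead of a circular list, takes a hypothetical caller-supplied `compute` function for the "compute the value" step, and is not thread-safe. Once the list is full, the tail node is reused on each miss, so no new nodes are allocated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memoizing LRU: HashMap from key to node, plus a doubly linked
// list ordered from most recently used (head) to least recently used (tail).
class LruMemo<K, V> {
    private static final class Node<K, V> {
        K key;
        V value;
        Node<K, V> prev, next;
    }

    private final int capacity;
    private final Function<K, V> compute; // assumed "compute the value" step
    private final Map<K, Node<K, V>> dict = new HashMap<>();
    private Node<K, V> head, tail;

    LruMemo(int capacity, Function<K, V> compute) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
        this.compute = compute;
    }

    V get(K key) {
        Node<K, V> node = dict.get(key);
        if (node != null) {        // hit: move the node to the front
            moveToFront(node);
            return node.value;
        }
        V value = compute.apply(key);
        if (dict.size() < capacity) {
            node = new Node<>();   // list not full yet: allocate a node
        } else {
            node = tail;           // full: reuse the least recently used node
            dict.remove(node.key); // drop the evicted key first
            unlink(node);
        }
        node.key = key;
        node.value = value;
        pushFront(node);
        dict.put(key, node);
        return value;
    }

    boolean contains(K key) {
        return dict.containsKey(key);
    }

    private void moveToFront(Node<K, V> n) {
        if (n != head) {
            unlink(n);
            pushFront(n);
        }
    }

    private void unlink(Node<K, V> n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
    }

    private void pushFront(Node<K, V> n) {
        n.prev = null;
        n.next = head;
        if (head != null) head.prev = n;
        head = n;
        if (tail == null) tail = n;
    }
}
```

With capacity 2, the access sequence 1, 2, 1, 3 evicts key 2, because key 1 was touched more recently.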

LinkedHashMap
Guava Cache
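In Java, the standard `LinkedHashMap` already provides most of the LRU machinery: constructing it with `accessOrder = true` orders entries from least to most recently accessed, and overriding `removeEldestEntry` lets it evict automatically after an insert. A short sketch (the class name `LinkedHashMapLru` is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache on top of java.util.LinkedHashMap. Not thread-safe on its own.
class LinkedHashMapLru<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LinkedHashMapLru(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    // Invoked by put/putAll after inserting; returning true removes the eldest
    // (least recently accessed) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

For concurrent use it would need external locking, e.g. `Collections.synchronizedMap`, whereas Guava Cache is built for concurrent access and adds features such as size- and time-based eviction.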

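2. Design a hash table with history. For the same key, record every value that has ever been stored, each tagged with a timestamp. A lookup get(key, ts) returns the value whose timestamp is the latest one at or before ts.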
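One common way to do this, sketched in Java under the assumption that timestamps are `long` values: keep a sorted map of versions per key and answer get(key, ts) with a floor lookup. `TreeMap.floorEntry(ts)` returns the entry with the greatest timestamp less than or equal to `ts`, in O(log n). The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical versioned hash table: key -> (timestamp -> value).
class TimeMap<K, V> {
    private final Map<K, TreeMap<Long, V>> history = new HashMap<>();

    void put(K key, long ts, V value) {
        history.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
    }

    // Returns null if the key is unknown or has no version at or before ts.
    V get(K key, long ts) {
        TreeMap<Long, V> versions = history.get(key);
        if (versions == null) return null;
        Map.Entry<Long, V> e = versions.floorEntry(ts);
        return e == null ? null : e.getValue();
    }
}
```

If writes always arrive in increasing timestamp order, an append-only list per key with binary search is a lighter alternative to the `TreeMap`.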

https://www.quora.com/What-are-good-ways-to-design-cache-system-in-website

How often is the data on the website updated?

How frequently does the data need to be refreshed? Is it acceptable to display stale data, and if so, for how long?

What is the expected amount of traffic to the website? Is the data displayed publicly, or is it a private system?

What are the performance bottlenecks? Heavy database queries? Waiting on remote SOAP services?

https://cseweb.ucsd.edu/classes/fa14/cse240A-a/pdf/08/CSE240A-MBT-L15-Cache.ppt.pdf

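A few details of cache architecture design

(1) Evicting the cached entry (rather than updating it in place) is a general-purpose way to handle the cache on writes.

(2) Evicting the cache first and then writing the database is, beyond doubt, the right order: if the DB write fails after the eviction, the cost is one extra cache miss, whereas writing the DB first and then failing to evict leaves stale data in the cache.

(3) Wrapping data access in a service is a general way to hide the complexity of the underlying database and cache from business teams.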

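The mainstream approach is to wrap data access in a service: add a service layer that gives upstream callers a clean data-access interface and hides the details of the underlying storage, so business lines do not need to care whether data comes from the cache or the DB.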
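A minimal sketch of such a service layer, combining points (1) to (3): reads try the cache, fall back to the DB, and populate the cache; writes evict the cached entry and then write the DB. Both stores are in-memory maps standing in for a real cache cluster and database, and the class name `UserService` is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service layer: callers use get/update and never see whether a
// value came from the cache or the DB.
class UserService {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in cache
    private final Map<String, String> db = new ConcurrentHashMap<>();    // stand-in DB
    int dbReads = 0; // demo counter only; not thread-safe

    String get(String id) {
        String v = cache.get(id);
        if (v != null) return v;         // cache hit
        dbReads++;
        v = db.get(id);                  // miss: read from the DB
        if (v != null) cache.put(id, v); // populate the cache for later reads
        return v;
    }

    void update(String id, String value) {
        cache.remove(id);  // evict the cached entry first...
        db.put(id, value); // ...then write the database
    }
}
```

This sketch does not handle a concurrent read that lands between the eviction and the DB write and repopulates the cache with the old value; a real design has to decide how much of that window it can tolerate.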

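The less common alternative is asynchronous cache updating: every business-line write goes to the database and every read goes to the cache, and an asynchronous tool synchronizes data from the database into the cache. The details are:

(1) There is an init-cache step that writes the full set of data to be cached into the cache.

(2) When the DB is written, the async updater reads the binlog and updates the cache.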

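With (1) and (2) working together, the cache holds all the data, so:

(a) Business lines read from the cache and always get a hit (the data may be stale for a very short time), without having to deal with the database.

(b) Business lines write to the DB and the cache is updated asynchronously, without having to deal with the cache.

This greatly simplifies the calling logic in business lines. The drawback is that if the business logic behind the cached data is complex, the async-update logic may be just as complex.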
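The asynchronous scheme can be simulated in a few lines of Java to show the data flow. Everything here is a stand-in: an in-memory queue plays the binlog, maps play the DB and cache, and `applyPending` represents one pass of the async updater, which in practice would run in a background thread or a separate process. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical simulation: writes touch only the DB (which appends to a log),
// reads touch only the cache, and a syncer copies changes across.
class BinlogSyncedCache {
    private static final class Change {
        final String key;
        final String value; // null means the row was deleted
        Change(String key, String value) { this.key = key; this.value = value; }
    }

    private final Map<String, String> db = new ConcurrentHashMap<>();
    private final Queue<Change> binlog = new ConcurrentLinkedQueue<>(); // binlog stand-in
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Business-line write path: DB only; the log entry is a side effect of the write.
    void write(String key, String value) {
        if (value == null) db.remove(key); else db.put(key, value);
        binlog.add(new Change(key, value));
    }

    // Business-line read path: cache only.
    String read(String key) {
        return cache.get(key);
    }

    // Step (1): load the full data set into the cache.
    void initCache() {
        cache.putAll(db);
    }

    // Step (2): apply pending log events to the cache, in order.
    int applyPending() {
        int n = 0;
        Change c;
        while ((c = binlog.poll()) != null) {
            if (c.value == null) cache.remove(c.key); else cache.put(c.key, c.value);
            n++;
        }
        return n;
    }
}
```

Between a write and the next `applyPending`, reads return the old value, which is the short staleness window mentioned in (a). Here, replaying log events older than the init snapshot still converges to the latest value because events are applied in order; a real syncer would typically record the binlog position at snapshot time instead.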
