Monday, August 24, 2015

Design Decisions for Scaling Your High Traffic Feeds - High Scalability -



Design Decisions for Scaling Your High Traffic Feeds - High Scalability -
https://github.com/tschellenbach/Feedly
There are two strategies for building feed systems:
  1. Pull, where the feed is gathered during reads
  2. Push where all the feeds are pre computed during the writes.
Most real live applications will use a combination of these two approaches. The process of pushing an activity to all your followers is called a fanout.
The first version worked on a PostgreSQL database, the second used Redis and the third and current version runs on Cassandra.
Part One - The Database
select * from love where user_id in (...)
It took some database tweaking but this simple system held up well into ~100M loves and 1M users. Around that time the performance of this solution started to fluctuate. In general it kept on working, but for some users the latency spiked to multiple seconds.

Part Two - Redis & Feedly

Our second approach stored a feed for every user in Redis. When you loved an item this activity was fanned out to all your followers.

Redis was a good solution, but several reasons made us look for an alternative. Firstly we wanted to support multiple content types, which made falling back to the database harder and increased our storage requirements. In addition the database fallback we were still relying on became slower as our community grew. Both these problems could be addressed by storing more data in Redis, but doing so was prohibitively expensive.

Part Three - Cassandra & Feedly

We briefly looked at HBase, DynamoDB and Cassandra 2.0. Eventually we opted for Cassandra since it has few moving parts, is used by Instagram and is supported by Datastax. Fashiolista currently does a full push flow for the flat feed and a combination between push and pull for the aggregated feed. We store a maximum of 3600 activities in your feed, which currently takes up 2.12TB of storage. The fluctuations caused by high profile users are mitigated using priority queues, overcapacity and auto scaling.

Feed Design

1. Denormalize Vs Normalized

There are two approaches you can choose here. The feed with the activities by people you follow either contains the ids of the activities (normalized) or the full activity (denormalized).


Storing only the id vastly reduces your memory usage. However it also means another trip to your data store every time you load the feed. One factor to consider is how often you copy the data when denormalizing. It makes a huge difference if you are building a notification system or a news feed system. For a notification you usually notify 1 or 2 users for every action which occurs. However with a follow based feed systems the action might get copied to thousands of followers.

Furthermore the best choice really depends on your storage backend. With Redis you need to be careful about memory usage. Cassandra on the other hand has plenty of storage space, but is quite hard to use if you normalize your data.

For notification feeds and feeds built on Cassandra we recommend denormalizing your data. For feeds built on Redis you want to minimize your memory usage and keep your data normalized. Feedly allows you to pick which approach you prefer.

2. Selective Fanout Based On Producer
In their paper Yahoo’s Adam Silberstein et.al. argue for a selective approach for pushing to users feeds. A similar approach is currently used by Twitter. The basic idea is that doing fan-outs for high profile users can cause a high and sudden load on your systems

This means you need a lot of spare capacity on standby to keep things real time (or be ok with waiting for autoscaling to kick in). In their paper they suggest reducing the load caused by these high profile users by selectively disabling fanouts. Twitter has apparently seen great performance improvements by disabling fanout for high profile users and instead loading their tweets during reads (pull).

3. Selective Fanout Based On Consumer
Another possibility of selective fanouts is to only fan-out to your active users. (Say users who logged in during the last week). At Fashiolista we used a modified version of this idea, by storing the last 3600 activities for active users, but only 180 activities for inactive ones. After those 180 items we would fallback to the database. This setup slows down the experience for inactive users returning to your site, but can really reduce your memory usage and costs.
Redis Vs Cassandra
Redis however has a few limitations. All of your data needs to be stored in RAM which eventually becomes expensive. 

In addition there is no support for sharding built into Redis. This means that you have to roll your own system for sharding across nodes. (Twemproxyis a great option for this). Sharding across nodes is quite easy, but moving data when you add or remove nodes is a pain. You can work around these limitations by using Redis as a cache and falling back to your database. As soon as it becomes hard to fallback to the database I would consider moving from Redis to Cassandra.
Read full article from Design Decisions for Scaling Your High Traffic Feeds - High Scalability -

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts