Tuesday, February 13, 2018

Design Chat



Related: http://massivetechinterview.blogspot.com/2015/07/design-chat-server-hello-world.html
Videos
https://www.youtube.com/watch?v=zKPNUMkwOJE&t=2s

https://www.pubnub.com/tutorials/angularjs/chat-typing-indicators/
http://www.slate.com/articles/technology/bitwise/2014/02/typing_indicator_in_chat_i_built_it_and_i_m_not_sorry.html


This “keep-alive” method of notification caused a lot of traffic. At one point we estimated that something like 95 percent of all message traffic through the Messenger servers consisted of those typing messages. Only 5 percent of messages contained what users had actually typed.  (And unlike with most programs today, users had the option to turn the typing indicator off.)
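
A common way to tame that kind of traffic is to throttle the indicator on the client: send a "typing" event at most once every few seconds and a "stopped typing" event after an idle timeout. The sketch below is a hedged Python illustration of that idea; the interval values and the send callback are assumptions, not how Messenger actually implemented it.

import time

TYPING_INTERVAL = 5.0   # seconds between "typing" notifications (assumed value)
IDLE_TIMEOUT = 10.0     # send "stopped typing" after this much inactivity (assumed value)

class TypingNotifier:
    """Client-side throttle so raw keystrokes don't flood the chat server."""

    def __init__(self, send):
        self.send = send          # callable that ships an event to the server
        self.last_sent = 0.0
        self.last_keystroke = 0.0

    def on_keystroke(self):
        now = time.monotonic()
        self.last_keystroke = now
        # Only notify the server if we haven't done so recently.
        if now - self.last_sent >= TYPING_INTERVAL:
            self.send({"type": "typing"})
            self.last_sent = now

    def on_tick(self):
        # Called periodically (e.g. once a second) by the client event loop.
        now = time.monotonic()
        if self.last_sent and now - self.last_keystroke >= IDLE_TIMEOUT:
            self.send({"type": "stopped_typing"})
            self.last_sent = 0.0
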
Facebook goes one further and tells you whether or not your friend has seen your last message, letting you know exactly when you can start worrying about why she hasn’t responded to you yet.


http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html
That kind of growth puts a lot of pressure on a once adequate infrastructure. HipChat exhibited a common scaling pattern. Start simple, experience traffic spikes, and then think what do we do now? Using bigger computers is usually the first and best answer. And they did that. That gave them some breathing room to figure out what to do next. On AWS, after a certain inflection point, you start going Cloud Native, that is, scaling horizontally. And that’s what they did.
But there's a twist to the story. Security concerns have driven the development of an on-premises version of HipChat in addition to its cloud/SaaS version.
  • Code Deployment: Capistrano
  • Monitoring: Sensu and monit pumping alerts to Pagerduty

  • Chat messages don't actually make up the majority of traffic; it's presence information (away, idle, available), people connecting/disconnecting, etc. So 60 messages per second may seem low, but it is just an average.
  • HipChat wants to be your notification center, where you go to collaborate with your team and get all the information coming from your tools and other systems. This helps keep everyone in the loop, especially with remote offices.
  • The big reason to use HipChat over, say, IRC is that HipChat stores and indexes every conversation so you can search it later. Search is emphasized so you can stay inside HipChat. The win for teams is that you can go back at any time and remember what happened and what you agreed to do. It will also route messages to multiple devices owned by the same user, as well as temporarily cache/retry messages if your device is unreachable when a message is sent.
https://developer.atlassian.com/server/hipchat/
Started with Adobe AIR on the client side; it leaked memory and would take down machines, so they moved to native apps.

HipChat is based on XMPP; a message is anything inside an XMPP stanza, which could be anything from a line of text to a long section of log output.


This is a real problem as not losing messages is a high priority. Customers indicate that not dropping messages is more important than low latency. Users would rather get messages late than not at all.

  • As a first pass they split out front-end servers and app servers. Proxies handle connections and the backend apps process stanzas. The number of frontend servers is driven by the number of active listening clients and not the number of messages sent. It's a challenge to keep so many connections open while providing timely service.
  • After fixing the datastore issues the plan is to look into how to optimize connection management. Twisted works well, but they have a lot of connections, so they have to figure out how to handle that better.

Thought Redis would be the failure point. Thought Couch/Lucene would be good enough. Didn't do proper capacity planning or look at the message growth rate. Growth was faster than they thought; they shouldn't have focused so much on Redis and should have focused on data storage instead.

  • Amazon's flakiness makes you develop better. Don't put all your eggs in one basket; if a node goes down you have to deal with it or traffic will be lost for some users.
  • Uses a dynamic model. Can lose an instance quickly and bring up new instances. Cloud native type stuff. Kill a node at any time. Kill a Redis master. Recover in 5 minutes. Split across all 4 US-East availability zones currently, but not yet multi-region.
  • EBS only lets you have 1 TB of data. Didn't know about this limit until they ran into it. With Couch they ran into problems with EBS disk size limitations. HipChat's data was 0.5 TB. To do compaction, Couch had to copy data into a compaction file, which doubles the space, so they were hitting limits on a 2 TB RAID during weekend compactions. Didn't want a RAID solution.

Can't consider a full SaaS solution as that would be a lock-in.

Moved to ElasticSearch as their storage and search backend because it can eat all the data they can feed it, it is highly available, it scales transparently by simply adding more nodes, it is multi-tenant, and it can handle node loss transparently through sharding and replication.

Test out caching. ES can cache filter results and it is very fast, but you need a lot of heap space. With 22 gigs of heap on 8 boxes, memory was exhausted with caching turned on. So turn the cache off unless you've planned for it.

  • Ran into this problem with ElasticSearch. Originally had 6 ES nodes running as master-electable. A node would run out of memory or hit a GC pause, and on top of that a network loss. Then the others could no longer see the master, would hold an election, and declare themselves master. There was a flaw in their election architecture: it didn't require a quorum. So there's a split brain. Two masters. Caused a lot of problems.
  • The solution is to run ElasticSearch masters on dedicated nodes whose only job is to be the master. Since then there have been no problems. Masters handle how shard allocation is done, who is primary, and where replica shards are located. This makes rebalancing much easier because the masters can handle all the rebalancing with excellent performance. You can query any node and it will do the internal routing itself.
- SPOF?
  • Uses a monthly index. Every month is a separate index. Each index has 8 primary shards with two replicas each, so if one node is lost the system still functions. (A rough sketch of this layout follows this list.)
  • Not moving RDS into ES. Stuff they need SQL for is staying in RDS/MariaDB, typically user management data.
  • A lot of caching in Redis in a master/slave setup until Redis Cluster is released. There's a Redis stat server: who is in a room, who is offline. Redis history caches the last 75 messages to prevent constantly hitting the database when loading a conversation for the first time (a sketch of this cache also follows this list). Also used for internal status and quick data, like the number of people logged in.
  • Bamboo is used for continuous integration.
  • Moving into voice, private 1-1 video, audio chat, basic conferencing.
  • Might use RabbitMQ for messaging in the future
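
A rough, hedged sketch of the monthly-index layout described in the list above, written against the official Elasticsearch Python client (8.x-style API); the index name, field mapping, and local cluster address are invented for illustration, not taken from HipChat.

from datetime import date
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# One index per month, e.g. chat-2014-01: 8 primary shards with 2 replicas each,
# so losing a node still leaves every shard available somewhere.
index_name = f"chat-{date.today():%Y-%m}"
es.indices.create(
    index=index_name,
    settings={"number_of_shards": 8, "number_of_replicas": 2},
    mappings={"properties": {
        "room": {"type": "keyword"},
        "sender": {"type": "keyword"},
        "body": {"type": "text"},
        "sent_at": {"type": "date"},
    }},
)

# Searching across all months is a wildcard over the monthly indices.
es.search(index="chat-*", query={"match": {"body": "deploy"}})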

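And a minimal sketch of the 75-message Redis history cache mentioned in the list, assuming a hypothetical history:<room> key per room:

import json
import redis  # pip install redis

r = redis.Redis()   # assumed local instance
HISTORY_LEN = 75    # HipChat keeps the last 75 messages around

def cache_message(room_id: str, message: dict) -> None:
    """Push a message onto the room's history list and trim it to the newest 75."""
    key = f"history:{room_id}"          # hypothetical key naming
    r.lpush(key, json.dumps(message))
    r.ltrim(key, 0, HISTORY_LEN - 1)

def load_history(room_id: str) -> list[dict]:
    """Serve the first page of a conversation without touching the database."""
    raw = r.lrange(f"history:{room_id}", 0, HISTORY_LEN - 1)
    return [json.loads(m) for m in raw]
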

http://highscalability.com/blog/2014/10/13/how-league-of-legends-scaled-chat-to-70-million-players-it-t.html
Michal structures the talk in an interesting way, using as a template the expression: Make it work. Make it right. Make it fast.
Making it work meant starting with XMPP as a base for chat. WhatsApp followed the same strategy. Out of the box you get something that works and scales well...until the user count really jumps. To make it right and fast, like WhatsApp, League of Legends found themselves customizing the Erlang VM. Adding lots of monitoring capabilities and performance optimizations to remove the bottlenecks that kill performance at scale.
Perhaps the most interesting part of their chat architecture is the use of Riak's CRDTs (convergent replicated data types) to achieve their goal of shared-nothing, massively linear horizontal scalability.
  • REST APIs provide chat as a backend service for other LoL services. The store talks to chat to verify friendship, for example. Leagues use the chat social graph to group new players together so they can play more often and compete with each other.
Erlang has many benefits. It’s built with concurrency, distribution and scalability in mind. It supports hot code reloading so bugs can be patched without stopping a service.

  • Let it crash. Don’t try to slowly recover from a major failure. Instead, restart from a known state. For example, if there’s a large backlog of pending queries to the database the database is restarted. All the new queries are processed in real-time while the queued up queries will be rescheduled for processing.

  • Riak servers use multi datacenter replication to export persistent data to a secondary Riak cluster. Costly ETL queries, like social graph queries, are run on the secondary cluster without interrupting the primary cluster. Backups are also run off the secondary cluster.
  • Over time, with the necessity to focus on scalability, performance, and fault tolerance, most of Ejabberd was rewritten.
    • Rewrote to match their requirements. For example, in LoL friendship is bidirectional only, XMPP allows asymmetrical friendship. XMPP friendship creation required about 16 messages between client and servers, which was a hit on the database.  The new protocol is three messages.
    • Removed unnecessary and unwanted code.
    • Optimised the protocol itself.
    • Wrote a lot of tests to make sure nothing was broken.
  • Profile code to remove clear bottlenecks.
  • Avoid shared mutable state so it can scale linearly on a single server as well as in a clustered environment.
    • Multi User Chat (MUC). Every chat server can handle hundreds of thousands of connections. For every user connection there’s a session process. Every time a user wanted to update their presence or send a message to a room that event had to go to a single process called MUC router. It would then relay messages to the relevant group chats. This was a clear bottleneck. The solution was to parallelize the routing. Now lookup for a group chat room happens in the user session. Able to use all available cores

Every Ejabberd server contains a copy of the session table, which contains a mapping between user IDs and sessions. Sending a message requires looking up where the user's session is in the cluster. Messages were written to the session table. By checking if the session exists, checking presence priority, and some other checks, the number of distributed writes was reduced by 96%. Huge win. More users could log in much faster and presence updates could occur much more frequently.


  • Had to spend a lot of effort in the chat server to implement eventual consistency. Implemented an Ejabberd CRDT library (convergent replicated data types). It takes care of all the write conflicts and tries to converge objects to a stable state.
  • How does a CRDT work? Instead of appending a new player directly to the friends list, an operation log is maintained for the object, with entries like "Add Player 1" and "Add Player 2". The next time the object is read, the log is consulted and any conflicts are resolved; the log entries can be applied in any order because order doesn't matter. This way the friends list ends up in a consistent state. The idea is that the value is not updated in place; instead, a log of operations is built up for the object, and the operations are applied whenever the object is read. (A toy sketch of this idea follows this list.)
  • Built over 500 real-time counters that are collected every minute and posted into a monitoring system (Graphite, Zabbix, Nagios).
  • Counters have thresholds which generate alerts when crossed. Problems can be addressed long before players notice any service problems.
  • For example, a recent client update entered an infinite loop of broadcasting its own presence. Looking at Graphite it was immediately clear chat servers were hammered with presence updates that began with the new client release.
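
As a hedged illustration of that operation-log idea, here is a generic observed-remove set in Python; it is a toy, not Riot's actual Erlang CRDT library.

import uuid

class FriendsORSet:
    """Toy observed-remove set: a log of (op, friend, tag) entries that can be
    merged and replayed in any order, so concurrent writers converge."""

    def __init__(self):
        self.log = []  # entries are ("add" | "remove", friend_id, tag)

    def add(self, friend_id: str) -> None:
        # Each add gets a unique tag, so a later remove only cancels the
        # adds it has actually observed.
        self.log.append(("add", friend_id, uuid.uuid4().hex))

    def remove(self, friend_id: str) -> None:
        observed = [t for op, f, t in self.log if op == "add" and f == friend_id]
        for tag in observed:
            self.log.append(("remove", friend_id, tag))

    def merge(self, other: "FriendsORSet") -> None:
        # Union of the logs; duplicates are harmless because entries are idempotent.
        self.log = list({*self.log, *other.log})

    def value(self) -> set:
        removed = {(f, t) for op, f, t in self.log if op == "remove"}
        return {f for op, f, t in self.log if op == "add" and (f, t) not in removed}

Two replicas can merge each other's logs in either order and value() produces the same friends list, which is exactly the convergence property the chat servers rely on.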

Implementing Feature Toggles (Feature Flags)

Partial deployments. New code can be enabled only for certain users or a percentage of users can have the new code activated. This allows testing potentially dangerous features at far less than full load. If the new feature works it can be turned on for everyone
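
A minimal sketch of percentage-based rollout, assuming users are bucketed by hashing a flag name and user id (both names below are hypothetical):

import hashlib

def feature_enabled(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket users into 0-99 and enable the feature for the
    first rollout_percent buckets, so a user's flag is stable across requests."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_percent

# e.g. turn a hypothetical new presence path on for 5% of players
if feature_enabled("new_presence_router", "summoner_1234", 5):
    pass  # route this request through the new code path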

Code Reloading On The Fly

  • One of the great features of Erlang is the ability to hot load new code on the fly.
  • In one case third party clients (like Pidgin) were not well tested and it turned out they were sending different kinds of events than the official client. The fixes could be deployed and integrated into the chat servers without having to restart the whole chat service. This means lower downtime for players.
  • Built in the ability to enter debug mode for selected user sessions. If there's a suspicious user or an experimental user (QA testing on production servers), even though there are 100,000 sessions on a chat server only that particular session need be logged. Logging includes XML traffic, events, and metrics. This is a huge saving on log storage.
  • With a combination of feature toggles, partial deployment, and selective logging it’s possible to deploy a feature to production servers so it can be tested only by a few people. The relevant logs can be collected and analyzed without noise from all users

Load Testing Code

  • Every night an automatic verification system deploys a build of all changes to a load test environment and runs a battery of load tests.
  • Server health is monitored during the tests. Metrics are pulled and analyzed. A Confluence page is generated with all the metrics and test results. An email summary is sent with a summary of the test results.
  • Changes can be compared to the previous build, so it's possible to track how code changes impact the tests, catching regressions ("this change was a disaster") as well as improvements ("memory consumption improved by X percent").
  • Would like to use the social graph to make the experience better. Analyze player connections and understand how that impacts enjoyment in the game.
  • Plan to migrate in-game chat to the out-of-game chat servers.
  • Scale surfaces bugs. Even if a bug happens only once in a billion events, at the scale of League of Legends it will occur once a day. Even unlikely events will happen over time.
  • Key to success is understanding what the system is actually doing. Know if your system is in a healthy state or about to crash.
  • Have a strategy. LoL has a strategy of horizontally scaling their chat service. To support that strategy they did something different: they bought into not just NoSQL with Riak, but changed their approach to leverage CRDTs to make horizontal scaling as seamless and powerful as possible.
  • Make it work. Start somewhere and evolve. Ejabberd got them off the ground. Would it have been easier to build one from scratch? Maybe, but they were able to evolve a system to match their requirements as they learned what those requirements were.
  • Make it visible. Add tracing, logging, alerting, monitoring, graphs, and all that good stuff.
  • Make it DevOps. LoL added transactions to software updates, feature flags, hot updates, automated load testing, highly configurable log levels, etc. to make the system easier to manage.
  • Reduce chatty protocols. Tailor functionality to what your system needs. If your system only supports bidirectional friendships, for example, then you don't need a more general and costly protocol.
  • Avoid shared mutable state. A common tactic, but it’s always interesting to see how shared state causes more and more problems at each step up in scale.
  • Leverage your social graph. A chat service naturally provides a social graph. That information can be used to both improve user experience and implement novel new features.
https://engineering.riotgames.com/news/chat-service-architecture-protocol
The three backend components are the following:
  • The Extensible Messaging and Presence Protocol (XMPP) used for client/server communications and extended internally to meet League of Legends’ specific needs.
  • Chat servers written in Erlang and C used to communicate with clients, initially based on the open source version of ejabberd and rewritten over the past few years.
  • A persistent data store used for the social graph, ignore lists, offline messages, message history, and other features.
We took six factors into consideration when choosing a protocol:
  • Support for 1:1 messages, group chats, and offline messages. (These fundamental concepts support the development of player experiences like private messages, game lobby conversations, and pre- and post-game chats without the need to reinvent the wheel.)
  • Built-in presence mechanisms. (These notify players of their friends’ availability.)
  • Open specifications for transparency and auditability; a wide review scope; and allowance for Riot-specific tuning.
  • A proven security model. (This ensures we can protect the privacy of players.)
  • An extensibility model. (This allows for custom behaviors.)
  • Availability of open source projects. (This accelerates development and delivery of chat to players.)
For instance, in order to broadcast availability to friends, the client sends:
<presence type='available'/>
To ask the server to fetch a full friends list:
<iq type='get' id='roster1234'>
 <query xmlns='jabber:iq:roster'/>
</iq>
And to message a friend:
<message to='sum1234@pvp.net'>
  <body>Hey there, wanna play?</body>
</message>

A fuller description and a comprehensive list of XEPs can be found here: http://xmpp.org/xmpp-protocols/xmpp-extensions/.

XMPP EXTENSION #1: FRIEND ROSTER NOTES

According to RFC 6121, an item element represents each friend on the roster. Contacts can have multiple parameters, such as a summoner name (name attribute), internal Jabber identifier or JID (jid attribute) used for internal routing, and their group (group sub-element):
<item jid="sum9876@pvp.net" name="0xDEADB33F">
  <group>General</group>
</item>

We introduced an additional optional sub-element note that holds short, contact-specific text written by the player:

<item jid="sum9876@pvp.net" name="0xDEADB33F">
  <group>General</group>
  <note>Versatile dude, plays only ranked, not a big fan of ARAMs</note>
</item>

We ensure that the notes stay encrypted on disk using strong cryptographic algorithms. We encrypt payloads with AES-CTR-128 before persisting them in the data store, and we use HMAC-SHA1 for signing the data. We also implemented support for rotating encryption keys should we decide to update them in the future.
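
A hedged Python sketch of that encrypt-then-MAC scheme (AES-CTR-128 for confidentiality, HMAC-SHA1 for integrity) using the cryptography package; key generation and rotation are omitted, and the in-memory keys below are purely illustrative, not Riot's implementation.

import hashlib
import hmac
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

ENC_KEY = os.urandom(16)   # 128-bit AES key (illustrative; real keys come from a key store)
MAC_KEY = os.urandom(20)   # separate key for HMAC-SHA1

def seal_note(plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # CTR initial counter block, stored alongside the ciphertext
    encryptor = Cipher(algorithms.AES(ENC_KEY), modes.CTR(nonce)).encryptor()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    tag = hmac.new(MAC_KEY, nonce + ciphertext, hashlib.sha1).digest()  # encrypt-then-MAC
    return nonce + ciphertext + tag

def open_note(blob: bytes) -> bytes:
    nonce, ciphertext, tag = blob[:16], blob[16:-20], blob[-20:]
    expected = hmac.new(MAC_KEY, nonce + ciphertext, hashlib.sha1).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("note failed integrity check")
    decryptor = Cipher(algorithms.AES(ENC_KEY), modes.CTR(nonce)).decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()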

XMPP EXTENSION #2: INCREMENTAL IGNORE LIST UPDATES


A fairly complex XEP describing so-called privacy list management is found here: http://xmpp.org/extensions/xep-0016.html. However, it only provides get/set semantics. According to the extension, we can't modify an existing privacy list by just adding or removing an item; every change requires uploading the whole list to the chat server. Since ignore lists can grow relatively large, transmitting them back and forth between server and client is both costly and useless.
In order to add an item to the privacy list, clients issue:
<add name='LOL'>
  <item value='sum1234@pvp.net' action='deny' order='1' type='jid' />
</add>
And to unblock someone:
<remove name='LOL'>
  <item value='sum1234@pvp.net' />
</remove>


PROTOCOL COMPATIBILITY

https://engineering.riotgames.com/news/chat-service-architecture-servers
These servers manage individual players’ chat sessions, and also apply all necessary stability and security validations like privacy settings, traffic rate limiting, metrics collection, and logging.
We deploy chat services on a per-region basis (we call them ‘shards’), meaning each League of Legends region has its own chat cluster that provides features only to players assigned to that shard. As a result players can’t communicate between shards, and chat servers can’t use data from other regions. For example, North America servers can’t directly communicate with Europe West servers.
Although we could run chat on fewer machines in every region, we prefer to maintain comfortable headroom for fault tolerance and to accommodate future growth. If upgrades require a shutdown, for example, we can shut down half of the nodes in a single cluster without interrupting service to players.

Before developing the components of the server written in C, we spent a lot of time profiling and optimizing the existing Erlang codebase. We tried to find potential concurrency and efficiency bottlenecks using a full battery of existing tools, operating at different abstraction levels:
https://engineering.riotgames.com/news/chat-service-architecture-persistence
I trust that Riot chat servers will persist information about my account—for example, the roster of my friends, metadata about a certain buddy, or the list of players that I have blocked. Furthermore, that data should be available to me at any time, anywhere, and stored securely so that only I can access it. 

A few years ago, before the size of the LoL player base really started taking off, we made the decision to build chat servers using an open source XMPP server implementation: ejabberd. (You can read more about the protocol and server-side implementation in previous posts of mine.) In the past ejabberd offered only a few persistence layers: you could either use mnesia or an ODBC backend (which meant MSSQL, PostgreSQL, or MySQL servers).

  • Friends lists (aka rosters) and metadata about your friends such as notes, groups, friendship status, creation date, etc.
  • Privacy lists: a register of blocked players whom you don’t want to hear from.
  • Offline messages: all chat messages that were sent to you while you were offline, and which should be delivered as soon as you log back in.
We built each shard to have three MySQL servers: one instance each for master, backup, and ETL (Extract, Transform, Load). The master is responsible for handling reads and writes from the chat servers and replicating its data to both the backup and ETL servers. The backup instance is there to (surprise!) snapshot data and be ready to act as the master server in the case of either maintenance or failure of the original master. Finally, the ETL server is used for data analysis and can be deployed on slightly less performant hardware.
  • The MySQL master needed to be scaled vertically—in order to achieve more capacity, we had to continually add more memory plus extremely expensive storage (e.g. FusionIO). This would have become a bottleneck to shipping value to players in the form of new features, and that’s something Riot takes extremely seriously.
  • The master also acted as a single point of failure in our system. Even small glitches in its performance resulted in timeouts when players would load their friends list. Despite having a backup server, transient problems that did not trigger failover were enough to impact the overall system. Major outages (caused by software or hardware) often snowballed into service downtime and significant manual work to resolve the issue.
  • As the data set started to grow, schema migrations of any kind became extremely costly. Their application required careful planning, extreme diligence, and often hours of scheduled chat downtime. This significantly slowed development and delayed the release of features to players.
We evaluated several possible solutions to the problems listed above. Options included application-level sharding of MySQL, MySQL Cluster, Cassandra, and Riak. We encountered difficulties in quickly recovering from synchronous replication issues while evaluating both sharded and clustered MySQL. Cassandra was certainly a viable choice, but it imposed schema constraints that we really wanted to avoid. The last option - Riak - proved to be a flexible, scalable, and fault tolerant data store and we decided to proceed with it.
Typically NoSQL databases put engineers in a much more relaxed environment: schema-less interfaces allow for faster iteration; horizontal scalability removes the burden of many scale problems; and data is protected by internal replication. On a very high level, Riak is a distributed key-value NoSQL database that can be scaled horizontally in linear fashion and also provides AP (availability and partition tolerance) semantics as defined by the CAP theorem.


https://cheesecakelabs.com/blog/simple-chat-architecture-mvp/
With the database ready, we created all the endpoints needed to communicate with the front-end. Our API became very simple, just two endpoints: /chats and /chats/:idChat/messages, with the following HTTP actions (a minimal sketch follows the list):
  • POST: /api/v1/chats/ — Creates a new chat between two users.
  • GET: /api/v1/chats/ — Lists all chats the logged-in user belongs to.
  • POST: /api/v1/chats/:idChat/messages/ — Adds a new message to a specific chat.
  • GET: /api/v1/chats/:idChat/messages/ — Lists all interactions between users on a specific chat.
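
As a rough illustration of those four endpoints, here is a minimal Flask sketch with an in-memory store; the article doesn't show the back-end code, so the route bodies and the CURRENT_USER stand-in for the auth layer are assumptions.

from flask import Flask, jsonify, request

app = Flask(__name__)
CHATS, MESSAGES = {}, {}    # in-memory stand-ins for the real database
CURRENT_USER = "user-1"     # stand-in for whatever supplies the logged-in user

@app.post("/api/v1/chats/")
def create_chat():
    chat_id = str(len(CHATS) + 1)
    CHATS[chat_id] = {"id": chat_id, "users": [CURRENT_USER, request.json["other_user"]]}
    MESSAGES[chat_id] = []
    return jsonify(CHATS[chat_id]), 201

@app.get("/api/v1/chats/")
def list_chats():
    return jsonify([c for c in CHATS.values() if CURRENT_USER in c["users"]])

@app.post("/api/v1/chats/<chat_id>/messages/")
def add_message(chat_id):
    msg = {"sender": CURRENT_USER, "text": request.json["text"]}
    MESSAGES[chat_id].append(msg)
    return jsonify(msg), 201

@app.get("/api/v1/chats/<chat_id>/messages/")
def list_messages(chat_id):
    return jsonify(MESSAGES[chat_id])
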
Thinking about real-time

In order to evolve the application into a real-time architecture, we needed to add two key pieces: a data structure featuring publish/subscribe and a WebSocket server.
For Pub/Sub, we chose Redis – an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker
To listen to the Redis publications and create a WebSocket server, we decided to create a microservice using NodeJS. When initialized, it creates a WebSocket using the socket.io library, subscribing as a Redis listener.
The microservice works like a messaging router. After receiving messages from Redis, it checks if the recipient is connected via WebSockets and forwards the content. Messages sent to unconnected recipients are discarded, but, when users log in to chat, they will receive messages through HTTP requests.
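
A hedged sketch of that router in Python; the original microservice used NodeJS with socket.io, so here the redis and websockets packages stand in, and the chat-messages channel name plus the first-frame handshake are assumptions.

import asyncio
import json
import redis.asyncio as redis   # pip install redis
import websockets               # pip install websockets

CONNECTED = {}  # user_id -> websocket: the in-memory routing table

async def handle_client(ws):
    # Assume the client identifies itself in its first frame: {"user_id": "..."}
    user_id = json.loads(await ws.recv())["user_id"]
    CONNECTED[user_id] = ws
    try:
        await ws.wait_closed()
    finally:
        CONNECTED.pop(user_id, None)

async def relay_from_redis():
    pubsub = redis.Redis().pubsub()
    await pubsub.subscribe("chat-messages")      # hypothetical channel name
    async for event in pubsub.listen():
        if event["type"] != "message":
            continue                             # skip subscribe confirmations
        msg = json.loads(event["data"])
        ws = CONNECTED.get(msg["recipient_id"])
        if ws is not None:
            await ws.send(json.dumps(msg))       # recipient online: push it
        # Offline recipients are skipped; they fetch history over HTTP on login.

async def main():
    async with websockets.serve(handle_client, "0.0.0.0", 8080):
        await relay_from_redis()

asyncio.run(main())

The point of the Redis channel is that a message saved by the HTTP back end can reach whichever WebSocket server currently holds the recipient's connection.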

Now it's time to fix the page refresh issue, making our conversation more fluid. For this, we needed bidirectional communication between the server and the client. Given our architecture, we just need to add the socket.io library on the front-end side and connect it to the microservice. With this, we were able to have a permanent connection between the client and the server, allowing them to receive and send messages at any moment.
On the back-end side, we added two things: the connection with Redis and the publication of the message after it has been saved in the database. For the front-end structure, we just created a new action called ADD_MESSAGE_FROM_SOCKET and the <WebSocket /> component.
The action has the simple responsibility of adding the messages received in the correct store. The component took the role of creating/removing the connection to the WebSocket and listening to the port. When it receives a new message via the WebSocket, it just calls the action ADD_MESSAGE_FROM_SOCKET.
Some important decisions were made:
  • The connection from the WebSocket to the client should not be the responsibility of the web server.
  • The messages should be saved in PostgreSQL and published in Redis.
  • The client could only receive new messages via WebSockets; the insertion of new messages would still occur via HTTP.
https://www.quora.com/What-is-a-better-choice-for-a-chat-application-Redis-pubsub-or-RabbitMQ-ZMQ-Why
Redis is a database, not primarily a queue. You will likely run into limitations, but it might make sense if the chat is simple and you're storing other stuff in it.

RabbitMQ is a very nice message broker. It does one thing and does it well.
If you don't do crazy things, it'll be simple to use for chat.


redis - REmote DIctionary Server
Redis Pub/Sub allows a publisher (sender) to send a message to a channel without knowing whether there is any interested subscriber (receiver). Likewise, a subscriber expresses interest in a channel to receive messages without any knowledge of a publisher.

Publishers and Subscribers are decoupled to make the process very fast and improve scalability since both subscribers and publishers are not aware of each other.
It is not capable of persistence, which means messages are not saved or cached. Once a subscriber misses a message, there is no way it can get the message again.

https://medium.com/always-be-coding/scaling-secret-real-time-chat-d8589f8f0c9b
  • The experience should be fast, reliable and simple
  • The user shouldn’t need to “pull-to-refresh” to get new messages


Then, there are application-specific questions to answer:
  • How many users can chat together?
  • Can the user send photos or videos?
  • Are the chats permanent or ephemeral?
  • Should the user know when the recipient has read the message (read receipts)?
  • Should the user know when the other person is typing a message?
  • Should the user know when the other person is currently viewing the chat (presence)?

  1. We didn’t want private conversations to cannibalize public discussions
  2. We didn’t have a strong case for how it should work or why it was important to the overall product
  • Chat should be ephemeral. We didn’t want people feeling like their private discussions lasted forever (they lasted 24 hours since the last exchange)
  • Users should be able to send (ephemeral) photos
  • It should feel like a conversation and mimic the real-world as much as possible. Indicate the person is present (someone looking at you when you talk), indicate the person has read your message (acknowledgment or nodding along), indicate the person is typing (someone is talking and you should pay attention)

  1. Use a 3rd-party chat service (e.g., Layer)
  2. Use a 3rd-party websocket implementation (e.g., Pusher)
  3. Roll our own websocket-based protocol backed by our servers running on AWS or GCE (we were almost entirely hosted in App Engine at the time)
Chat needs to be fast and it’s not good enough to do polling (kill your servers with requests and your users with slowness), so we needed persistent connections (e.g., websocket).
We quickly discarded #1 as an option (Layer), because we wanted to control our own user data and we wanted full control over the stack and the experience (admittedly, we didn’t dive much into Layer’s entire offering, but we felt that rolling our own would be the fastest path).
When a user entered a chat room, a private presence channel was created and a connection established with Pusher. This let the user receive notifications from Pusher for the duration of being in the chat room. It was destroyed when the user left or backgrounded the app. Luckily, there were 3rd party Pusher protocol libraries (albeit we had to modify them) available for iOS and Android, which sped up development.
Client -> Server -> Pusher -> Client

When the user enters a chat room, an idempotent ID is created on the server that is effectively “<user1_id>:<user2_id>:<secret_id>”, also known as the chat session id. Important note: user ids in this key were always sorted (partial-ordering), guaranteeing idempotency. That way, given a pair of users and a secret, we can always generate the single ID for that secret.
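
The sorting trick is tiny but worth seeing; a sketch:

def chat_session_id(user1_id: str, user2_id: str, secret_id: str) -> str:
    """Same two users plus the same secret always yield the same ID, regardless
    of who initiated the chat, because the user ids are sorted first."""
    a, b = sorted((user1_id, user2_id))
    return f"{a}:{b}:{secret_id}"

assert chat_session_id("u42", "u7", "s99") == chat_session_id("u7", "u42", "s99")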

For fast queries for things like showing the user all of their ongoing or previous chats, we created indexes on the participant and created-time properties, allowing fast answers to questions like "fetch chats user X is a participant in, sorted by creation time in descending order", then locally sorted by last update time. Because chats only lasted 24 hours since the last message exchange, we knew the number of chats would be a reasonably small number to fetch and sort locally on the server (e.g., < 100 in almost every case). If a user exceeded that, chances are they were a bad actor and we could drop some on the floor.

  • Start small and be ready to build a throw-away prototype to help force a decision
  • It’s often unwise to roll your own implementation, no matter how fun it might be (obviously, but we all keep doing it!)


https://www.interviewbit.com/problems/design-messenger/#

https://hackernoon.com/scaling-websockets-9a31497af051
On the other hand, WebSockets differ from HTTP requests in the sense that they are persistent. The WebSocket client opens up a connection to the server and reuses it. On this long running connection, both the server and client can publish and respond to events. This concept is called a duplex connection. A connection can be opened through a load balancer, but once the connection is opened, it stays with the same server until it’s closed or interrupted.
This in turn means that the interaction is stateful; that you will end up storing at least some data in memory on the WebSocket server for each open client connection. For example, you’ll probably be aware which user is on the client-side of the socket and what kind of data the user is interested in.

WebSocket implementations like socket.io have the concept of channels. Think of a channel as an address that clients subscribe to and that either a service or other clients publish to.

You need a Publish/Subscribe broker
With one server, it's actually quite easy to build a pub/sub service with just WebSockets, because a single server is aware of all clients and what data each client is interested in.
Think about our example app. When a client sends through coordinates for a drawing, we just find the correct channel for the drawing and publish the updates made to the drawing to that channel. All the clients are connected to the one server, so they all get notified of the change. It’s kind of like in-memory pub/sub.

When building an app like this, you would probably have a database in the mix already, even before you start thinking about scaling. You wouldn't just trust connected clients to store all the data for all the drawings, would you? No, you'll want to persist the drawing data as it comes in from the clients, so that you can serve up drawing data anytime a user opens a drawing.

One of the users connected to WS1 draws something on the whiteboard. In your WebSocket server logic, you write to the database, to ensure that the changes have been persisted, and then publish to a channel based on a unique identifier associated to the drawing, most probably based on the database id for the drawing. Let’s say that the channel name in this case is drawing_abc123.
At this point, you have the data written away safely in the DB and you have published an event to your pub/sub broker (Redis channel) notifying other interested parties that there is new data.
Because you have users connected to the other WebSocket servers (WS2, WS3), interested in the same drawing, they will have open subscriptions to Redis on the drawing_abc123 channel. They get notified of the event, and each of the servers queries the DB for updates and emit it on the WebSocket channel used on your WebSocket tier.
So you see, the pub/sub broker is used to allow you to expose a pub/sub model with a scaled-out WebSocket cluster.

Handling failover
When a client is connected to a WebSocket server, and that server falls over, the client can open a connection through the load balancer to another WebSocket server. The new WebSocket server will just ensure that there is a subscription to the pub/sub broker for the data that the WebSocket client is interested in and start piping through changes on the WebSocket when they occur.

Working with deltas
One thing to take into consideration when a client reconnects is making the client intelligent enough that it sends through some sort of data synchronization offset, probably in the form of a timestamp, so that the server doesn’t send it all the data again.
If every update to the drawing in question is time stamped, the clients can easily store the latest timestamp that they received. When the client loses the connection to a particular server, it can just reconnect to your websocket cluster (through your load balancer) by passing in the last timestamp that it received and that way the query to the DB can be built up so that it’ll only return updates that occur after the client last successfully received updates.
In loads of applications it might not be that important to worry about duplicates going down to the client. But even then, it might be a good idea to use a timestamp approach to save both your resources and the bandwidth of your users.
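
A small sketch of that reconnect query, using sqlite3 and an assumed updates(drawing_id, ts, payload) table purely for illustration:

import sqlite3

db = sqlite3.connect("whiteboard.db")  # illustrative schema: updates(drawing_id, ts, payload)

def updates_since(drawing_id: str, last_seen_ts: float) -> list:
    """On reconnect the client sends the last timestamp it saw; only newer
    updates are replayed instead of the whole drawing."""
    return db.execute(
        "SELECT ts, payload FROM updates WHERE drawing_id = ? AND ts > ? ORDER BY ts",
        (drawing_id, last_seen_ts),
    ).fetchall()
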
When you scale out, you need a way for the web socket services to subscribe to changed data, because changes to said data will also originate from other servers than itself. A Database that supports live queries is perfect for this purpose, for example RethinkDB. That way you have only WebSockets and your DB. That said, you might already be using a pub/sub capable technology (Redis, RabbitMQ, Kafka) in your environment, and it’ll be a much easier sell than introducing a new DB technology to the mix.

