Wednesday, October 7, 2015

MQTT - Facebook Messenger



http://usefulstuff.io/2013/03/how-it-works-facebook-part-2/
Facebook Messages and Chat
Facebook's messaging system is powered by a system called Cell. The entire messaging system (email, SMS, Facebook Chat, and the Facebook Inbox) is divided into cells, and each cell contains only a subset of users. Every Cell is composed of a cluster of application servers (where the business logic is defined) monitored by different ZooKeeper instances.
Application servers use a data access layer to communicate with the metadata storage, an HBase-based system (the old messaging infrastructure relied on Cassandra) which contains all the information related to messages and users.
Cells are the “core” of the system. To connect them to the frontend there are different “entry points”. An MTA proxy parses mail and redirects data to the correct application. Emails are stored in the same structure as photos: Haystack. There are also discovery services that map user-to-Cell (based on hashing) and service-to-Cell (based on ZooKeeper notifications), and everything exposes an API.
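The user-to-Cell lookup "based on hashing" could look roughly like the sketch below. The source does not say which hash or scheme Facebook uses, so the SHA-256 choice, the plain modulo step and the name cell_for_user are all assumptions made for illustration.

```python
import hashlib


def cell_for_user(user_id: str, num_cells: int) -> int:
    """Map a user ID to a cell index with a stable hash (hypothetical scheme)."""
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    # Use a cryptographic digest so the result is stable across processes,
    # unlike Python's built-in hash(), which is randomized per run for str.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", cell_for_user(user, 16))
```

A plain modulo remaps most users whenever the number of cells changes; a real deployment would more likely use consistent hashing or an explicit directory, which this sketch leaves out.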
There is a “dirty” cache based on Memcached to serve messages (from a local cache of datacenter) and social information about the users (like social indexes).
[Figure: Facebook Messages architecture]
The search engine for messages is built using an inverted index stored in HBase.
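As a rough illustration of what an inverted index is: it maps each term to the set of message IDs that contain it, so a query becomes a set intersection. The sketch below keeps the index in a Python dict as a stand-in for HBase; the function names and the whitespace tokenizer are assumptions, not Facebook's implementation.

```python
from collections import defaultdict


def build_inverted_index(messages: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the IDs of the messages that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for message_id, text in messages.items():
        for term in text.lower().split():
            index[term].add(message_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of messages containing every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    inbox = {1: "lunch at noon", 2: "Lunch tomorrow", 3: "see you at noon"}
    idx = build_inverted_index(inbox)
    print(sorted(search(idx, "at noon")))
```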
Chat is based on an epoll server developed in Erlang and accessed through Thrift. There is also a subsystem for logging chat messages (written in C++). Both subsystems are clustered and partitioned for reliability and efficient failover.
Real-time presence notification, not message sending, is the most resource-intensive operation performed: it means keeping each online user aware of the online-idle-offline states of their friends. Real-time messaging is done using a variation of Comet, specifically XHR long polling, and/or BOSH.
http://www.hivemq.com/blog/mqtt-essentials-part2-publish-subscribe
The publish/subscribe pattern
The publish/subscribe pattern (pub/sub) is an alternative to the traditional client-server model, in which a client communicates directly with an endpoint. Pub/sub instead decouples the client sending a message (the publisher) from the client or clients receiving it (the subscribers), so that publisher and subscriber don’t know about the existence of one another. A third component, the broker, is known to both; it filters all incoming messages and distributes them accordingly.

The main aspect of pub/sub is the decoupling of publisher and receiver, which can be broken down into several dimensions:
  • Space decoupling: Publisher and subscriber do not need to know each other (by IP address and port, for example)
  • Time decoupling: Publisher and subscriber do not need to run at the same time.
  • Synchronization decoupling: Operations on both components are not halted during publishing or receiving
Pub/Sub also provides greater scalability than the traditional client-server approach.
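A toy in-memory broker makes the space decoupling concrete: publishers and subscribers only ever talk to the broker and address each other through a topic string. This is a sketch under simplifying assumptions (one process, exact topic match, synchronous callbacks), so it shows neither time decoupling nor anything MQTT-specific; the class and method names are made up for the example.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str, bytes], None]


class Broker:
    """Minimal pub/sub broker: routes each message by exact topic name."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # The subscriber registers interest in a topic, not in a publisher.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> int:
        # The publisher hands the message to the broker and never learns
        # who (if anyone) receives it. Returns the number of deliveries.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("nagios/disk", lambda t, p: print("got", t, p))
    broker.publish("nagios/disk", b"92% full")
```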

Message Filtering
Option 1: Subject-based filtering
Topics are in general strings with a hierarchical structure, which allows filtering based on a limited number of expressions.

Option 2: Content-based filtering

MQTT uses subject-based filtering of messages, so each message contains a topic.

Distinction from Message Queues
The name MQTT comes from an IBM product called MQSeries and has nothing to do with “message queue”.

A message queue stores messages until they are consumed
A message will only be consumed by one client
Queues are named and must be created explicitly
A queue is far less flexible than a topic. Before a queue can be used, it has to be created explicitly with a separate command; only after that is it possible to publish or consume messages. In MQTT, topics are extremely flexible and can be created on the fly.
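The delivery difference can be shown with two small functions: a queue hands each message to exactly one consumer, while a topic hands every message to every subscriber. This is only an illustration; the round-robin choice of consumer and both function names are assumptions, since real queue systems pick consumers in their own ways.

```python
from collections import deque


def consume_queue(messages: list[str], consumers: list[str]) -> dict[str, list[str]]:
    """Queue semantics: each message is consumed by exactly one consumer."""
    if not consumers:
        return {}
    pending = deque(messages)
    received: dict[str, list[str]] = {name: [] for name in consumers}
    turn = 0
    while pending:
        # Round-robin is an arbitrary choice made for this sketch.
        consumer = consumers[turn % len(consumers)]
        received[consumer].append(pending.popleft())
        turn += 1
    return received


def fan_out_topic(messages: list[str], subscribers: list[str]) -> dict[str, list[str]]:
    """Topic semantics: every subscriber receives every message."""
    return {name: list(messages) for name in subscribers}


if __name__ == "__main__":
    msgs = ["m1", "m2", "m3"]
    print(consume_queue(msgs, ["a", "b"]))
    print(fan_out_topic(msgs, ["a", "b"]))
```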

http://jpmens.net/2013/02/25/lots-of-messages-mqtt-pub-sub-and-the-mosquitto-broker/
Publish/Subscribe queues are fun and useful. I first learned about them when tinkering with Redis a while back. One big drawback of Redis Pub/Sub is that the Redis project refuses to add some form of transport layer security, which means anything and everything is transferred into and out of Redis unencrypted.

Clients subscribe to be notified of incoming messages pertaining to specific topics, and other clients publish on those topics. A topic (think of it as a kind of channel) classifies messages. For example, I could have topics called nagios/mta, nagios/disk, test/jp/private, etc. Clients can subscribe to any number of topics, and may include wildcards when subscribing (e.g. nagios/#). In the context of MQTT, messages are blobs of opaque, binary-safe data with a maximum size of 256MB; only the topic name has to be UTF-8.

What is MQTT?


MQTT stands for MQ Telemetry Transport. It is a publish/subscribe, extremely simple and lightweight messaging protocol, designed for constrained devices and low-bandwidth, high-latency or unreliable networks. The design principles are to minimise network bandwidth and device resource requirements whilst also attempting to ensure reliability and some degree of assurance of delivery. These principles also turn out to make the protocol ideal for the emerging “machine-to-machine” (M2M) or “Internet of Things” world of connected devices, and for mobile applications where bandwidth and battery power are at a premium.
https://www.ibm.com/developerworks/community/blogs/mobileblog/entry/why_facebook_is_using_mqtt_on_mobile?lang=en
MQTT, specifically, was a great fit for the new buzzword Internet of Things, which brings all sorts of devices onto the same network.
Another highlight of the Facebook Messenger app was the possibility to have individual chat sessions between two people or a group chat, thanks to the publisher-subscriber nature of MQTT.

REST is for sleeping. MQTT is for mobile
HTTP on mobiles is a bit heavy, fragile and slow and drains batteries quickly.
Google and Apple Push
They offer no quality of service, really don’t have much in the way of guaranteed messaging. The practical result customers see is that notifications arrive quickly, late or not at all. 

Andy Stanford-Clark and Arlen Nipper invented MQTT to solve a problem they had: how to do reliable messaging over unreliable networks? 


So Arlen & Andy developed a very simple, extremely efficient publish/subscribe reliable messaging protocol and named it MQ Telemetry Transport (MQTT). A protocol that enabled devices to open a connection, keep it open using very little power and receive events or commands with as little as 2 bytes of overhead. A protocol that has built in the things you need for reliable behavior in unreliable or intermittently connected wireless environments. Things such as “last will & testament”, so all apps know immediately if a client disconnects ungracefully, “retained message”, so any user re-connecting immediately gets the very latest business information, etc.
  • latency – how to get faster phone-to-phone communications
  • battery – and do that without killing batteries
  • bandwidth – or sucking up the user’s available bandwidth
What Lucy did, instead of trying to brute-force the problem with HTTP or HTTP-based protocols like XMPP (which inherit all of HTTP’s issues on mobile), was adopt an obscure M2M protocol, “MQTT”.

MQTT has a small footprint and is efficient and low-power on the device.
response time = revenue, response time = business performance, and most importantly response time saves lives.
https://www.facebook.com/notes/facebook-engineering/building-facebook-messenger/10150259350998920
With just a few weeks until launch, we ended up building a new mechanism that maintains a persistent connection to our servers. To do this without killing battery life, we used a protocol called MQTT that we had experimented with in Beluga. MQTT is specifically designed for applications like sending telemetry data to and from space probes, so it is designed to use bandwidth and batteries sparingly. By maintaining an MQTT connection and routing messages through our chat pipeline, we were often able to achieve phone-to-phone delivery in the hundreds of milliseconds, rather than multiple seconds.

http://www.slideshare.net/henriksjostrand/devmobile-2013-low-latencymessagingusingmqtt
WebSockets?
– Bi-directional data exchange
• Useful for push and streaming data (e.g. stock ticker, media)
– But missing key function
• A low-level TCP socket; not a full-blown messaging protocol
• No assured delivery
• No built-in publish/subscribe
• Requires coding of message handling on top of it.

Lean
Minimized on-the-wire format
• Smallest possible packet size is 2 bytes
• No application message headers

Reliable
Three qualities of service:
• 0 – at most once delivery
• 1 – assured delivery but may be duplicated
• 2 – once and once only delivery
• In-built constructs to support loss of contact between client and server.
• “Last will and testament” to publish a message if the client goes offline.
• Stateful “roll-forward” semantics and “durable” subscriptions.

Simple
• Simple / minimal pub/sub messaging semantics
• Asynchronous (“push”) delivery
• Simple set of verbs -- connect, publish, subscribe and disconnect.

MQTT uses (much) less bandwidth than HTTP
MQTT protocol flows:
– Fixed header (2 bytes minimum)
– Variable header (optional, length varies)
– Message payload (optional, length encoded, up to 256MB)
§ Fixed header indicates the API call, the length of the payload and Quality of Service
§ Variable header contents depend on the API call defined in the fixed header
– Message ID, Topic name, client identifier and so on.
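To see where the 2-byte minimum and the 256MB ceiling come from, here is a sketch of the fixed header as I understand the MQTT 3.1.1 layout: one byte holding the packet type and flags, followed by a "remaining length" field of one to four bytes that carries 7 bits per byte plus a continuation bit. The function names are mine; treat the whole block as an illustration and check it against the specification before relying on it.

```python
PUBLISH = 3    # packet type code for PUBLISH
PINGREQ = 12   # packet type code for PINGREQ
MAX_REMAINING_LENGTH = 268_435_455  # 4 bytes x 7 bits, just under 256MB


def encode_remaining_length(length: int) -> bytes:
    """Encode the length field: 7 data bits per byte, top bit = more follows."""
    if not 0 <= length <= MAX_REMAINING_LENGTH:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        digit = length % 128
        length //= 128
        if length > 0:
            digit |= 0x80  # continuation bit: another length byte follows
        out.append(digit)
        if length == 0:
            return bytes(out)


def publish_fixed_header(remaining_length: int, qos: int = 0,
                         retain: bool = False, dup: bool = False) -> bytes:
    """Build the fixed header of a PUBLISH packet."""
    if qos not in (0, 1, 2):
        raise ValueError("qos must be 0, 1 or 2")
    first = (PUBLISH << 4) | (int(dup) << 3) | (qos << 1) | int(retain)
    return bytes([first]) + encode_remaining_length(remaining_length)


def pingreq_packet() -> bytes:
    """A complete keep-alive ping: type byte plus a zero length byte."""
    return bytes([PINGREQ << 4]) + encode_remaining_length(0)


if __name__ == "__main__":
    print(publish_fixed_header(10, qos=1, retain=True).hex())
    print(pingreq_packet().hex())
```

The ping packet is the whole message, which is how a client can keep a connection alive with two bytes on the wire.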

MQTT Topics
§ All subscriptions are to a topic space
§ All messages are published to an individual topic
§ Topic names are hierarchical
– Levels separated by “/”
– Single-level wildcards “+” can appear anywhere in the topic string
– Multi-level (whole subtree) wildcards “#” must appear at the end of the string
• Wildcards must be next to a separator
• Can't use wildcards when publishing
§ MQTT topic names can be up to 64KB long
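The wildcard rules above can be sketched as a small matcher that compares a subscription filter against a published topic level by level. It is a simplified illustration: it does not validate the filter beyond the position of “#” and it ignores the special handling of topics starting with “$”; the function name is made up.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a published topic matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the rest of the tree (including
            # the parent level itself), but only as the last filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # the filter is deeper than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs; "+" matches any one level
    # No "#" seen: both sides must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    for flt, top in [("nagios/#", "nagios/disk/sda"),
                     ("nagios/+", "nagios/disk"),
                     ("nagios/+", "nagios/disk/sda")]:
        print(flt, top, topic_matches(flt, top))
```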

MQTT Keep Alive

MQTT Last Will & Testament
§ During connection, a Will message and topic can be specified
– Abnormal disconnections will cause the broker (WMQ, i.e. WebSphere MQ, here) to publish the message
– Clean disconnects will not cause the message to publish
§ Can set the message as retained
– A retained message is delivered to a subscriber as soon as it subscribes
§ Useful to report the connection status of the client
– Will message is a retained “down”
– Upon connecting, client publishes a retained “up” message.
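The “up”/“down” status pattern can be simulated with a toy broker that remembers one will per client and one retained message per topic. This is a sketch of the behaviour described above, not of any real broker: the class name, the list-based inboxes and the decision to always publish the will as retained are assumptions for the example.

```python
from collections import defaultdict


class WillBroker:
    """Toy broker with retained messages and last-will handling."""

    def __init__(self) -> None:
        self._retained: dict[str, str] = {}
        self._wills: dict[str, tuple[str, str]] = {}
        self._subscribers: dict[str, list[list]] = defaultdict(list)

    def connect(self, client_id: str, will_topic: str, will_payload: str) -> None:
        # The will is stored at connect time and only used later.
        self._wills[client_id] = (will_topic, will_payload)

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            self._retained[topic] = payload  # keep only the latest value
        for inbox in self._subscribers[topic]:
            inbox.append((topic, payload))

    def subscribe(self, topic: str, inbox: list) -> None:
        self._subscribers[topic].append(inbox)
        if topic in self._retained:
            # A new subscriber immediately gets the last retained message.
            inbox.append((topic, self._retained[topic]))

    def disconnect(self, client_id: str, clean: bool) -> None:
        will = self._wills.pop(client_id, None)
        if will is not None and not clean:
            # Abnormal disconnect: publish the will on the client's behalf.
            self.publish(will[0], will[1], retain=True)


if __name__ == "__main__":
    broker = WillBroker()
    broker.connect("sensor-1", "status/sensor-1", "down")
    broker.publish("status/sensor-1", "up", retain=True)
    broker.disconnect("sensor-1", clean=False)
    late: list = []
    broker.subscribe("status/sensor-1", late)
    print(late)
```

Because both status messages are retained, a monitor that subscribes at any time sees the client's current state without waiting for the next change.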
