Thursday, February 11, 2016

Blue Ocean: Messages delivery pipeline usign Apache Kafka, Solr and Velocity



Blue Ocean: Messages delivery pipeline usign Apache Kafka, Solr and Velocity
There are many messaging frameworks like Websphere MQ, RabbitMQ and Kafka. One of the benefits of Kafka is the messaging persistence for configurable period due to its disk-based message storage. See Kafka website for nice introduction about Kafka.

Recently I work on a high-throughput message delivery pipeline which allows multiple consumers listen to various Kafka topic streams, and filters message based on configurable rules and delivers the messages to various endpoints such as file system, email, web services and etc.

The following diagram illustrates the high-level components in the pipeline. Its core job consists of MessageLisenter: a Kafka consumer listen to a topic, RuleEvaluator: process the polled data stream against velocity template based rules and MessageDeliver: deliver the message to one endpoint. If we define subscription as a kafka topic, rules and an endpoint. Then pipeline is a thread pool which contains many subscription-based process jobs or threads.

A Spring task scheduler is used to pull all of subscriptions from an object store and create thread if necessary.
The pipeline is a state machine goes through various states such as message_acquired, message_rule_matched, message_delivered and etc. We also use Apache Solr to index the job related data and its status. This allows us to build a job monitoring UI to query Solr and display job status to users.

Another two important features are the message delivery retry and replay capability. Basically when message goes through the steps in the pipeline, it can fail at any points such as failing to deliver or rule engine system failure. In failing to deliver case we will retry in an exponentially increased interval interval until giving up at some point. Then once developer fixes any issue associated with the failure, he/she can send relay message and pipeline can replay it.

Circuit breakers are also implemented to allow us throttle the various network-intensive requests if there are any system failure.  See Martin Fowler's blog for introduction to circuit breaker.
Read full article from Blue Ocean: Messages delivery pipeline usign Apache Kafka, Solr and Velocity

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts