Saturday, April 13, 2019

Big Data Algorithms and Data Structures



https://puncsky.com/hacking-the-software-engineer-interview



Bloom Filter

A Bloom filter is a data structure used to detect whether an element is in a set in a time and space efficient way.
False positive matches are possible, but false negatives are not – in other words, a query returns either “possibly in set” or “definitely not in set”. Elements can be added to the set, but not removed (though this can be addressed with a “counting” bloom filter); the more elements that are added to the set, the larger the probability of false positives.
Usecases
  • Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
  • An HBase Bloom Filter is an efficient mechanism to test whether a StoreFile contains a specific row or row-col cell.
  • A website’s anti-fraud system can use bloom filters to reject banned users effectively.
  • The Google Chrome web browser used to use a Bloom filter to identify malicious URLs.

Skiplist

A skip-list is essentially a linked list that allows you to binary search on it. The way it accomplishes this is by adding extra nodes that will enable you to ‘skip’ sections of the linked-list. Given a random coin toss to create the extra nodes, the skip list should have O(logn) searches, inserts and deletes.
Usecases
  • LevelDB MemTable
  • Redis SortedSet
  • Lucene inverted index

B tree vs. B+ tree

B tree vs. B+ tree
Pros of B tree
  • Data associated with each key ⟶ frequently accessed nodes can lie closer to the root, and therefore can be accessed more quickly.
Pros of B+ tree
  • No data associated in the interior nodes ⟶ more keys in memory ⟶ fewer cache misses
  • Leaf nodes of B+ trees are linked ⟶ easier to traverse ⟶ fewer cache misses



Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts