Saturday, April 13, 2019

Big Data Algorithms and Data Structures

https://puncsky.com/hacking-the-software-engineer-interview

Bloom Filter
A Bloom filter is a data structure used to detect whether an element is in a set in a time and space efficient way.

False positive matches are possible, but false negatives are not – in other words, a query returns either “possibly in set” or “definitely not in set”. Elements can be added to the set, but not removed (though this can be addressed with a “counting” bloom filter); the more elements that are added to the set, the larger the probability of false positives.

Usecases
Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
An HBase Bloom Filter is an efficient mechanism to test whether a StoreFile contains a specific row or row-col cell.
A website’s anti-fraud system can use bloom filters to reject banned users effectively.
The Google Chrome web browser used to use a Bloom filter to identify malicious URLs.

Skiplist

A skip-list is essentially a linked list that allows you to binary search on it. The way it accomplishes this is by adding extra nodes that will enable you to ‘skip’ sections of the linked-list. Given a random coin toss to create the extra nodes, the skip list should have O(logn) searches, inserts and deletes.

Usecases

LevelDB MemTable
Redis SortedSet
Lucene inverted index

B tree vs. B+ tree

Pros of B tree

Data associated with each key ⟶ frequently accessed nodes can lie closer to the root, and therefore can be accessed more quickly.

Pros of B+ tree

No data associated in the interior nodes ⟶ more keys in memory ⟶ fewer cache misses
Leaf nodes of B+ trees are linked ⟶ easier to traverse ⟶ fewer cache misses

Saturday, April 13, 2019

Big Data Algorithms and Data Structures

Bloom Filter

Skiplist

B tree vs. B+ tree

Labels

Popular Posts