Thursday, October 4, 2018

Java primitive collections



http://www.mapdb.org/blog/better_primitive_collections_proposal/
So there is need for new primitive collection. It should store data in-memory, but also be swappable to disk if free memory runs low. On-heap long[] is not really swappable, Garbage Collector might randomly touch an array and load it from swap. Also data should be persisted between JVM restarts to save import time (without crash protection). That sounds like a job for memory mapped files.
Bit of theory: Collections such as HashSet<Long> store data in an array: Long[]. Primitive collections remove boxing: long[]. But one could also use raw binary storage such as byte[]DirectByteByfferRandomAccessFile or MappedByteBuffer. Access to longs will get slightly complicated, but most storage options provide methods such as setLong(index,value) or getLong(index) and are very similar to arrays.
Primitive collections relay on code generators to combine various types together. There are many classes such as IntObjectHashMap<V> or IntLongHashMap<V>. So it should be possible to modify code generator and add one extra parameters for storage. Than there will be classes such as MmapIntObjectHashMap<V> or HeapIntLongHashMap<V>.


Trove primitive sets use one array to store set values (using open indexed approach) and one more array of bytes to save each cell status: open, used or deleted. Besides, there are THashSet<E> and TLinkedHashSet<E>, which can replace java.util.HashSet and java.util.LinkedHashSet from JDK. These object sets don’t use a separate array of bytes to keep the slot state (unlike primitive sets).
Both JDK sets are based on the HashMap<E,Object>, where Object values are just cell state flags. HashMaps, in turn, are based on array of Map.Entry. As a result, JDK HashSet uses three Objectsper each stored values. This is especially expensive in case when you need to store primitive values in a set.
JDK LinkedHashSet is built on top of HashSet by adding double linked references to each map entry to keep order in which records have been inserted
Note that for a set of integer values (short/int/long) it may be worth to replace such set with a bit set. Read "Bit sets" article for more details.

https://dzone.com/articles/whats-wrong-with-java-boxed-numbers
http://java-performance.info/bit-sets/
Do not forget about bit sets when you need to map a large number of integer keys to boolean flags.
Sets of integer values should be replaced with bit sets in a lot of cases in order to save a lot of memory.

LongBitSet to deal with BitSet problems

Let’s try to deal with these problems. We will implement a LongBitSet – a class similar to java.util.BitSet. The only difference is that its keys will be long instead of int.
LongBitSet will be backed by Map<Long, BitSet>. The idea is to use high bits of a key as a key to this map and low bits as a key to a value BitSet. Thus we will support both partitioning and long keys. Problem is solved.
What about size of each BitSet? It depends on your data, but keeping up to a million keys in each BitSet is usually a good choice. Each bit set will require no more than 128K, keeping total size of a LongBitSet rather small

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts