Saturday, September 19, 2015

Variable Length Integer



http://vpri.org/fonc_wiki/index.php?title=Variable_Length_Integer
Variable Length Integers. A variable length integer allows storing an integer in a variable number of bytes. Doing this takes advantage of the fact that most often integer values tend towards 0, as well as allowing arbitrarily large values to be expressed without needing to make fundamental alterations to the file format.

VLI/UVLI (Unsigned VLI)

RangePattern
0..12700..7F
128..1638380..BF XX
16384..2097151C0..DF XX XX
2097152..268435455E0..EF XX XX XX
268435456..34359738367F0..F7 XX XX XX XX
34359738368...F8...
Where XX is the low-order bits, proceeding from high-to-low ordering.
The pattern will break at 64 bits, allowing either a complete 64-bit value in 9 bytes (FF followed by 8 bytes), or requiring 10 bytes and allowing the pattern to continue up to 128 bits before another such pattern break.

UTF-8

UTF-8 is itself a form of VLI.
RangeEncoding
0..1270xxxxxxx
127..2047110xxxxx 10yyyyyy
2047..655351110xxxx 10yyyyyy 10zzzzzz
65536..209715111110xxx 10yyyyyy 10zzzzzz 10wwwwww
2097152......
However, UTF-8 is limited to a maximum value of U+10FFFF or 1114111, despite the encoding allowing for the representation of larger values.
https://en.wikipedia.org/wiki/Variable-length_quantity
If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.
  • Represent the value in binary notation (e.g. 137 as 10001001)
  • Break it up in groups of 7 bits starting from the lowest significant bit (e.g. 137 as 0000001 0001001). This is equivalent to representing the number in base 128.
  • Take the lowest 7 bits and that gives you the least significant byte (0000 1001). This byte comes last.
  • For all the other groups of 7 bits (in the example, this is 000 0001), set the MSB to 1 (which gives 1000 0001 in our example). Thus 137 becomes 1000 0001 0000 1001 where the bits in boldface are something we added. These added bits denote if there is another byte to follow or not. Thus, by definition, the very last byte of a variable length integer will have 0 as its MSB.
Another way to look at this is to represent the value in base-128, and then set the MSB of all but the last base-128 digit to 1.



Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts