Wednesday, January 31, 2018

ID generation



Related:
http://massivetechinterview.blogspot.com/2015/06/twitteridsnowflake-iteye.html
http://massivetechinterview.blogspot.com/2015/10/generate-unique-sequence-number-using.html
http://massivetechinterview.blogspot.com/2016/06/buttercola-fast-id-generator.html

https://www.quora.com/Scalability-Why-Is-database-ID-generation-a-bottleneck
Imagine the classic simple case of a single table with an auto-incrementing primary key (or, if you prefer, a SEQUENCE). Now imagine there are very many connections to the database server, each of which is inserting records on this table. At some point, those inserts must be serialised (because of the rules that the sequence of IDs must follow: sequential, no gaps). This is achieved by a lock which amounts to a critical section (i.e. only one connection can increment the sequence at a time), and every connection needs to acquire that lock before obtaining the next ID (and inserting a record).

 Many systems don't see Id generation as problem, which is because they don't have a large number of requests. Let's say you want to allocate an ID to each user and the ID should be unique and incremental by the registration date. Once you are receiving tons of registration requests every second or even millisecond, you will find it extremely easy to have duplicate IDs.

https://stackoverflow.com/questions/22883304/why-is-auto-increment-pattern-bad-when-scaling-in-mongodb
In order to guarantee that an auto-increment value is unique, the ID creation must occur on a single thread on a single host (even if multiple threads are used, the point of ID creation must block other threads). So, in a cluster of 100 servers, IDs must be created on 1 thread on 1 out the 100 servers. This not just a performance bottleneck, it is possible that the creation of 2 auto-increment IDs might block each other, which is the race condition noted in the quotation you've cited.
It should be noted that transactional RDBMS systems like Oracle and SQL Server have solved the race condition problem, but there is no solution to the performance bottleneck.
So: no, don't use auto-increment in non-primary keys if you anticipate the need to scale your system.

Clock synchronization

We ignored a crucial problem in the above analysis. In fact, there’s a hidden assumption that all ID generation servers have the same clock to generate the timestamp, which might not be true in distributed systems.
In reality, system clocks can drastically skew in distributed systems, which can cause our ID generators provide duplicate IDs or IDs with incorrect order. Clock synchronization is out of the scope of this discussion, however, it’s important for you to know such issue in this system. There are quite a few ways to solve this issue, check NTP if you want to know more about it.



Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts