Tuesday, July 1, 2014

Interview Question: Database Normalization



Normalization is the process of organizing data into a related table; it also eliminates redundancy, undesirable characteristics like Insertion, Update and Deletion Anomalies and increases the integrity.

Benefits of Normalization 
• Less storage space 
• Quicker updates 
• Less data inconsistency 
• Clearer data relationships 
• Easier to add data 
• Flexible Structure 
Update Anomaly - Cannot update information without changing information in many places. To update customer information, it must be updated for each sales order the customer has placed.

DE- NORMALIZATION: 
Denormalization is the process of adding redundant data to speed up complex queries involving multiple table JOINS. One might just go to a lower form of Normalization to achieve Denormalization and better performance. Data is included in one table from another in order to eliminate the second table which reduces the number of JOINS in a query and thus achieves performance.


First Normal Form (1NF)
If all attribute values are atomic: no repeating group, no composite attributes.
DPT_NO EMP_NO

Second Normal Form (2NF) - No partial dependency
(a) It is in 1st Normal Form.
(b) If no non-key attribute is dependent on a part of composite key, that is all the attributes must be dependent on the whole composite key not just a part of it.
There must not be any partial dependency of any column on primary key.

2nd Normal Form can be applied only to the table which has any composite key.
{StudentSubject}, Age 

Third Normal Form (3NF): No transitive dependency
(a) It is in 2NF.
(b) It doesn't contain any transitive dependencies.
Remove transitive dependencies: remove columns that are not dependent upon the primary key.
Everything is dependent on the key or is in the key.
Each column must depend on *directly* on the primary key.

Boyce Codd Normal Form (BCNF):
Every determinant is a candidate key
No overlapping candidate keys
A relation is said to be in BCNF if and only if the determinants are candidate keys. BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

A relation is in BCNF, if and only if, every determinant is a candidate key. 
The difference between 3NF and BCNF is that for a functional dependency A -> B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, 
whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key. 

Student, Course, Teacher: SC is the primary key, but teacher -> course.
When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.
3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys
i.e. composite candidate keys with at least one attribute in common.
BCNF is based on the concept of a determinant.
A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.

A relation is in BCNF is, and only if, every determinant is a candidate key.

A 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.
http://psoug.org/reference/normalization.html
Fourth Normal FormAn entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:
  • Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.
  • Many:many relationships are resolved independently.
Fifth Normal FormAn entity is in Fifth Normal Form (5NF) if, and only if, it is in 4NF and every join dependency for the entity is a consequence of its candidate keys.
A candidate key is a combination of attributes that can be uniquely used to identify a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table primary key.


References

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts