Normalization
Normalization, developed by E.F. Codd, is a series of formal steps used to design relational databases that eliminate redundant data, prevent update anomalies, and closely model real-world relationships.
The Goal of Normalization
When tables are poorly structured, they suffer from anomalies:
- Insertion Anomalies: Unable to insert data without providing unrelated data.
- Update Anomalies: Modifying single data points requires changing multiple redundant rows.
- Deletion Anomalies: Deleting a row accidentally destroys entirely unrelated data.
Normalization decomposes a single large table into smaller, tightly focused tables and links them via Foreign Keys.
Normal Forms (1NF to 5NF)
1. First Normal Form (1NF)
A table is in 1NF if:
- It contains no repeating groups.
- All column values are atomic (indivisible).
If a table cell contains a list (e.g., ["Science", "Math"]), it violates 1NF and must be split into multiple rows.
2. Second Normal Form (2NF)
A table is in 2NF if:
- It is in 1NF.
- There are no Partial Dependencies. Every non-key attribute must be entirely dependent on the entire Primary Key.
If you have a composite primary key layout, and a field relies on only half of the key, it breaches 2NF and should be moved to a newly formed relation.
3. Third Normal Form (3NF)
A table is in 3NF if:
- It is in 2NF.
- There are no Transitive Dependencies.
A non-key attribute must not depend on another non-key attribute. If A -> B and B -> C, remove the B -> C relation into a new table.
4. Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF.
- It states that for any dependency
A → B,Amust be a Candidate Key. BCNF covers complex scenarios where a table contains multiple overlapping candidate keys.
5. Fourth Normal Form (4NF)
A table is in 4NF if:
- It is in BCNF.
- It has no Multi-Valued Dependencies.
6. Fifth Normal Form (5NF)
A table is in 5NF if:
- It is in 4NF.
- It cannot be losslessly decomposed any further. It deals with purely isolated Join Dependencies.
Denormalization
Normalization is an excellent theoretical baseline, but the "perfectly" normalized database might require 20 table joins to load a single user profile, destroying read performance.
Denormalization is the process of intentionally re-introducing redundancy into a normalized database to optimize query read performance at the cost of slight storage redundancy and update overhead.