NORMALIZATION AND ITS TYPES (1NF,2NF,3NF,BCNF,4NF AND DKNF)
Normalization is the process of organizing data in a relational database to reduce redundancy and dependency, resulting in a well-structured database schema. There are several normal forms, each addressing specific types of anomalies that can occur in a database.
Need of Normalization
Normalization helps to minimize duplicate data by organizing the database into tables with clear relationships. This not only saves storage space but also makes the database more efficient to manage.
Normalization minimizes update anomalies, which occur when data inconsistencies arise due to partial updates.
Normalization helps prevent deletion anomalies, which occur when deleting data leads to unintended loss of other valuable data.
Normalization helps to minimize duplicate data by organizing the database into tables with clear relationships. This not only saves storage space but also makes the database more efficient to manage.
Normalization minimizes update anomalies, which occur when data inconsistencies arise due to partial updates.
Normalization helps prevent deletion anomalies, which occur when deleting data leads to unintended loss of other valuable data.
Types of Normalization
First Normal Form (1NF):
Definition: A relation is in 1NF if it contains only atomic (indivisible) values, and each column has a unique name. Table should not contain multivalued attributes.
Example:
S_Id |
S_Name |
Course |
1 |
Arun |
C/C++ |
2 |
Bikash |
Python |
3 |
Chetan |
DBMS/C |
To convert to 1NF, we would break the Courses column into individual rows:
S_Id |
S_Name |
Course |
1 |
Arun |
C |
1 |
Arun |
C++ |
2 |
Bikash |
Python |
3 |
Chetan |
DBMS |
3 |
Chetan |
C |
Second Normal Form (2NF):
Definition: A relation is in 2NF if it is in 1NF and all non-prime attributes are fully functionally dependent on the primary key (no partial dependency on any part of the composite key if the primary key is composite or candidate key).
Table should not contain any partial dependency.
Non-Prime Attribute: Attributes that do not participate in formation of Candidate Key .
Prime Attribute: Attributes that participate in formation of Candidate Key .
Example:
Student_ID |
Course_ID |
Course_Fee |
1 |
110 |
2000 |
2 |
112 |
5000 |
3 |
105 |
2500 |
Prime Attribute= StudentID and Course_ID
Non Prime Attribute=Course_Fee
As attribute Course_Fee only depend on Course_ID and not on Student_ID , it doesnot follow 2nf rule as a non-prime attribute only depends on one prime attribute.
To convert to 2NF, we break above table into two tables:
Student_ID |
Course_ID |
1 |
110 |
2 |
112 |
3 |
105 |
Course_ID |
Course_Fee |
110 |
2000 |
112 |
5000 |
105 |
2500 |
Now, Non-prime attribute Course_Fee is fully dependent on the Course_ID, hence in 2NF.
Third Normal Form (3NF):
Definition: A relation is in 3NF if it is in 2NF, and there are no transitive dependencies or no non-prime attribute depends on another non-prime attribute.
Any non-prime attribute should only depend on prime attribute.
Example:
Here,
Prime Attribute=Student_ID
Non-Prime Attribute=State, City
In this table , City is dependent on State and both are Non-Prime attributes .
Also, City
is dependent on State
, which creates a transitive dependency because City
is indirectly dependent on Student_ID
through State
.
Student_ID |
State |
City |
1 |
CA |
Birganj |
2 |
SA |
Pokhara |
3 |
TA |
Kathmandu |
To convert to 3NF, we would break above table:
State Table
State |
City |
CA |
Birganj |
SA |
Pokhara |
TA |
Kathmandu |
Student Table
Student_ID |
State |
1 |
CA |
2 |
SA |
3 |
TA |
Now, both tables are in 3NF:
City
is directly dependent on State
.State
is directly dependent on Student_ID
.Boyce-Codd Normal Form (BCNF):
Boyce-Codd Normal Form (BCNF) is an advanced version of the Third Normal Form (3NF). A table is in BCNF if it is in 3NF and, for every functional dependency (A → B), A is a superkey. A superkey is a set of one or more columns that uniquely identify a row in a table.
Example of BCNF:
Consider a table that records which course a student is enrolled in and who teaches that course:
Student_ID |
Course |
Instructor |
1 |
C |
P |
2 |
C++ |
B |
3 |
DBMS |
A |
The table is already in 1NF because all values are atomic and there are no repeating groups.
The table is also in 2NF. Here, StudentID
and Course
together can be the composite key. There are no partial dependencies as all non-key attributes depend on the entire composite key.
The table is in 3NF as well because there are no transitive dependencies; all non-key attributes (Instructor) depend directly on the composite key (StudentID, Course
).
However, this table is not in BCNF. Here's why:
The functional dependency Course → Instructor, here Course alone is not a Super Key in the above table.
To convert the table to BCNF, we need to decompose it so that every determinant is a superkey. We split the table into two:
Course Table
Course |
Instructor |
C |
P |
C++ |
B |
DBMS |
A |
Enrollment Table
Student_ID |
Course |
1 |
C |
2 |
C++ |
3 |
DBMS |
Now, let's check if these tables meet the BCNF criteria:
Course
is a superkey, and it determines Instructor
(Course → Instructor
).StudentID, Course
) uniquely identifies each row, and there are no partial or transitive dependencies.Domain-Key Normal Form (DKNF):
Definition: A relation is in DKNF if it is in BCNF and all constraints are expressed in terms of domain constraints and key constraints.
A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes that uniquely identify a row in a given table.
Example
Consider a table called Rich Person, the domain for Status consists of the values 'Millionaire' and 'Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or equal to 1,000,000
The constraint dictates that a Millionaire will have a net worth of 1,000,000 to 999,999,999 inclusive, whilst a Billionaire will have a net worth of 1,000,000,000 or higher. This constraint is neither a domain constraint nor a key constraint.
The DKNF violation could be eliminated by removing the Status column. The wealthy person's status as a millionaire or billionaire is determined by their Net Worth in Dollars, as defined in the Status table, so no useful information is lost.
Rich_Person Table
PersonName |
Status |
Net Worth in Dollars |
A |
Millionaire |
424,553,621 |
B |
Billionaire |
8,533,228,793 |
C |
Billionaire |
5,829,562,999 |
D |
Millionaire |
792,545,211 |
Person Table
PersonName |
Net Worth in Dollars |
A |
424,553,621 |
B |
8,533,228,793 |
C |
5,829,562,999 |
D |
792,545,211 |
Status Table
Status |
Minimum |
Maximum |
Millionaire |
1,000,000 |
999,999,999 |
Billionaire |
1,000,000,000 |
999,999,999,999 |