DATABASE MANAGEMENT SYSTEM

NORMALIZATION AND ITS TYPES (1NF,2NF,3NF,BCNF,4NF AND DKNF)

Normalization is the process of organizing data in a relational database to reduce redundancy and dependency, resulting in a well-structured database schema. There are several normal forms, each addressing specific types of anomalies that can occur in a database. 

Need of Normalization

Reducing Data Redundancy

Normalization helps to minimize duplicate data by organizing the database into tables with clear relationships. This not only saves storage space but also makes the database more efficient to manage.

Reducing Update Anomalies

Normalization minimizes update anomalies, which occur when data inconsistencies arise due to partial updates.

Preventing Deletion Anomalies

Normalization helps prevent deletion anomalies, which occur when deleting data leads to unintended loss of other valuable data.

Reducing Data Redundancy

Normalization helps to minimize duplicate data by organizing the database into tables with clear relationships. This not only saves storage space but also makes the database more efficient to manage.

Reducing Update Anomalies

Normalization minimizes update anomalies, which occur when data inconsistencies arise due to partial updates.

Preventing Deletion Anomalies

Normalization helps prevent deletion anomalies, which occur when deleting data leads to unintended loss of other valuable data.

Types of Normalization

First Normal Form (1NF):

Definition: A relation is in 1NF if it contains only atomic (indivisible) values, and each column has a unique name. Table should not contain multivalued attributes.

Example:

S_Id

S_Name

Course

1

Arun

C/C++

2

Bikash

Python

3

Chetan

DBMS/C

To convert to 1NF, we would break the Courses column into individual rows:

S_Id

S_Name

Course

1

Arun

C

1

Arun

C++

2

Bikash

Python

3

Chetan

DBMS

3

Chetan

C

Second Normal Form (2NF):

Definition: A relation is in 2NF if it is in 1NF and all  non-prime attributes are fully functionally dependent on the primary key (no partial dependency on any part of the composite key if the primary key is composite or candidate key).

Table should not contain any partial dependency.

Non-Prime Attribute: Attributes that do not participate in formation of Candidate Key .

Prime Attribute: Attributes that participate in formation of Candidate Key .

Example:

Student_ID

Course_ID

Course_Fee

1

110

2000

2

112

5000

3

105

2500

Prime Attribute= StudentID and Course_ID

Non Prime Attribute=Course_Fee

As attribute Course_Fee only depend on Course_ID and not on Student_ID , it doesnot follow 2nf rule as a non-prime attribute only depends on one prime attribute.

To convert to 2NF, we break above table into two tables:

 

Student_ID

Course_ID

1

110

2

112

3

105

 

Course_ID

Course_Fee

110

2000

112

5000

105

2500

Now, Non-prime attribute Course_Fee is fully dependent on the Course_ID, hence in 2NF.

Third Normal Form (3NF):

Definition: A relation is in 3NF if it is in 2NF, and there are no transitive dependencies or no non-prime attribute depends on another non-prime attribute.

Any non-prime attribute should only depend on prime attribute.

Example:

Here,

Prime Attribute=Student_ID

Non-Prime Attribute=State, City

In this table , City is dependent on State and both are Non-Prime attributes .

Also, City is dependent on State, which creates a transitive dependency because City is indirectly dependent on Student_ID through State.

Student_ID

State

City

1

CA

Birganj

2

SA

Pokhara

3

TA

Kathmandu

To convert to 3NF, we would break above table:

State Table

State

City

CA

Birganj

SA

Pokhara

TA

Kathmandu

Student Table

Student_ID

State

1

CA

2

SA

3

TA

Now, both tables are in 3NF:

  • In the States Table, City is directly dependent on State.
  • In the Students Table, State is directly dependent on Student_ID.

Boyce-Codd Normal Form (BCNF):

Boyce-Codd Normal Form (BCNF) is an advanced version of the Third Normal Form (3NF). A table is in BCNF if it is in 3NF and, for every functional dependency (A → B), A is a superkey. A superkey is a set of one or more columns that uniquely identify a row in a table.

Example of BCNF:

Consider a table that records which course a student is enrolled in and who teaches that course:

Student_ID

Course

Instructor

1

C

P

2

C++

B

3

DBMS

A

Step 1: First Normal Form (1NF)

The table is already in 1NF because all values are atomic and there are no repeating groups.

Step 2: Second Normal Form (2NF)

The table is also in 2NF. Here, StudentID and Course together can be the composite key. There are no partial dependencies as all non-key attributes depend on the entire composite key.

Step 3: Third Normal Form (3NF)

The table is in 3NF as well because there are no transitive dependencies; all non-key attributes (Instructor) depend directly on the composite key (StudentID, Course).

However, this table is not in BCNF. Here's why:

Identifying the problem:

The functional dependency Course → Instructor, here Course alone is not a Super Key in the above table.

 

Decomposition to BCNF:

To convert the table to BCNF, we need to decompose it so that every determinant is a superkey. We split the table into two:

Course Table

 

Course

Instructor

C

P

C++

B

DBMS

A

Enrollment Table

 

Student_ID

Course

1

C

2

C++

3

DBMS

Now, let's check if these tables meet the BCNF criteria:

  • In the Courses Table, Course is a superkey, and it determines Instructor (Course → Instructor).
  • In the Enrollments Table, the composite key (StudentID, Course) uniquely identifies each row, and there are no partial or transitive dependencies.

Domain-Key Normal Form (DKNF):

Definition: A relation is in DKNF if it is in BCNF and all constraints are expressed in terms of domain constraints and key constraints.

A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes that uniquely identify a row in a given table.

Example 

Consider a table called Rich Person, the domain for Status consists of the values 'Millionaire' and 'Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or equal to 1,000,000

The constraint dictates that a Millionaire will have a net worth of 1,000,000 to 999,999,999 inclusive, whilst a Billionaire will have a net worth of 1,000,000,000 or higher. This constraint is neither a domain constraint nor a key constraint.

The DKNF violation could be eliminated by removing the Status column. The wealthy person's status as a millionaire or billionaire is determined by their Net Worth in Dollars, as defined in the  Status table, so no useful information is lost.

Rich_Person Table

PersonName

Status

Net Worth in Dollars

A

Millionaire

424,553,621

B

Billionaire

8,533,228,793

C

Billionaire

5,829,562,999

D

Millionaire

792,545,211

Person Table

PersonName

Net Worth in Dollars

A

424,553,621

B

8,533,228,793

C

5,829,562,999

D

792,545,211

Status Table

Status

Minimum

Maximum

Millionaire

1,000,000

999,999,999

Billionaire

1,000,000,000

999,999,999,999