Understanding Database Normal Forms

Normalization in Database With ExampleIn order to achieve  different levels of Database ‘Normal Forms’, a normalization process is used to organize or decompose the data in database efficiently by examining the relations (tables) for anomalies and removing them if detected. This leads to removal of redundant data and data dependencies from database. As a result, a database design is achieved by having  smaller and well structure relations that require less storage space for data. Finally, a trade-off is made between the redundant data and performance since normalization of data leads to separate relations. The joins are mostly used to retrieve the data, which requires more processing power.

Interested in learning more about database design? Look at an online database course at Udemy.com.

What is Database Normal Forms?

Theoretically, Database ‘Normal Forms’ (NF) is a process of defining database relations in such a way that they are invulnerable to logical inconsistencies and anomalies like insertion, edits and deletion. Each relation has the Highest Normal Form, but this does not mean that it is the maximum level of ‘Normal Form’. The ‘Normal Form’ is associated with each relation; therefore to achieve the entire Database Normal Form it is necessary that each relation attains the highest normal form.

A ‘Normal Form’ is a simplified state of a relation on which rules concerning functional dependencies (relationship between attributes) are applied. The following are the rules or different levels of ‘Normal Forms’.

  • ‘First Normal Form’ (1NF) results in a relation with no multivalued attributes.
  • ‘Second Normal Form’ (2NF) results in a relation with no partial functional dependencies.
  • ‘Third Normal Form’ (3NF) results in a relation with no transitive functional dependencies (functional dependency between two non-keys attributes).
  • ‘Boyce-Codd Normal Form’ is achieved by removing the remaining function dependencies after 3NF.
  • ‘Fourth Normal Form’ results in a relation with no multivalued dependencies.
  • ‘Fifth Normal Form’ is achieved with removing of all the remaining anomalies.

Only first three normal forms are discussed since it covers all the basic problems while designing the database.

First Normal Form (1NF)

When there is no multi-valued attribute present in a relation, then a relation is said to be in ‘First Normal Form’. Therefore, a relation that is in 1NF meets all the required properties in relation definition. Important properties are each attribute value must only contain a single value and of the same type, each attribute has unique name. The order is insignificant. In other words, no two rows in a relation can be identical. The following tables show the multi-valued attributes and resultant 1NF relation of Student.

StudentIDNameDeptNameDeptLocationCourseTitleDateCompleted
1Mr. JohnSoftwareNorthDBMSJava01/10/20XX06/10/20XX
2Mr. DoeTelecomSouthC++Java01/10/20XX06/10/20XX
3Mr. CrossSoftwareNorthDBMSJava01/10/20XX06/10/20XX

 

StudentIDNameDeptNameDeptLocationCourseTitleDateCompleted
1Mr. JohnSoftwareNorthDBMS01/10/20XX
1Mr. JohnSoftwareNorthJava06/10/20XX
2Mr. DoeTelecomSouthC++01/10/20XX
2Mr. DoeTelecomSouthJava06/10/20XX
3Mr. CrossSoftwareNorthDBMS01/10/20XX
3Mr. CrossSoftwareNorthJava06/10/20XX

 

In the first table, a multi-valued attribute ‘CourseTitle’ exists, which is why it is not a valid relation. The second table removes that multi-valued attribute by filling the empty places that create new rows and make that attribute a part of a primary key to uniquely identify each row in a table. This makes the second table a valid relation in ‘First Normal Form’.

For more database design strategies, check out this Udemy.com course

Second Normal Form (2NF)

When a relation is already in ‘First Normal Form’ and every non-key attribute in a relation is fully functional and dependent on the primary key, then the relation is said to be in ‘Second Normal Form’. The following rules are used to identify a 2NF relation:

  • Only one attribute is assigned as a primary key in a relation.
  • All attributes in a relation make up a primary key.
  • Every non-key attribute in a relation is fully functional dependent on all the components of the primary key.

In the last table of ‘First Normal Form’ where primary key is a composite key, the relation is not in 2NF because ‘Name’ and ‘DeptName’ are only functionally dependent on the part of primary key that is ‘StudentID’. The same goes for ‘DateCompleted’, which is on ‘CourseTitle’ only. This is called the partial functional dependency. Therefore, partial functional dependency exists in the 1NF Student relation, which creates data redundancy and anomalies like removing the course details also removes the Student’s department information. To convert the 1NF Student relation into 2NF, the relation is decomposed into smaller and well-structured relation that satisfies one of the above rules.

Table: Student_Basic_Info

StudentIDNameDeptNameDeptLocation
1Mr. JohnSoftwareNorth
2Mr. DoeTelecomSouth
3Mr. CrossSoftwareNorth

 

Table: Student_Course_Info

StudentIDCourseTitleDateCompleted
1DBMS01/10/20XX
1Java06/10/20XX
2C++01/10/20XX
2Java06/10/20XX
3DBMS01/10/20XX
3Java06/10/20XX

 

The above tables ‘Student_Basic_Info’ satisfy the first rule of 2NF where there is only one attribute ‘StudentID’ assigned as primary key. The ‘Student_Course_Info’ satisfy the third rule as the ‘DateCompleted’ fully functional dependent on each component of a primary key. Also, from this decomposition of tables it is clear that some redundant data is removed.

Third Normal Form (3NF)

When a relation is already in 2NF and no functional dependencies exist between two or more non-key attributes (also called a transitive dependency), a relation is in ‘Third Normal Form’.

The 2NF table name ‘Student_Basic_Info’ is not in 3NF because the ‘DeptLocation’ (a non-key attribute) is functionally dependent on the non-key attribute ‘DeptName’, a transitive dependency exist. Due to this dependency, data redundancy exists since with the department name of each student, a location of department, is repeated. Some anomalies are also present. A modification to update the location of department needs to change multiple records. To insert a new department is not possible without the ‘StudentID’ and finally, to delete the ‘StudentID’ of ‘Mr. John’ also deletes the department location information. To achieve the 3NF, a further decomposition is required. The following are the resultant tables:

 

Table: Student_Basic_Info

StudentIDNameDeptName
1Mr. JohnSoftware
2Mr. DoeTelecom
3Mr. CrossSoftware

 

Table: Department_Info

DeptNameDeptLocation
SoftwareNorth
TelecomSouth

 

To further explore database normal forms, take a design course at Udemy.com.