Normalization in Database (With Example)

Normalization in Database With ExampleDatabase normalization is the process of making the data in a database available in the most organized way possible. It is one of the first concepts you will learn when studying database management, as you might in a course like SQL Database For Beginners.

When you’re normalizing a database, there are two things you need to consider: whether the information in the database has internal redundancies, and whether the dependencies across the different tables in the database are logically organized.

The term normalization comes from the concept of normal forms, which describe just how organized the information is in the database. But normal forms were developed around the concept of a table-based relational database, which you need to learn about in order to understand database normalization. You might want to take a course like this one in RDBMS Concepts to get started. Normal forms are designed to enable the most straightforward approaches is for searching across information when it is structured in this manner.

As an example, let’s imagine we’re creating a database of the children in a class, and the pets they have. When starting to build this database, the first approach might be to create a simple table with all of the information in one place, and one row for each student.

TABLE: STUDENTS

| Name                | Age | Pet | Pet Name |
----------------------------------------------
| Heather             | 10  | Dog | Rex      |
----------------------------------------------
| Rachel              | 10  | Cat | Fluff    |
----------------------------------------------
| Jimmy               | 11  | Dog | Kimba    |
----------------------------------------------
| Lola                | 1o  | Cat | Thomas   |
----------------------------------------------

This works until you realize that Heather might have two pets, a dog and a cat. How would you represent that in this table? In order to do that, we need to introduce first normal form.

First Normal Form

To achieve first normal form for a database, you need to make sure that no table contains multiple columns that you could use to get the same information. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique. The primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key.

Using the rules of first normal form, there may be redundant information across multiple rows, but each row will be unique.

TABLE: STUDENTS

| Name                | Age | Pet | Pet Name |
----------------------------------------------
| Heather             | 10  | Dog | Rex      |
----------------------------------------------
| Heather             | 10  | Cat | Thomas   |
----------------------------------------------
| Rachel              | 10  | Cat | Fluff    |
----------------------------------------------
| Jimmy               | 11  | Dog | Kimba    |
----------------------------------------------
| Lola                | 10  | Cat | Thomas   |
----------------------------------------------

Second Normal Form

In this first example there are two rows for Heather, with changes to the second row to show the fact that there are multiple pets. While this is searchable, and follows first normal form, it is an inefficient use of space. To achieve second normal form, it would be helpful to split out the pets into an independent table, and match them up using the student names as foreign keys.

TABLE: STUDENTS

| Name                | Age |  
-----------------------------  
| Heather             | 10  |  
-----------------------------  
| Rachel              | 10  |  
-----------------------------  
| Jimmy               | 11  |  
-----------------------------  
| Lola                | 10  |  
-----------------------------  

TABLE: PETS

| Type         | Pet Name  | Owner    |
---------------------------------------
| Dog          | Rex       | Heather  |
---------------------------------------
| Cat          | Thomas    | Heather  |
---------------------------------------
| Cat          | Fluff     | Rachel   |
---------------------------------------
| Dog          | Kimba     | Jimmy    |
---------------------------------------
| Cat          | Thomas    | Lola     |
---------------------------------------

Third Normal Form

This is a cleaner organization for the information, and avoids repeating the age of the student with two pets, but the dogs and cats are repeated several times in the pets table. We can see that it is uncertainty when there are two pets with the same type and the same name. Is the primary key the owner and the type, the name and the type, or the owner and the name? Third normal form would suggest making sure each non-key element in each table provides information about the key in the row.

In order to establish an unambiguous unique identifier for each pet, it is useful to include a unique primary key that distinguishes each pet from the others. A similar issue could occur if there were two students with the same name, so creating unique primary key values for the student table is also a good idea. Often these will be added automatically when database rows are set to autoincrement. If you need to learn how to do that, you can take a course to learn Database Design with MySQL, take a course to learn Oracle database administration, or take a course to learn Microsoft SQL fundamentals and find out exactly how to autoincrement table rows in the environment you use.

TABLE: STUDENTS

| ID | Name                | Age |  
----------------------------------  
| 00 | Heather             | 10  |  
----------------------------------  
| 01 | Rachel              | 10  |  
----------------------------------  
| 02 | Jimmy               | 11  |  
----------------------------------  
| 03 | Lola                | 10  |  
----------------------------------  

TABLE: PETS

| ID | Type         | Pet Name  | Owner ID |
--------------------------------------------
| 00 | Dog          | Rex       | 00       |
--------------------------------------------
| 01 | Cat          | Thomas    | 00       |
--------------------------------------------
| 02 | Cat          | Fluff     | 01       |
--------------------------------------------
| 03 | Dog          | Kimba     | 02       |
--------------------------------------------
| 04 | Cat          | Thomas    | 03       |
--------------------------------------------

Fourth Normal Form

Now it is clear that which pet and which student are associated, the fact that one student may have more than one pet, or that one pet may be shared among more than one student, needs to be represented. For this example, imagine that we needed to show that the cat named Thomas as actually shared between Lola and Heather. To represent this clearly, we need to add a third table of relationships.

TABLE: STUDENTS

| ID | Name                | Age |  
----------------------------------  
| 00 | Heather             | 10  |  
----------------------------------  
| 01 | Rachel              | 10  |  
----------------------------------  
| 02 | Jimmy               | 11  |  
----------------------------------  
| 03 | Lola                | 10  |  
----------------------------------  

TABLE: PETS

| ID | Type         | Pet Name  |
---------------------------------
| 00 | Dog          | Rex       |
---------------------------------
| 01 | Cat          | Thomas    |
---------------------------------
| 02 | Cat          | Fluff     |
---------------------------------
| 03 | Dog          | Kimba     |
---------------------------------

Table: PETS-STUDENTS

| Pet ID | Owner ID |
---------------------
| 00     | 00       |
---------------------
| 01     | 00       |
---------------------
| 02     | 01       |
---------------------
| 03     | 02       |
---------------------
| 01     | 03       |
---------------------

Now we have a flexible and searchable structure in fourth normal form that can represent all the available information about each of the students, each of the pets, and the relationships among them. As you learn more about databases, you will come to recognize the challenges of structuring the data according to your actual needs.

An important thing to keep in mind as you learn to structure your data is that the normal forms are not set in stone as goals for every database. Each one is a guideline that you may choose to use as a reference point when considering the best way to organize your personal database to meet the needs of your business. Sometimes the most performant or appropriate organization for your database may not be the most logical one from a normalization standpoint.

If you’re ready to master database normalization, check out a course like Hitesh Choudhary’s Database Design Master Course to learn all about it.