We use database to store information we need on a regular basis. This information could be about our products and services, our employees, our regular clients, or just our personal Blu-ray collection. Database lets us keep this information in a single location that we can retrieve at any time. However, these things can grow out of hand very quickly. As we put more and more information in them, they can get difficult to use and a hassle to store. We need a way to tame our databases when they become monsters tearing up our hard drives and our sanity. We need to implement normalization in our SQL.
Relational Databases and SQL
Normalization in SQL transforms a database into something more compact, easy to use, and highly scalable. However, you need a database you can normalize. For most people, that will be a relational database.
Relational databases are databases where their tables are related in some way. When you retrieve data from these databases, your inquiry takes information from all of these tables at the same time. Microsoft SQL Server, MySQL, and Oracle are examples of this type of database, and you can transform other types of databases such as Microsoft Access into a relational database with a few tweaks.
Relational databases mimic how our minds store and retrieve data. To understand something, you often break it down into smaller pieces to wrap your mind around. Once you understand the parts, you establish relationships between then in order to understand how they all work together within the whole package. Relational databases just create containers for those small pieces.
For example, let’s say you are designing the catalog for an online store. Catalogs are one application for a database. Your store your product info in the database and use your web site software to display it to your customers. For instance, you might want to store the following information in your database for a sporting goods store: product IDs, product names, product colors, product sizes, the prices for each product and product descriptions.
You could store this information in a table, but that can be tedious if you have multiple options for each product. A much better idea is to use a relational database. This way you create a master product table, and then separate tables for the colors, sizes, and prices. You then linked them together with SQL through the product IDs.
In essence, normalization is a way to make these tables and your links between them more efficient and effective.
Flat tables are like spreadsheets. They have many, many columns. These tables are standalone objects with everything you need right in the tables themselves. While this is okay for a small number of records, it is highly inefficient and consumes more physical space than you probably want. Normalization tries to eliminate a lot of the natural redundancy that comes with flat tables. This makes retrieving records easier while freeing up your hard drive resources in the process.
First Normal Form
There are as many normalization rules sets and normal table forms as there are applications that use them. However, most people only need to use the first three.
The first normal form has only two normalization rules.
- Eliminate repeated information
- Create separated tables for related data
These rules require you to go through your flat table and look for piece of information that repeats from record to record. Usually, this information comes from data that is common between the various items and people listed in the database. You then create separate tables for this information, one for each repeated piece of info.
The key component to all this is the key fields. Key fields are numeric data that does nothing else but identify records in your database. Each of your tables should have at least one key column that identifies its records. You then add these keys to your master table creating what is known as a one-to-many database scheme. That is one table has relationships with many other tables in your database.
Second Normal Form
The second rule adds one more rule to the mix:
- No non-key column depends on a portion of the primary key
Primary keys are the main identifier for their tables. Each table can only have one. The second normalization rule dictates that you identify any non-key information that could be branched off to their own table. This information may not be repetitive now, but it might be at some time in the future. For instance, when you add new types of products to your catalog, you might find that these new products share characteristics with your old ones that were once unique. The second rule tries to find this data long before you need them.
The second normalization often creates many-to-many relationships between your tables. Many-to-many relationships indicate that multiple pieces of data are shared between two or more tables. Generally, this means you might have to create mapping tables which are tables that essentially map other tables together.
Third Normal Form
The third form finishes the job with one more extra rule:
- No columns depend on other non-key columns
This means all your non-key information must not be tied to another non-key column in your database table. Generally, this means you have to go over your data to see if there is anything that should be split off into its own table. This extra information tends to be data you need to know but does not directly relate to any other piece of information. For instance, in a database of college students and their registered courses, you want to create a separate table for the professors. You need to know who teaches each course, but the professors normally teach more than one course, and can be switched as needed.
Once you completed the third normalization, your SQL database is usually as efficient as you can get it. There are more normalization rules than the ones I presented here, but most of them are more trouble than their worth. Either way, you retrieve your data out of the database by writing the relationships between your tables in your SQL statements. Each type of database has its own rules for this, so you will need to research the right formula for your particular database, but the output will be the same. You get a single table bearing just the information you need when you need it.