MySQL Database Normalization

December 28, 2011
Developers
LinkedInTwitterFacebookEmail

I was recently creating a database that was complicated and was wondering what the best way to organize it was. So I took a look at what smarter (and geekier) people have written.

The first thing I came across was the 5 steps of Database Normalization – 1NF, 2NF, 3NF, 4NF and 5NF. I had never read these before, but found that I was using most of them when I created databases. I think it’s helpful though to see them written down and explained like this though.

1NF

If your database meets these requirements, it will have no duplicate columns and it will group relative information together and use a primary key. Honestly, this one is almost a no-brainer. Of course you won’t have two columns that say the exact same information in one table: that would be ridiculous. And you probably already group information into different tables based on what they’re dealing with. And hopefully you use primary keys/IDs for each row.

2NF

A little bit harder, 2NF deals with grouping information more drastically, as well as the requirements of 1NF. if there are subsets of data that apply to multiple rows in the database, it is better to move those to another table. For example, if you have a table with customer’s orders. It would include the ID, the item, the customer name, and the customer address. But wait, if the customer orders something more than once, that customer address will show up in another row in the table. It would be better organized, then, if you took out the customer’s address and put it into another table. So now you have two tables:

#1 – ID, item, customerID    <– This is a foreign key. (It speaks with an Australian accent)
#2 – customerID, name, address    <— the above foreign key references this table’s primary key, customerID

Now, when we change a customer’s address, we only need to change one field in one database, not multiple fields. Which makes things happier, brighter, and adds rainbows in the sky.

3NF

Now it really starts to get tricky. When you get to this level, you must be 1NF compliant and 2NF compliant AND you are now parring down the rows in the database: Remove columns not fully dependent on the Primary Key.

Let’s say you have a table with the following columns: ID, name, price, # of items, total

The column “total” can be gotten to by multiplying the price and the # of items, so this column is not fully dependent on the Primary Key.

Also any column that you would need to update in multiple rows to update completely in the database, should be put in another table, just like 2NF. For example:

#1 – ID, Tournament, Winner, B-day of winner

Take the Birthday of the winner and put it in another table:

#1 – ID, Tournament, WinnerID
#2 – WinnerID, Birthday

Why do this? Remember, you want any information in the database to be stored ONLY once. You should never have to update multiple rows for one item, AND there is no reason to keep the results of calculations in the database…just do the calculations when you pull the information out.

4NF

If the last one was tricky, this one is downright annoying. Now, not only are we looking for dependencies, we are looking for multi-valued dependencies. So, here goes:

#1 – Name, Backpack color, school

Row 1 – Rob has a purple backpack and goes to High school
Row 2 – Rob has a blue backpack and goes to Junior college.
Row 3 – Rob has a purple backpack and goes to Junior college.
Row 4 – Rob has a blue backpack and goes to High school.

Obviously, we are repeating some ideas. First, although it is 3NF compliant, the blue backpack is not dependent on the School name, but rather the person’s name. Same for the school. You can spot this one easiest because items are being repeated. It would be better to create two tables and list the information once in each table:

#1 –  Name, Backpack color
#2 – Name, school

Organizing a table like that will help prevent mistakes when updating current data.

5NP

And what, you ask is 5NP? “also known as Project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys. A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.” (http://en.wikipedia.org/wiki/Fifth_normal_form)

Imagine you’re managing a school’s inventory of textbooks. You have a table with columns for book title, author, subject, edition, and available copies. This works well until a teacher orders multiple copies of the same book for different classes. Suddenly, you’re repeating data: “Calculus” by “Adams” appears twice, once for Mr. Smith’s math class and again for Ms. Jones’s.

5NF tackles this redundancy. It says each table should represent a single, independent concept. In this case, “book” and “class” are separate concepts. So, you’d split the table:

Books: title, author, subject, edition (primary key)
Classes: class name, teacher (primary key)
Book-Class: book ID (foreign key to Books), class ID (foreign key to Classes), quantity

Now, “Calculus” by “Adams” only appears once in the “Books” table. The “Book-Class” table connects it to specific classes with their quantities, eliminating redundancy and simplifying updates. This is the essence of 5NF: ensuring each table reflects a single clear concept, avoiding data duplication and improving data integrity.

Summary

This might seem like extra work, but it’s an investment in clarity and efficiency. No more duplicate entries, no more confusion about where to find the information. Updates become easier, and your data stays squeaky clean. So, the next time you’re organizing information, remember 5NF – the guardian of order and data integrity!

Join the Discussion

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>