|
Okay. There are books on the subject, but there are basically two main benefits to normalizing a database design: (1) eliminating redundant data and (2) eliminating inconsistency. I would also add flexibility as I mentioned in an earlier post.
(1) Redundant data: This basically means that information is repeated unnecessarily. Let's say you have a list of products. You're smart, so each product has a product_id and a product_description. In your design you have a SALES table which holds among other things each product_id along with it's product_description. Let's also say that you have a total of 35 different products at the moment. If you were to make a separate PRODUCT table, you would only need to store the product_id in the SALES table. Repeating the description over and over again for the 35 unique products is unnecessary.
(2) Eliminating inconsistency: This is closely related to having redundant data. It really refers to how you maintain your system. If we go back to product_description, there is a good chance that it is stored in other tables...say INVENTORY for example. If the description for product_id 23001 changes, you have to update every table that includes the product_id and product_description combination. If you update some, but not others, your data becomes inconsistent. Over time, it may not be clear which product_description is the correct one.
So, having data in a single master table (in this case, PRODUCT) requires less storage over all and makes maintenance a lot easier. Changing the product_description for a product_id 23001 means updating a single row.
An obvious consequence of all this is that to display product_description in the "normalized" version requires joining to the PRODUCT table. All other things being equal, a two table query will always be slower than a comparable one table query. Slower does not have to mean slow, however. Indexing both tables on product_id will minimize the amount of extra work the database has to do to retrieve the product_description. Normalization alone is not enough. You also need to take advantage of the performance enhancing features of the particular database environment.
There are different degrees of normalization. So called third normal form is a very popular stopping point for a lot of database designers. You'll also see BCNF (Boyce-Codd Normal Form) mentioned which falls between third normal and fourth normal forms. Search for any of these terms in google for a more complete description. There are tons of articles, books, etc. on the subject. If the author starts to talk in mathematical terms about relational databases, skip it and find another article. You want plain english. You can always come back to the other articles if you're at all interested after you actually understand what's going on.
|