Sunday, January 16, 2011

Natural vs. Surrogate Keys, that is the question

The question had been discussed many times in database forums, especially SQL Server forums. The latest article about the topic was published month ago on SQL-Server-Performance.com. I have pretty strong opinion, that surrogate keys are clearly better choice in almost any case. In a few cases, where you might be no wrong using natural key, you can also use surrogate key without regretting your decision in the future, so why bother even considering natural keys. That is the choice you would regret 99 times out of 100, so it is better not to spend time considering them.

After I stated my position, I am going to discuss specific points from the article mentioned I disagree with, even though I think the article is well written overall...
Which side is right?  Neither.  Both natural and surrogate keys have their own place, and a developer who doesn’t use both as the situation demands is shortchanging both himself and his applications.
...but as you can see above, the conclusion is wrong IMHO.
The value of a surrogate must never be exposed to the outside world.   Users should never be allowed to see the key, under any conditions.   Display the value of a surrogate key on a report, allow it to be viewed on a form or even used as a search term – these are all forbidden.   Once you expose a surrogate key, it immediately begins acquiring business meaning. 
I disagree. As long as you make sure the key value never change you can expose it. Usually you don't do that, but I don't think you must not. I might be wrong here, but I would like to see an example showing a disaster happening when you expose the value of surrogate key.
Note: if your data already contains meaningful product codes or other keys such as those described above, then they are simply natural keys and the above caveat doesn’t apply.     It’s a smart key only when the value is constructed by the developer.
I agree smart key is a huge mistake. However, smart keys are sometimes created by business people, not developers. Pharmaceutical company Galenika Belgrade used to to have smart codes of their products. Donna Karan New Your does the same. Why it is bad to use such codes as natural keys? Galenika in 90s had 10s of thousands maybe even over 100 thousands products and decided to change the structure of their smart key. Can you imagine disaster happened in their transaction database and all distributed databases from their partners? The consequence of using natural product primary key was having that smart product code in each single product transaction, history tables, link tables etc.
Immutability is certainly a desirable feature for a key, but it’s by no means a requirement.   Using this as an excuse is the trademark of a lazy developer.  Data changes.   If the key changes, update it.  If it’s being used as a foreign key in another table – update that table too.  Cascading updates exist for a reason, after all.
Very wrong and dangerous! The reason is mentioned in the following paragraph:
Obviously if a key changes very often, or will be used as a FK for many other tables, there can be performance or concurrency implications to making it the primary key .  In this case, you do want toconsider a surrogate, and use it if appropriate. 
The problem in Galenika was not that a key was changed very often. It was enough that its structure changed once and for all products. I don't know in how many tables that code was foreign key, however the main problem, and huge one itself, was that it was foreign key in sales transactions table. So, my recommendation is not to consider a surrogate, but to use a surrogate and not even think about "natural" key. Even if you don't have "FK for many other tables", one can be bad enough and even if it is not big deal from the beginning it can become disaster later. I  saw too many applications developed as a small tool for single user that evolved into enterprise wide application that was not designed as such.
Replacing a wide key with a narrower value means smaller indexes, and more values retrieved from every index page read.  This does boost performance.  However,  you’ll usually retain the index on the natural key (to enforce uniqueness if nothing else) and that means another index to maintain.  If your table is very narrow, the additional column for the surrogate can noticeably impact performance and storage requirements.
This is really not a problem. For tables containing transactions, usually there is no "natural" key or the "natural" primary key is multi-column key, so even according to the author, surrogate key should be used. If the  table contains reference data, its size is is insignificant, so one more index  is not an issue. However, wider index on  FK column in transaction table is an issue.

To finish, the main problem with "natural" keys is they are subject to change, usually wide, often composite and when changes happen you have to waste a lot of resources for updates all over database. Formally, "natural" keys do not break any normal form, but they break one of main reasons normal forms were introduced. Instead of changing one piece of info in one place, you have to do it on many others.

The primary key is used in RDBMS to identify row and establish relationship. If you overload the primary key with additional information (value of natural key), whenever you change a value of primary key column, you have to update all related rows, even though the intention was not to change relationship between rows in master and child table. That is the reason why the value of primary key column should be meaningless.

Sunday, January 9, 2011

Selling is a lot like fishing

You can not expect that just putting a hook in nearby water is enough to be actuall successful in fishing. You have to learn about the habbits, preferences and behaviors of the specific fish you want to catch. Your fish may not ever come to the place where your hook is, even if you cast your lure frequently. You first need to see where your fish swim while looking for a food. Off course an un-baited hook would not do the trick. You have to have the right bait the specific fish is looking for at that time. And that depends on the season, weather conditions and other environmental factors. The influence of environmental factors on fish behavior can change over the time.

When you are in the right place at the right time, you can expect moderate success. Your cat will be happy all the time and maybe even your wife who is wating for fresh fish for dinner that you promised. However, it might not be enough to brag about your success and offer dinner to your neighbours.

The  fish is not checking only in the place your hook is, so you need to attract the fish to investigate your location often and even to gather at the time you are fishing. You need to use a proper lure frequently enough at the time you are fishing. In some cases it is even possible to have your lure at the place days in advance, which would attract fish to gather regularly at "your" place.

The thicknes of the line, the size and shape of the hook, the fishing rod and the rest of the fishing equipment will greatly influence you success. Without the proper equipment, most of the fish might escape or might not even bite at all.

*

This is complete text that appears in flash animation on Argo Team web site. This is the story I told the owner of Blue Rabbit s.r.o. to persuade him to try data driven marketing campaigns in order to increase sales from their customers with discount cards.

*

Blog post from Mirko Marović in English.