Thursday, July 10, 2008

GUIDS and Clustered Indexes

GUIDS should never ever be used as a clustered index.

Wrong.  GUIDS can be used as a clustered index if they are not an MBUN: meaningless but unique number  I've talked about this before, but it seems to be difficult for people to understand so I'm going to take this slowly and give some examples.  For these examples, let's assume that you are running an online bookstore.

Naturally you want to keep track of orders and you have a table in SQL Server that does just this.  For each order you assign a GUID.  Should you cluster on this GUID?  Probably not and the reason is that the GUID, in this table, is meaningless.  It has no context with any other data and as a result it is truly an MBUN.  Indeed, you would probably want to sort on the customer number instead as you are more likely to retrieve all of a customers orders than you are sequential GUIDs.

Hmmmm.  We're keeping track of orders, but not the items on the orders.  Let's have an OrderItem table.  In order to connect the rows in the OrderItem table with the Order table we would probably link them via the GUID that you assigned to the Order itself.  To show the individual items within the order we would need to create another column, let's call it an ItemNumber.  This could be a GUID, but it makes more sense just to make it an integer.  For this table the clustered index should be the Order GUID and the ItemNumber.

Why?  Well,. when we create a clustered index we are telling the database that we want records that appear next to each other in the index to actually be stored next to each on the disk.  In the first example, using the GUID made no sense as there was no reason for the records to appear next to each other.  In the second example it makes perfect sense to use the GUID because we want all of the items for an order to appear next to each other on the disk.  This speeds up the retrieval of data and makes more effective use of memory.  In the second case the GUID is no longer an MBUN as it actually has meaning: it is the order number that we are tracking.

So you see, if the GUID has meaning it can be part of the clustered index.  If the GUID has no meaning, don't put it in the clustered index.  Any questions?

1 comment:

Unknown said...

Finally, someone who thinks like I do. I am tired of the "blanket statements" that Clustered indexes with GUIds is "always wrong". I agree there are times where it provides value, like your example.

Another example, in an "established" application (meaning many years of development) where the original GUId (in this case a MBUN as you define it) which is used as in a foreign key, I believe the parent table containing the GUId should be retained as the Clustered Index. This is because MOST of the original code already joins on the GUID and can read any of the columns in one read (instead of the two required by a non-clustered index).

In this case, it will require considerable work to either define a new field to cluster the index on and update the many, unknown and untouched lines of code (increasing the risk of human error), or to determine which non-clustered indexes now need a list of included fields to "cover" the index.

As always, the answer is "it depends."