Thursday, May 24, 2007

DataSets

DataSets are an amazing construct. They allow you to pass back an entire result set, regardless of how many tables, to the caller and have them work with the data in the fashion that they want. Relationships can be established between tables, queries can be run against the dataset and updates can be made to the table.

You remember the saying "If something is too good to be true, it is"? Guess what? It's true with DataSets as well.

First of all, let me confess up front that I am not now, nor have I ever been, a very big fan of datasets, but probably not for the reasons you're thinking. I trust you will keep this in mind as you read.

DataSets provide a wealth of functionality to the developer, but it does come at a significant cost as well. DataSets hide much of the complexity associated with databases, particularly in the area of updating and populating fields on a web page or a report. While this hiding is, in some respects, quite welcome, it lulls the developer into a false sense of complacency and prevents the developer from truly understanding what is happening. I cannot tell you how many times over the past few years I have seem developers pass around hundreds of megabytes of data in a single dataset because they could. DataSets make it easier to be blissfully unaware of the consequences of doing something because of the fact that they hide things so well.


When I was younger I learned IBM 360 Assembler, both at the U of A and at NAIT. This low level language made me very conscious of the amount of data I was using and the most effective ways of manipulating it. Even earlier versions of Visual Basic (1.0 through 5.0) were fairly good at making you aware of what you were doing and the impact. As the complexity of the lower levels of programming has been covered up and hidden by successive updates to the languages and frameworks that support them, the ability of developers to understand the impact of what they are doing has been lowered. Today we have cases where hundreds of megabytes or even gigabytes of data are routinely moved from process to process because it is so simple to do so. The underlying impact, however, can bring a server to it's knees.


While I'm not advocating that everyone learns IBM 360 Assembler, I am advocating that developers fully understand the objects they are using and the impact of using those objects. If you aren't sure of the impact of what you are doing, experiment a little more, read a little more, learn a little more. The more you understand what you are doing with the languages you are using, the more productive you will be.

No comments: