Aug 16 2008
ColdFusion, Java, Object Oriented Design (OOP), Performance, SQL
I've been staring at my computer monitor trying start this paragraph for about 10 minutes now, so I'm just going to start typing. I've been thinking about system design. Organization versus performance. Design patterns versus efficiency. We adopt some code without thought to its performance because we're talking about milliseconds, and the readability, organization, and structure gains our apps enjoy are well worth it. I'm not sure that is always the case though, and we don't notice it until too late.When building something, I like to periodically step back from my work and think about what I have created. What is it doing? How straight of a line is it stringing between my two points? In one particularly large app I worked on, we modularized things to the Nth degree. Dozens of individual include files existed that displayed a small set of related data. In order to make them all self contained, they ran all the queries necessary to display their data. You could include any file on any screen and it would work. After some time, we took a break to examine some performance aspects of our app. We found some pages running many database calls that ideally would have been combined into a single trip to the database, or worse, the exact same SQL being run multiple times in different modules. From a holistic perspective all the data on a given screen could have been extracted in just a small number of efficient database calls, but given the dynamic nature of many pages, it would have been very difficult to maintain the self sufficiency and agnostic nature of each module. Our ultimate solution was to find the most used queries and employ a short lived caching mechanism to cut down on the repetitive calls in the same page. It's like we had cursed ourselves. The beauty AND downfall of our code was that the right hand didn't know what the left hand was doing. Had we created ultimate flexibility, or needless redundancy? So, let's get back around to my title, "Does OOP Encourage RBAR?" If you don't know what OOP stands for, you've been living under a rock. You may have heard of RBAR, or Row By Agonizing Row. The term describes when you tackle a problem one record at a time in a very procedural way instead of dealing with it in a set-based fashion. Let's say you need to take all open orders that are more than 6 months old and cancel them. The most procedural way you could accomplish that would be to query the entire contents of the orders table, loop over every record one at a time, and if the created data was old enough run a single update statement for each order to change the status one record at a time. The set based way to accomplish this would be a single update statement that updated the order status column with a where clause that filtered on open orders older than 6 months. But your business logic doesn't belong in the database though. No, you deal only with objects. You ask them to do the work for you. You have an order component with a getOrderDate() method and a setOrderStatus() method. Our good coding conventions have taught us to never directly access the database from all over our app, but to encapsulate all that code in a single, reusable component. Now, additional logic such as archiving off data for non-active orders and updating the last tracking point on the order can all be taken care of by the singular piece of business logic in our component-- our perfect little world of code that knows exactly how to cancel a single order. That's the problem though-- our component only handles one order at a time. Now, we need to cancel 300,000 orders and suddenly it is taking forever. We are opening and committing more transactions that we really need to. Our audit triggers are being run 300,000 times with a single record, instead of once with 300,000 records. Our transaction logs are filling up faster. We're bugging our database 300,000 times when we could just tell it something once and affect the same set of data. We could create some sort of order maintenance component which would have the capability of updating multiple orders at once, but now it must be endowed with all the logic that the order component has-- archiving, last tracking points, etc. That would be duplicated logic though. So what's the solution to this? I'm certainly not of the mind to write off OO design patterns. I just keep finding this disconnect that seems to happen between my perfect little objects on the web server, that somehow at the end of the day need to find their way back to the database; a set-based storage system that most always performs best when dealing with your data as a set-- not Row By Agonizing Row. Perhaps you could have a DAO of sorts that would accept multiple instances of an object and find the most efficient way to store them all at once. Maybe most people don't care if they make 300,000 separate updates, instead of one. Maybe ORM design patterns help solve some of this. (I'm not too familiar with all that yet though.) Open my eyes, please. Tell me how you deal with this.