Sorry I've been quiet for the past few days. My Flex adventures took a quick detour through the massive land of our AS400 and DB2. Now that I had a prototype of my pretty line, bar, and pie charts I needed some real data. My job was to write a process to fetch our sales data from the AS400 server here at work nightly and populate some SQL Server tables with it. Easy Peasy, I thought. I didn't know what I was in for.
The first hurdle out of the gate was learning the structure of the files I was dealing with. (Apparently "tables" are called "files" in AS400 speak) I've been quite spoiled with extremely long and (mostly) descriptive table names in the SQL Server world. On the '400 however, file names are limited to only 10 characters. This leads to VERY creative abbreviations which make perfect sense to all the guys who have worked here for 10-20 years. :)
Once I had the data figured out, I was on my way again. We have a linked server set up in MS SQL 2000 which connects to the database on the '400 as a standard ODBC data source. My queries were running a little less than fast. In fact, they were dog slow. It would take anywhere from 20 to 30 minutes to return about 1000 records from 8 tables!
I started digging through my execution plans and was horrified to find that SQL Server was bringing over the ENTIRE contents of each table and performing all the joins and where clauses locally. That was over 9 Million records!
To prove it, I tested a simple select like:
Sure enough, the entire contents of the table (millions) would get transferred over the network (taking 5-10 minutes), and THEN the where clause would get applied and the handful of records I wanted where returned.
This tech note from Microsoft discusses the many reasons why SQL Server may not trust a remote data source to do its own processing. This ranges from data types in use, having a top clause, and compatible collations.
My solution was simple and effective. It was the openquery() function that SQL Server supports. It "executes the specified pass-through query on the specified linked server." That basically means I can specify the AS400 linked server and directly pass it a string of SQL for it to process and hand back the results. Now, this SQL string must be SQL in the native dialect of your linked server. For instance, I have to pass a DB2 formatted query. Furthermore, the openquery() function has a really annoying "feature" of NOT accepting variables for an input. I'm not sure what Microsoft programmer fell asleep at his desk the day he was supposed to program that part. Regardless, this is what the code looks like now:
2 FROM OPENQUERY(as400, '
3 SELECT *
4 FROM table
5 WHERE column = ''value''
Biggest downside-- there is no code formatting to the SQL being passed in but hey, it's just a string that doesn't mean any thing to SQL Server, right? Biggest benefit-- it runs about 300x faster! This is because the remote server is doing all the joining, indexes and filtering and only passing back what matters to SQL Server.