You may have a full-time DBA where you work, but a lot of us share the CF developer and DB developer hats. Your SQL Server's performance can be an easy thing to overlook if your database is very small, or your website gets very little traffic. The easy (but not always correct) answer to most performance problems is "add an index". Some indexes are an obvious help, but how do you tell if the less obvious ones are really being used? It is possible to have too many indexes. In addition to bloating the size of your database, they take time to update which can actually slow your application DOWN on inserts and updates.
It is always difficult to determine how much of the SQL performance iceberg to scrape off in a single post; especially since there is already a host of information out there on the topic. What I would like to focus on here is being able to accurately analyze your existing indexes. You are doing a disservice to yourself if you use the "shotgun" approach and litter your database with indexes without really needing half of them.
Ok, how do I get the data?This post, like many will be pretty specific to SQL Server 2005. Mainly because 2005 is the first version to offer this data and I don't know how to get it out of other DBMS's. SQL Server 2005 has a host of information you can retrieve about the usage of your indexes. It is very important to note that this data is erased every time you restart your SQL server. What you need to do is let your application run for a good amount of time ensuring an even and typical usage of all your apps functions have been performed.
Below is some SQL I have thrown together. I think it is mostly bug-free, but I don't have a particularly large database to test it on right now. Paste it into Management Studio and run it against a database of your choice.
2 (table_name varchar(100),
3 index_name varchar(300),
4 index_type varchar(50),
5 is_primary_key bit,
6 seeks int,
7 scans int,
8 lookups int,
9 updates int,
10 records_in_index int,
11 index_size_in_MB float)
13 INSERT @Index_stats
15 t.name AS table_name,
16 i.name AS index_name,
17 i.type_desc AS index_type,
19 isnull(u.user_seeks,0) AS seeks,
20 isnull(u.user_scans,0) AS scans,
21 isnull(u.user_lookups,0) AS lookups,
22 isnull(u.user_updates,0) AS updates,
23 sum(CASE WHEN a.type = 1 THEN p.rows ELSE 0 END) AS records_in_index, -- Only count the rows once
24 (sum(a.total_pages) * 8) / 1024.00 AS index_size_in_MB -- Pages are 8 Bytes in size
25 FROM sys.indexes i
26 INNER JOIN sys.tables t ON i.object_id = t.object_id
27 INNER JOIN sys.partitions p ON i.object_id = p.object_id
28 INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
29 AND i.index_id = p.index_id
30 LEFT OUTER JOIN sys.dm_db_index_usage_stats u ON u.index_id = i.index_id
31 AND u.object_id = i.object_id
32 WHERE t.type_desc = 'USER_TABLE'
33 AND i.type_desc in ('CLUSTERED','NONCLUSTERED')
34 and t.name not like 'sys%' -- Some system tables show as user tables (?)
35 GROUP BY t.name, i.name, i.type_desc, i.is_primary_key, u.user_seeks, u.user_scans, u.user_lookups, u.user_updates, u.object_id, u.index_id
37 -- Most used indexes
38 SELECT TOP 30 PERCENT *
39 from @index_stats
40 ORDER BY seeks + scans + lookups DESC
42 -- Least used indexes
43 SELECT TOP 30 PERCENT *
44 from @index_stats
45 WHERE seeks + scans + lookups > 0
46 ORDER BY seeks + scans + lookups
48 -- COMPLETELY unused indexes
49 SELECT *
50 from @index_stats
51 WHERE seeks + scans + lookups = 0
53 -- Most updated indexes
54 SELECT TOP 30 PERCENT *
55 from @index_stats
56 ORDER BY updates DESC
58 -- Most common Bookmark Lookups
59 SELECT TOP 30 PERCENT table_name, lookups
60 from @index_stats
61 ORDER BY lookups DESC
Please feel free to play around with this to see the data you want. you can filter for certain tables etc. The SQL above is just a starting point for you.
What does the data mean?Here are the columns in use and what they mean:
- table_name - The table name that the index is on
- index_name - The name of the index
- index_type - CLUSTERED (only one per table, most commonly on primary keys) or NONCLUSTERED
- seeks - Number of times an index seek has been used on this index. Seeks are used to find a unique value and are the fastest. (Imagine thumbing through the Yellow pages directly to the R's.)
- scans - Number of times an index scan has been used on this index. Scans are commonly used to find a range of values.
- lookups - Applies only to Clustered Indexes. This is the number of times this index was used in a bookmark lookup. This happens in an un-covered query where the records were found using another index, but additional columns are required to be returned, so SQL Server turns to this index to get the remaining data using a "bookmark" pointer stored in the non clustered index.
- updates - This is the number of times this index was updated due to an insert or update on the table. Updates are a requirement, but they are a big trade off since they are sometimes costly.
- records_in_index - This is the number of records in the index. Index scans and updates slow down as this number increases.
- index_size_in_MB - This is the size of the data in the index in MBs. If database size is important to you, large indexes should be noted.
Now, let's break down each result set to see what they could mean for you.
Most used indexesThese are your favorite indexes. They are saving your database a lot of work. What you need to look at here though are index scans versus index seeks. If you are commonly selecting out a range of values, then scans are inevitable.
2 FROM sales
3 WHERE order_total between 200 AND 500
Be careful though as string manipulation and unnecessary data type conversion can turn a perfectly good seek into a scan.
2 FROM person
3 WHERE ltrim(rtrim(last_name)) = 'Smith'
Least used indexes
These are the indexes that are used, but not that much. They are potential candidates for being dropped. What process is using them? If they are used by a once-a-month report and they make the report 100 times faster they could very well be worth keeping around just for that use. Since their benefit is low, what is their cost? Are they taking up a lot of room on your disk? Even more importantly, how often are they being updated? A table which is modified regularly and selected from rarely will often leave you with index that are updated constantly, but never used.
COMPLETELY unused indexes
These are ripe for trimming. Either they aren't used at all, or they haven't been used since you last restarted your server (which clears out this usage data). Once again, see how often they are updated. If it is a lot, then nix them. Of course, the big caveat to this is if the indexes are a primary key index (which is why I included the is_primary_key column). Indexes on a primary key cannot be removed if you want to ensure the uniqueness of the data. At best, they can be dropped and re-added as non-clustered, but in general I would leave any index acting as a primary key.
Most updated indexes
These are your most costly indexes. SQL server is constantly needing to stop and update the data in these indexes. You need to examine how often they are being used and the benefit they bring when used to justify their cost. The second thing you need to do is examine their fragmentation and how often they get rebuilt. Fill factor is an important and and commonly overlooked setting when creating an index. For tables which are updated very often, you probably want a lower fill factor. This uses more pages, but leaves room for additional rows to be added without causing page splits which is where the real cost of index updates comes in. Check your database maintenance plan to see how often your indexes are being rebuilt.
Most common Bookmark Lookups
This is a tricky one. I only included the table name in this select, because the index itself is not directly to blame for a bookmark lookup. If you have tables which a large amount of these, check the other indexes on the table as well as the SQL statements that cause the lookups. You might be able to add a couple columns from the table into another non-clustered index to create a covered query. A covered select is one in which all the data needed can be retrieved from a single index and a second trip back to the table to gather the rest of the columns is not needed. This is a big reason not to use select * if you don't need it.
In conclusion, don't be alarmed and start dropping indexes left and right just because they show up in this data. This is just mean to enable you to make informed decisions and the indexes in your database.