Even the most green DBAs know that indexes are a good thing. A database won’t run well without them. But it is important to remember that indexes aren’t free. They have to be created, stored, backed up, consistency checked, updated, defragmented, and have their statistics updated. That’s a lot of work! This extra work associated with indexes is sometimes called the “write penalty”. It is why having too many indexes can be just as bad as not having enough.
The goal of the DBA should be to have as many high-value indexes — indexes that will be used often by the workload — as possible while limiting indexes that won’t often be used.
How can a DBA be certain that they have the right indexes? This post will cover one method of reviewing indexes at a very macro level for an entire SQL Server database in an effort to find the best values for the workload.
SQL Server provides 2 groups of DMVs (DMOs if you ask some people…Allen) to help with indexing decisions. One group reports indexes that don’t exist which the query optimizer thinks it would use if they did exist. The second object reports the number of times an index that does exist has been used as well as how many times it has been updated — a big portion of the write penalty.
Important note: The counters covered in this post all reset to zero upon every SQL Server service restart and start counting up from there. If the database being analyzed has only been up for a few days the information returned is probably not a complete view of the workload. Personally, I only like to consider these values if the server has been up for a month or more.
This group of DMVs records every scan and large key lookups. When the optimizer declares that there isn’t an index to support a query request it generally performs a scan. When this happens a row is created in the missing index DMV showing the table and columns that were scanned. If that exact same index is requested a second time, by the same query or another similar query, then the counters are increased by 1. That value will continue to grow if the workload continues to call for the index that doesn’t exist. It also records the cost of the query with the table scan and a suspected percentage improvement if only that missing index had existed. The below query calculated those values together to determine a value number.
--Only the top 10 because we are only interested in the highest value indexes SELECT TOP 10 --An estimate of the overall value of this index CAST(user_seeks * avg_total_user_cost * (avg_user_impact * 0.01) AS INT) AS [index_advantage] --The last time another seek was added to this row. , migs.last_user_seek , mid.[statement] AS [Database.Schema.Table] --Columns that were part of an exact comparison , mid.equality_columns --Columns that were part on any other comparison including <, >, <>, or BETWEEN , mid.inequality_columns --Columns return, but not part of a comparison , mid.included_columns --Number of times this index would have been used if it had existed , migs.user_seeks --Cost of query without the index , migs.avg_total_user_cost --Estimated improvment of previous column cost if this index is created , migs.avg_user_impact , 'CREATE INDEX IX_' + LEFT(RIGHT(mid.STATEMENT, CHARINDEX('[',REVERSE(mid.[statement]))-1), LEN(RIGHT(mid.STATEMENT, CHARINDEX('[',REVERSE(mid.[statement]))-1))-1) + '_' + CASE WHEN mid.inequality_columns IS NULL THEN REPLACE(REPLACE(REPLACE(mid.equality_columns, ', ', '_'), '[',''), ']','') WHEN mid.equality_columns IS NULL THEN REPLACE(REPLACE(REPLACE(mid.inequality_columns, ', ', '_'), '[',''), ']','') ELSE REPLACE(REPLACE(REPLACE(mid.equality_columns, ', ', '_'), '[',''), ']','') + '_' + REPLACE(REPLACE(REPLACE(mid.inequality_columns, ', ', '_'), '[',''), ']','') END + ' ON ' + mid.STATEMENT + ' (' + CASE WHEN mid.inequality_columns IS NULL THEN mid.equality_columns WHEN mid.equality_columns IS NULL THEN mid.inequality_columns ELSE mid.equality_columns + ', ' + mid.inequality_columns END + ')' + CASE WHEN ISNULL(mid.included_columns, '') <> '' THEN ' INCLUDE (' + mid.included_columns + ')' ELSE '' END AS CREATE_STATEMENT FROM sys.dm_db_missing_index_group_stats AS migs INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle WHERE mid.database_id = DB_ID() ORDER BY index_advantage DESC;
This DMV returns EVERY table scan since the service started. It is very important to remember that, as a DBA, the goal is to only add high-value indexes. When this DMV asks for 500 indexes it is wildly unlikely that many of those are high-value. Only consider the highest value missing indexes.
This DMV considers every unique index request separately. It will ask for an index on the orders table for the customer column. Then it will ask for the customer and the date. 2 requests. 1 need. Be sure to merge the results with each other and again with existing indexes.
The results of index changes will be felt immediately. If the changes implemented are good they’ll immediately leave the missing indexes DMV query.
This DMV contains a row for every existing index in the database. It records every update to those indexes. Whenever a row is added to the parent table, a row (probably) needs to be added to each non-clustered index. The opposite happens on a delete. Whenever a row is updated all indexes that include any of the columns of the update statement will also need to be updated. All of these actions are tracked in the index usage DMV. They are all part of the write penalty which can now be measured.
The same process that the optimizer uses to track missing indexes when a suitable index is NOT found will increase the usage counters of THIS DMV when a suitable index IS found. A DBA can compare these index use statistics (the reward of the index) with the writes (the cost) to determine the overall value. Maybe there are some indexes that are so low in value that they can simply be dropped.
SELECT TOP 10 OBJECT_NAME(s.[object_id]) AS [Table Name] , i.name AS [Index Name] --This is the number of times the index has been updated due to inserts, updates, or deletes of the parent table. , user_updates AS [Total Writes] --This is the total number of times it has been used. , user_seeks + user_scans + user_lookups AS [Total Reads] , user_updates - (user_seeks + user_scans + user_lookups) AS [Difference] FROM sys.dm_db_index_usage_stats AS s INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] AND i.index_id = s.index_id WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1 AND s.database_id = DB_ID() AND i.is_primary_key = 0 ORDER BY [Difference] DESC , [Total Writes] DESC , [Total Reads] ASC;
Be careful not to remove clustered indexes even if they aren’t being used. It will cause a negative ripple effect to other indexes in the table.
Indexes that are supporting keys, foreign or primary, cannot be removed.
Indexes with many updates and zero seeks or scans are great candidates for removal.
Beware index hints! While that is generally considered a poor coding practice it does happen. If there are index hints then removing the hinted index will cause a run time error every subsequent execution to the index drop. Confirm that the target database does not have index hints before dropping any indexes.
If an index appears a drop candidate because usage patterns have changed there is sometimes a concern that the index may become needed again in the future if the workload reverts back. If that happens, the index is likely to reappear in the missing index DMVs as a candidate for creation.
This should not be a one time operation. Usage patterns change over time. I like to revisit these DMVs 2 to 4 times per year to make sure that the highest value indexes are consistently part of the database.