Getting the right indexes without guessing

Even the most green DBAs know that indexes are a good thing. A database won’t run well without them. But it is important to remember that indexes aren’t free. They have to be created, stored, backed up, consistency checked, updated, defragmented, and have their statistics updated. That’s a lot of work! This extra work associated with indexes is sometimes called the “write penalty”. It is why having too many indexes can be just as bad as not having enough.

The goal of the DBA should be to have as many high-value indexes — indexes that will be used often by the workload — as possible while limiting indexes that won’t often be used.

How can a DBA be certain that they have the right indexes? This post will cover one method of reviewing indexes at a very macro level for an entire SQL Server database in an effort to find the best values for the workload.

SQL Server provides 2 groups of DMVs (DMOs if you ask some people…Allen) to help with indexing decisions. One group reports indexes that don’t exist which the query optimizer thinks it would use if they did exist. The second object reports the number of times an index that does exist has been used as well as how many times it has been updated — a big portion of the write penalty.

Important note: The counters covered in this post all reset to zero upon every SQL Server service restart and start counting up from there. If the database being analyzed has only been up for a few days the information returned is probably not a complete view of the workload. Personally, I only like to consider these values if the server has been up for a month or more.

Missing Indexes

This group of DMVs records every scan and large key lookups. When the optimizer declares that there isn’t an index to support a query request it generally performs a scan. When this happens a row is created in the missing index DMV showing the table and columns that were scanned. If that exact same index is requested a second time, by the same query or another similar query, then the counters are increased by 1. That value will continue to grow if the workload continues to call for the index that doesn’t exist. It also records the cost of the query with the table scan and a suspected percentage improvement if only that missing index had existed. The below query calculated those values together to determine a value number.

--Only the top 10 because we are only interested in the highest value indexes
SELECT TOP 10
--An estimate of the overall value of this index
CAST(user_seeks * avg_total_user_cost * (avg_user_impact * 0.01) AS INT) AS [index_advantage]
--The last time another seek was added to this row.
, migs.last_user_seek
, mid.[statement] AS [Database.Schema.Table]
--Columns that were part of an exact comparison
, mid.equality_columns
--Columns that were part on any other comparison including <, >, <>, or BETWEEN
, mid.inequality_columns
--Columns return, but not part of a comparison
, mid.included_columns
--Number of times this index would have been used if it had existed
, migs.user_seeks
--Cost of query without the index
, migs.avg_total_user_cost
--Estimated improvment of previous column cost if this index is created
, migs.avg_user_impact
, 'CREATE INDEX IX_' + LEFT(RIGHT(mid.STATEMENT, CHARINDEX('[',REVERSE(mid.[statement]))-1), LEN(RIGHT(mid.STATEMENT, CHARINDEX('[',REVERSE(mid.[statement]))-1))-1) + '_' +
CASE WHEN mid.inequality_columns IS NULL THEN REPLACE(REPLACE(REPLACE(mid.equality_columns, ', ', '_'), '[',''), ']','')
WHEN mid.equality_columns IS NULL THEN REPLACE(REPLACE(REPLACE(mid.inequality_columns, ', ', '_'), '[',''), ']','')
ELSE REPLACE(REPLACE(REPLACE(mid.equality_columns, ', ', '_'), '[',''), ']','') + '_' + REPLACE(REPLACE(REPLACE(mid.inequality_columns, ', ', '_'), '[',''), ']','') END +
' ON ' + mid.STATEMENT + ' (' +
CASE WHEN mid.inequality_columns IS NULL THEN mid.equality_columns
WHEN mid.equality_columns IS NULL THEN mid.inequality_columns
ELSE mid.equality_columns + ', ' + mid.inequality_columns END + ')' +
CASE WHEN ISNULL(mid.included_columns, '') <> '' THEN ' INCLUDE (' + mid.included_columns + ')' ELSE '' END AS CREATE_STATEMENT
FROM
sys.dm_db_missing_index_group_stats AS migs
INNER JOIN
sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle
INNER JOIN
sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
WHERE
mid.database_id = DB_ID()
ORDER BY
index_advantage DESC;

Important considerations:

This DMV returns EVERY table scan since the service started. It is very important to remember that, as a DBA, the goal is to only add high-value indexes. When this DMV asks for 500 indexes it is wildly unlikely that many of those are high-value. Only consider the highest value missing indexes.

This DMV considers every unique index request separately. It will ask for an index on the orders table for the customer column. Then it will ask for the customer and the date. 2 requests. 1 need. Be sure to merge the results with each other and again with existing indexes.

The results of index changes will be felt immediately. If the changes implemented are good they’ll immediately leave the missing indexes DMV query.

UPDATE: Read about an update to this process in another post.

Low-Value Indexes

This DMV contains a row for every existing index in the database. It records every update to those indexes. Whenever a row is added to the parent table, a row (probably) needs to be added to each non-clustered index. The opposite happens on a delete. Whenever a row is updated all indexes that include any of the columns of the update statement will also need to be updated. All of these actions are tracked in the index usage DMV. They are all part of the write penalty which can now be measured.

The same process that the optimizer uses to track missing indexes when a suitable index is NOT found will increase the usage counters of THIS DMV when a suitable index IS found. A DBA can compare these index use statistics (the reward of the index) with the writes (the cost) to determine the overall value. Maybe there are some indexes that are so low in value that they can simply be dropped.

SELECT TOP 10 
 OBJECT_NAME(s.[object_id]) AS [Table Name]
, i.name AS [Index Name]
--This is the number of times the index has been updated due to inserts, updates, or deletes of the parent table.
, user_updates AS [Total Writes]
--This is the total number of times it has been used.
, user_seeks + user_scans + user_lookups AS [Total Reads]
, user_updates - (user_seeks + user_scans + user_lookups) AS [Difference]
FROM 
 sys.dm_db_index_usage_stats AS s
 INNER JOIN 
 sys.indexes AS i ON s.[object_id] = i.[object_id] AND i.index_id = s.index_id
WHERE 
 OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
 AND 
 s.database_id = DB_ID()
 AND 
 i.is_primary_key = 0
ORDER BY 
 [Difference] DESC
, [Total Writes] DESC
, [Total Reads] ASC;

Important considerations:

Be careful not to remove clustered indexes even if they aren’t being used. It will cause a negative ripple effect to other indexes in the table.

Indexes that are supporting keys, foreign or primary, cannot be removed.

Indexes with many updates and zero seeks or scans are great candidates for removal.

Beware index hints! While that is generally considered a poor coding practice it does happen. If there are index hints then removing the hinted index will cause a run time error every subsequent execution to the index drop. Confirm that the target database does not have index hints before dropping any indexes.

If an index appears a drop candidate because usage patterns have changed there is sometimes a concern that the index may become needed again in the future if the workload reverts back. If that happens, the index is likely to reappear in the missing index DMVs as a candidate for creation.

Final thoughts

This should not be a one time operation. Usage patterns change over time. I like to revisit these DMVs 2 to 4 times per year to make sure that the highest value indexes are consistently part of the database.

6 Comments

Using DMVs To Plan Out Your Indexes – Curated SQL

7 years ago

Hi, Thanks for the interesting post.

One important point that I try always to emphasize is that the index hints are getting generated based on the query execution plan that SQL server decided to use. But what if it is about a bad execution plan? Some times, it is better to use a different execution plan to avoid the need on yet another index on that problem table. The other point is the fact that optimizer do not take the other queries into account and each hint is only focused to optimize the given query where the hint is generated based on.

MGhazvinizadeh

7 years ago


1. With respect to an index hint and this blog, Eric’s point was that when it is used and when its index is dropped, a run-time error will be raised by any query that references the hint. As a corollary, when an index hint is used, Eric’s missing index query will report the index as having been used, even when the index is no longer useful. In other words, if index hints are used, I think it is important to periodically check the usefulness of the index, unless the distribution and count of its indexed values do not change, over time.
  
  Not quite on topic, but if an index hint is being used due to the optimizer being ‘unable’ to choose a “good” plan, I would consider why a “good” plan was not chosen. A first step is to inspect the output from the undocumented query optimization trace flags (Paul White blogged about them), and consider why the optimizer made the choices it did. One secondary step is to consider whether the data being joined has an implicit correlation (i.e. a normalization design problem) and consider whether a more normalized design would avoid a need for an index hint. Another secondary step is to consider whether parameter sniffing is relevant, and there are may blogs about that concern (with some relief being offered by Microsoft in SQL Server in 2017)
  
  Bill
  
  7 years ago
  
  
HI,

Iam gettimng this eror
Msg 4145, Level 15, State 1, Line 28
An expression of non-boolean type specified in a context where a condition is expected, near ‘;’.

Jeroen

7 years ago


1. Author
  
  It looks like WordPress changed some <> into &lt &gt
  
  I’ll see if I can get that to come up correctly. Thanks for the heads up.
  
  EDIT: I have corrected the script to include the correct characters.
  
  Eric Blinn
  
  7 years ago
  
  
Needed to send you one bit of word in order to give many thanks once again for the pleasant basics you’ve shown on this website. This is so particularly generous of people like you in giving freely what a lot of folks would’ve supplied for an e-book to get some dough for their own end, specifically considering the fact that you might well have done it in case you desired. Those pointers as well acted to be the fantastic way to recognize that other people online have a similar passion similar to my own to know the truth very much more on the subject of this issue. I know there are a lot more enjoyable times up front for many who browse through your site.

2018goup

6 years ago

Just another SQL Blog by just another DBA

Getting the right indexes without guessing

Missing Indexes

Low-Value Indexes

Final thoughts

6 Comments

Leave a Reply Cancel reply