Saturday, May 28, 2005

Choosing between an ascending or descending index

I was recently asked whether making an index ascending or descending would improve query performance in SQL Server. In almost all cases the direction would make absolutely no difference to the speed of retrieval. Performance of queries is usually best measured by examining the number of logical page reads required to find the data.

Searching a table for a specific row
Any search across a single non-clustered / clustered index to locate a specific row would always result in a similar number of pages read. In the non-clustered case the time to retrieve the row would be navigating the index pages through the binary trees root index page and then through the intermediate level index pages to find the leaf level index page which matches. Then if no clustered index existed on the table the index page would point to the data page which would then be read to obtain the data.

If a clustered index existed on the table the leaf level non clustered index page would point to the clustered index key. The clustered index key would then be examined in a similar way to the non clustered index to find the leaf level index page. However for a clustered index the data exists on the leaf level index page so no further page reads would occur.

Searching for a range of rows
Even cases when searches are performed for a range of rows e.g. ID BETWEEN 100 AND 105 direction of the index would make no difference. A single column index can be scanned well in both directions because the index pages at a given level are linked by next and prior pointers pointing to the next and previous index pages at that level. The query optimiser may decide to search for the data either by starting at ID 105 and moving backwards through the index pages, or from 100 and working forwards.

When does direction make a difference?
The direction you specify when creating an index is used to determine the order of the leaf level index pages. E.g. names Adamson, Bamber, Doherty, Roche would appear exactly in that order in an ascending index, and in a descending index they would appear as Roche, Doherty, Bamber and Adamson.

The only case I can think of when the direction would have the most impact is a query where a composite index was used and a query was submitted which
Covers all columns used in the index. This means all fields specified in the SELECT statement can be found in the index page.

  1. Covers all columns used in the index. This means all fields specified in the SELECT statement can be found in the index page.
  2. Specifies a different direction in the ORDER BY clause in the SELECT statement of the query to the direction the index was created in.

Consider a query

SELECT SupplierNo, CreatedDate
FROM Invoices
ORDER BY SupplierNo, CreatedDate DESC

WHERE CreatedDate BETWEEN ‘1/5/05’ AND ‘10/5/05’

If either two single non clustered indexes or a single composite index on SupplierNo ASC, CreatedDate ASC was created a Sort operator would still be used in the execution plan of the query. If we can remove the sort operator the query time should be reduced.

If you created a single composite non clustered index on SupplierNo ASC, CreatedDate DESC the index pages could be navigated without the sort operator being used and only an index scan performed. As the index is covered because all the columns requested existed on the index pages then no data pages would have to be navigated from the leaf level index pages.

If you decided to change the query and add a column which didn’t exist on the index such as a reference number, the data page would have to be located for each matching row on the index page. Depending on the total number of index pages and data page scans required to get all the data, the query optimiser may even decide not to use an index and simply perform a table scan.

This example cites the usefulness of the query execution plan to show you how slow a specific query is. This is the only effective method to diagnose potentially slow running queries and improve the performance in a methodical manner. You can either show an execution plan for a query directly in query analyser or capture the event from a SQL profiler trace in a production environment. Simply executing the query in the query analyser window and trying to time it manually will not yield good results unless the speed differences before and after are that noticeable.

No comments: