A database index is very much like the index in a book: the book index has an alphabetized list of topics with page numbers to the location of the data.
A database index has an ordered list of values (made up of one or more table columns), with pointers to the row in which the value and its corresponding data reside.
Without indexes, any query or data modification causes the SQL engine to search the referenced tables from the top down. This is akin to searching for a piece of information in a book by reading it from page 1. A single well-placed index can shorten your query time from dozens of minutes to under a second.
There are two kinds of indexes in SQL Server: clustered and nonclustered.
A table can only have one clustered index, because the clustered index sorts the rows in the table itself.
Every table in the database should have a well-chosen clustered index to aid data retrieval and modification.
Ideally a clustered index should be:
· Small (of a small data type)
The clustered index key is the pointer contained in each clustered index. If you therefore have a clustered index key that is large – for example, a 16 bit UNIQUEIDENTIFIER – indexes will take up much more space than if the clustered index key were smaller (e.g., a 4 bit INT).
· Unique or highly selective – The more selective an index, the more efficient.
· Ever-increasing – The clustered index orders the rows in the table. If the clustered index key is ever increasing, new rows are added to the end of the table. Otherwise, new rows are inserted in the middle of the table, and the database engine must reorganize the data on disk more often.
· Static – A frequently changing clustered index key will cause rows to be reordered within the table, causing unnecessary overhead.
A nonclustered index is a separate physical structure from the underlying table. It contains the values for the included columns – called index keys – along with pointers back to the corresponding table row. On a table that has a clustered index, each nonclustered index’s pointer is the clustered index key.
Note that a nonclustered index is ordered, but it does not alter the order of the rows in the table.
There are few hard and fast rules for indexing. You have to see what works for your database over time. There are whole books dedicated to indexing strategies.
Here are a few general indexing guidelines:
· Each table should have a clustered index that is (ideally) small, selective, ever increasing, and static. (Note that a table without a clustered index is called a heap.)
· Implement nonclustered indexes on foreign key relationships – in other words, on columns that are likely to be used in JOINs.
· Implement nonclustered indexes on columns that are frequently used in WHERE clauses.
· Do not implement single-column indexes on every column in a table. This will take up space unnecessarily, and cause high overhead on INSERTs and UPDATEs.
· In multi-column indexes, list the most selective (nearest to unique) first in the column list. For example, when indexing an employee table for a query on social security number (SSN) and last name (lastName), your index declaration should be
CREATE NONCLUSTERED INDEX ix_Employee_SSN
ON dbo.Employee (SSN, lastName);
· For the most often-used queries, consider a covering nonclustered index. A covering index is one that contains all the columns requested from a table.
CREATE INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2)
Convering indexes risk being wide objects and taking up lots of space.
To understand INCLUDE, you first must understand (a little bit) the structure of an index.
An index is organized in a b-tree hierarchy – each node of data (in the case of our index IX_BORK_111, col3) has two nodes beneath it – the left node higher in the sort range, and the right node lower, like this:
…3……….7…. child nodes of 5
1…4……6…9.. leaf nodes
3 and 7 are the child nodes of 5. 1 and 4 are the child nodes of 3.
The lowest level nodes are called leaf nodes, so 1,4,6,and 9 are the leaf nodes.
The SQL engine walks through the tree to find the values it needs…
If we create our covering index like this:
CREATE INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2)
then each node in the tree will contain values for col3, col1, and col2. That can take up a lot of space.
But in our query, we don’t need to search based on col1 and col2….we only search based on col3.
The other two columns are just there to be returned in the SELECT statement.
INCLUDE lets us make the index smaller, while still supplying our SELECTed columns.
First, here’s the index declaration: CREATE INDEX IX_BORK_333 ON dbo.BORK (col3) INCLUDE (col1, col2)
Each node in the tree will contain the value for col3.
But only the leaf level nodes will hold the values for col1 and col2.
That’s how INCLUDE makes the index smaller – the rest of the tree doesn’t contain extra, unused information.
The SQL engine performs its search based on col3, finds the leaf level node(s) that match, and there at the leaf level is the rest of the information we need.
Key Ideas Review
Key ideas mentioned:
· Clustered index – small, unique (or highly selective), ever increasing, static index on a table. (only one CI can exist on a table)
· Clustered Index Key – the pointer to the row in the table.
· Indexes let queries find data faster.
· Nonclustered indexes are separate objects from tables.
· Index Key – the values contained in the nonclustered index.
· A covering index is ideal – it gives the query everything it needs, without having to touch the table itself (that’s called a bookmark lookup).
· Nonclustered indexes take up space, so you don’t necessarily want a lot of really wide covering indexes. Use with discretion.
One final point: Remember that indexes must be updated every time data is updated or inserted into the table. The more indexes, the longer your inserts and updates will take.
Indexing is a balance between supplying enough to support your read operations, and keeping insert/update overhead low.