Indexes
A
database index
is very much like the index in a book: the book index has an alphabetized list
of topics with page numbers to the location of the data.
A
database index has an ordered list of values (made up of one or more table
columns), with pointers to the row in which the value and its corresponding
data reside.
Without
indexes, any query or data modification causes the SQL engine to search the
referenced tables from the top down. This is akin to searching for a piece of
information in a book by reading it from page 1. A single well-placed index can
shorten your query time from dozens of minutes to under a second.
There are two kinds of indexes in SQL Server: clustered and nonclustered.
Clustered
A
table can only have one clustered
index, because the clustered index sorts the rows in the table itself.
Every
table in the database should have a well-chosen clustered index to
aid data retrieval and modification.
Ideally a clustered
index should be:
·
Small (of a small data type)
The clustered index key is the pointer contained in each
clustered index. If you therefore have a clustered index key that is large –
for example, a 16 bit UNIQUEIDENTIFIER – indexes will take up much more space
than if the clustered index key were smaller (e.g., a 4 bit INT).
·
Unique or highly selective – The more selective
an index, the more efficient.
·
Ever-increasing – The clustered index orders the
rows in the table. If the clustered index key is ever increasing, new rows are
added to the end of the table. Otherwise, new rows are inserted in the middle
of the table, and the database engine must reorganize the data on disk more
often.
·
Static – A frequently changing clustered index
key will cause rows to be reordered within the table, causing unnecessary
overhead.
Nonclustered
A nonclustered
index is a separate physical structure from the underlying table. It contains
the values for the included columns – called index keys – along with pointers back to the corresponding table
row. On a table that has a clustered index, each nonclustered index’s pointer
is the clustered index key.
Note that a nonclustered
index is ordered, but it does not alter the order of the rows in the table.
There are few hard and fast rules for indexing. You
have to see what works for your database over time. There are whole books
dedicated to indexing strategies.
Here are a few general indexing guidelines:
·
Each table should have a clustered index that is
(ideally) small, selective, ever increasing, and static. (Note that a table
without a clustered index is called a heap.)
·
Implement nonclustered indexes on foreign key
relationships – in other words, on columns that are likely to be used in JOINs.
·
Implement nonclustered indexes on columns that
are frequently used in WHERE clauses.
·
Do not implement single-column indexes on every
column in a table. This will take up space unnecessarily, and cause high
overhead on INSERTs and UPDATEs.
·
In multi-column indexes, list the most selective
(nearest to unique) first in the column list. For example, when indexing an
employee table for a query on social security number (SSN) and last name
(lastName), your index declaration should be
CREATE
NONCLUSTERED INDEX ix_Employee_SSN
ON dbo.Employee
(SSN, lastName);
·
For the most often-used queries, consider a
covering nonclustered index. A covering index is one that contains all the columns
requested from a table.
CREATE INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2)
Convering
indexes risk being wide objects and taking up lots of space.
Include
To understand INCLUDE, you first must understand
(a little bit) the structure of an index.
An index is organized in a b-tree hierarchy – each
node of data (in the case of our index IX_BORK_111, col3) has two nodes beneath
it – the left node higher in the sort range, and the right node lower, like
this:
……….5……..
…3……….7…. child nodes of 5
1…4……6…9.. leaf nodes
3 and 7 are the child nodes of 5. 1 and 4 are the child nodes of 3.
The lowest
level nodes are called leaf nodes, so 1,4,6,and 9 are the leaf nodes.
The SQL engine walks through the tree to find the
values it needs…
If we create our covering index like this:
CREATE
INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2)
then each node in the tree will contain
values for col3, col1, and col2. That
can take up a lot of space.
But in our query,
we don’t need to search based on col1 and col2….we only search based on
col3.
The other two columns are just
there to be returned in the SELECT statement.
About INCLUDE
INCLUDE lets us make the index smaller, while still
supplying our SELECTed columns.
First, here’s the index declaration: CREATE INDEX
IX_BORK_333 ON dbo.BORK (col3) INCLUDE (col1, col2)
Each node in the tree will contain the value for
col3.
But only the leaf level nodes will hold the values for col1 and
col2.
That’s how INCLUDE makes the index
smaller – the rest of the tree doesn’t contain extra, unused information.
The SQL engine performs its search based on
col3, finds the leaf level node(s) that match, and there at the leaf level is
the rest of the information we need.
Key Ideas Review
Key ideas mentioned:
·
Clustered index – small, unique (or highly
selective), ever increasing, static index on a table. (only one CI can exist on
a table)
·
Clustered Index Key – the pointer to the row in
the table.
·
Indexes let queries find data faster.
·
Nonclustered indexes are separate objects from
tables.
·
Index Key – the values contained in the
nonclustered index.
·
A covering index is ideal – it gives the query
everything it needs, without having to touch the table itself (that’s called a
bookmark lookup).
·
Nonclustered indexes take up space, so you don’t
necessarily want a lot of really wide covering indexes. Use with discretion.
One final point: Remember that indexes must be
updated every time data is updated or inserted into the table. The more
indexes, the longer your inserts and updates will take.
Indexing is a balance between supplying enough to
support your read operations, and keeping insert/update overhead low.