It is a goal of databases to run queries efficiently.
But to do so so work is often required, first from the database designer, sometimes also from the user
who can aim to write queries in more optimal ways.
From the designer's point of view it often is necessary to tune databases in a particular way
to assist the database engine in constructing optimal execution plans for a particular query.
The main technique here is the definition of appropriate table indexes.
We describe this very important concept in some more detail on this page.
Note that this is a field that requires quite some more
understanding of database functionality than most first time users will not possess.
Whenever you have questions about badly performing queries, please mail us
the SQL at we will try to improve it if possible.
An index can be seen as a kind of table that contains a subset of the columns of the main table,
ordered in a particular way. Each row in such an index points back to the row in the main table from which it was
extracted. If a query contains a constraint on a subset of columns that is contained in such an index,
the query engine may first find the rows obeying the constraint in the index and then retrieve the required data form
the main table. As the index is ordered one can use fast algorithms to find the requested data, often in "logarithmic time".
This is to be compared to a linear scan over a table of O(1 billion) rows, with total size ~400GB which may take minutes.
The user does not need to use the indexes explicitly in her queries.
The database "knows" of the definition of the indexes and their relation to the table and
will use this knowledge in the compilation of a user query into an execution plan.
It is however often useful to know which indexes exist on a table to write a query
in such a way that indexes can be used. In our databases a typical example is that users should
use the "snapnum" column rather
then the "redshift" column that are often both available in the tables.
Whenever we created indexes that allow more efficient querying for objects at a certain snapshot,
we used the integer snapnum column, rather than the floating-point redshift.
(See the discussion on
this page for some
more details on the relation between the snapnum and redshift columns.)
Indexes are often added to tables while it is up an running, hence a static documentation of them is soon out of date.
On the main web page though there is a button labelled "Show Indexes" that can help one find all indexes for a given table.
Clicking that button will produce the following query in the query window:
use [database-name]
select t.name as table_name
, ind.name as index_name
, col.name as column_name
, ic.index_column_id as column_rank
from sys.indexes ind
, sys.index_columns ic
, sys.columns col
, sys.tables t
where ind.object_id = ic.object_id
and ind.index_id = ic.index_id
and ind.is_primary_key = 0
and ind.is_unique = 0
and ind.is_unique_constraint = 0
and t.is_ms_shipped = 0
and ic.object_id = col.object_id
and ic.column_id = col.column_id
and ind.object_id = t.object_id
order by t.name, ind.name, ic.index_column_id
After substituting the name of a particular database for the [database-name] and clicking
"Query (browser)" button a result will appear like the following example below which shows some indexes
defined on tables in the Guo2010a database.
The table shows first the table on which an index has been defined, then the name of the index.
Then the columns in the index shown in the order in which they have been defined in the index,
sepcified by the column_rank.
This is important. The index is ordered by columns in the order of their definition.
For example the index ix_guo2010a_mr_snapnum_sampling defined on Guo2012a..MR contains columns
(snapnum, fofID, centralMvir, type, stellarmass,...) in this order, which implies that it is ordered first by snapnum, than fofID, then centralMvir and so on.
This helps when a query has a WHERE_cluase that include constraints on on snapnum and fofId.
Now a binary search (for example, DB will use more optimal algorithms) can very quickly find the range of rows
that fulfills the constraint.
But when the query only constrains stellarMass the whole index will still have to scanned before the apporpriate rows are
found. (Though note that the reduced width of the index compared will lead to some improvement compared to scanning the whole table).
It is further important to realize that when a query requests only columns that all appear in the index,
this can lead to a further performance improvement. In this case once the DB engine has found the rows obeying the constraints,
all the required data can be read off from the rows in the index itself.
I.e. it is not necessary to do a lookup in the main table itself.
This so called "bookmark lookup" (please ask Google for more details) can kill the performance improvement
the use of an index, especially if large amounts of rows obey the constraints.
For this purpose many of our indexes contain more columns than could be useful in a sorting algorithm.
For example the centralMvir in the index discussed above, being a floating point value, will
rarely have values shared by many rows. Hence its ordering will not produce intervals of constant values
that can be further ordered by subsequent values such as the stellarMass. However if one wants to obtain the stellarMass
of galaxies in a given snapshot and a given fofGroup, one needs look no further than this index.
A bookmark lookup is not required.
NB the index under discussion is itself not optimal. For it contains both fofId and snapnum.
But a fofID identifies a single friends-of-friends group which is always contained in a single snapnum.
I.e. the snapnum is actually constrained by the fofID.
Hence including the snapnum is only of interest if one has NO constraint on fofId.
MR | ix_guo2010a_mr_snapnum_sampling | snapnum | 1 |
MR | ix_guo2010a_mr_snapnum_sampling | fofID | 2 |
MR | ix_guo2010a_mr_snapnum_sampling | centralMvir | 3 |
MR | ix_guo2010a_mr_snapnum_sampling | type | 4 |
MR | ix_guo2010a_mr_snapnum_sampling | stellarMass | 5 |
MR | ix_guo2010a_mr_snapnum_sampling | bulgeMass | 6 |
MR | ix_guo2010a_mr_snapnum_sampling | gDust | 7 |
MR | ix_guo2010a_mr_snapnum_sampling | rDust | 8 |
MR | ix_mr_descendantid | descendantId | 1 |
MR | ix_mr_descendantid | galaxyID | 2 |
MR | ix_mr_descendantid | lastProgenitorId | 3 |
MR | ix_mr_descendantid | stellarMass | 4 |
MR | ix_mr_descendantid | type | 5 |
MR | ix_mr_descendantid | snapnum | 6 |
MR | ix_mr_fofid_type | fofID | 1 |
MR | ix_mr_fofid_type | type | 2 |
MR | ix_mr_haloid_tree | haloID | 1 |
MR | ix_mr_haloid_tree | type | 2 |
MR | ix_mr_haloid_tree | galaxyID | 3 |
MR | ix_mr_haloid_tree | lastProgenitorId | 4 |
|