3.2.4 : Identifiers

All tables have columns that contains a unique identifier for the objects stored in the table. For example MPAHaloTrees..MHalo has a haloId column, DGalaxies..Bower2006a has a GalaxyID column and MillenniumII..FOF a fofId. These columns are also the "primary key" of the tables, which for our database means that the rows in the tables are sorted on this column's values. Pointers into the table, or "foreign keys" as they are called in relational database parlance, in general use the primary key to identify the referenced object. Thus for example the MillenniumII..SubHalo table has a fofId column that is used to identify the FOF group stored in the MillenniumII..FOF table that the subhalo belongs to.

For purposes of identification alone we might have used arbitrary algorithms to allocate values to these primary key columns as long as the resulting values were unique in the table. However we have added extra structure on most of these column that allow for certain interesting queries to be phrased in terms of these values alone, which, combined with the ordering of the columns, often can greatly speed up their execution. The case of the primary key columns on the tables containing merger trees has been described already elsewhere. Here we describe this and other algorithms together with ways to use these.

MField

MField..FOF

The value of the fofId column is derived from the snapshot index (snapnum), the index of the file the FOF group was stored in originally (between 0 and 511), and the rank of the group in that file (starting at 0):

fofId = 10¹²*snapnum+10⁸*<file-index>+<rank-in-file>

This implies that when requesting all FOF groups with a certain snapnum, say 50, one might use:

select *
  from MField..FOF
 where fofId between 50e12 and 51e12-1

Because the FOF table is ordered by fofId this requires a single sequential read of the table. This will in general be much faster than the more obvious

select *
  from MField..FOF
 where snapnum=50

That query needs an index lookup to find the halos with the given snapnum and then a "bookmark lookup" to get all the data from the table. The size of the table, O(15 million) FOF groups per snapshot, means that the former query will likely time out as well though. To work around this one may use the file-index to ask a number of smaller queries that will step through the complete snapshot in a manner that is guaranteed not to lead to overlaps and will still be fast. Using :FILENR and :STEP as a place-holders for literal values one can run

select *
  from MField..FOF
 where fofId between 50e12+:FILENR*1e8 and 50e12+(:FILENR+:STEP)*1e8-1

When writing a simple script that loops :FILENR from 0-511 in steps of a size (:STEP) that can be found with some experimentation, this will walk through the table in an efficient manner.

MField..FOFSubHalo

The structure of the subhloId columns is equivalent to that of FOF groups and the same use can be made of these:

 
subhaloId = 10¹²*snapnum+10⁸*<file-index>+<rank-in-file>

Note that the subhaloId column in the MillenniumII..Subhalo table has a different form as described below.

MField..MField

The density fields stored in the MField table are identified using 2 columns, snapnum and phKey. Only the combination is unique! Consequently the combination (snapnum,phKey) on other tables should be interpreted as a foreign key and both should be used to join these tables to the MField table. For more information about the use of the phKey as a spatial index see here.

MPAHaloTrees..MHalo

As described here the identifiers of merger trees are assigned using a depth-first ordering of the subhalos in a merger tree. The full haloId column is assigned as follows:

haloId = 10¹²*<file-index>+10⁶*<tree-index>+<depth-first-index>

here file-index takes again values 0-511 and tree-index is the order of the L-galaxies tree in the indicated file. The file-index can be used when a query is supposed to scan the whole table and will therefore time out. Similar to the case above a series of queries of the following form can easily be sent in sequence using a simple script:

select *
  from MPAHalotrees..MHalo
 where haloId between :FILENR*1e12 and (:FILENR+:STEP)*1e12-1 
   and ...

With :STEP=1 the number of rows that is scanned per query is roughly of the same size as the complete millimil..MPAHalo table.

MPAGalaxies

The galaxyId column in the three tables in this database, DeLucia2006a, DeLucia2006a_SDSS2MASS and Bertone2007a are all assigned in the same manner as the haloId in the MPAHalotrees..MHalo table:

galaxyId = 10¹²*<file-index>+10⁶*<tree-index>+<depth-first-index>

DHaloTrees..DHalo

The DHaloID column is structured exactly the same as the MPAHaloTrees..MHalo table:

DHaloID = 10¹²*<file-index>+10⁶*<tree-index>+<depth-first-index>

DGalaxies..Bower2006a

The structure of the GalaxyID column is similar to, but not identical with that of the MPAGalaxies tables:

GalaxyID = 10¹⁴*lt&;file-index> + 10⁸*<tree-index>+<depth-first-index>

The special query examples above should be updated accordingly.

MillenniumII

FOF

The fofId column is similar to the one for MField...FOF, but not identical:

fofId = 10¹⁰*snapnum+10⁶*<file-index>+<rank-in-file>

<file-index> now takes values in 0-2047, for the rest the same "tricks" can be used as above.

SubHalo

This table has two identifying columns; subhaloFileId is based on an algorithm similar to the one for MField..FOFSubHalo, but with adjusted values similar to MillenniumII..FOF:

subhaloFileId = 10¹⁰*snapnum+10⁶*<file-index>+<rank-in-file>

The primary key of the SubHalo table is the subHaloId column, and this column should be used in foreign keys and in optimised queries. It's structure is based on the fofId of the FOF group the subahlo belongs to, and the rank of the subhalo in that group:

subhaloId = 10⁶ * fofID + <rank in FOF group> 

          = 10¹⁶*snapnum+10¹²*<file-index>+10⁶<fof-rank-in-file> + <rank-in-FOF-group>

The dependence on fofId means that queries to find all subhalos in a FOF group (e.g. 670020005352)can be most efficiently written as:

select *
  from MillenniumII..SubHalo
 where subhaloId between 670020005352*1000000 and (670020005352+1)*1000000-1

HaloTree

Similar to the MPAHaloTrees..MHalo table, but different in details:

haloId = 10¹⁵*<file-index>+10⁹*<tree-index>+<depth-first-index>