- Why have the ID columns such complex values?
The raw particle data for the Millennium Simulation was saved at 64 snapshots.
Each snapshot was distributed over 512 approximately equal-sized files.
At each snapshot, a friends-of-friends (FOF) groupfinder with linking length b=0.2 was run.
During post-processing, each FOF group was searched for bound substructure using the SUBFIND algorithm.
For each snapshot, there are 512 files that contain the results of SUBFIND.
Merger trees were built based on these subhalos.
The merger trees were split over 512 files, each of which contains approximately 14,000 trees.
In the design of the various tables, the structure of these files is retained in the subhalo (and FOF halo) IDs as follows:
Note: the structure for the Millennium-II Simulation is different in the details!!!
- MPAHaloTrees..MHALO::haloId = 1012 * treeFile + 106 * (rank of tree in file) + (depth-first ordering of subhalo in tree)
- MPAGalaxies..Delucia2006a/Bertone2007a::galaxyId = 1012 * (halotreeFile) + 106 * (rank of tree in file) + (depth-first ordering of galaxy in tree)
- MField..FOF::fofID = 1012 * snapnum + 108 * fileNr + (rank in file)
- MField..FOFSubHalo::subhaloId = 1012 * snapnum + 108 * fileNr + (rank in file)
- DHaloTrees..DHalo::DHaloId = 1012 * treeFile + 108 * fileNr + (rank in file)
- DGalaxies..Bower2006a::galaxyId = 1014 * treeFile + 108 * (rank of tree in file) + (rank in file)
- How do I work around timeouts?
A query times out most often because the database is asked to handle datasets that are so
large that the required disk I/O by itself exceeds the maximum allowed query time.
Very often there is a straightforward manner to work around this, at the costs of having to issue multiple queries.
Here as example how to deal with queries that will require a scan of the complete galaxies table.
Suppose a user would like to run the following query (arbitrary):
where blackHoleMass > coldgas
and mag_b - mag_v between .2 and .8
For these constraints the database decides that a full scan of the delucia2006a table is required, which will time out.
An alternative is to run the following set of queries
where galaxyid between :START*1e12 and (:START+:BIN)*1e12-1
and blackHoleMass > coldgas
and mag_b - mag_v between .2 and .8
where one should substitute integer values for the variables :START and :BIN.
For example with :BIN=1 and :START ranging from 0-511 one can sample the complete table in 512 separate queries,
something that can be easily coded in a script, using
wget for example for executing the query.
This works because the table is sorted on galaxyId and therefore the database can quickly ("in logarithmic time")
find the range of galaxies given by
the first constraint. As a consequence of the structure in the IDs described above this set
is about as large as the the millimil..delucia2006a table.
In general the :BIN size can easily be set to a higher value, 10 or possibly larger.
NB the -1 is required to ensure that consecutive bins are non overlapping: the BETWEEN clause is inclusive.
Why are queries using constraint on x,y,z so slow?
Queries requesting a subset of the full volume using constrints on the position often time out.
This page gives hints how to rewrite the queries using special purpose columns and associated indexes.
Can I get simulation particles?
The current database servers are not large enough to contain the full raw output of the Millennium simulaiton.
Howevere we have added the particles for the milli-Millennium simulaiton.
These are stored in the MMSnapshots database.
how do I get all particles for a FOF group?
The MMSnapshots database contains a table that links FOF groups and SubHalos to the particles they consist of.
This association is stored in the
Note that not all particles are part of a FOF group, or SubHalo. And not all particles in a FOF gorup need to be
part of a subhalo. A query to get all particles for FOF groups of mass around 10000 particles would be:
from millimil..fof fof
, mmsnapshots..millimilsnapshotids id
, mmsnapshots..millimilsnapshots p
where fof.np between 10000 and 10010
and id.fofid = fof.fofid
and p.id = id.particleid
how do I get information about indexes in the DB?
In the future the index definitions will be part of the table documentation.
Until that time the following query returns all indexes for all tables in the Millennium-II database,
with the columns and the order these appear in the index. Note that this query ONLY works for the Microsoft
select o.name as table_name
, i.name as index_name
, c.name as column_name
, ic.key_ordinal as rank
from millenniumii.sys.tables o
, millenniumii.sys.indexes i
, millenniumii.sys.index_columns ic
, millenniumii.sys.columns c
where i.object_id = o.object_id
and ic.index_id = i.index_id
and ic.column_id = c.column_id
and ic.object_id = c.object_id
and c.object_id = o.object_id
order by table_name,index_name,rank
How do I find all FOF groups/sub-halos/galaxies at a given redshift?
All tables containing objects have a redshift column. BUT users should refrain from using this column in where clauses.
Instead they should use the snapnum column. The reason is that the tables are in general indexed on snpanum, and not on redshift.
The link between the two is stored in the
Snapshots table (available both in millimil and in MField).
So instead of writing
select * from mpagalaxies..delucia2006a where redhsift=0
one should use
select * from mpagalaxies..delucia2006a where snapnum=63
what is the difference between MPAHaloTrees and DhaloTrees?
The MPAHaloTrees database contains subhalo merger trees constructed according to an algorithm developed at MPA.
See this page for a description.
The DHaloTrees database contains merger trees constructed accroding to an algorithm developed in Durham,
Whereas the nodes in the MPA trees are individual SUBFIND sub-halos, the nodes in the Durham trees are "DHalo-s",
which may contain >1 subhalos.
table links the DHalo
to the SubHalo table