6.1  :  FAQ Millennium

  • Why have the ID columns such complex values?
  • The raw particle data for the Millennium Simulation was saved at 64 snapshots. Each snapshot was distributed over 512 approximately equal-sized files. At each snapshot, a friends-of-friends (FOF) groupfinder with linking length b=0.2 was run. During post-processing, each FOF group was searched for bound substructure using the SUBFIND algorithm. For each snapshot, there are 512 files that contain the results of SUBFIND. Merger trees were built based on these subhalos. The merger trees were split over 512 files, each of which contains approximately 14,000 trees. In the design of the various tables, the structure of these files is retained in the subhalo (and FOF halo) IDs as follows:
    • MPAHaloTrees..MHALO::haloId = 1012 * treeFile + 106 * (rank of tree in file) + (depth-first ordering of subhalo in tree)
    • MPAGalaxies..Delucia2006a/Bertone2007a::galaxyId = 1012 * (halotreeFile) + 106 * (rank of tree in file) + (depth-first ordering of galaxy in tree)
    • MField..FOF::fofID = 1012 * snapnum + 108 * fileNr + (rank in file)
    • MField..FOFSubHalo::subhaloId = 1012 * snapnum + 108 * fileNr + (rank in file)
    • DHaloTrees..DHalo::DHaloId = 1012 * treeFile + 108 * fileNr + (rank in file)
    • DGalaxies..Bower2006a::galaxyId = 1014 * treeFile + 108 * (rank of tree in file) + (rank in file)
    Note: the structure for the Millennium-II Simulation is different in the details!!!
  • How do I work around timeouts?
  • A query times out most often because the database is asked to handle datasets that are so large that the required disk I/O by itself exceeds the maximum allowed query time. Very often there is a straightforward manner to work around this, at the costs of having to issue multiple queries. Here as example how to deal with queries that will require a scan of the complete galaxies table.
    Suppose a user would like to run the following query (arbitrary):
       select * 
         from mpagalaxies..delucia2006a
        where blackHoleMass > coldgas
          and mag_b - mag_v between .2 and .8
       
    For these constraints the database decides that a full scan of the delucia2006a table is required, which will time out. An alternative is to run the following set of queries
       select * 
         from mpagalaxies..delucia2006a
        where galaxyid between :START*1e12 and (:START+:BIN)*1e12-1
          and blackHoleMass > coldgas
          and mag_b - mag_v between .2 and .8
       
    where one should substitute integer values for the variables :START and :BIN. For example with :BIN=1 and :START ranging from 0-511 one can sample the complete table in 512 separate queries, something that can be easily coded in a script, using wget for example for executing the query.
    This works because the table is sorted on galaxyId and therefore the database can quickly ("in logarithmic time") find the range of galaxies given by the first constraint. As a consequence of the structure in the IDs described above this set is about as large as the the millimil..delucia2006a table. In general the :BIN size can easily be set to a higher value, 10 or possibly larger.
    NB the -1 is required to ensure that consecutive bins are non overlapping: the BETWEEN clause is inclusive.
  • Why are queries using constraint on x,y,z so slow?
  • Queries requesting a subset of the full volume using constrints on the position often time out. This page gives hints how to rewrite the queries using special purpose columns and associated indexes.
  • Can I get simulation particles?
  • The current database servers are not large enough to contain the full raw output of the Millennium simulaiton. Howevere we have added the particles for the milli-Millennium simulaiton. These are stored in the MMSnapshots database.
  • how do I get all particles for a FOF group?
  • The MMSnapshots database contains a table that links FOF groups and SubHalos to the particles they consist of. This association is stored in the MillimilSnapshotIDs table. Note that not all particles are part of a FOF group, or SubHalo. And not all particles in a FOF gorup need to be part of a subhalo. A query to get all particles for FOF groups of mass around 10000 particles would be:
       select p.*
         from millimil..fof fof 
         ,    mmsnapshots..millimilsnapshotids id 
         ,    mmsnapshots..millimilsnapshots p 
        where fof.np between 10000 and 10010 
          and fof.snapnum=63 
          and id.fofid = fof.fofid 
          and p.id = id.particleid 
          and p.snapnum=fof.snapnum
       
  • how do I get information about indexes in the DB?
  • In the future the index definitions will be part of the table documentation. Until that time the following query returns all indexes for all tables in the Millennium-II database, with the columns and the order these appear in the index. Note that this query ONLY works for the Microsoft SQLServer database.
    select o.name as table_name
    ,      i.name as index_name
    ,      c.name as column_name
    ,      ic.key_ordinal as rank
      from millenniumii.sys.tables o
      ,    millenniumii.sys.indexes i
      ,    millenniumii.sys.index_columns ic
      ,    millenniumii.sys.columns c
     where i.object_id = o.object_id
       and ic.index_id = i.index_id
       and ic.column_id = c.column_id
       and ic.object_id = c.object_id
       and c.object_id = o.object_id
     order by table_name,index_name,rank
       
  • How do I find all FOF groups/sub-halos/galaxies at a given redshift?
  • All tables containing objects have a redshift column. BUT users should refrain from using this column in where clauses. Instead they should use the snapnum column. The reason is that the tables are in general indexed on snpanum, and not on redshift. The link between the two is stored in the Snapshots table (available both in millimil and in MField). So instead of writing
       select * from mpagalaxies..delucia2006a where redhsift=0
       
    one should use
       select * from mpagalaxies..delucia2006a where snapnum=63
       
  • what is the difference between MPAHaloTrees and DhaloTrees?
  • The MPAHaloTrees database contains subhalo merger trees constructed according to an algorithm developed at MPA. See this page for a description. The DHaloTrees database contains merger trees constructed accroding to an algorithm developed in Durham, see here. Whereas the nodes in the MPA trees are individual SUBFIND sub-halos, the nodes in the Durham trees are "DHalo-s", which may contain >1 subhalos. The DSubHalo table links the DHalo to the SubHalo table