For many science questions it is strictly speaking not necessary to use the
full Millennium database. And for performance reasons it would often be
preferrable to use a smaller dataset that is in some way a good representation
of the complete set.
The milli-Millennium could be used for these purposes, however the problem with it is
that it misses the long wavelength modes of the initial perturbation field and does therefore
not correspond to a random volume of the same size from the full Millennium.
It's use is mainly in the opportunity it offers to test SQL queries that will work on the
full Millennium database, while giving much faster return times.
To offer an easy option of drawing proper random samples from the Millennium run we have added
a column to some of the datasets which contains an integer random number generated using the
java.lang.Math.random() function. For the Millennium datasets the number ranges between 0 and 1000000,
for the milli-Millennium data between 0 and 1000.
For example, to choose a roughly "1 in 1000" sample of all galaxies at the final snaphot one
may choose to have all galaxies with this random number in the interval [0,1000].
Note that this does not directly offer the an option to have a random sample of a prescribed size,
but as the example below shows it will be not hard to do so either.
Currently (2006-08-01) this column is only added to MPAGalaxies..MGalaxy and millimil..MMGalaxy.
The following queries show how one might use this feature.
The first takes a random sample that returns roughly 1% of all the galaxies from the MPA
Millennium galaxy catalogue at the final snapshot. This should roughly produce 200000 galaxies:
select *
from mpagalaxies..mgalaxy
where snapnum=63
and random between 0 and 10000
By choosing different intervals of the same size different smaples will be chosen.
The following query picks a random sample of exactly 1000 entries:
select top 1000 *
from mpagalaxies..mgalaxy
where snapnum=63
and random between 0 and 100
The selection on random would produce about 2000 galaxies, the top 1000
takes only the first 1000 of these.
This function will be most useful in combination with a
select ... into ... in a MyDB.
|