4.3  :  wget

Clicking the Query (csv) button on the web interface with the example query in the figure on that page is equivalent in action to the following URL:
http://gavo.mpa-garching.mpg.de/Millennium?action=doQuery&SQL=select top 10 * from millimil..mmhalo.
This example returns the following result:

#OK
#SQL= select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo
#MAXROWS UNLIMITED
#QUERYTIMEOUT 420 sec
#COLUMN 1 name=haloid JDBC_TYPE=-5 JDBC_TYPENAME=bigint
#COLUMN 2 name=snapnum JDBC_TYPE=4 JDBC_TYPENAME=int
#COLUMN 3 name=x JDBC_TYPE=7 JDBC_TYPENAME=real
#COLUMN 4 name=y JDBC_TYPE=7 JDBC_TYPENAME=real
#COLUMN 5 name=z JDBC_TYPE=7 JDBC_TYPENAME=real
#COLUMN 6 name=np JDBC_TYPE=4 JDBC_TYPENAME=int
haloid,snapnum,x,y,z,np
0,63,6.5757904,13.08604,25.33813,51984
1,62,6.587909,13.099106,25.301092,51288
2,61,6.597178,13.111782,25.252974,51052
3,60,6.615912,13.121013,25.204876,51169
4,59,6.6276503,13.1303835,25.152872,50870
5,58,6.6414022,13.1400175,25.09534,50468
6,57,6.658701,13.149509,25.03174,50168
7,56,6.642237,13.170146,24.927555,50485
8,55,6.6424794,13.18374,24.83325,49888
9,54,6.6978354,13.176765,24.781622,48275
The format of this result is explained here.
What this implies is that users could conceivably avoid using the web interface and instead construct the URL and use their favorite way of accessing the web passing in the URL. One standard way is via the command line function wget (e.g GNU wget. The following command will return the above result to standard out:
wget "http://gavo.mpa-garching.mpg.de/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo"
To store the results in a file named "result.csv" use:
wget -O result.csv "http://gavo.mpa-garching.mpg.de/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo"
When a log-in is required, like for the protected site http://gavo.mpa-garching.mpg.de/MyMillennium, one needs to change the command to something like:
wget --http-user=**** --http-passwd=**** "http://gavo.mpa-garching.mpg.de/MyMillennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo"

Session tracking

The nice feature of wget, that it can be used from within scripts, now and then causes problems as well. Now and then the load on our web server has been very large, which could be tracked down to users sending large numbers of relatively small queries using wget.

The problem is that the web server creates a session for every request that is posed to it, unless it is recognised as belonging to a previous session. We use the standard way of tracking sessions using cookies and web browsers will handle this properly.

Without extra work, wget, used as documented above does not enable this session tracking. As sessions have a finite lifetime, sending large numbers of queries can now clog up the web server.

As of version 1.10 of wget it is possible to enforce session tracking. The following example command shows how:

wget --http-user=**** --http-passwd=*** 
   --cookies=on --keep-session-cookies --save-cookies=cookie.txt --load-cookies=cookie.txt -O out.csv
  "http://gavo.mpa-garching.mpg.de/MyMillennium?action=doQuery&SQL=select top 10 * from mpagalaxies..delucia2006a" 
The new features are the various cookie parameters. The file name after --save-cookies and --load-cookies should be the same and may have to be fully qualified if you run this from within an environment such as IDL. What this does is that it writes the cookie info that comes back from the server, and reads it again upon next execution. This allows the server to work within a single session. Please also consult your local manual on wget (if you can find it) for more information.

Please use this pattern when querying using wget.
For other options, see the command line options of wget.

URL encoding

The fact that we are creating a URL implies we must beware for some URL encoding issues. In particular we must beware for using characters in the SQL that are interpreted as special characters in a URL. An example of this is the + (plus) sign, which is interpreted as a space. So a query like
select x+y from millimil..mpahalo
will cause an error, unless encoded as
select x%2By from millimil..mpahalo

The next few pages show ways using the wget command form within environments such as R and ITT's IDL.