4.2 : Data Formats

The results of queries to the database can come in a number of formats. The two main ones are the HTML table format that is produced using the "Query(browser)" button on the query page, and the comma-separated-values (CSV) format that is returned by default when the "Query (stream)" button is clicked.

Other formats are available after the HTML table has been retrieved using the "Reformat" button on the web page. The most interesting of these is the VOTable format, the others are variants of the CSV format, using seperators different from the ",". The most important datatype though is the CSV datatype and we describe that here.

CSV

We provide comma-separated values in a particular form, with a header section that provides metadata about the result. We will describe them based on different yypes of results.

For queries that produce a result and have no problems a typical result will look as follows.

#OK
#SQL= select top 10 *  
#	  from mfield..snapshots
#MAXROWS UNLIMITED
#QUERYTIMEOUT 420 sec
#DATE 2009-03-31 11:40:53.853 CEST
#QUERYTIME 2 millisec
#COLUMN 1 name=snapNum JDBC_TYPE=4 JDBC_TYPENAME=int
#COLUMN 2 name=Z JDBC_TYPE=7 JDBC_TYPENAME=real
#COLUMN 3 name=redshift JDBC_TYPE=3 JDBC_TYPENAME=decimal
#COLUMN 4 name=lookbackTime JDBC_TYPE=7 JDBC_TYPENAME=real
snapNum,Z,redshift,lookbackTime
0,127.0,127.00,13.5672
1,79.997894,80.00,13.5551
2,49.99959,50.00,13.5305
3,30.000063,30.00,13.4761
4,19.91569,19.92,13.3929
5,18.243723,18.24,13.368
6,16.724525,16.72,13.3403
7,15.343074,15.34,13.3093
8,14.085914,14.09,13.2748
9,12.94078,12.94,13.2365
#OK

#OK in the first line indicates that the SQL query was accepted and executed by the database. The actual SQL statement is repeated next, together with some configuration information(#MAXROWS and #QUERYTIMEOUT). The timestamp at which the query was handled is added in the line starting with #DATE. The #QUERYTIME indicates the time between issuing the query and receiving the first response from the database. This is not necessarily the same as the total time it takes to handle the complete result. For simple queries with large result sets it may take much longer than that before all results are returned.

The follow lines starting with #COLUMN that describe the columns in the result, e.g.

#COLUMN 1 name=snapNum JDBC_TYPE=4 JDBC_TYPENAME=int

The integer directly following the #COLUMN indicates the order. then follows the name of the column. The datatype finally is represented in two different ways. The JDBC_TYPE attribute gives an integer atht corresponds to the JDBC type in the java.sql.Types class in the Java Database Connection Specification. The JDBC_TYPENAME gives the name of that type as decided by the JDBC implementation. The official mapping between the JDBC_TYPE and the JDBC type is as indicated in the following table

BIT	-7
TINYINT	-6
SMALLINT	5
INTEGER	4
BIGINT	-5
FLOAT	6
REAL	7
DOUBLE	8
NUMERIC	2
DECIMAL	3

CHAR	1
VARCHAR	12
LONGVARCHAR	-1
DATE	91
TIME	92
TIMESTAMP	93
BINARY	-2
VARBINARY	-3
LONGVARBINARY	-4
NULL	0

OTHER	1111
JAVA_OBJECT	2000
DISTINCT	2001
STRUCT	2002
ARRAY	2003
BLOB	2004
CLOB	2005
REF	2006
DATALINK	70
BOOLEAN	16

The first line not starting with a # repeats the column names, separated by a comma. Many CSV readers use this line to name the columns. Then follow the data itself. In case a column has a string-like datatype (CHAR, VARCHAR etc), the values are contained in double quotes ("..."). This is so that possible commas in such string values are not interpreted as separators. In case the string value itself contains a double quote, that double quote is doubled. I.e. " → "".

The final line reads #OK. This indicates that the complete result is succesfully retrieved. More importantly, if it is missing this is an indication that the result is most likely not complete. This can have many possible casues. Some of these can be caught by the server, that then may return instead a line starting with #ERROR (see below). But in theory other problems can occur, such as loosing the connection somewhere between the client and the server.

Certain statements do not return a result and the CSV result is slightly different. For example

#OK NO RESULT
#SQL= select * 
#	  into mysnapshots
#	  from millimil..snapshots
#MAXROWS UNLIMITED
#QUERYTIMEOUT 420 sec
#DATE 2009-03-31 13:34:48.875 CEST
#QUERYTIME 16 millisec
#UPDATE_COUNT=-1
#OK

Here the succesful execution is indicated by a #OK NO RESULT. As there is no result there are no columns to describe. The only extra information comes in the #UPDATECOUNT, which indicates how many rows were inserted in the database, with -1 indicating that no rows were updated.

NOTE the -1 in the example above is incorrect ! It is a "feature" of the JDBC software that we are still using that in combination with SQLServer 2005 falsely indicates that no rows were inserted when using a SELECT .. INTO .. statement. If instead a CREATE TABLE MyTable .. is followed by an INSERT INTO MyTable .. rows are properly counted.

The following results show how errors are indicated.

#ERROR SQL
#SQL= select * 
#	  from millimil..snapshot
#MAXROWS UNLIMITED
#QUERYTIMEOUT 420 sec
#DATE 2009-03-31 13:56:06.656 CEST
#SQLSTATE S0002
#SQLERRORCODE 208
#SQLEXCEPTION =Invalid object name 'millimil..snapshot'.

In most cases the error will result from incorrect SQL. This is indicated by the #ERROR SQL on the first line. After the #DATE there are now various lines giving some information about the error. In particular the last line, starting with #SQLEXCEPTION replicates the error message retrieved from the database.

Other errors will be indicated similarly. A special case is when the error is caused by a timeout. This may happen after results have been obtained and sent to the client already. In that case an error message will be appended after the last data row that is sent.