SML# Document Version 4.0.0
22 SQL Expressions and Commands

22.5 SELECT queries

SML# allows to construct a SELECT query by defining each of its clauses separately and composing the clauses into one query. The clauses sqlclause of which a SELECT query consists is given below:

sqlclause ::= sqlSelectClause SELECT clauses
 | sqlFromClause FROM clauses
 | sqlWhereClause WHERE clauses
 | sqlOrderClause ORDER BY clauses
 | sqlOffsetClause OFFSET clauses
 | sqlLimitClause LIMIT clauses

The details of each clause is shown below in a subsection of this section.

A SELECT query is constructed by combining these clauses and the GROUP BY clause sqlGroupClause. The syntax of the SELECT queries is the following:

sqlselect ::= sqlSelectClause SELECT query with SELECT clause
sqlFromClause
(sqlWhereClause)?
(sqlGroupClause)?
(sqlOrderClause)?
(sqlLimitClause)?
 | select ... ( exp ) SELECT query whose SELECT clause will be embedded
sqlFromClause
(sqlWhereClause)?
(sqlGroupClause)?
(sqlOrderClause)?
(sqlLimitClause)?
 | select ... ( exp ) embedded SELECT queries

The first two syntaxes constructs a SELECT query. A SELECT query must contain a SELECT clause and FROM clause. Other clauses are optional. While some RDBMSes accepts SELECT queries without FROM clauses, SML# does not allow such queries. The meaning of select ... (exp) in the second form is decribed later.

A SELECT query has type (τ, w) SQL.query if the following conditions are met:

  • sqlFromClause has type (τ1, w) SQL.from for some τ1.

  • If sqlWhereClause exists, it must have type (τ1 -> τ1, w) SQL.whr.

  • If a GROUP BY clause exists, a row group type τ2 is calculated from τ1 (see Section 22.5.4 for details). Otherwise, τ2=τ1.

  • sqlSelectClause must have type (τ2, τ, w) SQL.select.

  • If sqlOrderClause exists, it must have type (τ -> τ, w) SQL.orderby.

  • If sqlLimitClause exists, it must have type (τ -> τ, w) SQL.limit.

The following is an example of a complete SELECT query:

val q = fn db => _sql select #t.name as name, #t.age as age
                      from #db.employee as t
                      where #t.age >= 20

The third form select...(exp) embeds the result of the SML# expression exp as a SELECT query. The type of exp must be SQL.query. For example, the following code embeds the query q in another query q2 as a subquery:

val q2 = fn db => _sql select #t.name as name
                       from (select...(q db)) as t
                       where #t.name like "%Taro%"

As seen in the second form of the SELECT query and the syntax of each clause shown in the subsections, except for some particular clauses, it is allowed to write the clause name followed by ...(exp) to embed the result of exp as a clause of the clause name. By using this nontation, the above example of a SELECT query can be decomposed into a series of val definitions as follows:

val s = _sql select #t.name as name, #t.age as age
val f = fn db => _sql from #db.employee as t
val q = fn db => _sql select...(s) from...(f db)

The exp in the ...(exp) notation is not an SQL value expression but an SML# expression. In exp, SQL keywords are not keywords but ordinary identifiers as well as ordinary SML# expressions.

22.5.1 SELECT clauses

The syntax of SELECT clauses sqlSelectClause is the following:

sqlSelectClause ::= select  (distinct_all)?  sqlSelectField, ,  sqlSelectField
distinct_all ::= distinct  | all
sqlSelectField ::= sqlexp (as lab)?

A SELECT clause consists of one or more fields of the form sqlSelectField. Each field has a label lab. If the as labi of the i-th field (the first field is 1st) is omitted, it is complemented by adding as i. No two fields may have the same label.

For sqlSelectClause with n fields, if the sqlexpi of its i-th field (1in) sqlexpi as labi has the (τ,w)τi type, the type of entire sqlSelectClause is of the type

(τ, {lab1 : τ1, , labn : τn} list, w) SQL.query.

22.5.2 FROM clauses

The syntax of the From clauses sqlFromClause is the following:

sqlFromClause ::= from sqlTable , , sqlTable
 | from ... (exp)

The first syntax constructs a FROM clause. The second form embeds the value of exp in the ontext as a FROM clause. exp must be of the SQL.from type.

The commas in the first form sometimes conflicts with those of SML#’s tuple expressions. For example,

(1, _sql from #db.t1, #db.t2, #db.t3)

is ambiguous because there are two possible interpretations: this denotes a pair of an integer and FROM clause, or a 4-tuple whose second element is a FROM clause. The FROM clause must be parenthesized as either

(1, _sql (from #db.t1, #db.t2, #db.t3))

if the expression means the former, or

(1, _sql (from #db.t1), #db.t2, #db.t3)

if it means the latter. This is why the syntactic restriction described in Section 22.2 is introduced. In contrast to the above example,

(_sql from #db.t1, #db.t2, #db.t3)

is a single FROM clause expression with no ambiguity because it begins with _sql.

A table expression sqlTable denotes a table. Its syntax is as follows:

sqlTable ::= #vid.lab reference to a table
 | sqlTable as lab labeling expressions
 | ( (_sql)? sqlselect ) table subqueries
 | sqlTable (inner)? join sqlTable on sqlexp inner join of two tables
 | sqlTable cross join sqlTable product of two tables
 | sqlTable natural join sqlTable natural join of two tables
 | (sqlTable)

The labeling expressions must has the following rules:

  • The labeling expressions has the strongest associativity among the table expressions.

  • If a table reference #vid.lab is not labeled with a labeling expression, it is labeled by the same name as the table as if as lab is specified.

The syntax of sqlTable is restricted by the following rules:

  • All as labs occurring in a sqlFromClauses must be distinct. Thus, if a table is referenced twice or more times in a FROM clause, the references of the table must be labeled differently by as.

  • Each of the sqlTable1 and sqlTable2 in a sqlTable1 natural join sqlTable2 must be neither an inner or cross join.

  • The sqlTable of a sqlTable as lab must be neither an inner or cross join.

For a a FROM clause with n labels of sqlTablei as labi (1in), if the type of each sqlTablei is τi, the type of entire FROM clause is the following:

({lab1 : τ1, , labn : τn} list, w) SQL.from

The type of sqlTablei is decided as follows:

  • The type of #vid.lab is τ where the type of vid is (τ, w) SQL.db and τ is a record type including at least a lab:τ field.

  • The type of sqlTable as lab is identical to that of sqlTable.

  • The type of (sqlselect) is τ where sqlselect has type (τ,w) SQL.query.

  • The type of sqlTable1 natural join sqlTable2 is the record type obtained by performing the natural join operation on the record types of the two tables.

SML# does not compute the type of inner and cross joins. Therefore, inner and cross joins cannot be labeled by as directly (syntactically restricted). The type of these joins are represented by the record type of the entire FROM clause.

A FROM clause computes a table obtained by joining the tables it refers to. The record type {lab1 : τ1, , labn : τn} of a FROM clause represents a row of the computed table. Each field of the record type corresponds to a labeled subset of columns. The SQL column reference expressions occurring in other clauses refer to these labels.

22.5.3 WHERE clauses

Here is the syntax of the WHERE clauses sqlWhereClause:

sqlWhereClause ::= where sqlexp
 | where ... (exp)

The first form construct a WHERE clause. Its type is (τ -> τ, w) SQL.whr if sqlexp has type (τ,w)SQL.bool3.

The second form embeds the result of exp in the current context. exp must be of the SQL.whr type.

22.5.4 GROUP BY clauses

The syntax of the GROUP BY clauses sqlGroupClause is the following:

sqlGroupClause ::= group by sqlexp, ,  sqlexp  (having sqlexp)?
 | group by ()

Different from other clauses, the ...(exp) notation is not allowed for the GROUP BY clauses and therefore they cannot be separated from the SELECT queries. Syntactically, a GROUP BY clause must occur along with a SELECT and FROM clauses.

Each sqlexpi comma-separated in a GROUP BY clause must be of type (τ,w)τi where the type of the associated FROM clause is (τ, w) SQL.from. The GROUP BY clause splits the table computed by the FROM clause to the groups of rows by using keys specified in sqlexps, as described later.

A GROUP BY clause may have just one HAVING clause. The value expression of a HAVING clause is a condition to filter the row groups; therefore, its type must be (τ,w)SQL.bool3.

The second form group by () is one of the standard SQL syntax that makes a single group of all rows. In a conventional SQL, if an aggregate function is used without a GROUP BY clause, the query aggregates the entire table implicitly. For example, the SQL query

SELECT avg(e.age) FROM employee AS e

is a correct SQL query that computes the average of the e.age column. In contrast, SML# forces such an aggregating query to have group by (). The above query must be written in SML# as follows:

fn db => _sql select avg(#e.age) from #db.employee as e group by ()

This SML# expression is evaluated to the SQL query

SELECT avg(e.age) FROM employee AS e GROUP BY ()

as in the SML# expression.

The type τ of the row groups computed by a GROUP BY clause is computed roughly in the following steps:

  1. 1.

    Let the type of rows before grouping be

    τ={k1:τ1kn:τn}

    where the type of the FROM clause is τ list.

  2. 2.

    Group the rows in lists. Hence, the group type is naturally τ list list.

  3. 3.

    Transpose the row group type {k1:τ1, , kn:τn} list to τt={k1:τ1 list, kn:τn list}.

For the transposition in the step 3, the set of columns k1,,kn must be determined at compile time, as opposed to the record-polymorphic property of queries. SML# obtains the set of columns from column reference expressions ocurring in the entire query with a GROUP BY clause. For each column reference #lab1.lab2 occurring as either a key specified in the GROUP BY clause or column reference expression referring to a grouped value, if the column reference is identical to one of the group keys, the column refers to the single key value and therefore the type of the column is not changed in the result of the GROUP BY clause. Otherwise, the type of the column is the list type of its original type. Any other columns, which are never refered to in the query, are ignored and omitted in the result type of the GROUP BY clause.

Note that this type computation is performed based on the syntactic context and hence does not consider variable references. For example,

val q = fn db => _sql select #e.department, avg(#e.salary)
                      from #db.employee as e
                      group by #e.department

is a typechecked query with a GROUP BY clause. However,

val s = _sql select #e.department, avg(#e.salary)
val q = fn db => _sql select...(s)
                      from #db.employee as e
                      group by #e.department

causes a type error because the second example does not have any SELECT clause construction but an embedding, in contrast to the first example that have both a SELECT and GROUP BY clauses in the same syntax of a query. In the first example, the type of the row groups has #e.department and #e.salary columns as expected, but in the second example, the row groups is regarded to have no column since no column reference occur in the same query syntax. As a good manner, when you write a query with group by, you should not use the ...(exp) notation for its SELECT clause.

Notes for the observant readers: even when the SELECT clause is sepearated from a query with GROUP BY, if the embedded SELECT clause only refers to the group keys, the query does not cause type errors. For example, edit the above example by removing avg(#e.salary) from the definition of s as follows:

val s = _sql select #e.department
val q = fn db => _sql select...(s)
                      from #db.employee as e
                      group by #e.department

This program does not cause type errors because #e.department is a group key and therefore it is included in the result type of GROUP BY.

22.5.5 ORDER BY clauses

The syntax of the ORDER BY clauses sqlOrderclause is the following:

sqlOrderClause ::= order by sqlOrderKey, sqlOrderKey
 | order by ... ( exp )
sqlOrderKey ::= sqlexp (asc_desc)?
asc_desc ::= asc  | desc

The first form constructs an ORDER BY clause. The second form embeds the result of exp in the current context as an ORDER BY clause. exp must be of the SQL.orderby type.

An ORDER BY clause rearranges the rows computed by the SELECT clause. For the SELECT query computing the rows of type τ, Each sqlexpi in the ORDER BY clause must be (τ,w)τi, and the entire ORDER BY clause must be of type (τ -> τ, w) SQL.orderby.

There are several variations of ORDER BY clause in SQL since it has been extended by the database vendors and standard trucks without backward compatibility. The convention SML# adopts is that an ORDER BY clause may refer to only the result of the SELECT clause. The column reference expression with anonymous table #.lab is used to refer to a column of the SELECT clause from the ORDER BY clause. The following shows an example:

fn db => _sql select #e.name as name, #.age as age
              from #db.employee as e
              order by #.age

While many database engines allows for ORDER BY to refer to columns other than those of SELECT, SML# prohibits such references. The following program causes type errors, for example:

fn db => _sql select #e.name as name, #.age as age
              from #db.employee as e
              order by #e.department

22.5.6 OFFSET or LIMIT clauses

The OFFSET and LIMIT clauses have same meaning; they returns the rows only in the given range. The major difference of these clauses is who specify them: the OFFSET clause is specified in the standard SQL, and the LIMIT clause is a widely-accepted vender extention. SML# supports both.

The syntax of them sqlOffsetOrLimitClause is as follows:

sqlOffsetOrLimitClause ::= sqlOffsetClause  | sqlLimitClause
sqlOffsetClause ::= offset sqlatexp row˙rows  (sqlFetchClause)?
sqlFetchClause ::= fetch first˙next sqlatexp  row˙rows only
row_rows ::= row  | rows
first_next ::= first  | next
sqlLimitClause ::= limit sqlexp  (sqlLimitOffsetClause)?
 | limit all  (sqlLimitOffsetClause)?
sqlLimitOffsetClause ::= offset sqlexp

Note that the keyword offset is used in two ways in the above syntax: offset is either the subclause for a LIMIT clause or the main clause of a OFFSET clause that may have a FETCH subclause. These subclauses are not interchangable. Note also that if an OFFSET clause or its FETCH subclause has a non-constant expression, the expression must be parenthesized.

The type of sqlexp and sqlatexp must be ({},w)𝚒𝚗𝚝. Therefore, no column reference may appear in these clauses.

Several database engines allows these clauses and subclauses to appear more than one times in a qeury in any order. SML# follows the standard SQL and hence a main clause must be followed by its subclause. A query may contain only one of these clauses.

22.5.7 Corelated Subqueries

If more than one SELECT queries are nested, the inner query is a subquery of the outer query. A subquery is referred to as a corelated subquery if it refers to columns introduced by the FROM clause of its outer query. The following is an example of a corelated subquery in the standard SQL:

SELECT e.department AS department, e.name AS name
FROM empoloyee AS e
WHERE e.salary > (SELECT avg(#t.salary)
                  FROM employee as t
                  WHERE t.department = e.department
                  GROUP BY ())

In SML#, a subquery may appear as an value expression sqlexp (see also Section 22.4.8) and in a FROM clause (Section 22.5.2). A subquery may be a corelated subquery if the following conditions are met:

  1. 1.

    If a subquery itself and all of its outer queries have FROM clauses not of the form from...(exp), the subquery may be a corelated subquery.

This is a translation of the above SQL query into SML#.

fn db => _sql select #e.department as department, #e.name as name
              from #db.empoloyee as e
              where #e.salary > (select avg(#t.salary)
                                 from #db.employee as t
                                 where #t.department = #e.department
                                 group by ())

If one of the nested query has from...(exp), subqueries are not to be corelated. For example, if the above example is editted as follows by replacing its FROM clause with from...(exp)

let
  val f = fn db => _sql from #db.employee as t
in
  fn db => _sql select #e.department as department, #e.name as name
                from #db.empoloyee as e
                where #e.salary > (select avg(#t.salary)
                                   from...(f db)
                                   where #t.department = #e.department
                                   group by ())
end

the e of #e.department in the subquery is not interpreted as the e of from #db.employee as e in the outer query, but e to be introduced by from ...(x db). Therefore, x db must have e. Concequently, this example causes a type error. Also in the case when the outer query has from...(exp), such as

let
  val f = fn db => _sql from #db.empoloyee as e
in
  fn db => _sql select #e.department as department, #e.name as name
                from...(f db)
                where #e.salary > (select avg(#t.salary)
                                   from #db.employee as t
                                   where #t.department = #e.department
                                   group by ())

the subquery is not regarded as corelated one and therefore the e of #e.department is undefined.

Corelated subqueries and GROUP BY clauses can be combined as expected. For example, the following queries the youngest person who have incomes greater than the average saraly of his/her department:

fn db => _sql
  select #e.department, (select min(#t.age)
                         from #db.employee as t
                         where (#t.department = #e.department
                                and (Some) #t.salary > min(#e.salary))
                         group by ())
  from #db.employee as e
  group by #e.department

The inner query aggregates the grouped column #e.salary of the outer query by the min aggregate function. As described in Section #e.salary, the type of GROUP BY is computed from the syntactic context. This is also true even if a query has corelated subqueries.

Notes for the observant readers: A corelated subquery must occur in a outer query, but the subquery is not tightly corelated to the outer query similarly to the static scoping of ML variables. The syntactic restriction of corelated subqueries are introduced just to make the type computation of subqueries possible in a static manner. As long as any SQL value expressions including corelated subqueries are first-class citizens, it is possible to detach a corelated subquery A from a nested query B and embed A to another nested query C by exploiting SML# features. In the resulting query, A embedded in C refers to C’s tables, not to B. Regardless of the query construction operations, if the type of those operations are consistent, then the resulting SQL query is correct.