Table Of Contents

Previous topic

Introducing jsonQL

Next topic

Formal Grammar

This Page

jsonQL Reference

jsonQL is a language for querying data that can be represented in JSON. Abstractly, a jsonQL query operates on collection of JSON objects that conform to pJSON (persistent json) semantics. More concretely, jsonQL works with a Vesper datastore, which provides a logical mapping between objects in a backend datastore to a collection of JSON objects (for example, each object might correspond to a row in table, with a property for each column). A jsonQL query operates on that mapping in a manner similar to a SQL query except that instead of returning rows it returns JSON data structures based on the pattern specified in the query.

Unless otherwise specified, the example queries here are based on the example datastore found in the jsonQL Tutorial. You can cut and paste or you can run the admin tool on the sample store.

>>> from vesper import app
>>> model1 = app.createStore(
... '''[
...   {
...     "type": "post",
...     "id": "post1",
...     "contents": "a post",
...     "author": "@user:1"
...   },
...   {
...     "contents": "a comment",
...     "type": "comment",
...     "id": "comment1",
...     "parent": "@post1",
...     "author": "@user:2"
...   },
...   {
...     "author": "@user:1",
...     "type": "comment",
...     "id": "comment2",
...     "parent": "@comment1",
...     "contents": "a reply"
...   },
...   {
...     "author": "@user:1",
...     "type": "comment",
...     "id": "comment3",
...     "parent": "@comment4",
...     "contents": "different parent"
...   },
...   {
...     "displayname": "abbey aardvaark",
...     "type": "user",
...     "id": "user:1",
...     "email": [
...       "abbey@aardvaark.com",
...       "abbey_aardvaak@gmail.com"
...     ]
...   },
...   {
...     "displayname": "billy billygoat",
...     "type": "user",
...     "id": "user:2"
...   }
... ]''')

Basic Grammar

Below is simplifed representation of the JQL grammar (the formal grammar can be found here). This reference guide will walk through each element of language and provide sample queries illustrating each feature of the language. The queries and sample results are based on the sample json used by the [tutorial] (which, btw, might be a better place to start learning about JQL).

A jsonQL query consists of a “construct pattern” that describes the JSON output, which can be any JSON type: an object, an array or a simple value like a string. The syntax for jsonQL construct patterns is:

query           ::=  constructobject
                    | constructarray
                    | constructvalue
constructobject ::=  "{" [label]
                        (objectitem | abbreviateditem [","])+
                         [query_criteria]
                      "}"
constructarray  ::=  "[" [label]
                      (propertyvalue [","])+ [query_criteria]
                     "]"
constructvalue  ::=  "("
                        expression [query_criteria]
                     ")"
objectitem      ::=   expression ":" ["["] ["omitnull"] ["maybe"] propertyvalue ["]"]
propertyvalue   ::=  expression | "*" | nestedconstruct
nestedconstruct ::=  constructarray | constructobject
abbreviateditem ::=  "ID" | "*" | ["["] ["omitnull"] ["maybe"] propertyname ["]"]
propertyname    ::=  NAME | "<" CHAR+ ">"
query_criteria  ::=  ["WHERE" expression]
                     ["GROUP BY" (expression[","])+]
                     ["ORDER BY" (expression ["ASC"|"DESC"][","])+]
                     ["LIMIT" number]
                     ["OFFSET" number]
                     ["DEPTH" number]
                     ["MERGEALL"]
                     ["NAMEMAP" "=" namemapdict]
namemapdict     ::=  "{" [((NAME | STRING) ":" (STRING | namemapdict) ","?)+] "}"

The syntax for jsonQL expressions is:

expression        ::=  expression "and" expression
                       | expression "or" expression
                       | "maybe" expression
                       | "not" expression
                       | expression operator expression
                       | filterset
                       | atom
                       | "(" expression ")"
operator          ::=  "+" | "-" | "*" | "/" | "%" | "=" | "=="
                       | "<" | "<=" | ">" | "=>" | ["not"] "in"
filterset         ::=  "{" [label] expression "}"
atom              ::=  label | bindvar | constant | objectreference
                       | functioncall | propertyreference
label             ::=  "?"NAME
bindvar           ::=  ":"NAME
objectreference   ::=  "@"NAME | "@<" CHAR+ ">"
propertyreference ::=  [label"."]propertyname["."propertyname]+
functioncall      ::=  NAME([expression[","]]+ [NAME"="expression[","]]+)
constant          ::=  STRING | NUMBER | "true" | "false" | "null"

Construct Patterns

There are three top level constructions depending on whether you want to construct results as JSON objects (dictionaries), arrays (lists) or simple values (such as a string or number).

A jsonQL query consists of a pattern describes a JSON object (dictionary), a list (array) or simple value – executing query will construct a list of objects that match the pattern. This example returns a list of all the objects that have properties named “displayname” and “type”:

{
     "displayname" : displayname,
     "type" : type
     }
>>> model1.query(
... '''{
...     "displayname" : displayname,
...     "type" : type
...     }
... ''')
[
  {
    "type": "user",
    "displayname": "abbey aardvaark"
  },
  {
    "type": "user",
    "displayname": "billy billygoat"
  }
]

Both the property name and value are expressions. In this example, the property names is simply string constants while the property value are property references. In the next example, the property name is the object id and property value is a more complex expression. It uses the MERGEALL option to return a single dictionary that is a merge of the results.

{
   id : upper(displayname)
   MERGEALL
 }
>>> model1.query(
... '''{
...   id : upper(displayname)
...   MERGEALL
... }''')
[
  {
    "user:1": "ABBEY AARDVAARK",
    "user:2": "BILLY BILLYGOAT"
  }
]

Abbreviated properties: objectitem

When a single property name appears instead of a name-value pair, it is treated as a name-value pair where the name is the name of the property and the value is a reference to the property. So the following example is equivalent to the first query:

{ displayname, type }
>>> model1.query(
... '''{ displayname, type }''')
[
  {
    "type": "user",
    "displayname": "abbey aardvaark"
  },
  {
    "type": "user",
    "displayname": "billy billygoat"
  }
]

constructarray

You can also construct results as arrays (lists) instead of objects. This query selects the same objects but it formats each result as a list not an object.

[displayname, type]
>>> model1.query(
... '''[displayname, type]''')
[
  [
    "abbey aardvaark",
    "user"
  ],
  [
    "billy billygoat",
    "user"
  ]
]

constructvalue

You can select simple values (strings or numbers) by wrapping an expression in parentheses. For example:

(displayname)
>>> model1.query(
... '''(displayname)''')
[
  "abbey aardvaark",
  "billy billygoat"
]

Property Names and id

Name tokens not used elsewhere in the grammar are treated as a reference to object properties. You can specify properties whose name match reserved keywords or have illegal characters by wrapping the property name with “<” and “>”. For example, <where> or <a property with spaces>.

id is a reserved name that always refers to the id of the object, not a property named “id”. Such a property can written as <id>.

{ 'key' : id, <id>, <a property with spaces>}
>>> from vesper import app
>>> model2 = app.createStore(
... '''[
...   {
...     "a property with spaces": "this property name has spaces",
...     "namemap": {
...       "id": "key"
...     },
...     "key": "1",
...     "id": "a property named id"
...   }
... ]''')

>>> model2.query(
... '''{ 'key' : id, <id>, <a property with spaces>}''')
[
  {
    "id": "a property named id",
    "key": "@1",
    "a property with spaces": "this property name has spaces"
  }
]

Property wildcard (‘*’)

The “*” will expand to all properties defined for the object. For example, this query retrieves all objects in the store:

{*}
>>> model1.query(
... '''{*}''')
[
  {
    "parent": "@post1",
    "type": "comment",
    "id": "comment1",
    "contents": "a comment",
    "author": "@user:2"
  },
  {
    "type": "user",
    "displayname": "abbey aardvaark",
    "email": [
      "abbey@aardvaark.com",
      "abbey_aardvaak@gmail.com"
    ],
    "id": "user:1"
  },
  {
    "type": "post",
    "id": "post1",
    "contents": "a post",
    "author": "@user:1"
  },
  {
    "parent": "@comment1",
    "type": "comment",
    "id": "comment2",
    "contents": "a reply",
    "author": "@user:1"
  },
  {
    "parent": "@comment4",
    "type": "comment",
    "id": "comment3",
    "contents": "different parent",
    "author": "@user:1"
  },
  {
    "type": "user",
    "displayname": "billy billygoat",
    "id": "user:2"
  }
]

“forcelist” syntax

You can use wrap the property value with brackets to force the value of a property to always be a list, even when the value just as one value or is null. If the value is null, an empty list ([]) will be used. For example, compare the results of the following two examples which are identical except for the second one’s use of “forcelist”:

{ id, mixed }
>>> from vesper import app
>>> model3 = app.createStore(
... '''[
...   {
...     "a_list": [
...       "a",
...       "b"
...     ],
...     "id": "1"
...   },
...   {
...     "a_list": "c",
...     "id": "1"
...   },
...   {
...     "mixed": [
...       "a",
...       "b"
...     ],
...     "a_list": null,
...     "id": "1"
...   },
...   {
...     "mixed": "c",
...     "id": "2"
...   },
...   {
...     "mixed": null,
...     "id": "3"
...   }
... ]''')

>>> model3.query(
... '''{ id, mixed }''')
[
  {
    "mixed": [
      "a",
      "b"
    ],
    "id": "1"
  },
  {
    "mixed": null,
    "id": "3"
  },
  {
    "mixed": "c",
    "id": "2"
  }
]
{ id, [mixed] }
>>> model3.query(
... '''{ id, [mixed] }''')
[
  {
    "mixed": [
      "a",
      "b"
    ],
    "id": "1"
  },
  {
    "mixed": [],
    "id": "3"
  },
  {
    "mixed": [
      "c"
    ],
    "id": "2"
  }
]

Null values and optional properties

results will only include objects that contain the property referenced in the construct list, For example, the next example just returns one object because only one has a both a displayname and auth property.

{displayname, email}
>>> model1.query(
... '''{displayname, email}''')
[
  {
    "displayname": "abbey aardvaark",
    "email": [
      "abbey@aardvaark.com",
      "abbey_aardvaak@gmail.com"
    ]
  }
]

If property references are modified “maybe” before them then objects without that property will be included in the result. For example:

{displayname, maybe email}
>>> model1.query(
... '''{displayname, maybe email}''')
[
  {
    "displayname": "abbey aardvaark",
    "email": [
      "abbey@aardvaark.com",
      "abbey_aardvaak@gmail.com"
    ]
  },
  {
    "displayname": "billy billygoat",
    "email": null
  }
]

This query still specifies that “auth” property appears in every object in the result – objects that doesn’t have a “auth” property defined have that property value set to null. If you do not want the property included in that case, you can use the the OMITNULL modifier instead:

{displayname, omitnull maybe email}
>>> model1.query(
... '''{displayname, omitnull maybe email}''')
[
  {
    "displayname": "abbey aardvaark",
    "email": [
      "abbey@aardvaark.com",
      "abbey_aardvaak@gmail.com"
    ]
  },
  {
    "displayname": "billy billygoat"
  }
]

The above examples illustrate using MAYBE and OMITNULL on appreviated properties. Specifically maybe property is an abbreviation for ‘property’ : maybe property and omitnull property is an abbreviation for ‘property’ : omitnull property.

omitnull must appear before the property name and omits the property whenever its value evaluates to null. For example, here’s a silly query that specifies a “nullproperty” property with a constant value but it will never be included in the result because of the “omitnull”.

{displayname, "nullproperty" : omitnull null}
>>> model1.query(
... '''{displayname, "nullproperty" : omitnull null}''')
[
  {
    "displayname": "abbey aardvaark"
  },
  {
    "displayname": "billy billygoat"
  }
]

The “forcelist” syntax can be combined with MAYBE or OMITNULL. For example:

{displayname, [maybe email]}
>>> model1.query(
... '''{displayname, [maybe email]}''')
[
  {
    "displayname": "abbey aardvaark",
    "email": [
      "abbey@aardvaark.com",
      "abbey_aardvaa@gmail.com"
    ]
  },
  {
    "displayname": "billy billygoat",
    "email": []
  }
]

Sub-queries (nested constructs)

The value of a property or array item can be another query instead of an expression. These sub-query can construct objects or arrays (constructobject or a constructarray) – constructvalue queries are not allowed as sub-queries.

If the sub-query doesn’t have a filter associated with it, the sub-query will be evaluated in the context of the parent object. For example:

If the sub-query’s filter has references to the outer query (via labels) the filter will be joined with the outer query and it will be evaluated using the rows from the resulting join. For example:

Otherwise, the sub-query will be evaluated independently for each result of the outer query. For example:

Data Types

A jsonQL implementation supports at least the data types defined by JSON and may support additional data types if the underlying datastore supports them.

The JSON data types are: (unicode) strings, (floating point) numbers, booleans (true and false) and null. Limits such max string length or numeric range and precision and semantics such as numeric overflow behavior are not specified by jsonQL, they will be dependent on the underlying datastore and implementation language. Most database support richer basic basic data types, for example integer, floating point and decimal, the implementation is responsible for appropriate promotion.

The values of JSON data types can be expressed in a query as literals that match the JSON syntax. Datastore-specific data type values can be expressed using datastore-specific query functions which construct or convert its arguments, for example, date functions.

They will be serialized as pjson. If the data type is compatible with JSON type it may converted (for example, from exact precision decimal type to JSON’s floating point number) depending on the fidelity needed. In addition, if a NAMEMAP is specified in the query customize the serialization.

Implicit type conversion, by default, is conversion is lenient [example] but the underlying datastore might be string.

null handling

Unlike sql, null value are treated as distinct values, i.e. “null = null” evaluates to true and “null != null” evaluates to false. Operators and functions generally follow SQL: if one of the operands or arguments is null the result is null.

footnote: Follow SQL for functions and operators: systems that don’t follow these null semantics, generally don’t support functions (most NO-SQL) or don’t support nulls at all (SPARQL). Also, unlike SQL null equality, these semantics is generally intuitive.

Aggregate functions, for example, count() ignores null values.

null < 0 so null go first with order by.

[null=null, null!=null, null=0, null='', 1+null, trim(null), null > 0, null < 0]
>>> from vesper import app
>>> model4 = app.createStore(
... '''[
...   {
...     "id": "1",
...     "value": null
...   },
...   {
...     "id": "2",
...     "value": ""
...   },
...   {
...     "id": "3",
...     "value": true
...   },
...   {
...     "notvalue": "a",
...     "id": "4"
...   }
... ]''')

>>> model4.query(
... '''[null=null, null!=null, null=0, null='', 1+null, trim(null), null > 0, null < 0]''')
[
  [
    true,
    false,
    false,
    false,
    null,
    null,
    false,
    true
  ]
]

pseudo-value types

matches value in the list not the list itself. The data-store may support data types that is serialized as a JSON array, the semantics will not apply. [Example]

order may not be preserved.

Objects without (public) unique identifiers can be treated as value types; they may not be queried. Note the implementation may store these as object and even provide (for example, forUpdate).

Multiple values and lists

  • list construction – multiple values are represented as lists

Note that the actually semantics of inserting pjson depends on the data store it is being inserted into. For example, does inserted a property that already exists on an object might add a new value or replace the current one.

{ id, a_list }
>>> model3.query(
... '''{ id, a_list }''')
[
  {
    "a_list": [
      "a",
      "b",
      "c",
      null
    ],
    "id": "1"
  }
]

object references and anonymous objects

If an object is anonymous it will be expanded, otherwise an object reference object will be output. This behavior can be overridden using the DEPTH directive, which will force object references to be expanded, even if objects are duplicated.

When a top-level (not embeddd) object is added to a data store without an id it is assigned an autogenerated id (cf. pjson docs). Embedded objects without ids are private and can not be referenced. [what about references amongst themselves?] Filters will not match embedded objects unless referenced through a property. [this implies no need to generate a join – but what if the property can have a reference to both public and private – need to double filtering?]

Filtering (the WHERE clause)

The where clause select which objects should appear in the result set.

In addition, if the construct clause references a property whose values are filtered, only those filters will be included in the result.

In other words, results are grouped by the object id.

value = 1 and value = 2 value in (1, 2)

  • property references in construct
  • matching lists
  • matching datatypes

all or nothing queries

Object References and Joins

labels

You can create a reference to an object creating object labels, which look this this syntax: ?identifier.

By declaring the variable

Once an objected labels, you can create joins by referencing that label in an expression.

This is example, value of the contains property will be any object that

{ ?post
     *,
     'comments' : { * where parent = ?post}
     where type = 'post'
     }
>>> model1.query(
... '''
...     { ?post
...     *,
...     'comments' : { * where parent = ?post}
...     where type = 'post'
...     }
... ''')
null

filter sets

When a filter expression is surrounded by braces ({ and }) the filter is applied separately from the rest of the expression, and is evaluated as an object reference to the object that met that criteria. These object references have the same semantics as label references. The object references can optionally be labeled and are typically used to create joins.

Note that a filter expression like {id = ?foo} is logically equivalent to labeling the group ?foo.

{ *
 where type = 'comment' and parent = { type = 'post'}
 }
>>> model1.query(
... '''
... { *
... where type = 'comment' and parent = { type = 'post'}
... }
... ''')
null

joins

maybe expressions (outer joins)

The “MAYBE” operator indicates that the expression it modifies is an optional part of the filter set. MAYBE can modify property references and join conditions; it is an error to modify any other expression. When “maybe” modifies a property reference it indicates that the existence of a property not required. When “maybe” modifies a join condition (an expression that joins two filter sets together) if the condition does not match any objects, any references to the missing objects’ id or properties will replaced with nulls (this is know as an “outer join”).

For example, object don’t

#property reference in filter prop = maybe ?label and ?label.type = ‘type’

#can also appear in the construction: { maybe foo}

{
 prop1, maybe prop2
 }
>>> model1.query(
... '''
... {
... prop1, maybe prop2
... }
... ''')
null

uncorrelated references (cross joins)

the follow() function (recursive joins)

{ ?post
     *,
     'comments' : {?comment * where ?comment in rfollow(?post, parent, true)}
     where type = 'post'
     }
>>> model1.query(
... '''
...     { ?post
...     *,
...     'comments' : {?comment * where ?comment in rfollow(?post, parent, true)}
...     where type = 'post'
...     }
... ''')
[
  {
    "id": "post1",
    "type": "post",
    "contents": "a post",
    "comments": [
      {
        "parent": "@post1",
        "type": "comment",
        "id": "comment1",
        "contents": "a comment",
        "author": "@user:2"
      },
      {
        "parent": "@comment1",
        "type": "comment",
        "id": "comment2",
        "contents": "a reply",
        "author": "@user:1"
      }
    ],
    "author": "@user:1"
  }
]

Expressions

Expressions can be evaluated in two contexts: when they appear inside the where clause and when they appear inside the construction WHAT ABOUT: order by, group by ?

If an expression contains a property reference whose value a list and the expression doesn’t contain any :ref:`aggregate functions’, the expression will be evaluated for each item in that list, resulting in a list. If the expression contains more than one property reference, the expression will be evaluated on each tuple obtained from a cartesian product of the list values, using an order based on the depth-first appearance of the property references.

Operator Precedence

Follows SQL, from highest to lowest.

( ) . unary + / unary - (right) * / % + - < <= > >= = != in not maybe and or

Operators with equal precedence are evaluated from left-to-right, except for the unary operator, which evaluate from right-to-left.

Type coercion

Built-in functions

Sorting the results: ORDER BY

Groupby and aggregate Functions

If a “group by” clause is not specified, the aggregate function will be apply

Built-in aggregate functions

count, min, max, sum, avg follow standard SQL semantics with regard to null handling, total follow the semantics sqllite’s total, described here: http://www.sqlite.org/lang_aggfunc.html

output modifiers

MERGEALL

DEPTH

DEPTH may result in duplicate objects being constructed if there are multiple reference to the same object, including circular references [hmmm... better not choose a arbitrary to number to expand all like DEPTH 1000]. Objects no properties are not serialized as objects, they will remain an object reference.

Note: expand a particular object, use ... or use DEPTH in a nested construct.

LIMIT and OFFSET

LIMIT and OFFSET are applied to the final resultset, after any GROUP BY and ORDER BY operations, but before the MERGEALL operation.

Bind variables

NAMEMAP

The value of a NAMEMAP declaration matches pjson’s namemap and is used both when parsing the query and when serializing the resultset.

The namemap applies to the construct pattern it appears in and in any nested constructs. If a nested construct has a NAMEMAP described, the effective namemap is the merger of this namemap with the effective parent namemap, as specified for pjson.