JMESPath Enhancement Proposals

The JMESPath Enhancement Proposals (JEP) process is used to modify the JMESPath language and specification. There are implementations of JMESPath in over 10 languages, and this process ensures stakeholders and community members have the opportunity to review and provide feedback on a change before it officially becomes part of the specification.

You can see the list of accepted JEPs at:

https://jmespath.github.io/jmespath.jep/

Things that need a JEP

Any functional change that would require an update to the specification requires a JEP.

This includes, but is not limited to:

  • New syntax
  • New functions
  • New semantics

You can review the existing JEPs in this repo to get a sense of the type of changes that require a JEP.

Things that do not need a JEP

Anything that is specific to a JMESPath library does not need a JEP. You should defer to the specific library's contributing guide. This can include additional language specific APIs, extension points (e.g. adding custom functions), configuration options, etc.

Guidelines for proposing new features

First, make sure that the feature has not been previously proposed. If it has, make sure to reference prior proposals and explain why this new proposal should be considered despite similar proposals not being accepted.

Writing a JEP can be a lot of work, so it can help to get initial guidance before going too far. A well thought out, high quality JEP helps its chance of acceptance and helps ensure a productive review process.

Before writing a JEP, you can create an issue for initial high level feedback in order to get a sense of the likelihood of a JEP being accepted. You can also use that issue to gauge interest in the feature.

The JEP Process

  1. Fork this repository.

  2. Copy 0000-jep-template.md to proposals/0000-feature-name.md, where feature-name is a high level descriptive name of the proposal. You don't need to add a JEP number, one will be assigned during the review process.

  3. Fill in all sections of the JEP template. Be mindful of the "Motivation" and "Rationale" sections. These are an important part of driving consensus for a JEP.

  4. Submit a pull request to this repo.

  5. The JEP will be reviewed and feedback will be provided. Proposals often go through several rounds of feedback; this is a normal and expected part of the process.

  6. As you incorporate feedback, do not rebase your commits. This ensures the history and evolution of the proposal remains visible.

  7. The discussions will eventually stabilize to one of several states:

    • The JEP has consensus for both the functionality and the proposed specification and is ready to be accepted.
    • The JEP has consensus for the feature but there is not consensus with the specification.
    • The JEP does not have consensus for the feature.
    • The JEP loses steam and the discussions go stale. This will result in the PR being closed, but is subject to being reopened by anyone that wants to continue working on the JEP.
  8. Once the JEP is approved by the JMESPath core team the pull request will be merged and the JEP will be assigned a number.

  9. The relevant parts of the "Specification" section will be added to the JMESPath specification, and the test cases from the "Test Cases" section of the JEP will be added to the jmespath.test repo.

  10. JMESPath libraries can now implement the accepted JEP.

Tenets of JMESPath

When proposing new features, keep these tenets in mind. Adhering to these tenets gives your proposal a higher likelihood of being accepted:

  • JMESPath is not specific to a particular programming language. Avoid constructs that are difficult to implement in another language.
  • JMESPath strives to have one way to do something.
  • Features are driven from real world use cases.

Nested Expressions

  • JEP: 1
  • Author: Michael Dowling
  • Created: 2013-11-27

Abstract

This document proposes modifying the JMESPath grammar to support arbitrarily nested expressions within multi-select-list and multi-select-hash expressions.

Motivation

The JMESPath grammar currently does not allow arbitrarily nested expressions within multi-select-list and multi-select-hash expressions. This prevents nested branching expressions, nested multi-select-list expressions within other multi expressions, and nested or-expressions within any multi-expression.

By allowing any expression to be nested within a multi-select-list and multi-select-hash expression, we can trim down several grammar rules and provide customers with a much more flexible expression DSL.

Supporting arbitrarily nested expressions within other expressions requires:

  • Updating the grammar to remove non-branched-expr

  • Updating compliance tests to add various permutations of the grammar to ensure implementations are compliant.

  • Updating the JMESPath documentation to reflect the ability to arbitrarily nest expressions.

Nested Expression Examples

Nested branch expressions

Given:

{
    "foo": {
        "baz": [
            {
                "bar": "abc"
            }, {
                "bar": "def"
            }
        ],
        "qux": ["zero"]
    }
}

With: foo.[baz[*].bar, qux[0]]

Result:

[
    [
        "abc",
        "def"
    ],
    "zero"
]
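
As a rough illustration, the expression above corresponds to the following plain-Python traversal. This is a hand-written sketch of the semantics only; it is not how any JMESPath library is implemented, and the variable names are chosen purely for the example.

```python
# Hand-written equivalent of foo.[baz[*].bar, qux[0]] (illustrative only).
data = {
    "foo": {
        "baz": [{"bar": "abc"}, {"bar": "def"}],
        "qux": ["zero"],
    }
}

foo = data["foo"]

# Each element of the multi-select-list is evaluated against `foo`.
result = [
    [item["bar"] for item in foo["baz"]],  # baz[*].bar (a nested projection)
    foo["qux"][0],                         # qux[0]
]

print(result)  # [['abc', 'def'], 'zero']
```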

Nested branch expressions with nested multi-select

Given:

{
    "foo": {
        "baz": [
            {
                "bar": "a",
                "bam": "b",
                "boo": "c"
            }, {
                "bar": "d",
                "bam": "e",
                "boo": "f"
            }
        ],
        "qux": ["zero"]
    }
}

With: foo.[baz[*].[bar, boo], qux[0]]

Result:

[
    [
        [
            "a",
            "c"
        ],
        [
            "d",
            "f"
        ]
    ],
    "zero"
]

Nested or expressions

Given:

{
    "foo": {
        "baz": [
            {
                "bar": "a",
                "bam": "b",
                "boo": "c"
            }, {
                "bar": "d",
                "bam": "e",
                "boo": "f"
            }
        ],
        "qux": ["zero"]
    }
}

With: foo.[baz[*].not_there || baz[*].bar, qux[0]]

Result:

[
    [
        "a",
        "d"
    ],
    "zero"
]

No breaking changes

Because there are no breaking changes from this modification, existing multi-select expressions will still work unchanged:

Given:

{
    "foo": {
        "baz": {
            "abc": 123,
            "bar": 456
        }
    }
}

With: foo.[baz, baz.bar]

Result:

[
    {
        "abc": 123,
        "bar": 456
    },
    456
]

Modified Grammar

The following modified JMESPath grammar supports arbitrarily nested expressions and is specified using ABNF, as described in RFC4234.

expression        = sub-expression / index-expression / or-expression / identifier / "*"
expression        =/ multi-select-list / multi-select-hash
sub-expression    = expression "." expression
or-expression     = expression "||" expression
index-expression  = expression bracket-specifier / bracket-specifier
multi-select-list = "[" ( expression *( "," expression ) ) "]"
multi-select-hash = "{" ( keyval-expr *( "," keyval-expr ) ) "}"
keyval-expr       = identifier ":" expression
bracket-specifier = "[" (number / "*") "]"
number            = [-]1*digit
digit             = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"
identifier        = 1*char
identifier        =/ quote 1*(unescaped-char / escaped-quote) quote
escaped-quote     = escape quote
unescaped-char    = %x30-10FFFF
escape            = %x5C   ; Back slash: \
quote             = %x22   ; Double quote: '"'
char              = %x30-39 / ; 0-9
                    %x41-5A / ; A-Z
                    %x5F /    ; _
                    %x61-7A / ; a-z
                    %x7F-10FFFF

Functions

  • JEP: 3
  • Author: Michael Dowling, James Saryerwinnie
  • Created: 2013-11-27

Abstract

This document proposes modifying the JMESPath grammar to support function expressions.

Motivation

Functions allow users to easily transform and filter data in JMESPath expressions. As JMESPath is currently implemented, functions would be very useful in multi-select-list and multi-select-hash expressions to format the output of an expression to contain data that might not have been in the original JSON input. Combined with filtered expressions, functions would be a powerful mechanism to perform any kind of special comparisons for things like length(), contains(), etc.

Data Types

In order to support functions, a type system is needed. The JSON types are used:

  • number (integers and double-precision floating-point format in JSON)

  • string

  • boolean (true or false)

  • array (an ordered sequence of values)

  • object (an unordered collection of key value pairs)

  • null

Syntax Changes

Functions are defined in the function-expression rule below. A function expression is an expression itself, and is valid any place an expression is allowed.

The grammar will require the following grammar additions:

function-expression = unquoted-string  (
                        no-args  /
                        one-or-more-args )
no-args             = "(" ")"
one-or-more-args    = "(" ( function-arg *( "," function-arg ) ) ")"
function-arg        = expression / number / current-node
current-node        = "@"

expression will need to be updated to add the function-expression production:

expression        = sub-expression / index-expression / or-expression / identifier / "*"
expression        =/ multi-select-list / multi-select-hash
expression        =/ literal / function-expression

A function can accept any number of arguments, and each argument can be an expression. Each function must define a signature that specifies the number and allowed types of its expected arguments. Functions can be variadic.

current-node

The current-node token can be used to represent the current node being evaluated. The current-node token is useful for functions that require the current node being evaluated as an argument. For example, the following expression creates an array containing the total number of elements in the foo object followed by the value of foo["bar"].

foo[].[count(@), bar]

JMESPath assumes that all function arguments operate on the current node unless the argument is a literal or number token. Because of this, an expression such as @.bar would be equivalent to just bar, so the current node is only allowed as a bare expression.

current-node state

At the start of an expression, the value of the current node is the data being evaluated by the JMESPath expression. As an expression is evaluated, the value that the current node represents MUST change to reflect the node currently being evaluated. When in a projection, the current node value MUST be changed to the node currently being evaluated by the projection.

Function Evaluation

Functions are evaluated in applicative order. Each argument must be an expression, and each argument expression must be evaluated before evaluating the function. The function is then called with the evaluated function arguments. The result of the function-expression is the result returned by the function call. If a function-expression is evaluated for a function that does not exist, the JMESPath implementation must indicate to the caller that an unknown-function error occurred. How and when this error is raised is implementation specific, but implementations should indicate to the caller that this specific error occurred.

Functions can either have a specific arity or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific.

Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an invalid-type error occurred.
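
The three error conditions described above can be sketched as a small dispatch routine. This is a minimal illustration, assuming fixed-arity functions only (variadic functions are left out) and a hypothetical registry named FUNCTIONS; the exception class names and the helper json_type are likewise invented for the example and are not part of the proposal.

```python
class JMESPathFunctionError(Exception):
    """Base class for the three error kinds named in this proposal."""

class UnknownFunctionError(JMESPathFunctionError):
    pass

class InvalidArityError(JMESPathFunctionError):
    pass

class InvalidTypeError(JMESPathFunctionError):
    pass

def json_type(value):
    """Map a Python value to the JSON type name used in signatures."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be checked before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

# Hypothetical registry: name -> (allowed types per parameter, implementation).
FUNCTIONS = {
    "abs": ([{"number"}], abs),
    "length": ([{"string", "array", "object"}], len),
}

def call_function(name, args):
    """Call a function with arguments that have already been evaluated."""
    if name not in FUNCTIONS:
        raise UnknownFunctionError(name)
    param_types, implementation = FUNCTIONS[name]
    if len(args) != len(param_types):
        raise InvalidArityError(f"{name} expects {len(param_types)} argument(s)")
    for arg, allowed in zip(args, param_types):
        if json_type(arg) not in allowed:
            raise InvalidTypeError(f"{name} does not accept {json_type(arg)}")
    return implementation(*args)

print(call_function("abs", [-1]))  # 1
```

The checks run in the order unknown-function, invalid-arity, invalid-type; that ordering is a design choice of this sketch, since the proposal leaves the timing of errors to implementations.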

In order to accommodate type constraints, functions are provided to convert types to other types (to_string, to_number) which are defined below. No explicit type conversion happens unless a user specifically uses one of these type conversion functions.

Function expressions are also allowed as the child element of a sub expression. This allows functions to be used with projections, which can enable functions to be applied to every element in a projection. For example, given the input data of ["1", "2", "3", "notanumber", true], the following expression can be used to convert (and filter) all elements to numbers:

search([].to_number(@), ["1", "2", "3", "notanumber", true]) -> [1, 2, 3]

This provides a simple mechanism to explicitly convert types when needed.
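
The convert-and-filter behavior of this projection can be sketched in Python. The sketch assumes that to_number yields null (None) for values it cannot convert, as listed in the to_number section below, and that a projection drops null results; the helper name project and the regular expression are illustrative, not taken from any implementation.

```python
import re

# json-number: optional minus, integer part, optional fraction, optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

def to_number(value):
    """Return a number, or None when the value cannot be converted."""
    if isinstance(value, bool):
        return None  # booleans are not JSON numbers, even though bool subclasses int
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str) and JSON_NUMBER.fullmatch(value):
        # Plain integers stay int; a fraction or exponent produces a float.
        return int(value) if re.fullmatch(r"-?[0-9]+", value) else float(value)
    return None

def project(elements, func):
    """Apply func to each element and drop None results, as a projection does."""
    results = (func(element) for element in elements)
    return [result for result in results if result is not None]

converted = project(["1", "2", "3", "notanumber", True], to_number)
print(converted)  # [1, 2, 3]
```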

Built-in Functions

JMESPath has various built-in functions that operate on different data types, documented below. Each function below has a signature that defines the expected types of the input and the type of the returned output:

return_type function_name(type $argname)
return_type function_name2(type1|type2 $argname)

If a function can accept multiple types for an input value, then the multiple types are separated with |. If the resolved arguments do not match the types specified in the signature, an invalid-type error occurs.

The array type can further constrain the type of its elements when a function needs to enforce homogeneous types. The subtype is written in brackets after array, as array[type]. For example, the function signature below requires its input argument to resolve to an array of numbers:

return_type foo(array[number] $argname)

As a shorthand, the type any is used to indicate that the argument can be of any type (array|object|number|string|boolean|null).
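
A minimal sketch of how such type specifications could be checked is shown below. The function names json_type and matches are hypothetical, and the parser only handles the forms used in this document (plain types, unions separated by |, array[type], and any); it does not attempt unions inside the brackets.

```python
def json_type(value):
    """Map a Python value to the JSON type name used in signatures."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int because bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def matches(value, type_spec):
    """Return True if value satisfies a spec such as 'array|string' or 'array[number]'."""
    for alternative in type_spec.split("|"):
        if alternative == "any":
            return True
        if alternative.startswith("array[") and alternative.endswith("]"):
            element_type = alternative[len("array["):-1]
            if json_type(value) == "array" and all(
                json_type(element) == element_type for element in value
            ):
                return True
        elif json_type(value) == alternative:
            return True
    return False

print(matches([10, 15, 20], "array[number]"))     # True
print(matches([10, False, 20], "array[number]"))  # False
```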

The first function below, abs, is discussed in detail to demonstrate the above points. Subsequent function definitions will not include these details for brevity, but the same rules apply.

NOTE: All string related functions are defined on the basis of Unicode code points; they do not take normalization into account.

abs

number abs(number $value)

Returns the absolute value of the provided argument. The signature indicates that a number is returned, and that the input argument $value must resolve to a number, otherwise an invalid-type error is triggered.

Below is a worked example. Given:

{"foo": -1, "bar": "2"}

Evaluating abs(foo) works as follows:

  1. Evaluate the input argument against the current data:

search(foo, {"foo": -1, "bar": "2"}) -> -1

  2. Validate the type of the resolved argument. In this case -1 is of type number so it passes the type check.

  3. Call the function with the resolved argument:

abs(-1) -> 1

  4. The value of 1 is the resolved value of the function expression abs(foo).

Below are the same steps for evaluating abs(bar):

  1. Evaluate the input argument against the current data:

search(bar, {"foo": -1, "bar": "2"}) -> "2"

  2. Validate the type of the resolved argument. In this case "2" is of type string, so we immediately indicate that an invalid-type error occurred.

As a final example, here are the steps for evaluating abs(to_number(bar)):

  1. Evaluate the input argument against the current data:

search(to_number(bar), {"foo": -1, "bar": "2"})

  2. In order to evaluate the above expression, we need to evaluate to_number(bar):

search(bar, {"foo": -1, "bar": "2"}) -> "2"
# Validate "2" passes the type check for to_number, which it does.
to_number("2") -> 2

  3. Now we can evaluate the original expression:

search(to_number(bar), {"foo": -1, "bar": "2"}) -> 2

  4. Call the function with the final resolved value:

abs(2) -> 2

  5. The value of 2 is the resolved value of the function expression abs(to_number(bar)).

Examples

| Expression | Result |
|---|---|
| abs(1) | 1 |
| abs(-1) | 1 |
| abs(`abc`) | <error: invalid-type> |

avg

number avg(array[number] $elements)

Returns the average of the elements in the provided array.

An empty array will produce a return value of null.

Examples

| Given | Expression | Result |
|---|---|---|
| [10, 15, 20] | avg(@) | 15 |
| [10, false, 20] | avg(@) | <error: invalid-type> |
| [false] | avg(@) | <error: invalid-type> |
| false | avg(@) | <error: invalid-type> |
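
The behavior in the table, together with the empty-array rule, can be sketched as follows. Python's TypeError stands in for the invalid-type error here; that mapping is an assumption of the sketch, not something the proposal prescribes.

```python
def avg(elements):
    """Average of an array of numbers; None (null) for an empty array."""
    if not isinstance(elements, list):
        raise TypeError("invalid-type: expected an array")
    for element in elements:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(element, bool) or not isinstance(element, (int, float)):
            raise TypeError("invalid-type: expected array[number]")
    if not elements:
        return None
    return sum(elements) / len(elements)

print(avg([10, 15, 20]))  # 15.0
print(avg([]))            # None
```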

ceil

number ceil(number $value)

Returns the next highest integer value by rounding up if necessary.

Examples

| Expression | Result |
|---|---|
| ceil(`1.001`) | 2 |
| ceil(`1.9`) | 2 |
| ceil(`1`) | 1 |
| ceil(`abc`) | null |

contains

boolean contains(array|string $subject, array|object|string|number|boolean $search)

Returns true if the given $subject contains the provided $search string.

If $subject is an array, this function returns true if one of the elements in the array is equal to the provided $search value.

If the provided $subject is a string, this function returns true if the string contains the provided $search argument.

Examples

| Given | Expression | Result |
|---|---|---|
| n/a | contains(`foobar`, `foo`) | true |
| n/a | contains(`foobar`, `not`) | false |
| n/a | contains(`foobar`, `bar`) | true |
| n/a | contains(`false`, `bar`) | <error: invalid-type> |
| n/a | contains(`foobar`, 123) | false |
| ["a", "b"] | contains(@, `a`) | true |
| ["a"] | contains(@, `a`) | true |
| ["a"] | contains(@, `b`) | false |
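
A short Python sketch of these rules follows. TypeError stands in for the invalid-type error, and the helper _json_equal is hypothetical; it only guards against Python treating true and 1 as equal at the top level, so nested values are compared with plain Python equality.

```python
def _json_equal(left, right):
    # bool is a subclass of int in Python, so keep true/false distinct from 1/0.
    if isinstance(left, bool) or isinstance(right, bool):
        return isinstance(left, bool) and isinstance(right, bool) and left == right
    return left == right

def contains(subject, search):
    """Substring test for strings, element-equality test for arrays."""
    if isinstance(subject, str):
        # A non-string search value can never be a substring.
        return isinstance(search, str) and search in subject
    if isinstance(subject, list):
        return any(_json_equal(element, search) for element in subject)
    raise TypeError("invalid-type: $subject must be an array or a string")

print(contains("foobar", "foo"))  # True
print(contains(["a"], "b"))       # False
```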

floor

number floor(number $value)

Returns the next lowest integer value by rounding down if necessary.

Examples

| Expression | Result |
|---|---|
| floor(`1.001`) | 1 |
| floor(`1.9`) | 1 |
| floor(`1`) | 1 |

join

string join(string $glue, array[string] $stringsarray)

Returns all of the elements from the provided $stringsarray array joined together using the $glue argument as a separator between each.

Examples

| Given | Expression | Result |
|---|---|---|
| ["a", "b"] | join(`, `, @) | "a, b" |
| ["a", "b"] | join(``, @) | "ab" |
| ["a", false, "b"] | join(`, `, @) | <error: invalid-type> |
| [false] | join(`, `, @) | <error: invalid-type> |

keys

array keys(object $obj)

Returns an array containing the keys of the provided object.

Examples

| Given | Expression | Result |
|---|---|---|
| {"foo": "baz", "bar": "bam"} | keys(@) | ["foo", "bar"] |
| {} | keys(@) | [] |
| false | keys(@) | <error: invalid-type> |
| [b, a, c] | keys(@) | <error: invalid-type> |

length

number length(string|array|object $subject)

Returns the length of the given argument using the following type rules:

  1. string: returns the number of code points in the string

  2. array: returns the number of elements in the array

  3. object: returns the number of key-value pairs in the object

Examples

| Given | Expression | Result |
|---|---|---|
| n/a | length(`abc`) | 3 |
| "current" | length(@) | 7 |
| "current" | length(not_there) | <error: invalid-type> |
| ["a", "b", "c"] | length(@) | 3 |
| [] | length(@) | 0 |
| {} | length(@) | 0 |
| {"foo": "bar", "baz": "bam"} | length(@) | 2 |
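
In Python, len() on a str already counts code points, so the three rules map onto a single call. The sketch below uses TypeError as a stand-in for the invalid-type error; a missing key such as not_there is assumed to resolve to null (None) before the function is called.

```python
def length(subject):
    """Code points of a string, elements of an array, or pairs of an object."""
    if isinstance(subject, (str, list, dict)):
        return len(subject)
    raise TypeError("invalid-type: expected string, array or object")

print(length("current"))                     # 7
print(length({"foo": "bar", "baz": "bam"}))  # 2

# U+1D11E is one code point even though UTF-16 needs two code units for it.
print(length("\U0001D11E"))                  # 1
```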

max

number max(array[number] $collection)

Returns the highest found number in the provided array argument.

An empty array will produce a return value of null.

Examples

| Given | Expression | Result |
|---|---|---|
| [10, 15] | max(@) | 15 |
| [10, false, 20] | max(@) | <error: invalid-type> |

min

number min(array[number] $collection)

Returns the lowest found number in the provided $collection argument.

Examples

| Given | Expression | Result |
|---|---|---|
| [10, 15] | min(@) | 10 |
| [10, false, 20] | min(@) | <error: invalid-type> |

sort

array sort(array $list)

This function accepts an array $list argument and returns the sorted elements of the $list as an array.

The array must be a list of strings or numbers. Sorting strings is based on code points. Locale is not taken into account.

Examples

| Given | Expression | Result |
|---|---|---|
| [b, a, c] | sort(@) | [a, b, c] |
| [1, a, c] | sort(@) | [1, a, c] |
| [false, [], null] | sort(@) | [[], null, false] |
| [[], {}, false] | sort(@) | [{}, [], false] |
| {"a": 1, "b": 2} | sort(@) | null |
| false | sort(@) | null |

to_string

string to_string(string|number|array|object|boolean $arg)
  • string - Returns the passed in value.

  • number/array/object/boolean - The JSON encoded value of the object. The JSON encoder should emit the encoded JSON value without adding any additional new lines.

Examples

| Given | Expression | Result |
|---|---|---|
| null | to_string(`2`) | "2" |
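
One way to sketch to_string in Python is with the standard json module. The compact separators are an assumption of this example: the text above only requires that no additional new lines are emitted and does not say anything about spaces.

```python
import json

def to_string(value):
    """Return strings unchanged; JSON-encode every other value on one line."""
    if isinstance(value, str):
        return value
    # separators=(",", ":") removes the spaces json.dumps inserts by default.
    return json.dumps(value, separators=(",", ":"))

print(to_string(2))                    # 2
print(to_string({"a": [1, True]}))     # {"a":[1,true]}
```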

to_number

number to_number(string|number $arg)
  • string - Returns the parsed number. Any string that conforms to the json-number production is supported.

  • number - Returns the passed in value.

  • array - null

  • object - null

  • boolean - null

type

string type(array|object|string|number|boolean|null $subject)

Returns the JavaScript type of the given $subject argument as a string value.

The return value MUST be one of the following:

  • number
  • string
  • boolean
  • array
  • object
  • null

Examples

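| Given | Expression | Result |
|---|---|---|
| "foo" | type(@) | "string" |
| true | type(@) | "boolean" |
| false | type(@) | "boolean" |
| null | type(@) | "null" |
| 123 | type(@) | number |
| 123.05 | type(@) | number |
| ["abc"] | type(@) | "array" |
| {"abc": "123"} | type(@) | "object" |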

values

array values(object $obj)

Returns the values of the provided object.

Examples

| Given | Expression | Result |
|---|---|---|
| {"foo": "baz", "bar": "bam"} | values(@) | ["baz", "bam"] |
| ["a", "b"] | values(@) | <error: invalid-type> |
| false | values(@) | <error: invalid-type> |

Compliance Tests

A functions.json will be added to the compliance test suite. The test suite will add the following new error types:

  • unknown-function
  • invalid-arity
  • invalid-type

The compliance tests do not specify when the errors are raised, as this will depend on implementation details. For an implementation to be compliant, it needs to indicate that an error occurred while attempting to evaluate the JMESPath expression.

History

  • This JEP originally proposed the literal syntax. The literal portion of this JEP was removed and added instead to JEP 7.

  • This JEP originally specified that type mismatches should return null. This has been updated to specify that an invalid-type error should occur instead.

Pipe Expressions

  • JEP: 4
  • Author: Michael Dowling
  • Created: 2013-12-07

Abstract

This document proposes adding support for piping expressions into subsequent expressions.

Motivation

The current JMESPath grammar allows for projections at various points in an expression. However, it is not currently possible to operate on the result of a projection as a list.

The following example illustrates that it is not possible to operate on the result of a projection (e.g., take the first match of a projection).

Given:

{
    "foo": {
        "a": {
            "bar": [1, 2, 3]
        },
        "b": {
            "bar": [4, 5, 6]
        }
    }
}

Expression:

foo.*.bar[0]

The result would be element 0 of each bar:

[1, 4]

With the addition of pipe expressions, we could pass the result of one expression to another, operating on the result of a projection (or any expression).

Expression:

foo.*.bar | [0]

Result:

[1, 2, 3]

Not only does this give us the ability to operate on the result of a projection, but pipe expressions can also be useful for breaking down a complex expression into smaller, easier to comprehend, parts.
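
The difference between the two expressions can be sketched in plain Python. This is a hand-written illustration of the semantics described above, not the code of any JMESPath implementation; the variable names are made up for the example, and it relies on Python dicts preserving insertion order.

```python
data = {
    "foo": {
        "a": {"bar": [1, 2, 3]},
        "b": {"bar": [4, 5, 6]},
    }
}

# foo.*.bar : project "bar" over every value of the "foo" object.
projection = [value["bar"] for value in data["foo"].values()]

# foo.*.bar[0] : the index is applied to each projected element.
per_element = [bar[0] for bar in projection]

# foo.*.bar | [0] : the pipe ends the projection, so the index is
# applied once to the whole projected list.
piped = projection[0]

print(per_element)  # [1, 4]
print(piped)        # [1, 2, 3]
```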

Modified Grammar

The following modified JMESPath grammar supports piped expressions.

expression        = sub-expression / index-expression / or-expression / identifier / "*"
expression        =/ multi-select-list / multi-select-hash / pipe-expression
sub-expression    = expression "." expression
pipe-expression   = expression "|" expression
or-expression     = expression "||" expression
index-expression  = expression bracket-specifier / bracket-specifier
multi-select-list = "[" ( expression *( "," expression ) ) "]"
multi-select-hash = "{" ( keyval-expr *( "," keyval-expr ) ) "}"
keyval-expr       = identifier ":" expression
bracket-specifier = "[" (number / "*") "]" / "[]"
number            = [-]1*digit
digit             = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"
identifier        = 1*char
identifier        =/ quote 1*(unescaped-char / escaped-quote) quote
escaped-quote     = escape quote
unescaped-char    = %x30-10FFFF
escape            = %x5C   ; Back slash: \
quote             = %x22   ; Double quote: '"'
char              = %x30-39 / ; 0-9
                    %x41-5A / ; A-Z
                    %x5F /    ; _
                    %x61-7A / ; a-z
                    %x7F-10FFFF

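NOTE: pipe-expression has a lower precedence than or-expression, so it binds less tightly: foo | not_there || bar is parsed as foo | (not_there || bar), as the compliance tests below require.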

Compliance Tests

[{
  "given": {
    "foo": {
      "bar": {
        "baz": "one"
      },
      "other": {
        "baz": "two"
      },
      "other2": {
        "baz": "three"
      },
      "other3": {
        "notbaz": ["a", "b", "c"]
      },
      "other4": {
        "notbaz": ["d", "e", "f"]
      }
    }
  },
  "cases": [
    {
      "expression": "foo.*.baz | [0]",
      "result": "one"
    },
    {
      "expression": "foo.*.baz | [1]",
      "result": "two"
    },
    {
      "expression": "foo.*.baz | [2]",
      "result": "three"
    },
    {
      "expression": "foo.bar.* | [0]",
      "result": "one"
    },
    {
      "expression": "foo.*.notbaz | [*]",
      "result": [["a", "b", "c"], ["d", "e", "f"]]
    },
    {
      "expression": "foo | bar",
      "result": {"baz": "one"}
    },
    {
      "expression": "foo | bar | baz",
      "result": "one"
    },
    {
      "expression": "foo|bar| baz",
      "result": "one"
    },
    {
      "expression": "not_there | [0]",
      "result": null
    },
    {
      "expression": "not_there | [0]",
      "result": null
    },
    {
      "expression": "[foo.bar, foo.other] | [0]",
      "result": {"baz": "one"}
    },
    {
      "expression": "{\"a\": foo.bar, \"b\": foo.other} | a",
      "result": {"baz": "one"}
    },
    {
      "expression": "{\"a\": foo.bar, \"b\": foo.other} | b",
      "result": {"baz": "two"}
    },
    {
      "expression": "{\"a\": foo.bar, \"b\": foo.other} | *.baz",
      "result": ["one", "two"]
    },
    {
      "expression": "foo.bam || foo.bar | baz",
      "result": "one"
    },
    {
      "expression": "foo | not_there || bar",
      "result": {"baz": "one"}
    }
  ]
}]

Array Slice Expressions

  • JEP: 5
  • Author: Michael Dowling
  • Created: 2013-12-08

Abstract

This document proposes modifying the JMESPath grammar to support array slicing for accessing specific portions of an array.

Motivation

The current JMESPath grammar does not allow plucking out specific portions of an array.

The following examples are possible with array slicing notation utilizing an optional start position, optional stop position, and optional step that can be less than or greater than 0:

  1. Extracting every N indices (e.g., only even [::2], only odd [1::2], etc)

  2. Extracting only elements after a given start position: [2:]

  3. Extracting only elements before a given stop position: [:5]

  4. Extracting elements between a given start and end position: [2:5]

  5. Only the last 5 elements: [-5:]

  6. The last five elements in reverse order: [:-6:-1]

  7. Reversing the order of an array: [::-1]

Syntax

This syntax introduces Python-style array slicing that allows a start position, stop position, and step. This syntax also proposes following the same semantics as Python slices.

[start:stop:step]

Each part of the expression is optional. You can omit the start position, stop position, or step. No more than three values can be provided in a slice expression.

The step value determines how many indices to skip after each element is plucked from the array. A step of 1 (the default step) will not skip any indices. A step value of 2 will skip every other index while plucking values from an array. A step value of -1 will extract values in reverse order from the array. A step value of -2 will extract values in reverse order from the array while skipping every other index.

Slice expressions adhere to the following rules:

  1. If a negative start position is given, it is calculated as the total length of the array plus the given start position.

  2. If no start position is given, it is assumed to be 0 if the given step is greater than 0 or the end of the array if the given step is less than 0.

  3. If a negative stop position is given, it is calculated as the total length of the array plus the given stop position.

  4. If no stop position is given, it is assumed to be the length of the array if the given step is greater than 0 or 0 if the given step is less than 0.

  5. If the given step is omitted, it is assumed to be 1.

  6. If the given step is 0, an error must be raised.

  7. If the element being sliced is not an array, the result must be null.

  8. If the element being sliced is an array and yields no results, the result must be an empty array.
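
Because the proposal adopts Python's slice semantics, the behavior can be sketched directly with Python's built-in slicing. The function name slice_array is hypothetical, ValueError stands in for the required error, and the order of the step check and the array check is an assumption, since the rules above do not specify it.

```python
def slice_array(value, start=None, stop=None, step=None):
    """Apply [start:stop:step] to value using Python's own slice semantics."""
    if step == 0:
        raise ValueError("slice step cannot be 0")
    if not isinstance(value, list):
        return None  # slicing a non-array yields null
    # Omitted parts are passed as None, which Python treats as "use the default".
    return value[start:stop:step]

items = list(range(10))

print(slice_array(items, step=2))           # [0, 2, 4, 6, 8]
print(slice_array(items, 2, 5))             # [2, 3, 4]
print(slice_array(items, -5))               # [5, 6, 7, 8, 9]
print(slice_array(items, step=-1))          # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(slice_array(items, 20))               # []
print(slice_array({"not": "an array"}, 1))  # None
```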

Modified Grammar

The following modified JMESPath grammar supports array slicing.

expression        = sub-expression / index-expression / or-expression / identifier / "*"
expression        =/ multi-select-list / multi-select-hash
sub-expression    = expression "." expression
or-expression     = expression "||" expression
index-expression  = expression bracket-specifier / bracket-specifier
multi-select-list = "[" ( expression *( "," expression ) ) "]"
multi-select-hash = "{" ( keyval-expr *( "," keyval-expr ) ) "}"
keyval-expr       = identifier ":" expression
bracket-specifier = "[" (number / "*" / slice-expression) "]" / "[]"
slice-expression  = ":"
slice-expression  =/ number ":" number ":" number
slice-expression  =/ number ":"
slice-expression  =/ number ":" ":" number
slice-expression  =/ ":" number
slice-expression  =/ ":" number ":" number
slice-expression  =/ ":" ":" number
number            = [-]1*digit
digit             = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"
identifier        = 1*char
identifier        =/ quote 1*(unescaped-char / escaped-quote) quote
escaped-quote     = escape quote
unescaped-char    = %x30-10FFFF
escape            = %x5C   ; Back slash: \
quote             = %x22   ; Double quote: '"'
char              = %x30-39 / ; 0-9
                    %x41-5A / ; A-Z
                    %x5F /    ; _
                    %x61-7A / ; a-z
                    %x7F-10FFFF

Improved Identifiers

  • JEP: 6
  • Author: James Saryerwinnie
  • Created: 2013-12-14

Abstract

This JEP proposes grammar modifications to JMESPath in order to improve identifiers used in JMESPath. In doing so, several inconsistencies in the identifier grammar rules will be fixed, along with an improved grammar for specifying unicode identifiers in a way that is consistent with JSON strings.

Motivation

There are two ways to currently specify an identifier, the unquoted rule:

identifier        = 1*char

and the quoted rule:

identifier        =/ quote 1*(unescaped-char / escaped-quote) quote

The char rule contains a set of characters that do not have to be quoted:

char              = %x30-39 / ; 0-9
                    %x41-5A / ; A-Z
                    %x5F /    ; _
                    %x61-7A / ; a-z
                    %x7F-10FFFF

There is an ambiguity between the %x30-39 rule and the number rule:

number            = ["-"]1*digit
digit             = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"

It’s ambiguous which rule to use. Given a string “123”, it’s not clear whether this should be parsed as an identifier or a number. Existing implementations aren’t following this rule (because it’s ambiguous), so the grammar should be updated to remove the ambiguity. Specifically, an unquoted identifier can only start with the characters [a-zA-Z_].

Unicode

JMESPath supports unicode through the char and unescaped-char rules:

unescaped-char    = %x30-10FFFF
char              = %x30-39 / ; 0-9
                    %x41-5A / ; A-Z
                    %x5F /    ; _
                    %x61-7A / ; a-z
                    %x7F-10FFFF

However, JSON supports a syntax for escaping unicode characters. Any character in the Basic Multilingual Plane (BMP) can be escaped with:

char = escape (%x75 4HEXDIG )  ; \uXXXX

Similar to the way that XPath supports numeric character references used in XML (&#nnnn), JMESPath should support the same escape sequences used in JSON. JSON also supports a 12 character escape sequence for characters outside of the BMP, by encoding the UTF-16 surrogate pair. For example, the code point U+1D11E can be represented as "\uD834\uDD1E".
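
The surrogate-pair encoding can be checked with Python's standard json module, which is used here only as a convenient JSON string decoder and says nothing about how a JMESPath parser would be written.

```python
import json

# A raw string keeps the backslashes, so the JSON text really contains the
# 12-character escape sequence: backslash, u, D834, backslash, u, DD1E.
json_text = r'"\uD834\uDD1E"'

decoded = json.loads(json_text)

print(len(json_text) - 2)  # 12 (escape sequence length without the quotes)
print(len(decoded))        # 1  (a single code point)
print(hex(ord(decoded)))   # 0x1d11e
```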

Escape Sequences

Consider the following JSON object:

{"foo\nbar": "baz"}

A JMESPath expression should be able to retrieve the value of baz. With the current grammar, one must rely on the environment’s ability to input control characters such as the newline (%x0A). This can be problematic in certain environments. For example, in Python, this is not a problem:

>>> jmespath_expression = "foo\nbar"

Python will interpret the sequence "\n" (%x5C %x6E) as the newline character %x0A. However, consider Bash:

$ foo --jmespath-expression "foo\nbar"

In this situation, Bash will not interpret the "\n" (%x5C %x6E) sequence, so the expression contains a literal backslash followed by n rather than a newline.

Specification

The char rule contains a set of characters that do not have to be quoted. The new set of characters that do not have to be quoted will be:

unquoted-string   = (%x41-5A / %x61-7A / %x5F) *(%x30-39 / %x41-5A / %x5F / %x61-7A)

In order for an identifier to not be quoted, it must start with [A-Za-z_], then must be followed by zero or more [0-9A-Za-z_].
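The unquoted-string rule maps directly onto a regular expression. The Python sketch below is one way a tool might decide whether a key has to be written as a quoted identifier; the name needs_quoting is hypothetical and only illustrates the rule.

```python
import re

# Mirrors the unquoted-string rule: [A-Za-z_] followed by zero or more
# [0-9A-Za-z_]. The classes are spelled out because \w would also match
# non-ASCII letters and digits.
UNQUOTED_STRING = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")


def needs_quoting(identifier: str) -> bool:
    """Return True if the identifier must be written as a quoted-string."""
    return UNQUOTED_STRING.fullmatch(identifier) is None


for name in ["foo", "_foo1", "1", "-1", "foo-bar"]:
    print(name, needs_quoting(name))
```

Under this rule, foo and _foo1 can stay unquoted, while 1, -1 and foo-bar have to be quoted.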

The quoted-string rule is updated to account for all JSON supported escape sequences:

quoted-string     =/ quote 1*(unescaped-char / escaped-char) quote

The full rule for an identifier is:

identifier        = unquoted-string / quoted-string
unquoted-string   = (%x41-5A / %x61-7A / %x5F) *(  ; a-zA-Z_
                        %x30-39  /  ; 0-9
                        %x41-5A /  ; A-Z
                        %x5F    /  ; _
                        %x61-7A)   ; a-z
quoted-string     = quote 1*(unescaped-char / escaped-char) quote
unescaped-char    = %x20-21 / %x23-5B / %x5D-10FFFF
escape            = %x5C   ; Back slash: \
quote             = %x22   ; Double quote: '"'
escaped-char      = escape (
                        %x22 /          ; "    quotation mark  U+0022
                        %x5C /          ; \    reverse solidus U+005C
                        %x2F /          ; /    solidus         U+002F
                        %x62 /          ; b    backspace       U+0008
                        %x66 /          ; f    form feed       U+000C
                        %x6E /          ; n    line feed       U+000A
                        %x72 /          ; r    carriage return U+000D
                        %x74 /          ; t    tab             U+0009
                        %x75 4HEXDIG )  ; uXXXX                U+XXXX

Rationale

Adopting the same string rules as JSON strings will allow users familiar with JSON semantics to understand how JMESPath identifiers will work.

This change also provides a nice consistency for the literal syntax proposed in JEP 3. With this model, the supported literal strings can be the same as quoted identifiers.

This also will allow the grammar to grow in a consistent way if JMESPath adds support for filtering based on literal values. For example (note that this is just a suggested syntax, not a formal proposal), given the data:

{"foo": [{"✓": "✓"}, {"✓": "✗"}]}

You can now have the following JMESPath expressions:

foo[?"✓" = `✓`]
foo[?"\u2713" = `\u2713`]

As a general property, any supported JSON string is now a supported quoted identifier.
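Because a quoted identifier follows the JSON string rules, one possible way to decode it is to hand the token to an existing JSON parser. The following Python sketch assumes that approach; decode_quoted_identifier is a hypothetical helper, and the extra length check reflects that quoted-string requires at least one character between the quotes, whereas JSON also allows the empty string.

```python
import json


def decode_quoted_identifier(token: str) -> str:
    """Decode a quoted-string token (including its double quotes) into a key name."""
    if len(token) < 3 or token[0] != '"' or token[-1] != '"':
        # quoted-string needs both quotes and at least one character inside.
        raise ValueError("not a quoted identifier: %r" % token)
    # The JSON parser applies the escape rules (\n, \t, \uXXXX, ...).
    return json.loads(token)


print(decode_quoted_identifier('"\\u2713"'))          # ✓
print(repr(decode_quoted_identifier('"foo\\nbar"')))  # 'foo\nbar'
```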

Impact

For any implementation that was parsing digits as an identifier, identifiers starting with digits will no longer be valid, e.g. foo.0.1.2.

There are several compliance tests that will have to be updated as a result of this JEP. They were arguably wrong to begin with.

basic.json

The following needs to be changed because identifiers starting with a number must now be quoted:

-            "expression": "foo.1",
+            "expression": "foo.\"1\"",
             "result": ["one", "two", "three"]
          },
          {
-            "expression": "foo.1[0]",
+            "expression": "foo.\"1\"[0]",
             "result": "one"
          },

Similarly, the following needs to be changed because an unquoted identifier cannot start with -:

-            "expression": "foo.-1",
+            "expression": "foo.\"-1\"",
             "result": "bar"
          }

escape.json

The escape.json file has several more interesting cases that need to be updated because of the new escaping rules. Each one is explained below.

-            "expression": "\"foo\nbar\"",
+            "expression": "\"foo\\nbar\"",
             "result": "newline"
          },

This has to be updated because a JSON parser will interpret the \n sequence as the newline character. The newline character is not allowed in a JMESPath identifier (note that the newline character %x0A is not in any rule). In order for a JSON parser to create a sequence of %x5C %x6E, the JSON string must be \\n (%x5C %x5C %x6E).

-            "expression": "\"c:\\\\windows\\path\"",
+            "expression": "\"c:\\\\\\\\windows\\\\path\"",
             "result": "windows"
          },

The above example is a more pathological case of escaping. In this example, we have a string that represents a Windows path, “c:\\windows\path”. There are two levels of escaping happening here, one at the JSON parser and one at the JMESPath parser. The JSON parser takes the sequence "\"c:\\\\\\\\windows\\\\path\"" and creates the string "c:\\\\windows\\path" (surrounding double quotes included). The JMESPath parser takes that string and, applying its own escaping rules, looks for a key named c:\\windows\path.
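The two levels of escaping can be reproduced with two passes of a JSON parser. This Python sketch assumes that the second pass, which stands in for the JMESPath lexer, can be delegated to json.loads because quoted identifiers follow the JSON string rules.

```python
import json

# Text of the "expression" value as it appears in escape.json. A raw string
# is used so that Python itself performs no escaping here.
json_text = r'"\"c:\\\\\\\\windows\\\\path\""'

# Level 1: the JSON parser reading the test file yields the JMESPath expression.
expression = json.loads(json_text)

# Level 2: the quoted identifier is unescaped to obtain the key name
# (assumption: a JSON parser stands in for the JMESPath lexer).
key = json.loads(expression)

print(expression)  # "c:\\\\windows\\path"
print(key)         # c:\\windows\path
```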

Filter Expressions

  • JEP: 7
  • Author: James Saryerwinnie
  • Created: 2013-12-16

Abstract

This JEP proposes grammar modifications to JMESPath to allow for filter expressions. A filtered expression allows list elements to be selected based on matching expressions. A literal expression is also introduced (from JEP 3) so that it is possible to match elements against literal values.

Motivation

A common request when querying JSON objects is the ability to select elements based on a specific value. For example, given a JSON object:

{"foo": [{"state": "WA", "value": 1},
         {"state": "WA", "value": 2},
         {"state": "CA", "value": 3},
         {"state": "CA", "value": 4}]}

A user may want to select all objects in the foo list that have a state key of WA. There is currently no way to do this in JMESPath. This JEP will introduce a syntax that allows this:

foo[?state == `WA`]

Additionally, a user may want to project additional expressions onto the values matched from a filter expression. For example, given the data above, select the value key from all objects that have a state of WA:

foo[?state == `WA`].value

would return [1, 2].

Specification

The updated grammar for filter expressions:

bracket-specifier      = "[" (number / "*") "]" / "[]"
bracket-specifier      =/ "[?" list-filter-expression "]"
list-filter-expression = expression comparator expression
comparator             = "<" / "<=" / "==" / ">=" / ">" / "!="
expression             =/ literal
literal                = "`" json-value "`"
literal                =/ "`" 1*(unescaped-literal / escaped-literal) "`"
unescaped-literal      = %x20-21 /       ; space !
                            %x23-5A /   ; # - [
                            %x5D-5F /   ; ] ^ _
                            %x61-7A /   ; a-z
                            %x7C-10FFFF ; |}~ ...
escaped-literal        = escaped-char / (escape %x60)

The json-value rule is any valid json value. While it’s recommended that implementations use an existing JSON parser to parse the json-value, the grammar is added below for completeness:

json-value = "false" / "null" / "true" / json-object / json-array /
             json-number / json-quoted-string
json-quoted-string = %x22 1*(unescaped-literal / escaped-literal) %x22
begin-array     = ws %x5B ws  ; [ left square bracket
begin-object    = ws %x7B ws  ; { left curly bracket
end-array       = ws %x5D ws  ; ] right square bracket
end-object      = ws %x7D ws  ; } right curly bracket
name-separator  = ws %x3A ws  ; : colon
value-separator = ws %x2C ws  ; , comma
ws              = *(%x20 /              ; Space
                    %x09 /              ; Horizontal tab
                    %x0A /              ; Line feed or New line
                    %x0D                ; Carriage return
                   )
json-object = begin-object [ member *( value-separator member ) ] end-object
member = quoted-string name-separator json-value
json-array = begin-array [ json-value *( value-separator json-value ) ] end-array
json-number = [ minus ] int [ frac ] [ exp ]
decimal-point = %x2E       ; .
digit1-9 = %x31-39         ; 1-9
e = %x65 / %x45            ; e E
exp = e [ minus / plus ] 1*DIGIT
frac = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
minus = %x2D               ; -
plus = %x2B                ; +
zero = %x30                ; 0

Comparison Operators

The following operations are supported:

  • ==, tests for equality.

  • !=, tests for inequality.

  • <, less than.

  • <=, less than or equal to.

  • >, greater than.

  • >=, greater than or equal to.

The behavior of each operation is dependent on the type of each evaluated expression.

The comparison semantics for each operator are defined below based on the corresponding JSON type:

Equality Operators

For string/number/true/false/null types, equality is an exact match. A string is equal to another string if they have the exact same sequence of code points. The literal values true/false/null are only equal to their own literal values. Two JSON objects are equal if they have the same set of keys with equal values (for each key in the first JSON object there exists a key with an equal value in the second JSON object, and vice versa). Two JSON arrays are equal if they have equal elements in the same order (given two arrays x and y, for each i in x, x[i] == y[i]).
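The equality rules can be sketched as a recursive comparison over parsed JSON values. The Python function below is illustrative only: jmespath_equal is a hypothetical name, and the sketch assumes that values of different JSON types never compare as equal and that equal arrays have equal length.

```python
def jmespath_equal(x, y) -> bool:
    """Compare two parsed JSON values using the equality rules described above."""
    # bool is a subclass of int in Python, so True == 1 must be ruled out.
    if isinstance(x, bool) or isinstance(y, bool):
        return isinstance(x, bool) and isinstance(y, bool) and x == y
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return x == y
    if isinstance(x, str) and isinstance(y, str):
        return x == y  # same sequence of code points
    if x is None or y is None:
        return x is None and y is None
    if isinstance(x, list) and isinstance(y, list):
        return len(x) == len(y) and all(
            jmespath_equal(a, b) for a, b in zip(x, y))
    if isinstance(x, dict) and isinstance(y, dict):
        return x.keys() == y.keys() and all(
            jmespath_equal(x[k], y[k]) for k in x)
    return False  # assumption: different JSON types are never equal


print(jmespath_equal({"a": 1, "b": [1]}, {"b": [1], "a": 1}))  # True
print(jmespath_equal([1, 2], [2, 1]))                          # False
print(jmespath_equal(1, True))                                 # False
```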

Ordering Operators

Ordering operators >, >=, <, <= are only valid for numbers. Evaluating any other type with a comparison operator will yield a null value, which will result in the element being excluded from the result list. For example, given:

search('foo[?a<b]', {"foo": [{"a": "char", "b": "char"},
                             {"a": 2, "b": 1},
                             {"a": 1, "b": 2}]})

The three elements in the foo list are evaluated against a < b. The first element resolves to the comparison "char" < "char", and because these types are strings, the expression results in null, so the first element is not included in the result list. The second element resolves to 2 < 1, which is false, so the second element is excluded from the result list. The third element resolves to 1 < 2, which evaluates to true, so the third element is included in the list. The final result of that expression is [{"a": 1, "b": 2}].
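The same evaluation can be written out in Python. This is a minimal sketch, not how any particular JMESPath library is implemented; less_than and is_number are hypothetical helpers, and None stands in for the JSON null value.

```python
def is_number(value) -> bool:
    # JSON numbers only; Python's bool is an int subclass, so exclude it.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def less_than(lhs, rhs):
    """Return True/False for two numbers, or None (null) for any other type."""
    if is_number(lhs) and is_number(rhs):
        return lhs < rhs
    return None


data = {"foo": [{"a": "char", "b": "char"},
                {"a": 2, "b": 1},
                {"a": 1, "b": 2}]}

# Equivalent of foo[?a<b]: keep an element only when the comparison is true.
result = [e for e in data["foo"] if less_than(e.get("a"), e.get("b")) is True]
print(result)  # [{'a': 1, 'b': 2}]
```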

Filtering Semantics

When a filter expression is matched, the matched element in its entirety is included in the filtered response.

Using the previous example, given the following data:

{"foo": [{"state": "WA", "value": 1},
         {"state": "WA", "value": 2},
         {"state": "CA", "value": 3},
         {"state": "CA", "value": 4}]}

The expression foo[?state == `WA`] will return the following value:

[{"state": "WA", "value": 1}, {"state": "WA", "value": 2}]
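For comparison, here is a rough Python equivalent of the filter and of the filter followed by a further expression. It is only a sketch of the semantics using list comprehensions; both elements whose state is WA are kept in their entirety.

```python
data = {"foo": [{"state": "WA", "value": 1},
                {"state": "WA", "value": 2},
                {"state": "CA", "value": 3},
                {"state": "CA", "value": 4}]}

# foo[?state == `WA`]: every matching element is kept in its entirety.
matched = [e for e in data["foo"] if e.get("state") == "WA"]
print(matched)  # [{'state': 'WA', 'value': 1}, {'state': 'WA', 'value': 2}]

# foo[?state == `WA`].value: a further expression applied to each match.
values = [e["value"] for e in matched]
print(values)  # [1, 2]
```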

Literal Expressions

Literal expressions are also added in this JEP. A literal expression is essentially a JSON value surrounded by the “`” character. You can escape the “`” character via “\`”, and if the character “`” appears in the JSON value, it must also be escaped. A simple two pass algorithm in the lexer could first process any escaped “`” characters before handing the resulting string to a JSON parser.

Because string literals are by far the most common type of JSON value, an alternate syntax is supported where the starting and ending double quotes are not required for strings. For example:

`foobar`   -> "foobar"
`"foobar"` -> "foobar"
`123`      -> 123
`"123"`    -> "123"
`123.foo`  -> "123.foo"
`true`     -> true
`"true"`   -> "true"
`truee`    -> "truee"
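The two pass approach and the alternate string syntax can be sketched in Python as follows. parse_literal is a hypothetical helper, and the fallback of wrapping the body in double quotes is one possible reading of the alternate syntax. Note that Python's json module also accepts a few non-standard tokens such as NaN, which a stricter implementation would treat as strings.

```python
import json


def parse_literal(token: str):
    """Parse a literal token, including its surrounding backticks."""
    if len(token) < 3 or token[0] != "`" or token[-1] != "`":
        raise ValueError("not a literal: %r" % token)
    # Pass 1: resolve escaped backticks before involving the JSON parser.
    body = token[1:-1].replace("\\`", "`")
    # Pass 2: try the body as a JSON value.
    try:
        return json.loads(body)
    except ValueError:
        # Alternate syntax: treat the body as a string written without quotes.
        return json.loads('"' + body + '"')


for token in ["`foobar`", '`"foobar"`', "`123`", '`"123"`',
              "`123.foo`", "`true`", '`"true"`', "`truee`"]:
    print(token, "->", json.dumps(parse_literal(token)))
```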

Literal expressions aren’t allowed on the right hand side of a subexpression:

foo[*].`literal`

but they are allowed on the left hand side:

`{"foo": "bar"}`.foo

They may also be included in other expressions outside of a filter expression. For example:

{value: foo.bar, type: `multi-select-hash`}

Rationale

The proposed filter expression syntax was chosen such that there is sufficient expressive power for any type of filter one might need to perform while at the same time being as minimal as possible. To help illustrate this, below are a few alternate syntaxes that were considered.

In the simplest case where one might filter a key based on a literal value, a possible filter syntax would be:

foo[bar == baz]

or in general terms: [identifier comparator literal-value]. However this has several issues:

  • It is not possible to filter based on two expressions (get all elements whose foo key equals its bar key).

  • The literal value is on the right hand side, making it hard to troubleshoot if the identifier and literal value are swapped: foo[baz == bar].

  • Without some identifying token, unary filters would not be possible as they would be ambiguous. Is the expression [foo] filtering all elements with a foo key with a truth value or is it a multiselect-list selecting the foo key from each hash? Starting a filter expression with a token such as [? makes it clear that this is a filter expression.

  • This makes the syntax for filtering against literal JSON arrays and objects hard to visually parse. “Filter all elements whose foo key is a single list with a single integer value of 2”: [foo == [2]].

  • Adding literal expressions makes them useful even outside of a filter expression. For example, in a multi-select-hash, you can create arbitrary key value pairs: {a: foo.bar, b: `some string`}.

This JEP is purposefully minimal. There are several extensions that can be added in future:

  • Support any arbitrary expression within the [? ... ]. This would enable constructs such as or expressions within a filter. This would allow unary expressions.

In order for this to be useful we need to define what corresponds to true and false values, e.g. an empty list is a false value. Additionally, “or expressions” would need to change their semantics to branch based on the true/false value of an expression instead of whether or not the expression evaluates to null.

This is certainly a direction to take in the future. Adding arbitrary expressions in a filter would be a backwards compatible change, so it’s not part of this JEP.

  • Allow filter expressions as top level expressions. This would potentially just return true/false for any value that it matched.

This might be useful if you can combine this with something that can accept a list to use as a mask for filtering other elements.

Expression Types

  • JEP: 8
  • Author: James Saryerwinnie
  • Created: 2013-03-02

Abstract

This JEP proposes grammar modifications to JMESPath to allow for expression references within functions. This allows for functions such as sort_by, max_by, min_by. These functions take an argument that resolves to an expression type. This enables functionality such as sorting an array based on an expression that is evaluated against every array element.

Motivation

A useful feature that is common in other expression languages is the ability to sort a JSON object based on a particular key. For example, given a JSON object:

{
  "people": [
       {"age": 20, "age_str": "20", "bool": true, "name": "a", "extra": "foo"},
       {"age": 40, "age_str": "40", "bool": false, "name": "b", "extra": "bar"},
       {"age": 30, "age_str": "30", "bool": true, "name": "c"},
       {"age": 50, "age_str": "50", "bool": false, "name": "d"},
       {"age": 10, "age_str": "10", "bool": true, "name": 3}
  ]
}

It is not currently possible to sort the people array by the age key. Also, sort is not defined for the object type, so it’s not currently possible to even sort the people array. In order to sort the people array, we need to know what key to use when sorting the array.

This concept of sorting based on a key can be generalized. Instead of requiring a key name, an expression can be provided that each element would be evaluated against. In the simplest case, this expression would just be an identifier, but more complex expressions could be used such as foo.bar.baz.

A simple way to accomplish this might be to create a function like this:

sort_by(array arg1, expression)

# Called like:

sort_by(people, age)
sort_by(people, to_number(age_str))

However, there’s a problem with the sort_by function as defined above. If we follow the function argument resolution process we get:

sort_by(people, age)

# 1. resolve people
arg1 = search(people, <input data>) -> [{"age": ...}, {...}]

# 2. resolve age
arg2 = search(age, <input data>) -> null

sort_by([{"age": ...}, {...}], null)

The second argument is evaluated against the current node and the expression age will resolve to null because the input data has no age key. There needs to be some way to specify that an expression should evaluate to an expression type:

arg = search(<some expression>, <input data>) -> <expression: age>

Then the function definition of sort_by would be:

sort_by(array arg1, expression arg2)
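The difference between resolving an argument eagerly and passing it as an expression type can be illustrated in Python, where a callable plays the role of the unevaluated expression. This is only a sketch of the idea; search_key is a hypothetical stand-in for evaluating an identifier, and None stands in for null.

```python
data = {"people": [{"age": 20, "name": "a"},
                   {"age": 40, "name": "b"},
                   {"age": 10, "name": "c"}]}


def search_key(key, value):
    """Minimal stand-in for evaluating an identifier against a JSON value."""
    return value.get(key) if isinstance(value, dict) else None


# Eager resolution: `age` is evaluated against the input data, which has no
# "age" key, so the function would only ever receive null.
eager_arg = search_key("age", data)
print(eager_arg)  # None


# Expression type: the argument is passed unevaluated (here as a callable)
# and the function applies it to each element of the array.
def age_expr(element):
    return search_key("age", element)


sorted_people = sorted(data["people"], key=age_expr)
print([p["name"] for p in sorted_people])  # ['c', 'a', 'b']
```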

Specification

The following grammar rules will be updated to:

function-arg        = expression /
                      current-node /
                      "&" expression

Evaluating an expression reference should return an object of type “expression”. The list of data types supported by a function will now be:

  • number (integers and double-precision floating-point format in JSON)

  • string

  • boolean (true or false)

  • array (an ordered sequence of values)

  • object (an unordered collection of key value pairs)

  • null

  • expression (denoted by &expression)

Function signatures can now be specified using this new expression type. Additionally, a function signature can specify the return type of the expression. Similar to how arrays can specify a type within a list using the array[type] syntax, expressions can specify their resolved type using the expression->type syntax.

Note that any valid expression is allowed after &, so the following expressions are valid:

sort_by(people, &foo.bar.baz)
sort_by(people, &foo.bar[0].baz)
sort_by(people, &to_number(foo[0].bar))

Additional Functions

The following functions will be added:

sort_by

sort_by(array elements, expression->number|expression->string expr)

Sort an array using an expression expr as the sort key. Below are several examples using the people array (defined above) as the given input. sort_by follows the same sorting logic as the sort function.

Examples

Expression                                 Result
sort_by(people, &age)[].age                [10, 20, 30, 40, 50]
sort_by(people, &age)[0]                   {"age": 10, "age_str": "10", "bool": true, "name": 3}
sort_by(people, &to_number(age_str))[0]    {"age": 10, "age_str": "10", "bool": true, "name": 3}
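A rough Python sketch of sort_by is shown below, with a callable standing in for the &expression argument. The InvalidTypeError class and the requirement that all keys are numbers or all keys are strings are assumptions made for the sketch, based on the expression->number|expression->string signature.

```python
people = [
    {"age": 20, "age_str": "20", "bool": True, "name": "a", "extra": "foo"},
    {"age": 40, "age_str": "40", "bool": False, "name": "b", "extra": "bar"},
    {"age": 30, "age_str": "30", "bool": True, "name": "c"},
    {"age": 50, "age_str": "50", "bool": False, "name": "d"},
    {"age": 10, "age_str": "10", "bool": True, "name": 3},
]


class InvalidTypeError(Exception):
    """Stand-in for the invalid-type error (hypothetical class)."""


def sort_by(elements, expr):
    """Sort elements using expr, a callable standing in for &expression."""
    if not callable(expr):
        raise InvalidTypeError("second argument must be an expression")
    keys = [expr(e) for e in elements]
    for kind in ((int, float), str):
        if all(isinstance(k, kind) and not isinstance(k, bool) for k in keys):
            # sorted() is stable, so elements with equal keys keep their order.
            pairs = sorted(zip(keys, elements), key=lambda pair: pair[0])
            return [element for _, element in pairs]
    raise InvalidTypeError("expr must resolve to all numbers or all strings")


print([p["age"] for p in sort_by(people, lambda p: p["age"])])
# [10, 20, 30, 40, 50]
```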

max_by

max_by(array elements, expression->number expr)

Return the maximum element in an array using the expression expr as the comparison key. The entire maximum element is returned. Below are several examples using the people array (defined above) as the given input.

Examples

Expression                             Result
max_by(people, &age)                   {"age": 50, "age_str": "50", "bool": false, "name": "d"}
max_by(people, &age).age               50
max_by(people, &to_number(age_str))    {"age": 50, "age_str": "50", "bool": false, "name": "d"}
max_by(people, &age_str)               <error: invalid-type>
max_by(people, age)                    <error: invalid-type>

min_by

min_by(array elements, expression->number expr)

Return the minimum element in an array using the expression expr as the comparison key. The entire minimum element is returned. Below are several examples using the people array (defined above) as the given input.

Examples

Expression                             Result
min_by(people, &age)                   {"age": 10, "age_str": "10", "bool": true, "name": 3}
min_by(people, &age).age               10
min_by(people, &to_number(age_str))    {"age": 10, "age_str": "10", "bool": true, "name": 3}
min_by(people, &age_str)               <error: invalid-type>
min_by(people, age)                    <error: invalid-type>
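max_by and min_by can be sketched in Python along the same lines, again with a callable standing in for &expression. InvalidTypeError is a hypothetical class, and returning None for an empty array is an assumption, since this JEP does not state what happens in that case.

```python
people = [
    {"age": 20, "age_str": "20", "name": "a"},
    {"age": 40, "age_str": "40", "name": "b"},
    {"age": 30, "age_str": "30", "name": "c"},
    {"age": 50, "age_str": "50", "name": "d"},
    {"age": 10, "age_str": "10", "name": 3},
]


class InvalidTypeError(Exception):
    """Stand-in for the invalid-type error (hypothetical class)."""


def _number_keys(elements, expr):
    """Evaluate expr against each element and require numeric results."""
    if not callable(expr):
        # Covers max_by(people, age): the argument resolved to a plain value.
        raise InvalidTypeError("second argument must be an expression")
    keys = [expr(e) for e in elements]
    for key in keys:
        if isinstance(key, bool) or not isinstance(key, (int, float)):
            raise InvalidTypeError("expr must resolve to a number")
    return keys


def max_by(elements, expr):
    keys = _number_keys(elements, expr)
    if not elements:
        return None  # assumption: empty input yields null
    return max(zip(keys, elements), key=lambda pair: pair[0])[1]


def min_by(elements, expr):
    keys = _number_keys(elements, expr)
    if not elements:
        return None  # assumption: empty input yields null
    return min(zip(keys, elements), key=lambda pair: pair[0])[1]


print(max_by(people, lambda p: p["age"])["age"])  # 50
print(min_by(people, lambda p: p["age"])["age"])  # 10
```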

Alternatives

There were a number of alternative proposals considered. Several of these alternatives are outlined below.

Logic in Argument Resolver

The first proposed choice (which was originally in JEP-3 but later removed) was to not have any syntactic construct for specifying functions, and to allow the function signature to dictate whether or not an argument was resolved. The signature for sort_by would be:

sort_by(array arg1, any arg2)
arg1 -> resolved
arg2 -> not resolved

Then the argument resolver would introspect the argument specification of a function to determine what to do. Roughly speaking, the pseudocode would be:

call-function(current-data)
arglist = []
for each argspec in functions-argspec:
    if argspec.should_resolve:
        arglist <- resolve(argument, current-data)
    else:
        arglist <- argument
type-check(arglist)
return invoke-function(arglist)

However, there are several reasons not to do this:

  • This imposes a specific implementation. This implementation would be challenging in a bytecode VM, as the CALL bytecode will typically resolve arguments onto the stack and allow the function to then pop arguments off the stack and perform its own arity validation.

  • This deviates from the “standard” model of how functions are traditionally implemented.

Specifying Expressions as Strings

Another proposed alternative was to allow the expression to be a string type and to give functions the capability to parse/eval expressions. The sort_by function would look like this:

sort_by(people, `age`)
sort_by(people, `foo.bar.baz`)

The main reasons this proposal was not chosen were:

  • This complicates the implementations. For implementations that walk the AST inline, this means AST nodes need access to the parser. For external tree visitors, the visitor needs access to the parser.

  • This moves what could be a compile time error into a run time error. The evaluation of the expression string happens when the function is invoked.

Improved Filters

  • JEP: 9
  • Author: James Saryerwinnie
  • Created: 2014-07-07

Abstract

JEP 7 introduced filter expressions, which are a mechanism to allow list elements to be selected based on matching an expression against each list element. While this concept is useful, the actual comparator expressions were not sufficiently capable to accommodate a number of common queries. This JEP expands on filter expressions by proposing support for and-expressions, not-expressions, paren-expressions, and unary-expressions. With these additions, filter expressions become powerful enough to handle the majority of queries.

Motivation

JEP 7 introduced filter queries, that essentially look like this:

foo[?lhs comparator rhs]

where the left hand side (lhs) and the right hand side (rhs) are both an expression, and comparator is one of ==, !=, <, <=, >, >=.

This added a useful feature to JMESPath: the ability to filter a list based on evaluating an expression against each element in a list.

In the time since JEP 7 has been part of JMESPath, a number of cases have been pointed out that filter expressions cannot solve. Below are examples of each type of missing feature.

Or Expressions

First, users want the ability to filter based on matching one or more expressions. For example, given:

{
  "cities": [
    {"name": "Seattle", "state": "WA"},
    {"name": "Los Angeles", "state": "CA"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "San Antonio", "state": "TX"},
    {"name": "Portland", "state": "OR"}
  ]
}

a user might want to select locations on the west coast, which in this specific example means cities in either WA, OR, or CA. It’s not possible to express this as a filter expression given the grammar of expression comparator expression. Ideally a user should be able to use:

cities[?state == `WA` || state == `OR` || state == `CA`]

JMESPath already supports Or expressions, just not in the context of filter expressions.

And Expressions

The next missing feature of filter expressions is support for And expressions. It’s actually somewhat odd that JMESPath has support for Or expressions, but not for And expressions. For example, given a list of user accounts with permissions:

{
  "users": [
    {"name": "user1", "type": "normal", "allowed_hosts": ["a", "b"]},
    {"name": "user2", "type": "admin", "allowed_hosts": ["a", "b"]},
    {"name": "user3", "type": "normal", "allowed_hosts": ["c", "d"]},
    {"name": "user4", "type": "admin", "allowed_hosts": ["c", "d"]},
    {"name": "user5", "type": "normal", "allowed_hosts": ["c", "d"]},
    {"name": "user6", "type": "normal", "allowed_hosts": ["c", "d"]}
  ]
}

We’d like to find admin users that have permissions to the host named c. Ideally, the filter expression would be:

users[?type == `admin` && contains(allowed_hosts, `c`)]

Unary Expressions

Think of an if statement in a language such as C or Java. While you can write an if statement that looks like:

if (foo == bar) { ... }

You can also use a unary expression such as:

if (allowed_access) { ... }

or:

if (!allowed_access) { ... }

Adding support for unary expressions brings a natural syntax when filtering against boolean values. Instead of:

foo[?boolean_var == `true`]

a user could instead use:

foo[?boolean_var]

As a more realistic example, given a slightly different structure for the users data above:

{
  "users": [
    {"name": "user1", "is_admin": false, "disabled": false},
    {"name": "user2", "is_admin": true, "disabled": true},
    {"name": "user3", "is_admin": false, "disabled": false},
    {"name": "user4", "is_admin": true, "disabled": false},
    {"name": "user5", "is_admin": false, "disabled": true},
    {"name": "user6", "is_admin": false, "disabled": false}
  ]
}

If we want to get the names of all admin users whose account is enabled, we could either say:

users[?is_admin == `true` && disabled == `false`]

but it’s more natural and succinct to instead say:

users[?is_admin && !disabled]

A case can be made that this syntax is not strictly necessary. This is true. However, the main reason for adding support for unary expressions in a filter expression is that users expect this syntax and are surprised when it is not supported. Especially now that we are basically anchoring to a C-like syntax for filtering in this JEP, users will expect unary expressions even more.

Paren Expressions

Once || and && statements have been introduced, there will be times when you want to override the precedence of these operators.

A paren-expression allows a user to override the precedence order of an expression, e.g. (a || b) && c, instead of the default precedence of a || (b && c) for the expression a || b && c.

Specification

There are several updates to the grammar:

and-expression         = expression "&&" expression
not-expression         = "!" expression
paren-expression       = "(" expression ")"

Additionally, the filter-expression rule is updated to be more general:

bracket-specifier      =/ "[?" expression "]"

The list-filter-expr is now a more general comparator-expression:

comparator-expression  = expression comparator expression

which is now just an expression:

expression =/ comparator-expression

And finally, the current-node is now allowed as a generic expression:

expression =/ current-node

Operator Precedence

This JEP introduces and expressions, which would normally be defined as:

expression     = or-expression / and-expression / not-expression
or-expression  = expression "||" expression
and-expression = expression "&&" expression
not-expression = "!" expression

However, if this current pattern is followed, it makes it impossible to parse an expression with the correct precedence. A more standard way of expressing this would be:

expression          = or-expression
or-expression       = and-expression "||" and-expression
and-expression      = not-expression "&&" not-expression
not-expression      = "!" expression

The precedence for the new boolean expressions matches how most other languages define boolean expressions. That is from weakest binding to tightest binding:

  • Or - ||

  • And - &&

  • Unary not - !

So for example, a || b && c is parsed as a || (b && c) and not (a || b) && c.
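Python's or and and operators have the same relative precedence, so the grouping can be checked there. This is an analogy only: it illustrates how the expression groups, not JMESPath's truth-like value rules.

```python
a, b, c = True, False, False

default_grouping = a or b and c      # Python groups this as a or (b and c)
and_binds_tighter = a or (b and c)
or_binds_tighter = (a or b) and c

print(default_grouping, and_binds_tighter, or_binds_tighter)  # True True False
```

With these values the two groupings give different results, which is why the precedence has to be fixed by the grammar.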

The operator precedence list in the specification will now read:

  • Pipe - |

  • Or - ||

  • And - &&

  • Unary not - !

  • Rbracket - ]

Now that these expressions are allowed as general expressions, their semantics outside of their original contexts must be defined.

And Expressions

For reference, the JMESPath spec already defines the following values as “false-like” values:

  • Empty list: []

  • Empty object: {}

  • Empty string: ""

  • False boolean: false

  • Null value: null

And any value that is not a false-like value is a truth-like value.

An and-expression has similar semantics to and expressions in other languages. If the expression on the left hand side is a truth-like value, then the value on the right hand side is returned. Otherwise the result of the expression on the left hand side is returned. This also reduces to the expected truth table:

Truth table for and expressions

LHS    RHS    Result
True   True   True
True   False  False
False  True   False
False  False  False

This is the standard truth table for a logical conjunction (AND).

Below are a few examples of and expressions:

Examples

search(True && False, {"True": true, "False": false}) -> false
search(Number && EmptyList, {"Number": 5, "EmptyList": []}) -> []
search(foo[?a == `1` && b == `2`],
       {"foo": [{"a": 1, "b": 2}, {"a": 1, "b": 3}]}) -> [{"a": 1, "b": 2}]
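The and-expression semantics can be sketched in Python. Python's own and operator cannot be reused directly, because Python treats 0 as false while it is not in the false-like list above. The helper names is_false_like and and_expression are hypothetical.

```python
def is_false_like(value) -> bool:
    """True for the false-like values: [], {}, "", false and null."""
    if value is None or value is False:
        return True
    return isinstance(value, (list, dict, str)) and len(value) == 0


def and_expression(lhs, rhs):
    """Return rhs when lhs is truth-like, otherwise return lhs."""
    return lhs if is_false_like(lhs) else rhs


print(and_expression(True, False))  # False
print(and_expression(5, []))        # []
print(and_expression(0, "x"))       # x  (0 is truth-like under these rules)

foo = [{"a": 1, "b": 2}, {"a": 1, "b": 3}]
# Equivalent of foo[?a == `1` && b == `2`]
print([e for e in foo
       if not is_false_like(and_expression(e["a"] == 1, e["b"] == 2))])
# [{'a': 1, 'b': 2}]
```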

Not Expressions

A not-expression negates the result of an expression. If the expression results in a truth-like value, a not-expression will change this value to false. If the expression results in a false-like value, a not-expression will change this value to true.

Examples

search(!True, {"True": true}) -> false
search(!False, {"False": false}) -> true
search(!Number, {"Number": 5}) -> false
search(!EmptyList, {"EmptyList": []}) -> true
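A not-expression can be sketched the same way. The function below is a hypothetical helper that applies the false-like rules listed earlier rather than Python's own truthiness, so that a number such as 0 still negates to false.

```python
def not_expression(value) -> bool:
    """Negate a JSON value: false-like values give True, all others give False."""
    return (
        value is None
        or value is False
        or (isinstance(value, (list, dict, str)) and len(value) == 0)
    )


print(not_expression(True))   # False
print(not_expression(False))  # True
print(not_expression(5))      # False
print(not_expression([]))     # True
```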

Paren Expressions

A paren-expression allows a user to override the precedence order of an expression, e.g. (a || b) && c.

Examples

search(foo[?(a == `1` || b == `2`) && c == `5`],
       {"foo": [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 4}]}) -> []

Rationale

This JEP brings several tokens that were only allowed in specific constructs into the more general expression rule. Specifically:

  • The current-node (@) was previously only allowed in function expressions, but is now allowed as a general expression.

  • The filter-expression now accepts any arbitrary expression.

  • The list-filter-expr is now just a generic comparator-expression, which again is just a general expression.

There are several reasons the previous grammar rules were minimally scoped. One of the main reasons, as stated in JEP 7 which introduced filter expressions, was to keep the spec “purposefully minimal.” In fact the end of JEP 7 states that there “are several extensions that can be added in future.” This is in fact exactly what this JEP proposes, the recommendations from JEP 7.

Slice Projections

  • JEP: 10
  • Author: James Saryerwinnie
  • Created: 2015-02-08

Abstract

This document proposes modifying the semantics of slice expressions to create projections, which brings consistency with the wildcard, flattening, and filtering projections.

Motivation

JEP 5 introduced slice expressions. This added Python slice semantics to JMESPath. Slicing does not produce a projection, so expressions such as the following will always return null: myarray[:10].foo.bar.

Instead if you wanted to access foo.bar for each element in the array slice you currently have to write myarray[:10][*].foo.bar.

This JEP proposes that a slice expression will create a projection.

Rationale

A reasonable objection to this JEP is that it is unnecessary because, as shown in the example above, you can take any slice and create a projection via [*]. This is entirely true: unlike other JEPs, this JEP does not enable any behavior that was previously not possible.

Instead, the main reason for this JEP is for consistency. Right now there are three types of array projections:

  • List Projections (foo[*].bar)

  • Filter Projections (foo[?a==b].bar)

  • Flatten Projections (foo[].bar)

Note the general form, foo[<stuff here>].<child-expr>. Each of the existing array projections has the same general semantics:

  • Take the left hand side, which is a list, and produce another list as a result of evaluating the left hand side. This newly produced list will contain elements of the original input (or elements of the elements of the original input in the case of the flatten projection).

  • Evaluate the right hand side against each element in the list produced from evaluating the left hand side.

So in general, the left hand side is responsible for creating a new list but not for manipulating individual elements of the list. The right hand side is for manipulating individual elements of the list. In the case of the list projection, every element from the original list is used. In the case of a filter projection, only elements matching an expression are passed to the right hand side. In the case of a flatten projection, sub arrays are merged before passing the expression onto the right hand side.

It’s a reasonable expectation that slices behave similarly. After all, slices take an array and produce a sub array. In many ways, this is very similar to filter projections. While filter projections only include elements that match a particular expression, slice projections only include elements from and to a specific index. Given that their semantics are so close to those of filter projections, slices should create projections to be consistent.

Specification

Whenever a slice is created, a projection will be created. This will be the fourth type of array projection in JMESPath, in addition to the existing array projections:

  • List Projections

  • Flatten Projections

  • Filter Projections

A new projection type, the slice projection, will be added. A slice projection is evaluated similarly to the other array projections. Given a slice projection with a slice expression on the left hand side and an expression on the right hand side, the slice expression is evaluated to create a new sub array, and the right hand side is evaluated against each element of the array slice to create the final result.
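The evaluation order described above can be sketched in Python. This is only an illustration under stated assumptions: slice_then_project and get are hypothetical names, expressions are modeled as plain Python callables, and null results are dropped from the output in the same way the existing list projection drops them.

```python
def slice_then_project(items, slice_obj, child):
    """Evaluate `items[slice].child` with the slice acting as a projection."""
    if not isinstance(items, list):
        return None  # slicing a non-array yields null
    results = []
    for element in items[slice_obj]:  # left hand side: build the sub array
        value = child(element)        # right hand side: applied per element
        if value is not None:         # null results are dropped, as in other projections
            results.append(value)
    return results


def get(key):
    """Hypothetical helper: an identifier lookup that yields None (null) on a miss."""
    return lambda node: node.get(key) if isinstance(node, dict) else None


data = {"foo": [{"bar": 1}, {"bar": 2}, {"baz": 3}, {"bar": 4}]}
print(slice_then_project(data["foo"], slice(None, 3), get("bar")))  # [1, 2]
```

Here slice(None, 3) stands in for the JMESPath slice [:3]. The third element has no bar key, so it contributes nothing to the result.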

This JEP does not include any modifications to the JMESPath grammar.

Impact

The impact to existing users of slices is minimal. Consider:

  • Existing expressions such as foo[:10].bar currently return null. They will now return non-null values.

  • The only impact to existing users is if someone had an expression such as foo[:10][0], which, given the projection semantics, will now create a list containing the 0th element from each sublist. Before this JEP, that expression was equivalent to foo[0], so the slice was unnecessary. Any users that actually had expressions like this can now just use foo[0] instead.

Lexical Scoping

  • JEP: 11
  • Author: James Saryerwinnie
  • Created: 2015-02-24

Abstract

This JEP proposes a new function let() (originally proposed by Michael Dowling) that allows for evaluating an expression with an explicitly defined lexical scope. This will require some changes to the lookup semantics in JMESPath to introduce scoping, but provides useful functionality such as being able to refer to elements defined outside of the current scope used to evaluate an expression.

Motivation

As a JMESPath expression is being evaluated, the current element, which can be explicitly referred to via the @ token, changes as expressions are evaluated. Given a simple sub expression such as foo.bar, first the foo expression is evaluated with the starting input JSON document, and the result of that expression is then used as the current element when the bar element is evaluated. Conceptually we're taking some object, and narrowing down its current element as the expression is evaluated.

Once we've drilled down to a specific current element, there is no way, in the context of the currently evaluated expression, to refer to any elements outside of that element. One scenario where this is problematic is being able to refer to a parent element.

For example, suppose we had this data:

    {"first_choice": "WA",
     "states": [
       {"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
       {"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
       {"name": "NY", "cities": ["New York City", "Albany"]}
     ]
    }

Let's say we wanted to get the list of cities of the state corresponding to our first_choice key. We'll make the assumption that the state names are unique in the states list. This is currently not possible with JMESPath. In this example we can hard code the state WA:

    states[?name==`WA`].cities

but it is not possible to base this on a value of first_choice, which comes from the parent element. This JEP proposes a solution that makes this possible in JMESPath.

Specification

There are two components to this JEP, a new function, let(), and a change to the way that identifiers are resolved.

The let() Function

The let() function is heavily inspired by the let function commonly seen in the Lisp family of languages.

The let function is defined as follows:

    any let(object scope, expression->any expr)

let is a function that takes two arguments. The first argument is a JSON object. This hash defines the names and their corresponding values that will be accessible to the expression specified in the second argument. The second argument is an expression reference that will be evaluated.

Resolving Identifiers

Prior to this JEP, identifiers are resolved by consulting the current context in which the expression is evaluated. For example, using the same search function as defined in the JMESPath specification, the evaluation of:

    search(foo, {"foo": "a", "bar": "b"}) -> "a"

will result in the foo identifier being resolved in the context of the input object {"foo": "a", "bar": "b"}. The context object defines foo as a, which results in the identifier foo being resolved as a.

In the case of a sub expression, where the current evaluation context changes once the left hand side of the sub expression is evaluated:

    search(a.b, {"a": {"b": "y"}}) -> "y"

The identifier b is resolved with a current context of {"b": "y"}, which results in a value of y.

This JEP adds an additional step to resolving identifiers. In addition to the implicit evaluation context that changes based on the result of continually evaluating expressions, the let() function allows for additional contexts to be specified, which we refer to by the common name scope. The steps for resolving an identifier are:

  • Attempt to lookup the identifier in the current evaluation context.
  • If this identifier is not resolved, look up the value in the current scope provided by the user.
  • If the identifier is not resolved and there is a parent scope, attempt to resolve the identifier in the parent scope. Continue doing this until there is no parent scope, in which case, if the identifier has not been resolved, the identifier is resolved as null.

Parent scopes are created by nested let() calls.
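The lookup steps above can be sketched in Python as a walk over a chain of scopes. The names Scope and resolve_identifier are hypothetical, and the sketch assumes that a key which is present counts as resolved even when its value is null, a detail this JEP does not pin down.

```python
class Scope:
    """A user-provided scope object plus an optional parent scope."""

    def __init__(self, bindings, parent=None):
        self.bindings = bindings
        self.parent = parent


def resolve_identifier(name, context, scope=None):
    # Step 1: the current evaluation context wins when it defines the name.
    if isinstance(context, dict) and name in context:
        return context[name]
    # Steps 2 and 3: walk the scope chain, innermost scope first.
    while scope is not None:
        if name in scope.bindings:
            return scope.bindings[name]
        scope = scope.parent
    # No scope defines the name: the identifier resolves as null.
    return None


outer = Scope({"a": "x"})                # created by an outer let() call
inner = Scope({"b": "y"}, parent=outer)  # created by a nested let() call
context = {"c": "z"}
print([resolve_identifier(n, context, inner) for n in ("a", "b", "c", "d")])
# ['x', 'y', 'z', None]
```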

Below are a few examples to make this more clear. First, let's examine the case where the identifier can be resolved from the current evaluation context:

    search(let({a: `x`}, &b), {"b": "y"}) -> "y"

In this scenario, we are evaluating the expression b, with the context object of {"b": "y"}. Here b has a value of y, so the result of this function is y.

Now let's look at an example where an identifier is resolved from a scope object provided via let():

    search(let({a: `x`}, &a), {"b": "y"}) -> "x"

Here, we're trying to resolve the a identifier. The current evaluation context, {"b": "y"}, does not define a. Normally, this would result in the identifier being resolved as null:

    search(a, {"b": "y"}) -> null

However, we now fall back to looking in the provided scope object {"a": "x"}, which was provided as the first argument to let. Note that a has a value of "x" in this scope, so the identifier is resolved as "x", and the return value of the let() function is "x".

Finally, let's look at an example of parent scopes. Consider the following expression:

    search(let({a: `x`}, &let({b: `y`}, &{a: a, b: b, c: c})),
           {"c": "z"}) -> {"a": "x", "b": "y", "c": "z"}

Here we have nested let calls, and the expression we are trying to evaluate is the multiselect hash {a: a, b: b, c: c}. The c identifier comes from the evaluation context {"c": "z"}. The b identifier comes from the scope object in the second let call: {b: `y`}. And finally, here's the lookup process for the a identifier:

  • Is a defined in the current evaluation context? No.
  • Is a defined in the scope provided by the user? No.
  • Is there a parent scope? Yes
  • Does the parent scope, {a: `x`}, define a? Yes, a has the value of "x", so a is resolved as the string "x".

Current Node Evaluation

While the JMESPath specification defines how the current node is determined, it is worth explicitly calling out how this works with the let() function and expression references. Consider the following expression:

    a.let({x: `x`}, &b.let({y: `y`}, &c))

Given the input data:

    {"a": {"b": {"c": "foo"}}}

When the expression c is evaluated, the current evaluation context is {"c": "foo"}. This is because this expression isn't evaluated until the second let() call evaluates the expression, which does not occur until the first let() function evaluates the expression.

Motivating Example

With these changes defined, the expression in the "Motivation" section can be written as:

    let({first_choice: first_choice}, &states[?name==first_choice].cities)

Which evaluates to ["Seattle", "Bellevue", "Olympia"].

Rationale

If we just consider the feature of being able to refer to a parent element, this approach is not the only way to accomplish this. We could also allow for explicit references using a specific token, say $. The original example in the "Motivation" section would be:

    states[?name==$.first_choice].cities

While this could work, this has a number of downsides, the biggest one being that you'll need to always keep track of the parent element. You don't know ahead of time if you're going to need the parent element, so you'll always need to track this value. It also doesn't handle nested lexical scopes. What if you wanted to access a value in the grand parent element? Requiring an explicit binding approach via let() handles both these cases, and doesn't require having to track parent elements. You only need to track additional scope when let() is used.

Raw String Literals

  • JEP: 12
  • Author: Michael Dowling
  • Created: 2015-04-09

Abstract

This JEP proposes the following modifications to JMESPath in order to improve the usability of the language and ease the implementation of parsers:

  • Addition of a raw string literal to JMESPath that will allow expressions to contain raw strings that are not mutated by JSON escape sequences (e.g., “\n”, “\r”, “\u005C”).

  • Deprecation of the current literal parsing behavior that allows for unquoted JSON strings to be parsed as JSON strings, removing an ambiguity in the JMESPath grammar and helping to ensure consistency among implementations.

This proposal seeks to add the following syntax to JMESPath:

'foobar'
'foo\'bar'
`bar` -> Parse error/warning (implementation specific)

Motivation

Raw string literals are provided in various programming languages in order to prevent language specific interpretation (i.e., JSON parsing) and remove the need for escaping, avoiding a common problem called leaning toothpick syndrome (LTS). Leaning toothpick syndrome is an issue in which strings become unreadable due to excessive use of escape characters in order to avoid delimiter collision (e.g., \\\\\\\\\\\\).

When evaluating a JMESPath expression, it is often necessary to utilize string literals that are not extracted from the data being evaluated, but rather statically part of the compiled JMESPath expression. String literals are useful in many areas, but most notably when invoking functions or building up multi-select lists and hashes.

The following expression returns the number of characters found in the string "foo". When parsing this expression, `"foo"` is parsed as a JSON value which produces the string literal value of foo:

length(`"foo"`)

The following expression is functionally equivalent. Notice that the quotes are elided from the JSON literal:

length(`foo`)

These string literals are parsed using a JSON parser according to RFC 4627, which will expand unicode escape sequences, newline characters, and several other escape sequences documented in RFC 4627 section 2.5.

For example, the use of an escaped unicode value \u002B is expanded into + in the following JMESPath expression:

`"foo\u002B"` -> "foo+"

You can escape escape sequences in JSON literals to prevent an escape sequence from being expanded:

`"foo\\u002B"` -> "foo\u002B"
`foo\\u002B` -> "foo\u002B"

While this allows you to provide literal strings, it presents the following problems:

  1. Incurs an additional JSON parsing penalty.

  2. Requires the cognitive overhead of escaping escape characters if you actually want the data to be represented as it was literally provided (which can lead to LTS). If the data being escaped was meant to be used along with another language that uses \ as an escape character, then the number of backslash characters doubles.

  3. Introduces an ambiguous rule to the JMESPath grammar that requires a prose based specification to resolve the ambiguity in parser implementations.

The relevant literal grammar rules are currently defined as follows:

literal = "`" json-value "`"
literal =/ "`" 1*(unescaped-literal / escaped-literal) "`"
unescaped-literal = %x20-21 /       ; space !
                        %x23-5B /   ; # - [
                        %x5D-5F /   ; ] ^ _
                        %x61-7A /   ; a-z
                        %x7C-10FFFF ; |}~ ...
escaped-literal   = escaped-char / (escape %x60)
json-value = false / null / true / json-object / json-array /
             json-number / json-quoted-string
false = %x66.61.6c.73.65   ; false
null  = %x6e.75.6c.6c      ; null
true  = %x74.72.75.65      ; true
json-quoted-string = %x22 1*(unescaped-literal / escaped-literal) %x22
begin-array     = ws %x5B ws  ; [ left square bracket
begin-object    = ws %x7B ws  ; { left curly bracket
end-array       = ws %x5D ws  ; ] right square bracket
end-object      = ws %x7D ws  ; } right curly bracket
name-separator  = ws %x3A ws  ; : colon
value-separator = ws %x2C ws  ; , comma
ws              = *(%x20 /              ; Space
                    %x09 /              ; Horizontal tab
                    %x0A /              ; Line feed or New line
                    %x0D                ; Carriage return
                   )
json-object = begin-object [ member *( value-separator member ) ] end-object
member = quoted-string name-separator json-value
json-array = begin-array [ json-value *( value-separator json-value ) ] end-array
json-number = [ minus ] int [ frac ] [ exp ]
decimal-point = %x2E       ; .
digit1-9 = %x31-39         ; 1-9
e = %x65 / %x45            ; e E
exp = e [ minus / plus ] 1*DIGIT
frac = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
minus = %x2D               ; -
plus = %x2B                ; +
zero = %x30                ; 0

The literal rule is ambiguous because unescaped-literal includes all of the same characters that json-value matches, allowing any value that is valid JSON to be matched by either unescaped-literal or json-value.

Rationale

When implementing parsers for JMESPath, one must provide special case parsing when parsing JSON literals due to the allowance of elided quotes around JSON string literals (e.g., `foo`). This specific aspect of JMESPath cannot be described unambiguously in a context free grammar and could become a common cause of errors when implementing JMESPath parsers.

Parsing JSON literals has other complications as well. Here are the steps needed to currently parse a JSON literal value in JMESPath:

  1. When a ` token is encountered, begin parsing a JSON literal.

  2. Collect each character between the opening ` and closing ` tokens, including any escaped ` characters (i.e., \` ) and store the characters in a variable (let’s call it $lexeme).

  3. Copy the contents of $lexeme to a temporary value in which all leading and trailing whitespace is removed. Let’s call this $temp (this is currently not documented but required in the JMESPath compliance tests).

  4. If $temp can be parsed as valid JSON, then use the parsed result as the value for the literal token.

  5. If $temp cannot be parsed as valid JSON, then wrap the contents of $lexeme in double quotes and parse the wrapped value as a JSON string, making the following expressions equivalent: `foo` == `"foo"`, and `[1, ]` == `"[1, ]"`.
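To make the special casing concrete, here is a minimal Python sketch of the five steps using the standard json module. The name parse_json_literal is hypothetical, and the function receives only the text between the backticks, so step 1 is assumed to have happened already. Replacing an escaped backtick with a plain backtick is an assumption about step 2. Note also that Python's json module accepts a few non-standard constants such as NaN, which a strict RFC 4627 parser would reject.

```python
import json


def parse_json_literal(lexeme):
    """Parse the text found between the backticks of a JSON literal."""
    # Step 2 (assumed detail): an escaped backtick stands for a literal backtick.
    lexeme = lexeme.replace("\\`", "`")
    # Step 3: strip leading and trailing whitespace into a temporary value.
    temp = lexeme.strip()
    try:
        # Step 4: if the stripped text is valid JSON, use the parsed value.
        return json.loads(temp)
    except ValueError:
        # Step 5: otherwise wrap the lexeme in double quotes and parse it
        # as a JSON string (the elided-quotes fallback).
        return json.loads('"' + lexeme + '"')


print(parse_json_literal('"foo"'))  # foo
print(parse_json_literal('foo'))    # foo
print(parse_json_literal('[1, ]'))  # [1, ]
```

The last call shows the ambiguity discussed in this JEP: text that looks like a JSON array but fails to parse silently becomes a string.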

It is reasonable to assume that the most common use case for a JSON literal in a JMESPath expression is to provide a string value to a function argument or to provide a literal string value to a value in a multi-select list or multi-select hash. In order to make providing string values easier, it was decided that JMESPath should allow the quotes around the string to be elided.

This proposal posits that allowing quotes to be elided when parsing JSON literals should be deprecated in favor of adding a proper string literal rule to JMESPath.

Specification

A raw string literal is a value that begins and ends with a single quote, does not interpret escape characters, and may contain escaped single quotes to avoid delimiter collision.

Examples

Here are several examples of valid raw string literals and how they are parsed:

  • A basic raw string literal, parsed as foo bar:
'foo bar'
  • An escaped single quote, parsed as foo'bar:
'foo\'bar'
  • A raw string literal that contains new lines:
'foo
bar
baz!'

The above expression would be parsed as a string that contains new lines:

foo
bar
baz!
  • A raw string literal that contains escape characters, parsed as foo\nbar:
'foo\nbar'

ABNF

The following ABNF grammar rules will be added, and a raw-string will be allowed anywhere an expression is allowed:

raw-string        = "'" *raw-string-char "'"
; The first grouping matches any character other than "'" and "\"
raw-string-char   = (%x20-26 / %x28-5B / %x5D-10FFFF) / raw-string-escape
raw-string-escape = escape ["'"]

This rule allows any character inside of a raw string, including an escaped single quote.
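As an illustration of the intended behavior, the following Python sketch decodes a raw string literal from its full source text. The name parse_raw_string is hypothetical. The sketch follows the prose of this JEP rather than the character ranges of the ABNF, and it assumes that a backslash directly before the closing quote escapes that quote, which leaves the literal unterminated.

```python
def parse_raw_string(source):
    """Return the value of a raw string literal given its full source text."""
    if len(source) < 2 or source[0] != "'" or source[-1] != "'":
        raise ValueError("raw string must start and end with a single quote")
    body = source[1:-1]
    chars = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 == len(body):
                # Assumption: this backslash would escape the closing quote.
                raise ValueError("unterminated raw string")
            if body[i + 1] == "'":
                chars.append("'")  # the only escape that is interpreted
                i += 2
                continue
        elif ch == "'":
            raise ValueError("unescaped single quote inside raw string")
        chars.append(ch)  # everything else, including other backslashes, is kept
        i += 1
    return "".join(chars)


print(parse_raw_string("'foo bar'"))     # foo bar
print(parse_raw_string("'foo\\'bar'"))   # foo'bar
print(parse_raw_string("'foo\\nbar'"))   # foo\nbar
```

In the last call the backslash and the n are kept as two separate characters, which is the point of a raw string.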

In addition to adding a raw-string rule, the literal rule in the ABNF will be updated to become:

literal = "`" json-value "`"

Impact

The impact to existing users of JMESPath is that the use of a JSON literal in which the quotes are elided SHOULD be converted to use the raw-string rule of the grammar. Whether or not this conversion is absolutely necessary will depend on the specific JMESPath implementation.

Implementations MAY choose to support the old syntax of allowing elided quotes in JSON literal expressions. If an implementation chooses this approach, the implementation SHOULD raise some kind of warning to the user to let them know of the deprecation and possible incompatibility with other JMESPath implementations.

In order to support this type of variance in JMESPath implementations, all of the JSON literal compliance test cases that involve elided quotes MUST be removed, and test cases regarding failing on invalid unquoted JSON values MUST NOT be allowed in the compliance test unless placed in a JEP 12 specific test suite, allowing implementations that support elided quotes in JSON literals to filter out the JEP 12 specific test cases.

Alternative approaches

There are several alternative approaches that could be taken.

Leave as-is

This is a valid and reasonable suggestion. Leaving JMESPath as-is would avoid a breaking change to the grammar and users could continue to use multiple escape characters to avoid delimiter collision.

The goal of this proposal is not to add functionality to JMESPath, but rather to make the language easier to use, easier to reason about, and easier to implement. As it currently stands, the behavior of JSON parsing is ambiguous and requires special casing when implementing a JMESPath parser. It also allows for minor differences in implementations due to this ambiguity.

Take the following example:

`[1`

One implementation may interpret this expression as a JSON string with the string value of "[1", while other implementations may raise a parse error because the first character of the expression appears to be valid JSON.

By updating the grammar to require valid JSON in the JSON literal token, we can remove this ambiguity completely, removing a potential source of inconsistency from the various JMESPath implementations.

Disallow single quotes in a raw string

This proposal states that single quotes in a raw string literal must be escaped with a backslash. An alternative approach could be to not allow single quotes in a raw string literal. While this would simplify the raw-string grammar rule, it would severely limit the usability of the raw-string rule, forcing users to use the literal rule.

Use a customizable delimiter

Several languages allow for a custom delimiter to be placed around a raw string. For example, Lua allows for a long bracket notation in which raw strings are surrounded by [[]] with any number of balanced = characters between the brackets:

[==[foo=bar]==] -- parsed as "foo=bar"

This approach is very flexible and removes the need to escape any characters; however, this cannot be expressed in a regular grammar. A parser would need to keep track of the number of opened delimiters and ensure that it is closed with the appropriate number of matching characters.
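The bookkeeping this would require can be sketched in Python. The name parse_long_bracket is hypothetical, and the sketch covers only the delimiter matching described above, not every detail of Lua's own rules (for example, Lua skips a newline that immediately follows the opening delimiter).

```python
def parse_long_bracket(source):
    """Parse a Lua-style long bracket string such as [==[foo=bar]==]."""
    if not source.startswith("["):
        raise ValueError("expected an opening bracket")
    level = 0
    while 1 + level < len(source) and source[1 + level] == "=":
        level += 1  # count the "=" characters in the opening delimiter
    if 1 + level >= len(source) or source[1 + level] != "[":
        raise ValueError("malformed opening delimiter")
    closing = "]" + "=" * level + "]"  # the closer must use the same count
    start = level + 2
    end = source.find(closing, start)
    if end == -1:
        raise ValueError("unterminated long bracket")
    return source[start:end]


print(parse_long_bracket("[==[foo=bar]==]"))  # foo=bar
print(parse_long_bracket("[=[a]]b]=]"))       # a]]b
```

The counter held in level is the state a parser has to carry from the opening delimiter to the closing one.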

The addition of a string literal as described in this JEP does not preclude a later addition of a heredoc or delimited style string literal as provided by languages like Lua, D, C++, etc…

Lexical Scoping

  • JEP: 18
  • Author: @jamesls
  • Created: 2023-03-21

Abstract

This JEP proposes the introduction of lexical scoping using a new let expression. You can now bind variables that are evaluated in the context of a given lexical scope. This enables queries that can refer to elements defined outside of their current element, which is not currently possible. This JEP supersedes JEP 11, which proposed similar functionality through a let() function.

Motivation

A JMESPath expression is always evaluated in the context of a current element, which can be explicitly referred to via the @ token. The current element changes as expressions are evaluated. For example, suppose we had the expression foo.bar[0] that we want to evaluate against an input document of:

{"foo": {"bar": ["hello", "world"]}, "baz": "baz"}

The expression, and the associated current element are evaluated as follows:

# Start
expression = foo.bar[0]
@ = {"foo": {"bar": ["hello", "world"]}, "baz": "baz"}

# Step 1
expression = foo
@ = {"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
result = {"bar": ["hello", "world"]}

# Step 2
expression = bar
@ = {"bar": ["hello", "world"]}
result = ["hello", "world"]

# Step 3
expression = [0]
@ = ["hello", "world"]
result = "hello"

The end result of evaluating this expression is "hello". Note that each step changes the values that are accessible to the current expression being evaluated. In "Step 2", it is not possible for the expression to reference the value of "baz" in the current element of the previous step, "Step 1".
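The same trace can be reproduced with a short Python sketch. The name evaluate_chain is hypothetical, and only identifiers and index expressions are modeled. Each step replaces the current element with its own result, which is why values outside that result become unreachable.

```python
def evaluate_chain(steps, document):
    """Evaluate identifiers (str) and indexes (int) left to right.

    Returns the final result and a trace of (step, current element, result).
    """
    current = document
    trace = []
    for step in steps:
        if isinstance(step, int):
            in_range = isinstance(current, list) and -len(current) <= step < len(current)
            result = current[step] if in_range else None
        else:
            result = current.get(step) if isinstance(current, dict) else None
        trace.append((step, current, result))
        current = result  # the result becomes the next current element
    return current, trace


document = {"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
result, trace = evaluate_chain(["foo", "bar", 0], document)
step_two_current = trace[1][1]
print(result)                     # hello
print("baz" in step_two_current)  # False
```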

This inability to reference variables in a parent scope is a serious limitation of JMESPath, and anecdotally is one of the most commonly requested features of the language. Below are examples of input documents and the desired output documents that aren't possible to create with the current version of JMESPath:

Input:

[
  {"home_state": "WA",
   "states": [
     {"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
     {"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
     {"name": "NY", "cities": ["New York City", "Albany"]}
   ]
  },
  {"home_state": "NY",
   "states": [
     {"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
     {"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
     {"name": "NY", "cities": ["New York City", "Albany"]}
   ]
  }
]


(for each list in "states", select the list of cities associated
 with the state defined in the "home_state" key)

Output:

[
  ["Seattle", "Bellevue", "Olympia"],
  ["New York City", "Albany"]
]

Input:

{"imageDetails": [
  {
    "repositoryName": "org/first-repo",
    "imageTags": ["latest", "v1.0", "v1.2"],
    "imageDigest": "sha256:abcd"
  },
  {
    "repositoryName": "org/second-repo",
    "imageTags": ["v2.0", "v2.2"],
    "imageDigest": "sha256:efgh"
  }
]}


(create a list of pairs containing an image tag and its associated repo name)

Output:

[
  ["latest", "org/first-repo"],
  ["v1.0", "org/first-repo"],
  ["v1.2", "org/first-repo"],
  ["v2.0", "org/second-repo"],
  ["v2.2", "org/second-repo"]
]

In order to support these queries we need some way for an expression to reference values that exist outside of its implicit current element.

Specification

A new "let expression" is added to the language. The expression has the format: let <bindings> in <expr>. The updated grammar rules in ABNF are:

let-expression = "let" bindings "in" expression
bindings = variable-binding *( "," variable-binding )
variable-binding = variable-ref "=" expression
variable-ref = "$" unquoted-string

The let-expression and variable-ref rules are also added as new expression types:

expression =/ let-expression / variable-ref

Examples of this new syntax:

  • let $foo = bar in {a: myvar, b: $foo}
  • let $foo = baz[0] in bar[? baz == $foo ] | [0]
  • let $a = b, $c = d in bar[*].[$a, $c, foo, bar]

It's worth noting that this is the first JEP to introduce keywords into the language: the let and in keywords. These are not reserved keywords; they can continue to be used as identifiers in expressions. There are no backwards incompatible changes being proposed with this JEP. The grammar rules unambiguously describe whether let is meant to be interpreted as a keyword or as an identifier (such keywords are often referred to as contextual keywords).

New evaluation rules

Let expressions are evaluated as follows.

Given the rule "let" bindings "in" expression, the bindings rule is processed first. Each variable-binding within the bindings rule defines the name of a variable and an expression. Each expression is evaluated, and the result of this evaluation is then bound to the associated variable name.

Once all the variable-binding rules have been processed, the associated expression clause of the let expression is then evaluated. During the evaluation of the expression, any references, via the variable-ref rule, to a variable name will evaluate to the value bound to the name. Once the associated expression has been evaluated, the let expression itself evaluates to the result of this expression. After the let expression has been evaluated, the variable bindings associated with the let expression are no longer valid. This is also referred to as the visibility of a binding; the bindings of a let expression are only visible during the evaluation of the expression clause of the let expression.

When evaluating the bindings rule, a variable-binding for a variable name that is already visible in the current scope will replace the existing binding when evaluating the expression clause of the let expression. This means in the context of nested let expressions (and consequently nested scopes), a variable in an inner scope can shadow a variable defined in an outer scope.

If a variable-ref references a variable that has not been defined, the evaluation of that variable-ref will trigger an undefined-variable error. This error MUST occur when the expression is evaluated and not at compile time. This is to enable implementations to define an implementation specific mechanism for defining an initial or "global" scope. Implementations are free to offer a "strict" compilation mode that a user can opt into, but MUST support triggering an undefined-variable error only when the variable-ref is evaluated.
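The evaluation rules above can be sketched in Python with a stack of scopes. All names here (Scopes, UndefinedVariableError, evaluate_let) are hypothetical, and expressions are modeled as callables that take the current element and the scopes. The sketch assumes that all bound expressions of one let are evaluated before any of them becomes visible; whether a later binding may see an earlier one in the same let is not settled by the text above.

```python
class UndefinedVariableError(Exception):
    """Stand-in for the undefined-variable error named in this JEP."""


class Scopes:
    """A stack of binding dictionaries; the innermost scope is last."""

    def __init__(self):
        self._stack = []

    def lookup(self, name):
        for bindings in reversed(self._stack):  # inner scopes shadow outer ones
            if name in bindings:
                return bindings[name]
        # Raised when the reference is evaluated, not at compile time.
        raise UndefinedVariableError(name)

    def evaluate_let(self, bindings, body, current):
        # Evaluate each bound expression, then make the results visible.
        values = {name: expr(current, self) for name, expr in bindings.items()}
        self._stack.append(values)
        try:
            return body(current, self)
        finally:
            self._stack.pop()  # bindings are not visible after the let expression


def inner_let(current, scopes):
    # let $a = 'shadow' in $a
    return scopes.evaluate_let(
        {"a": lambda c, s: "shadow"}, lambda c, s: s.lookup("a"), current
    )


def body(current, scopes):
    # b[*].[a, $a, let $a = 'shadow' in $a]
    return [[item["a"], scopes.lookup("a"), inner_let(item, scopes)]
            for item in current["b"]]


scopes = Scopes()
data = {"a": "topval", "b": [{"a": "inner1"}, {"a": "inner2"}]}
result = scopes.evaluate_let({"a": lambda c, s: c["a"]}, body, data)
print(result)
# [['inner1', 'topval', 'shadow'], ['inner2', 'topval', 'shadow']]
```

Calling scopes.lookup("a") after evaluate_let returns raises UndefinedVariableError, because the binding is no longer visible.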

Note that when evaluating the bindings rule, the expression bound to a variable is completely evaluated before binding to the variable. Any references to the variable are replaced with the result of this evaluation, the expression is not re-evaluated. This is worth clarifying specifically for projections (wildcard expressions, the flatten operator, slices and filter expressions). If the expression being bound is a projection, the evaluation of this expression effectively stops the projection. This means subsequent references using the variable-ref MUST NOT continue projecting to child expressions. For example, this is the behavior for a projection:

search(
  foo[*][0],
  {"foo": [[0, 1], [2, 3], [4, 5]]}
) -> [0, 2, 4]

And this is the behavior when assigning a variable to a projection:

search(
  let $foo = foo[*]
  in
    $foo[0],
  {"foo": [[0, 1], [2, 3], [4, 5]]}
) -> [0, 1]

In the first example, the [0] expression is projected onto each element in the list, returning the first element of each sub list: [0, 2, 4]. In the second example, the foo[*] expression is evaluated to [[0, 1], [2, 3], [4, 5]] and assigned to the variable $foo. The projection expression evaluation is complete, and the projection is stopped. Evaluating the expression $foo[0] results in the variable $foo being replaced with its bound value of [[0, 1], [2, 3], [4, 5]], so the entire expression becomes [[0, 1], [2, 3], [4, 5]][0], which returns the first element in the list which is [0, 1].
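The difference between the two examples can be sketched in Python. The helpers project and index are hypothetical stand-ins for a list projection and an index expression, not part of any JMESPath library.

```python
def project(items, child):
    """Apply child to each element, dropping null results (a list projection)."""
    return [value for value in (child(item) for item in items) if value is not None]


def index(i):
    """Hypothetical helper: an index expression that yields None when out of range."""
    def lookup(node):
        if isinstance(node, list) and -len(node) <= i < len(node):
            return node[i]
        return None
    return lookup


data = {"foo": [[0, 1], [2, 3], [4, 5]]}

# foo[*][0]: the [0] is the right hand side of the projection.
projected = project(data["foo"], index(0))

# let $foo = foo[*] in $foo[0]: the projection is fully evaluated first,
# so [0] is a plain index applied to the bound list.
bound = project(data["foo"], lambda item: item)
after_binding = index(0)(bound)

print(projected)      # [0, 2, 4]
print(after_binding)  # [0, 1]
```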

Examples

Basic examples demonstrating core functionality.

search(let $foo = foo in $foo, {"foo": "bar"}) -> "bar"
search(let $foo = foo.bar in $foo, {"foo": {"bar": "baz"}}) -> "baz"
search(let $foo = foo in [$foo, $foo], {"foo": "bar"}) -> ["bar", "bar"]

Nested bindings.

search(
  let $a = a
  in
    b[*].[a, $a, let $a = 'shadow' in $a],
  {"a": "topval", "b": [{"a": "inner1"}, {"a": "inner2"}]}
) -> [["inner1", "topval", "shadow"], ["inner2", "topval", "shadow"]]

Error cases.

search($foo, {}) -> <error: undefined-variable>
search([let $foo = 'bar' in $foo, $foo], {}) -> <error: undefined-variable>

Rationale

Note: see previous discussion for more background.

Introducing keywords into the language

The let expression proposed in this JEP is based on similar constructs in existing programming languages.

It was important to borrow from existing syntax and semantics. Lexical scoping is a familiar concept to developers, so care was taken to be consistent with the mental model that developers already have.

Alternatives were considered that avoided introducing new keywords into the language (this proposal adds the first keyword to the language). These included some variation that approximated defining an anonymous function with arguments, e.g.:

|foo, bar| => {$foo: a, $bar: b}

The reason for not going with this approach is that adding the ability to define functions is a large feature that will take considerable effort to design. This may be something to consider in the future, but it's a larger scope than introducing lexical scoping and made the most sense to address separately. We'd also need to introduce not only defining anonymous functions with arguments, but also a mechanism to invoke such functions. You can then create lexical scope by defining a function and immediately invoking it. For example, in JavaScript it would look like this:

(({x, y}) => ([x, y]))(
    {x: "foo", y: "bar"}
);

This was considered too verbose for such a common use case of defining variables. It makes sense that a dedicated, more succinct syntax was preferred, as many languages have a dedicated let syntax for defining variables.

Backwards compatibility concern

Languages will often design keywords as reserved words that can't be used as variable names or other identifiers. This helps to provide clarity because the reader knows that the keyword can only have a single meaning. This is possible to do when you first design the language, or if you are willing to introduce breaking changes into the language. JMESPath instead takes an alternate approach of introducing keywords that can be inferred from the context in which they're used, which is known as contextual keywords. There are other languages that also take this approach, such as C# (https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/#contextual-keywords).

In order to do this, the updated grammar rules must be chosen to avoid any ambiguity when parsing expressions. This may limit the syntax and location where the new keywords could be used, so the tradeoffs of adding a new keyword must be considered carefully.

We should be wary of adding new keywords to JMESPath, and only do so when there is a strong rationale for doing so. let is one such case, as detailed in this section.
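
As a rough illustration of how a contextual keyword can be inferred, the Python sketch below labels each let token in an expression. It assumes that a let expression always starts with let followed immediately by a $name binding, as in the examples in this proposal. The tokenizer and the classify_let helper are hypothetical simplifications (string literals are not handled) and are not part of any JMESPath implementation.

```python
import re

# Hypothetical, simplified tokenizer: variable references, bare identifiers,
# or any other single non-space character. String literals are not handled.
TOKEN = re.compile(r"\$[A-Za-z_]\w*|[A-Za-z_]\w*|\S")


def classify_let(expression):
    """Label each `let` token as a keyword or a plain identifier."""
    tokens = TOKEN.findall(expression)
    labels = []
    for i, tok in enumerate(tokens):
        if tok != "let":
            continue
        # Assumption: `let` is a keyword only when a variable binding follows.
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        labels.append("keyword" if nxt.startswith("$") else "identifier")
    return labels


labels = classify_let("let $let = let in {let: let, in: $let}")
print(labels)
```

Only the first let is treated as a keyword here; the remaining occurrences are ordinary identifiers because no variable binding follows them.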

Adding a sigil for variable references

One of the changes from an earlier proposal of this feature (JEP-11) is that this proposal adds explicit syntax for variable references via the $foo syntax. The expressions foo and $foo perform fundamentally different types of lookup. One searches through values from the implicit current element and the other is a lookup in the lexical scope.

Not having a syntactic difference creates ambiguity regarding the intended type of lookup by the user. This also prevents defining where scoped lookups are allowed through the grammar. For example, in the expression foo.bar it is unclear whether bar refers to a lookup in the current element or the lexical scope. Having explicit syntax removes this ambiguity, allowing a user to explicitly state their intent. It also enables distinct error conditions.

A reference to a non-existent variable is an error, as the user provided explicit syntax stating that they expect the variable to exist. The variable not existing is the result of the user not binding the variable name at some point, which is an error. Conversely, an expression evaluated against the current element results in null if the key does not exist. This is because the query is being evaluated against an input JSON document, and we don't know what keys may or may not be present.
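
The two error conditions can be contrasted with a minimal Python sketch. The function names and the UndefinedVariableError class are hypothetical and only illustrate the distinction described above: a missing key in the current element yields null, while a missing variable binding is an error.

```python
class UndefinedVariableError(Exception):
    """Raised when a $name reference has no binding in scope."""


def lookup_field(current, key):
    # Lookup against the current element: a missing key yields null (None),
    # because we can't know which keys the input document contains.
    if isinstance(current, dict):
        return current.get(key)
    return None


def lookup_variable(scope, name):
    # Lookup in the lexical scope: a missing binding is an error, because
    # the user explicitly stated that the variable should exist.
    if name not in scope:
        raise UndefinedVariableError(name)
    return scope[name]


document = {"foo": {"bar": "baz"}}
scope = {"foo": "bound-value"}

print(lookup_field(document, "missing"))
print(lookup_variable(scope, "foo"))
```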

Multiple assignments with commas

Assigning multiple variables is done through comma separated variable-binding rules, e.g. $foo = foo, $bar = bar. An alternative considered was to use syntax similar to JavaScript's object destructuring:

let {$foo, $bar} = @ in ...

There are several reasons this alternative was not chosen:

  • This requires multiple assignments to come from an object type, which might require a user to unnecessarily create a multi-select-hash in order to assign multiple variables.
  • Destructuring binds to top level values in an object, and does not allow for a single binding to evaluate to an expression, without having to again preconstruct that value via a multi-select-hash.
  • Object destructuring is an additive change. Nothing in this JEP precludes this addition in the future, e.g.:
let {$foo, $bar} = @ in ...
let {$foo, $bar} = @, $baz = a.b.c in ...
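
To make the behavior of comma separated bindings concrete, here is a small Python sketch of one possible evaluation strategy. It assumes, consistent with the test case in this proposal where $b = $a resolves to the outer $a, that every binding in a let is evaluated against the enclosing scope before the new scope is created. The eval_let helper and the use of plain functions in place of parsed expressions are hypothetical.

```python
from collections import ChainMap


def eval_let(bindings, body, scope):
    """Evaluate bindings in the enclosing scope, then run body in a child scope.

    bindings: list of (name, fn(scope) -> value) pairs.
    body: fn(scope) -> value.
    Both are stand-ins for parsed JMESPath expressions.
    """
    new_values = {name: fn(scope) for name, fn in bindings}
    return body(scope.new_child(new_values))


# Models: let $a = 'top-a' in let $a = 'in-a', $b = $a in $b
result = eval_let(
    [("a", lambda s: "top-a")],
    lambda outer: eval_let(
        [("a", lambda s: "in-a"), ("b", lambda s: s["a"])],
        lambda inner: inner["b"],
        outer,
    ),
    ChainMap(),
)
print(result)
```

Each binding is an arbitrary expression, so no intermediate object has to be constructed just to assign several variables.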

Unbound values error at evaluation time not at compile time

The JEP also requires that unbound values error at evaluation time, not at compile time. This enables implementations to bind an initial (global) scope when a query is evaluated. This is something that other query languages provide, and is useful to define fixed queries that only vary by the value of specific variables. We'll look at a few examples.

First, consider a command line utility, let's call it jp, that accepts a path to a file containing a JMESPath query and reads an input JSON document through stdin. This command line utility could offer a --params option that allows a user to pass in an initial scope. For example:

myquery.jmespath

results[*].[name, uuid, $hostname]

A user could then use this CLI to retrieve JSON data and filter it:

$ curl https://myapi/info/$HOSTNAME | \
    jp --filename myquery.jmespath --params "{\"hostname\": \"$HOSTNAME\"}"

In this case the JMESPath expression does not need to change and can be shared with other people, and still include data that's specific to your machine.
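
A hypothetical jp utility could turn the --params value into an initial scope along these lines. This Python sketch only covers argument parsing. The option names mirror the example above, and the parse_args helper and the sample hostname value are assumptions for illustration.

```python
import argparse
import json


def parse_args(argv):
    # Hypothetical CLI definition mirroring the `jp` example above.
    parser = argparse.ArgumentParser(prog="jp")
    parser.add_argument("--filename", required=True)
    # The JSON object passed via --params becomes the initial scope.
    parser.add_argument("--params", type=json.loads, default={})
    return parser.parse_args(argv)


args = parse_args(
    ["--filename", "myquery.jmespath", "--params", '{"hostname": "host1"}']
)
initial_scope = args.params
print(initial_scope["hostname"])
```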

Another example would be where JMESPath is used in some shared definition file. Suppose we had a file that defines how to make an API request, and specifies a condition we'd like to meet based on the output response. We want to describe that the expected output depends on the input provided. This is how we can describe this:

{"GroupActive": {
   "operation": "DescribeGroups",
   "acceptors": {
     "argument": "Response[].[length(Instances[?State=='Active']) == length($params.GroupNames)]",
     "matcher": "path"
   }
}}

This is saying that we should invoke the DescribeGroups operation with a list of group names, and that we want to check that the response contains a list of Instances with State == 'Active' whose length matches the length of the params group names. You could now bind the user provided params as the initial scope of {"params": inputParams} and code generate something like this (using the Python JMESPath library in this example):

import jmespath

def wait(user_params):
  # `client` is assumed to be a generated API client defined elsewhere.
  response = client.DescribeGroups(user_params)
  expected = jmespath.compile(
    "Response[].[length(Instances[?State=='Active']) "
    "== length($params.GroupNames)]"
  )
  result = expected.search(
    response,
    # This is the new part, give queries access to the user params
    # via the $params variable.
    scope={'params': user_params},
  )
  if result:
    return "SomeSuccessResponse"
  return "SomeFailureResponse"

# User can invoke this via:
wait({"GroupNames": ["group1", "group2", "group3"]})

This JEP does not require that implementations provide this capability of passing in an initial scope, but by requiring that undefined variable references are runtime errors it enables implementations to provide this capability. Implementations are also free to provide an opt-in "strict" mode that can fail at compile time if a user knows they will not be providing an initial scope.
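
An opt-in strict mode could be approximated by collecting the variables that are referenced but never bound before evaluation starts. The Python sketch below works on a tiny, hypothetical tuple-based AST. The node shapes and the compile_strict helper are assumptions made for illustration and do not correspond to any existing library API.

```python
def free_variables(node, bound=frozenset()):
    """Return the names of variables referenced but not bound in a tiny AST.

    Hypothetical node shapes:
      ("var", name)                       a $name reference
      ("let", [(name, expr), ...], body)  a let expression
      ("list", [expr, ...])               a multi-select list
      anything else                       treated as having no variables
    """
    kind = node[0]
    if kind == "var":
        return set() if node[1] in bound else {node[1]}
    if kind == "let":
        free = set()
        # Binding expressions are checked against the enclosing scope.
        for _, expr in node[1]:
            free |= free_variables(expr, bound)
        inner = bound | {name for name, _ in node[1]}
        return free | free_variables(node[2], inner)
    if kind == "list":
        free = set()
        for child in node[1]:
            free |= free_variables(child, bound)
        return free
    return set()


def compile_strict(ast):
    # Strict mode: fail before evaluation if any variable is unbound.
    missing = free_variables(ast)
    if missing:
        raise ValueError("undefined-variable: " + ", ".join(sorted(missing)))
    return ast


# Models: [let $scope = 'foo' in [$scope], $scope]
ast = (
    "list",
    [
        ("let", [("scope", ("literal", "foo"))], ("list", [("var", "scope")])),
        ("var", "scope"),
    ],
)
print(free_variables(ast))
```

In the default (non-strict) mode an implementation would skip this check, so that a variable such as $hostname can still be supplied through an initial scope at evaluation time.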

Testcases

Basic expressions

# Basic expressions
- given:
    foo:
      bar: baz
  cases:
    - expression: "let $foo = foo in $foo"
      result:
        bar: baz
    - expression: "let $foo = foo.bar in $foo"
      result: "baz"
    - expression: "let $foo = foo.bar in [$foo, $foo]"
      result: ["baz", "baz"]
    - comment: "Multiple assignments"
      expression: "let $foo = 'foo', $bar = 'bar' in [$foo, $bar]"
      result: ["foo", "bar"]
# Nested expressions
- given:
    a: topval
    b:
      - a: inner1
      - a: inner2
  cases:
    - expression: "let $a = a in b[*].[a, $a, let $a = 'shadow' in $a]"
      result:
        - ["inner1", "topval", "shadow"]
        - ["inner2", "topval", "shadow"]
    - comment: Bindings only visible within expression clause
      expression: "let $a = 'top-a' in let $a = 'in-a', $b = $a in $b"
      result: "top-a"
# Let as valid identifiers
- given:
    let:
      let: let-val
      in: in-val
  cases:
    - expression: "let $let = let in {let: let, in: $let}"
      result:
        let:
          let: let-val
          in: in-val
        in:
          let: let-val
          in: in-val
    - expression: "let $let = 'let' in { let: let, in: $let }"
      result:
        let:
          let: let-val
          in: in-val
        in: "let"
    - expression: "let $let = 'let' in { let: 'let', in: $let }"
      result:
        let: "let"
        in: "let"
# Projections stop
- given:
    foo: [[0, 1], [2, 3], [4, 5]]
  cases:
    - comment: Projection is stopped when bound to variable
      expression: "let $foo = foo[*] in $foo[0]"
      result: [0, 1]
# Examples from Motivation section
- given:
    - home_state: WA
      states:
        - name: WA
          cities: ["Seattle", "Bellevue", "Olympia"]
        - name: CA
          cities: ["Los Angeles", "San Francisco"]
        - name: NY
          cities: ["New York City", "Albany"]
    - home_state: NY
      states:
        - name: WA
          cities: ["Seattle", "Bellevue", "Olympia"]
        - name: CA
          cities: ["Los Angeles", "San Francisco"]
        - name: NY
          cities: ["New York City", "Albany"]
  cases:
    - expression: "[*].[let $home_state = home_state in states[? name == $home_state].cities[]][]"
      result:
        - ["Seattle", "Bellevue", "Olympia"]
        - ["New York City", "Albany"]
- given:
    imageDetails:
      - repositoryName: "org/first-repo"
        imageTags:
          - latest
          - v1.0
          - v1.2
        imageDigest: "sha256:abcd"
      - repositoryName: "org/second-repo"
        imageTags:
          - v2.0
          - v2.2
        imageDigest: "sha256:efgh"
  cases:
    - expression: >
        imageDetails[].[
          let $repo = repositoryName,
              $digest = imageDigest
          in
            imageTags[].[@, $digest, $repo]
        ][][]
      result:
        - ["latest", "sha256:abcd", "org/first-repo"]
        - ["v1.0", "sha256:abcd", "org/first-repo"]
        - ["v1.2", "sha256:abcd", "org/first-repo"]
        - ["v2.0", "sha256:efgh", "org/second-repo"]
        - ["v2.2", "sha256:efgh", "org/second-repo"]
# Errors
- given: {}
  cases:
    - expression: "$noexist"
      error: "undefined-variable"
    - comment: Reference out of scope variable
      expression: "[let $scope = 'foo' in [$scope], $scope]"
      error: "undefined-variable"
    - comment: Can't use var ref in RHS of subexpression
      expression: "foo.$bar"
      error: "syntax"