JMESPath Enhancement Proposals
The JMESPath Enhancement Proposals (JEP) process is used to modify the JMESPath language and specification. There are implementations of JMESPath in over 10 languages, and this process ensures that stakeholders and community members have the opportunity to review a change and provide feedback before it officially becomes part of the specification.
You can see the list of accepted JEPs at:
https://jmespath.github.io/jmespath.jep/
Things that need a JEP
Any functional change that would require an update to the specification requires a JEP.
This includes, but is not limited to:
- New syntax
- New functions
- New semantics
You can review the existing JEPs in this repo to get a sense of the type of changes that require a JEP.
Things that do not need a JEP
Anything that is specific to a JMESPath library does not need a JEP. You should defer to the specific library's contributing guide. This can include additional language specific APIs, extension points (e.g. adding custom functions), configuration options, etc.
Guidelines for proposing new features
First, make sure that the feature has not been previously proposed. If it has, make sure to reference prior proposals and explain why this new proposal should be considered despite similar proposals not being accepted.
Writing a JEP can be a lot of work, so it can help to get initial guidance before going too far. A well thought out, high quality JEP helps its chance of acceptance and helps ensure a productive review process.
Before writing a JEP, you can create an issue for initial high level feedback in order to get a sense of the likelihood of a JEP being accepted. You can also use that issue to gauge interest in the feature.
The JEP Process
- Fork this repository.
- Copy 0000-jep-template.md to proposals/0000-feature-name.md, where feature-name is a high level descriptive name of the proposal. You don't need to add a JEP number; one will be assigned during the review process.
- Fill in all sections of the JEP template. Be mindful of the "Motivation" and "Rationale" sections. These are an important part of driving consensus for a JEP.
- Submit a pull request to this repo.
- The JEP will be reviewed and feedback will be provided. Proposals often go through several rounds of feedback; this is a normal and expected part of the process.
- As you incorporate feedback, do not rebase your commits. This ensures the history and evolution of the proposal remains visible.
- The discussions will eventually stabilize to one of several states:
  - The JEP has consensus for both the functionality and the proposed specification and is ready to be accepted.
  - The JEP has consensus for the feature but there is no consensus on the proposed specification.
  - The JEP does not have consensus for the feature.
  - The JEP loses steam and the discussions go stale. This will result in the PR being closed, but it can be reopened by anyone who wants to continue working on the JEP.
- Once the JEP is approved by the JMESPath core team, the pull request will be merged and the JEP will be assigned a number.
- The relevant parts of the "Specification" section will be added to the JMESPath specification, and the test cases from the "Test Cases" section of the JEP will be added to the jmespath.test repo.
- JMESPath libraries can now implement the accepted JEP.
Tenets of JMESPath
When proposing new features, keep these tenets in mind. Adhering to these tenets gives your proposal a higher likelihood of being accepted:
- JMESPath is not specific to a particular programming language. Avoid constructs that are difficult to implement in another language.
- JMESPath strives to have one way to do something.
- Features are driven from real world use cases.
Nested Expressions
- JEP: 1
- Author: Michael Dowling
- Created: 2013-11-27
Abstract
This document proposes modifying the JMESPath grammar to support arbitrarily nested expressions within multi-select-list and multi-select-hash expressions.
Motivation
The JMESPath grammar currently does not allow arbitrarily nested expressions within multi-select-list and multi-select-hash expressions. This prevents nested branching expressions, nested multi-select-list expressions within other multi expressions, and nested or-expressions within any multi-expression.
By allowing any expression to be nested within a multi-select-list and multi-select-hash expression, we can trim down several grammar rules and provide customers with a much more flexible expression DSL.
Supporting arbitrarily nested expressions within other expressions requires:
- Updating the grammar to remove non-branched-expr.
- Updating compliance tests to add various permutations of the grammar to ensure implementations are compliant.
- Updating the JMESPath documentation to reflect the ability to arbitrarily nest expressions.
Nested Expression Examples
Nested branch expressions
Given:
{
"foo": {
"baz": [
{
"bar": "abc"
}, {
"bar": "def"
}
],
"qux": ["zero"]
}
}
With: foo.[baz[*].bar, qux[0]]
Result:
[
[
"abc",
"def"
],
"zero"
]
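As a rough illustration of what the expression above computes, the following Python sketch hand-evaluates foo.[baz[*].bar, qux[0]] with plain list and dict operations. It is not a JMESPath implementation, and the variable names are chosen only for this example.

```python
# Hand-evaluated equivalent of: foo.[baz[*].bar, qux[0]]
data = {
    "foo": {
        "baz": [{"bar": "abc"}, {"bar": "def"}],
        "qux": ["zero"],
    }
}

foo = data["foo"]
result = [
    [element["bar"] for element in foo["baz"]],  # baz[*].bar (nested projection)
    foo["qux"][0],                               # qux[0]
]
print(result)  # [['abc', 'def'], 'zero']
```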
Nested branch expressions with nested multi-select
Given:
{
"foo": {
"baz": [
{
"bar": "a",
"bam": "b",
"boo": "c"
}, {
"bar": "d",
"bam": "e",
"boo": "f"
}
],
"qux": ["zero"]
}
}
With: foo.[baz[*].[bar, boo], qux[0]]
Result:
[
[
[
"a",
"c"
],
[
"d",
"f"
]
],
"zero"
]
Nested or expressions
Given:
{
"foo": {
"baz": [
{
"bar": "a",
"bam": "b",
"boo": "c"
}, {
"bar": "d",
"bam": "e",
"boo": "f"
}
],
"qux": ["zero"]
}
}
With: foo.[baz[*].not_there || baz[*].bar, qux[0]]
Result:
[
[
"a",
"d"
],
"zero"
]
No breaking changes
Because there are no breaking changes from this modification, existing multi-select expressions will still work unchanged:
Given:
{
"foo": {
"baz": {
"abc": 123,
"bar": 456
}
}
}
With: foo.[baz, baz.bar]
Result:
[
{
"abc": 123,
"bar": 456
},
456
]
Modified Grammar
The following modified JMESPath grammar supports arbitrarily nested expressions and is specified using ABNF, as described in RFC4234:
expression = sub-expression / index-expression / or-expression / identifier / "*"
expression =/ multi-select-list / multi-select-hash
sub-expression = expression "." expression
or-expression = expression "||" expression
index-expression = expression bracket-specifier / bracket-specifier
multi-select-list = "[" ( expression *( "," expression ) ) "]"
multi-select-hash = "{" ( keyval-expr *( "," keyval-expr ) ) "}"
keyval-expr = identifier ":" expression
bracket-specifier = "[" (number / "*") "]"
number = [-]1*digit
digit = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"
identifier = 1*char
identifier =/ quote 1*(unescaped-char / escaped-quote) quote
escaped-quote = escape quote
unescaped-char = %x30-10FFFF
escape = %x5C ; Back slash: \
quote = %x22 ; Double quote: '"'
char = %x30-39 / ; 0-9
%x41-5A / ; A-Z
%x5F / ; _
%x61-7A / ; a-z
%x7F-10FFFF
Functions
- JEP: 3
- Author: Michael Dowling, James Saryerwinnie
- Created: 2013-11-27
Abstract
This document proposes modifying the JMESPath grammar to support function expressions.
Motivation
Functions allow users to easily transform and filter data in JMESPath expressions. As JMESPath is currently implemented, functions would be very useful in multi-select-list and multi-select-hash expressions to format the output of an expression to contain data that might not have been in the original JSON input. Combined with filtered expressions, functions would be a powerful mechanism to perform any kind of special comparisons for things like length(), contains(), etc.
Data Types
In order to support functions, a type system is needed. The JSON types are used:
- number (integers and double-precision floating-point format in JSON)
- string
- boolean (true or false)
- array (an ordered sequence of values)
- object (an unordered collection of key value pairs)
- null
Syntax Changes
Functions are defined in the function-expression rule below. A function expression is an expression itself, and is valid any place an expression is allowed.
The grammar will require the following grammar additions:
function-expression = unquoted-string (
no-args /
one-or-more-args )
no-args = "(" ")"
one-or-more-args = "(" ( function-arg *( "," function-arg ) ) ")"
function-arg = expression / number / current-node
current-node = "@"
The expression rule will need to be updated to add the function-expression production:
expression = sub-expression / index-expression / or-expression / identifier / "*"
expression =/ multi-select-list / multi-select-hash
expression =/ literal / function-expression
A function can accept any number of arguments, and each argument can be an expression. Each function must define a signature that specifies the number and allowed types of its expected arguments. Functions can be variadic.
current-node
The current-node token can be used to represent the current node being evaluated. The current-node token is useful for functions that require the current node being evaluated as an argument. For example, the following expression creates an array containing the total number of elements in the foo object followed by the value of foo["bar"].
foo[].[count(@), bar]
JMESPath assumes that all function arguments operate on the current node unless the argument is a literal or number token. Because of this, an expression such as @.bar would be equivalent to just bar, so the current node is only allowed as a bare expression.
current-node state
At the start of an expression, the value of the current node is the data being evaluated by the JMESPath expression. As an expression is evaluated, the value that the current node represents MUST change to reflect the node currently being evaluated. When in a projection, the current node value MUST be changed to the node currently being evaluated by the projection.
Function Evaluation
Functions are evaluated in applicative order. Each argument must be an expression, and each argument expression must be evaluated before evaluating the function. The function is then called with the evaluated function arguments. The result of the function-expression is the result returned by the function call. If a function-expression is evaluated for a function that does not exist, the JMESPath implementation must indicate to the caller that an unknown-function error occurred. How and when this error is raised is implementation specific, but implementations should indicate to the caller that this specific error occurred.
Functions can either have a specific arity or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific.
Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an invalid-type error occurred.
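The evaluation rules above can be sketched in Python as follows. This is a minimal illustration, not the behavior of any particular JMESPath library: the exception classes, the FUNCTIONS registry, and call_function are hypothetical names, and the arguments are assumed to have been evaluated already (applicative order).

```python
class UnknownFunction(Exception):
    """Hypothetical stand-in for the unknown-function error."""

class InvalidArity(Exception):
    """Hypothetical stand-in for the invalid-arity error."""

class InvalidType(Exception):
    """Hypothetical stand-in for the invalid-type error."""

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

# Hypothetical registry: name -> (implementation, one type check per argument).
# Only fixed-arity functions are modelled here; variadic ones are left out.
FUNCTIONS = {"abs": (abs, [is_number])}

def call_function(name, evaluated_args):
    if name not in FUNCTIONS:
        raise UnknownFunction(name)
    implementation, checks = FUNCTIONS[name]
    if len(evaluated_args) != len(checks):
        raise InvalidArity(f"{name} expects {len(checks)} argument(s)")
    for position, (arg, check) in enumerate(zip(evaluated_args, checks)):
        if not check(arg):
            raise InvalidType(f"{name}: argument {position} has the wrong type")
    return implementation(*evaluated_args)

print(call_function("abs", [-1]))  # 1
```

Keeping the three failure modes as separate exception types lets a caller distinguish them, which is what the text asks of implementations.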
In order to accommodate type constraints, functions are provided to convert types to other types (to_string, to_number), which are defined below. No explicit type conversion happens unless a user specifically uses one of these type conversion functions.
Function expressions are also allowed as the child element of a sub expression. This allows functions to be used with projections, which can enable functions to be applied to every element in a projection. For example, given the input data of ["1", "2", "3", "notanumber", true], the following expression can be used to convert (and filter) all elements to numbers:
search([].to_number(@), ``["1", "2", "3", "notanumber", true]``) -> [1, 2, 3]
This provides a simple mechanism to explicitly convert types when needed.
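The conversion-and-filter behavior of the example above can be approximated in Python. The sketch below makes two assumptions that go beyond the text: a string that is not a valid JSON number converts to null (None), and a projection drops null results. The names to_number and project are illustrative helpers, not a library API.

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction and exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

def to_number(value):
    if isinstance(value, bool):
        return None  # booleans convert to null (bool is an int subclass in Python)
    if isinstance(value, (int, float)):
        return value  # numbers are returned unchanged
    if isinstance(value, str) and JSON_NUMBER.fullmatch(value):
        return float(value) if any(c in value for c in ".eE") else int(value)
    return None  # arrays, objects, and non-numeric strings (assumed) give null

def project(elements, func):
    """Apply func to each element, dropping null results (assumed projection rule)."""
    return [r for r in (func(e) for e in elements) if r is not None]

print(project(["1", "2", "3", "notanumber", True], to_number))  # [1, 2, 3]
```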
Built-in Functions
JMESPath has various built-in functions that operate on different data types, documented below. Each function below has a signature that defines the expected types of the input and the type of the returned output:
return_type function_name(type $argname)
return_type function_name2(type1|type2 $argname)
If a function can accept multiple types for an input value, then the multiple types are separated with |. If the resolved arguments do not match the types specified in the signature, an invalid-type error occurs.
The array type can further specify requirements on the type of the elements if a function wants to enforce homogeneous types. The subtype is surrounded by [type]. For example, the function signature below requires that its input argument resolves to an array of numbers:
return_type foo(array[number] $argname)
As a shorthand, the type any is used to indicate that the argument can be of any type (array|object|number|string|boolean|null).
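To make the signature notation concrete, here is a small Python sketch that checks a value against a type specification such as string|number, array[number], or any. The helpers type_name and matches are hypothetical, and the sketch deliberately does not handle a union nested inside an array subtype.

```python
def type_name(value):
    """Map a decoded JSON value to one of the six JSON type names."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before the number check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def matches(spec, value):
    """Return True if value satisfies a signature type such as 'array[number]'."""
    for alternative in spec.split("|"):
        if alternative == "any":
            return True
        if alternative.startswith("array[") and alternative.endswith("]"):
            subtype = alternative[len("array["):-1]
            if isinstance(value, list) and all(matches(subtype, v) for v in value):
                return True
        elif type_name(value) == alternative:
            return True
    return False

print(matches("array[number]", [10, 15, 20]))     # True
print(matches("array[number]", [10, False, 20]))  # False
```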
The first function below, abs, is discussed in detail to demonstrate the above points. Subsequent function definitions will not include these details for brevity, but the same rules apply.
NOTE: All string related functions are defined on the basis of Unicode code points; they do not take normalization into account.
abs
number abs(number $value)
Returns the absolute value of the provided argument. The signature indicates that a number is returned, and that the input argument $value must resolve to a number, otherwise an invalid-type error is triggered.
Below is a worked example. Given:
{"foo": -1, "bar": "2"}
Evaluating abs(foo) works as follows:
- Evaluate the input argument against the current data:
search(foo, {"foo": -1, "bar": "2"}) -> -1
- Validate the type of the resolved argument. In this case -1 is of type number so it passes the type check.
- Call the function with the resolved argument:
abs(-1) -> 1
- The value of 1 is the resolved value of the function expression abs(foo).
Below are the same steps for evaluating abs(bar):
- Evaluate the input argument against the current data:
search(bar, {"foo": -1, "bar": "2"}) -> "2"
- Validate the type of the resolved argument. In this case "2" is of type string, so we immediately indicate that an invalid-type error occurred.
As a final example, here are the steps for evaluating abs(to_number(bar)):
- Evaluate the input argument against the current data:
search(to_number(bar), {"foo": -1, "bar": "2"})
- In order to evaluate the above expression, we need to evaluate to_number(bar):
search(bar, {"foo": -1, "bar": "2"}) -> "2"
# Validate "2" passes the type check for to_number, which it does.
to_number("2") -> 2
- Now we can evaluate the original expression:
search(to_number(bar), {"foo": -1, "bar": "2"}) -> 2
- Call the function with the final resolved value:
abs(2) -> 2
- The value of 2 is the resolved value of the function expression abs(to_number(bar)).
Examples
Expression | Result |
---|---|
abs(1) | 1 |
abs(-1) | 1 |
abs(`abc`) | <error: invalid-type> |
avg
number avg(array[number] $elements)
Returns the average of the elements in the provided array.
An empty array will produce a return value of null.
Examples
Given | Expression | Result |
---|---|---|
[10, 15, 20] | avg(@) | 15 |
[10, false, 20] | avg(@) | <error: invalid-type> |
[false] | avg(@) | <error: invalid-type> |
false | avg(@) | <error: invalid-type> |
ceil
number ceil(number $value)
Returns the next highest integer value by rounding up if necessary.
Examples
Expression | Result |
---|---|
ceil(`1.001`) | 2 |
ceil(`1.9`) | 2 |
ceil(`1`) | 1 |
ceil(`abc`) | null |
contains
boolean contains(array|string $subject, array|object|string|number|boolean $search)
Returns true if the given $subject contains the provided $search string.
If $subject is an array, this function returns true if one of the elements in the array is equal to the provided $search value.
If the provided $subject is a string, this function returns true if the string contains the provided $search argument.
Examples
Given | Expression | Result |
---|---|---|
n/a | contains(`foobar`, `foo`) | true |
n/a | contains(`foobar`, `not`) | false |
n/a | contains(`foobar`, `bar`) | true |
n/a | contains(`false`, `bar`) | <error: invalid-type> |
n/a | contains(`foobar`, 123) | false |
["a", "b"] | contains(@, `a`) | true |
["a"] | contains(@, `a`) | true |
["a"] | contains(@, `b`) | false |
floor
number floor(number $value)
Returns the next lowest integer value by rounding down if necessary.
Examples
Expression | Result |
---|---|
floor(`1.001`) | 1 |
floor(`1.9`) | 1 |
floor(`1`) | 1 |
join
string join(string $glue, array[string] $stringsarray)
Returns all of the elements from the provided $stringsarray array joined together using the $glue argument as a separator between each.
Examples
Given | Expression | Result |
---|---|---|
["a", "b"] | join(`, `, @) | "a, b" |
["a", "b"] | join(``, @) | "ab" |
["a", false, "b"] | join(`, `, @) | <error: invalid-type> |
[false] | join(`, `, @) | <error: invalid-type> |
keys
array keys(object $obj)
Returns an array containing the keys of the provided object.
Examples
Given | Expression | Result |
---|---|---|
{"foo": "baz", "bar": "bam"} | keys(@) | ["foo", "bar"] |
{} | keys(@) | [] |
false | keys(@) | <error: invalid-type> |
[b, a, c] | keys(@) | <error: invalid-type> |
length
number length(string|array|object $subject)
Returns the length of the given argument using the following type rules:
- string: returns the number of code points in the string
- array: returns the number of elements in the array
- object: returns the number of key-value pairs in the object
Examples
Given | Expression | Result |
---|---|---|
n/a | length(`abc`) | 3 |
"current" | length(@) | 7 |
"current" | length(not_there) | <error: invalid-type> |
["a", "b", "c"] | length(@) | 3 |
[] | length(@) | 0 |
{} | length(@) | 0 |
{"foo": "bar", "baz": "bam"} | length(@) | 2 |
max
number max(array[number] $collection)
Returns the highest found number in the provided array argument.
An empty array will produce a return value of null.
Examples
Given | Expression | Result |
---|---|---|
[10, 15] | max(@) | 15 |
[10, false, 20] | max(@) | <error: invalid-type> |
min
number min(array[number] $collection)
Returns the lowest found number in the provided $collection argument.
Examples
Given | Expression | Result |
---|---|---|
[10, 15] | min(@) | 10 |
[10, false, 20] | min(@) | <error: invalid-type> |
sort
array sort(array $list)
This function accepts an array $list argument and returns the sorted elements of the $list as an array.
The array must be a list of strings or numbers. Sorting strings is based on code points. Locale is not taken into account.
Examples
Given | Expression | Result |
---|---|---|
[b, a, c] | sort(@) | [a, b, c] |
[1, a, c] | sort(@) | [1, a, c] |
[false, [], null] | sort(@) | [[], null, false] |
[[], {}, false] | sort(@) | [{}, [], false] |
{"a": 1, "b": 2} | sort(@) | null |
false | sort(@) | null |
to_string
string to_string(string|number|array|object|boolean $arg)
- string - Returns the passed in value.
- number/array/object/boolean - The JSON encoded value of the object. The JSON encoder should emit the encoded JSON value without adding any additional new lines.
Examples
Given | Expression | Result |
---|---|---|
null | to_string(`2`) | "2" |
to_number
number to_number(string|number $arg)
- string - Returns the parsed number. Any string that conforms to the json-number production is supported.
- number - Returns the passed in value.
- array - null
- object - null
- boolean - null
type
string type(array|object|string|number|boolean|null $subject)
Returns the JavaScript type of the given $subject argument as a string value.
The return value MUST be one of the following:
- number
- string
- boolean
- array
- object
- null
Examples
Given | Expression | Result |
---|---|---|
"foo" | type(@) | "string" |
true | type(@) | "boolean" |
false | type(@) | "boolean" |
null | type(@) | "null" |
123 | type(@) | "number" |
123.05 | type(@) | "number" |
["abc"] | type(@) | "array" |
{"abc": "123"} | type(@) | "object" |
values
array values(object $obj)
Returns the values of the provided object.
Examples
Given | Expression | Result |
---|---|---|
{"foo": "baz", "bar": "bam"} | values(@) | ["baz", "bam"] |
["a", "b"] | values(@) | <error: invalid-type> |
false | values(@) | <error: invalid-type> |
Compliance Tests
A functions.json file will be added to the compliance test suite.
The test suite will add the following new error types:
- unknown-function
- invalid-arity
- invalid-type
The compliance test suite does not specify when the errors are raised, as this will depend on implementation details. For an implementation to be compliant, it needs to indicate that an error occurred while attempting to evaluate the JMESPath expression.
History
- This JEP originally proposed the literal syntax. The literal portion of this JEP was removed and added instead to JEP 7.
- This JEP originally specified that type mismatches should return null. This has been updated to specify that an invalid-type error should occur instead.
Pipe Expressions
- JEP: 4
- Author: Michael Dowling
- Created: 2013-12-07
Abstract
This document proposes adding support for piping expressions into subsequent expressions.
Motivation
The current JMESPath grammar allows for projections at various points in an expression. However, it is not currently possible to operate on the result of a projection as a list.
The following example illustrates that it is not possible to operate on the result of a projection (e.g., take the first match of a projection).
Given:
{
"foo": {
"a": {
"bar": [1, 2, 3]
},
"b": {
"bar": [4, 5, 6]
}
}
}
Expression:
foo.*.bar[0]
The result would be element 0 of each bar:
[1, 4]
With the addition of pipe expressions, we could pass the result of one expression to another, operating on the result of a projection (or any expression).
Expression:
foo.*.bar | [0]
Result:
[1, 2, 3]
Not only does this give us the ability to operate on the result of a projection, but pipe expressions can also be useful for breaking down a complex expression into smaller, easier to comprehend, parts.
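The difference between the two expressions can be mimicked with ordinary Python lists. This is only a hand-written analogy for the example data above, not how an implementation evaluates pipes; it also relies on Python dicts preserving insertion order, whereas JSON objects are unordered.

```python
data = {"foo": {"a": {"bar": [1, 2, 3]}, "b": {"bar": [4, 5, 6]}}}

# foo.*.bar is a projection over the values of foo.
projection = [value["bar"] for value in data["foo"].values()]

# foo.*.bar[0]: the index is applied to each projected element.
per_element = [bar[0] for bar in projection]

# foo.*.bar | [0]: the pipe ends the projection, so the index applies to the list itself.
piped = projection[0]

print(per_element)  # [1, 4]
print(piped)        # [1, 2, 3]
```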
Modified Grammar
The following modified JMESPath grammar supports piped expressions.
expression = sub-expression / index-expression / or-expression / identifier / "*"
expression =/ multi-select-list / multi-select-hash / pipe-expression
sub-expression = expression "." expression
pipe-expression = expression "|" expression
or-expression = expression "||" expression
index-expression = expression bracket-specifier / bracket-specifier
multi-select-list = "[" ( expression *( "," expression ) ) "]"
multi-select-hash = "{" ( keyval-expr *( "," keyval-expr ) ) "}"
keyval-expr = identifier ":" expression
bracket-specifier = "[" (number / "*") "]" / "[]"
number = [-]1*digit
digit = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"
identifier = 1*char
identifier =/ quote 1*(unescaped-char / escaped-quote) quote
escaped-quote = escape quote
unescaped-char = %x30-10FFFF
escape = %x5C ; Back slash: \
quote = %x22 ; Double quote: '"'
char = %x30-39 / ; 0-9
%x41-5A / ; A-Z
%x5F / ; _
%x61-7A / ; a-z
%x7F-10FFFF
NOTE: pipe-expression has a higher precedence than the or-operator.
Compliance Tests
[{
"given": {
"foo": {
"bar": {
"baz": "one"
},
"other": {
"baz": "two"
},
"other2": {
"baz": "three"
},
"other3": {
"notbaz": ["a", "b", "c"]
},
"other4": {
"notbaz": ["d", "e", "f"]
}
}
},
"cases": [
{
"expression": "foo.*.baz | [0]",
"result": "one"
},
{
"expression": "foo.*.baz | [1]",
"result": "two"
},
{
"expression": "foo.*.baz | [2]",
"result": "three"
},
{
"expression": "foo.bar.* | [0]",
"result": "one"
},
{
"expression": "foo.*.notbaz | [*]",
"result": [["a", "b", "c"], ["d", "e", "f"]]
},
{
"expression": "foo | bar",
"result": {"baz": "one"}
},
{
"expression": "foo | bar | baz",
"result": "one"
},
{
"expression": "foo|bar| baz",
"result": "one"
},
{
"expression": "not_there | [0]",
"result": null
},
{
"expression": "not_there | [0]",
"result": null
},
{
"expression": "[foo.bar, foo.other] | [0]",
"result": {"baz": "one"}
},
{
"expression": "{\"a\": foo.bar, \"b\": foo.other} | a",
"result": {"baz": "one"}
},
{
"expression": "{\"a\": foo.bar, \"b\": foo.other} | b",
"result": {"baz": "two"}
},
{
"expression": "{\"a\": foo.bar, \"b\": foo.other} | *.baz",
"result": ["one", "two"]
},
{
"expression": "foo.bam || foo.bar | baz",
"result": "one"
},
{
"expression": "foo | not_there || bar",
"result": {"baz": "one"}
}
]
}]
Array Slice Expressions
- JEP: 5
- Author: Michael Dowling
- Created: 2013-12-08
Abstract
This document proposes modifying the JMESPath grammar to support array slicing for accessing specific portions of an array.
Motivation
The current JMESPath grammar does not allow plucking out specific portions of an array.
The following examples are possible with array slicing notation utilizing an optional start position, optional stop position, and optional step that can be less than or greater than 0:
- Extracting every N indices (e.g., only even [::2], only odd [1::2], etc.)
- Extracting only elements after a given start position: [2:]
- Extracting only elements before a given stop position: [:5]
- Extracting elements between a given start and end position: [2:5]
- Only the last 5 elements: [-5:]
- The last five elements in reverse order: [:-6:-1]
- Reversing the order of an array: [::-1]
Syntax
This syntax introduces Python style array slicing that allows a start position, stop position, and step. This syntax also proposes following the same semantics as Python slices.
[start:stop:step]
Each part of the expression is optional. You can omit the start position, stop position, or step. No more than three values can be provided in a slice expression.
The step value determines how many indices to skip after each element is plucked from the array. A step of 1 (the default step) will not skip any indices. A step value of 2 will skip every other index while plucking values from an array. A step value of -1 will extract values in reverse order from the array. A step value of -2 will extract values in reverse order from the array while skipping every other index.
Slice expressions adhere to the following rules:
- If a negative start position is given, it is calculated as the total length of the array plus the given start position.
- If no start position is given, it is assumed to be 0 if the given step is greater than 0 or the end of the array if the given step is less than 0.
- If a negative stop position is given, it is calculated as the total length of the array plus the given stop position.
- If no stop position is given, it is assumed to be the length of the array if the given step is greater than 0 or 0 if the given step is less than 0.
- If the given step is omitted, it is assumed to be 1.
- If the given step is 0, an error must be raised.
- If the element being sliced is not an array, the result must be null.
- If the element being sliced is an array and yields no results, the result must be an empty array.
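Because the proposal follows Python slice semantics, the rules above can be sketched by delegating to Python's built-in slice object. The function name slice_array is hypothetical, and checking for a zero step before checking the operand type is an assumption, since the rules do not state an order.

```python
def slice_array(value, start=None, stop=None, step=None):
    """Sketch of [start:stop:step]; None means the part was omitted."""
    if step == 0:
        raise ValueError("slice step cannot be 0")
    if not isinstance(value, list):
        return None  # slicing a non-array yields null
    # Python handles negative positions, defaults, and out-of-range bounds.
    return value[slice(start, stop, step)]

numbers = list(range(10))
print(slice_array(numbers, step=2))        # [0, 2, 4, 6, 8]
print(slice_array(numbers, -5))            # [5, 6, 7, 8, 9]
print(slice_array(numbers, step=-1))       # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(slice_array(numbers, 20))            # []
print(slice_array({"a": 1}, 0, 1))         # None
```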
Modified Grammar
The following modified JMESPath grammar supports array slicing.
expression = sub-expression / index-expression / or-expression / identifier / "*"
expression =/ multi-select-list / multi-select-hash
sub-expression = expression "." expression
or-expression = expression "||" expression
index-expression = expression bracket-specifier / bracket-specifier
multi-select-list = "[" ( expression *( "," expression ) ) "]"
multi-select-hash = "{" ( keyval-expr *( "," keyval-expr ) ) "}"
keyval-expr = identifier ":" expression
bracket-specifier = "[" (number / "*" / slice-expression) "]" / "[]"
slice-expression = ":"
slice-expression =/ number ":" number ":" number
slice-expression =/ number ":"
slice-expression =/ number ":" ":" number
slice-expression =/ ":" number
slice-expression =/ ":" number ":" number
slice-expression =/ ":" ":" number
number = [-]1*digit
digit = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"
identifier = 1*char
identifier =/ quote 1*(unescaped-char / escaped-quote) quote
escaped-quote = escape quote
unescaped-char = %x30-10FFFF
escape = %x5C ; Back slash: \
quote = %x22 ; Double quote: '"'
char = %x30-39 / ; 0-9
%x41-5A / ; A-Z
%x5F / ; _
%x61-7A / ; a-z
%x7F-10FFFF
Improved Identifiers
- JEP: 6
- Author: James Saryerwinnie
- Created: 2013-12-14
Abstract
This JEP proposes grammar modifications to JMESPath in order to improve identifiers used in JMESPath. In doing so, several inconsistencies in the identifier grammar rules will be fixed, along with an improved grammar for specifying unicode identifiers in a way that is consistent with JSON strings.
Motivation
There are two ways to currently specify an identifier, the unquoted rule:
identifier = 1*char
and the quoted rule:
identifier =/ quote 1*(unescaped-char / escaped-quote) quote
The char rule contains a set of characters that do not have to be quoted:
char = %x30-39 / ; 0-9
%x41-5A / ; A-Z
%x5F / ; _
%x61-7A / ; a-z
%x7F-10FFFF
There is an ambiguity between the %x30-39 rule and the number rule:
number = ["-"]1*digit
digit = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "0"
It’s ambiguous which rule to use. Given a string “123”, it’s not clear whether this should be parsed as an identifier or a number. Existing implementations aren’t following this rule (because it’s ambiguous), so the grammar should be updated to remove the ambiguity. Specifically, an unquoted identifier can only start with the characters [a-zA-Z_].
Unicode
JMESPath supports unicode through the char and unescaped-char rules:
unescaped-char = %x30-10FFFF
char = %x30-39 / ; 0-9
%x41-5A / ; A-Z
%x5F / ; _
%x61-7A / ; a-z
%x7F-10FFFF
However, JSON supports a syntax for escaping unicode characters. Any character in the Basic Multilingual Plane (BMP) can be escaped with:
char = escape (%x75 4HEXDIG ) ; \uXXXX
Similar to the way that XPath supports numeric character references used in XML (&#nnnn), JMESPath should support the same escape sequences used in JSON. JSON also supports a 12 character escape sequence for characters outside of the BMP, by encoding the UTF-16 surrogate pair. For example, the code point U+1D11E can be represented as "\uD834\uDD1E".
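The surrogate pair example can be checked with a JSON decoder. The short sketch below uses Python's standard json module, which combines the two \uXXXX escapes into a single code point; it only illustrates JSON string decoding and says nothing about any JMESPath library.

```python
import json

# The JSON text "\uD834\uDD1E" encodes U+1D11E as a UTF-16 surrogate pair.
decoded = json.loads('"\\uD834\\uDD1E"')

print(len(decoded))       # 1
print(hex(ord(decoded)))  # 0x1d11e
```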
Escape Sequences
Consider the following JSON object:
{"foo\nbar": "baz"}
A JMESPath expression should be able to retrieve the value of baz. With the current grammar, one must rely on the environment’s ability to input control characters such as the newline (%x0A). This can be problematic in certain environments. For example, in Python, this is not a problem:
>>> jmespath_expression = "foo\nbar"
Python will interpret the sequence "\n" (%x5C %x6E) as the newline character %x0A. However, consider Bash:
$ foo --jmespath-expression "foo\nbar"
In this situation, bash will not interpret the "\n" (%x5C %x6E) sequence.
Specification
The char rule contains a set of characters that do not have to be quoted. The new set of characters that do not have to be quoted will be:
unquoted-string = (%x41-5A / %x61-7A / %x5F) *(%x30-39 / %x41-5A / %x5F / %x61-7A)
In order for an identifier to not be quoted, it must start with [A-Za-z_], then must be followed by zero or more [0-9A-Za-z_].
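The unquoted-string rule maps directly onto a regular expression. The following Python sketch shows one way to decide whether an identifier has to be quoted; needs_quotes is a hypothetical helper name used only for this example.

```python
import re

# unquoted-string: one of [A-Za-z_] followed by zero or more of [0-9A-Za-z_]
UNQUOTED_STRING = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")

def needs_quotes(identifier):
    """Return True if the identifier cannot be written as an unquoted-string."""
    return UNQUOTED_STRING.fullmatch(identifier) is None

for name in ["foo", "_foo1", "1", "-1", "foo-bar"]:
    print(name, needs_quotes(name))
```

With this rule, foo and _foo1 can stay unquoted, while 1, -1, and foo-bar must be quoted.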
The quoted rule is updated to account for all JSON supported escape sequences:
quoted-string = quote 1*(unescaped-char / escaped-char) quote
The full rule for an identifier is:
identifier = unquoted-string / quoted-string
unquoted-string = (%x41-5A / %x61-7A / %x5F) *( ; a-zA-Z_
%x30-39 / ; 0-9
%x41-5A / ; A-Z
%x5F / ; _
%x61-7A) ; a-z
quoted-string = quote 1*(unescaped-char / escaped-char) quote
unescaped-char = %x20-21 / %x23-5B / %x5D-10FFFF
escape = %x5C ; Back slash: \
quote = %x22 ; Double quote: '"'
escaped-char = escape (
%x22 / ; " quotation mark U+0022
%x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 / ; t tab U+0009
%x75 4HEXDIG ) ; uXXXX U+XXXX
Rationale
Adopting the same string rules as JSON strings will allow users familiar with JSON semantics to understand how JMESPath identifiers will work.
This change also provides a nice consistency for the literal syntax proposed in JEP 3. With this model, the supported literal strings can be the same as quoted identifiers.
This also will allow the grammar to grow in a consistent way if JMESPath adds support for filtering based on literal values. For example (note that this is just a suggested syntax, not a formal proposal), given the data:
{"foo": [{"✓": "✓"}, {"✓": "✗"}]}
You can now have the following JMESPath expressions:
foo[?"✓" = `✓`]
foo[?"\u2713" = `\u2713`]
As a general property, any supported JSON string is now a supported quoted identifier.
Impact
For any implementation that was parsing digits as an identifier, identifiers starting with digits will no longer be valid, e.g. foo.0.1.2.
There are several compliance tests that will have to be updated as a result of this JEP. They were arguably wrong to begin with.
basic.json
The following needs to be changed because identifiers starting with a number must now be quoted:
- "expression": "foo.1",
+ "expression": "foo.\"1\"",
"result": ["one", "two", "three"]
},
{
- "expression": "foo.1[0]",
+ "expression": "foo.\"1\"[0]",
"result": "one"
},
Similarly, the following needs to be changed because an unquoted identifier cannot start with -:
- "expression": "foo.-1",
+ "expression": "foo.\"-1\"",
"result": "bar"
}
escape.json
The escape.json file has several more interesting cases that need to be updated. This has to do with the updated escaping rules. Each one will be explained.
- "expression": "\"foo\nbar\"",
+ "expression": "\"foo\\nbar\"",
"result": "newline"
},
This has to be updated because a JSON parser will interpret the \n sequence as the newline character. The newline character is not allowed in a JMESPath identifier (note that the newline character %x0A is not in any rule). In order for a JSON parser to create a sequence of %x5C %x6E, the JSON string must be \\n (%x5C %x5C %x6E).
- "expression": "\"c:\\\\windows\\path\"",
+ "expression": "\"c:\\\\\\\\windows\\\\path\"",
"result": "windows"
},
The above example is a more pathological case of escaping. In this example, we have a string that represents a Windows path “c:\\windows\path”. There are two levels of escaping happening here, one at the JSON parser, and one at the JMESPath parser. The JSON parser will take the sequence "\"c:\\\\\\\\windows\\\\path\"" and create the string "\"c:\\\\windows\\path\"". The JMESPath parser will take the string "\"c:\\\\windows\\path\"" and, applying its own escaping rules, will look for a key named c:\\windows\path.
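The two levels of escaping can be reproduced with Python's json module. This is a minimal sketch that assumes the JMESPath parser unescapes a quoted identifier exactly as a JSON parser unescapes a string, which is what the grammar above specifies; the variable names are illustrative only.

```python
import json

# The updated test value as it appears in escape.json (a raw string keeps
# every backslash exactly as written in the file).
on_disk = r'"\"c:\\\\\\\\windows\\\\path\""'

# Level 1: the JSON parser decodes the file value into a JMESPath expression.
expression = json.loads(on_disk)

# Level 2: the quoted identifier follows JSON string rules, so decoding it
# once more gives the key that will be looked up.
key = json.loads(expression)

print(expression)  # "c:\\\\windows\\path"
print(key)         # c:\\windows\path
```

Each level halves the number of backslashes, which is why the test file needs eight of them to produce a key containing two.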
Filter Expressions
- JEP: 7
- Author: James Saryerwinnie
- Created: 2013-12-16
Abstract
This JEP proposes grammar modifications to JMESPath to allow for filter expressions. A filtered expression allows list elements to be selected based on matching expressions. A literal expression is also introduced (from JEP 3) so that it is possible to match elements against literal values.
Motivation
A common request when querying JSON objects is the ability to select elements based on a specific value. For example, given a JSON object:
{"foo": [{"state": "WA", "value": 1},
{"state": "WA", "value": 2},
{"state": "CA", "value": 3},
{"state": "CA", "value": 4}]}
A user may want to select all objects in the foo list that have a state key of WA. There is currently no way to do this in JMESPath. This JEP will introduce a syntax that allows this:
foo[?state == `WA`]
Additionally, a user may want to project additional expressions onto the values matched from a filter expression. For example, given the data above, select the value key from all objects that have a state of WA:
foo[?state == `WA`].value
would return [1, 2].
Specification
The updated grammar for filter expressions:
bracket-specifier = "[" (number / "*") "]" / "[]"
bracket-specifier =/ "[?" list-filter-expression "]"
list-filter-expression = expression comparator expression
comparator = "<" / "<=" / "==" / ">=" / ">" / "!="
expression =/ literal
literal = "`" json-value "`"
literal =/ "`" 1*(unescaped-literal / escaped-literal) "`"
unescaped-literal = %x20-21 / ; space !
%x23-5A / ; # - [
%x5D-5F / ; ] ^ _
%x61-7A / ; a-z
%x7C-10FFFF ; |}~ ...
escaped-literal = escaped-char / (escape %x60)
The json-value rule is any valid JSON value. While it’s recommended that implementations use an existing JSON parser to parse the json-value, the grammar is added below for completeness:
json-value = "false" / "null" / "true" / json-object / json-array /
json-number / json-quoted-string
json-quoted-string = %x22 1*(unescaped-literal / escaped-literal) %x22
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
ws = *(%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ; Carriage return
)
json-object = begin-object [ member *( value-separator member ) ] end-object
member = quoted-string name-separator json-value
json-array = begin-array [ json-value *( value-separator json-value ) ] end-array
json-number = [ minus ] int [ frac ] [ exp ]
decimal-point = %x2E ; .
digit1-9 = %x31-39 ; 1-9
e = %x65 / %x45 ; e E
exp = e [ minus / plus ] 1*DIGIT
frac = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
minus = %x2D ; -
plus = %x2B ; +
zero = %x30 ; 0
Comparison Operators
The following operations are supported:
- ==, tests for equality.
- !=, tests for inequality.
- <, less than.
- <=, less than or equal to.
- >, greater than.
- >=, greater than or equal to.
The behavior of each operation is dependent on the type of each evaluated expression.
The comparison semantics for each operator are defined below based on the corresponding JSON type:
Equality Operators
For string/number/true/false/null types, equality is an exact match. A string is equal to another string if they have the exact same sequence of code points. The literal values true/false/null are only equal to their own literal values. Two JSON objects are equal if they have the same set of keys and, for each key in the first JSON object, there exists a key with equal value in the second JSON object. Two JSON arrays are equal if they have equal elements in the same order (given two arrays x and y, for each index i in x, x[i] == y[i]).
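As a rough illustration of these equality rules, the sketch below compares values that have already been decoded by Python's json module. The function name json_equal is hypothetical, and treating values of different JSON types as unequal is an assumption that this section does not state explicitly.

```python
def json_equal(a, b):
    """Equality over decoded JSON values, following the rules described above."""
    # Python treats True == 1, so booleans are only compared with booleans.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, str) and isinstance(b, str):
        return a == b  # same sequence of code points
    if isinstance(a, list) and isinstance(b, list):
        # Equal elements in the same order.
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        # Same set of keys, and equal values for every key.
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return False  # assumption: values of different JSON types are never equal


print(json_equal({"a": [1, 2]}, {"a": [1, 2]}))  # True
print(json_equal([1, 2], [2, 1]))                # False
```

The explicit boolean check matters in Python because a plain == would report true as equal to the number 1.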
Ordering Operators
Ordering operators >, >=, <, <= are only valid for numbers. Evaluating any other type with a comparison operator will yield a null value, which will result in the element being excluded from the result list. For example, given:
For example, given:
search('foo[?a<b]', {"foo": [{"a": "char", "b": "char"},
{"a": 2, "b": 1},
{"a": 1, "b": 2}]})
The three elements in the foo list are evaluated against a < b. The first element resolves to the comparison "char" < "char", and because these types are strings, the expression results in null, so the first element is not included in the result list. The second element resolves to 2 < 1, which is false, so the second element is excluded from the result list. The third element resolves to 1 < 2, which evaluates to true, so the third element is included in the list. The final result of that expression is [{"a": 1, "b": 2}].
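The ordering behavior walked through above can be sketched as follows. The helper names compare and filter_by are hypothetical, and Python's None stands in for JSON null; this is not taken from any JMESPath implementation.

```python
import operator

ORDERING = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def is_number(value):
    # bool is a subclass of int in Python, but true/false are not JSON numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compare(op, left, right):
    """Return True/False for two numbers, or None (JSON null) for any other type."""
    if is_number(left) and is_number(right):
        return ORDERING[op](left, right)
    return None


def filter_by(elements, op, lhs_key, rhs_key):
    """Keep the elements for which element[lhs_key] op element[rhs_key] is true."""
    return [e for e in elements if compare(op, e.get(lhs_key), e.get(rhs_key)) is True]


foo = [{"a": "char", "b": "char"}, {"a": 2, "b": 1}, {"a": 1, "b": 2}]
print(filter_by(foo, "<", "a", "b"))  # [{'a': 1, 'b': 2}]
```

Both a null result and a false result exclude the element, so the first two elements are dropped for different reasons but with the same effect.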
Filtering Semantics
When a filter expression is matched, the matched element in its entirety is included in the filtered response.
Using the previous example, given the following data:
{"foo": [{"state": "WA", "value": 1},
{"state": "WA", "value": 2},
{"state": "CA", "value": 3},
{"state": "CA", "value": 4}]}
The expression foo[?state == `WA`] will return the following value:
[{"state": "WA", "value": 1}, {"state": "WA", "value": 2}]
Literal Expressions
Literal expressions are also added in this JEP. A literal expression is essentially a JSON value surrounded by the “`” character. You can escape the “`” character via “\`”, and if the character “`” appears in the JSON value, it must also be escaped. A simple two pass algorithm in the lexer could first process any escaped “`” characters before handing the resulting string to a JSON parser.
Because string literals are by far the most common type of JSON value, an alternate syntax is supported where the starting and ending double quotes are not required for strings. For example:
`foobar` -> "foobar"
`"foobar"` -> "foobar"
`123` -> 123
`"123"` -> "123"
`123.foo` -> "123.foo"
`true` -> true
`"true"` -> "true"
`truee` -> "truee"
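The mapping above can be reproduced with a small two pass sketch. The function name parse_literal is hypothetical, and falling back to a quoted JSON string when the body is not valid JSON is one possible reading of the alternate syntax, not necessarily how a given implementation handles it.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN and Infinity, which are not valid JSON.
    raise ValueError(name)


def parse_literal(token):
    """Parse a backtick-delimited literal token such as `foobar` or `123`."""
    if len(token) < 3 or not (token.startswith("`") and token.endswith("`")):
        raise ValueError(f"not a literal: {token!r}")
    # Pass 1: strip the delimiters and unescape \` to `.
    body = token[1:-1].replace("\\`", "`")
    # Pass 2: hand the body to a JSON parser.
    try:
        return json.loads(body, parse_constant=_reject_constant)
    except ValueError:
        # Alternate syntax: the surrounding double quotes were omitted.
        return json.loads(f'"{body}"')


print(parse_literal("`123`"))      # 123
print(parse_literal("`123.foo`"))  # 123.foo
print(parse_literal("`truee`"))    # truee
```

Trying the strict JSON parse first is what makes `true` a boolean while `truee` falls through to a string.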
Literal expressions aren’t allowed on the right hand side of a subexpression:
foo[*].`literal`
but they are allowed on the left hand side:
`{"foo": "bar"}`.foo
They may also be included in other expressions outside of filter expressions. For example:
{value: foo.bar, type: `multi-select-hash`}
Rationale
The proposed filter expression syntax was chosen such that there is sufficient expressive power for any type of filter one might need to perform while at the same time being as minimal as possible. To help illustrate this, below are a few alternative syntaxes that were considered.
In the simplest case where one might filter a key based on a literal value, a possible filter syntax would be:
foo[bar == baz]
or in general terms: [identifier comparator literal-value]. However this has several issues:
- It is not possible to filter based on two expressions (get all elements whose foo key equals its bar key).
- The literal value is on the right hand side, making it hard to troubleshoot if the identifier and literal value are swapped: foo[baz == bar].
- Without some identifying token unary filters would not be possible as they would be ambiguous. Is the expression [foo] filtering all elements with a foo key with a truth value or is it a multiselect-list selecting the foo key from each hash? Starting a filter expression with a token such as [? makes it clear that this is a filter expression.
- This makes the syntax for filtering against literal JSON arrays and objects hard to visually parse. “Filter all elements whose foo key is a single list with a single integer value of 2”: [foo == [2]].
- Adding literal expressions makes them useful even outside of a filter expression. For example, in a multi-select-hash, you can create arbitrary key value pairs: {a: foo.bar, b: `some string`}.
This JEP is purposefully minimal. There are several extensions that can be added in the future:
- Support any arbitrary expression within the [? ... ]. This would enable constructs such as or expressions within a filter. This would allow unary expressions.
In order for this to be useful we need to define what corresponds to true and false values, e.g. an empty list is a false value. Additionally, “or expressions” would need to change their semantics to branch based on the true/false value of an expression instead of whether or not the expression evaluates to null.
This is certainly a direction to take in the future. Adding arbitrary expressions in a filter would be a backwards compatible change, so it’s not part of this JEP.
- Allow filter expressions as top level expressions. This would potentially just return true/false for any value that it matched.
This might be useful if you can combine this with something that can accept a list to use as a mask for filtering other elements.
Expression Types
- JEP: 8
- Author: James Saryerwinnie
- Created: 2013-03-02
Abstract
This JEP proposes grammar modifications to JMESPath to allow for expression references within functions. This allows for functions such as sort_by, max_by, min_by. These functions take an argument that resolves to an expression type. This enables functionality such as sorting an array based on an expression that is evaluated against every array element.
Motivation
A useful feature that is common in other expression languages is the ability to sort a JSON object based on a particular key. For example, given a JSON object:
{
"people": [
{"age": 20, "age_str": "20", "bool": true, "name": "a", "extra": "foo"},
{"age": 40, "age_str": "40", "bool": false, "name": "b", "extra": "bar"},
{"age": 30, "age_str": "30", "bool": true, "name": "c"},
{"age": 50, "age_str": "50", "bool": false, "name": "d"},
{"age": 10, "age_str": "10", "bool": true, "name": 3}
]
}
It is not currently possible to sort the people array by the age key. Also, sort is not defined for the object type, so it’s not currently possible to even sort the people array. In order to sort the people array, we need to know what key to use when sorting the array.
This concept of sorting based on a key can be generalized. Instead of requiring a key name, an expression can be provided that each element would be evaluated against. In the simplest case, this expression would just be an identifier, but more complex expressions could be used such as foo.bar.baz.
A simple way to accomplish this might be to create a function like this:
sort_by(array arg1, expression)
# Called like:
sort_by(people, age)
sort_by(people, to_number(age_str))
However, there’s a problem with the sort_by function as defined above. If we follow the function argument resolution process we get:
sort_by(people, age)
# 1. resolve people
arg1 = search(people, <input data>) -> [{"age": ...}, {...}]
# 2. resolve age
arg2 = search(age, <input data>) -> null
sort_by([{"age": ...}, {...}], null)
The second argument is evaluated against the current node and the expression age will resolve to null because the input data has no age key. There needs to be some way to specify that an expression should evaluate to an expression type:
arg = search(<some expression>, <input data>) -> <expression: age>
Then the function definition of sort_by would be:
sort_by(array arg1, expression arg2)
Specification
The following grammar rules will be updated to:
function-arg = expression /
current-node /
"&" expression
Evaluating an expression reference should return an object of type “expression”. The list of data types supported by a function will now be:
- number (integers and double-precision floating-point format in JSON)
- string
- boolean (true or false)
- array (an ordered sequence of values)
- object (an unordered collection of key value pairs)
- null
- expression (denoted by &expression)
Function signatures can now be specified using this new expression type. Additionally, a function signature can specify the return type of the expression. Similar to how arrays can specify a type within a list using the array[type] syntax, expressions can specify their resolved type using expression->type syntax.
Note that any valid expression is allowed after &, so the following expressions are valid:
sort_by(people, &foo.bar.baz)
sort_by(people, &foo.bar[0].baz)
sort_by(people, &to_number(foo[0].bar))
Additional Functions
The following functions will be added:
sort_by
sort_by(array elements, expression->number|expression->string expr)
Sort an array using an expression expr as the sort key. Below are several examples using the people array (defined above) as the given input. sort_by follows the same sorting logic as the sort function.
Examples
Expression | Result |
---|---|
sort_by(people, &age)[].age | [10, 20, 30, 40, 50] |
sort_by(people, &age)[0] | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
sort_by(people, &to_number(age_str))[0] | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
max_by
max_by(array elements, expression->number expr)
Return the maximum element in an array using the expression expr as the comparison key. The entire maximum element is returned. Below are several examples using the people array (defined above) as the given input.
Examples
Expression | Result |
---|---|
max_by(people, &age) | {"age": 50, "age_str": "50", "bool": false, "name": "d"} |
max_by(people, &age).age | 50 |
max_by(people, &to_number(age_str)) | {"age": 50, "age_str": "50", "bool": false, "name": "d"} |
max_by(people, &age_str) | <error: invalid-type> |
max_by(people, age) | <error: invalid-type> |
min_by
min_by(array elements, expression->number expr)
Return the minimum element in an array using the expression expr as the comparison key. The entire minimum element is returned. Below are several examples using the people array (defined above) as the given input.
Examples
Expression | Result |
---|---|
min_by(people, &age) | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
min_by(people, &age).age | 10 |
min_by(people, &to_number(age_str)) | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
min_by(people, &age_str) | <error: invalid-type> |
min_by(people, age) | <error: invalid-type> |
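The three functions can be sketched in Python by modelling an expression reference (&expr) as a callable that is applied to each element. Everything here is an assumption made for illustration: the InvalidTypeError class stands in for the invalid-type error in the tables, float stands in for to_number, and returning None for an empty array is a guess at reasonable behavior rather than something this JEP specifies.

```python
class InvalidTypeError(Exception):
    """Stands in for the <error: invalid-type> result shown in the tables."""


def _is_number(value):
    # bool is a subclass of int in Python, but true/false are not JSON numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _keys(elements, expr, allow_strings):
    # Hypothetical model: an expression reference is a callable applied to every
    # element, rather than being resolved once against the current node.
    if not callable(expr):
        raise InvalidTypeError("second argument must be an expression reference")
    keys = [expr(element) for element in elements]
    if all(_is_number(k) for k in keys):
        return keys
    if allow_strings and all(isinstance(k, str) for k in keys):
        return keys
    raise InvalidTypeError("expression resolved to an unsupported type")


def sort_by(elements, expr):
    keys = _keys(elements, expr, allow_strings=True)
    order = sorted(range(len(elements)), key=keys.__getitem__)
    return [elements[i] for i in order]


def max_by(elements, expr):
    keys = _keys(elements, expr, allow_strings=False)
    if not elements:
        return None  # assumption: an empty array yields null
    return elements[max(range(len(elements)), key=keys.__getitem__)]


def min_by(elements, expr):
    keys = _keys(elements, expr, allow_strings=False)
    if not elements:
        return None  # assumption: an empty array yields null
    return elements[min(range(len(elements)), key=keys.__getitem__)]


people = [
    {"age": 20, "age_str": "20", "name": "a"},
    {"age": 40, "age_str": "40", "name": "b"},
    {"age": 30, "age_str": "30", "name": "c"},
    {"age": 50, "age_str": "50", "name": "d"},
    {"age": 10, "age_str": "10", "name": 3},
]

print([p["age"] for p in sort_by(people, lambda p: p["age"])])  # [10, 20, 30, 40, 50]
print(max_by(people, lambda p: p["age"])["age"])                # 50
print(min_by(people, lambda p: float(p["age_str"]))["age"])     # 10
```

Passing a plain value instead of a callable raises the error, mirroring the max_by(people, age) and min_by(people, age) rows where the argument was resolved before the function could use it.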
Alternatives
There were a number of alternative proposals considered. Below outlines several of these alternatives.
Logic in Argument Resolver
The first proposed choice (which was originally in JEP-3 but later removed) was to not have any syntactic construct for specifying functions, and to allow the function signature to dictate whether or not an argument was resolved. The signature for sort_by would be:
sort_by(array arg1, any arg2)
arg1 -> resolved
arg2 -> not resolved
Then the argument resolver would introspect the argument specification of a function to determine what to do. Roughly speaking, the pseudocode would be:
call-function(current-data)
    arglist = []
    for each argspec in functions-argspec:
        if argspec.should_resolve:
            arglist <- resolve(argument, current-data)
        else:
            arglist <- argument
    type-check(arglist)
    return invoke-function(arglist)
However, there are several reasons not to do this:
- This imposes a specific implementation. This implementation would be challenging in a bytecode VM, as the CALL bytecode will typically resolve arguments onto the stack and allow the function to then pop arguments off the stack and perform its own arity validation.
- This deviates from the “standard” model of how functions are traditionally implemented.
Specifying Expressions as Strings
Another proposed alternative was to allow the expression to be a string type and to give functions the capability to parse/eval expressions. The sort_by function would look like this:
sort_by(people, `age`)
sort_by(people, `foo.bar.baz`)
The main reasons this proposal was not chosen were:
- This complicates the implementations. For implementations that walk the AST inline, this means AST nodes need access to the parser. For external tree visitors, the visitor needs access to the parser.
- This moves what could be a compile time error into a run time error. The evaluation of the expression string happens when the function is invoked.
Improved Filters
- JEP: 9
- Author: James Saryerwinnie
- Created: 2014-07-07
Abstract
JEP 7 introduced filter expressions, which is a mechanism to allow list elements to be selected based on matching an expression against each list element. While this concept is useful, the actual comparator expressions were not sufficiently capable to accommodate a number of common queries. This JEP expands on filter expressions by proposing support for and-expressions, not-expression, paren-expressions, and unary-expressions. With these additions, filter expressions become powerful enough to handle the majority of queries.
Motivation
JEP 7 introduced filter queries, which essentially look like this:
foo[?lhs comparator rhs]
where the left hand side (lhs) and the right hand side (rhs) are both an expression, and comparator is one of ==, !=, <, <=, >, >=.
This added a useful feature to JMESPath: the ability to filter a list based on evaluating an expression against each element in a list.
In the time since JEP 7 has been part of JMESPath, a number of cases have been pointed out that filter expressions cannot solve. Below are examples of each type of missing feature.
Or Expressions
First, users want the ability to filter based on matching one or more expressions. For example, given:
{
"cities": [
{"name": "Seattle", "state": "WA"},
{"name": "Los Angeles", "state": "CA"},
{"name": "Bellevue", "state": "WA"},
{"name": "New York", "state": "NY"},
{"name": "San Antonio", "state": "TX"},
{"name": "Portland", "state": "OR"}
]
}
a user might want to select locations on the west coast, which in this specific example means cities in either WA, OR, or CA. It’s not possible to express this as a filter expression given the grammar of expression comparator expression. Ideally a user should be able to use:
cities[?state == `WA` || state == `OR` || state == `CA`]
JMESPath already supports Or expressions, just not in the context of filter expressions.
And Expressions
The next missing feature of filter expressions is support for And expressions. It’s actually somewhat odd that JMESPath has support for Or expressions, but not for And expressions. For example, given a list of user accounts with permissions:
{
"users": [
{"name": "user1", "type": "normal", "allowed_hosts": ["a", "b"]},
{"name": "user2", "type": "admin", "allowed_hosts": ["a", "b"]},
{"name": "user3", "type": "normal", "allowed_hosts": ["c", "d"]},
{"name": "user4", "type": "admin", "allowed_hosts": ["c", "d"]},
{"name": "user5", "type": "normal", "allowed_hosts": ["c", "d"]},
{"name": "user6", "type": "normal", "allowed_hosts": ["c", "d"]}
]
}
We’d like to find admin users that have permissions to the host named c. Ideally, the filter expression would be:
users[?type == `admin` && contains(allowed_hosts, `c`)]
Unary Expressions
Think of an if statement in a language such as C or Java. While you can write an if statement that looks like:
if (foo == bar) { ... }
You can also use a unary expression such as:
if (allowed_access) { ... }
or:
if (!allowed_access) { ... }
Adding support for unary expressions brings a natural syntax when filtering against boolean values. Instead of:
foo[?boolean_var == `true`]
a user could instead use:
foo[?boolean_var]
As a more realistic example, given a slightly different structure for the users data above:
{
"users": [
{"name": "user1", "is_admin": false, "disabled": false},
{"name": "user2", "is_admin": true, "disabled": true},
{"name": "user3", "is_admin": false, "disabled": false},
{"name": "user4", "is_admin": true, "disabled": false},
{"name": "user5", "is_admin": false, "disabled": true},
{"name": "user6", "is_admin": false, "disabled": false}
]
}
If we want to get the names of all admin users whose account is enabled, we could either say:
users[?is_admin == `true` && disabled == `false`]
but it’s more natural and succinct to instead say:
users[?is_admin && !disabled]
A case can be made that this syntax is not strictly necessary. This is true. However, the main reason for adding support for unary expressions in a filter expression is that users expect this syntax, and are surprised when it is not supported. Especially now that we are basically anchoring to a C-like syntax for filtering in this JEP, users will expect unary expressions even more.
Paren Expressions
Once || and && statements have been introduced, there will be times when you want to override the precedence of these operators.
A paren-expression allows a user to override the precedence order of an expression, e.g. (a || b) && c, instead of the default precedence of a || (b && c) for the expression a || b && c.
Specification
There are several updates to the grammar:
and-expression = expression "&&" expression
not-expression = "!" expression
paren-expression = "(" expression ")"
Additionally, the filter-expression rule is updated to be more general:
bracket-specifier =/ "[?" expression "]"
The list-filter-expr is now a more general comparator-expression:
comparator-expression = expression comparator expression
which is now just an expression:
expression =/ comparator-expression
And finally, the current-node is now allowed as a generic expression:
expression =/ current-node
Operator Precedence
This JEP introduces and expressions, which would normally be defined as:
expression = or-expression / and-expression / not-expression
or-expression = expression "||" expression
and-expression = expression "&&" expression
not-expression = "!" expression
However, if this current pattern is followed, it makes it impossible to parse an expression with the correct precedence. A more standard way of expressing this would be:
expression = or-expression
or-expression = and-expression "||" and-expression
and-expression = not-expression "&&" not-expression
not-expression = "!" expression
The precedence for the new boolean expressions matches how most other languages define boolean expressions. That is, from weakest binding to tightest binding:
- Or - ||
- And - &&
- Unary not - !
So for example, a || b && c is parsed as a || (b && c) and not (a || b) && c.
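To make the binding order concrete, the following sketch is a tiny recursive descent parser that handles only identifiers, ||, &&, ! and parentheses, and returns the input with explicit parentheses added. It is a toy under those assumptions (the function name group is hypothetical) and is not a JMESPath parser.

```python
import re

TOKEN = re.compile(r"\s*(\|\||&&|!|\(|\)|[A-Za-z_][0-9A-Za-z_]*)")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def group(text):
    """Return the expression with explicit parentheses showing how it binds."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of input")
        pos += 1
        return token

    def parse_or():  # weakest binding
        left = parse_and()
        while peek() == "||":
            take()
            left = f"({left} || {parse_and()})"
        return left

    def parse_and():
        left = parse_not()
        while peek() == "&&":
            take()
            left = f"({left} && {parse_not()})"
        return left

    def parse_not():  # tightest binding
        if peek() == "!":
            take()
            return f"(!{parse_not()})"
        token = take()
        if token == "(":
            inner = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if token in ("||", "&&", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


print(group("a || b && c"))    # (a || (b && c))
print(group("(a || b) && c"))  # ((a || b) && c)
print(group("!a && b"))        # ((!a) && b)
```

Each precedence level gets its own function, and a level only ever calls the next tighter level for its operands, which is what produces the grouping.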
The operator precedence list in the specification will now read:
- Pipe - |
- Or - ||
- And - &&
- Unary not - !
- Rbracket - ]
Now that these expressions are allowed as general expressions, their semantics outside of their original contexts must be defined.
And Expressions
For reference, the JMESPath spec already defines the following values as “false-like” values:
- Empty list: []
- Empty object: {}
- Empty string: ""
- False boolean: false
- Null value: null
Any value that is not a false-like value is a truth-like value.
An and-expression has similar semantics to and expressions in other languages. If the expression on the left hand side is a truth-like value, then the value on the right hand side is returned. Otherwise the result of the expression on the left hand side is returned. This also reduces to the expected truth table:
Truth table for and expressions
LHS | RHS | Result |
---|---|---|
True | True | True |
True | False | False |
False | True | False |
False | False | False |
This is the standard truth table for a logical conjunction (AND).
Below are a few examples of and expressions:
Examples
search(True && False, {"True": true, "False": false}) -> false
search(Number && EmptyList, {"Number": 5, "EmptyList": []}) -> []
search(foo[?a == `1` && b == `2`],
{"foo": [{"a": 1, "b": 2}, {"a": 1, "b": 3}]}) -> [{"a": 1, "b": 2}]
Not Expressions
A not-expression negates the result of an expression. If the expression results in a truth-like value, a not-expression will change this value to false. If the expression results in a false-like value, a not-expression will change this value to true.
Examples
search(!True, {"True": true}) -> false
search(!False, {"False": false}) -> true
search(!Number, {"Number": 5}) -> false
search(!EmptyList, {"EmptyList": []}) -> true
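The and-expression and not-expression semantics can be sketched over already evaluated values. The function names are hypothetical, Python's None stands in for null, and a real evaluator would likely skip evaluating the right hand side when the left is false-like; here both sides are passed in as values to keep the example short.

```python
def is_false_like(value):
    """True for the false-like values: [], {}, "", false and null."""
    # Python's own truthiness would also treat 0 as false, so the check is
    # done by type. The number 0 is a truth-like value under these rules.
    if value is None or value is False:
        return True
    return isinstance(value, (list, dict, str)) and len(value) == 0


def and_expression(lhs, rhs):
    # A truth-like left hand side yields the right hand side; otherwise the
    # left hand side itself is the result.
    return lhs if is_false_like(lhs) else rhs


def not_expression(value):
    # Always produces a boolean.
    return is_false_like(value)


print(and_expression(True, False))  # False
print(and_expression(5, []))        # []
print(not_expression(5))            # False
print(not_expression([]))           # True
```

Note that an and-expression returns one of its operands rather than a boolean, which is why Number && EmptyList yields [] in the examples above.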
Paren Expressions
A paren-expression allows a user to override the precedence order of an expression, e.g. (a || b) && c.
Examples
search(foo[?(a == `1` || b ==`2`) && c == `5`],
{"foo": [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 4}]}) -> []
Rationale
This JEP brings several tokens that were only allowed in specific constructs into the more general expression rule. Specifically:
- The current-node (@) was previously only allowed in function expressions, but is now allowed as a general expression.
- The filter-expression now accepts any arbitrary expression.
- The list-filter-expr is now just a generic comparator-expression, which again is just a general expression.
There are several reasons the previous grammar rules were minimally scoped. One of the main reasons, as stated in JEP 7 which introduced filter expressions, was to keep the spec “purposefully minimal.” In fact the end of JEP 7 states that there “are several extensions that can be added in future.” This is in fact exactly what this JEP proposes, the recommendations from JEP 7.
Slice Projections
- JEP: 10
- Author: James Saryerwinnie
- Created: 2015-02-08
Abstract
This document proposes modifying the semantics of slice expressions to create projections, which brings consistency with the wildcard, flattening, and filtering projections.
Motivation
JEP 5 introduced slice expressions. This added Python slice semantics to JSON. Slicing does not produce a projection so expressions such as the following will always return null: myarray[:10].foo.bar. Instead, if you wanted to access foo.bar for each element in the array slice you currently have to write myarray[:10][*].foo.bar. This JEP proposes that a slice expression will create a projection.
This JEP proposes that a slice expression will create a projection.
Rationale
A reasonable objection to this JEP is that this is unnecessary because, as shown in the example above, you can take any slice and create a projection via [*]. This is entirely true: unlike other JEPs, this JEP does not enable any behavior that was previously not possible.
Instead, the main reason for this JEP is for consistency. Right now there are three types of array projections:
- List Projections (foo[*].bar)
- Filter Projections (foo[?a==b].bar)
- Flatten Projections (foo[].bar)
Note the general form, foo[<stuff here>].<child-expr>. Each of the existing array projections has the same general semantics:
- Take the left hand side, which is a list, and produce another list as a result of evaluating the left hand side. This newly produced list will contain elements of the original input (or elements of the elements of the original input in the case of the flatten projection).
- Evaluate the right hand side against each element in the list produced from evaluating the left hand side.
So in general, the left hand side is responsible for creating a new list but not for manipulating individual elements of the list. The right hand side is for manipulating individual elements of the list. In the case of the list projection, every element from the original list is used. In the case of a filter projection, only elements matching an expression are passed to the right hand side. In the case of a flatten projection, sub arrays are merged before passing the expression onto the right hand side.
It’s a reasonable expectation that slices behave similarly. After all, slices take an array and produce a sub array. In many ways, it’s very similar to filter projections. While filter projections only include elements that match a particular expression, slice projections only include elements from and to a specific index. Given its semantics are so close to the filter projections, slices should create projections to be consistent.
Specification
Whenever a slice is created, a projection will be created. This will be the fourth type of array projection in JMESPath. In addition to the existing array projections:
- List Projections
- Flatten Projections
- Filter Projections
A new projection type, the slice projection, will be added. A slice projection is evaluated similarly to the other array projections. Given a slice projection which contains a left hand side containing the slice expression and a right hand side, the slice expression is evaluated to create a new sub array, and the expression on the right hand side is evaluated against each element from the array slice to create the final result.
This JEP does not include any modifications to the JMESPath grammar.
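The evaluation described above can be sketched in a few lines of Python, whose slice semantics JMESPath slices follow. The function name slice_projection is hypothetical, the right hand side is modelled as a callable, and dropping null results is an assumption carried over from how the other array projections behave.

```python
def slice_projection(array, start, stop, step, child):
    """Evaluate array[start:stop:step].<child> with projection semantics."""
    if not isinstance(array, list):
        return None
    results = []
    # Left hand side: the slice produces a new sub array.
    for element in array[start:stop:step]:
        # Right hand side: evaluated against each element of the sub array.
        value = child(element)
        if value is not None:  # assumption: null results are omitted
            results.append(value)
    return results


def foo_bar(element):
    # Stands in for the child expression foo.bar.
    return element.get("foo", {}).get("bar")


myarray = [{"foo": {"bar": 1}}, {"foo": {"bar": 2}}, {"foo": {}}, {"foo": {"bar": 4}}]
print(slice_projection(myarray, None, 3, None, foo_bar))  # [1, 2]
```

Before this JEP, the equivalent expression would have applied foo.bar to the sliced list itself and returned null.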
Impact
The impact to existing users of slices is minimal. Consider:
- Existing expressions such as foo[:10].bar are currently returning null. Now they will return non-null values.
- The only impact to existing users is if someone had an expression such as foo[:10][0], which given the projection semantics will now create a list containing the 0th element from each sublist. Before this JEP, that expression was equivalent to foo[0], so the slice was unnecessary. Any users that actually had expressions like this can now just use foo[0] instead.
Lexical Scoping
- JEP: 11
- Author: James Saryerwinnie
- Created: 2015-02-24
Abstract
This JEP proposes a new function let() (originally proposed by Michael Dowling) that allows for evaluating an expression with an explicitly defined lexical scope. This will require some changes to the lookup semantics in JMESPath to introduce scoping, but provides useful functionality such as being able to refer to elements defined outside of the current scope used to evaluate an expression.
Motivation
As a JMESPath expression is being evaluated, the current element, which can be explicitly referred to via the @ token, changes as expressions are evaluated. Given a simple sub expression such as foo.bar, first the foo expression is evaluated with the starting input JSON document, and the result of that expression is then used as the current element when the bar element is evaluated. Conceptually we're taking some object, and narrowing down its current element as the expression is evaluated.
Once we've drilled down to a specific current element, there is no way, in the context of the currently evaluated expression, to refer to any elements outside of that element. One scenario where this is problematic is being able to refer to a parent element.
For example, suppose we had this data:
{"first_choice": "WA",
"states": [
{"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
{"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
{"name": "NY", "cities": ["New York City", "Albany"]}
]
}
Let's say we wanted to get the list of cities of the state
corresponding to our first_choice
key. We'll make the assumption that
the state names are unique in the states
list. This is currently not
possible with JMESPath. In this example we can hard code the state WA
:
states[?name==`WA`].cities
but it is not possible to base this on a value of first_choice
, which
comes from the parent element. This JEP proposes a solution that makes
this possible in JMESPath.
Specification
There are two components to this JEP, a new function, let()
, and a
change to the way that identifiers are resolved.
The let() Function
The let()
function is heavily inspired from the let
function
commonly seen in the Lisp family of languages:
The let function is defined as follows:
any let(object scope, expression->any expr)
let
is a function that takes two arguments. The first argument is a
JSON object. This hash defines the names and their corresponding values
that will be accessible to the expression specified in the second
argument. The second argument is an expression reference that will be
evaluated.
Resolving Identifiers
Prior to this JEP, identifiers are resolved by consulting the current
context in which the expression is evaluated. For example, using the same
search
function as defined in the JMESPath specification, the
evaluation of:
search(foo, {"foo": "a", "bar": "b"}) -> "a"
will result in the foo
identifier being resolved in the context of the
input object {"foo": "a", "bar": "b"}
. The context object defines
foo
as a
, which results in the identifier foo
being resolved as
a
.
In the case of a sub expression, the current evaluation context changes once the left hand side of the sub expression is evaluated:
search(a.b, {"a": {"b": "y"}}) -> "y"
The identifier b
is resolved with a current context of {"b": "y"}
,
which results in a value of y
.
This JEP adds an additional step to resolving identifiers. In addition
to the implicit evaluation context that changes based on the result of
continually evaluating expressions, the let()
command allows for
additional contexts to be specified, which we refer to by the common
name scope. The steps for resolving an identifier are:
- Attempt to look up the identifier in the current evaluation context.
- If the identifier is not resolved, look up the value in the current scope provided by the user.
- If the identifier is still not resolved and there is a parent scope, attempt to resolve the identifier in the parent scope. Continue doing this until there is no parent scope, in which case, if the identifier has not been resolved, the identifier is resolved as null.
Parent scopes are created by nested let()
calls.
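The lookup order above can be sketched in Python. This is only an illustration under stated assumptions: the `Scope` class and `resolve` function are hypothetical names, and "not resolved" is taken to mean that the key is absent (the text does not say how a key that is present with a null value should be treated).

```python
from typing import Any, Optional


class Scope:
    """A scope object passed to let(), linked to the scope of any enclosing let()."""

    def __init__(self, bindings: dict, parent: Optional["Scope"] = None):
        self.bindings = bindings
        self.parent = parent


def resolve(name: str, context: Any, scope: Optional[Scope]) -> Any:
    # Step 1: the current evaluation context.
    if isinstance(context, dict) and name in context:
        return context[name]
    # Steps 2 and 3: the current scope, then each parent scope in turn.
    while scope is not None:
        if name in scope.bindings:
            return scope.bindings[name]
        scope = scope.parent
    # No scope defines the identifier, so it resolves to null.
    return None


outer = Scope({"a": "x"})
inner = Scope({"b": "y"}, parent=outer)
context = {"c": "z"}

resolved = {name: resolve(name, context, inner) for name in ("a", "b", "c", "d")}
print(resolved)
```

Here `a` is found in the parent scope, `b` in the current scope, `c` in the evaluation context, and `d` nowhere.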
Below are a few examples to make this more clear. First, let's examine the case where the identifier can be resolved from the current evaluation context:
search(let({a: `x`}, &b), {"b": "y"}) -> "y"
In this scenario, we are evaluating the expression b
, with the context
object of {"b": "y"}
. Here b
has a value of y
, so the result of
this function is y
.
Now let's look at an example where an identifier is resolved from a
scope object provided via let()
:
search(let({a: `x`}, &a), {"b": "y"}) -> "x"
Here, we're trying to resolve the a
identifier. The current
evaluation context, {"b": "y"}
, does not define a
. Normally, this
would result in the identifier being resolved as null
:
search(a, {"b": "y"}) -> null
However, we now fall back to looking in the provided scope object
{"a": "x"}
, which was provided as the first argument to let
. Note
here that the value of a
has a value of "x"
, so the identifier is
resolved as "x"
, and the return value of the let()
function is
"x"
.
Finally, let's look at an example of parent scopes. Consider the following expression:
search(let({a: `x`}, &let({b: `y`}, &{a: a, b: b, c: c})),
{"c": "z"}) -> {"a": "x", "b": "y", "c": "z"}
Here we have nested let calls, and the expression we are trying to
evaluate is the multiselect hash {a: a, b: b, c: c}
. The c
identifier comes from the evaluation context {"c": "z"}
. The b
identifier comes from the scope object in the second let
call:
{b: `y`}
. And finally, here's the lookup process for the a
identifier:
- Is a defined in the current evaluation context? No.
- Is a defined in the scope provided by the user? No.
- Is there a parent scope? Yes.
- Does the parent scope, {a: `x`}, define a? Yes, a has the value of "x", so a is resolved as the string "x".
Current Node Evaluation
While the JMESPath specification defines how the current node is
determined, it is worth explicitly calling out how this works with the
let()
function and expression references. Consider the following
expression:
a.let({x: `x`}, &b.let({y: `y`}, &c))
Given the input data:
{"a": {"b": {"c": "foo"}}}
When the expression c
is evaluated, the current evaluation context is
{"c": "foo"}
. This is because this expression isn't evaluated until
the second let()
call evaluates the expression, which does not occur
until the first let()
function evaluates the expression.
Motivating Example
With these changes defined, the expression in the "Motivation" section can be written as:
let({first_choice: first_choice}, &states[?name==first_choice].cities)
Which evaluates to ["Seattle", "Bellevue", "Olympia"]
.
Rationale
If we just consider the feature of being able to refer to a parent
element, this approach is not the only way to accomplish this. We could
also allow for explicit references using a specific token, say $
. The
original example in the "Motivation" section would be:
states[?name==$.first_choice].cities
While this could work, this has a number of downsides, the biggest one
being that you'll need to always keep track of the parent element. You
don't know ahead of time if you're going to need the parent element,
so you'll always need to track this value. It also doesn't handle
nested lexical scopes. What if you wanted to access a value in the grand
parent element? Requiring an explicit binding approach via let()
handles both these cases, and doesn't require having to track parent
elements. You only need to track additional scope when let()
is used.
Raw String Literals
- JEP: 12
- Author: Michael Dowling
- Created: 2015-04-09
Abstract
This JEP proposes the following modifications to JMESPath in order to improve the usability of the language and ease the implementation of parsers:
- Addition of a raw string literal to JMESPath that will allow expressions to contain raw strings that are not mutated by JSON escape sequences (e.g., “\n”, “\r”, “\u005C”).
- Deprecation of the current literal parsing behavior that allows for unquoted JSON strings to be parsed as JSON strings, removing an ambiguity in the JMESPath grammar and helping to ensure consistency among implementations.
This proposal seeks to add the following syntax to JMESPath:
'foobar'
'foo\'bar'
`bar` -> Parse error/warning (implementation specific)
Motivation
Raw string literals are provided in various programming languages in order to prevent
language specific interpretation (i.e., JSON parsing) and remove the need for
escaping, avoiding a common problem called leaning toothpick syndrome (LTS). Leaning toothpick
syndrome is an issue in which strings become unreadable due to excessive use of
escape characters in order to avoid delimiter collision (e.g., \\\\\\\\\\\\
).
When evaluating a JMESPath expression, it is often necessary to utilize string literals that are not extracted from the data being evaluated, but rather statically part of the compiled JMESPath expression. String literals are useful in many areas, but most notably when invoking functions or building up multi-select lists and hashes.
The following expression returns the number of characters found in the string
"foo"
. When parsing this expression, `"foo"`
is parsed as a JSON value
which produces the string literal value of foo
:
`"foo"`
The following expression is functionally equivalent. Notice that the quotes are elided from the JSON literal:
`foo`
These string literals are parsed using a JSON parser according to RFC 4627, which will expand unicode escape sequences, newline characters, and several other escape sequences documented in RFC 4627 section 2.5.
For example, the escaped unicode value \u002B is expanded into + in the following JMESPath expression:
`"foo\u002B"` -> "foo+"
You can escape escape sequences in JSON literals to prevent an escape sequence from being expanded:
`"foo\\u002B"` -> "foo\u002B"
`foo\\u002B` -> "foo\u002B"
While this allows you to provide literal strings, it presents the following problems:
- Incurs an additional JSON parsing penalty.
- Requires the cognitive overhead of escaping escape characters if you actually want the data to be represented as it was literally provided (which can lead to LTS). If the data being escaped was meant to be used along with another language that uses \ as an escape character, then the number of backslash characters doubles.
- Introduces an ambiguous rule to the JMESPath grammar that requires a prose based specification to resolve the ambiguity in parser implementations.
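The escaping overhead can be seen with Python's standard `json` module, which is used here only as a stand-in for whatever JSON parser a JMESPath implementation relies on. The variable names are illustrative.

```python
import json

# The contents of a JSON literal, written as Python raw strings so that
# Python itself does not interpret the backslashes.
expanded = json.loads(r'"foo\u002B"')    # escape sequence is expanded
preserved = json.loads(r'"foo\\u002B"')  # escaped backslash keeps the text as-is

print(expanded)
print(preserved)

# Embedding the same literal in a host language that also uses backslash
# escapes doubles the backslashes again.
host_escaped = '"foo\\\\u002B"'
print(json.loads(host_escaped) == preserved)
```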
The relevant literal grammar rules are currently defined as follows:
literal = "`" json-value "`"
literal =/ "`" 1*(unescaped-literal / escaped-literal) "`"
unescaped-literal = %x20-21 / ; space !
%x23-5B / ; # - [
%x5D-5F / ; ] ^ _
%x61-7A / ; a-z
%x7C-10FFFF ; |}~ ...
escaped-literal = escaped-char / (escape %x60)
json-value = false / null / true / json-object / json-array /
json-number / json-quoted-string
false = %x66.61.6c.73.65 ; false
null = %x6e.75.6c.6c ; null
true = %x74.72.75.65 ; true
json-quoted-string = %x22 1*(unescaped-literal / escaped-literal) %x22
begin-array = ws %x5B ws ; [ left square bracket
begin-object = ws %x7B ws ; { left curly bracket
end-array = ws %x5D ws ; ] right square bracket
end-object = ws %x7D ws ; } right curly bracket
name-separator = ws %x3A ws ; : colon
value-separator = ws %x2C ws ; , comma
ws = *(%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ; Carriage return
)
json-object = begin-object [ member *( value-separator member ) ] end-object
member = quoted-string name-separator json-value
json-array = begin-array [ json-value *( value-separator json-value ) ] end-array
json-number = [ minus ] int [ frac ] [ exp ]
decimal-point = %x2E ; .
digit1-9 = %x31-39 ; 1-9
e = %x65 / %x45 ; e E
exp = e [ minus / plus ] 1*DIGIT
frac = decimal-point 1*DIGIT
int = zero / ( digit1-9 *DIGIT )
minus = %x2D ; -
plus = %x2B ; +
zero = %x30 ; 0
The literal rule is ambiguous because unescaped-literal includes all of the same characters that json-value matches, allowing any value that is valid JSON to be matched on either unescaped-literal or json-value.
Rationale
When implementing parsers for JMESPath, one must provide special case parsing
when parsing JSON literals due to the allowance of elided quotes around JSON
string literals (e.g., `foo`
). This specific aspect of JMESPath cannot be
described unambiguously in a context free grammar and could become a common
cause of errors when implementing JMESPath parsers.
Parsing JSON literals has other complications as well. Here are the steps needed to currently parse a JSON literal value in JMESPath:
- When a ` token is encountered, begin parsing a JSON literal.
- Collect each character between the opening ` and closing ` tokens, including any escaped ` characters (i.e., \`), and store the characters in a variable (let’s call it $lexeme).
- Copy the contents of $lexeme to a temporary value in which all leading and trailing whitespace is removed. Let’s call this $temp (this is currently not documented but required in the JMESPath compliance tests).
- If $temp can be parsed as valid JSON, then use the parsed result as the value for the literal token.
- If $temp cannot be parsed as valid JSON, then wrap the contents of $lexeme in double quotes and parse the wrapped value as a JSON string, making the following expressions equivalent: `foo` == `"foo"`, and `[1, ]` == `"[1, ]"`.
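The last three steps can be sketched in Python as follows. The function name `parse_legacy_literal` is hypothetical, the argument is assumed to be the already-collected $lexeme (the text between the backticks), and handling of escaped backticks is left out. Python's `json` module is slightly more permissive than RFC 4627 (for example, it accepts `NaN`), so this is an approximation.

```python
import json
from typing import Any


def parse_legacy_literal(lexeme: str) -> Any:
    """Parse the contents of a backtick literal using the pre-JEP-12 rules."""
    temp = lexeme.strip()
    try:
        # If the trimmed text is valid JSON, use the parsed value.
        return json.loads(temp)
    except json.JSONDecodeError:
        # Otherwise wrap the original lexeme in double quotes and
        # parse it as a JSON string (the "elided quotes" behavior).
        return json.loads('"' + lexeme + '"')


for lexeme in ['foo', '"foo"', '[1, ]', ' [1, 2] ']:
    print(repr(lexeme), '->', repr(parse_legacy_literal(lexeme)))
```

The fallback branch is the special case that this proposal deprecates: the result depends on whether an arbitrary piece of text happens to be valid JSON.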
It is reasonable to assume that the most common use case for a JSON literal in a JMESPath expression is to provide a string value to a function argument or to provide a literal string value to a value in a multi-select list or multi-select hash. In order to make providing string values easier, it was decided that JMESPath should allow the quotes around the string to be elided.
This proposal posits that allowing quotes to be elided when parsing JSON literals should be deprecated in favor of adding a proper string literal rule to JMESPath.
Specification
A raw string literal is a value that begins and ends with a single quote, does not interpret escape characters, and may contain escaped single quotes to avoid delimiter collision.
Examples
Here are several examples of valid raw string literals and how they are parsed:
- A basic raw string literal, parsed as
foo bar
:
'foo bar'
- An escaped single quote, parsed as
foo'bar
:
'foo\'bar'
- A raw string literal that contains new lines:
'foo
bar
baz!'
The above expression would be parsed as a string that contains new lines:
foo
bar
baz!
- A raw string literal that contains escape characters,
parsed as
foo\\nbar
:
'foo\nbar'
ABNF
The following ABNF grammar rules will be added. A raw-string is allowed anywhere an expression is allowed:
raw-string = "'" *raw-string-char "'"
; The first grouping matches any character other than "'" or "\"
raw-string-char = (%x20-26 / %x28-5B / %x5D-10FFFF) / raw-string-escape
raw-string-escape = escape ["'"]
This rule allows any character inside of a raw string, including an escaped single quote.
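A rough Python sketch of how a raw string token could be recognized and converted to its value. The regular expression and the function name `parse_raw_string` are illustrative assumptions, not part of the proposal. Unlike the ABNF above, the pattern also accepts characters below %x20 so that the multi-line example from this section is handled.

```python
import re

# A single quote, then any run of: a character that is neither a quote nor a
# backslash, or a backslash optionally followed by a quote; then a closing quote.
RAW_STRING = re.compile(r"'((?:[^'\\]|\\'?)*)'")


def parse_raw_string(token: str) -> str:
    match = RAW_STRING.fullmatch(token)
    if match is None:
        raise ValueError(f"not a raw string literal: {token!r}")
    # Only the escaped single quote is rewritten; every other backslash
    # is kept literally.
    return match.group(1).replace("\\'", "'")


print(parse_raw_string("'foo bar'"))
print(parse_raw_string(r"'foo\'bar'"))
print(parse_raw_string(r"'foo\nbar'"))
```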
In addition to adding a raw-string
rule, the literal
rule in the ABNF
will be updated to become:
literal = "`" json-value "`"
Impact
The impact to existing users of JMESPath is that the use of a JSON literal in which the quotes are elided SHOULD be converted to use the raw-string rule of the grammar. Whether or not this conversion is absolutely necessary will depend on the specific JMESPath implementation.
Implementations MAY choose to support the old syntax of allowing elided quotes in JSON literal expressions. If an implementation chooses this approach, the implementation SHOULD raise some kind of warning to the user to let them know of the deprecation and possible incompatibility with other JMESPath implementations.
In order to support this type of variance in JMESPath implementations, all of the JSON literal compliance test cases that involve elided quotes MUST be removed, and test cases regarding failing on invalid unquoted JSON values MUST NOT be allowed in the compliance test unless placed in a JEP 12 specific test suite, allowing implementations that support elided quotes in JSON literals to filter out the JEP 12 specific test cases.
Alternative approaches
There are several alternative approaches that could be taken.
Leave as-is
This is a valid and reasonable suggestion. Leaving JMESPath as-is would avoid a breaking change to the grammar and users could continue to use multiple escape characters to avoid delimiter collision.
The goal of this proposal is not to add functionality to JMESPath, but rather to make the language easier to use, easier to reason about, and easier to implement. As it currently stands, the behavior of JSON parsing is ambiguous and requires special casing when implementing a JMESPath parser. It also allows for minor differences in implementations due to this ambiguity.
Take the following example:
`[1`
One implementation may interpret this expression as a JSON string with the
string value of "[1"
, while other implementations may raise a parse error
because the first character of the expression appears to be valid JSON.
By updating the grammar to require valid JSON in the JSON literal token, we can remove this ambiguity completely, removing a potential source of inconsistency from the various JMESPath implementations.
Disallow single quotes in a raw string
This proposal states that single quotes in a raw string literal must be escaped
with a backslash. An alternative approach could be to not allow single quotes
in a raw string literal. While this would simplify the raw-string
grammar
rule, it would severely limit the usability of the raw-string
rule, forcing
users to use the literal
rule.
Use a customizable delimiter
Several languages allow for a custom delimiter to be placed around a raw
string. For example, Lua allows for a long bracket notation in which raw
strings are surrounded by [[]]
with any number of balanced = characters
between the brackets:
[==[foo=bar]==] -- parsed as "foo=bar"
This approach is very flexible and removes the need to escape any characters; however, this cannot be expressed in a regular grammar. A parser would need to keep track of the number of opened delimiters and ensure that the string is closed with the appropriate number of matching characters.
The addition of a string literal as described in this JEP does not preclude a later addition of a heredoc or delimited style string literal as provided by languages like Lua, D, C++, etc…
Lexical Scoping
- JEP: 18
- Author: @jamesls
- Created: 2023-03-21
Abstract
This JEP proposes the introduction of lexical scoping using a new
let
expression. You can now bind variables that are evaluated in the
context of a given lexical scope. This enables queries that can refer to
elements defined outside of their current element, which is not currently
possible. This JEP supersedes JEP 11, which proposed similar functionality
through a let()
function.
Motivation
A JMESPath expression is always evaluated in the context of a current
element, which can be explicitly referred to via the @
token. The
current element changes as expressions are evaluated. For example,
suppose we had the expression foo.bar[0]
that we want to evaluate against
an input document of:
{"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
The expression, and the associated current element are evaluated as follows:
# Start
expression = foo.bar[0]
@ = {"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
# Step 1
expression = foo
@ = {"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
result = {"bar": ["hello", "world"]}
# Step 2
expression = bar
@ = {"bar": ["hello", "world"]}
result = ["hello", "world"]
# Step 3
expression = [0]
@ = ["hello", "world"]
result = "hello"
The end result of evaluating this expression is "hello"
. Note that each
step changes the values that are accessible to the current expression being
evaluated. In "Step 2", it is not possible for the expression to reference
the value of "baz"
in the current element of the previous step, "Step 1".
The inability to reference variables in a parent scope is a serious limitation of JMESPath, and anecdotally is one of the most commonly requested features of the language. Below are examples of input documents and the desired output documents that aren't possible to create with the current version of JMESPath:
Input:
[
{"home_state": "WA",
"states": [
{"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
{"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
{"name": "NY", "cities": ["New York City", "Albany"]}
]
},
{"home_state": "NY",
"states": [
{"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
{"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
{"name": "NY", "cities": ["New York City", "Albany"]}
]
}
]
(for each list in "states", select the list of cities associated
with the state defined in the "home_state" key)
Output:
[
["Seattle", "Bellevue", "Olympia"],
["New York City", "Albany"]
]
Input:
{"imageDetails": [
{
"repositoryName": "org/first-repo",
"imageTags": ["latest", "v1.0", "v1.2"],
"imageDigest": "sha256:abcd"
},
{
"repositoryName": "org/second-repo",
"imageTags": ["v2.0", "v2.2"],
"imageDigest": "sha256:efgh"
}
]}
(create a list of pairs containing an image tag and its associated repo name)
Output:
[
["latest", "org/first-repo"],
["v1.0", "org/first-repo"],
["v1.2", "org/first-repo"],
["v2.0", "org/second-repo"],
["v2.2", "org/second-repo"]
]
In order to support these queries we need some way for an expression to reference values that exist outside of its implicit current element.
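For comparison, the second transformation is straightforward in a general purpose language precisely because the inner loop can still see the outer loop variable. The Python below is only an illustration of the desired result and of the scoping it relies on.

```python
doc = {
    "imageDetails": [
        {
            "repositoryName": "org/first-repo",
            "imageTags": ["latest", "v1.0", "v1.2"],
            "imageDigest": "sha256:abcd",
        },
        {
            "repositoryName": "org/second-repo",
            "imageTags": ["v2.0", "v2.2"],
            "imageDigest": "sha256:efgh",
        },
    ]
}

# The inner loop over tags refers to `image`, a value from the enclosing scope.
pairs = [
    [tag, image["repositoryName"]]
    for image in doc["imageDetails"]
    for tag in image["imageTags"]
]
print(pairs)
```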
Specification
A new "let expression" is added to the language. The expression has the
format: let <bindings> in <expr>
. The updated grammar rules in ABNF are:
let-expression = "let" bindings "in" expression
bindings = variable-binding *( "," variable-binding )
variable-binding = variable-ref "=" expression
variable-ref = "$" unquoted-string
The let-expression
and variable-ref
rules are also added as new expression
types:
expression =/ let-expression / variable-ref
Examples of this new syntax:
let $foo = bar in {a: myvar, b: $foo}
let $foo = baz[0] in bar[? baz == $foo ] | [0]
let $a = b, $c = d in bar[*].[$a, $c, foo, bar]
It's worth noting that this is the first JEP to introduce keywords into the
language: the let
and in
keywords. These are not reserved keywords; these
words can continue to be used as identifiers in expressions. There are no
backwards incompatible changes being proposed with this JEP. The grammar rules
unambiguously describe whether let
is meant to be interpreted as a keyword
or as an identifier (often referred to as contextual keywords).
New evaluation rules
Let expressions are evaluated as follows.
Given the rule "let" bindings "in" expression
, the bindings
rule is
processed first. Each variable-binding
within the bindings
rule defines
the name of a variable and an expression. Each expression is evaluated, and the
result of this evaluation is then bound to the associated variable name.
Once all the variable-binding
rules have been processed, the associated
expression
clause of the let expression is then evaluated. During the
evaluation of the expression, any references, via the variable-ref
rule, to a
variable name will evaluate to the value bound to the name. Once the
associated expression has been evaluated, the let expression itself evaluates
to the result of this expression. After the let expression has been evaluated,
the variable bindings associated with the let expression are no longer valid.
This is also referred to as the visibility of a binding; the bindings of a
let expression are only visible during the evaluation of the expression
clause of the let expression.
When evaluating the bindings
rule, a variable-binding
for a variable name
that is already visible in the current scope will replace the existing binding
when evaluating the expression
clause of the let expression. This means in
the context of nested let expressions (and consequently nested scopes), a
variable in an inner scope can shadow a variable defined in an outer scope.
If a variable-ref
references a variable that has not been defined, the
evaluation of that variable-ref
will trigger an undefined-variable
error.
This error MUST occur when the expression is evaluated and not at compile
time. This is to enable implementations to define an implementation specific
mechanism for defining an initial or "global" scope. Implementations are free
to offer a "strict" compilation mode that a user can opt into, but MUST support
triggering an undefined-variable
error only when the variable-ref
is
evaluated.
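These rules can be sketched with a small Python model in which an expression is a callable taking the current element and the visible bindings. All names here (`field`, `literal`, `variable_ref`, `let`, `UndefinedVariableError`) are hypothetical. The sketch also assumes that every binding expression in a single let expression is evaluated against the enclosing bindings, which the text above does not spell out.

```python
from typing import Any, Callable, Dict


class UndefinedVariableError(Exception):
    """Stands in for the undefined-variable error named in this JEP."""


# An expression takes (current element, visible bindings) and returns a value.
Expr = Callable[[Any, Dict[str, Any]], Any]


def field(name: str) -> Expr:
    """Identifier lookup in the current element; a missing key yields None."""
    return lambda current, scope: current.get(name) if isinstance(current, dict) else None


def literal(value: Any) -> Expr:
    return lambda current, scope: value


def variable_ref(name: str) -> Expr:
    def evaluate(current: Any, scope: Dict[str, Any]) -> Any:
        # The error is raised only when the reference is evaluated.
        if name not in scope:
            raise UndefinedVariableError(name)
        return scope[name]
    return evaluate


def let(bindings: Dict[str, Expr], body: Expr) -> Expr:
    def evaluate(current: Any, scope: Dict[str, Any]) -> Any:
        # Copy the bindings so they are visible only while the body is evaluated.
        inner = dict(scope)
        for name, expr in bindings.items():
            inner[name] = expr(current, scope)  # may shadow an outer binding
        return body(current, inner)
    return evaluate


basic = let({"foo": field("foo")}, variable_ref("foo"))
shadowed = let({"a": field("a")}, let({"a": literal("shadow")}, variable_ref("a")))
unbound = variable_ref("foo")  # constructing the expression does not raise

print(basic({"foo": "bar"}, {}))
print(shadowed({"a": "topval"}, {}))
```

Passing a non-empty dictionary as the second argument corresponds to an implementation-defined initial scope, for example `unbound({}, {"foo": 1})`.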
Note that when evaluating the bindings
rule, the expression bound
to a variable is completely evaluated before binding to the variable.
Any references to the variable are replaced with the result of this evaluation,
the expression is not re-evaluated. This is worth clarifying specifically
for projections (wildcard expressions, the flatten operator, slices and
filter expressions). If the expression being bound is a projection, the
evaluation of this expression effectively stops the projection. This means
subsequent references using the variable-ref
MUST NOT continue projecting
to child expressions. For example, this is the behavior for a projection:
search(
foo[*][0],
{"foo": [[0, 1], [2, 3], [4, 5]]}
) -> [0, 2, 4]
And this is the behavior when assigning a variable to a projection:
search(
let $foo = foo[*]
in
$foo[0],
{"foo": [[0, 1], [2, 3], [4, 5]]}
) -> [0, 1]
In the first example, the [0]
expression is projected onto each element
in the list, returning the first element of each sub list: [0, 2, 4]
.
In the second example, the foo[*]
expression is evaluated to
[[0, 1], [2, 3], [4, 5]]
and assigned to the variable $foo
. The
projection expression evaluation is complete, and the projection is stopped.
Evaluating the expression $foo[0]
results in the variable $foo
being
replaced with its bound value of [[0, 1], [2, 3], [4, 5]]
, so the entire
expression becomes [[0, 1], [2, 3], [4, 5]][0]
, which returns the first
element in the list which is [0, 1]
.
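The same distinction, written out in plain Python as an illustration only. The list comprehension plays the role of the projection, and the ordinary variable plays the role of the bound $foo.

```python
data = {"foo": [[0, 1], [2, 3], [4, 5]]}

# foo[*][0]: the [0] expression is applied to each element of the projection.
projected = [sub_list[0] for sub_list in data["foo"]]

# let $foo = foo[*] in $foo[0]: the projection is fully evaluated when it is
# bound, so [0] indexes the resulting list as a whole.
foo = list(data["foo"])
indexed = foo[0]

print(projected)
print(indexed)
```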
Examples
Basic examples demonstrating core functionality.
search(let $foo = foo in $foo, {"foo": "bar"}) -> "bar"
search(let $foo = foo.bar in $foo, {"foo": {"bar": "baz"}}) -> "baz"
search(let $foo = foo in [$foo, $foo], {"foo": "bar"}) -> ["bar", "bar"]
Nested bindings.
search(
let $a = a
in
b[*].[a, $a, let $a = 'shadow' in $a],
{"a": "topval", "b": [{"a": "inner1"}, {"a": "inner2"}]}
) -> [["inner1", "topval", "shadow"], ["inner2", "topval", "shadow"]]
Error cases.
search($foo, {}) -> <error: undefined-variable>
search([let $foo = 'bar' in $foo, $foo], {}) -> <error: undefined-variable>
Rationale
Note: see previous discussion for more background.
Introducing keywords into the language
The let expression proposed in this JEP is based off of similar constructs in existing programming languages:
It was important to borrow from existing syntax and semantics. Lexical scoping is a familiar concept to developers, so care was taken to be consistent with the mental model that developers already have.
Alternatives were considered that avoided introducing new keywords into the language (this proposal adds the first keyword to the language). These included some variation that approximated defining an anonymous function with arguments, e.g.:
|foo, bar| => {$foo: a, $bar: b}
The reason for not going with this approach is that adding the ability to define functions is a large feature that will take considerable effort to design. This may be something to consider in the future, but it's a larger scope than introducing lexical scoping and made the most sense to address separately. We'd also need to introduce not only defining anonymous functions with arguments, but also a mechanism to invoke such functions. You can then create lexical scope by defining a function and immediately invoking it. For example, in JavaScript it would look like this:
(({x, y}) => ([x, y]))(
{x: "foo", y: "bar"}
);
This was considered too verbose for such a common use case of defining
variables. It makes sense that a dedicated, more succinct syntax was
preferred, as many languages have a dedicated let
syntax for defining
variables.
Backwards compatibility concern
Languages will often design keywords as reserved words that can't be used
as variable names or other identifiers. This helps to provide clarity
because the reader knows that the keyword can only have a single meaning.
This is possible to do when you first design the language, or if you are
willing to introduce breaking changes into the language. JMESPath instead
takes an alternate approach of introducing keywords that can be inferred
from the context in which they're used, which is known as contextual
keywords. There are other languages that also take this approach,
such as C# (https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/#contextual-keywords).
In order to do this, the updated grammar rules must be chosen to avoid any ambiguity when parsing expressions. This may limit the syntax and location where the new keywords could be used, so the tradeoffs of adding a new keyword must be considered carefully.
We should be wary of adding new keywords to JMESPath, and only do so when
there is a strong rationale for doing so. let
is one such case, as detailed
in this section.
Adding a sigil for variable references
One of the changes from an earlier proposal of this feature (JEP-11) is that
this proposal adds explicit syntax for variable references via the $foo
syntax. The lookups performed by the expressions foo and $foo are
fundamentally different. One searches through values
from the implicit current element and one is a lookup in the lexical scope.
Not having a syntactic difference creates ambiguity regarding the intended
type of lookup by the user. This also prevents defining where scoped lookups
are allowed through the grammar. For example, in the expression foo.bar
it is unclear whether bar
refers to a lookup in the current element or the
lexical scope. Having explicit syntax removes this ambiguity, allowing a user
to explicitly state their intent. It also enables distinct error conditions.
A reference to a non-existent variable is an error, as the user provided
explicit syntax stating that they expect the variable to exist. The variable
not existing is the result of the user not binding the variable name at some
point, which is an error. Conversely, an expression evaluated against the
current element results in null
if the key does not exist. This is
because the query is being evaluated against an input JSON document, and
we don't know what keys may or may not be present.
Multiple assignments with commas
Assigning multiple variables is done through comma separated variable-binding
rules, e.g. $foo = foo, $bar = bar
. An alternative considered was to use
syntax similar to JavaScript's object destructuring:
let {$foo, $bar} = @ in ...
There are several reasons this alternative was not chosen:
- This requires multiple assignments to come from an object type, which might require a user to unnecessarily create a multi-select-hash in order to assign multiple variables.
- Destructuring binds to top level values in an object, and does not allow for a single binding to evaluate to an expression, without having to again preconstruct that value via a multi-select-hash.
- Object destructuring is an additive change. Nothing in this JEP precludes this addition in the future, e.g.:
let {$foo, $bar} = @ in ...
let {$foo, $bar} = @, $baz = a.b.c in ...
Unbound values error at evaluation time not at compile time
The JEP also requires that unbound values error at evaluation time, not at compile time. This enables implementations to bind an initial (global) scope when a query is evaluated. This is something that other query languages provide, and is useful to define fixed queries that only vary by the value of specific variables. We'll look at a few examples.
First, consider a command line utility, let's call it jp
, that accepts a
path to a file containing a JMESPath query and reads an input JSON document
through stdin. This command line utility could offer a --params
option that
allows a user to pass in an initial scope. For example:
myquery.jmespath
results[*].[name, uuid, $hostname]
A user could then use this CLI to retrieve JSON data and filter it:
$ curl https://myapi/info/$HOSTNAME | \
jp --filename myquery.jmespath --params '{"hostname": "$HOSTNAME"}'
In this case the JMESPath expression does not need to change and can be shared with other people, and still include data that's specific to your machine.
Another example would be where JMESPath is used in some shared definition file. Suppose we had a file that defines how to make an API request, and specifies a condition we'd like to meet based on the output response. We want to describe that the expected output depends on the input provided. This is how we can describe this:
{"GroupActive": {
  "operation": "DescribeGroups",
  "acceptors": {
    "argument": "Response[].[length(Instances[?State=='Active']) == length($params.GroupNames)]",
    "matcher": "path"
  }
}}
This is saying that we should invoke the DescribeGroups operation with a list of group names, and that we want to check that the response contains a list of Instances with State == 'Active' whose length matches the length of the params group names. You could now bind the user provided params as the initial scope of {"params": inputParams} and code generate something like this (using the Python JMESPath library in this example):
import jmespath

def wait(user_params):
    # client is assumed to be an API client created elsewhere.
    response = client.DescribeGroups(user_params)
    expected = jmespath.compile(
        "Response[].[length(Instances[?State=='Active']) "
        "== length($params.GroupNames)]"
    )
    result = expected.search(
        response,
        # This is the new part, give queries access to the user params
        # via the $params variable. The scope argument illustrates the
        # proposed capability; it is not an existing jmespath API.
        scope={'params': user_params},
    )
    if result:
        return "SomeSuccessResponse"
    return "SomeFailureResponse"

# User can invoke this via:
wait({"GroupNames": ["group1", "group2", "group3"]})
This JEP does not require that implementations provide this capability of passing in an initial scope, but by requiring that undefined variable references are runtime errors it enables implementations to provide this capability. Implementations are also free to provide an opt-in "strict" mode that can fail at compile time if a user knows they will not be providing an initial scope.
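As an illustration of what an opt-in strict mode might do, the following Python sketch computes the free variables of an expression before evaluation. The tuple-based AST shape and the names free_variables and check_strict are assumptions made for this sketch and do not correspond to any existing implementation. Binding expressions are checked against the enclosing scope, matching the rule that bindings are only visible within the expression clause.

```python
def free_variables(node, bound=frozenset()):
    """Return the variable names referenced in `node` but never bound.

    Toy AST (an assumption for this sketch), as nested tuples:
      ("var", name)                       a reference such as $foo
      ("let", [(name, expr), ...], body)  a let expression
      (kind, child, ...)                  any other node; tuple children
                                          are sub-expressions
    """
    kind = node[0]
    if kind == "var":
        return set() if node[1] in bound else {node[1]}
    if kind == "let":
        _, bindings, body = node
        free = set()
        for _name, expr in bindings:
            # Binding expressions see only the enclosing scope, not the
            # names introduced by this same let.
            free |= free_variables(expr, bound)
        inner = bound | {name for name, _ in bindings}
        return free | free_variables(body, inner)
    free = set()
    for child in node[1:]:
        if isinstance(child, tuple):
            free |= free_variables(child, bound)
    return free


def check_strict(ast):
    """Fail before evaluation if any variable reference is unbound."""
    unbound = free_variables(ast)
    if unbound:
        raise ValueError("undefined variables: " + ", ".join(sorted(unbound)))


# let $foo = foo in $foo  ->  no free variables, passes strict mode.
ok = ("let", [("foo", ("field", "foo"))], ("var", "foo"))
# results[*].[name, uuid, $hostname]  ->  $hostname needs an initial scope.
needs_scope = ("projection", ("field", "results"),
               ("list", ("field", "name"), ("field", "uuid"),
                ("var", "hostname")))
```

With this sketch, check_strict(ok) returns normally, while check_strict(needs_scope) raises because $hostname can only come from an initial scope. A default (non-strict) implementation would skip this check and let the lookup fail at evaluation time instead.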
Testcases
Basic expressions
# Basic expressions
- given:
    foo:
      bar: baz
  cases:
    - expression: "let $foo = foo in $foo"
      result:
        bar: baz
    - expression: "let $foo = foo.bar in $foo"
      result: "baz"
    - expression: "let $foo = foo.bar in [$foo, $foo]"
      result: ["baz", "baz"]
    - comment: "Multiple assignments"
      expression: "let $foo = 'foo', $bar = 'bar' in [$foo, $bar]"
      result: ["foo", "bar"]
# Nested expressions
- given:
    a: topval
    b:
      - a: inner1
      - a: inner2
  cases:
    - expression: "let $a = a in b[*].[a, $a, let $a = 'shadow' in $a]"
      result:
        - ["inner1", "topval", "shadow"]
        - ["inner2", "topval", "shadow"]
    - comment: Bindings only visible within expression clause
      expression: "let $a = 'top-a' in let $a = 'in-a', $b = $a in $b"
      result: "top-a"
# Let as valid identifiers
- given:
    let:
      let: let-val
      in: in-val
  cases:
    - expression: "let $let = let in {let: let, in: $let}"
      result:
        let:
          let: let-val
          in: in-val
        in:
          let: let-val
          in: in-val
    - expression: "let $let = 'let' in { let: let, in: $let }"
      result:
        let:
          let: let-val
          in: in-val
        in: "let"
    - expression: "let $let = 'let' in { let: 'let', in: $let }"
      result:
        let: "let"
        in: "let"
# Projections stop
- given:
    foo: [[0, 1], [2, 3], [4, 5]]
  cases:
    - comment: Projection is stopped when bound to variable
      expression: "let $foo = foo[*] in $foo[0]"
      result: [0, 1]
# Examples from Motivation section
- given:
    - home_state: WA
      states:
        - name: WA
          cities: ["Seattle", "Bellevue", "Olympia"]
        - name: CA
          cities: ["Los Angeles", "San Francisco"]
        - name: NY
          cities: ["New York City", "Albany"]
    - home_state: NY
      states:
        - name: WA
          cities: ["Seattle", "Bellevue", "Olympia"]
        - name: CA
          cities: ["Los Angeles", "San Francisco"]
        - name: NY
          cities: ["New York City", "Albany"]
  cases:
    - expression: "[*].[let $home_state = home_state in states[? name == $home_state].cities[]][]"
      result:
        - ["Seattle", "Bellevue", "Olympia"]
        - ["New York City", "Albany"]
- given:
    imageDetails:
      - repositoryName: "org/first-repo"
        imageTags:
          - latest
          - v1.0
          - v1.2
        imageDigest: "sha256:abcd"
      - repositoryName: "org/second-repo"
        imageTags:
          - v2.0
          - v2.2
        imageDigest: "sha256:efgh"
  cases:
    - expression: >
        imageDetails[].[
          let $repo = repositoryName,
              $digest = imageDigest
          in
            imageTags[].[@, $digest, $repo]
        ][][]
      result:
        - ["latest", "sha256:abcd", "org/first-repo"]
        - ["v1.0", "sha256:abcd", "org/first-repo"]
        - ["v1.2", "sha256:abcd", "org/first-repo"]
        - ["v2.0", "sha256:efgh", "org/second-repo"]
        - ["v2.2", "sha256:efgh", "org/second-repo"]
# Errors
- given: {}
  cases:
    - expression: "$noexist"
      error: "undefined-variable"
    - comment: Reference out of scope variable
      expression: "[let $scope = 'foo' in [$scope], $scope]"
      error: "undefined-variable"
    - comment: Can't use var ref in RHS of subexpression
      expression: "foo.$bar"
      error: "syntax"
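An implementation could exercise test cases of this shape with a small runner along the following lines. This is only a sketch: run_suite is a hypothetical helper, the search callable is supplied by the implementation under test, and the assumption that raised exceptions carry a kind attribute naming the error ("undefined-variable", "syntax") is made purely for illustration.

```python
def run_suite(suite, search):
    """Run test groups and return a list of (expression, reason) failures.

    `suite` is the parsed test data: a list of groups, each with a
    "given" document and a list of "cases". `search(expression, data)`
    is assumed to return the query result or raise an exception with a
    `kind` attribute naming the error category.
    """
    failures = []
    for group in suite:
        for case in group["cases"]:
            expr = case["expression"]
            expected_error = case.get("error")
            try:
                actual = search(expr, group["given"])
            except Exception as exc:
                # An error passes only if the case expects that exact kind.
                kind = getattr(exc, "kind", None)
                if expected_error is None or kind != expected_error:
                    failures.append((expr, "unexpected error: %r" % (exc,)))
                continue
            if expected_error is not None:
                failures.append((expr, "expected error " + expected_error))
            elif actual != case["result"]:
                failures.append((expr, "got %r" % (actual,)))
    return failures
```

An empty return value means every case produced either the expected result or the expected error kind.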