7.7.2.2. ExtractorJSON
7.7.2.2.1. Summary
Added in version 16.0.6.
Note
This is an experimental feature and is not stable yet.
ExtractorJSON extracts values from JSON data using a JSONPath
expression. You can use this extractor to index only the values you
need from a JSON value instead of indexing the whole JSON text.
The extracted values keep their JSON types. For example, a JSON
string is extracted as text and a JSON integer is extracted as an
integer. This means you can use ExtractorJSON both for full text search
against strings and for range search against numbers.
Note
ExtractorJSON requires the jsoncons library. It isn't available if
Groonga is built without jsoncons.
7.7.2.2.2. Syntax
ExtractorJSON has a required path parameter:
ExtractorJSON("path", "JSONPath")
7.7.2.2.3. Usage
Here is an example that extracts all elements of the tags array with
the $.tags[*] JSONPath. The title value isn't extracted because it
doesn't match the JSONPath:
Execution example:
extract \
--extractors 'ExtractorJSON("path", "$.tags[*]")' \
--value '{"tags": ["groonga", "search", "engine"], "title": "ignored"}'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "extracted": [
# "groonga",
# "search",
# "engine"
# ]
# }
# ]
A JSONPath can match nested values. The following example extracts all
elements of nested arrays with the $.values[*][*] JSONPath:
Execution example:
extract \
--extractors 'ExtractorJSON("path", "$.values[*][*]")' \
--value '{"values": [[1, 10], [100, 1000]]}'
# [[0,1337566253.89858,0.000355720520019531],{"extracted":[1,10,100,1000]}]
When you attach ExtractorJSON to a lexicon, the lexicon indexes the
extracted values, while the original JSON is kept in the data column.
The following example indexes integers in a JSON column. The lexicon
key type is Int32 because the extracted values are integers. The
index is updated automatically when records are loaded into the JSON
column, so you can search the original records by the extracted values:
Execution example:
table_create Data TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Data value COLUMN_SCALAR JSON
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Numbers TABLE_PAT_KEY Int32 \
--extractors 'ExtractorJSON("path", "$.value[*][*]")'
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Numbers data_value COLUMN_INDEX Data value
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Data
[
{"value": "{\"value\": [[1, 10], [100]]}"},
{"value": "{\"value\": [[2], [20, 200]]}"},
{"value": "{\"value\": [[-1, -10], [-100]]}"}
]
# [[0,1337566253.89858,0.000355720520019531],3]
select Data --filter 'between(Numbers.data_value, 10, 20)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "value",
# "JSON"
# ]
# ],
# [
# 1,
# {
# "value": [
# [
# 1,
# 10
# ],
# [
# 100
# ]
# ]
# }
# ],
# [
# 2,
# {
# "value": [
# [
# 2
# ],
# [
# 20,
# 200
# ]
# ]
# }
# ]
# ]
# ]
# ]
Records 1 and 2 are selected because their extracted values include 10
and 20 respectively, even though those values are stored in nested
arrays in the JSON. Record 3 isn't selected because all of its values
are negative.
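The lookup in this example can be approximated in plain Python to show the idea. This is only a conceptual model under stated assumptions. A dictionary of postings stands in for the Numbers lexicon and its data_value index column. A sorted key list stands in for the lexicon's ordered keys. extract_nested is a hypothetical stand-in for the $.value[*][*] path. This is not how Groonga stores or scans indexes internally.

```python
import json
from bisect import bisect_left, bisect_right

# The same three records as in the load command above (record id -> JSON).
records = {
    1: '{"value": [[1, 10], [100]]}',
    2: '{"value": [[2], [20, 200]]}',
    3: '{"value": [[-1, -10], [-100]]}',
}


def extract_nested(json_text: str) -> list:
    """Hypothetical stand-in for $.value[*][*]: every element of every inner array."""
    return [x for inner in json.loads(json_text)["value"] for x in inner]


# Postings map: extracted integer key -> ids of the records that contain it.
postings: dict[int, set[int]] = {}
for record_id, doc in records.items():
    for key in extract_nested(doc):
        postings.setdefault(key, set()).add(record_id)

# Keeping the keys sorted makes a range scan a pair of binary searches.
sorted_keys = sorted(postings)


def between(low: int, high: int) -> list[int]:
    """Return ids of records with at least one key in [low, high] (inclusive)."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    hits: set[int] = set()
    for key in sorted_keys[start:end]:
        hits |= postings[key]
    return sorted(hits)


print(between(10, 20))  # [1, 2]
```

Both bounds are inclusive here. That is consistent with the output above, where records 1 (which contains 10) and 2 (which contains 20) are selected.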
7.7.2.2.4. Parameters
7.7.2.2.4.1. Required parameter
7.7.2.2.4.1.1. path
Specifies a JSONPath expression that matches the values to extract.
A JSONPath expression begins with $, which means the root of the
JSON value. For example, $.tags[*] matches all elements of the
tags array, and $.values[*][*] matches all elements of nested
arrays under values.
This parameter is required. ExtractorJSON reports an error if this
parameter is missing.
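To make the path semantics concrete, here is a hypothetical Python evaluator for a tiny JSONPath subset: the root $, child member access .name, and the wildcard [*]. It is not the jsoncons engine that ExtractorJSON uses. It also omits most of JSONPath, including filters, slices, recursive descent, and bracketed names. evaluate is an illustrative name, not a Groonga API.

```python
import json
import re

# One path step: either ".name" (group 1 captures the name) or "[*]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[\*\]")


def evaluate(path: str, document) -> list:
    """Return the values in `document` matched by a minimal JSONPath `path`."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath expression must begin with $")
    rest = path[1:]
    nodes = [document]  # "$" selects the root value
    pos = 0
    while pos < len(rest):
        match = _STEP.match(rest, pos)
        if match is None:
            raise ValueError(f"unsupported JSONPath syntax at {rest[pos:]!r}")
        name = match.group(1)
        next_nodes = []
        for node in nodes:
            if name is not None:
                # ".name": select that member of an object, if present.
                if isinstance(node, dict) and name in node:
                    next_nodes.append(node[name])
            elif isinstance(node, list):
                # "[*]" on an array: select every element.
                next_nodes.extend(node)
            elif isinstance(node, dict):
                # "[*]" on an object: select every member value.
                next_nodes.extend(node.values())
        nodes = next_nodes
        pos = match.end()
    return nodes


tags_doc = json.loads('{"tags": ["groonga", "search", "engine"], "title": "ignored"}')
print(evaluate("$.tags[*]", tags_doc))  # ['groonga', 'search', 'engine']
print(evaluate("$.values[*][*]", {"values": [[1, 10], [100, 1000]]}))  # [1, 10, 100, 1000]
```

Each step maps the current list of matched nodes to a new list. This is why $.values[*][*] flattens two levels of arrays, and why the title member is never reached by $.tags[*].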