7.7.2.2. ExtractorJSON

7.7.2.2.1. Summary

Added in version 16.0.6.

Note

This is an experimental feature. It is not yet stable.

ExtractorJSON extracts values from JSON data using a JSONPath expression. You can use this extractor to index only the values you need from a JSON value instead of indexing the whole JSON text.

The extracted values keep their JSON types. For example, a JSON string is extracted as text and a JSON integer is extracted as an integer. This means you can use ExtractorJSON both for full text search against strings and for range search against numbers.
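To see why keeping JSON types matters for range search, the following Python sketch compares the same numbers as integers and as text. It uses only Python's standard json module as a stand-in and does not show how Groonga or jsoncons handle types internally. The sample document is hypothetical.

```python
import json

# Hypothetical sample document; not taken from Groonga itself.
doc = json.loads('{"name": "groonga", "sizes": [100, 20, 3]}')

# Parsed values keep their JSON types: a JSON string becomes str,
# a JSON integer becomes int.
name_type = type(doc["name"]).__name__
size_type = type(doc["sizes"][0]).__name__

# Integers sort and compare numerically...
as_integers = sorted(doc["sizes"])
# ...while the same values as text compare character by character.
as_text = sorted(str(n) for n in doc["sizes"])

print(name_type, size_type)
print(as_integers)
print(as_text)
```

If the numbers were indexed as text, a range condition would follow the character-by-character order rather than the numeric one, which is why an integer key type suits extracted integers.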

Note

ExtractorJSON requires the jsoncons library. It isn't available if Groonga is built without jsoncons.

7.7.2.2.2. Syntax

ExtractorJSON has a required path parameter:

ExtractorJSON("path", "JSONPath")

7.7.2.2.3. Usage

Here is an example that extracts all elements of the tags array by the $.tags[*] JSONPath. The title value isn't extracted because it doesn't match the JSONPath:

Execution example:

extract \
  --extractors 'ExtractorJSON("path", "$.tags[*]")' \
  --value '{"tags": ["groonga", "search", "engine"], "title": "ignored"}'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "extracted": [
#       "groonga",
#       "search",
#       "engine"
#     ]
#   }
# ]

A JSONPath can match nested values. The following example extracts all elements of nested arrays by the $.values[*][*] JSONPath:

Execution example:

extract \
  --extractors 'ExtractorJSON("path", "$.values[*][*]")' \
  --value '{"values": [[1, 10], [100, 1000]]}'
# [[0,1337566253.89858,0.000355720520019531],{"extracted":[1,10,100,1000]}]
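The way each [*] step unpacks one level of array nesting can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Groonga's or jsoncons's implementation: the extract function is a hypothetical helper that understands only $, .name, and [*] applied to arrays, and it silently ignores any other JSONPath syntax.

```python
import json
import re


def extract(json_text, path):
    """Return the values matched by a tiny JSONPath subset: $, .name, [*]."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    # Split the rest of the path into ".name" and "[*]" steps.
    steps = re.findall(r"\.(\w+)|(\[\*\])", path[1:])
    matched = [json.loads(json_text)]
    for name, wildcard in steps:
        next_matched = []
        for value in matched:
            if wildcard and isinstance(value, list):
                # [*] selects every element of an array.
                next_matched.extend(value)
            elif name and isinstance(value, dict) and name in value:
                # .name selects one member of an object.
                next_matched.append(value[name])
        matched = next_matched
    return matched


tags = extract(
    '{"tags": ["groonga", "search", "engine"], "title": "ignored"}',
    "$.tags[*]",
)
nested = extract('{"values": [[1, 10], [100, 1000]]}', "$.values[*][*]")
print(tags)
print(nested)
```

Both calls produce the same values as the extracted arrays in the two execution examples above: the first [*] walks the outer array and the second walks each inner array, so the nested values come out as one flat list.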

When you attach ExtractorJSON to a lexicon, the lexicon indexes the extracted values. The original JSON is kept unchanged in the source data column.

The following example indexes integers in a JSON column. The lexicon key type is Int32 because the extracted values are integers. The index is updated automatically when records are loaded into the JSON column, so you can search the original records by the extracted values:

Execution example:

table_create Data TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Data value COLUMN_SCALAR JSON
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Numbers TABLE_PAT_KEY Int32 \
  --extractors 'ExtractorJSON("path", "$.value[*][*]")'
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Numbers data_value COLUMN_INDEX Data value
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Data
[
{"value": "{\"value\": [[1, 10], [100]]}"},
{"value": "{\"value\": [[2], [20, 200]]}"},
{"value": "{\"value\": [[-1, -10], [-100]]}"}
]
# [[0,1337566253.89858,0.000355720520019531],3]
select Data --filter 'between(Numbers.data_value, 10, 20)'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "value",
#           "JSON"
#         ]
#       ],
#       [
#         1,
#         {
#           "value": [
#             [
#               1,
#               10
#             ],
#             [
#               100
#             ]
#           ]
#         }
#       ],
#       [
#         2,
#         {
#           "value": [
#             [
#               2
#             ],
#             [
#               20,
#               200
#             ]
#           ]
#         }
#       ]
#     ]
#   ]
# ]

The records whose extracted values are between 10 and 20 are selected even though the values are stored in nested arrays in JSON.
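The lookup in this example can be pictured as a map from each extracted integer to the records that contain it. The Python sketch below is only a conceptual model under that assumption. It does not reflect how Groonga stores a patricia trie lexicon or an index column, and the between function here is a hypothetical helper that treats both bounds as inclusive.

```python
import json
from collections import defaultdict

# The three JSON values loaded in the example above, keyed by record ID.
records = {
    1: '{"value": [[1, 10], [100]]}',
    2: '{"value": [[2], [20, 200]]}',
    3: '{"value": [[-1, -10], [-100]]}',
}

# Model the lexicon as a map from each extracted integer to record IDs.
index = defaultdict(set)
for record_id, json_text in records.items():
    document = json.loads(json_text)
    # Equivalent of $.value[*][*]: every element of every inner array.
    for inner in document["value"]:
        for number in inner:
            index[number].add(record_id)


def between(index, low, high):
    """Return IDs of records with any extracted value in [low, high]."""
    hits = set()
    for number, record_ids in index.items():
        if low <= number <= high:
            hits |= record_ids
    return sorted(hits)


matched_ids = between(index, 10, 20)
print(matched_ids)
```

Record 1 matches through the value 10 and record 2 through the value 20, while record 3 has only negative values, which is consistent with the two records returned by the select command above.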

7.7.2.2.4. Parameters

7.7.2.2.4.1. Required parameter

7.7.2.2.4.1.1. path

Specifies a JSONPath expression that matches the values to extract.

A JSONPath expression begins with $, which refers to the root of the JSON value. For example, $.tags[*] matches all elements of the tags array, and $.values[*][*] matches all elements of the arrays nested inside the values array.

This parameter is required. ExtractorJSON reports an error if this parameter is missing.

7.7.2.2.5. See also