7.3.25. extract
7.3.25.1. Summary
Added in version 16.0.3.
Note
This is an experimental feature. It is not stable yet and may change.
The extract command extracts plain text or values from structured data
such as HTML and JSON by using the specified extractors.
You don't need to create a table to use the extract command. This is
useful for checking the results of extractors before you attach
them to a lexicon with the extractors option of
table_create.
See Extractors for details of extractors.
7.3.25.2. Syntax
This command takes two parameters.
Both extractors and value are required:
extract extractors
        value
7.3.25.3. Usage
Here is an example that extracts text content from HTML with ExtractorHTML. It removes HTML tags and expands character references:
Execution example:
extract \
--extractors 'ExtractorHTML' \
--value "<html><body>He&lt;ll&gt;o</body></html>"
# [[0,1337566253.89858,0.000355720520019531],{"extracted":"He<ll>o"}]
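The two effects described above (tag removal and character reference expansion) can be modeled outside Groonga with a few lines of Python. This is only a conceptual sketch built on the standard library `html.parser` module; the `TextCollector` class and the `html_to_text` function are hypothetical names, and nothing here reflects how ExtractorHTML is actually implemented.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True makes the parser expand references
        # such as &lt; and &gt; before handle_data() is called.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup):
    """Return the text content of markup with tags removed."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


text = html_to_text("<html><body>He&lt;ll&gt;o</body></html>")
print(text)  # He<ll>o
```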
Here is an example that extracts values from JSON with
ExtractorJSON. The $.tags[*] JSONPath
matches all elements of the tags array:
Execution example:
extract \
--extractors 'ExtractorJSON("path", "$.tags[*]")' \
--value '{"tags": ["groonga", "search", "engine"], "title": "ignored"}'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "extracted": [
# "groonga",
# "search",
# "engine"
# ]
# }
# ]
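To make the meaning of the `$.tags[*]` path concrete, the following Python sketch selects the same elements by hand. It is an illustration of what that one path matches, not a JSONPath implementation and not Groonga code; the function name `match_tags_wildcard` is hypothetical.

```python
import json


def match_tags_wildcard(document):
    """Return what $.tags[*] selects: each element of the top-level "tags" array."""
    data = json.loads(document)
    if not isinstance(data, dict):
        return []
    tags = data.get("tags")
    # [*] only matches array elements; other keys such as "title" are not selected.
    return list(tags) if isinstance(tags, list) else []


value = '{"tags": ["groonga", "search", "engine"], "title": "ignored"}'
matched = match_tags_wildcard(value)
print(matched)  # ['groonga', 'search', 'engine']
```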
7.3.25.4. Parameters
This section describes the parameters of extract.
7.3.25.4.1. Required parameters
There are two required parameters: extractors and value.
7.3.25.4.1.1. extractors
Specifies one or more extractors separated by commas (,). The extract
command applies the extractors to value in order. The output of an
extractor is passed to the next extractor as its input.
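The "output becomes the next input" rule is ordinary left-to-right function composition. The Python sketch below shows that ordering with two stand-in stages from the standard library (`json.loads` and `html.unescape`). These stages are assumptions chosen only for illustration and are not Groonga extractors; `apply_in_order` is a hypothetical helper name.

```python
from functools import reduce
import html
import json


def apply_in_order(stages, value):
    """Feed value to the first stage, its result to the second, and so on."""
    return reduce(lambda current, stage: stage(current), stages, value)


# Stand-in stages: decode a JSON string literal, then expand character references.
stages = [json.loads, html.unescape]
result = apply_in_order(stages, '"R&amp;D"')
print(result)  # R&D
```

Order matters: with the stages reversed, `html.unescape` would run on the raw JSON text first, which is why extractors are listed in the order they should be applied.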
See Extractors for all extractors.
7.3.25.4.1.2. value
Specifies the value from which you want to extract plain text or values.
If value includes spaces, you need to quote it with
single quotes (') or double quotes (").
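The quoting rule above applies to the command-line syntax. As a hedged sketch, a client that sends the command over HTTP would percent-encode the value instead. The example assumes a Groonga HTTP server on the default port 10041 and assumes that extract is reachable through the usual `/d/<command>` endpoint, which this section does not confirm; `build_extract_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_extract_url(extractors, value, base="http://localhost:10041"):
    """Build a request URL; base and the /d/extract path are assumptions."""
    # urlencode percent-encodes spaces, quotes, and angle brackets,
    # so no shell-style quoting of value is needed.
    query = urlencode({"extractors": extractors, "value": value})
    return f"{base}/d/extract?{query}"


value = '<p>Hello "Groonga" world</p>'
url = build_extract_url("ExtractorHTML", value)

# Decoding the query string gives back the original value unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["value"][0] == value)  # True
```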
7.3.25.5. Return value
[HEADER, {"extracted": EXTRACTED_VALUE}]
HEADER
See Output format about HEADER.
EXTRACTED_VALUE
The value extracted by the specified extractors. It's a single value when the extractors return a single value, as ExtractorHTML does. It's an array when the extractors return multiple values, as ExtractorJSON does.
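Because the extracted value can be either a single value or an array, client code may want to normalize it. The following Python sketch parses the two responses shown in the usage examples and always returns a list; `extracted_values` is a hypothetical helper name, not part of any Groonga client library.

```python
import json


def extracted_values(response_text):
    """Return the extracted value(s) from an extract response as a list."""
    _header, body = json.loads(response_text)
    extracted = body["extracted"]
    # A single value is wrapped so callers can always iterate.
    return extracted if isinstance(extracted, list) else [extracted]


single = '[[0,1337566253.89858,0.000355720520019531],{"extracted":"He<ll>o"}]'
multiple = (
    '[[0,1337566253.89858,0.000355720520019531],'
    '{"extracted":["groonga","search","engine"]}]'
)

print(extracted_values(single))    # ['He<ll>o']
print(extracted_values(multiple))  # ['groonga', 'search', 'engine']
```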