7.3.57. select
7.3.57.1. Summary
select searches a table for records that match the specified conditions and then outputs them.
select is the most important command in Groonga. You need to understand select to use the full power of Groonga.
7.3.57.2. Syntax
This command takes many parameters.
The only required parameter is table. Other parameters are optional:
select table
[match_columns=null]
[query=null]
[filter=null]
[scorer=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns="_key, _nsubrecs"]
[drilldown_offset=0]
[drilldown_limit=10]
[cache=yes]
[match_escalation_threshold=0]
[query_expansion=null]
[query_flags=ALLOW_PRAGMA|ALLOW_COLUMN]
[query_expander=null]
[adjuster=null]
[drilldown_calc_types=NONE]
[drilldown_calc_target=null]
[drilldown_filter=null]
[sort_keys=null]
[drilldown_sort_keys=null]
[match_escalation=auto]
[load_table=null]
[load_columns=null]
[load_values=null]
[drilldown_max_n_target_records=-1]
[n_workers=0]
[fuzzy_max_distance_ratio=0]
[fuzzy_max_distance=0]
[fuzzy_max_expansions=10]
[fuzzy_prefix_length=0]
[fuzzy_with_transposition=yes]
[fuzzy_tokenize=no]
This command has the following named parameters for dynamic columns:
columns[${NAME}].stage=null
columns[${NAME}].flags=COLUMN_SCALAR
columns[${NAME}].type=null
columns[${NAME}].value=null
columns[${NAME}].window.sort_keys=null
columns[${NAME}].window.group_keys=null
You can use one or more alphabetic characters, digits and _ for ${NAME}. For example, column1 is a valid ${NAME}. This is the same rule as for normal columns. See also name.
Parameters that have the same ${NAME} are grouped.
For example, the following parameters specify one dynamic column:
--columns[name].stage initial
--columns[name].type UInt32
--columns[name].value 29
The following parameters specify two dynamic columns:
--columns[name1].stage initial
--columns[name1].type UInt32
--columns[name1].value 29
--columns[name2].stage filtered
--columns[name2].type Float
--columns[name2].value '_score * 0.1'
This command has the following named parameters for advanced drilldown:
drilldowns[${LABEL}].keys=null
drilldowns[${LABEL}].sort_keys=null
drilldowns[${LABEL}].output_columns="_key, _nsubrecs"
drilldowns[${LABEL}].offset=0
drilldowns[${LABEL}].limit=10
drilldowns[${LABEL}].calc_types=NONE
drilldowns[${LABEL}].calc_target=null
drilldowns[${LABEL}].filter=null
drilldowns[${LABEL}].max_n_target_records=-1
drilldowns[${LABEL}].columns[${NAME}].stage=null
drilldowns[${LABEL}].columns[${NAME}].flags=COLUMN_SCALAR
drilldowns[${LABEL}].columns[${NAME}].type=null
drilldowns[${LABEL}].columns[${NAME}].value=null
drilldowns[${LABEL}].columns[${NAME}].window.sort_keys=null
drilldowns[${LABEL}].columns[${NAME}].window.group_keys=null
Deprecated since version 6.0.3: The drilldown[...] syntax is deprecated. Use drilldowns[...] instead.
You can use one or more alphabetic characters, digits, _ and . for ${LABEL}. For example, parent.sub1 is a valid ${LABEL}.
Parameters that have the same ${LABEL} are grouped.
For example, the following parameters specify one drilldown:
--drilldowns[label].keys column
--drilldowns[label].sort_keys -_nsubrecs
The following parameters specify two drilldowns:
--drilldowns[label1].keys column1
--drilldowns[label1].sort_keys -_nsubrecs
--drilldowns[label2].keys column2
--drilldowns[label2].sort_keys _key
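The grouping rule for named parameters can be illustrated with a small Python sketch. It is only a model of the rule described above, not Groonga's own parser; the function name group_dynamic_columns and the regular expression are assumptions made for this illustration.

```python
import re
from collections import defaultdict

# ${NAME} may contain alphabetic characters, digits and "_" (rule from the text above).
PARAM_PATTERN = re.compile(r"^columns\[([A-Za-z0-9_]+)\]\.(.+)$")


def group_dynamic_columns(params):
    """Group flat "columns[NAME].attribute" parameters by NAME (hypothetical helper)."""
    groups = defaultdict(dict)
    for name, value in params.items():
        match = PARAM_PATTERN.match(name)
        if match:
            column_name, attribute = match.groups()
            groups[column_name][attribute] = value
    return dict(groups)


params = {
    "columns[name1].stage": "initial",
    "columns[name1].type": "UInt32",
    "columns[name1].value": "29",
    "columns[name2].stage": "filtered",
    "columns[name2].type": "Float",
    "columns[name2].value": "_score * 0.1",
}
groups = group_dynamic_columns(params)
print(groups)
```

The six parameters collapse into two dynamic column definitions, name1 and name2, each holding its own stage, type and value.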
7.3.57.3. Usage
Let’s learn how to use select with examples. This section shows many common usages.
Here are a schema definition and sample data used in the examples.
Execution example:
table_create Entries TABLE_HASH_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries content COLUMN_SCALAR Text
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5,
"tag": "Hello"},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10,
"tag": "Groonga"},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15,
"tag": "Groonga"},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3,
"tag": "Senna"},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3,
"tag": "Senna"}
]
# [[0,1337566253.89858,0.000355720520019531],5]
There is a table, Entries, for blog entries. An entry has a title, content, the number of likes and a tag. The title is the key of Entries. The content is the value of the Entries.content column. The number of likes is the value of the Entries.n_likes column. The tag is the value of the Entries.tag column.
The Entries._key column and the Entries.content column are indexed using the TokenBigram tokenizer. So both Entries._key and Entries.content are ready for fulltext search.
OK. The schema and data for the examples are ready.
7.3.57.3.1. Simple usage
Here is the simplest usage with the above schema and data. It outputs all records in the Entries table.
Execution example:
select Entries
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
Why does the command output all records? There are two reasons. The first reason is that the command doesn’t specify any search conditions. No search condition means that all records are matched. The second reason is that the total number of records is 5. By default, the select command outputs at most 10 records. There are only 5 records, which is less than 10, so the command outputs all of them.
7.3.57.3.2. Search conditions
Search conditions are specified by query or filter. You can also specify both query and filter. In that case, selected records must match both query and filter.
7.3.57.3.2.1. Search condition: query
query is designed for a search box in a Web page. Imagine the search box on google.com. You specify search conditions for query as space separated keywords. For example, search engine means that a matched record should contain two words, search and engine.
Normally, the query parameter is used for specifying fulltext search conditions. It can be used for non fulltext search conditions, but filter is normally used for that purpose.
The query parameter is used with the match_columns parameter when query is used for specifying fulltext search conditions. match_columns specifies which columns and indexes are matched against query.
Here is a simple query usage example.
Execution example:
select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches for records that contain the word fast in the content column value from the Entries table.
query has its own syntax but its details aren’t described here. See Query syntax for details.
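When an application sends the same search over HTTP, the command parameters become URL query parameters. The following Python sketch only builds the request URL and does not contact a server. The base URL http://localhost:10041 and the helper name build_select_url are assumptions for this illustration; adjust them to your environment.

```python
from urllib.parse import urlencode


def build_select_url(base_url, **parameters):
    """Build a URL for the select command (hypothetical helper, no network access)."""
    return f"{base_url}/d/select?{urlencode(parameters)}"


# Assumed server location; replace it with the address of your Groonga HTTP server.
url = build_select_url(
    "http://localhost:10041",
    table="Entries",
    match_columns="content",
    query="fast",
)
print(url)
```

urlencode also escapes characters such as spaces and quotation marks, which matters once query holds raw user input.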
7.3.57.3.2.2. Search condition: filter
filter is designed for complex search conditions. You specify search conditions for filter in an ECMAScript like syntax.
Here is a simple filter usage example.
Execution example:
select Entries --filter 'content @ "fast" && _key == "Groonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches for records that contain the word fast in the content column value and have Groonga as _key from the Entries table. There are three operators in the command: @, && and ==. @ is the fulltext search operator. && and == are the same as in ECMAScript. && is the logical AND operator and == is the equality operator.
filter has more operators and syntax such as grouping by (...), but the details aren’t described here. See Script syntax for details.
7.3.57.3.3. Paging
You can specify the range of output records by offset and limit.
Here is an example that outputs only the 2nd record.
Execution example:
select Entries --offset 1 --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
offset is zero-based. --offset 1 means that the output range starts from the 2nd record.
limit specifies the maximum number of output records. --limit 1 means that at most 1 record is output. If no records are matched, the select command outputs no records.
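The effect of offset and limit on the output range can be modeled as list slicing. This Python sketch is a simplified model that only covers non-negative values; the function name page is an assumption for this illustration.

```python
def page(records, offset=0, limit=10):
    """Return the slice of records selected by a zero-based offset and a maximum count."""
    if offset < 0 or limit < 0:
        raise ValueError("this sketch only handles non-negative offset and limit")
    return records[offset:offset + limit]


keys = ["The first post!", "Groonga", "Mroonga", "Good-bye Senna", "Good-bye Tritonn"]

second_only = page(keys, offset=1, limit=1)  # corresponds to --offset 1 --limit 1
everything = page(keys)                      # defaults: offset 0, limit 10
print(second_only, len(everything))
```

With the sample data, --offset 1 --limit 1 selects only the Groonga entry, and the defaults return all 5 records because 5 is less than 10.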
7.3.57.3.4. The total number of records
You can use --limit 0 to retrieve the total number of records without any record contents.
Execution example:
select Entries --limit 0
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
--limit 0 is also useful for retrieving only the number of matched records.
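An application usually reads the number of matched records from the JSON response. The following Python sketch extracts it from a response shaped like the output above. The function name count_hits is an assumption for this illustration, and the embedded response text is a compact copy of the example output.

```python
import json

# Compact copy of the response shown above for: select Entries --limit 0
response_text = """
[[0,1337566253.89858,0.000355720520019531],
 [[[5],
   [["_id","UInt32"],["_key","ShortText"],["content","Text"],
    ["n_likes","UInt32"],["tag","ShortText"]]]]]
"""


def count_hits(response):
    """Return the number of matched records from a parsed select response."""
    header, body = response[0], response[1]
    if header[0] != 0:  # the first header element is the return code; 0 means success
        raise RuntimeError(f"select failed: {header}")
    result_set = body[0]     # [[n_hits], [column definitions], record, ...]
    return result_set[0][0]  # n_hits


n_hits = count_hits(json.loads(response_text))
print(n_hits)
```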
7.3.57.3.5. Drilldown
You can get additional grouped results against the search result in one select. In SQL, you need to use two or more SELECTs, but select in Groonga can do it in one command.
This feature is called drilldown in Groonga. It’s also called faceted search in other search engines.
For example, think about the following situation.
You search for entries that contain the word fast:
Execution example:
select Entries --filter 'content @ "fast"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
You want to use tag as an additional search condition like --filter 'content @ "fast" && tag == "???"'. But you don’t know a suitable tag until you see the result of content @ "fast".
If you know the number of matched records for each available tag, you can choose a suitable tag. You can use drilldown for this case:
Execution example:
select Entries --filter 'content @ "fast"' --drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]
--drilldown tag returns a list of pairs of an available tag and its number of matched records. You can avoid the “no hit search” case by choosing a tag from the list. You can also avoid the “too many search results” case by choosing a tag that has few matched records from the list.
You can create the following UI with the drilldown results:
Links to narrow search results. (Users don’t need to type a search query with their keyboard. They just click a link.)
Most EC sites use this UI. See the side menu at Amazon.
Groonga supports not only counting grouped records but also finding the maximum and/or minimum value from grouped records, summing values in grouped records and so on. See Drilldown related parameters for details.
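Conceptually, a drilldown groups the matched records by a key and counts the records in each group. The following Python sketch models this with the sample data. It is an illustration only: a plain substring test stands in for the fulltext operator @, which is an assumption that happens to give the same matches for this data.

```python
from collections import Counter

entries = [
    {"_key": "The first post!", "content": "Welcome! This is my first post!", "tag": "Hello"},
    {"_key": "Groonga", "content": "I started to use Groonga. It's very fast!", "tag": "Groonga"},
    {"_key": "Mroonga",
     "content": "I also started to use Mroonga. It's also very fast! Really fast!",
     "tag": "Groonga"},
    {"_key": "Good-bye Senna", "content": "I migrated all Senna system!", "tag": "Senna"},
    {"_key": "Good-bye Tritonn", "content": "I also migrated all Tritonn system!", "tag": "Senna"},
]

# Stand-in for: --filter 'content @ "fast"'
matched = [entry for entry in entries if "fast" in entry["content"]]

# Stand-in for: --drilldown tag (the key is the tag, the count plays the role of _nsubrecs)
drilldown = Counter(entry["tag"] for entry in matched)
print(dict(drilldown))
```

Only the Groonga tag appears in the result because both matched records carry that tag, which agrees with the drilldown part of the output above.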
7.3.57.3.6. Dynamic column
You can create zero or more columns dynamically during a select execution. You can use them for drilldown by computed value, window functions and so on.
Here is an example that uses a dynamic column for drilldown by computed value. This example creates a new column named n_likes_class. The n_likes_class column has the classified value of the Entries.n_likes value. This example classifies the Entries.n_likes column value in steps of 10, and the lowest number in each class is the classified value. If an Entries.n_likes value is between 0 and 9, such as 3 and 5, the n_likes_class value (classified value) is 0. If an Entries.n_likes value is between 10 and 19, such as 10 and 15, the n_likes_class value (classified value) is 10.
You can use the number_classify function for the classification. You need to register the functions/number plugin with the plugin_register command to use the number_classify function.
This example does drilldown by the n_likes_class value. The drilldown result will help you to see trends in the data.
Execution example:
plugin_register functions/number
# [[0,1337566253.89858,0.000355720520019531],true]
select \
--table Entries \
--columns[n_likes_class].stage initial \
--columns[n_likes_class].type UInt32 \
--columns[n_likes_class].value 'number_classify(n_likes, 10)' \
--drilldown n_likes_class \
--drilldown_sort_keys _nsubrecs \
--output_columns n_likes,n_likes_class
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "n_likes",
# "UInt32"
# ],
# [
# "n_likes_class",
# "UInt32"
# ]
# ],
# [
# 5,
# 0
# ],
# [
# 10,
# 10
# ],
# [
# 15,
# 10
# ],
# [
# 3,
# 0
# ],
# [
# 3,
# 0
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 10,
# 2
# ],
# [
# 0,
# 3
# ]
# ]
# ]
# ]
See Dynamic column related parameters for details.
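The classification in this example can be reproduced with integer arithmetic. The following Python sketch models the behavior described above for non-negative integers only. The function name classify is an assumption for this illustration; it is not the implementation of number_classify.

```python
from collections import Counter


def classify(value, interval):
    """Return the lowest number of the class that contains value."""
    return (value // interval) * interval


n_likes = [5, 10, 15, 3, 3]  # Entries.n_likes values in record order
classes = [classify(n, 10) for n in n_likes]

# Counting records per class corresponds to the drilldown by n_likes_class.
counts = Counter(classes)
print(classes, dict(counts))
```

The classes and the per-class counts agree with the execution example: three records fall into class 0 and two into class 10.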
7.3.57.3.7. Window function
You can compute each record’s value from the values of grouped records. For example, you can compute the sum of each group and put the sum into each record. The difference from drilldown is that drilldown can also compute the sum of each group, but it puts the sum into each group, not into each record.
Here is the result with window function. Each record has sum:
Group No. | Target value | Sum result |
---|---|---|
1 | 5 | 5 |
2 | 10 | 25 |
2 | 15 | 25 |
3 | 3 | 8 |
3 | 5 | 8 |
Here is the result with drilldown. Each group has sum:
Group No. | Target values | Sum result |
---|---|---|
1 | 5 | 5 |
2 | 10, 15 | 25 |
3 | 3, 5 | 8 |
Window function is useful for data analysis.
Here is an example that sums Entries.n_likes per Entries.tag:
Execution example:
plugin_register functions/number
# [[0,1337566253.89858,0.000355720520019531],true]
select \
--table Entries \
--columns[n_likes_sum_per_tag].stage initial \
--columns[n_likes_sum_per_tag].type UInt32 \
--columns[n_likes_sum_per_tag].value 'window_sum(n_likes)' \
--columns[n_likes_sum_per_tag].window.group_keys tag \
--output_columns tag,n_likes,n_likes_sum_per_tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "n_likes_sum_per_tag",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5,
# 5
# ],
# [
# "Groonga",
# 10,
# 25
# ],
# [
# "Groonga",
# 15,
# 25
# ],
# [
# "Senna",
# 3,
# 6
# ],
# [
# "Senna",
# 3,
# 6
# ]
# ]
# ]
# ]
See Window function related parameters for details.
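The difference between a per-group result and a per-record result can be modeled in a few lines of Python. This sketch uses the sample data and is an illustration only; the variable names are assumptions and the code is not how window_sum is implemented.

```python
from collections import defaultdict

# (tag, n_likes) pairs of the sample Entries records, in record order.
records = [("Hello", 5), ("Groonga", 10), ("Groonga", 15), ("Senna", 3), ("Senna", 3)]

# Per-group sums: what a drilldown with a sum would report, one value per group.
sum_per_tag = defaultdict(int)
for tag, n_likes in records:
    sum_per_tag[tag] += n_likes

# Window function style: every record receives the sum of its own group.
with_window = [(tag, n_likes, sum_per_tag[tag]) for tag, n_likes in records]
for row in with_window:
    print(row)
```

Each record keeps its own n_likes value and additionally carries the sum of its tag group, which matches the n_likes_sum_per_tag column in the execution example.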
7.3.57.3.8. Typo tolerance
You can implement typo tolerance search by specifying how many characters are accepted as typos. If no records are matched by the given query, Groonga automatically searches again with a typo fixed query.
The number of accepted typo characters is 0 by default. So typo tolerance search isn’t enabled by default.
You can enable typo tolerance search by specifying fuzzy_max_distance_ratio or fuzzy_max_distance. In general, --fuzzy_max_distance_ratio 0.34 is a good parameter.
fuzzy_max_distance_ratio specifies how many typo characters are accepted based on the number of characters of each input term.
Here is a table that shows how many characters are accepted as typos with --fuzzy_max_distance_ratio 0.34:
The number of characters of a term | The number of accepted typo characters |
---|---|
1 | 0 |
2 | 0 |
3 | 1 |
4 | 1 |
5 | 1 |
6 | 2 |
In other words, Groonga doesn’t accept any typo for a short term (a term of 0-2 characters), accepts 1 typo for a middle term (a term of 3-5 characters) and accepts 2 or more typos for a long term (a term of 6 or more characters).
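The values in the table are consistent with rounding down the product of the term length and the ratio. The following Python sketch reproduces the table under that assumption. The formula is inferred from the table, not taken from Groonga's source, and the function name max_typos is hypothetical.

```python
import math


def max_typos(term, ratio=0.34):
    """Accepted typo characters for a term, assuming floor(length * ratio)."""
    return math.floor(len(term) * ratio)


# Terms of 1 to 6 characters, as in the table above.
accepted = [max_typos("x" * length) for length in range(1, 7)]
print(accepted)
print(max_typos("Moronga"))
```

Under the same assumption, a 7-character term such as Moronga is allowed 2 typos.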
Here is an example that shows that Groonga can be found with Moronga (2 typos):
Execution example:
select \
--table Entries \
--fuzzy_max_distance_ratio 0.34 \
--match_columns content \
--query Moronga \
--output_columns content,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!",
# 1
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 2
# ]
# ]
# ]
# ]
You can specify a fixed number of accepted typo characters with fuzzy_max_distance. For example, Groonga accepts 2 typo characters for all terms with --fuzzy_max_distance 2. But --fuzzy_max_distance_ratio will be better for many use cases.
You need correct terms for typo tolerance search. Groonga uses terms in a lexicon as correct terms. Terms is the lexicon in this case. Terms in a lexicon are generated by a tokenizer. If your data is in an alphabet based language such as English, you can use TokenNgram because TokenNgram tokenizes text into (almost) words for alphabet based languages. If your data is in a non alphabet based language such as Japanese, you can’t use TokenNgram because TokenNgram tokenizes text into N-character tokens for non alphabet based languages. You need to use a morphological analyzer based tokenizer for non alphabet based languages. For example, you can use TokenMecab for Japanese. (You can also use TokenMecab for non Japanese languages with a suitable dictionary.)
Here is an example that uses typo tolerance search with Japanese text. --default_tokenizer TokenMecab for JapaneseTerms is important. JapaneseTerms is the lexicon in this case:
table_create JapaneseEntries TABLE_NO_KEY
column_create JapaneseEntries content COLUMN_SCALAR Text
table_create JapaneseTerms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenMecab \
--normalizer NormalizerNFKC150
column_create JapaneseTerms japanese_entries_content \
COLUMN_INDEX|WITH_POSITION JapaneseEntries content
load --table JapaneseEntries
[
{"content": "ようこそ!これが最初の投稿です!"},
{"content": "Groongaを使い始めました。とても速いですね!"},
{"content": "Mroongaも使い始めました。これもとても速いですね!本当に速い!"},
{"content": "Sennaのシステムをすべて移行しました!"},
{"content": "Tritonnのシステムもすべて移行しました!"}
]
select \
--table JapaneseEntries \
--fuzzy_max_distance_ratio 0.34 \
--match_columns content \
--query ともて \
--output_columns content,_score
See Fuzzy query related parameters for details.
7.3.57.4. Parameters
This section describes all parameters. Parameters are categorized.
7.3.57.4.1. Required parameters
There is one required parameter, table.
7.3.57.4.1.1. table
Specifies a table to be searched. table must be specified.
If a nonexistent table is specified, an error is returned.
Execution example:
select Nonexistent
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[select][table] invalid name: <Nonexistent>",
# [
# [
# "execute",
# "lib/proc/proc_select.cpp",
# 2929
# ]
# ]
# ]
# ]
7.3.57.4.3. Advanced search parameters
7.3.57.4.3.1. match_escalation_threshold
Added in version 8.0.1.
Specifies the threshold to determine whether search strategy escalation is used or not. The threshold is compared against the number of matched records. If the number of matched records is equal to or less than the threshold, the search strategy escalation is used. See Search for details on search strategy escalation.
The default threshold is 0. It means that search strategy escalation is used only when no records are matched.
The default threshold can be customized by one of the following:
--with-match-escalation-threshold option of configure
--match-escalation-threshold option of the groonga command
match-escalation-threshold configuration item in the configuration file
Here is a simple match_escalation_threshold usage example. The first select doesn’t have the match_escalation_threshold parameter. The second select has the match_escalation_threshold parameter.
Execution example:
select Entries --match_columns content --query groo
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query groo --match_escalation_threshold -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
The first select command searches for records that contain the word groo in the content column value from the Entries table. But no records are matched by exact match because the TokenBigram tokenizer tokenizes groonga to groonga, not gr|ro|oo|on|ng|ga. (The TokenBigramSplitSymbolAlpha tokenizer tokenizes groonga to gr|ro|oo|on|ng|ga. See Tokenizers for details.)
It means that groonga is indexed but groo isn’t indexed. So no records are matched against groo by exact match. In this case, the search strategy escalation is used because the number of matched records (0) is equal to match_escalation_threshold (0). One record is matched against groo by unsplit search.
The second select command also searches for records that contain the word groo in the content column value from the Entries table. It also doesn’t find any matched records by exact match. In this case, the search strategy escalation is not used because the number of matched records (0) is larger than match_escalation_threshold (-1). So no more searches are executed, and no records are matched.
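The threshold comparison in the two cases above can be written as a one-line rule. The following Python sketch models only that comparison for a simple single-condition search as in this example; the function name should_escalate is an assumption for this illustration.

```python
def should_escalate(n_hits, threshold=0):
    """Return True when the number of matched records is equal to or less than the threshold."""
    return n_hits <= threshold


print(should_escalate(0))                # first select: 0 hits, default threshold 0
print(should_escalate(0, threshold=-1))  # second select: 0 hits, threshold -1
```

Because the number of matched records is never negative, a threshold of -1 disables the escalation in this model.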
7.3.57.4.3.2. match_escalation
Specifies how to use match escalation. See also match_escalation_threshold and Search for details on match escalation.
Here are available values:
Value | Description |
---|---|
auto | Groonga uses match_escalation_threshold to determine whether match escalation is used or not. This is the default. |
yes | Groonga always uses match escalation. |
no | Groonga never uses match escalation. |
--match_escalation yes is stronger than --match_escalation_threshold 9999...999. --filter 'true && column @ "query"' with --match_escalation yes uses match escalation. --filter 'true && column @ "query"' with --match_escalation_threshold 9999...999 doesn’t use match escalation.
Here is a simple match_escalation usage example. The first select doesn’t have the match_escalation parameter. The second select has the match_escalation parameter.
Execution example:
select Entries --filter 'true && content @ "groo"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]
select Entries --filter 'true && content @ "groo"' --match_escalation yes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
The first select command searches for records that contain the word groo in the content column value from the Entries table. But no records are matched because the TokenBigram tokenizer tokenizes groonga to groonga, not gr|ro|oo|on|ng|ga.
The second select command also searches for records that contain the word groo in the content column value from the Entries table. It uses match escalation, so it can find a matched record.
7.3.57.4.3.3. query_expansion
Deprecated since version 3.0.2: Use query_expander instead.
7.3.57.4.3.4. query_flags
It customizes the syntax of the query parameter. By default, you cannot update column values with the query parameter. But if you specify ALLOW_COLUMN|ALLOW_UPDATE as query_flags, you can update column values with query.
Here are available values:
ALLOW_PRAGMA
ALLOW_COLUMN
ALLOW_UPDATE
ALLOW_LEADING_NOT
QUERY_NO_SYNTAX_ERROR
NONE
ALLOW_PRAGMA enables pragma at the head of query. This is not implemented yet.
ALLOW_COLUMN enables search against columns that are not included in match_columns. To specify a column, use the COLUMN:... syntaxes.
ALLOW_UPDATE enables column update by query with the COLUMN:=NEW_VALUE syntax. ALLOW_COLUMN is also required to update a column because the column update syntax specifies a column.
ALLOW_LEADING_NOT enables a leading NOT condition with the -WORD syntax. The query searches for records that don’t match WORD. A leading NOT condition query is a heavy query in many cases because it matches many records. So this flag is disabled by default. Be careful when you use this flag.
QUERY_NO_SYNTAX_ERROR ensures that query never causes a syntax error. This flag is useful when an application uses user input directly and doesn’t want to show syntax errors to the user or in a log. This flag is disabled by default.
NONE is just ignored. You can use NONE to specify no flags.
Flags can be combined with | as the separator, such as ALLOW_COLUMN|ALLOW_UPDATE.
The default value is ALLOW_PRAGMA|ALLOW_COLUMN.
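An application that assembles query_flags from configuration may want to check the value before sending it. The following Python sketch is a hypothetical application-side helper, not part of Groonga. The function name parse_query_flags and the decision to reject ALLOW_UPDATE without ALLOW_COLUMN are assumptions for this illustration; how Groonga itself reacts to that combination is not described here.

```python
KNOWN_FLAGS = {
    "ALLOW_PRAGMA",
    "ALLOW_COLUMN",
    "ALLOW_UPDATE",
    "ALLOW_LEADING_NOT",
    "QUERY_NO_SYNTAX_ERROR",
    "NONE",
}
DEFAULT_FLAGS = "ALLOW_PRAGMA|ALLOW_COLUMN"


def parse_query_flags(text=DEFAULT_FLAGS):
    """Split a |-separated flag string into a set of flag names (hypothetical helper)."""
    flags = {name.strip() for name in text.split("|")}
    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    flags.discard("NONE")  # NONE is ignored; on its own it means no flags
    if "ALLOW_UPDATE" in flags and "ALLOW_COLUMN" not in flags:
        # Application-side choice: the update syntax names a column.
        raise ValueError("ALLOW_UPDATE also requires ALLOW_COLUMN")
    return flags


print(sorted(parse_query_flags()))
print(sorted(parse_query_flags("ALLOW_COLUMN|ALLOW_UPDATE")))
```

Note that | is also the pipe character in most shells, so the flag value needs quoting when the command is composed in a shell script.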
Here is a usage example of ALLOW_COLUMN.
Execution example:
select Entries --query content:@mroonga --query_flags ALLOW_COLUMN
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches for records that contain mroonga in the content column value from the Entries table.
Here is a usage example of ALLOW_UPDATE.
Execution example:
table_create Users TABLE_HASH_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Users age COLUMN_SCALAR UInt32
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Users
[
{"_key": "alice", "age": 18},
{"_key": "bob", "age": 20}
]
# [[0,1337566253.89858,0.000355720520019531],2]
select Users --query age:=19 --query_flags ALLOW_COLUMN|ALLOW_UPDATE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
The first select command sets the age column value of all records to 19. The second select command outputs the updated age column values.
Here is a usage example of ALLOW_LEADING_NOT.
Execution example:
select Entries --match_columns content --query -mroonga --query_flags ALLOW_LEADING_NOT
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
The select command searches for records that don’t contain mroonga in the content column value from the Entries table.
Here are a schema definition and sample data to describe other flags:
Execution example:
table_create --name Magazine --flags TABLE_HASH_KEY --key_type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table Magazine --name title --type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Magazine
[
{"_key":"http://test.jp/magazine/webplus","title":"WEB+"},
{"_key":"http://test.jp/magazine/database","title":"DataBase"},
]
# [[0,1337566253.89858,0.000355720520019531],2]
Here is an example of QUERY_NO_SYNTAX_ERROR:
Execution example:
select Magazine --match_columns title --query 'WEB +' --query_flags ALLOW_PRAGMA|ALLOW_COLUMN|QUERY_NO_SYNTAX_ERROR
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://test.jp/magazine/webplus",
# "WEB+"
# ]
# ]
# ]
# ]
If you don’t specify this flag, the query causes a syntax error as below.
Execution example:
select Magazine --match_columns title --query 'WEB +' --query_flags ALLOW_PRAGMA|ALLOW_COLUMN
# [
# [
# -63,
# 1337566253.89858,
# 0.000355720520019531,
# "Syntax error: <WEB +||>",
# [
# [
# "yy_syntax_error",
# "grn_ecmascript.lemon",
# 2929
# ]
# ]
# ]
# ]
Here is a usage example of NONE.
Execution example:
select Entries --match_columns content --query 'mroonga OR _key:Groonga' --query_flags NONE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The select command searches for records that contain one of the two words mroonga or _key:Groonga in content from the Entries table. Note that _key:Groonga doesn’t mean that the value of the _key column is equal to Groonga, because the ALLOW_COLUMN flag is not specified.
See also Query syntax.
7.3.57.4.3.5. query_expander
#
It’s for query expansion. Query expansion substitutes specific words to another words in query. Normally, it’s used for synonym search.
It specifies a column that is used to substitute the query parameter value. The format of this parameter value is “${TABLE}.${COLUMN}”. For example, “Terms.synonym” specifies the synonym column in the Terms table.
The table for query expansion is called the “substitution table”. The substitution table’s key must be ShortText. So an array table (TABLE_NO_KEY) can’t be used for query expansion, because an array table doesn’t have a key.
The column for query expansion is called the “substitution column”. The substitution column’s value type must be ShortText. The column type must be vector (COLUMN_VECTOR).
Query expansion substitutes keys of the substitution table in the query with values in the substitution column. If a word in the query is a key of the substitution table, the word is substituted with the substitution column value that is associated with the key. Substitution isn’t performed recursively. This means that substitution target words in the substituted query aren’t substituted again.
Here is a sample substitution table to show a simple query_expander usage example.
Execution example:
table_create Thesaurus TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Thesaurus synonym COLUMN_VECTOR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Thesaurus
[
{"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
{"_key": "groonga", "synonym": ["groonga", "senna"]}
]
# [[0,1337566253.89858,0.000355720520019531],2]
The Thesaurus substitution table has two synonyms, "mroonga" and "groonga". If a user searches with "mroonga", Groonga searches with "((mroonga) OR (tritonn) OR (groonga mysql))". If a user searches with "groonga", Groonga searches with "((groonga) OR (senna))".
Normally, it’s a good idea for the substitution table to use a normalizer. For example, if a normalizer is used, substitution target words are matched case-insensitively. See Normalizers for available normalizers.
Note that those synonym values include the key value, such as "mroonga" and "groonga". It’s recommended that you include the key value. If you don’t include the key value, the substituted value doesn’t include the original substitution target value. Normally, including the original value gives better search results. If you have a word that you don’t want to be searched, you should not include the original word. For example, you can implement “stop words” with an empty vector value.
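The substitution rule described above can be modeled with a short Python sketch. This is a simplified illustration, not Groonga's implementation: the function name expand_query is hypothetical, words are split on whitespace only, and lowercasing stands in for the normalizer. The table contents are the Thesaurus records loaded above.

```python
# Simplified model of query expansion; not Groonga's actual implementation.
THESAURUS = {
    "mroonga": ["mroonga", "tritonn", "groonga mysql"],
    "groonga": ["groonga", "senna"],
}


def expand_query(query, thesaurus):
    """Substitute each whitespace-separated word that is a key in thesaurus."""
    expanded = []
    for word in query.split():
        # Assumption: lowercasing approximates what the normalizer does.
        synonyms = thesaurus.get(word.lower())
        if synonyms is None:
            expanded.append(word)
        else:
            # The substituted text is never looked up again (no recursion),
            # so "groonga" inside "groonga mysql" stays as it is.
            expanded.append("(" + " OR ".join(f"({s})" for s in synonyms) + ")")
    return " ".join(expanded)


print(expand_query("mroonga", THESAURUS))
print(expand_query("Groonga fast", THESAURUS))
```

The first call prints the same expanded query as the text above, ((mroonga) OR (tritonn) OR (groonga mysql)). The word "groonga" inside the "groonga mysql" synonym is left alone, which is the non-recursive behavior.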
Here is a simple query_expander usage example.
Execution example:
select Entries --match_columns content --query "mroonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "mroonga" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
The first select command doesn’t use query expansion, so a record that has "tritonn" isn’t found. The second select command uses query expansion, so a record that has "tritonn" is found. The third select command doesn’t use query expansion, but it is equivalent to the second select command because it uses the expanded query.
Each substitution value can contain any Query syntax, such as (...) and OR. You can build complex substitutions by using that syntax.
Here is a complex substitution usage example that uses query syntax.
Execution example:
load --table Thesaurus
[
{"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
]
# [[0,1337566253.89858,0.000355720520019531],1]
select Entries --match_columns content --query "popular" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
The load command registers a new synonym "popular". It is substituted with ((popular) OR (n_likes:>=10)). The substituted query means that a “popular” entry is one that contains the word “popular” or has 10 or more likes.
The select command outputs records from the Entries table whose n_likes column value is 10 or more.
7.3.57.4.3.6. n_workers
Added in version 12.0.5.
Note
This is an experimental feature. Currently, this feature is still not stable.
This feature requires Command version 3 or later.
This feature requires that Apache Arrow is enabled in Groonga.
Whether Apache Arrow is enabled depends on the package provider.
To check whether Apache Arrow is enabled, use the status command and check whether apache_arrow is true in the result.
If Apache Arrow is disabled, either build Groonga from the source code with Apache Arrow enabled by following the steps in Install, or ask the package provider to enable Apache Arrow.
drilldown, drilldowns and slices are executed in parallel when this parameter is set to -1, or to 2 or more.
By default, drilldown, drilldowns and slices are executed serially.
In other words, the next process starts only after the current process has finished.
So queries tend to take a long time if there are a lot of drilldown, drilldowns and slices.
n_workers makes it possible to execute independent drilldown, drilldowns and slices in parallel.
Executing them in parallel can shorten the total execution time.
This parallel execution is done within each select command.
“Independent” means not using drilldowns.table to reference the results of other drilldowns or slices.
If there is a dependency, such as one created by using drilldowns.table, the dependent drilldown or slice waits until the drilldowns or slices it depends on have finished.
Therefore, the degree of parallelism is reduced if they have dependencies.
Executing in parallel means using multiple CPUs at the same time. If there are no free CPU resources, executing in parallel may actually make execution slower, because each process has to wait until the other process running on its target CPU finishes.
Whether there are free CPU resources, and what value should be specified for n_workers, depends on the system configuration.
For example, consider using Groonga HTTP server on a system with 6 CPUs.
Groonga HTTP server allocates 1 thread (= 1 CPU) for each request.
When the average number of concurrent connections is 6, there are no free CPU resources because all 6 CPUs are already in use processing requests.
When the average number of concurrent connections is 2, there are 4 free CPUs because only 2 CPUs are in use.
When 2 is specified for n_workers, the select command uses at most 3 CPUs, including the thread that processes the request.
Therefore, if two select commands with 2 specified for n_workers are requested at the same time, they use at most 6 CPUs in total and are processed quickly by using all of the resources.
When a value greater than 2 is specified, the degree of parallelism can exceed the available CPU resources, so execution may actually become slower.
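The arithmetic in this example can be checked with a small Python sketch. It is only a back-of-the-envelope model: the function name max_cpus_in_use is hypothetical, and the rule "1 request thread plus n_workers threads" is extrapolated from the single case given above (n_workers of 2 uses at most 3 CPUs), so treat it as an assumption.

```python
def max_cpus_in_use(concurrent_selects, n_workers):
    """Upper bound on busy CPUs for identical concurrent select requests."""
    if n_workers < 0:
        raise ValueError("this sketch only models non-negative n_workers")
    # One request thread per select. For n_workers of 2 or more, assume that
    # many additional threads (extrapolated from the "2 -> 3 CPUs" example).
    per_select = 1 + n_workers if n_workers >= 2 else 1
    return concurrent_selects * per_select


N_CPUS = 6  # the 6-CPU system from the example above
for n_workers in (0, 2, 4):
    used = max_cpus_in_use(2, n_workers)
    verdict = "fits" if used <= N_CPUS else "oversubscribed"
    print(f"n_workers={n_workers}: at most {used} CPUs ({verdict})")
```

With two concurrent requests, n_workers of 2 reaches exactly the 6 available CPUs, while a larger value exceeds them under this model.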
n_workers behaves as follows depending on the specified value.
When specifying 0 or 1: Executes the select command in serial.
When specifying 2 or more: Executes the select command in parallel with at most the specified number of threads.
When specifying -1 or less: Executes the select command in parallel with at most as many threads as the number of CPU cores.
The default value of this parameter is 0.
It means that the select command is executed serially by default.
Note
The default value can be changed by specifying the environment variable GRN_SELECT_N_WORKERS_DEFAULT.
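The value rules for n_workers listed above can be written as a small Python function. This is a sketch of the documented rules only, not of how Groonga schedules threads. The name select_parallelism is hypothetical, and os.cpu_count() is used as a stand-in for "the number of CPU cores".

```python
import os


def select_parallelism(n_workers, n_cpu_cores=None):
    """Map an n_workers value to (mode, upper bound on threads)."""
    if n_cpu_cores is None:
        # Assumption: os.cpu_count() approximates the CPU core count.
        n_cpu_cores = os.cpu_count() or 1
    if n_workers <= -1:
        return ("parallel", n_cpu_cores)
    if n_workers >= 2:
        return ("parallel", n_workers)
    # 0 (the default) or 1: serial execution, no thread limit applies.
    return ("serial", None)


for value in (0, 1, 2, -1):
    print(value, select_parallelism(value, n_cpu_cores=8))
```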
7.3.57.5. Return value
The command returns a response with the following format:
[
  HEADER,
  [
    SEARCH_RESULT,
    DRILLDOWN_RESULT_1,
    DRILLDOWN_RESULT_2,
    ...,
    DRILLDOWN_RESULT_N
  ]
]
If the command fails, error details are in HEADER.
See Output format for HEADER.
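A client can check HEADER before touching the body. The following Python sketch assumes the array-style response shown in the execution examples on this page, where the first HEADER element is the return code (0 on success) and, on failure, the fourth element is the error message. The function name unwrap_response is hypothetical, and the two sample responses are copied from examples earlier on this page.

```python
def unwrap_response(response):
    """Return the body of a response, or raise if HEADER reports an error."""
    header = response[0]
    return_code = header[0]
    if return_code != 0:
        # In the failing example the message is the 4th HEADER element.
        message = header[3] if len(header) > 3 else "(no message)"
        raise RuntimeError(f"Groonga returned {return_code}: {message}")
    return response[1]


ok = [[0, 1337566253.89858, 0.000355720520019531], True]
failed = [[-63, 1337566253.89858, 0.000355720520019531,
           "Syntax error: <WEB +||>",
           [["yy_syntax_error", "grn_ecmascript.lemon", 2929]]]]

print(unwrap_response(ok))
try:
    unwrap_response(failed)
except RuntimeError as error:
    print(error)
```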
There are zero or more DRILLDOWN_RESULT entries. If neither drilldown nor drilldowns[${LABEL}].keys is specified, they are omitted as follows:
[
  HEADER,
  [
    SEARCH_RESULT
  ]
]
If drilldown has two or more keys, such as --drilldown "_key, column1, column2", there is one DRILLDOWN_RESULT per key:
[
  HEADER,
  [
    SEARCH_RESULT,
    DRILLDOWN_RESULT_FOR_KEY,
    DRILLDOWN_RESULT_FOR_COLUMN1,
    DRILLDOWN_RESULT_FOR_COLUMN2
  ]
]
If drilldowns[${LABEL}].keys is used, only one DRILLDOWN_RESULT exists:
[
  HEADER,
  [
    SEARCH_RESULT,
    DRILLDOWN_RESULT_FOR_LABELED_DRILLDOWN
  ]
]
The DRILLDOWN_RESULT format differs between drilldown and drilldowns[${LABEL}].keys. It is described later.
SEARCH_RESULT uses the following format:
[
  [N_HITS],
  COLUMNS,
  RECORDS
]
See Simple usage for a concrete example of the format.
N_HITS is the number of matched records before limit is applied.
COLUMNS describes the output columns specified by output_columns. It uses the following format:
[
  [COLUMN_NAME_1, COLUMN_TYPE_1],
  [COLUMN_NAME_2, COLUMN_TYPE_2],
  ...,
  [COLUMN_NAME_N, COLUMN_TYPE_N]
]
COLUMNS includes information about one or more output columns. The information for each output column includes the following:
Column name as string
Column type as string or null
The column name is extracted from the value specified as output_columns.
The column type is Groonga’s type name or null. It doesn’t describe whether the column value is a vector or a scalar. You need to determine that by checking whether the actual column value is an array.
See Data types for type details.
null is used when the column value type can’t be determined. For example, a function call in output_columns such as --output_columns "snippet_html(content)" uses null.
Here is an example of COLUMNS:
[
  ["_id", "UInt32"],
  ["_key", "ShortText"],
  ["n_likes", "UInt32"]
]
RECORDS includes the column values of each matched record. The included records are selected by offset and limit. It uses the following format:
[
  [
    RECORD_1_COLUMN_1,
    RECORD_1_COLUMN_2,
    ...,
    RECORD_1_COLUMN_N
  ],
  [
    RECORD_2_COLUMN_1,
    RECORD_2_COLUMN_2,
    ...,
    RECORD_2_COLUMN_N
  ],
  ...,
  [
    RECORD_N_COLUMN_1,
    RECORD_N_COLUMN_2,
    ...,
    RECORD_N_COLUMN_N
  ]
]
Here is an example of RECORDS:
[
  [
    1,
    "The first post!",
    5
  ],
  [
    2,
    "Groonga",
    10
  ],
  [
    3,
    "Mroonga",
    15
  ]
]
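A client usually wants each record keyed by column name. The following Python sketch pairs COLUMNS with the records, using the example values above. Two points are assumptions: the N_HITS value of 3 is made up for illustration, and the records are unpacked as sibling elements that follow COLUMNS directly, because that is how they appear in the execution examples on this page. The function names are hypothetical.

```python
COLUMNS = [["_id", "UInt32"], ["_key", "ShortText"], ["n_likes", "UInt32"]]
RECORDS = [[1, "The first post!", 5], [2, "Groonga", 10], [3, "Mroonga", 15]]
# Assumption: N_HITS of 3 is invented for this illustration.
SEARCH_RESULT = [[3], COLUMNS, *RECORDS]


def records_as_dicts(search_result):
    """Return (n_hits, list of {column name: value}) for a SEARCH_RESULT."""
    (n_hits,), columns, *records = search_result
    names = [name for name, _type in columns]
    return n_hits, [dict(zip(names, record)) for record in records]


def is_vector(value):
    """COLUMNS does not say vector or scalar, so inspect the actual value."""
    return isinstance(value, list)


n_hits, rows = records_as_dicts(SEARCH_RESULT)
print(n_hits)
for row in rows:
    print(row)
```

Because N_HITS is counted before limit is applied, it can be larger than the number of records returned.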
The DRILLDOWN_RESULT format differs between drilldown and drilldowns[${LABEL}].keys.
drilldown uses the same format as SEARCH_RESULT:
[
  [N_HITS],
  COLUMNS,
  RECORDS
]
drilldown generates one or more DRILLDOWN_RESULT when drilldown has one or more keys.
drilldowns[${LABEL}].keys uses the following format. Multiple drilldowns[${LABEL}].keys are mapped to one object (key-value pairs):
{
  "LABEL_1": [
    [N_HITS],
    COLUMNS,
    RECORDS
  ],
  "LABEL_2": [
    [N_HITS],
    COLUMNS,
    RECORDS
  ],
  ...,
  "LABEL_N": [
    [N_HITS],
    COLUMNS,
    RECORDS
  ]
}
Each drilldowns[${LABEL}].keys corresponds to the following:
"LABEL": [
  [N_HITS],
  COLUMNS,
  RECORDS
]
The value part uses the same format as SEARCH_RESULT:
[
  [N_HITS],
  COLUMNS,
  RECORDS
]
See also Output format for drilldowns[${LABEL}] style for the drilldowns[${LABEL}] style drilldown output format.
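Since the two drilldown styles produce differently shaped entries, a client can tell them apart by type: an array for drilldown, an object for drilldowns[${LABEL}].keys. The Python sketch below separates them. The function name split_body and all sample data are hypothetical and exist only to exercise the shapes described above.

```python
def split_body(body):
    """Split a response body into (search_result, plain, labeled)."""
    search_result, *drilldown_results = body
    plain = []    # one entry per drilldown key
    labeled = {}  # label -> result, from drilldowns[${LABEL}].keys
    for result in drilldown_results:
        if isinstance(result, dict):
            labeled.update(result)
        else:
            plain.append(result)
    return search_result, plain, labeled


# Hypothetical sample data, invented for illustration only.
search = [[0], [["_id", "UInt32"]]]
tag_counts = [[2], [["_key", "ShortText"], ["_nsubrecs", "Int32"]],
              ["Groonga", 2], ["Senna", 1]]

print(split_body([search, tag_counts]))
print(split_body([search, {"tags": tag_counts}]))
```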