7.13.1. Query syntax#
Query syntax is a syntax for specifying search conditions in a common Web search form. It is similar to the syntax of Google’s search form. For example, word1 word2 means that Groonga searches for records that contain both word1 and word2. word1 OR word2 means that Groonga searches for records that contain either word1 or word2.
Query syntax consists of Conditional expression, Combined expression and Assignment expression. Normally, Assignment expression can be ignored because it is disabled in the query option of select. You can enable it by specifying ALLOW_UPDATE in the query_flags option.
Conditional expression specifies a condition. Combined expression consists of one or more Conditional expression, Combined expression or Assignment expression. Assignment expression assigns a value to a column.
7.13.1.1. Sample data#
Here are a schema definition and sample data to show usage.
Execution example:
table_create Entries TABLE_PAT_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries content COLUMN_SCALAR Text
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3}
]
# [[0,1337566253.89858,0.000355720520019531],5]
There is a table, Entries, for blog entries. An entry has a title, content and the number of likes for the entry. The title is the key of Entries. The content is the value of the Entries.content column. The number of likes is the value of the Entries.n_likes column.
The Entries._key column and the Entries.content column are indexed using the TokenBigram tokenizer. So both Entries._key and Entries.content are ready for full text search.
OK. The schema and data for examples are ready.
7.13.1.2. Escape#
There are special characters in query syntax. To use a special character as itself, escape it by prepending \. For example, " is a special character. It is escaped as \".
Here is a special character list:
[space] (escaped as [backslash][space]) (You should substitute [space] with a white space character that is 0x20 in ASCII and [backslash] with \\.)
" (escaped as \")
( (escaped as \()
) (escaped as \))
\ (escaped as \\)
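The escaping rule above can be sketched as a small helper. This is only an illustration in Python, not part of Groonga: the names SPECIAL_CHARACTERS and escape_query_text are hypothetical, and the helper covers only the characters in the list above.

```python
# Characters that the list above names as special in query syntax:
# space, double quote, both parentheses and backslash.
SPECIAL_CHARACTERS = ' "()\\'


def escape_query_text(text: str) -> str:
    """Prefix every special character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


print(escape_query_text("(a b)"))  # prints \(a\ b\)
```

The result is the text as Groonga should receive it. Any extra escaping that a shell or another transport requires has to be applied on top of this.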
You can use quotes instead of escaping special characters, except for \ (backslash). You still need to escape a backslash with a backslash, like \\, inside quotes.
Quote syntax is "...". You need to escape " as \" in "..." quote syntax. For example, You say "Hello Alice!" can be quoted as "You say \"Hello Alice!\"".
In addition, '...' isn’t available in query syntax.
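A minimal sketch of the quoting rule, assuming only the two escapes described above (\\ for a backslash and \" for a double quote). The function name quote_query_text is hypothetical and is not a Groonga API.

```python
def quote_query_text(text: str) -> str:
    """Wrap text in "..." and escape backslashes and double quotes inside it."""
    # Backslashes must be escaped first so that the backslashes added for
    # double quotes are not escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(quote_query_text('You say "Hello Alice!"'))
# prints "You say \"Hello Alice!\""
```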
Note
There is an important point that you need to take care of. The \ (backslash) character is interpreted by the command line shell. So if you want to search for ( itself, for example, you need to escape it twice (\\() in the command line shell. The command line shell interprets \\( as \(, then passes that literal to Groonga. Groonga regards \( as (, then searches for ( itself in the database. If Groonga doesn’t perform the search you intended, confirm whether special characters are escaped properly.
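When the query is assembled by a program and handed to a POSIX shell, one way to avoid counting backslashes by hand is to let a quoting helper add the shell layer. The sketch below uses Python's standard shlex module. It assumes a POSIX-style shell and is an illustration only, not something Groonga itself provides.

```python
import shlex

# What Groonga should finally receive: an escaped "(".
query = r"\("

# shlex.quote() wraps the text in single quotes, so the shell passes the
# backslash through unchanged.
shell_argument = shlex.quote(query)
print(shell_argument)  # prints '\('

# shlex.split() emulates POSIX shell word splitting, so the round trip
# shows what the program on the other side would receive.
print(shlex.split(shell_argument))
```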
7.13.1.3. Conditional expression#
Here is the list of available conditional expressions.
7.13.1.3.1. Full text search condition#
Its syntax is keyword.
Full text search condition specifies a full text search condition against the default match columns. Match columns are full text search target columns.
You should specify the default match columns for full text search. They can be specified by the --match_columns option of select. If you don’t specify the default match columns, this conditional expression fails.
This conditional expression does full text search with keyword. keyword should not contain any spaces. If keyword contains a space such as search keyword, it means two full text search conditions: search and keyword. If you want to specify a keyword that contains one or more spaces, you can use phrase search condition, which is described below.
Here is a simple example:
Execution example:
select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records that contain the word fast in the content column value.
The content column is the default match column.
7.13.1.3.2. Phrase search condition#
Its syntax is "search keyword".
Phrase search condition specifies a phrase search condition against the default match columns.
You should specify the default match columns for full text search. They can be specified by the --match_columns option of select. If you don’t specify the default match columns, this conditional expression fails.
This conditional expression does phrase search with search keyword. Phrase search searches for records that contain search and keyword where those terms appear in the same order and are adjacent. Thus, Put a search keyword in the form is matched but Search by the keyword and There is a keyword. Search by it! aren’t matched.
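The "same order and adjacent" rule can be modeled with a few lines of Python. This is a simplified sketch: it splits text into lowercase words with a regular expression, which is an assumption made for illustration and is not how Groonga's tokenizers work. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase words (a stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(text, phrase):
    """Return True if the words of phrase appear adjacent and in order in text."""
    words = tokenize(text)
    target = tokenize(phrase)
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )


for sample in (
    "Put a search keyword in the form",
    "Search by the keyword",
    "There is a keyword. Search by it!",
):
    print(sample, "->", contains_phrase(sample, "search keyword"))
```

Only the first sample contains search immediately followed by keyword, so only the first one is reported as a match.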
Here is a simple example:
Execution example:
select Entries --match_columns content --query '"I started"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The expression matches records that contain the phrase I started in the content column value. I also started isn’t matched because I and started aren’t adjacent.
The content column is the default match column.
7.13.1.3.3. Full text search condition (with explicit match column)#
Its syntax is column:@keyword.
It’s similar to full text search condition but it doesn’t require the default match columns. You need to specify the match column for the full text search condition by column: instead of the --match_columns option of select.
This conditional expression is useful when you want to use two or more full text searches against different columns. The default match columns specified by the --match_columns option can’t be specified multiple times. You need to specify the second match column by this conditional expression.
The difference between full text search condition and full text search condition (with explicit match column) is whether advanced match columns are supported or not. Full text search condition supports advanced match columns but full text search condition (with explicit match column) doesn’t. Advanced match columns have the following features:
Weight is supported.
Using multiple columns is supported.
Using an index column as a match column is supported.
See the description of the --match_columns option of select about them.
Here is a simple example:
Execution example:
select Entries --query content:@fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records that contain the word fast in the content column value.
7.13.1.3.4. Phrase search condition (with explicit match column)#
Its syntax is column:@"search keyword".
It’s similar to phrase search condition but it doesn’t require the default match columns. You need to specify the match column for the phrase search condition by column: instead of the --match_columns option of select.
The difference between phrase search condition and phrase search condition (with explicit match column) is similar to the difference between full text search condition and full text search condition (with explicit match column). Phrase search condition supports advanced match columns but phrase search condition (with explicit match column) doesn’t. See the description of full text search condition (with explicit match column) about advanced match columns.
Here is a simple example:
Execution example:
select Entries --query 'content:@"I started"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The expression matches records that contain the phrase I started in the content column value. I also started isn’t matched because I and started aren’t adjacent.
7.13.1.3.5. Prefix search condition#
Its syntax is column:^value or value*.
This conditional expression does prefix search with value. Prefix search searches for records that contain a word that starts with value.
You can use fast prefix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table (TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a patricia trie table or a double array trie table. You don’t need to index _key.
Prefix search can be used with other table types but it causes a scan of all records. That isn’t a problem for a small number of records but it takes more time for a large number of records.
It doesn’t require the default match columns, unlike full text search condition and phrase search condition.
Here is a simple example:
Execution example:
select Entries --query '_key:^Goo'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]
The expression matches records that contain a word that starts with Goo in the _key pseudo column value. Good-bye Senna and Good-bye Tritonn are matched with the expression.
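To see why an ordered key structure makes prefix search fast, here is a rough Python sketch. It uses a sorted list and binary search as a stand-in for a patricia trie or a double array trie; this is an analogy chosen for illustration and does not describe Groonga's actual data structures. The function name prefix_search is hypothetical.

```python
from bisect import bisect_left


def prefix_search(sorted_keys, prefix):
    """Return keys starting with prefix without scanning every key."""
    # All keys that share a prefix are contiguous in sorted order, so one
    # binary search finds the first candidate.
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # The contiguous run of matching keys has ended.
        matches.append(key)
    return matches


keys = sorted([
    "The first post!",
    "Groonga",
    "Mroonga",
    "Good-bye Senna",
    "Good-bye Tritonn",
])
print(prefix_search(keys, "Goo"))
# prints ['Good-bye Senna', 'Good-bye Tritonn']
```

With an unordered table there is no such contiguous run, which is why every record has to be checked.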
7.13.1.3.6. Suffix search condition#
Its syntax is column:$value.
This conditional expression does suffix search with value. Suffix search searches for records that contain a word that ends with value.
You can use fast suffix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can also use fast suffix search against the _key pseudo column of a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don’t need to index _key. We recommend that you use index column based fast suffix search instead of _key based fast suffix search. _key based fast suffix search returns automatically registered substrings. (TODO: write document about suffix search and link to it from here.)
Note
Fast suffix search can be used only for non-ASCII characters such as hiragana in Japanese. You cannot use fast suffix search for ASCII characters.
Suffix search can be used with other table types or with a patricia trie table without the KEY_WITH_SIS flag, but it causes a scan of all records. That isn’t a problem for a small number of records but it takes more time for a large number of records.
It doesn’t require the default match columns, unlike full text search condition and phrase search condition.
Here is a simple example. It uses fast suffix search for Japanese hiragana, which are non-ASCII characters.
Execution example:
table_create Titles TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0,1337566253.89858,0.000355720520019531],4]
select Titles --query 'content:$んが'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 2,
# "むるんが"
# ],
# [
# 1,
# "ぐるんが"
# ]
# ]
# ]
# ]
The expression matches records whose content column value ends with んが. ぐるんが and むるんが are matched with the expression.
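For comparison, the slow path mentioned above (a scan of all records) is easy to picture. The Python sketch below is only a naive model of that scan over the same four values. It is not how Groonga implements suffix search, and it does not reproduce the order in which Groonga returns records.

```python
titles = ["ぐるんが", "むるんが", "せな", "とりとん"]


def suffix_scan(values, suffix):
    """Check every value in turn: the cost grows with the number of records."""
    return [value for value in values if value.endswith(suffix)]


print(suffix_scan(titles, "んが"))
```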
7.13.1.3.7. Near search condition#
Its syntax is *N"token1 token2 ...".
This conditional expression does near search with token1, token2 and .... Near search searches for records that contain all the specified tokens with at most 10 tokens between them. For example, *N"a b c" matches a 1 2 3 4 5 b 6 7 8 9 10 c but doesn’t match a 1 2 3 4 5 b 6 7 8 9 10 11 c:
Execution example:
table_create NearTokens TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create NearTokens content COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create NearTokenTerms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenNgram \
--normalizer NormalizerNFKC130
# [[0,1337566253.89858,0.000355720520019531],true]
column_create NearTokenTerms index COLUMN_INDEX|WITH_POSITION \
NearTokens content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table NearTokens
[
{"content": "a 1 2 3 4 5 b 6 7 8 9 10 c"},
{"content": "a 1 2 3 4 5 b 6 7 8 9 10 11 c"},
{"content": "a 1 2 3 4 5 b 6 7 8 9 10 11 12 c"}
]
# [[0,1337566253.89858,0.000355720520019531],3]
select NearTokens --match_columns content --query '*N"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ]
# ]
# ]
# ]
Note that you must specify WITH_POSITION for an index column that is used for near search. If you don’t specify WITH_POSITION, near search can’t count distance correctly.
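The "at most 10 tokens between them" rule can be illustrated with a deliberately simplified model. The Python sketch below assumes whitespace-separated tokens that each occur once in the text. The names near_gap and matches_near are hypothetical, and the real calculation inside Groonga may differ in detail.

```python
def near_gap(text, query_tokens):
    """Count the tokens between the first and the last query token,
    excluding the query tokens themselves. Returns None if a token is missing."""
    words = text.split()
    try:
        positions = [words.index(token) for token in query_tokens]
    except ValueError:
        return None
    return max(positions) - min(positions) - (len(query_tokens) - 1)


def matches_near(text, query_tokens, max_interval=10):
    gap = near_gap(text, query_tokens)
    return gap is not None and gap <= max_interval


records = [
    "a 1 2 3 4 5 b 6 7 8 9 10 c",
    "a 1 2 3 4 5 b 6 7 8 9 10 11 c",
    "a 1 2 3 4 5 b 6 7 8 9 10 11 12 c",
]
for record in records:
    print(near_gap(record, ["a", "b", "c"]), matches_near(record, ["a", "b", "c"]))
```

Under this model the three records have gaps of 10, 11 and 12, so only the first one fits the default max interval of 10.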
You can customize the max interval of the given tokens (10 by default) by specifying a number after *N. Here is an example that uses 2 as the max interval of the given tokens:
*N2"..."
Here is an example to customize the max interval of the given tokens:
Execution example:
select NearTokens --match_columns content --query '*N11"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 2,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 c"
# ]
# ]
# ]
# ]
To be precise, you can specify a word instead of a token for near search, because the passed text is tokenized before near search. A word consists of one or more tokens. If you specify a word, it may not work as you expect. For example, *N"a1b2c3d" matches both a 1 b 2 c 3 d and a b c d 1 2 3:
Execution example:
load --table NearTokens
[
{"content": "a 1 b 2 c 3 d"},
{"content": "a b c d 1 2 3"}
]
# [[0,1337566253.89858,0.000355720520019531],2]
select NearTokens --match_columns content --query '*N"a1b2c3d"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
This is because *N"a1b2c3d" is equal to *N"a 1 b 2 c 3 d".
If you want to specify words, Near phrase search condition is what you want.
New in version 12.0.1: The max intervals of each token.
You can specify the max intervals of each token. The default is no limit. It means that all intervals of each token are valid as long as the max interval is satisfied.
Here is an example that uses 2 for the max interval of the first interval and 4 for the max interval of the second interval:
*N10,2|4"a b c"
10 is the max interval.
| is the separator of the max intervals of each token.
This matches a x b x x x c. But this doesn’t match a x x b c, a b x x x x c and so on, because the former has an interval of 3 for the first interval, which is larger than 2, and the latter has an interval of 5 for the second interval, which is larger than 4.
Here is an example that specifies the max intervals of each token:
Execution example:
select NearTokens --match_columns content --query '*N11,5|5"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
select NearTokens --match_columns content --query '*N11,5|6"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 2,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
You can omit one or more intervals. Omitted intervals are treated as -1. It means that *N11,5 equals *N11,5|-1. -1 means no limit.
Here is an example that omits an interval:
Execution example:
select NearTokens --match_columns content --query '*N11,5"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 2,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
select NearTokens --match_columns content --query '*N11,5|-1"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 2,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
You can specify extra intervals. They are just ignored:
Execution example:
select NearTokens --match_columns content --query '*N11,5|5|1|1|1"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
select NearTokens --match_columns content --query '*N11,5|5"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
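The two rules for per-token intervals described in this section (omitted intervals are treated as -1, extra intervals are ignored) can be summarized in a short Python sketch. The function name normalize_intervals is hypothetical, and the code only models how the list of intervals is padded or cut. It does not perform any search.

```python
def normalize_intervals(intervals, n_tokens):
    """Return exactly n_tokens - 1 intervals, padding with -1 (no limit)
    and dropping any extra values."""
    needed = max(n_tokens - 1, 0)
    normalized = list(intervals[:needed])               # Extra intervals are ignored.
    normalized += [-1] * (needed - len(normalized))     # Omitted ones mean "no limit".
    return normalized


# *N11,5"a b c" behaves like *N11,5|-1"a b c".
print(normalize_intervals([5], 3))              # prints [5, -1]
# *N11,5|5|1|1|1"a b c" behaves like *N11,5|5"a b c".
print(normalize_intervals([5, 5, 1, 1, 1], 3))  # prints [5, 5]
```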
7.13.1.3.8. Near phrase search condition#
Its syntax is *NP"phrase1 phrase2 ...".
This conditional expression does near phrase search with phrase1, phrase2 and .... Near phrase search searches for records that contain all the specified phrases with at most 10 tokens between them. For example, *NP"a1b2c3d" matches a 1 b 2 c 3 d but doesn’t match a b c d 1 2 3 because the latter uses a different order:
Execution example:
select NearTokens --match_columns content --query '*NP"a1b2c3d"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ]
# ]
# ]
# ]
You can use a phrase that includes spaces by quoting it, such as *NP"\"groonga mroonga\" pgroonga". Note that you need to escape \" in command syntax such as *NP"\\\"groonga mroonga\\\" pgroonga". This query matches groonga mroonga pgroonga but doesn’t match groonga pgroonga mroonga because mroonga isn’t right after groonga:
Execution example:
load --table NearTokens
[
{"content": "groonga mroonga rroonga pgroonga"},
{"content": "groonga rroonga pgroonga mroonga"}
]
# [[0,1337566253.89858,0.000355720520019531],2]
select NearTokens \
--match_columns content \
--query '*NP"\\\"groonga mroonga\\\" pgroonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 6,
# "groonga mroonga rroonga pgroonga"
# ]
# ]
# ]
# ]
You can customize the max interval of the given phrases (10 by default) by specifying a number after *NP. Here is an example that uses 2 as the max interval of the given phrases:
*NP2"..."
Here is an example to customize the max interval of the given phrases:
Execution example:
select NearTokens --match_columns content --query '*NP1"groonga pgroonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 7,
# "groonga rroonga pgroonga mroonga"
# ]
# ]
# ]
# ]
You can use an additional interval only for the last phrase. It means that you can accept more distance only between the second to last phrase and the last phrase. This is useful for implementing a near phrase search in the same sentence. If you specify . (sentence end phrase) as the last phrase and specify -1 as the additional last interval, the other specified phrases must appear before the . phrase. You must append $ to the last phrase like .$.
Here is an example that uses -1 as the additional last interval of the given phrases:
*NP10,-1"a b .$"
Here is an example to customize the additional last interval of the given phrases:
Execution example:
load --table NearTokens
[
{"content": "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."},
{"content": "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."},
{"content": "x 1 2 3 y 4 . x 1 y 2 z 3 4 5 6 7 ."}
]
# [[0,1337566253.89858,0.000355720520019531],3]
select NearTokens --match_columns content --query '*NP2,-1"x y .$"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 8,
# "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 9,
# "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 10,
# "x 1 2 3 y 4 . x 1 y 2 z 3 4 5 6 7 ."
# ]
# ]
# ]
# ]
You can also use a positive number for the additional last interval. If you specify a positive number as the additional last interval, all of the following conditions must be satisfied:
The interval between the first phrase and the second to last phrase is less than or equal to the max interval.
The interval between the first phrase and the last phrase is less than or equal to the max interval + the additional last interval.
If you specify a negative number as the additional last interval, the second condition isn’t required. The last phrase only needs to appear.
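The conditions above can be written down as a small predicate. This Python sketch takes the two measured intervals as plain numbers. How Groonga measures them is outside the sketch, the function name is hypothetical, and treating an additional last interval of 0 like a positive value is an assumption.

```python
def satisfies_last_interval_rule(max_interval, additional_last_interval,
                                 interval_to_second_last, interval_to_last):
    """Model the two conditions for the additional last interval.

    The caller is assumed to have found every phrase already, including
    the last one.
    """
    # First condition: always required.
    if interval_to_second_last > max_interval:
        return False
    # A negative additional last interval drops the second condition.
    if additional_last_interval < 0:
        return True
    # Second condition: the last phrase gets extra room.
    return interval_to_last <= max_interval + additional_last_interval


print(satisfies_last_interval_rule(2, -1, 2, 100))  # prints True
print(satisfies_last_interval_rule(2, 4, 2, 7))     # prints False
```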
Here is an example to use positive number as the additional last interval:
Execution example:
select NearTokens --match_columns content --query '*NP2,4"x y .$"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 8,
# "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 9,
# "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ]
# ]
# ]
# ]
New in version 12.0.1: The max intervals of each phrase.
You can also specify the max intervals of each phrase like Near search condition.
Here is an example:
Execution example:
select NearTokens --match_columns content --query '*NP11,0,5|5"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
select NearTokens --match_columns content --query '*NP11,0,5|6"a b c"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ]
# ]
# ]
# ]
7.13.1.3.9. Near phrase product search condition#
New in version 11.1.1.
Its syntax is *NPP"(phrase1_1 phrase1_2 ...) (phrase2_1 phrase2_2 ...) ...".
This conditional expression does multiple Near phrase search conditions. Phrases for each Near phrase search condition are computed as the product of {phrase1_1, phrase1_2, ...}, {phrase2_1, phrase2_2, ...} and .... For example, *NPP"(a b c) (d e)" uses the following phrases for near phrase searches:
a d
a e
b d
b e
c d
c e
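The list above is the Cartesian product of the two groups. As an illustration, the same list can be generated in Python with itertools.product. The function name expand_phrase_groups is hypothetical, and the sketch only shows how the combinations are formed, not how Groonga evaluates them.

```python
from itertools import product


def expand_phrase_groups(groups):
    """Return one phrase list per combination, taking one phrase from each group."""
    return [" ".join(combination) for combination in product(*groups)]


print(expand_phrase_groups([["a", "b", "c"], ["d", "e"]]))
# prints ['a d', 'a e', 'b d', 'b e', 'c d', 'c e']
```

The number of combinations is the product of the group sizes, so three phrases and two phrases give six near phrase searches.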
Here is a simple example:
Execution example:
select NearTokens --match_columns content --query '*NPP"(a x) (b y)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 8
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 2,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 c"
# ],
# [
# 3,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 12 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ],
# [
# 8,
# "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 9,
# "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 10,
# "x 1 2 3 y 4 . x 1 y 2 z 3 4 5 6 7 ."
# ]
# ]
# ]
# ]
You can use all the features of Near phrase search condition such as the max interval, $ for the last phrase and the additional last interval.
Execution example:
select NearTokens --match_columns content --query '*NPP2,-1"(a x) (b c y) (d$ .$)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ],
# [
# 8,
# "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 9,
# "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 10,
# "x 1 2 3 y 4 . x 1 y 2 z 3 4 5 6 7 ."
# ]
# ]
# ]
# ]
New in version 12.0.1: The max intervals of each phrase.
You can also specify the max intervals of each phrase like Near search condition.
Here is an example:
Execution example:
select NearTokens --match_columns content --query '*NPP11,0,5|5"(a x) (b y) (c z)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 6
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ],
# [
# 8,
# "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 9,
# "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 10,
# "x 1 2 3 y 4 . x 1 y 2 z 3 4 5 6 7 ."
# ]
# ]
# ]
# ]
select NearTokens --match_columns content --query '*NPP11,0,5|6"(a x) (b y) (c z)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 6
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ],
# [
# 8,
# "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 9,
# "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 10,
# "x 1 2 3 y 4 . x 1 y 2 z 3 4 5 6 7 ."
# ]
# ]
# ]
# ]
This is more efficient than multiple Near phrase search conditions.
7.13.1.3.10. Ordered near phrase search condition#
New in version 11.0.9.
Its syntax is *ONP"phrase1 phrase2 ...".
This conditional expression does ordered near phrase search with phrase1, phrase2 and .... Ordered near phrase search is similar to Near phrase search condition but ordered near phrase search checks the order of phrases. For example, *ONP"groonga mroonga pgroonga" matches groonga mroonga rroonga pgroonga but doesn’t match groonga rroonga pgroonga mroonga because the latter uses a different order:
Execution example:
select NearTokens \
--match_columns content \
--query '*ONP"groonga mroonga pgroonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 6,
# "groonga mroonga rroonga pgroonga"
# ]
# ]
# ]
# ]
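The order check that distinguishes ordered near phrase search can be pictured with a simplified Python sketch. It assumes single-word phrases separated by whitespace, looks only at the first occurrence of each word, and ignores the max interval. The function name appears_in_order is hypothetical.

```python
def appears_in_order(text, words):
    """Return True if every word is present and the words appear in the given order."""
    tokens = text.split()
    try:
        positions = [tokens.index(word) for word in words]
    except ValueError:
        return False  # At least one word is missing.
    return positions == sorted(positions)


query = ["groonga", "mroonga", "pgroonga"]
print(appears_in_order("groonga mroonga rroonga pgroonga", query))  # prints True
print(appears_in_order("groonga rroonga pgroonga mroonga", query))  # prints False
```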
You can use all the features of Near phrase search condition such as the max interval and the additional last interval. But you don’t need to specify $ for the last phrase because the last phrase in the query is always the last phrase.
New in version 12.0.1: The max intervals of each phrase.
You can also specify the max intervals of each phrase like Near search condition.
7.13.1.3.11. Ordered near phrase product search condition#
New in version 11.1.1.
Its syntax is *ONPP"(phrase1_1 phrase1_2 ...) (phrase2_1 phrase2_2 ...) ...".
This conditional expression does ordered near phrase product search. Ordered near phrase product search is similar to Near phrase product search condition but ordered near phrase product search checks the order of phrases like Ordered near phrase search condition. For example, *ONPP"(a b c) (d e)" matches a 1 d but doesn’t match d 1 a because the latter uses a different order.
Here is a simple example:
Execution example:
select NearTokens \
--match_columns content \
--query '*ONPP"(a x) (b y)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 8
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "a 1 2 3 4 5 b 6 7 8 9 10 c"
# ],
# [
# 2,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 c"
# ],
# [
# 3,
# "a 1 2 3 4 5 b 6 7 8 9 10 11 12 c"
# ],
# [
# 4,
# "a 1 b 2 c 3 d"
# ],
# [
# 5,
# "a b c d 1 2 3"
# ],
# [
# 8,
# "x 1 y 2 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 9,
# "x 1 2 y 3 4 . x 1 2 y 3 z 4 5 6 7 ."
# ],
# [
# 10,
# "x 1 2 3 y 4 . x 1 y 2 z 3 4 5 6 7 ."
# ]
# ]
# ]
# ]
You can use all the features of Near phrase search condition such as the max interval and the additional last interval. But you don’t need to specify $ for the last phrase because the last phrase in the query is always the last phrase.
New in version 12.0.1: The max intervals of each phrase.
You can also specify the max intervals of each phrase like Near search condition.
7.13.1.3.12. Similar search condition#
TODO
7.13.1.3.13. Equal condition#
Its syntax is column:value.
It matches records whose column value is equal to value.
It doesn’t require the default match columns, unlike full text search condition and phrase search condition.
Here is a simple example:
Execution example:
select Entries --query _key:Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The expression matches records whose _key column value is equal to Groonga.
7.13.1.3.14. Not equal condition#
Its syntax is column:!value.
It matches records whose column value isn’t equal to value.
It doesn’t require the default match columns, unlike full text search condition and phrase search condition.
Here is a simple example:
Execution example:
select Entries --query _key:!Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose _key column value is not equal to Groonga.
7.13.1.3.15. Less than condition#
Its syntax is column:<value.
It matches records whose column value is less than value.
If the column type is a numerical type such as Int32, the column value and value are compared as numbers. If the column type is a text type such as ShortText, the column value and value are compared as bit sequences.
Unlike full text search condition and phrase search condition, it doesn’t require the default match columns.
Here is a simple example:
Execution example:
select Entries --query n_likes:<10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is less than 10.
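The difference between the two comparison modes can be sketched in plain Python. This is only an illustration, not Groonga code: the function names are hypothetical, and it assumes that "compared as bit sequences" means a byte-by-byte comparison of the encoded text.

```python
# Illustration only: plain Python, not Groonga code.
# Assumption: text values are compared byte by byte (UTF-8 is used here).

def less_than_as_number(column_value: int, value: str) -> bool:
    """Numeric column: the query value is read as a number first."""
    return column_value < int(value)


def less_than_as_bytes(column_value: str, value: str) -> bool:
    """Text column: both sides are compared as encoded byte sequences."""
    return column_value.encode("utf-8") < value.encode("utf-8")


print(less_than_as_number(9, "10"))              # True: 9 < 10 as numbers
print(less_than_as_bytes("9", "10"))             # False: byte "9" sorts after byte "1"
print(less_than_as_bytes("Groonga", "Mroonga"))  # True: "G" sorts before "M"
```

The practical consequence is that digits stored in a text column don't sort numerically, so a numerical column type is the safer choice when range conditions matter.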
7.13.1.3.16. Greater than condition#
Its syntax is column:>value.
It matches records whose column value is greater than value.
If the column type is a numerical type such as Int32, the column value and value are compared as numbers. If the column type is a text type such as ShortText, the column value and value are compared as bit sequences.
Unlike full text search condition and phrase search condition, it doesn’t require the default match columns.
Here is a simple example:
Execution example:
select Entries --query n_likes:>10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is greater than 10.
7.13.1.3.17. Less than or equal to condition#
Its syntax is column:<=value.
It matches records whose column value is less than or equal to value.
If the column type is a numerical type such as Int32, the column value and value are compared as numbers. If the column type is a text type such as ShortText, the column value and value are compared as bit sequences.
Unlike full text search condition and phrase search condition, it doesn’t require the default match columns.
Here is a simple example:
Execution example:
select Entries --query n_likes:<=10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is less than or equal to 10.
7.13.1.3.18. Greater than or equal to condition#
Its syntax is column:>=value.
It matches records whose column value is greater than or equal to value.
If the column type is a numerical type such as Int32, the column value and value are compared as numbers. If the column type is a text type such as ShortText, the column value and value are compared as bit sequences.
Unlike full text search condition and phrase search condition, it doesn’t require the default match columns.
Here is a simple example:
Execution example:
select Entries --query n_likes:>=10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is greater than or equal to 10.
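The six comparison conditions differ only in the characters that follow the colon. The following Python sketch models that mapping for the n_likes column of the sample data. It is an illustration of the semantics, not how Groonga parses queries; OPERATORS and match_n_likes are hypothetical names.

```python
import operator

# n_likes values of the sample Entries table used in this section.
N_LIKES = {
    "The first post!": 5,
    "Groonga": 10,
    "Mroonga": 15,
    "Good-bye Senna": 3,
    "Good-bye Tritonn": 3,
}

# Longer prefixes come first so that "<=" is not mistaken for "<".
# The empty prefix (plain column:value) is the equal condition.
OPERATORS = [
    ("<=", operator.le),
    (">=", operator.ge),
    ("!", operator.ne),
    ("<", operator.lt),
    (">", operator.gt),
    ("", operator.eq),
]


def match_n_likes(condition: str) -> list[str]:
    """Return the sorted keys matched by a condition such as 'n_likes:<10'."""
    column, _, rest = condition.partition(":")
    if column != "n_likes":
        raise ValueError("this sketch only knows the n_likes column")
    for prefix, compare in OPERATORS:
        if rest.startswith(prefix):
            value = int(rest[len(prefix):])
            return sorted(k for k, v in N_LIKES.items() if compare(v, value))
    raise ValueError(f"unsupported condition: {condition}")


print(match_n_likes("n_likes:>=10"))  # ['Groonga', 'Mroonga']
```

The number of keys returned for each condition agrees with the hit counts in the execution examples above (3, 1, 4 and 2 for <10, >10, <=10 and >=10).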
7.13.1.3.19. Regular expression condition#
New in version 5.0.1.
Its syntax is column:~pattern.
It matches records whose column value matches pattern. pattern must be a valid Regular expression.
The following example uses .roonga as the pattern. It matches Groonga, Mroonga and so on.
Execution example:
select Entries --query content:~.roonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
In most cases, a regular expression is evaluated sequentially, so it may be slow when there are many records.
In some cases, Groonga evaluates a regular expression by using an index, which is very fast. See Regular expression for details.
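Sequential evaluation means that the pattern is tried against the column value of every record in turn. The sketch below imitates that with Python's re module on the sample content values. Python's regular expression engine is only a stand-in here and may differ from Groonga's in syntax and in how text is normalized.

```python
import re

# content values of the sample Entries table, keyed by _key.
CONTENTS = {
    "The first post!": "Welcome! This is my first post!",
    "Groonga": "I started to use Groonga. It's very fast!",
    "Mroonga": "I also started to use Mroonga. It's also very fast! Really fast!",
    "Good-bye Senna": "I migrated all Senna system!",
    "Good-bye Tritonn": "I also migrated all Tritonn system!",
}

pattern = re.compile(r".roonga")

# Sequential evaluation: every record is visited and tested once.
matched = [key for key, content in CONTENTS.items() if pattern.search(content)]
print(matched)  # ['Groonga', 'Mroonga']
```

The cost of this loop grows with the number of records, which is why index-based evaluation is preferable when it is available.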
7.13.1.4. Combined expression#
Here is the list of available combined expressions.
7.13.1.4.1. Logical OR#
Its syntax is a OR b.
a and b are conditional expressions, combined expressions or assignment expressions.
If at least one of a and b is matched, a OR b is matched.
Here is a simple example:
Execution example:
select Entries --query 'n_likes:>10 OR content:@senna'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is greater than 10 or whose content column value contains the word senna.
7.13.1.4.2. Logical AND#
Its syntax is a + b or just a b.
a and b are conditional expressions, combined expressions or assignment expressions.
If both a and b are matched, a + b is matched.
You can put + before the first expression, such as +a. The + is simply ignored.
Here is a simple example:
Execution example:
select Entries --query 'n_likes:>=10 + content:@groonga'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is greater than or equal to 10 and whose content column value contains the word groonga.
7.13.1.4.3. Logical AND NOT#
Its syntax is a - b.
a and b are conditional expressions, combined expressions or assignment expressions.
If a is matched and b is not matched, a - b is matched.
You cannot put - before the first expression, such as -a. It is a syntax error.
Here is a simple example:
Execution example:
select Entries --query 'n_likes:>=10 - content:@groonga'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is greater than or equal to 10 and whose content column value doesn’t contain the word groonga.
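One way to picture the three logical operators is as set operations on the matched records: OR is a union, + is an intersection and - is a difference. The Python sketch below reproduces the three execution examples on the sample data. The helper names are hypothetical, and content_has is a rough case-insensitive substring test rather than Groonga's tokenized full text search.

```python
# Plain Python model of the sample Entries table: key -> (content, n_likes).
ENTRIES = {
    "The first post!": ("Welcome! This is my first post!", 5),
    "Groonga": ("I started to use Groonga. It's very fast!", 10),
    "Mroonga": ("I also started to use Mroonga. It's also very fast! Really fast!", 15),
    "Good-bye Senna": ("I migrated all Senna system!", 3),
    "Good-bye Tritonn": ("I also migrated all Tritonn system!", 3),
}


def content_has(word: str) -> set[str]:
    """Rough stand-in for content:@word (case-insensitive substring test)."""
    return {key for key, (content, _) in ENTRIES.items() if word in content.lower()}


def n_likes_where(predicate) -> set[str]:
    """Keys of the records whose n_likes value satisfies predicate."""
    return {key for key, (_, n_likes) in ENTRIES.items() if predicate(n_likes)}


# n_likes:>10 OR content:@senna
logical_or = n_likes_where(lambda n: n > 10) | content_has("senna")
# n_likes:>=10 + content:@groonga
logical_and = n_likes_where(lambda n: n >= 10) & content_has("groonga")
# n_likes:>=10 - content:@groonga
logical_and_not = n_likes_where(lambda n: n >= 10) - content_has("groonga")

print(sorted(logical_or))       # ['Good-bye Senna', 'Mroonga']
print(sorted(logical_and))      # ['Groonga']
print(sorted(logical_and_not))  # ['Mroonga']
```

Seen this way, -a as a first expression has no left-hand set to subtract from, which is consistent with it being rejected as a syntax error.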
7.13.1.4.4. Grouping#
Its syntax is (...). ... is a space-separated list of expressions.
(...) groups one or more expressions so that they can be processed as a single expression. a b OR c means that both a and b are matched, or c is matched. a (b OR c) means that a is matched and at least one of b and c is matched.
Here is a simple example:
Execution example:
select Entries --query 'n_likes:<5 content:@senna OR content:@fast'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
select Entries --query 'n_likes:<5 (content:@senna OR content:@fast)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]
The first expression doesn’t use grouping. It matches records where both n_likes:<5 and content:@senna are matched, or where content:@fast is matched.
The second expression uses grouping. It matches records where n_likes:<5 is matched and at least one of content:@senna and content:@fast is matched.
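The effect of grouping can be checked with the same set-based picture: without parentheses the AND part is combined first, with parentheses the OR part is. The sketch below is a plain Python model of the two queries on the sample data. The helper names are hypothetical and the word test is a simple case-insensitive substring check, not Groonga's full text search.

```python
# Plain Python model of the sample Entries table: key -> (content, n_likes).
ENTRIES = {
    "The first post!": ("Welcome! This is my first post!", 5),
    "Groonga": ("I started to use Groonga. It's very fast!", 10),
    "Mroonga": ("I also started to use Mroonga. It's also very fast! Really fast!", 15),
    "Good-bye Senna": ("I migrated all Senna system!", 3),
    "Good-bye Tritonn": ("I also migrated all Tritonn system!", 3),
}


def content_has(word: str) -> set[str]:
    """Rough stand-in for content:@word."""
    return {key for key, (content, _) in ENTRIES.items() if word in content.lower()}


# n_likes:<5
less_than_5 = {key for key, (_, n_likes) in ENTRIES.items() if n_likes < 5}
senna = content_has("senna")
fast = content_has("fast")

# n_likes:<5 content:@senna OR content:@fast
without_grouping = (less_than_5 & senna) | fast
# n_likes:<5 (content:@senna OR content:@fast)
with_grouping = less_than_5 & (senna | fast)

print(sorted(without_grouping))  # ['Good-bye Senna', 'Groonga', 'Mroonga']
print(sorted(with_grouping))     # ['Good-bye Senna']
```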
7.13.1.5. Assignment expression#
This section is for advanced users, because assignment expression is disabled in the --query option of select by default. You need to specify ALLOW_COLUMN|ALLOW_UPDATE as the --query_flags option value to enable assignment expression.
Assignment expression in query syntax has some limitations, so you should use Script syntax instead of query syntax for assignment.
There is only one syntax for assignment expression. It’s column:=value.
value is assigned to column. value is always processed as a string in query syntax and is cast to the type of column automatically. This causes some limitations. For example, you cannot use boolean literals such as true and false for a Bool type column. To assign false you would need to use an empty string, but query syntax doesn’t support the column:= form, which has no value.
See Cast for details about casting.
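The Bool limitation can be illustrated with a small Python sketch. The casting rule in it is an assumption inferred from the paragraph above (only an empty string becomes false), and cast_text_to_bool is a hypothetical name, not a Groonga function. See Cast for the actual rules.

```python
def cast_text_to_bool(value: str) -> bool:
    """Assumed rule for this sketch: only the empty string becomes false."""
    return value != ""


# In query syntax the right-hand side of column:=value is always a string.
print(cast_text_to_bool("true"))   # True
print(cast_text_to_bool("false"))  # True, because "false" is a non-empty string
print(cast_text_to_bool(""))       # False, but column:= cannot be written in query syntax
```

Under this assumption, an assignment such as column:=false would store true, which is one reason to prefer Script syntax for assignment.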