7.13.2. Script syntax#
Script syntax is a syntax for specifying complex search conditions. It is similar to ECMAScript. For example, _key == "book" means that Groonga searches for records whose _key value is "book". All values are strings in query syntax, but each value has its own type in script syntax. For example, "book" is a string, 1 is an integer, TokenBigram is the object whose name is TokenBigram, and so on.
Script syntax doesn’t support the full ECMAScript syntax. For example, script syntax doesn’t support statements such as the if control statement, the for iteration statement and variable definition statements. Function definitions are not supported either. However, script syntax adds its own original operators. They are described after the ECMAScript-compatible syntax.
7.13.2.1. Security#
For security reasons, you should not pass input from users to Groonga directly. A malicious user may input a query that retrieves records that should not be shown to that user.
Think about the following case.
A Groonga application constructs a Groonga request by the following program:
filter = "column @ \"#{user_input}\""
select_options = {
# ...
:filter => filter,
}
groonga_client.select(select_options)
user_input is input from a user. If the input is query, here is the constructed filter parameter:
column @ "query"
If the input is x" || true || ", here is the constructed filter parameter:
column @ "x" || true || ""
This query matches all records, so the user gets every record in your database. The user may be malicious.
It’s better to accept user input only as a value. This means that you don’t allow user input to contain operators such as @ and &&. If you accept operators, a user can create a malicious query.
If user input is only a value, you can block malicious queries by escaping the value. Here is how to escape a user input value:
True value: Convert it to true.
False value: Convert it to false.
Numerical value: Convert it to Integer or Float. For example, 1.2, -10, 314e-2 and so on.
String value: Replace " with \" and \ with \\ in the string value and surround the substituted string value with ". For example, double " quote and back \ slash should be converted to "double \" quote and back \\ slash".
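The escaping rules above can be sketched in Ruby, the language of the request-building example. This is a minimal sketch, not part of any Groonga client library: escape_filter_value is a hypothetical helper name, and it assumes the application has already decided which type the user input should have.

```ruby
# Hypothetical helper: turns a Ruby value into a script syntax literal.
def escape_filter_value(value)
  case value
  when true
    "true"
  when false
    "false"
  when Integer
    value.to_s
  when Float
    # Infinity and NaN have no literal form, so reject them.
    raise ArgumentError, "non-finite float: #{value}" unless value.finite?
    value.to_s
  when String
    # Escape \ and " in a single pass, then surround the result with ".
    escaped = value.gsub(/["\\]/) { |char| "\\#{char}" }
    "\"#{escaped}\""
  else
    raise ArgumentError, "unsupported value type: #{value.class}"
  end
end

# The malicious input from the example above is now just a string value.
user_input = 'x" || true || "'
filter = "column @ #{escape_filter_value(user_input)}"
puts filter
```

With this helper, the input x" || true || " stays inside one string literal, so it cannot add operators to the filter.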
7.13.2.2. Sample data#
Here are a schema definition and sample data to show usage.
Execution example:
table_create Entries TABLE_PAT_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries content COLUMN_SCALAR Text
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3}
]
# [[0,1337566253.89858,0.000355720520019531],5]
There is a table, Entries, for blog entries. An entry has a title, content and the number of likes for the entry. The title is the key of Entries. The content is the value of the Entries.content column. The number of likes is the value of the Entries.n_likes column.
The Entries._key column and the Entries.content column are indexed using the TokenBigram tokenizer. So both Entries._key and Entries.content are ready for full text search.
OK. The schema and data for examples are ready.
7.13.2.3. Literals#
7.13.2.3.1. Integer#
An integer literal is a sequence of 0 to 9 such as 1234567890. + or - can be prepended as a sign such as +29 and -29. An integer literal must be decimal. Octal notation, hexadecimal notation and so on can’t be used.
The maximum value of an integer literal is 9223372036854775807 (= 2 ** 63 - 1). The minimum value of an integer literal is -9223372036854775808 (= -(2 ** 63)).
7.13.2.3.2. Float#
A float literal is a sequence of 0 to 9, . and 0 to 9 such as 3.14. + or - can be prepended as a sign such as +3.14 and -3.14. The ${RADIX}e${EXPONENTIAL} and ${RADIX}E${EXPONENTIAL} formats are also supported. For example, 314e-2 is the same as 3.14.
7.13.2.3.3. String#
A string literal is "...". You need to escape " in the literal by prepending \ such as \". For example, "Say \"Hello!\"." is a literal for the string Say "Hello!".
The string encoding must be the same as the encoding of the database. The default encoding is UTF-8. It can be changed by the --with-default-encoding configure option, the --encoding option of the groonga executable and so on.
7.13.2.3.4. Boolean#
The boolean literals are true and false. true means true and false means false.
7.13.2.3.5. Null#
The null literal is null. Groonga doesn’t support null values, but the null literal is supported.
7.13.2.3.6. Time#
Note
This is the groonga original notation.
A time literal doesn’t exist. Instead, there are string time notation, integer time notation and float time notation.
String time notation is "YYYY/MM/DD hh:mm:ss.uuuuuu" or "YYYY-MM-DD hh:mm:ss.uuuuuu". YYYY is the year, MM is the month, DD is the day, hh is the hour, mm is the minute, ss is the second and uuuuuu is the microsecond. It is local time. For example, "2012/07/23 02:41:10.436218" is 2012-07-23T02:41:10.436218 in ISO 8601 format.
Integer time notation is the number of seconds that have elapsed since midnight UTC, January 1, 1970. It is also known as POSIX time. For example, 1343011270 is 2012-07-23T02:41:10Z in ISO 8601 format.
Float time notation is the number of seconds and microseconds that have elapsed since midnight UTC, January 1, 1970. For example, 1343011270.436218 is 2012-07-23T02:41:10.436218Z in ISO 8601 format.
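The integer and float examples above can be checked with Ruby's standard Time class. This is only a sketch for verifying the arithmetic and does not involve Groonga itself. The microseconds are passed as a separate argument, which is a choice made here to avoid floating point rounding on the Ruby side.

```ruby
require "time"

# Integer time notation: seconds since 1970-01-01T00:00:00Z.
integer_time = Time.at(1343011270).utc
puts integer_time.iso8601

# Float time notation: seconds plus microseconds since the same epoch.
# Time.at(seconds, microseconds) keeps the microsecond part exact.
float_time = Time.at(1343011270, 436218).utc
puts float_time.iso8601(6)
```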
7.13.2.3.7. Geo point#
Note
This is the groonga original notation.
Geo point literal doesn’t exist. There is string geo point notation.
String geo point notation has the following patterns:
"LATITUDE_IN_MSECxLONGITUDE_IN_MSEC"
"LATITUDE_IN_MSEC,LONGITUDE_IN_MSEC"
"LATITUDE_IN_DEGREExLONGITUDE_IN_DEGREE"
"LATITUDE_IN_DEGREE,LONGITUDE_IN_DEGREE"
x and , can be used as the separator. Latitude and longitude can be represented in milliseconds or degrees.
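As a sketch of how the two representations relate, the following Ruby code converts degrees to milliseconds. It assumes that milliseconds here means milliseconds of arc (1 degree = 3,600 seconds = 3,600,000 milliseconds) and that rounding to the nearest integer is acceptable. degree_to_msec is a hypothetical helper name and the coordinates are illustrative values only.

```ruby
# Assumption: 1 degree = 60 minutes = 3,600 seconds = 3,600,000 milliseconds.
MSEC_PER_DEGREE = 3_600_000

# Hypothetical helper: converts a coordinate in degrees to milliseconds.
def degree_to_msec(degree)
  (degree * MSEC_PER_DEGREE).round
end

latitude = 35.6813819
longitude = 139.7660839

# The same point written in two of the string geo point notations.
in_msec = "#{degree_to_msec(latitude)}x#{degree_to_msec(longitude)}"
in_degree = "#{latitude}x#{longitude}"
puts in_msec
puts in_degree
```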
7.13.2.3.8. Array#
An array literal is [element1, element2, ...].
7.13.2.3.9. Object literal#
An object literal is {name1: value1, name2: value2, ...}. Groonga doesn’t support object literals yet.
7.13.2.4. Control syntaxes#
Script syntax doesn’t support statements. So you cannot use control statements such as if. You can only use the A ? B : C expression as control syntax.
A ? B : C returns B if A is true, C otherwise.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == (_id == 1 ? 5 : 3)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose _id column value is equal to 1 and whose n_likes column value is equal to 5, or whose _id column value is not equal to 1 and whose n_likes column value is equal to 3.
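The same condition can be replayed over the sample data in plain Ruby, whose conditional operator has the same A ? B : C form. This is only an illustration of the logic. The _id values are taken from the execution example above, and the entries array is a hand-written copy of the sample data.

```ruby
# Hand-written copy of the sample data (_id values as shown in the output).
entries = [
  { id: 1, key: "The first post!",  n_likes: 5 },
  { id: 2, key: "Groonga",          n_likes: 10 },
  { id: 3, key: "Mroonga",          n_likes: 15 },
  { id: 4, key: "Good-bye Senna",   n_likes: 3 },
  { id: 5, key: "Good-bye Tritonn", n_likes: 3 },
]

# n_likes == (_id == 1 ? 5 : 3)
matched = entries.select do |entry|
  expected_likes = entry[:id] == 1 ? 5 : 3
  entry[:n_likes] == expected_likes
end

matched_ids = matched.map { |entry| entry[:id] }
p matched_ids
```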
7.13.2.5. Grouping#
Its syntax is (...). ... is a comma-separated expression list.
(...) groups one or more expressions so that they can be processed as one expression. a && b || c means that a and b are matched or c is matched. a && (b || c) means that a and one of b and c are matched.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes < 5 && content @ "senna" || content @ "fast"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
select Entries --filter 'n_likes < 5 && (content @ "senna" || content @ "fast")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]
The first expression doesn’t use grouping. It matches records where both n_likes < 5 and content @ "senna" are matched, or where content @ "fast" is matched.
The second expression uses grouping. It matches records where n_likes < 5 is matched and at least one of content @ "senna" and content @ "fast" is matched.
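The effect of grouping can be reproduced in Ruby, where && also binds more tightly than ||. In this sketch, has_word is a hypothetical stand-in for content @ "word": it is a case-insensitive substring check, not a model of how Groonga performs full text search. The entries array is a hand-written copy of the sample data.

```ruby
entries = [
  { id: 1, content: "Welcome! This is my first post!", n_likes: 5 },
  { id: 2, content: "I started to use Groonga. It's very fast!", n_likes: 10 },
  { id: 3, content: "I also started to use Mroonga. It's also very fast! Really fast!", n_likes: 15 },
  { id: 4, content: "I migrated all Senna system!", n_likes: 3 },
  { id: 5, content: "I also migrated all Tritonn system!", n_likes: 3 },
]

# Hypothetical stand-in for the match operator: case-insensitive substring check.
has_word = ->(entry, word) { entry[:content].downcase.include?(word) }

# n_likes < 5 && content @ "senna" || content @ "fast"
without_grouping = entries.select do |e|
  e[:n_likes] < 5 && has_word.(e, "senna") || has_word.(e, "fast")
end

# n_likes < 5 && (content @ "senna" || content @ "fast")
with_grouping = entries.select do |e|
  e[:n_likes] < 5 && (has_word.(e, "senna") || has_word.(e, "fast"))
end

without_grouping_ids = without_grouping.map { |e| e[:id] }
with_grouping_ids = with_grouping.map { |e| e[:id] }
p without_grouping_ids
p with_grouping_ids
```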
7.13.2.6. Function call#
Its syntax is name(argument1, argument2, ...).
name(argument1, argument2, ...) calls a function that is named name with arguments argument1, argument2 and ....
See Function for the list of available functions.
Here is a simple example:
Execution example:
select Entries --filter 'edit_distance(_key, "Groonga") <= 1'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression uses edit_distance. It matches records whose _key column value is similar to "Groonga". Similarity to "Groonga" is computed as edit distance. If the edit distance is less than or equal to 1, the value is treated as similar. In this case, "Groonga" and "Mroonga" are treated as similar.
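To make the notion of edit distance concrete, here is a sketch of the classic Levenshtein distance in Ruby. It assumes that edit distance means the minimum number of single-character insertions, deletions and substitutions. Whether Groonga's edit_distance counts in exactly this way is an assumption, and levenshtein is a name chosen for this sketch.

```ruby
# Levenshtein distance by dynamic programming, keeping one row at a time.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # delete char_a
        current[j - 1] + 1,     # insert char_b
        previous[j - 1] + cost, # substitute (or keep when equal)
      ].min
    end
    previous = current
  end
  previous.last
end

keys = ["The first post!", "Groonga", "Mroonga", "Good-bye Senna", "Good-bye Tritonn"]
similar_keys = keys.select { |key| levenshtein(key, "Groonga") <= 1 }
p similar_keys
```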
7.13.2.7. Basic operators#
Groonga supports operators defined in ECMAScript.
7.13.2.7.1. Arithmetic operators#
Here are arithmetic operators.
7.13.2.7.1.1. Addition operator#
Its syntax is number1 + number2.
The operator adds number1 and number2 and returns the result.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == 10 + 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 15 (= 10 + 5).
7.13.2.7.1.2. Subtraction operator#
Its syntax is number1 - number2.
The operator subtracts number2 from number1 and returns the result.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == 20 - 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 15 (= 20 - 5).
7.13.2.7.1.3. Multiplication operator#
Its syntax is number1 * number2.
The operator multiplies number1 and number2 and returns the result.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == 3 * 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 15 (= 3 * 5).
7.13.2.7.1.4. Division operator#
Its syntax is number1 / number2 and number1 % number2.
The operator divides number1 by number2. / returns the quotient of the division. % returns the remainder of the division.
Here are simple examples:
Execution example:
select Entries --filter 'n_likes == 26 / 7'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 3 (= 26 / 7).
Execution example:
select Entries --filter 'n_likes == 26 % 7'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5 (= 26 % 7).
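The two results above are consistent with integer division, where the fractional part of the quotient is dropped. The following Ruby sketch checks the arithmetic. It only covers positive operands, and how Groonga treats negative operands is not shown by the examples.

```ruby
dividend = 26
divisor = 7

quotient = dividend / divisor   # integer division in Ruby
remainder = dividend % divisor

puts "26 / 7 = #{quotient}"
puts "26 % 7 = #{remainder}"

# The quotient and the remainder reconstruct the dividend.
reconstructed = quotient * divisor + remainder
puts reconstructed
```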
7.13.2.7.2. Logical operators#
Here are logical operators.
7.13.2.7.2.1. Logical NOT operator#
Its syntax is !condition.
The operator inverts the boolean value of condition.
Here is a simple example:
Execution example:
select Entries --filter '!(n_likes == 5)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is not equal to 5.
7.13.2.7.2.2. Logical AND operator#
Its syntax is condition1 && condition2.
The operator returns true if both condition1 and condition2 are true, false otherwise.
Here is a simple example:
Execution example:
select Entries --filter 'content @ "fast" && n_likes >= 10'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose content column value has the word fast and whose n_likes column value is greater than or equal to 10.
7.13.2.7.2.3. Logical OR operator#
Its syntax is condition1 || condition2.
The operator returns true if either condition1 or condition2 is true, false otherwise.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == 5 || n_likes == 10'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5 or 10.
7.13.2.7.2.4. Logical AND NOT operator#
Its syntax is condition1 &! condition2.
The operator returns true if condition1 is true but condition2 is false, false otherwise. In other words, it returns the set difference: records matched by condition1 minus records matched by condition2.
Here is a simple example:
Execution example:
select Entries --filter 'content @ "fast" &! content @ "mroonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The expression matches records whose content column value has the word fast but doesn’t have the word mroonga.
7.13.2.7.3. Bitwise operators#
Here are bitwise operators.
7.13.2.7.3.1. Bitwise NOT operator#
Its syntax is ~number.
The operator returns the bitwise NOT of number.
Here is a simple example:
Execution example:
select Entries --filter '~n_likes == -6'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5 because the bitwise NOT of 5 is equal to -6.
7.13.2.7.3.2. Bitwise AND operator#
Its syntax is number1 & number2.
The operator returns the bitwise AND of number1 and number2.
Here is a simple example:
Execution example:
select Entries --filter '(n_likes & 1) == 1'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is an odd number, because the bitwise AND of an odd number and 1 is equal to 1 and the bitwise AND of an even number and 1 is equal to 0.
7.13.2.7.4. Bitwise OR operator#
Its syntax is number1 | number2.
The operator returns the bitwise OR of number1 and number2.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == (1 | 4)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5 (= 1 | 4).
7.13.2.7.5. Bitwise XOR operator#
Its syntax is number1 ^ number2.
The operator returns the bitwise XOR of number1 and number2.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == (10 ^ 15)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5 (= 10 ^ 15).
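The values used in the bitwise NOT, AND, OR and XOR examples can be double-checked in Ruby, which uses the same operator symbols. This is only an arithmetic check on hand-copied n_likes values, not a Groonga query.

```ruby
# n_likes values of the sample data, in _id order.
n_likes_values = [5, 10, 15, 3, 3]

not_of_5 = ~5          # bitwise NOT
or_result = 1 | 4      # bitwise OR
xor_result = 10 ^ 15   # bitwise XOR

# (n_likes & 1) == 1 keeps values whose lowest bit is set, i.e. odd numbers.
lowest_bit_set = n_likes_values.select { |n| (n & 1) == 1 }

puts not_of_5
puts or_result
puts xor_result
p lowest_bit_set
```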
7.13.2.7.6. Shift operators#
Here are shift operators.
7.13.2.7.6.1. Left shift operator#
Its syntax is number1 << number2.
The operator performs a bitwise left shift operation on number1 by number2.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == (5 << 1)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 10 (= 5 << 1).
7.13.2.7.6.2. Signed right shift operator#
Its syntax is number1 >> number2.
The operator shifts the bits of number1 to the right by number2 bits. The sign of the result is the same as the sign of number1.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == -(-10 >> 1)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5 (= -(-10 >> 1) = -(-5)).
7.13.2.7.6.3. Unsigned right shift operator#
Its syntax is number1 >>> number2.
The operator shifts the bits of number1 to the right by number2 bits. The leftmost number2 bits are filled with 0.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == (2147483648 - (-10 >>> 1))'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5 (= 2147483648 - (-10 >>> 1) = 2147483648 - 2147483643).
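The difference between the shift operators can be sketched in Ruby. Ruby has no >>> operator, so the unsigned shift is emulated here by first masking the value to 32 bits. The 32-bit width is an assumption inferred from the value 2147483643 in the example above, and unsigned_right_shift_32 is a hypothetical helper name.

```ruby
# Emulates >>> on a 32-bit two's complement value (assumed width).
def unsigned_right_shift_32(number, bits)
  (number & 0xFFFF_FFFF) >> bits
end

left_shifted = 5 << 1                                 # 5 << 1
signed_shifted = -10 >> 1                             # -10 >> 1 keeps the sign
unsigned_shifted = unsigned_right_shift_32(-10, 1)    # -10 >>> 1

puts left_shifted
puts signed_shifted
puts unsigned_shifted
puts 2147483648 - unsigned_shifted
```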
7.13.2.7.7. Comparison operators#
Here are comparison operators.
7.13.2.7.7.1. Equal operator#
Its syntax is object1 == object2.
The operator returns true if object1 is equal to object2, false otherwise.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes == 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is equal to 5.
7.13.2.7.7.2. Not equal operator#
Its syntax is object1 != object2.
The operator returns true if object1 is not equal to object2, false otherwise.
Here is a simple example:
Execution example:
select Entries --filter 'n_likes != 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
The expression matches records whose n_likes column value is not equal to 5.
7.13.2.7.7.3. Less than operator#
TODO: …
7.13.2.7.7.4. Less than or equal to operator#
TODO: …
7.13.2.7.7.5. Greater than operator#
TODO: …
7.13.2.7.7.6. Greater than or equal to operator#
TODO: …
7.13.2.8. Assignment operators#
7.13.2.8.1. Addition assignment operator#
Its syntax is column1 += column2.
The operator adds column2 to column1 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score += n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 4
# ],
# [
# "Good-bye Tritonn",
# 3,
# 4
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 16
# ],
# [
# "The first post!",
# 5,
# 6
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the addition assignment operation, _score = _score + n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Good-bye Senna” as its _key is 3.
So the expression 1 + 3 is evaluated and the result is stored to the _score column.
7.13.2.8.2. Subtraction assignment operator#
Its syntax is column1 -= column2.
The operator subtracts column2 from column1 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score -= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# -2
# ],
# [
# "Good-bye Tritonn",
# 3,
# -2
# ],
# [
# "Groonga",
# 10,
# -9
# ],
# [
# "Mroonga",
# 15,
# -14
# ],
# [
# "The first post!",
# 5,
# -4
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the subtraction assignment operation, _score = _score - n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Good-bye Senna” as its _key is 3.
So the expression 1 - 3 is evaluated and the result is stored to the _score column.
7.13.2.8.3. Multiplication assignment operator#
Its syntax is column1 *= column2.
The operator multiplies column1 by column2 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score *= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 3
# ],
# [
# "Good-bye Tritonn",
# 3,
# 3
# ],
# [
# "Groonga",
# 10,
# 10
# ],
# [
# "Mroonga",
# 15,
# 15
# ],
# [
# "The first post!",
# 5,
# 5
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the multiplication assignment operation, _score = _score * n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Good-bye Senna” as its _key is 3.
So the expression 1 * 3 is evaluated and the result is stored to the _score column.
7.13.2.8.4. Division assignment operator#
Its syntax is column1 /= column2.
The operator divides column1 by column2 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score /= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 0
# ],
# [
# "Good-bye Tritonn",
# 3,
# 0
# ],
# [
# "Groonga",
# 10,
# 0
# ],
# [
# "Mroonga",
# 15,
# 0
# ],
# [
# "The first post!",
# 5,
# 0
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the division assignment operation, _score = _score / n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Good-bye Senna” as its _key is 3.
So the expression 1 / 3 is evaluated and the result is stored to the _score column. The stored value is 0 because _score is an Int32 column in this output, so the fractional part is dropped.
7.13.2.8.5. Modulo assignment operator#
Its syntax is column1 %= column2.
The operator computes the remainder of column1 divided by column2 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score %= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 1
# ],
# [
# "Good-bye Tritonn",
# 3,
# 1
# ],
# [
# "Groonga",
# 10,
# 1
# ],
# [
# "Mroonga",
# 15,
# 1
# ],
# [
# "The first post!",
# 5,
# 1
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the modulo assignment operation, _score = _score % n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Good-bye Senna” as its _key is 3.
So the expression 1 % 3 is evaluated and the result is stored to the _score column.
7.13.2.8.6. Bitwise left shift assignment operator#
Its syntax is column1 <<= column2.
The operator shifts the bits of column1 to the left by column2 bits and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score <<= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 8
# ],
# [
# "Good-bye Tritonn",
# 3,
# 8
# ],
# [
# "Groonga",
# 10,
# 1024
# ],
# [
# "Mroonga",
# 15,
# 32768
# ],
# [
# "The first post!",
# 5,
# 32
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the left shift assignment operation, _score = _score << n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Good-bye Senna” as its _key is 3.
So the expression 1 << 3 is evaluated and the result is stored to the _score column.
7.13.2.8.7. Bitwise signed right shift assignment operator#
Its syntax is column1 >>= column2.
The operator performs a signed right shift of column1 by column2 bits and assigns the result to column1.
7.13.2.8.8. Bitwise unsigned right shift assignment operator#
Its syntax is column1 >>>= column2.
The operator performs an unsigned right shift of column1 by column2 bits and assigns the result to column1.
7.13.2.8.9. Bitwise AND assignment operator#
Its syntax is column1 &= column2.
The operator computes the bitwise AND of column1 and column2 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score &= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 1
# ],
# [
# "Good-bye Tritonn",
# 3,
# 1
# ],
# [
# "Groonga",
# 10,
# 0
# ],
# [
# "Mroonga",
# 15,
# 1
# ],
# [
# "The first post!",
# 5,
# 1
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the bitwise AND assignment operation, _score = _score & n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Groonga” as its _key is 10.
So the expression 1 & 10 is evaluated and the result is stored to the _score column.
7.13.2.8.10. Bitwise OR assignment operator#
Its syntax is column1 |= column2.
The operator computes the bitwise OR of column1 and column2 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score |= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 3
# ],
# [
# "Good-bye Tritonn",
# 3,
# 3
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 15
# ],
# [
# "The first post!",
# 5,
# 5
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. Then the bitwise OR assignment operation, _score = _score | n_likes, is performed for each record.
For example, the value of n_likes for the record that stores “Groonga” as its _key is 10.
So the expression 1 | 10 is evaluated and the result is stored to the _score column.
7.13.2.8.11. Bitwise XOR assignment operator#
Its syntax is column1 ^= column2.
The operator computes the bitwise XOR of column1 and column2 and assigns the result to column1.
Execution example:
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score ^= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 2
# ],
# [
# "Good-bye Tritonn",
# 3,
# 2
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 14
# ],
# [
# "The first post!",
# 5,
# 4
# ]
# ]
# ]
# ]
The value of _score set by --filter is always 1 in this case. The scorer then performs a bitwise XOR assignment, ‘_score = _score ^ n_likes’, for each record.
For example, the value of n_likes for the record whose _key is “Good-bye Senna” is 3. So the expression 1 ^ 3 is evaluated and its result, 2, is stored in the _score column.
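The three bitwise assignment operators can be compared side by side. The following Python sketch is an illustration only: apply_scorer is a hypothetical helper, not a Groonga API, and the n_likes values are copied from the execution examples. It reproduces the _score columns shown for &=, |= and ^=:

```python
import operator

# n_likes values in the order the records appear in the examples.
n_likes = {
    "Good-bye Senna": 3,
    "Good-bye Tritonn": 3,
    "Groonga": 10,
    "Mroonga": 15,
    "The first post!": 5,
}

initial_score = 1  # the _score that "--filter true" gives every record


def apply_scorer(op):
    """Return the _score of every record after `_score <op>= n_likes`."""
    return [op(initial_score, likes) for likes in n_likes.values()]


and_scores = apply_scorer(operator.and_)  # _score &= n_likes
or_scores = apply_scorer(operator.or_)    # _score |= n_likes
xor_scores = apply_scorer(operator.xor)   # _score ^= n_likes

for key, a, o, x in zip(n_likes, and_scores, or_scores, xor_scores):
    print(f"{key}: &= {a}, |= {o}, ^= {x}")
```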
7.13.2.9. Original operators#
Script syntax adds its own binary operators to ECMAScript syntax. They perform search-specific operations. They start with @ or *.
7.13.2.9.1. Match operator#
Its syntax is column @ value.
The operator searches for value by using the inverted index of column. Normally this is a full text search, but it can also be a tag search, because tag search is also implemented with an inverted index.
Query syntax uses this operator by default.
Here is a simple example:
Execution example:
select Entries --filter 'content @ "fast"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]
The expression matches records that contain the word fast in the content column value.
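To make the idea of searching through an inverted index more concrete, here is a minimal Python sketch. It is a toy model, not Groonga's implementation: the tokenize function (lowercased alphanumeric runs) is an assumption made for illustration and differs from Groonga's tokenizers, and the contents list is copied from the execution examples in this section.

```python
import re
from collections import defaultdict

# Content values that appear in the execution examples of this section.
contents = [
    "Welcome! This is my first post!",
    "I started to use Groonga. It's very fast!",
    "I also started to use Mroonga. It's also very fast! Really fast!",
    "I migrated all Senna system!",
    "I also migrated all Tritonn system!",
]


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    # Map each token to the set of document positions that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


index = build_inverted_index(contents)


def match(word):
    # Look the word up in the index instead of scanning every document.
    return [contents[doc_id] for doc_id in sorted(index.get(word.lower(), ()))]


print(match("fast"))
```

The lookup touches only the entry for the searched word, which is why an index-backed search does not need to read every record.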
7.13.2.9.2. Prefix search operator#
Its syntax is column @^ value.
The operator does prefix search with value. Prefix search searches for records that contain a word that starts with value.
You can use fast prefix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table (TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a patricia trie table or a double array trie table. You don’t need to index _key.
Prefix search can be used with other table types, but it then scans all records. That is not a problem for a small number of records, but it takes more time as the number of records grows.
Here is a simple example:
Execution example:
select Entries --filter '_key @^ "Goo"' --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Good-bye Senna"
# ],
# [
# "Good-bye Tritonn"
# ]
# ]
# ]
# ]
The expression matches records that contain a word that starts with Goo in the _key pseudo column value. Good-bye Senna and Good-bye Tritonn are matched by the expression.
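The difference between an ordered key structure and a full scan can be sketched in Python. This is an illustration under stated assumptions: a sorted list with binary search stands in for a trie table, and both function names are hypothetical. The keys are the _key values from the Entries examples.

```python
from bisect import bisect_left

# _key values from the Entries table, kept in sorted order.
keys = sorted([
    "The first post!",
    "Groonga",
    "Mroonga",
    "Good-bye Senna",
    "Good-bye Tritonn",
])


def prefix_search_sorted(sorted_keys, prefix):
    # Jump to the first key >= prefix, then read keys while they match.
    # All keys sharing a prefix are contiguous in sorted order.
    start = bisect_left(sorted_keys, prefix)
    matched = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matched.append(key)
    return matched


def prefix_search_scan(all_keys, prefix):
    # Fallback for unordered tables: every key is inspected.
    return [key for key in all_keys if key.startswith(prefix)]


print(prefix_search_sorted(keys, "Goo"))
```

Both functions return the same keys; the ordered variant stops as soon as it leaves the matching range, while the scan always visits every key.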
7.13.2.9.3. Suffix search operator#
Its syntax is column @$ value.
This operator does suffix search with value. Suffix search searches for records that contain a word that ends with value.
You can use fast suffix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can also use fast suffix search against the _key pseudo column of a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don’t need to index _key. We recommend that you use index column based fast suffix search instead of _key based fast suffix search, because _key based fast suffix search also returns automatically registered substrings. (TODO: write document about suffix search and link to it from here.)
Note
Fast suffix search can be used only for non-ASCII characters such as hiragana in Japanese. You cannot use fast suffix search for ASCII characters.
Suffix search can be used with other table types, or with a patricia trie table without the KEY_WITH_SIS flag, but it then scans all records. That is not a problem for a small number of records, but it takes more time as the number of records grows.
Here is a simple example. It uses fast suffix search for hiragana in Japanese, which is one kind of non-ASCII characters.
Execution example:
table_create Titles TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0,1337566253.89858,0.000355720520019531],4]
select Titles --query 'content:$んが'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 2,
# "むるんが"
# ],
# [
# 1,
# "ぐるんが"
# ]
# ]
# ]
# ]
The expression matches records whose content column value ends with んが. ぐるんが and むるんが are matched by the expression.
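One way to picture why registering substrings makes suffix search fast is the following Python sketch. It is a conceptual model only and an assumption on our part, not a description of how KEY_WITH_SIS is implemented: every suffix of each value is registered in a dictionary, so a suffix lookup becomes a single key lookup. The function names are hypothetical, and the values are the ones loaded into Titles above.

```python
from collections import defaultdict

# The content values loaded into the Titles table above.
titles = ["ぐるんが", "むるんが", "せな", "とりとん"]


def build_suffix_table(values):
    # Register every suffix of every value, pointing back to the value.
    table = defaultdict(list)
    for value in values:
        for start in range(len(value)):
            table[value[start:]].append(value)
    return table


suffix_table = build_suffix_table(titles)


def suffix_search(suffix):
    # One dictionary lookup instead of a scan over all values.
    return suffix_table.get(suffix, [])


def suffix_search_scan(values, suffix):
    # Equivalent result obtained by inspecting every value.
    return [value for value in values if value.endswith(suffix)]


print(suffix_search("んが"))
```

The table also contains entries such as るんが and が that were never loaded as values, which mirrors the remark about automatically registered substrings.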
7.13.2.9.4. Near search operator#
Its syntax is one of the following:
column *N "word1 word2 ..."
column *N${MAX_INTERVAL} "word1 word2 ..."
column *N${MAX_INTERVAL},${MAX_TOKEN_INTERVAL_1}|${MAX_TOKEN_INTERVAL_2}|... "word1 word2 ..."
Here are examples of the second form:
column *N29 "word1 word2 ..."
column *N-1 "word1 word2 ..."
The first example means that 29 is used for the max interval.
The second example means that -1 is used for the max interval. A max interval of -1 means no limit.
Here are examples of the third form:
column *N10,2|3 "word1 word2 word3"
column *N10,2 "word1 word2 word3"
The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.
The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.
The max intervals of each token (word) are described later.
The operator does near search with the words word1 word2 .... Near search searches for records that contain the words, where the words appear in the specified order and within the max interval.
The max interval is 10 by default. The unit of the max interval is the number of characters with N-gram family tokenizers and the number of words with morphological analysis family tokenizers.
However, TokenBigram doesn’t split an ASCII-only word into tokens, because TokenBigram tokenizes ASCII characters in a whitespace-separated-like way in this case. So the unit for ASCII words with TokenBigram is the number of words, even though TokenBigram is an N-gram family tokenizer.
Note that an index column for full text search must be defined for column.
Here is a simple example:
Execution example:
select Entries --filter 'content *N "I fast"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ]
# ]
# ]
# ]
select Entries --filter 'content *N "I Really"' --output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]
select Entries --filter 'content *N "also Really"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]
The first expression matches records that contain I and fast within a max interval of 10 words. So the record whose content is I started to use Groonga. It's very fast! is matched. The number of words between I and fast is 7.
The second expression matches records that contain I and Really within a max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is not matched. The number of words between I and Really is 11.
The third expression matches records that contain also and Really within a max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is matched. The number of words between also and Really is 10.
New in version 12.0.1: The max intervals of each token.
You can specify the max intervals of each token. The default is no limit, which means that any interval between tokens is valid as long as the overall max interval is satisfied.
Here is an example that uses 2 for the max interval of the first interval and 4 for the max interval of the second interval:
content *N10,2|4 "a b c"
10 is the max interval. | is the separator of the max intervals of each token.
This matches a x b x x x c. But it doesn’t match a x x b c, a b x x x x c and so on, because the former has an interval of 3 for the first interval, which is larger than 2, and the latter has an interval of 5 for the second interval, which is larger than 4.
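The a b c example can be traced with a small Python model. This is a simplified sketch under several assumptions, not Groonga's matching algorithm: tokens are split on whitespace, each word occurs once, the words occur in the listed order, an interval is the difference between token positions, and any negative limit means no limit. The function name near_match is hypothetical.

```python
def near_match(text, words, max_interval=10, max_token_intervals=()):
    """Toy model of `*N${MAX_INTERVAL},${I1}|${I2}|...` for whitespace tokens."""
    tokens = text.split()
    try:
        # Assumption: each word occurs once, so its first position is used.
        positions = [tokens.index(word) for word in words]
    except ValueError:
        return False  # at least one word is missing

    # Interval between each pair of neighboring words.
    intervals = [b - a for a, b in zip(positions, positions[1:])]
    if any(interval <= 0 for interval in intervals):
        return False  # this model only handles words in the listed order

    # Overall max interval between the first and the last word.
    if max_interval >= 0 and positions[-1] - positions[0] > max_interval:
        return False

    # zip() stops at the shorter input: omitted limits act as "no limit"
    # and extra limits are ignored.
    for interval, limit in zip(intervals, max_token_intervals):
        if limit >= 0 and interval > limit:
            return False
    return True


words = ["a", "b", "c"]
print(near_match("a x b x x x c", words, 10, [2, 4]))  # intervals 2 and 4
print(near_match("a x x b c", words, 10, [2, 4]))      # first interval is 3
print(near_match("a b x x x x c", words, 10, [2, 4]))  # second interval is 5
```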
Here is an example that specifies the max intervals of each token:
Execution example:
select Entries --filter 'content *N11,5|3 "first welcome post"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
select Entries --filter 'content *N11,4|3 "first welcome post"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ]
# ]
# ]
# ]
You can omit one or more intervals. Omitted intervals are treated as -1. It means that *N11,5 equals *N11,5|-1. -1 means no limit.
Here is an example that omits an interval:
Execution example:
select Entries --filter 'content *N11,5 "first welcome post"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
select Entries --filter 'content *N11,5|-1 "first welcome post"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
You can specify extra intervals. They are just ignored:
Execution example:
select Entries --filter 'content *N11,5|6|1|2|3 "first welcome post"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
select Entries --filter 'content *N11,5|6 "first welcome post"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]
7.13.2.9.5. Near phrase search operator#
Its syntax is one of the following:
column *NP "phrase1 phrase2 ..."
column *NP${MAX_INTERVAL} "phrase1 phrase2 ..."
column *NP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "phrase1 phrase2 ..."
column *NP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "phrase1 phrase2 ..."
Here are examples of the second form:
column *NP29 "phrase1 phrase2 ..."
column *NP-1 "phrase1 phrase2 ..."
The first example means that 29 is used for the max interval.
The second example means that -1 is used for the max interval.
The max interval is described later.
Here are examples of the third form:
column *NP10,29 "phrase1 phrase2 ..."
column *NP10,-1 "phrase1 phrase2 ..."
The first example means that 29 is used for the additional last interval.
The second example means that -1 is used for the additional last interval.
The additional last interval is described later.
New in version 12.0.1: The max intervals of each phrase.
Here are examples of the fourth form:
column *NP10,0,2|3 "phrase1 phrase2 phrase3"
column *NP10,0,2 "phrase1 phrase2 phrase3"
The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.
The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.
See Near phrase search operator for the max intervals of each phrase.
The operator does near phrase search with the phrases phrase1 phrase2 .... Near phrase search searches for records that contain the phrases, where the phrases appear in the specified order and within the max interval.
The max interval is 10 by default. The unit of the max interval is the number of characters with N-gram family tokenizers and the number of words with morphological analysis family tokenizers.
However, TokenBigram doesn’t split an ASCII-only word into tokens, because TokenBigram tokenizes ASCII characters in a whitespace-separated-like way in this case. So the unit for ASCII words with TokenBigram is the number of words, even though TokenBigram is an N-gram family tokenizer.
Note that an index column for full text search must be defined for column.
TODO: Use index that has TokenNgram("unify_alphabet", false)
tokenizer to show difference with near search with English text.
Here is a simple example:
Execution example:
select Entries --filter 'content *NP "I fast"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ]
# ]
# ]
# ]
select Entries --filter 'content *NP "I Really"' --output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]
select Entries --filter 'content *NP "also Really"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]
The first expression matches records that contain I and fast within a max interval of 10 words. So the record whose content is I started to use Groonga. It's very fast! is matched. The number of words between I and fast is just 10.
The second expression matches records that contain I and Really within a max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is not matched. The number of words between I and Really is 14.
The third expression matches records that contain also and Really within a max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is matched. The number of words between also and Really is 10.
Here is an example to use the custom max interval:
Execution example:
select Entries --filter 'content *NP14 "I Really"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]
select Entries --filter 'content *NP-1 "I Really"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]
The first expression matches I also started to use Mroonga. It's also very fast! Really fast! because the number of words between I and Really is 14.
The second expression also matches I also started to use Mroonga. It's also very fast! Really fast! because -1 means that there is no limit on the number of words between I and Really.
You can use an additional interval only for the last phrase. It means that you can accept more distance only between the second to last phrase and the last phrase. This is useful for implementing a near phrase search within the same sentence. If you specify . (the sentence end phrase) as the last phrase and specify -1 as the additional last interval, the other specified phrases must appear before the sentence end phrase. You must append $ to the last phrase, like .$.
Here is an example that uses -1 as the additional last interval of the given phrases:
column *NP10,-1 "a b .$"
Here is an example to customize the additional last interval of the given phrases:
Execution example:
select Entries --filter 'content *NP1,-1 "I started .$"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]
You can also use a positive number for the additional last interval. If you specify a positive number as the additional last interval, all of the following conditions must be satisfied:
The interval between the first phrase and the second to last phrase is less than or equal to the max interval.
The interval between the first phrase and the last phrase is less than or equal to the max interval + the additional last interval.
If you specify a negative number as the additional last interval, the second condition isn’t required. The last phrase just needs to appear.
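The two conditions can be written down as a small predicate. The following Python sketch is illustrative only: satisfies_intervals is a hypothetical name, the intervals are passed in as already computed numbers because this section does not define how Groonga counts them, a negative max interval is treated as no limit, and an additional last interval of 0 is treated like a positive one (an assumption).

```python
def satisfies_intervals(interval_to_second_last, interval_to_last,
                        max_interval, additional_last_interval):
    """Check the interval conditions of `*NP${MAX},${ADDITIONAL_LAST}`.

    interval_to_second_last: interval from the first phrase to the
        second to last phrase.
    interval_to_last: interval from the first phrase to the last phrase.
    """
    unlimited = max_interval < 0  # -1 means no limit

    # Condition 1: the phrases before the last one fit in the max interval.
    if not unlimited and interval_to_second_last > max_interval:
        return False

    # Condition 2: only required for a non-negative additional interval.
    if not unlimited and additional_last_interval >= 0:
        return interval_to_last <= max_interval + additional_last_interval

    # Negative additional interval: the last phrase only has to appear,
    # which the caller is assumed to have checked already.
    return True


print(satisfies_intervals(1, 5, max_interval=1, additional_last_interval=4))
print(satisfies_intervals(1, 6, max_interval=1, additional_last_interval=4))
print(satisfies_intervals(1, 100, max_interval=1, additional_last_interval=-1))
```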
Here is an example to use positive number as the additional last interval:
Execution example:
select Entries --filter 'content *NP1,4 "I started .$"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]
7.13.2.9.6. Near phrase product search operator#
New in version 11.1.1.
Its syntax is one of the following:
column *NPP "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP${MAX_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
Here are examples of the second form:
column *NPP29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
The first example means that 29 is used for the max interval.
The second example means that -1 is used for the max interval.
Here are examples of the third form:
column *NPP10,29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP10,-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
The first example means that 29 is used for the additional last interval.
The second example means that -1 is used for the additional last interval.
New in version 12.0.1: The max intervals of each phrase.
Here are examples of the fourth form:
column *NPP10,0,2|3 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"
column *NPP10,0,2 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"
The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.
The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.
See Near phrase search operator for the max intervals of each phrase.
This operator performs multiple near phrase searches (see Near phrase search operator). The phrases for each near phrase search are computed as the product of {phrase1_1, phrase1_2, ...}, {phrase2_1, phrase2_2, ...} and so on. For example, column *NPP "(a b c) (d e)" uses the following phrases for near phrase searches:
a d
a e
b d
b e
c d
c e
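The list above is the Cartesian product of the two groups. A minimal Python sketch of that expansion follows. The helper expand_phrase_groups is hypothetical and only handles single-word phrases inside parentheses; it says nothing about how Groonga parses or executes the query.

```python
import re
from itertools import product


def expand_phrase_groups(query):
    """Expand "(a b c) (d e)" into the phrase lists of each near phrase search."""
    # Each parenthesized group becomes a list of phrases.
    groups = [group.split() for group in re.findall(r"\(([^)]*)\)", query)]
    # The product picks one phrase from every group, keeping group order.
    return [" ".join(combination) for combination in product(*groups)]


for phrases in expand_phrase_groups("(a b c) (d e)"):
    print(phrases)
```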
Here is a simple example:
Execution example:
select Entries \
--filter 'content *NPP "(I It) (migrated fast)"' \
--output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ],
# [
# "I migrated all Senna system!"
# ],
# [
# "I also migrated all Tritonn system!"
# ]
# ]
# ]
# ]
You can use all the features of Near phrase search operator such as the max interval, $ for the last phrase and the additional last interval.
Execution example:
select Entries \
--filter 'content *NPP2,-1 "(I It) (migrated fast) (.$)"' \
--output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]
This is more efficient than using multiple Near phrase search operators.
7.13.2.9.7. Ordered near phrase search operator#
New in version 11.0.9.
Its syntax is one of the following:
column *ONP "phrase1 phrase2 ..."
column *ONP${MAX_INTERVAL} "phrase1 phrase2 ..."
column *ONP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "phrase1 phrase2 ..."
column *ONP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "phrase1 phrase2 ..."
Here are examples of the second form:
column *ONP29 "phrase1 phrase2 ..."
column *ONP-1 "phrase1 phrase2 ..."
The first example means that 29 is used for the max interval.
The second example means that -1 is used for the max interval.
Here are examples of the third form:
column *ONP10,29 "phrase1 phrase2 ..."
column *ONP10,-1 "phrase1 phrase2 ..."
The first example means that 29 is used for the additional last interval.
The second example means that -1 is used for the additional last interval.
New in version 12.0.1: The max intervals of each phrase.
Here are examples of the fourth form:
column *ONP10,0,2|3 "phrase1 phrase2 phrase3"
column *ONP10,0,2 "phrase1 phrase2 phrase3"
The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.
The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.
See Near phrase search operator for the max intervals of each phrase.
This operator does ordered near phrase search with phrase1, phrase2 and so on. Ordered near phrase search is similar to Near phrase search operator, but ordered near phrase search also checks the order of the phrases. For example, column *ONP "groonga mroonga pgroonga" matches groonga mroonga rroonga pgroonga but doesn’t match groonga rroonga pgroonga mroonga, because the latter uses a different order.
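The order check in the groonga mroonga pgroonga example can be sketched in Python. This is a simplified illustration, not Groonga's algorithm: appears_in_order is a hypothetical helper, phrases are single whitespace-separated tokens, and the interval limits are ignored so that only the ordering rule is shown.

```python
def appears_in_order(text, phrases):
    """Return True if every phrase occurs in text in the given order."""
    tokens = text.split()
    position = -1
    for phrase in phrases:
        try:
            # Search only after the previously matched phrase.
            position = tokens.index(phrase, position + 1)
        except ValueError:
            return False  # missing, or only present before the previous phrase
    return True


query = ["groonga", "mroonga", "pgroonga"]
print(appears_in_order("groonga mroonga rroonga pgroonga", query))
print(appears_in_order("groonga rroonga pgroonga mroonga", query))
```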
Here is a simple example:
Execution example:
select Entries \
--filter 'content *ONP "I Groonga"' \
--output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ]
# ]
# ]
# ]
select Entries \
--filter 'content *ONP "Groonga I"' \
--output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]
You can use all the features of Near phrase search operator such as the max interval and the additional last interval. But you don’t need to specify $ for the last phrase because the last phrase in the query is always the last phrase.
7.13.2.9.8. Ordered near phrase product search operator#
New in version 11.1.1.
Its syntax is one of the following:
column *ONPP "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP${MAX_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
Here are examples of the second form:
column *ONPP29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
The first example means that 29 is used for the max interval.
The second example means that -1 is used for the max interval.
Here are examples of the third form:
column *ONPP10,29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP10,-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
The first example means that 29 is used for the additional last interval.
The second example means that -1 is used for the additional last interval.
New in version 12.0.1: The max intervals of each phrase.
Here are examples of the fourth form:
column *ONPP10,0,2|3 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"
column *ONPP10,0,2 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"
The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.
The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.
See Near phrase search operator for the max intervals of each phrase.
This operator does ordered near phrase product search. Ordered near phrase product search is similar to Near phrase product search operator, but ordered near phrase product search also checks the order of the phrases, like Ordered near phrase search operator. For example, column *ONPP "(a b c) (d e)" matches a 1 d but doesn’t match d 1 a, because the latter uses a different order.
Here is a simple example:
Execution example:
select Entries \
--filter 'content *ONPP "(I It) (migrated fast) (.)"' \
--output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]
You can use all the features of Near phrase search operator such as the max interval and the additional last interval. But you don’t need to specify $ for the last phrase because the last phrase in the query is always the last phrase.
7.13.2.9.9. Similar search operator#
Its syntax is column *S "document".
The operator does similar search with the document document. Similar search searches for records that have content similar to document.
Note that an index column for full text search must be defined for column.
Here is a simple example:
Execution example:
select Entries --filter 'content *S "I migrated all Solr system!"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I migrated all Senna system!"
# ],
# [
# "I also migrated all Tritonn system!"
# ]
# ]
# ]
# ]
The expression matches records that have content similar to I migrated all Solr system!. In this case, records that have I migrated all XXX system! content are matched.
You should use the TokenMecab tokenizer for similar search against Japanese documents. Because TokenMecab tokenizes target documents into units that are almost words, it improves similar search precision.
7.13.2.9.10. Term extract operator#
Its syntax is _key *T "document".
The operator extracts terms from document. Terms must be registered as keys of the table of _key.
Note that the table must be a patricia trie (TABLE_PAT_KEY) or a double array trie (TABLE_DAT_KEY). You can’t use a hash table (TABLE_HASH_KEY) or an array (TABLE_NO_KEY) because they don’t support longest common prefix search, which is used to implement the operator.
Here is a simple example:
Execution example:
table_create Words TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Words
[
{"_key": "groonga"},
{"_key": "mroonga"},
{"_key": "Senna"},
{"_key": "Tritonn"}
]
# [[0,1337566253.89858,0.000355720520019531],4]
select Words --filter '_key *T "Groonga is the successor project to Senna."' --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "groonga"
# ],
# [
# "senna"
# ]
# ]
# ]
# ]
The expression extracts terms that are included in the document Groonga is the successor project to Senna.. In this case, the NormalizerAuto normalizer is specified for Words. So Groonga can be extracted even though it is loaded as groonga into Words. All of the extracted terms are also normalized.
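The extraction can be modeled in a few lines of Python. This is a rough sketch under stated assumptions, not Groonga's implementation: str.lower() stands in for NormalizerAuto, a linear search over a set stands in for the trie's longest common prefix search, and extract_terms is a hypothetical name.

```python
def extract_terms(keys, document):
    """Extract registered keys from document via longest common prefix search."""
    # Stand-in for NormalizerAuto: lowercase both the keys and the document.
    normalized_keys = {key.lower() for key in keys}
    text = document.lower()

    found = []
    position = 0
    while position < len(text):
        # Longest key that is a prefix of the text at this position.
        longest = max(
            (key for key in normalized_keys if text.startswith(key, position)),
            key=len,
            default=None,
        )
        if longest is None:
            position += 1
            continue
        if longest not in found:
            found.append(longest)
        position += len(longest)  # continue after the extracted term
    return found


words = ["groonga", "mroonga", "Senna", "Tritonn"]
print(extract_terms(words, "Groonga is the successor project to Senna."))
```

A trie answers the "longest key that is a prefix here" question directly, which is why the operator requires a patricia trie or double array trie table.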
7.13.2.9.11. Regular expression operator#
New in version 5.0.1.
Its syntax is column @~ "pattern".
The operator searches records by the regular expression pattern. If a record’s column value matches pattern, the record is matched.
pattern must be valid regular expression syntax. See Regular expression for details of the regular expression syntax.
The following example uses .roonga as the pattern. It matches Groonga, Mroonga and so on.
Execution example:
select Entries --filter 'content @~ ".roonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
In most cases, a regular expression is evaluated sequentially against each record, so it may be slow when there are many records.
In some cases, Groonga evaluates a regular expression by using an index, which is very fast. See Regular expression for details.
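Sequential evaluation means that the pattern is tried against every record in turn. The following Python sketch illustrates that cost model only. Python's re module is used as a stand-in, so its syntax and matching rules may differ from Groonga's regular expressions, and regexp_filter is a hypothetical helper.

```python
import re

# Content values that appear in the execution examples of this section.
contents = [
    "Welcome! This is my first post!",
    "I started to use Groonga. It's very fast!",
    "I also started to use Mroonga. It's also very fast! Really fast!",
    "I migrated all Senna system!",
    "I also migrated all Tritonn system!",
]


def regexp_filter(values, pattern):
    # Compile once, then test every value: the cost grows with the
    # number of records because nothing is skipped.
    compiled = re.compile(pattern)
    return [value for value in values if compiled.search(value)]


matched = regexp_filter(contents, ".roonga")
for value in matched:
    print(value)
```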