7.13.2. Script syntax#

Script syntax is a syntax for specifying complex search conditions. It is similar to ECMAScript. For example, _key == "book" means that Groonga searches for records whose _key value is "book". All values are strings in query syntax, but each value has its own type in script syntax. For example, "book" is a string, 1 is an integer, TokenBigram is the object whose name is TokenBigram, and so on.

Script syntax doesn’t support the full ECMAScript syntax. For example, it doesn’t support statements such as the if control statement, the for iteration statement and variable definition statements. Function definition is not supported either. However, script syntax adds its own original operators. They are described after the ECMAScript-compatible syntax.

7.13.2.1. Security#

For security reasons, you should not pass input from users to Groonga directly. A malicious user may input a query that retrieves records that should not be shown to that user.

Consider the following case.

A Groonga application constructs a Groonga request with the following program:

filter = "column @ \"#{user_input}\""
select_options = {
  # ...
  :filter => filter,
}
groonga_client.select(select_options)

user_input is input from a user. If the input is query, here is the constructed filter parameter:

column @ "query"

If the input is x" || true || ", here is the constructed filter parameter:

column @ "x" || true || ""

This query matches all records. The user will get all records from your database, and that user may be malicious.

It’s better to receive user input only as a value. This means that you don’t allow user input to contain operators such as @ and &&. If you accept operators, a user can create a malicious query.

If user input is only a value, you can block malicious queries by escaping the value. Here is how to escape each kind of user input value:

  • True value: Convert it to true.

  • False value: Convert it to false.

  • Numerical value: Convert it to Integer or Float. For example, 1.2, -10, 314e-2 and so on.

  • String value: Replace \ with \\ and then " with \" in the string value, and surround the substituted string value with ". For example, double " quote and back \ slash should be converted to "double \" quote and back \\ slash".
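
The escaping rules above can be sketched in Ruby as follows. This is a minimal illustration, not part of Groonga or of any client library: escape_filter_value is a hypothetical helper name, and it only handles the four kinds of values listed above.

```ruby
# Hypothetical helper (not a Groonga API): converts a Ruby value
# into a script syntax literal by following the list above.
def escape_filter_value(value)
  case value
  when true then "true"
  when false then "false"
  when Integer, Float then value.to_s
  when String
    # Escape backslashes first so that the backslashes added for
    # double quotes are not escaped a second time.
    escaped = value.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
    "\"#{escaped}\""
  else
    raise ArgumentError, "unsupported value: #{value.inspect}"
  end
end

# The malicious input from the example above becomes one string value.
def build_filter(user_input)
  "column @ #{escape_filter_value(user_input)}"
end

puts build_filter('x" || true || "')
```

With this helper, the input x" || true || " stays inside a single string literal, so the || operators in it are no longer interpreted as operators.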

7.13.2.2. Sample data#

Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_PAT_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries content COLUMN_SCALAR Text
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Entries
[
{"_key":    "The first post!",
 "content": "Welcome! This is my first post!",
 "n_likes": 5},
{"_key":    "Groonga",
 "content": "I started to use Groonga. It's very fast!",
 "n_likes": 10},
{"_key":    "Mroonga",
 "content": "I also started to use Mroonga. It's also very fast! Really fast!",
 "n_likes": 15},
{"_key":    "Good-bye Senna",
 "content": "I migrated all Senna system!",
 "n_likes": 3},
{"_key":    "Good-bye Tritonn",
 "content": "I also migrated all Tritonn system!",
 "n_likes": 3}
]
# [[0,1337566253.89858,0.000355720520019531],5]

There is a table, Entries, for blog entries. An entry has a title, content and the number of likes for the entry. The title is the key of Entries. The content is the value of the Entries.content column. The number of likes is the value of the Entries.n_likes column.

The Entries._key column and the Entries.content column are indexed using the TokenBigram tokenizer, so both Entries._key and Entries.content are ready for full text search.

OK. The schema and data for examples are ready.

7.13.2.3. Literals#

7.13.2.3.1. Integer#

An integer literal is a sequence of digits 0 to 9, such as 1234567890. + or - can be prepended as a sign, such as +29 and -29. An integer literal must be decimal. Octal notation, hexadecimal notation and so on can’t be used.

The maximum value of integer literal is 9223372036854775807 (= 2 ** 63 - 1). The minimum value of integer literal is -9223372036854775808 (= -(2 ** 63)).
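
The format and range above can be checked before a user-supplied number is placed into a filter. The following Ruby sketch is only an illustration: valid_integer_literal? is a hypothetical helper name, and how Groonga treats leading zeros is not covered here.

```ruby
# Digits with an optional sign, as described above.
INTEGER_LITERAL_PATTERN = /\A[+-]?[0-9]+\z/
INTEGER_LITERAL_MIN = -(2**63)
INTEGER_LITERAL_MAX = 2**63 - 1

# Hypothetical helper: true if text has the integer literal form
# and its value fits in the documented range.
def valid_integer_literal?(text)
  return false unless INTEGER_LITERAL_PATTERN.match?(text)
  Integer(text, 10).between?(INTEGER_LITERAL_MIN, INTEGER_LITERAL_MAX)
end

puts valid_integer_literal?("+29")                  # sign is allowed
puts valid_integer_literal?("0x1F")                 # hex is rejected
puts valid_integer_literal?("9223372036854775808")  # above the maximum
```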

7.13.2.3.2. Float#

A float literal is a sequence of digits 0 to 9, ., and digits 0 to 9, such as 3.14. + or - can be prepended as a sign, such as +3.14 and -3.14. The ${RADIX}e${EXPONENTIAL} and ${RADIX}E${EXPONENTIAL} formats are also supported. For example, 314e-2 is the same as 3.14.

7.13.2.3.3. String#

A string literal is "...". You need to escape " in a literal by prepending \, such as \". For example, "Say \"Hello!\"." is a literal for the string Say "Hello!".

The string encoding must be the same as the encoding of the database. The default encoding is UTF-8. It can be changed by the --with-default-encoding configure option, the --encoding option of the groonga executable file and so on.

7.13.2.3.4. Boolean#

Boolean literal is true and false. true means true and false means false.

7.13.2.3.5. Null#

Null literal is null. Groonga doesn’t support null value but null literal is supported.

7.13.2.3.6. Time#

Note

This is the groonga original notation.

A time literal doesn’t exist. Instead, there are string time notation, integer time notation and float time notation.

String time notation is "YYYY/MM/DD hh:mm:ss.uuuuuu" or "YYYY-MM-DD hh:mm:ss.uuuuuu". YYYY is the year, MM is the month, DD is the day, hh is the hour, mm is the minute, ss is the second and uuuuuu is the microsecond. It is in local time. For example, "2012/07/23 02:41:10.436218" is 2012-07-23T02:41:10.436218 in ISO 8601 format.

Integer time notation is the number of seconds that have elapsed since midnight UTC, January 1, 1970. It is also known as POSIX time. For example, 1343011270 is 2012-07-23T02:41:10Z in ISO 8601 format.

Float time notation is the number of seconds and microseconds that have elapsed since midnight UTC, January 1, 1970. For example, 1343011270.436218 is 2012-07-23T02:41:10.436218Z in ISO 8601 format.
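
The integer and float examples above can be verified with Ruby’s standard Time class. This is only a sketch for checking the arithmetic; the two method names are hypothetical and nothing here calls Groonga.

```ruby
require "time"

# Integer time notation: seconds since 1970-01-01T00:00:00Z.
def integer_time_to_iso8601(seconds)
  Time.at(seconds).utc.iso8601
end

# Float time notation: seconds plus microseconds. The two parts are
# passed separately to avoid floating point rounding of the fraction.
def float_time_to_iso8601(seconds, microseconds)
  Time.at(seconds, microseconds).utc.iso8601(6)
end

puts integer_time_to_iso8601(1343011270)
puts float_time_to_iso8601(1343011270, 436218)
```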

7.13.2.3.7. Geo point#

Note

This is the groonga original notation.

Geo point literal doesn’t exist. There is string geo point notation.

String geo point notation has the following patterns:

  • "LATITUDE_IN_MSECxLONGITUDE_IN_MSEC"

  • "LATITUDE_IN_MSEC,LONGITUDE_IN_MSEC"

  • "LATITUDE_IN_DEGREExLONGITUDE_IN_DEGREE"

  • "LATITUDE_IN_DEGREE,LONGITUDE_IN_DEGREE"

x and , can be used as the separator. Latitude and longitude can be represented in milliseconds or in degrees.
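
As a rough illustration of how the two units relate, the following Ruby sketch converts degrees to milliseconds and builds the "LATITUDE_IN_MSECxLONGITUDE_IN_MSEC" form. It assumes that one degree is 60 * 60 * 1000 milliseconds; the method names and the sample coordinates are made up for this example.

```ruby
# Assumption: 1 degree = 60 minutes * 60 seconds * 1000 milliseconds.
MSEC_PER_DEGREE = 60 * 60 * 1000

# Hypothetical helper: converts a value in degrees to milliseconds.
def degree_to_msec(degree)
  (degree * MSEC_PER_DEGREE).round
end

# Hypothetical helper: builds the string geo point notation that
# uses milliseconds and the x separator.
def geo_point_in_msec(latitude_degree, longitude_degree)
  "#{degree_to_msec(latitude_degree)}x#{degree_to_msec(longitude_degree)}"
end

puts geo_point_in_msec(35.5, 139.25)
```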

7.13.2.3.8. Array#

Array literal is [element1, element2, ...].

7.13.2.3.9. Object literal#

Object literal is {name1: value1, name2: value2, ...}. Groonga doesn’t support object literal yet.

7.13.2.4. Control syntaxes#

Script syntax doesn’t support statements, so you cannot use control statements such as if. The only control syntax available is the A ? B : C expression.

A ? B : C returns B if A is true, C otherwise.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == (_id == 1 ? 5 : 3)'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose _id column value is equal to 1 and n_likes column value is equal to 5, or whose _id column value is not equal to 1 and n_likes column value is equal to 3.

7.13.2.5. Grouping#

Its syntax is (...). ... is a comma-separated expression list.

(...) groups one or more expressions so that they can be processed as a single expression. a && b || c means that both a and b are matched, or c is matched. a && (b || c) means that a is matched and at least one of b and c is matched.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes < 5 && content @ "senna" || content @ "fast"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]
select Entries --filter 'n_likes < 5 && (content @ "senna" || content @ "fast")'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3
#       ]
#     ]
#   ]
# ]

The first expression doesn’t use grouping. It matches records for which both n_likes < 5 and content @ "senna" are matched, or for which content @ "fast" is matched.

The second expression uses grouping. It matches records for which n_likes < 5 is matched and at least one of content @ "senna" and content @ "fast" is matched.
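
Ruby uses the same relative precedence for && and ||, so the two filters can be imitated on the sample data. This is only a sketch: content_match? is a hypothetical stand-in that uses a case-insensitive substring test instead of Groonga’s full text search, and the result order follows the array, not Groonga’s output.

```ruby
ENTRIES = [
  { key: "The first post!", content: "Welcome! This is my first post!", n_likes: 5 },
  { key: "Groonga", content: "I started to use Groonga. It's very fast!", n_likes: 10 },
  { key: "Mroonga", content: "I also started to use Mroonga. It's also very fast! Really fast!", n_likes: 15 },
  { key: "Good-bye Senna", content: "I migrated all Senna system!", n_likes: 3 },
  { key: "Good-bye Tritonn", content: "I also migrated all Tritonn system!", n_likes: 3 },
].freeze

# Stand-in for content @ "word": a case-insensitive substring test.
def content_match?(entry, word)
  entry[:content].downcase.include?(word.downcase)
end

# a && b || c is evaluated as (a && b) || c.
def keys_without_grouping
  ENTRIES.select { |e|
    e[:n_likes] < 5 && content_match?(e, "senna") || content_match?(e, "fast")
  }.map { |e| e[:key] }
end

# Parentheses force b || c to be evaluated first.
def keys_with_grouping
  ENTRIES.select { |e|
    e[:n_likes] < 5 && (content_match?(e, "senna") || content_match?(e, "fast"))
  }.map { |e| e[:key] }
end

p keys_without_grouping
p keys_with_grouping
```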

7.13.2.6. Function call#

Its syntax is name(argument1, argument2, ...).

name(argument1, argument2, ...) calls a function that is named name with arguments argument1, argument2 and ....

See Function for the list of available functions.

Here is a simple example:

Execution example:

select Entries --filter 'edit_distance(_key, "Groonga") <= 1'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

The expression uses edit_distance. It matches records whose _key column value is similar to "Groonga". Similarity to "Groonga" is computed as edit distance. If the edit distance is less than or equal to 1, the value is treated as similar. In this case, "Groonga" and "Mroonga" are treated as similar.
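
To make the notion of edit distance concrete, here is a small Ruby sketch of the plain Levenshtein distance (insertion, deletion and substitution each cost 1). It is an assumption that this matches the metric used by Groonga’s edit_distance function for these inputs; the Ruby method below is an illustration, not Groonga’s implementation.

```ruby
# Levenshtein distance computed row by row.
def edit_distance(a, b)
  a_chars = a.chars
  b_chars = b.chars
  # Distance from the empty prefix of a to each prefix of b.
  previous = (0..b_chars.size).to_a
  a_chars.each_with_index do |a_char, i|
    current = [i + 1]
    b_chars.each_with_index do |b_char, j|
      cost = a_char == b_char ? 0 : 1
      current << [
        previous[j + 1] + 1, # deletion
        current[j] + 1,      # insertion
        previous[j] + cost,  # substitution or match
      ].min
    end
    previous = current
  end
  previous.last
end

KEYS = ["The first post!", "Groonga", "Mroonga", "Good-bye Senna", "Good-bye Tritonn"].freeze

p KEYS.select { |key| edit_distance(key, "Groonga") <= 1 }
```

"Mroonga" differs from "Groonga" by one substituted character, so its distance is 1 and it passes the <= 1 condition.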

7.13.2.7. Basic operators#

Groonga supports operators defined in ECMAScript.

7.13.2.7.1. Arithmetic operators#

Here are arithmetic operators.

7.13.2.7.1.1. Addition operator#

Its syntax is number1 + number2.

The operator adds number1 and number2 and returns the result.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == 10 + 5'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 10 + 5).

7.13.2.7.1.2. Subtraction operator#

Its syntax is number1 - number2.

The operator subtracts number2 from number1 and returns the result.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == 20 - 5'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 20 - 5).

7.13.2.7.1.3. Multiplication operator#

Its syntax is number1 * number2.

The operator multiplies number1 and number2 and returns the result.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == 3 * 5'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 3 * 5).

7.13.2.7.1.4. Division operator#

Its syntax is number1 / number2 and number1 % number2.

The operator divides number1 by number2. / returns the quotient. % returns the remainder.

Here are simple examples:

Execution example:

select Entries --filter 'n_likes == 26 / 7'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 3 (= 26 / 7).

Execution example:

select Entries --filter 'n_likes == 26 % 7'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 26 % 7).
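
The two results are related by number1 = quotient * number2 + remainder. The following Ruby sketch checks this for the values used above. Ruby’s integer division happens to give the same results for these positive operands; behavior for negative operands is not covered by this sketch.

```ruby
# Returns [quotient, remainder] for integer operands.
def divide(number1, number2)
  number1.divmod(number2)
end

quotient, remainder = divide(26, 7)
puts "26 / 7 = #{quotient}"
puts "26 % 7 = #{remainder}"
# The identity that ties the two operators together.
puts quotient * 7 + remainder == 26
```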

7.13.2.7.2. Logical operators#

Here are logical operators.

7.13.2.7.2.1. Logical NOT operator#

Its syntax is !condition.

The operator inverts boolean value of condition.

Here is a simple example:

Execution example:

select Entries --filter '!(n_likes == 5)'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         4
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is not equal to 5.

7.13.2.7.2.2. Logical AND operator#

Its syntax is condition1 && condition2.

The operator returns true if both condition1 and condition2 are true, false otherwise.

Here is a simple example:

Execution example:

select Entries --filter 'content @ "fast" && n_likes >= 10'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

The expression matches records whose content column value contains the word fast and whose n_likes column value is greater than or equal to 10.

7.13.2.7.2.3. Logical OR operator#

Its syntax is condition1 || condition2.

The operator returns true if either condition1 or condition2 is true, false otherwise.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == 5 || n_likes == 10'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5 or 10.

7.13.2.7.2.4. Logical AND NOT operator#

Its syntax is condition1 &! condition2.

The operator returns true if condition1 is true but condition2 is false, false otherwise. In other words, it returns the set difference.

Here is a simple example:

Execution example:

select Entries --filter 'content @ "fast" &! content @ "mroonga"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ]
#     ]
#   ]
# ]

The expression matches records whose content column value contains the word fast but doesn’t contain the word mroonga.

7.13.2.7.3. Bitwise operators#

Here are bitwise operators.

7.13.2.7.3.1. Bitwise NOT operator#

Its syntax is ~number.

The operator returns bitwise NOT of number.

Here is a simple example:

Execution example:

select Entries --filter '~n_likes == -6'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5 because bitwise NOT of 5 is equal to -6.

7.13.2.7.3.2. Bitwise AND operator#

Its syntax is number1 & number2.

The operator returns bitwise AND between number1 and number2.

Here is a simple example:

Execution example:

select Entries --filter '(n_likes & 1) == 1'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         4
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is an odd number because bitwise AND between an odd number and 1 is equal to 1 and bitwise AND between an even number and 1 is equal to 0.
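
The lowest bit of an integer is 1 exactly when the integer is odd, which is why (n_likes & 1) == 1 selects the records with 3, 5 and 15 likes and skips the record with 10 likes. A minimal Ruby sketch over the sample n_likes values, with hypothetical names:

```ruby
# n_likes values from the sample data.
N_LIKES_VALUES = [5, 10, 15, 3, 3].freeze

# Same test as the filter above: keep values whose lowest bit is set.
def lowest_bit_set(values)
  values.select { |n| (n & 1) == 1 }
end

p lowest_bit_set(N_LIKES_VALUES)
```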

7.13.2.7.4. Bitwise OR operator#

Its syntax is number1 | number2.

The operator returns bitwise OR between number1 and number2.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == (1 | 4)'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 1 | 4).

7.13.2.7.5. Bitwise XOR operator#

Its syntax is number1 ^ number2.

The operator returns bitwise XOR between number1 and number2.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == (10 ^ 15)'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 10 ^ 15).
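
Writing the operands in binary shows why the bitwise examples in this section all produce values related to 5. The Ruby sketch below prints the bit patterns; bits is a hypothetical formatting helper, and Ruby’s operators are used here only because they give the same results for these small values.

```ruby
# Hypothetical helper: zero-padded binary representation.
def bits(number, width = 4)
  format("%0#{width}b", number)
end

# OR sets a bit if it is set in either operand.
puts "#{bits(1)} | #{bits(4)} = #{bits(1 | 4)}"
# XOR sets a bit if it is set in exactly one operand.
puts "#{bits(10)} ^ #{bits(15)} = #{bits(10 ^ 15)}"
# NOT flips every bit; in two's complement, ~n equals -n - 1.
puts "~5 = #{~5}"
```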

7.13.2.7.6. Shift operators#

Here are shift operators.

7.13.2.7.6.1. Left shift operator#

Its syntax is number1 << number2.

The operator performs a bitwise left shift operation on number1 by number2.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == (5 << 1)'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 10 (= 5 << 1).

7.13.2.7.6.2. Signed right shift operator#

Its syntax is number1 >> number2.

The operator shifts bits of number1 to right by number2. The sign of the result is the same as number1.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == -(-10 >> 1)'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= -(-10 >> 1) = -(-5)).

7.13.2.7.6.3. Unsigned right shift operator#

Its syntax is number1 >>> number2.

The operator shifts bits of number1 to right by number2. The leftmost number2 bits are filled by 0.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == (2147483648 - (-10 >>> 1))'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 2147483648 - (-10 >>> 1) = 2147483648 - 2147483643).
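
The value 2147483643 follows if -10 is treated as a 32-bit two’s complement pattern before shifting, which is what the result above implies. Ruby has no >>> operator, so the sketch below emulates it by masking to 32 bits first; unsigned_right_shift_32 is a hypothetical helper name.

```ruby
# Emulates a 32-bit unsigned right shift: reinterpret the number as
# an unsigned 32-bit pattern, then shift zeros in from the left.
def unsigned_right_shift_32(number, shift)
  (number & 0xFFFFFFFF) >> shift
end

# Signed shift keeps the sign.
puts(-10 >> 1)
# Unsigned shift fills the leftmost bit with 0.
puts unsigned_right_shift_32(-10, 1)
puts 2147483648 - unsigned_right_shift_32(-10, 1)
```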

7.13.2.7.7. Comparison operators#

Here are comparison operators.

7.13.2.7.7.1. Equal operator#

Its syntax is object1 == object2.

The operator returns true if object1 is equal to object2, false otherwise.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes == 5'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is equal to 5.

7.13.2.7.7.2. Not equal operator#

Its syntax is object1 != object2.

The operator returns true if object1 is not equal to object2, false otherwise.

Here is a simple example:

Execution example:

select Entries --filter 'n_likes != 5'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         4
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         4,
#         "Good-bye Senna",
#         "I migrated all Senna system!",
#         3
#       ],
#       [
#         5,
#         "Good-bye Tritonn",
#         "I also migrated all Tritonn system!",
#         3
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

The expression matches records whose n_likes column value is not equal to 5.

7.13.2.7.7.3. Less than operator#

TODO: …

7.13.2.7.7.4. Less than or equal to operator#

TODO: …

7.13.2.7.7.5. Greater than operator#

TODO: …

7.13.2.7.7.6. Greater than or equal to operator#

TODO: …

7.13.2.8. Assignment operators#

7.13.2.8.1. Addition assignment operator#

Its syntax is column1 += column2.

The operator adds column2 to column1 and stores the result in column1.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score += n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         4
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         4
#       ],
#       [
#         "Groonga",
#         10,
#         11
#       ],
#       [
#         "Mroonga",
#         15,
#         16
#       ],
#       [
#         "The first post!",
#         5,
#         6
#       ]
#     ]
#   ]
# ]

In this case, --filter always sets _score to 1. The scorer then performs an addition assignment, equivalent to ‘_score = _score + n_likes’, for each record.

For example, the value of n_likes for the record whose _key is “Good-bye Senna” is 3.

So the expression 1 + 3 is evaluated and the result is stored in the _score column.

7.13.2.8.2. Subtraction assignment operator#

Its syntax is column1 -= column2.

The operator subtracts column2 from column1 and stores the result in column1.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score -= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         -2
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         -2
#       ],
#       [
#         "Groonga",
#         10,
#         -9
#       ],
#       [
#         "Mroonga",
#         15,
#         -14
#       ],
#       [
#         "The first post!",
#         5,
#         -4
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a subtraction assignment equivalent to ‘_score = _score - n_likes’ for each record.

For example, the value of n_likes for the record that stores “Good-bye Senna” as the _key is 3.

So the expression 1 - 3 is evaluated and the result is stored in the _score column.

7.13.2.8.3. Multiplication assignment operator#

Its syntax is column1 *= column2.

The operator performs multiplication assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score *= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         3
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         3
#       ],
#       [
#         "Groonga",
#         10,
#         10
#       ],
#       [
#         "Mroonga",
#         15,
#         15
#       ],
#       [
#         "The first post!",
#         5,
#         5
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a multiplication assignment equivalent to ‘_score = _score * n_likes’ for each record.

For example, the value of n_likes for the record that stores “Good-bye Senna” as the _key is 3.

So the expression 1 * 3 is evaluated and the result is stored in the _score column.

7.13.2.8.4. Division assignment operator#

Its syntax is column1 /= column2.

The operator performs division assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score /= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         0
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         0
#       ],
#       [
#         "Groonga",
#         10,
#         0
#       ],
#       [
#         "Mroonga",
#         15,
#         0
#       ],
#       [
#         "The first post!",
#         5,
#         0
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a division assignment equivalent to ‘_score = _score / n_likes’ for each record.

For example, the value of n_likes for the record that stores “Good-bye Senna” as the _key is 3.

So the expression 1 / 3 is evaluated and the result is stored in the _score column. Because _score is Int32, the fractional part is discarded and the stored value is 0.
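The zeros in the output above can be reproduced outside Groonga. The following Python sketch only mirrors the arithmetic: the n_likes values are copied from the execution example, the helper name divide_assign is hypothetical, and truncation toward zero is an assumption about how the Int32 result is derived.

```python
# n_likes values copied from the execution example above.
n_likes_by_key = {
    "Good-bye Senna": 3,
    "Good-bye Tritonn": 3,
    "Groonga": 10,
    "Mroonga": 15,
    "The first post!": 5,
}


def divide_assign(score: int, n_likes: int) -> int:
    """Model `_score /= n_likes` for an integer _score.

    Assumption: the fractional part is dropped (truncation toward zero).
    """
    return int(score / n_likes)


# --filter true gives every record an initial _score of 1.
scores = {key: divide_assign(1, n) for key, n in n_likes_by_key.items()}
for key, score in scores.items():
    print(key, score)
```

Every n_likes value is larger than 1, so each quotient is below 1 and is stored as 0.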

7.13.2.8.5. Modulo assignment operator#

Its syntax is column1 %= column2.

The operator performs modulo assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score %= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         1
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         1
#       ],
#       [
#         "Groonga",
#         10,
#         1
#       ],
#       [
#         "Mroonga",
#         15,
#         1
#       ],
#       [
#         "The first post!",
#         5,
#         1
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a modulo assignment equivalent to ‘_score = _score % n_likes’ for each record.

For example, the value of n_likes for the record that stores “Good-bye Senna” as the _key is 3.

So the expression 1 % 3 is evaluated and the result is stored in the _score column.

7.13.2.8.6. Bitwise left shift assignment operator#

Its syntax is column1 <<= column2.

The operator performs left shift assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score <<= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         8
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         8
#       ],
#       [
#         "Groonga",
#         10,
#         1024
#       ],
#       [
#         "Mroonga",
#         15,
#         32768
#       ],
#       [
#         "The first post!",
#         5,
#         32
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a left shift assignment equivalent to ‘_score = _score << n_likes’ for each record.

For example, the value of n_likes for the record that stores “Good-bye Senna” as the _key is 3.

So the expression 1 << 3 is evaluated and the result is stored in the _score column.

7.13.2.8.7. Bitwise signed right shift assignment operator#

Its syntax is column1 >>= column2.

The operator performs signed right shift assignment operation on column1 by column2.

7.13.2.8.8. Bitwise unsigned right shift assignment operator#

Its syntax is column1 >>>= column2.

The operator performs unsigned right shift assignment operation on column1 by column2.
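Neither right shift operator has an execution example here, so the following Python sketch illustrates how the signed and unsigned variants differ. It assumes ECMAScript-style semantics on 32-bit integers, which script syntax resembles; it has not been checked against Groonga, and the function names are hypothetical.

```python
def to_int32(value: int) -> int:
    """Wrap an integer into the signed 32-bit range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def signed_right_shift(value: int, count: int) -> int:
    """Model `>>`: the sign bit is copied into the vacated high bits."""
    return to_int32(value) >> (count & 31)


def unsigned_right_shift(value: int, count: int) -> int:
    """Model `>>>`: the vacated high bits are filled with zeros."""
    return (value & 0xFFFFFFFF) >> (count & 31)


# Non-negative values behave the same way under both operators.
print(signed_right_shift(8, 1), unsigned_right_shift(8, 1))

# Negative values differ: `>>` keeps the sign, `>>>` yields a large positive value.
print(signed_right_shift(-8, 1), unsigned_right_shift(-8, 1))
```

Under these assumptions the two operators only disagree when the left operand is negative.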

7.13.2.8.9. Bitwise AND assignment operator#

Its syntax is column1 &= column2.

The operator performs bitwise AND assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score &= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         1
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         1
#       ],
#       [
#         "Groonga",
#         10,
#         0
#       ],
#       [
#         "Mroonga",
#         15,
#         1
#       ],
#       [
#         "The first post!",
#         5,
#         1
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a bitwise AND assignment equivalent to ‘_score = _score & n_likes’ for each record.

For example, the value of n_likes for the record that stores “Groonga” as the _key is 10.

So the expression 1 & 10 is evaluated and the result is stored in the _score column.

7.13.2.8.10. Bitwise OR assignment operator#

Its syntax is column1 |= column2.

The operator performs bitwise OR assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score |= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         3
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         3
#       ],
#       [
#         "Groonga",
#         10,
#         11
#       ],
#       [
#         "Mroonga",
#         15,
#         15
#       ],
#       [
#         "The first post!",
#         5,
#         5
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a bitwise OR assignment equivalent to ‘_score = _score | n_likes’ for each record.

For example, the value of n_likes for the record that stores “Groonga” as the _key is 10.

So the expression 1 | 10 is evaluated and the result is stored in the _score column.

7.13.2.8.11. Bitwise XOR assignment operator#

Its syntax is column1 ^= column2.

The operator performs bitwise XOR assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score ^= n_likes'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         5
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "Good-bye Senna",
#         3,
#         2
#       ],
#       [
#         "Good-bye Tritonn",
#         3,
#         2
#       ],
#       [
#         "Groonga",
#         10,
#         11
#       ],
#       [
#         "Mroonga",
#         15,
#         14
#       ],
#       [
#         "The first post!",
#         5,
#         4
#       ]
#     ]
#   ]
# ]

The value of _score set by --filter is always 1 in this case. The scorer then performs a bitwise XOR assignment equivalent to ‘_score = _score ^ n_likes’ for each record.

For example, the value of n_likes for the record that stores “Good-bye Senna” as the _key is 3.

So the expression 1 ^ 3 is evaluated and the result is stored in the _score column.
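As a cross-check of the bitwise AND, OR and XOR execution examples, the following Python sketch applies the same operators to an initial _score of 1. The n_likes values are copied from the output columns above; the sketch only mirrors the arithmetic and does not call Groonga.

```python
import operator

# n_likes values in output order: Good-bye Senna, Good-bye Tritonn,
# Groonga, Mroonga, The first post!
n_likes = [3, 3, 10, 15, 5]

# --filter true gives every record an initial _score of 1.
initial_score = 1

bitwise_ops = {
    "&=": operator.and_,
    "|=": operator.or_,
    "^=": operator.xor,
}

results = {
    name: [op(initial_score, n) for n in n_likes]
    for name, op in bitwise_ops.items()
}

for name, scores in results.items():
    print(name, scores)
```

Because the initial _score is 1, &= keeps only the lowest bit of n_likes, |= sets it, and ^= flips it.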

7.13.2.9. Original operators#

Script syntax adds its own binary operators to ECMAScript syntax. They perform search-specific operations. They start with @ or *.

7.13.2.9.1. Match operator#

Its syntax is column @ value.

The operator searches for value by using the inverted index of column. Normally, full text search is performed, but tag search can also be performed, because tag search is also implemented with an inverted index.

Query syntax uses this operator by default.

Here is a simple example:

Execution example:

select Entries --filter 'content @ "fast"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I started to use Groonga. It's very fast!"
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ]
#     ]
#   ]
# ]

The expression matches records that contain the word fast in the content column value.

7.13.2.9.2. Prefix search operator#

Its syntax is column @^ value.

The operator performs prefix search with value. Prefix search finds records that contain a word that starts with value.

You can use fast prefix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table (TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a patricia trie table or a double array trie table. You don’t need to index _key.

Prefix search can be used with other table types, but it scans all records. That is not a problem for a small number of records, but it takes more time for a large number of records.

Here is a simple example:

Execution example:

select Entries --filter '_key @^ "Goo"' --output_columns _key
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ]
#       ],
#       [
#         "Good-bye Senna"
#       ],
#       [
#         "Good-bye Tritonn"
#       ]
#     ]
#   ]
# ]

The expression matches records that contain a word that starts with Goo in _key pseudo column value. Good-bye Senna and Good-bye Tritonn are matched with the expression.

7.13.2.9.3. Suffix search operator#

Its syntax is column @$ value.

This operator performs suffix search with value. Suffix search finds records that contain a word that ends with value.

You can use fast suffix search against a column. The column must be indexed and the index table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can also use fast suffix search against the _key pseudo column of a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don’t need to index _key. We recommend that you use index column based fast suffix search instead of _key based fast suffix search. _key based fast suffix search returns automatically registered substrings. (TODO: write document about suffix search and link to it from here.)

Note

Fast suffix search can be used only for non-ASCII characters such as hiragana in Japanese. You cannot use fast suffix search for ASCII characters.

Suffix search can be used with other table types or with a patricia trie table without the KEY_WITH_SIS flag, but it scans all records. That is not a problem for a small number of records, but it takes more time for a large number of records.

Here is a simple example. It uses fast suffix search for hiragana in Japanese, which are non-ASCII characters.

Execution example:

table_create Titles TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0,1337566253.89858,0.000355720520019531],4]
select Titles --query 'content:$んが'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "content",
#           "ShortText"
#         ]
#       ],
#       [
#         2,
#         "むるんが"
#       ],
#       [
#         1,
#         "ぐるんが"
#       ]
#     ]
#   ]
# ]

The expression matches records whose content column value ends with んが. ぐるんが and むるんが are matched by the expression.

7.13.2.9.4. Near search operator#

Its syntax is one of the following:

column *N "word1 word2 ..."
column *N${MAX_INTERVAL} "word1 word2 ..."
column *N${MAX_INTERVAL},${MAX_TOKEN_INTERVAL_1}|${MAX_TOKEN_INTERVAL_2}|... "word1 word2 ..."

Here are examples of the second form:

column *N29 "word1 word2 ..."
column *N-1 "word1 word2 ..."

The first example means that 29 is used for the max interval.

The second example means that -1 is used for the max interval. -1 max interval means no limit.

Here are examples of the third form:

column *N10,2|3 "word1 word2 word3"
column *N10,2 "word1 word2 word3"

The first example means that 2 is used as the max interval of the first interval and 3 is used as the max interval of the second interval.

The second example means that 2 is used as the max interval of the first interval and -1 is used as the max interval of the second interval, because an omitted max interval is treated as -1.

The max intervals of each token (word) are described later.
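To make the three forms concrete, here is a small Python sketch that splits the option part that follows *N into a max interval and a list of per-token max intervals. The function parse_near_options is hypothetical and is not Groonga's parser; it only encodes the rules stated in this section: the default max interval is 10, omitted per-token intervals are treated as -1, and extra intervals are ignored.

```python
def parse_near_options(spec: str, n_words: int) -> tuple[int, list[int]]:
    """Split the text after `*N` (for example "10,2|3") into its parts.

    Returns (max_interval, per_token_max_intervals). There is one
    per-token max interval for each gap between adjacent words.
    """
    max_interval = 10  # documented default when nothing is specified
    per_token: list[int] = []
    if spec:
        head, _, tail = spec.partition(",")
        max_interval = int(head)
        if tail:
            per_token = [int(part) for part in tail.split("|")]

    n_intervals = max(n_words - 1, 0)
    per_token = per_token[:n_intervals]                   # extras are ignored
    per_token += [-1] * (n_intervals - len(per_token))    # omitted means -1
    return max_interval, per_token


print(parse_near_options("", 2))        # first form: column *N "..."
print(parse_near_options("29", 2))      # second form: column *N29 "..."
print(parse_near_options("10,2|3", 3))  # third form: column *N10,2|3 "..."
print(parse_near_options("10,2", 3))    # third form with an omitted interval
```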

The operator performs near search with the words word1 word2 .... Near search finds records that contain the words, where the words appear in the specified order and within the max interval.

The max interval is 10 by default. The unit of the max interval is the number of characters for N-gram family tokenizers and the number of words for morphological analysis family tokenizers.

However, TokenBigram doesn’t split an ASCII-only word into tokens, because TokenBigram uses a whitespace-separated-like tokenization method for ASCII characters in this case.

So the unit for ASCII words with TokenBigram is the number of words even though TokenBigram is an N-gram family tokenizer.

Note that an index column for full text search must be defined for column.

Here is a simple example:

Execution example:

select Entries --filter 'content *N "I fast"'      --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I started to use Groonga. It's very fast!"
#       ]
#     ]
#   ]
# ]
select Entries --filter 'content *N "I Really"'    --output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]
select Entries --filter 'content *N "also Really"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ]
#     ]
#   ]
# ]

The first expression matches records that contain I and fast within the max interval of 10 words. So the record whose content is I started to use Groonga. It's very fast! is matched. The number of words between I and fast is 7.

The second expression matches records that contain I and Really within the max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is not matched. The number of words between I and Really is 11.

The third expression matches records that contain also and Really within the max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is matched. The number of words between also and Really is 10.

New in version 12.0.1: The max intervals of each token.

You can specify the max intervals of each token. The default is no limit. It means that all intervals of each token are valid as long as the max interval is satisfied.

Here is an example that uses 2 for the max interval of the first interval and 4 for the max interval of the second interval:

content *N10,2|4 "a b c"

10 is the max interval.

| is the separator of the max intervals of each token.

This matches a x b x x x c. But this doesn’t match a x x b c, a b x x x x c and so on: the former has an interval of 3 for the first interval, which is larger than 2, and the latter has an interval of 5 for the second interval, which is larger than 4.
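The three cases above can be checked with a short Python sketch. It is a simplified model and not Groonga's implementation: it assumes that every word occurs exactly once, that the words appear in the given order, and that an interval is the difference between token positions. The function names are hypothetical.

```python
def within(limit: int, interval: int) -> bool:
    """A negative limit such as -1 means no limit."""
    return limit < 0 or interval <= limit


def near_match(tokens, words, max_interval, max_token_intervals):
    """Check the overall max interval and the per-token max intervals."""
    positions = [tokens.index(word) for word in words]

    # The overall max interval covers the span from the first to the last word.
    if not within(max_interval, positions[-1] - positions[0]):
        return False

    # Per-token max intervals apply to each pair of adjacent words.
    # zip() stops at the shorter list, so omitted limits impose no
    # restriction and extra limits are ignored.
    intervals = [b - a for a, b in zip(positions, positions[1:])]
    return all(
        within(limit, interval)
        for limit, interval in zip(max_token_intervals, intervals)
    )


words = ["a", "b", "c"]
for text in ["a x b x x x c", "a x x b c", "a b x x x x c"]:
    # Models: content *N10,2|4 "a b c"
    print(text, near_match(text.split(), words, 10, [2, 4]))
```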

Here is an example that specifies the max intervals of each token:

Execution example:

select Entries --filter 'content *N11,5|3 "first welcome post"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]
select Entries --filter 'content *N11,4|3 "first welcome post"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         0
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ]
#     ]
#   ]
# ]

You can omit one or more intervals. Omitted intervals are treated as -1, which means no limit. For example, *N11,5 equals *N11,5|-1.

Here is an example that omits an interval:

Execution example:

select Entries --filter 'content *N11,5 "first welcome post"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]
select Entries --filter 'content *N11,5|-1 "first welcome post"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

You can specify extra intervals. They are just ignored:

Execution example:

select Entries --filter 'content *N11,5|6|1|2|3 "first welcome post"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]
select Entries --filter 'content *N11,5|6 "first welcome post"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         1,
#         "The first post!",
#         "Welcome! This is my first post!",
#         5
#       ]
#     ]
#   ]
# ]

7.13.2.9.5. Near phrase search operator#

Its syntax is one of the following:

column *NP "phrase1 phrase2 ..."
column *NP${MAX_INTERVAL} "phrase1 phrase2 ..."
column *NP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "phrase1 phrase2 ..."
column *NP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "phrase1 phrase2 ..."

Here are examples of the second form:

column *NP29 "phrase1 phrase2 ..."
column *NP-1 "phrase1 phrase2 ..."

The first example means that 29 is used for the max interval.

The second example means that -1 is used for the max interval.

The max interval is described later.

Here are examples of the third form:

column *NP10,29 "phrase1 phrase2 ..."
column *NP10,-1 "phrase1 phrase2 ..."

The first example means that 29 is used for the additional last interval.

The second example means that -1 is used for the additional last interval.

The additional last interval is described later.

New in version 12.0.1: The max intervals of each phrase.

Here are examples of the fourth form:

column *NP10,0,2|3 "phrase1 phrase2 phrase3"
column *NP10,0,2 "phrase1 phrase2 phrase3"

The first example means that 2 is used as the max interval of the first interval and 3 is used as the max interval of the second interval.

The second example means that 2 is used as the max interval of the first interval and -1 is used as the max interval of the second interval, because an omitted max interval is treated as -1.

See Near phrase search operator for the max intervals of each phrase.

The operator performs near phrase search with the phrases phrase1 phrase2 .... Near phrase search finds records that contain the phrases, where the phrases appear in the specified order and within the max interval.

The max interval is 10 by default. The unit of the max interval is the number of characters for N-gram family tokenizers and the number of words for morphological analysis family tokenizers.

However, TokenBigram doesn’t split an ASCII-only word into tokens, because TokenBigram uses a whitespace-separated-like tokenization method for ASCII characters in this case.

So the unit for ASCII words with TokenBigram is the number of words even though TokenBigram is an N-gram family tokenizer.

Note that an index column for full text search must be defined for column.

TODO: Use index that has TokenNgram("unify_alphabet", false) tokenizer to show difference with near search with English text.

Here is a simple example:

Execution example:

select Entries --filter 'content *NP "I fast"'      --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I started to use Groonga. It's very fast!"
#       ]
#     ]
#   ]
# ]
select Entries --filter 'content *NP "I Really"'    --output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]
select Entries --filter 'content *NP "also Really"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ]
#     ]
#   ]
# ]

The first expression matches records that contain I and fast within the max interval of 10 words. As the output above shows, the record whose content is I started to use Groonga. It's very fast! is matched.

The second expression matches records that contain I and Really within the max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is not matched. The number of words between I and Really is 14.

The third expression matches records that contain also and Really within the max interval of 10 words. So the record whose content is I also started to use Mroonga. It's also very fast! Really fast! is matched. The number of words between also and Really is 10.

Here is an example to use the custom max interval:

Execution example:

select Entries --filter 'content *NP14 "I Really"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ]
#     ]
#   ]
# ]
select Entries --filter 'content *NP-1 "I Really"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ]
#     ]
#   ]
# ]

The first expression matches I also started to use Mroonga. It's also very fast! Really fast! because the number of words between I and Really is 14.

The second expression also matches I also started to use Mroonga. It's also very fast! Really fast! because -1 means that there is no limit on the number of words between I and Really.

You can use an additional interval only for the last phrase. It means that you can accept more distance only between the second to last phrase and the last phrase. This is useful for implementing a near phrase search within the same sentence. If you specify . (sentence end phrase) as the last phrase and specify -1 as the additional last interval, the other specified phrases must appear before the .. You must append $ to the last phrase like .$.

Here is an example that uses -1 as the additional last interval of the given phrases:

column *NP10,-1 "a b .$"

Here is an example to customize the additional last interval of the given phrases:

Execution example:

select Entries --filter 'content *NP1,-1 "I started .$"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I started to use Groonga. It's very fast!"
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ]
#     ]
#   ]
# ]

You can also use a positive number for the additional last interval. If you specify a positive number as the additional last interval, all of the following conditions must be satisfied:

  1. The interval between the first phrase and the second to last phrase is less than or equal to the max interval.

  2. The interval between the first phrase and the last phrase is less than or equal to the max interval + the additional last interval.

If you specify a negative number as the additional last interval, the second condition isn’t required. The last phrase only needs to appear.
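The two conditions can be restated as a small Python sketch. It is only a restatement of the rules above and not Groonga's implementation: phrase positions are passed in as plain integers (hypothetical values, in phrase order), the max interval is assumed to be non-negative, and the function name is hypothetical.

```python
def satisfies_additional_last_interval(
    positions: list[int],
    max_interval: int,
    additional_last_interval: int,
) -> bool:
    """Apply the two conditions to phrase positions given in phrase order.

    Requires at least two phrases and a non-negative max_interval.
    """
    first = positions[0]
    second_to_last = positions[-2]
    last = positions[-1]

    # Condition 1: first phrase to second to last phrase.
    if second_to_last - first > max_interval:
        return False

    # A negative additional last interval drops condition 2:
    # the last phrase only has to appear.
    if additional_last_interval < 0:
        return True

    # Condition 2: first phrase to last phrase.
    return last - first <= max_interval + additional_last_interval


# Hypothetical positions for three phrases, with max interval 1.
print(satisfies_additional_last_interval([0, 1, 5], 1, 4))    # 5 <= 1 + 4
print(satisfies_additional_last_interval([0, 1, 6], 1, 4))    # 6 >  1 + 4
print(satisfies_additional_last_interval([0, 1, 60], 1, -1))  # no limit for the last phrase
```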

Here is an example to use positive number as the additional last interval:

Execution example:

select Entries --filter 'content *NP1,4 "I started .$"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I started to use Groonga. It's very fast!"
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ]
#     ]
#   ]
# ]

7.13.2.9.6. Near phrase product search operator#

New in version 11.1.1.

Its syntax is one of the following:

column *NPP "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP${MAX_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."

Here are examples of the second form:

column *NPP29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."

The first example means that 29 is used for the max interval.

The second example means that -1 is used for the max interval.

Here are examples of the third form:

column *NPP10,29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *NPP10,-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."

The first example means that 29 is used for the additional last interval.

The second example means that -1 is used for the additional last interval.

New in version 12.0.1: The max intervals of each phrase.

Here are examples of the fourth form:

column *NPP10,0,2|3 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"
column *NPP10,0,2 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"

The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.

The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.

See Near phrase search operator for the max intervals of each phrase.
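The rule that an omitted max interval is treated as -1 can be illustrated with a short sketch. The helper name `expand_phrase_intervals` is hypothetical and this is not how Groonga parses the option; it only expands the `2|3` part of the fourth form into one value per interval between adjacent phrase groups. How Groonga treats extra values is not covered here, so the sketch simply rejects them.

```python
def expand_phrase_intervals(spec, n_phrases):
    """Hypothetical helper: expand "2|3" into one max interval per gap.

    n_phrases is the number of phrases (or phrase groups) in the query,
    so there are n_phrases - 1 intervals between adjacent phrases.
    """
    n_gaps = max(n_phrases - 1, 0)
    values = [int(value) for value in spec.split("|")] if spec else []
    if len(values) > n_gaps:
        # Assumption of this sketch: extra values are an error.
        raise ValueError("more max intervals than intervals between phrases")
    # Omitted max intervals are treated as -1.
    return values + [-1] * (n_gaps - len(values))


print(expand_phrase_intervals("2|3", 3))  # [2, 3]
print(expand_phrase_intervals("2", 3))    # [2, -1]
```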

This operator performs multiple near phrase searches (see Near phrase search operator). The phrases for each search are computed as the product of {phrase1-1, phrase1-2, ...}, {phrase2-1, phrase2-2, ...} and so on. For example, column *NPP "(a b c) (d e)" uses the following phrases for near phrase searches:

  • a d

  • a e

  • b d

  • b e

  • c d

  • c e
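The list above is the Cartesian product of the two groups. A minimal sketch of that expansion follows; the function name is hypothetical and the parsing assumes that every phrase is a single whitespace-separated word inside parentheses, which is a simplification of the real query syntax.

```python
import itertools
import re


def expand_product_query(query):
    """Hypothetical helper: list the phrase combinations of an *NPP query."""
    # Each "(...)" group becomes a list of phrases.
    groups = [group.split() for group in re.findall(r"\(([^()]*)\)", query)]
    if not groups:
        return []
    # itertools.product picks one phrase from each group.
    return [" ".join(combination) for combination in itertools.product(*groups)]


print(expand_product_query("(a b c) (d e)"))
# ['a d', 'a e', 'b d', 'b e', 'c d', 'c e']
```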

Here is a simple example:

Execution example:

select Entries \
  --filter 'content *NPP "(I It) (migrated fast)"' \
  --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         4
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I started to use Groonga. It's very fast!"
#       ],
#       [
#         "I also started to use Mroonga. It's also very fast! Really fast!"
#       ],
#       [
#         "I migrated all Senna system!"
#       ],
#       [
#         "I also migrated all Tritonn system!"
#       ]
#     ]
#   ]
# ]

You can use all the features of Near phrase search operator, such as the max interval, $ for the last phrase and the additional last interval.

Execution example:

select Entries \
  --filter 'content *NPP2,-1 "(I It) (migrated fast) (.$)"' \
  --output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]

This is more efficient than using Near phrase search operator multiple times.

7.13.2.9.7. Ordered near phrase search operator#

New in version 11.0.9.

Its syntax is one of the following:

column *ONP "phrase1 phrase2 ..."
column *ONP${MAX_INTERVAL} "phrase1 phrase2 ..."
column *ONP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "phrase1 phrase2 ..."
column *ONP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "phrase1 phrase2 ..."

Here are examples of the second form:

column *ONP29 "phrase1 phrase2 ..."
column *ONP-1 "phrase1 phrase2 ..."

The first example means that 29 is used for the max interval.

The second example means that -1 is used for the max interval.

Here are examples of the third form:

column *ONP10,29 "phrase1 phrase2 ..."
column *ONP10,-1 "phrase1 phrase2 ..."

The first example means that 29 is used for the additional last interval.

The second example means that -1 is used for the additional last interval.

New in version 12.0.1: The max intervals of each phrase.

Here are examples of the fourth form:

column *ONP10,0,2|3 "phrase1 phrase2 phrase3"
column *ONP10,0,2 "phrase1 phrase2 phrase3"

The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.

The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.

See Near phrase search operator for the max intervals of each phrase.

This operator performs an ordered near phrase search with phrase1, phrase2 and so on. Ordered near phrase search is similar to Near phrase search operator, but it also checks the order of the phrases. For example, column *ONP "groonga mroonga pgroonga" matches groonga mroonga rroonga pgroonga but doesn’t match groonga rroonga pgroonga mroonga, because the latter has the phrases in a different order.
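The order check alone can be pictured as a subsequence test. The sketch below is a simplification under stated assumptions: the function name is hypothetical, the text is split on whitespace instead of using a Groonga tokenizer, each phrase is a single word, and the max interval is ignored.

```python
def appears_in_order(phrases, text):
    """Hypothetical helper: True if all phrases occur in text in the given order."""
    tokens = iter(text.split())
    # "phrase in tokens" consumes the iterator up to the first hit, so the
    # next phrase is only searched for after the previous match.
    return all(phrase in tokens for phrase in phrases)


phrases = ["groonga", "mroonga", "pgroonga"]
print(appears_in_order(phrases, "groonga mroonga rroonga pgroonga"))  # True
print(appears_in_order(phrases, "groonga rroonga pgroonga mroonga"))  # False
```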

Here is a simple example:

Execution example:

select Entries \
  --filter 'content *ONP "I Groonga"' \
  --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I started to use Groonga. It's very fast!"
#       ]
#     ]
#   ]
# ]
select Entries \
  --filter 'content *ONP "Groonga I"' \
  --output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]

You can use all the features of Near phrase search operator, such as the max interval and the additional last interval. However, you don’t need to specify $ for the last phrase, because in an ordered search the last phrase in the query is necessarily the last phrase.

7.13.2.9.8. Ordered near phrase product search operator#

New in version 11.1.1.

Its syntax is one of the following:

column *ONPP "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP${MAX_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL} "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL},${MAX_PHRASE_INTERVAL_1}|${MAX_PHRASE_INTERVAL_2}|... "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."

Here are examples of the second form:

column *ONPP29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."

The first example means that 29 is used for the max interval.

The second example means that -1 is used for the max interval.

Here are examples of the third form:

column *ONPP10,29 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."
column *ONPP10,-1 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) ..."

The first example means that 29 is used for the additional last interval.

The second example means that -1 is used for the additional last interval.

New in version 12.0.1: The max intervals of each phrase.

Here are examples of the fourth form:

column *ONPP10,0,2|3 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"
column *ONPP10,0,2 "(phrase1-1 phrase1-2 ...) (phrase2-1 phrase2-2 ...) (phrase3-1 phrase3-2 ...)"

The first example means that 2 is used for the max interval of the first interval and 3 is used for the max interval of the second interval.

The second example means that 2 is used for the max interval of the first interval and -1 is used for the max interval of the second interval, because an omitted max interval is treated as -1.

See Near phrase search operator for the max intervals of each phrase.

This operator performs an ordered near phrase product search. It is similar to Near phrase product search operator, but it also checks the order of the phrases, like Ordered near phrase search operator. For example, column *ONPP "(a b c) (d e)" matches a 1 d but doesn’t match d 1 a, because the latter has the phrases in a different order.
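Combining the product expansion with the order check gives a rough picture of this operator. As before, this is a sketch under assumptions rather than Groonga's algorithm: the function name is hypothetical, phrases are single whitespace-separated words, and intervals are ignored.

```python
import itertools


def matches_ordered_product(groups, text):
    """Hypothetical helper: True if any combination (one phrase per group)
    occurs in text in the order of the groups."""
    tokens = text.split()
    for combination in itertools.product(*groups):
        remaining = iter(tokens)
        # Each phrase must be found after the previously matched phrase.
        if all(phrase in remaining for phrase in combination):
            return True
    return False


groups = [["a", "b", "c"], ["d", "e"]]
print(matches_ordered_product(groups, "a 1 d"))  # True
print(matches_ordered_product(groups, "d 1 a"))  # False
```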

Here is a simple example:

Execution example:

select Entries \
  --filter 'content *ONPP "(I It) (migrated fast) (.)"' \
  --output_columns content
# [[0,1337566253.89858,0.000355720520019531],[[[0],[["content","Text"]]]]]

You can use all the features of Near phrase search operator, such as the max interval and the additional last interval. However, you don’t need to specify $ for the last phrase, because in an ordered search the last phrase in the query is necessarily the last phrase.

7.13.2.9.9. Similar search operator#

Its syntax is column *S "document".

The operator performs a similar search with document as the query document. A similar search finds records whose content is similar to document.

Note that an index column for full text search must be defined for column.

Here is a simple example:

Execution example:

select Entries --filter 'content *S "I migrated all Solr system!"' --output_columns content
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "content",
#           "Text"
#         ]
#       ],
#       [
#         "I migrated all Senna system!"
#       ],
#       [
#         "I also migrated all Tritonn system!"
#       ]
#     ]
#   ]
# ]

The expression matches records that have similar content to I migrated all Solr system!. In this case, records that have I migrated all XXX system! content are matched.

You should use the TokenMecab tokenizer for similar search against Japanese documents. Because TokenMecab tokenizes target documents into units that are close to words, it improves similar search precision.

7.13.2.9.10. Term extract operator#

Its syntax is _key *T "document".

The operator extracts terms from document. Terms must be registered as keys of the table that _key belongs to.

Note that the table must be a patricia trie (TABLE_PAT_KEY) or a double array trie (TABLE_DAT_KEY). You can’t use a hash table (TABLE_HASH_KEY) or an array (TABLE_NO_KEY) because they don’t support longest common prefix search, which is used to implement the operator.

Here is a simple example:

Execution example:

table_create Words TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Words
[
{"_key": "groonga"},
{"_key": "mroonga"},
{"_key": "Senna"},
{"_key": "Tritonn"}
]
# [[0,1337566253.89858,0.000355720520019531],4]
select Words --filter '_key *T "Groonga is the successor project to Senna."' --output_columns _key
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ]
#       ],
#       [
#         "groonga"
#       ],
#       [
#         "senna"
#       ]
#     ]
#   ]
# ]

The expression extracts terms that are included in the document Groonga is the successor project to Senna.. In this case, the NormalizerAuto normalizer is specified for Words, so Groonga can be extracted even though it is loaded as groonga into Words. All extracted terms are also normalized.
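The idea of extracting terms by longest common prefix search can be sketched naively as follows. This is not Groonga's implementation: the function name is hypothetical, a plain scan over a set of keys stands in for the trie, lowercasing stands in for NormalizerAuto (which does more than that), and terms are returned in document order, including duplicates.

```python
def extract_terms(keys, document):
    """Hypothetical helper: scan document and collect registered keys."""
    # Assumption: lowercasing approximates the normalizer.
    normalized_keys = {key.lower() for key in keys if key}
    text = document.lower()
    terms = []
    position = 0
    while position < len(text):
        # Longest common prefix search: the longest key that the text
        # starts with at the current position.
        candidates = [key for key in normalized_keys
                      if text.startswith(key, position)]
        if candidates:
            longest = max(candidates, key=len)
            terms.append(longest)
            position += len(longest)
        else:
            position += 1
    return terms


keys = ["groonga", "mroonga", "Senna", "Tritonn"]
print(extract_terms(keys, "Groonga is the successor project to Senna."))
# ['groonga', 'senna']
```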

7.13.2.9.11. Regular expression operator#

New in version 5.0.1.

Its syntax is column @~ "pattern".

The operator searches records by the regular expression pattern. If a record’s column value matches pattern, the record is matched.

pattern must be a valid regular expression. See Regular expression for details of the regular expression syntax.

The following example uses .roonga as pattern. It matches Groonga, Mroonga and so on.

Execution example:

select Entries --filter 'content @~ ".roonga"'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "content",
#           "Text"
#         ],
#         [
#           "n_likes",
#           "UInt32"
#         ]
#       ],
#       [
#         2,
#         "Groonga",
#         "I started to use Groonga. It's very fast!",
#         10
#       ],
#       [
#         3,
#         "Mroonga",
#         "I also started to use Mroonga. It's also very fast! Really fast!",
#         15
#       ]
#     ]
#   ]
# ]

In most cases, a regular expression is evaluated sequentially, so it may be slow against many records.
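Sequential evaluation can be pictured as a loop that tests every record in turn, so the cost grows with the number of records. The sketch below uses Python's re module, which is a different regular expression engine from the one Groonga uses; the function name is hypothetical and the contents are a subset taken from the execution examples in this section.

```python
import re

contents = [
    "I started to use Groonga. It's very fast!",
    "I also started to use Mroonga. It's also very fast! Really fast!",
    "I migrated all Senna system!",
    "I also migrated all Tritonn system!",
]


def sequential_regexp_filter(pattern, values):
    """Hypothetical helper: test every value against pattern, one by one."""
    compiled = re.compile(pattern)
    # search() matches anywhere in the value, like ".roonga" matching
    # inside a longer text.
    return [value for value in values if compiled.search(value)]


for content in sequential_regexp_filter(".roonga", contents):
    print(content)
# I started to use Groonga. It's very fast!
# I also started to use Mroonga. It's also very fast! Really fast!
```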

In some cases, Groonga evaluates a regular expression by using an index, which is very fast. See Regular expression for details.