7.3.35. logical_range_filter#

7.3.35.1. Summary#

Added in version 5.0.0.

logical_range_filter is a sharding version of range_filter. logical_range_filter searches records from multiple tables and outputs them.

logical_range_filter is similar to logical_select. Both of them searches records from multiples tables and outputs them. logical_range_filter stops searching when the number of matched records is requested the number of records. logical_select searches all records and outputs only needed records.

logical_range_filter has performance merit but some restrictions.

If many records are matched and requested records are small, logical_range_filter will be faster than logical_select.

logical_range_filter doesn’t support drilldown because drilldown requires all matched records. logical_range_filter may not find all matched records. So logical_range_filter doesn’t support drilldown.

logical_range_filter doesn’t return the number of all matched records. Because logical_range_filter may not search all matched records.

You need to plugin_register sharding plugin because this command is included in sharding plugin.

7.3.35.2. Syntax#

This command takes many parameters.

The required parameters are logical_table and shard_key:

logical_range_filter
  logical_table
  shard_key
  [min=null]
  [min_border="include"]
  [max=null]
  [max_border="include"]
  [order="ascending"]
  [filter=null]
  [offset=0]
  [limit=10]
  [output_columns="_key, *"]
  [use_range_index=null]
  [post_filter=null]
  [sort_keys=null]

There are some parameters that can be only used as named parameters. You can’t use these parameters as ordered parameters. You must specify parameter name.

Here are parameters that can be only used as named parameters:

  • cache=no

Added in version 7.0.9: This command has the following named parameters for dynamic columns:

  • columns[${NAME}].stage=null

  • columns[${NAME}].flags=COLUMN_SCALAR

  • columns[${NAME}].type=null

  • columns[${NAME}].value=null

  • columns[${NAME}].window.sort_keys=null

  • columns[${NAME}].window.group_keys=null

You can use one or more alphabets, digits, _ for ${NAME}. For example, column1 is a valid ${NAME}. This is the same rule as normal column. See also name.

Parameters that have the same ${NAME} are grouped.

For example, the following parameters specify one dynamic column:

  • --columns[name].stage initial

  • --columns[name].type UInt32

  • --columns[name].value 29

The following parameters specify two dynamic columns:

  • --columns[name1].stage initial

  • --columns[name1].type UInt32

  • --columns[name1].value 29

  • --columns[name2].stage filtered

  • --columns[name2].type Float

  • --columns[name2].value '_score * 0.1'

7.3.35.3. Usage#

Let’s learn about usage with examples. This section shows many popular usages.

You need to register sharding plugin because this command is included in sharding plugin.

Execution example:

plugin_register sharding
# [[0,1337566253.89858,0.000355720520019531],true]

Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries_20150708 TABLE_HASH_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150708 created_at COLUMN_SCALAR Time
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150708 content COLUMN_SCALAR Text
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150708 n_likes COLUMN_SCALAR UInt32
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150708 tag COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Entries_20150709 TABLE_HASH_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150709 created_at COLUMN_SCALAR Time
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150709 content COLUMN_SCALAR Text
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150709 n_likes COLUMN_SCALAR UInt32
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries_20150709 tag COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_key_index_20150708 \
  COLUMN_INDEX|WITH_POSITION Entries_20150708 _key
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_content_index_20150708 \
  COLUMN_INDEX|WITH_POSITION Entries_20150708 content
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_key_index_20150709 \
  COLUMN_INDEX|WITH_POSITION Entries_20150709 _key
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_content_index_20150709 \
  COLUMN_INDEX|WITH_POSITION Entries_20150709 content
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Entries_20150708
[
{"_key":       "The first post!",
 "created_at": "2015/07/08 00:00:00",
 "content":    "Welcome! This is my first post!",
 "n_likes":    5,
 "tag":        "Hello"},
{"_key":       "Groonga",
 "created_at": "2015/07/08 01:00:00",
 "content":    "I started to use Groonga. It's very fast!",
 "n_likes":    10,
 "tag":        "Groonga"},
{"_key":       "Mroonga",
 "created_at": "2015/07/08 02:00:00",
 "content":    "I also started to use Mroonga. It's also very fast! Really fast!",
 "n_likes":    15,
 "tag":        "Groonga"}
]
# [[0,1337566253.89858,0.000355720520019531],3]
load --table Entries_20150709
[
{"_key":       "Good-bye Senna",
 "created_at": "2015/07/09 00:00:00",
 "content":    "I migrated all Senna system!",
 "n_likes":    3,
 "tag":        "Senna"},
{"_key":       "Good-bye Tritonn",
 "created_at": "2015/07/09 01:00:00",
 "content":    "I also migrated all Tritonn system!",
 "n_likes":    3,
 "tag":        "Senna"}
]
# [[0,1337566253.89858,0.000355720520019531],2]

There are two tables, Entries_20150708 and Entries_20150709, for blog entries.

Note

You need to use ${LOGICAL_TABLE_NAME}_${YYYYMMDD} naming rule for table names. In this example, LOGICAL_TABLE_NAME is Entries and YYYYMMDD is 20150708 or 20150709.

An entry has title, created time, content, the number of likes for the entry and tag. Title is key of Entries_YYYYMMDD. Created time is value of Entries_YYYYMMDD.created_at column. Content is value of Entries_YYYYMMDD.content column. The number of likes is value of Entries_YYYYMMDD.n_likes column. Tag is value of Entries_YYYYMMDD.tag column.

Entries_YYYYMMDD._key column and Entries_YYYYMMDD.content column are indexed using TokenBigram tokenizer. So both Entries_YYYYMMDD._key and Entries_YYYYMMDD.content are fulltext search ready.

OK. The schema and data for examples are ready.

7.3.35.3.1. Simple usage#

TODO

7.3.35.4. Parameters#

This section describes parameters of logical_range_filter.

7.3.35.4.1. Required parameters#

There are required parameters, logical_table and shard_key.

7.3.35.4.1.1. logical_table#

Specifies logical table name. It means table name without _YYYYMMDD postfix. If you use actual table such as Entries_20150708, Entries_20150709 and so on, logical table name is Entries.

Execution example:

logical_range_filter --logical_table Entries --shard_key created_at
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         "_key",
#         "ShortText"
#       ],
#       [
#         "content",
#         "Text"
#       ],
#       [
#         "created_at",
#         "Time"
#       ],
#       [
#         "n_likes",
#         "UInt32"
#       ],
#       [
#         "tag",
#         "ShortText"
#       ]
#     ],
#     [
#       "The first post!",
#       "Welcome! This is my first post!",
#       1436281200.0,
#       5,
#       "Hello"
#     ],
#     [
#       "Groonga",
#       "I started to use Groonga. It's very fast!",
#       1436284800.0,
#       10,
#       "Groonga"
#     ],
#     [
#       "Mroonga",
#       "I also started to use Mroonga. It's also very fast! Really fast!",
#       1436288400.0,
#       15,
#       "Groonga"
#     ],
#     [
#       "Good-bye Senna",
#       "I migrated all Senna system!",
#       1436367600.0,
#       3,
#       "Senna"
#     ],
#     [
#       "Good-bye Tritonn",
#       "I also migrated all Tritonn system!",
#       1436371200.0,
#       3,
#       "Senna"
#     ]
#   ]
# ]

If nonexistent table is specified, an error is returned.

Execution example:

logical_range_filter --logical_table Nonexistent --shard_key created_at
# [
#   [
#     -22,
#     1337566253.89858,
#     0.000355720520019531,
#     "[logical_range_filter] no shard exists: logical_table: <Nonexistent>: shard_key: <created_at>",
#     [
#       [
#         "execute",
#         "lib/groonga/plugins/sharding/logical_range_filter.rb",
#         2929
#       ]
#     ]
#   ]
# ]

7.3.35.4.1.2. shard_key#

Specifies column name which is treated as shared key in each parted table.

TODO: Add examples

7.3.35.4.2. Optional parameters#

There are optional parameters.

7.3.35.4.2.1. min#

Specifies the min value of shard_key

TODO: Add examples

7.3.35.4.2.2. min_border#

Specifies whether the min value of borderline must be include or not. Specify include or exclude as the value of this parameter.

TODO: Add examples

7.3.35.4.2.3. max#

Specifies the max value of shard_key.

TODO: Add examples

7.3.35.4.2.4. max_border#

Specifies whether the max value of borderline must be include or not. Specify include or exclude as the value of this parameter.

TODO: Add examples

7.3.35.4.2.5. order#

Specifies order of search result. Specify ascending or descending as the value of this parameter.

If we set ascending in this parameter, search results are sorted by ascending order based on shared_key. If we set descending in this parameter, search results are sorted by descending order based on shared_key.

Execution example:

logical_range_filter --logical_table Entries --shard_key created_at --order "descending"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         "_key",
#         "ShortText"
#       ],
#       [
#         "content",
#         "Text"
#       ],
#       [
#         "created_at",
#         "Time"
#       ],
#       [
#         "n_likes",
#         "UInt32"
#       ],
#       [
#         "tag",
#         "ShortText"
#       ]
#     ],
#     [
#       "Good-bye Tritonn",
#       "I also migrated all Tritonn system!",
#       1436371200.0,
#       3,
#       "Senna"
#     ],
#     [
#       "Good-bye Senna",
#       "I migrated all Senna system!",
#       1436367600.0,
#       3,
#       "Senna"
#     ],
#     [
#       "Mroonga",
#       "I also started to use Mroonga. It's also very fast! Really fast!",
#       1436288400.0,
#       15,
#       "Groonga"
#     ],
#     [
#       "Groonga",
#       "I started to use Groonga. It's very fast!",
#       1436284800.0,
#       10,
#       "Groonga"
#     ],
#     [
#       "The first post!",
#       "Welcome! This is my first post!",
#       1436281200.0,
#       5,
#       "Hello"
#     ]
#   ]
# ]

7.3.35.5. Return value#

The command returns a response with the following format:

[
  HEADER,
  [
    COLUMNS,
    RECORDS
  ]
]

If the command fails, error details are in HEADER.

See Output format for HEADER.

COLUMNS describes about output columns specified by output_columns. It uses the following format:

[
  [COLUMN_NAME_1, COLUMN_TYPE_1],
  [COLUMN_NAME_2, COLUMN_TYPE_2],
  ...,
  [COLUMN_NAME_N, COLUMN_TYPE_N]
]

COLUMNS includes one or more output column information. Each output column information includes the followings:

  • Column name as string

  • Column type as string or null

Column name is extracted from value specified as output_columns.

Column type is Groonga’s type name or null. It doesn’t describe whether the column value is vector or scalar. You need to determine it by whether real column value is array or not.

See Data types for type details.

null is used when column value type isn’t determined. For example, function call in output_columns such as --output_columns "snippet_html(content)" uses null.

Here is an example of COLUMNS:

[
  ["_id",     "UInt32"],
  ["_key",    "ShortText"],
  ["n_likes", "UInt32"],
]

RECORDS includes column values for each matched record. Included records are selected by offset and limit. It uses the following format:

[
  [
    RECORD_1_COLUMN_1,
    RECORD_1_COLUMN_2,
    ...,
    RECORD_1_COLUMN_N
  ],
  [
    RECORD_2_COLUMN_1,
    RECORD_2_COLUMN_2,
    ...,
    RECORD_2_COLUMN_N
  ],
  ...
  [
    RECORD_N_COLUMN_1,
    RECORD_N_COLUMN_2,
    ...,
    RECORD_N_COLUMN_N
  ]
]

Here is an example RECORDS:

[
  [
    1,
    "The first post!",
    5
  ],
  [
    2,
    "Groonga",
    10
  ],
  [
    3,
    "Mroonga",
    15
  ]
]