7.3.65. table_tokenize

Summary

The table_tokenize command tokenizes text with the tokenizer of the specified table.

Syntax

This command takes many parameters.

table and string are required parameters. Others are optional:

table_tokenize table
               string
               [flags=NONE]
               [mode=GET]
               [index_column=null]

Usage

Here is a simple example.

Execution example:

plugin_register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenBigram   --normalizer NormalizerAuto   --token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [[0, 1337566253.89858, 0.000355720520019531], []]

The Terms table has the TokenBigram tokenizer, the NormalizerAuto normalizer, and the TokenFilterStopWord token filter. The command returns the tokens generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. The text is normalized by the NormalizerAuto normalizer, and the "and" token is removed by the TokenFilterStopWord token filter.

Parameters

This section describes all parameters. Parameters are categorized.

Required parameters

There are two required parameters, table and string.

table

Specifies the lexicon table. The table_tokenize command uses the tokenizer, the normalizer, and the token filters that are set on the lexicon table.

string

Specifies any string which you want to tokenize.
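
Because the string usually contains spaces and punctuation, it has to be encoded when the command is sent over HTTP. The following is a minimal Python sketch that builds such a request URL. It assumes a Groonga HTTP server reachable at localhost on port 10041 and the usual /d/<command> path. The function name table_tokenize_url is hypothetical and only used for illustration.

```python
from urllib.parse import quote, urlencode


def table_tokenize_url(table, string, host="localhost", port=10041):
    """Build an HTTP URL for table_tokenize.

    host and port are assumptions for illustration; adjust them to
    match the server you actually run.
    """
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode({"table": table, "string": string}, quote_via=quote)
    return f"http://{host}:{port}/d/table_tokenize?{query}"


url = table_tokenize_url("Terms", "Hello and Good-bye")
print(url)
```

The sketch only builds the URL. Sending it with an HTTP client of your choice is left out so that the example runs without a server.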

See the string option in tokenize for details.

Optional parameters

There are optional parameters.

flags

Specifies options that customize tokenization. You can specify multiple options separated by “|”.

The default value is NONE.

See the flags option in tokenize for details.

mode

Specifies the tokenization mode.

The default value is GET.
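
As an illustration of how the optional flags and mode parameters combine with the required ones, here is a small Python sketch that assembles a command line. The helper build_command is hypothetical. The mode names GET and ADD and the flag name ENABLE_TOKENIZED_DELIMITER are assumed from the tokenize command, which this page defers to for details.

```python
# Assumption: the modes documented for the tokenize command.
VALID_MODES = {"GET", "ADD"}


def build_command(table, string, flags=None, mode=None):
    """Assemble a table_tokenize command line as a single string."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    # Escape backslashes and double quotes, then wrap the string in quotes.
    escaped = string.replace("\\", "\\\\").replace('"', '\\"')
    parts = ["table_tokenize", table, f'"{escaped}"']
    if flags:
        # Multiple flags are joined with "|".
        parts += ["--flags", "|".join(flags)]
    if mode is not None:
        parts += ["--mode", mode]
    return " ".join(parts)


command = build_command(
    "Terms",
    "Hello and Good-bye",
    flags=["ENABLE_TOKENIZED_DELIMITER"],
    mode="GET",
)
print(command)
```

Omitting flags and mode leaves both options out of the command line, so the defaults described in this section apply.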

See the mode option in tokenize for details.

index_column

Specifies an index column.

When an index column is specified, the return value includes the estimated_size of the index.

The estimated_size is useful for checking the estimated frequency of tokens.

Return value

table_tokenize command returns tokenized tokens.
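
A client typically has to unpack the response before it can use the tokens. The sketch below shows one way to do that in Python, assuming the JSON layout seen in the execution example: a header of return code, start time, and elapsed time, followed by the body. The sample response and its token fields (value, position, estimated_size) are made up for illustration and are not real command output. The function name parse_tokens is hypothetical.

```python
import json


def parse_tokens(response_text):
    """Return (value, position, estimated_size) tuples from a response."""
    data = json.loads(response_text)
    header = data[0]
    if header[0] != 0:
        # A non-zero return code signals an error.
        raise RuntimeError(f"command failed with return code {header[0]}")
    body = data[1] if len(data) > 1 else []
    # estimated_size is expected only when index_column is given,
    # so it is read with .get() and may be None.
    return [
        (token["value"], token.get("position"), token.get("estimated_size"))
        for token in body
    ]


# Hypothetical sample response, not captured from a real server.
sample = (
    '[[0, 1337566253.89858, 0.000355720520019531], '
    '[{"value": "hello", "position": 0, "estimated_size": 1}]]'
)
tokens = parse_tokens(sample)
print(tokens)
```

Reading optional fields with .get() keeps the parser usable whether or not index_column was specified.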

See the Return value section of tokenize for details.

See also