7.3.66. table_tokenize#

7.3.66.1. Summary#

The table_tokenize command tokenizes text using the tokenizer of the specified table.

7.3.66.2. Syntax#

This command takes five parameters.

table and string are required parameters. Others are optional:

table_tokenize table
               string
               [flags=NONE]
               [mode=GET]
               [index_column=null]
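
Parameters can also be passed by name. As a minimal sketch, assuming a lexicon table named Terms already exists (such as the one created in Usage below), the following two commands are equivalent:

table_tokenize Terms "Hello and Good-bye"
table_tokenize --table Terms --string "Hello and Good-bye"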

7.3.66.3. Usage#

Here is a simple example.

Execution example:

plugin_register token_filters/stop_word
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenBigram   --normalizer NormalizerAuto   --token_filters TokenFilterStopWord
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0,1337566253.89858,0.000355720520019531],1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [[0,1337566253.89858,0.000355720520019531],[]]

The Terms table is configured with the TokenBigram tokenizer, the NormalizerAuto normalizer, and the TokenFilterStopWord token filter. The command returns the tokens generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. The text is normalized by the NormalizerAuto normalizer, and the "and" token is removed by the TokenFilterStopWord token filter.

7.3.66.4. Parameters#

This section describes all parameters. Parameters are categorized.

7.3.66.4.1. Required parameters#

There are two required parameters: table and string.

7.3.66.4.1.1. table#

Specifies the lexicon table. The table_tokenize command uses the tokenizer, the normalizer, and the token filters that are set on the lexicon table.

7.3.66.4.1.2. string#

Specifies any string which you want to tokenize.

See the string option in tokenize for details.

7.3.66.4.2. Optional parameters#

There are optional parameters.

7.3.66.4.2.1. flags#

Specifies options that customize tokenization. You can specify multiple options separated by “|”.

The default value is NONE.

See the flags option in tokenize for details.
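
As a sketch of how a flag is passed, assuming the Terms table from Usage and the ENABLE_TOKENIZED_DELIMITER flag described for tokenize, the commands below specify one flag and then two flags joined with “|”. Output is omitted because it depends on the table contents:

table_tokenize Terms "Hello and Good-bye" --flags ENABLE_TOKENIZED_DELIMITER
table_tokenize Terms "Hello and Good-bye" --flags NONE|ENABLE_TOKENIZED_DELIMITER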

7.3.66.4.2.2. mode#

Specifies the tokenize mode.

The default value is GET.

See the mode option in tokenize for details.
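
As a sketch, assuming the Terms table from Usage and the ADD and GET modes described for tokenize, the two modes are selected like this. Output is omitted because it depends on the table contents:

table_tokenize Terms "Hello and Good-bye" --mode GET
table_tokenize Terms "Hello and Good-bye" --mode ADD

ADD tokenizes text as when documents are added, so it is reasonable to assume that it may register tokens that are not yet in Terms. Try it on a scratch database first if the lexicon must stay unchanged.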

7.3.66.4.2.3. index_column#

Specifies an index column.

When an index column is specified, the return value includes the estimated_size of the index.

The estimated_size is useful for checking the estimated frequency of tokens.
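
The following is a minimal sketch of how index_column might be used with the Terms table from Usage. The Memos table, its content column, and the memos_content index column are hypothetical names introduced only for this illustration, and passing the bare column name to index_column is an assumption. Output is omitted:

table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
load --table Memos
[
{"content": "Hello and Good-bye"}
]
table_tokenize Terms "Hello and Good-bye" --mode GET --index_column memos_content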

7.3.66.5. Return value#

The table_tokenize command returns the tokens produced by tokenization.

See the Return value section in tokenize for details.

7.3.66.6. See also#