7.3.67. `table_tokenize`

7.3.67.1. Summary

The table_tokenize command tokenizes text using the tokenizer of the specified table.

7.3.67.2. Syntax

This command takes several parameters.

table and string are required. The others are optional:

table_tokenize table
               string
               [flags=NONE]
               [mode=GET]
               [index_column=null]
               [output_style=full]
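
Like other Groonga commands, table_tokenize can also be given its parameters by name, which can make the optional ones easier to read. A minimal sketch, assuming the Terms table from the Usage section already exists:

```
table_tokenize --table Terms --string "Hello and Good-bye" --mode GET
```

This is intended to be equivalent to the positional form table_tokenize Terms "Hello and Good-bye" --mode GET.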

7.3.67.3. Usage

Here is a simple example.

Execution example:

plugin_register token_filters/stop_word
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto --token_filters TokenFilterStopWord
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0,1337566253.89858,0.000355720520019531],1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [[0,1337566253.89858,0.000355720520019531],[]]

The Terms table uses the TokenBigram tokenizer, the NormalizerAuto normalizer, and the TokenFilterStopWord token filter. The command returns the tokens generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. The text is normalized by the NormalizerAuto normalizer, and the "and" token is removed by the TokenFilterStopWord token filter.

7.3.67.4. Parameters

This section describes all parameters, grouped into required and optional parameters.

7.3.67.4.1. Required parameters

There are two required parameters: table and string.

7.3.67.4.1.1. `table`

Specifies the lexicon table. The table_tokenize command uses the tokenizer, the normalizer, and the token filters that are set on the lexicon table.

7.3.67.4.1.2. `string`

Specifies the string that you want to tokenize.

See the string option of tokenize for details.

7.3.67.4.2. Optional parameters

The following parameters are optional.

7.3.67.4.2.1. `flags`

Specifies options that customize tokenization. You can specify multiple options separated by “|”.

The default value is NONE.

See the flags option of tokenize for details.
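
A sketch of passing more than one flag, assuming that ENABLE_TOKENIZED_DELIMITER (a flag documented for tokenize) is also accepted by table_tokenize and that the Terms table from the Usage section exists:

```
table_tokenize Terms "Hello and Good-bye" --flags NONE|ENABLE_TOKENIZED_DELIMITER
```

The flags are joined with “|” and no surrounding spaces, in the same way as column flags such as COLUMN_INDEX|WITH_POSITION.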

7.3.67.4.2.2. `mode`

Specifies the tokenization mode.

The default value is GET.

See the mode option of tokenize for details.
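
A sketch of switching the mode, assuming the Terms table from the Usage section exists. ADD is the other mode value described for tokenize:

```
table_tokenize Terms "Hello and Good-bye" --mode ADD
```

Comparing the result with the GET run in the Usage section shows how the mode affects the returned tokens. Whether ADD also registers new keys in Terms is not stated here, so it may be safer to try this against a scratch database first.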

7.3.67.4.2.3. `index_column`

Specifies an index column.

When an index column is specified, the return value includes the estimated_size of the index for each token.

The estimated_size is useful for checking the estimated frequency of tokens.
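
A sketch of requesting estimated_size, assuming that index_column takes the name of an index column defined on the lexicon table. The Memos table, its content column, and the memos_content index column are hypothetical names introduced only for this illustration. Terms is the table from the Usage section:

```
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
table_tokenize Terms "Hello and Good-bye" --mode GET --index_column memos_content
```

Loading some records into Memos before running table_tokenize gives the index something to estimate from.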

7.3.67.4.2.4. `output_style`

Added in version 15.0.9.

Specifies the output style of the table_tokenize command.

See the output_style option of tokenize for details.

7.3.67.5. Return value

The table_tokenize command returns the tokens produced by tokenization.

See the Return value section of tokenize for details.
