7.3.66. table_tokenize

Summary

The table_tokenize command tokenizes text with the tokenizer of the specified table.

Syntax

This command takes several parameters.

table and string are required parameters. The others are optional:

table_tokenize table
               string
               [flags=NONE]
               [mode=GET]
               [index_column=null]

Usage

Here is a simple example.

Execution example:

plugin_register token_filters/stop_word
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenBigram   --normalizer NormalizerAuto   --token_filters TokenFilterStopWord
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Terms
{"_key": "and", "is_stop_word": true}
# [[0,1337566253.89858,0.000355720520019531],1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [[0,1337566253.89858,0.000355720520019531],[]]

The Terms table uses the TokenBigram tokenizer, the NormalizerAuto normalizer and the TokenFilterStopWord token filter. The command returns the tokens that are generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. They are normalized by the NormalizerAuto normalizer, and the "and" token is removed by the TokenFilterStopWord token filter.

Parameters

This section describes all parameters. Parameters are categorized.

Required parameters

There are two required parameters, table and string.

table

Specifies the lexicon table. The table_tokenize command uses the tokenizer, the normalizer and the token filters that are set to the lexicon table.

string

Specifies any string which you want to tokenize.

See the string option of the tokenize command for details.

Optional parameters

There are optional parameters.

flags

Specifies tokenization customization options. You can specify multiple options separated by "|".

The default value is NONE.
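For example, a hedged sketch of passing a flag (ENABLE_TOKENIZED_DELIMITER is a flag value documented for the tokenize command; whether it is useful for your data is an assumption):

table_tokenize Terms "Hello and Good-bye" --flags ENABLE_TOKENIZED_DELIMITER --mode GET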

See the flags option of the tokenize command for details.

mode

Specifies a tokenize mode.

The default value is GET.
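For example, assuming the Terms table from the usage example above, ADD mode (the other mode documented for the tokenize command) registers unseen tokens into the lexicon while tokenizing, while GET only looks tokens up:

table_tokenize Terms "Hello and Good-bye" --mode ADD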

See the mode option of the tokenize command for details.

index_column

Specifies an index column.

When index_column is specified, the return value includes the estimated_size of each token in the index.
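A hedged sketch of supplying an index column, assuming a hypothetical Memos table and a memos_content index column on Terms (these names are illustrative and are not part of the usage example above):

table_create Memos TABLE_NO_KEY

column_create Memos content COLUMN_SCALAR Text

column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content

load --table Memos
[{"content": "Hello Groonga"}]

table_tokenize Terms "Hello" --index_column memos_content --mode GET

Each returned token then carries an estimated_size value taken from the memos_content index.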

The estimated_size is useful for checking the estimated frequency of tokens.

Return value

The table_tokenize command returns the tokenized tokens.
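Each token has the same shape as in the return value of the tokenize command; a sketch of the general form (field names taken from tokenize; the exact fields depend on the Groonga version and the specified options):

[
  {"value": "hello", "position": 0},
  {"value": "good", "position": 2}
]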

See the Return value section of the tokenize command for details.

See also