7.9.5. TokenFilterStopWord
7.9.5.1. Summary
TokenFilterStopWord removes stop words from the tokenized tokens when searching documents.
Because TokenFilterStopWord removes tokens at search time rather than when documents are added, you can specify stop words after adding the documents.
When you don’t specify the column option, a stop word is specified by the is_stop_word column of the lexicon table.
7.9.5.2. Syntax
TokenFilterStopWord has an optional parameter:
TokenFilterStopWord
TokenFilterStopWord("column", "ignore")
7.9.5.3. Usage
Here is an example that uses the TokenFilterStopWord token filter:
Execution example:
plugin_register token_filters/stop_word
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Memos TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Memos content COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0,1337566253.89858,0.000355720520019531],1]
load --table Memos
[
{"content": "Hello"},
{"content": "Hello and Good-bye"},
{"content": "Good-bye"}
]
# [[0,1337566253.89858,0.000355720520019531],3]
select Memos --match_columns content --query "Hello and"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "Hello"
# ],
# [
# 2,
# "Hello and Good-bye"
# ]
# ]
# ]
# ]
The and token is marked as a stop word in the Terms table.
"Hello", which doesn’t have and in its content, is matched because and is a stop word and is removed from the query.
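The matching behavior described above can be modeled with a short Python sketch. This is a simplified illustration, not Groonga's implementation: the `tokenize`, `build_index`, and `search` functions are hypothetical names, and splitting on alphanumeric runs is only a rough stand-in for TokenBigram with NormalizerAuto on ASCII text. The point it shows is that stop words are still indexed when documents are loaded and are dropped only from the query.

```python
import re

def tokenize(text):
    # Lowercase and split into alphanumeric runs (assumption: a rough
    # approximation of TokenBigram + NormalizerAuto for ASCII text).
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    # Index every token, stop words included: nothing is filtered at load time.
    index = {}
    for doc_id, content in documents.items():
        for token in tokenize(content):
            index.setdefault(token, set()).add(doc_id)
    return index

def search(index, lexicon, query):
    # Drop query tokens that the lexicon marks with is_stop_word = true.
    tokens = [t for t in tokenize(query)
              if not lexicon.get(t, {}).get("is_stop_word", False)]
    if not tokens:
        # Model choice only; not a statement about Groonga's behavior.
        return []
    # A record matches when it contains every remaining query token.
    matched = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matched)

memos = {1: "Hello", 2: "Hello and Good-bye", 3: "Good-bye"}
terms = {"and": {"is_stop_word": True}}
index = build_index(memos)
print(search(index, terms, "Hello and"))  # [1, 2]
```

With an empty lexicon (no stop words), the same query would require both tokens and match only record 2 in this model.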
You can mark stop words in a column other than is_stop_word by specifying the column option as below.
Execution example:
plugin_register token_filters/stop_word
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Memos TABLE_NO_KEY
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Memos content COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters 'TokenFilterStopWord("column", "ignore")'
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms ignore COLUMN_SCALAR Bool
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Terms
[
{"_key": "and", "ignore": true}
]
# [[0,1337566253.89858,0.000355720520019531],1]
load --table Memos
[
{"content": "Hello"},
{"content": "Hello and Good-bye"},
{"content": "Good-bye"}
]
# [[0,1337566253.89858,0.000355720520019531],3]
select Memos --match_columns content --query "Hello and"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "Hello"
# ],
# [
# 2,
# "Hello and Good-bye"
# ]
# ]
# ]
# ]
7.9.5.4. Parameters
7.9.5.4.1. Optional parameter
There is one optional parameter, column.
7.9.5.4.1.1. column
Specifies the column of the lexicon table that marks stop words. When this parameter is omitted, the is_stop_word column is used.
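As a rough illustration of how the column name selects which lexicon records count as stop words, here is a minimal Python sketch. The `stop_words` helper and the dictionary layout are hypothetical and exist only for this example; they mirror the Terms table from the usage section, with the default column name taken from the summary.

```python
def stop_words(lexicon, column="is_stop_word"):
    # Collect the keys whose record holds a true value in the chosen column.
    # Records without that column are treated as non-stop-words (assumption).
    return {key for key, record in lexicon.items() if record.get(column, False)}

# Hypothetical in-memory stand-in for the Terms table with an "ignore" column.
terms = {
    "and": {"ignore": True},
    "hello": {"ignore": False},
}

print(stop_words(terms))                   # set()
print(stop_words(terms, column="ignore"))  # {'and'}
```

In this model, the default column finds nothing because the records only carry ignore values, which corresponds to passing "column", "ignore" to the token filter in the second usage example.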