4.11. Query expansion
Groonga accepts the query_expander parameter for the select command. It lets you expand the query string.
For example, if a user searches for “theatre” instead of “theater”, query expansion lets the search return results for “theatre OR theater”. This reduces missed matches and returns what the user actually wants.
4.11.1. Preparation
To use query expansion, you need a table that stores documents and a synonym table that stores each query string together with its replacement strings. In the synonym table, the primary key is the original string and a ShortText column holds the replacement strings.
Let’s create the document table and the synonym table.
Execution example:
table_create Doc TABLE_PAT_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Doc body COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Term TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Term Doc_body COLUMN_INDEX|WITH_POSITION Doc body
# [[0,1337566253.89858,0.000355720520019531],true]
table_create Synonym TABLE_PAT_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Synonym body COLUMN_VECTOR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Doc
[
{"_key": "001", "body": "Play all night in this theater."},
{"_key": "002", "body": "theatre is British spelling."},
]
# [[0,1337566253.89858,0.000355720520019531],2]
load --table Synonym
[
{"_key": "theater", "body": ["theater", "theatre"]},
{"_key": "theatre", "body": ["theater", "theatre"]},
]
# [[0,1337566253.89858,0.000355720520019531],2]
In this case, no matches are missed, because the synonym table accepts both “theatre” and “theater” as query strings.
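The shape of the synonym data can be sketched outside Groonga as a plain mapping. The following Python snippet is only an illustration: `SYNONYMS` and `replacements` are hypothetical names, not part of Groonga. It mirrors the two records loaded into the Synonym table, where each key lists itself among its replacement strings. Because the column holds the strings that replace the key, leaving the key out of its own list would presumably drop the original spelling from the search.

```python
# Hypothetical in-memory stand-in for the Synonym table:
# primary key (original string) -> body column (replacement strings).
SYNONYMS = {
    "theater": ["theater", "theatre"],
    "theatre": ["theater", "theatre"],
}


def replacements(term):
    """Return the replacement strings for term.

    A term with no entry in the mapping is returned unchanged,
    wrapped in a single-element list.
    """
    return SYNONYMS.get(term, [term])
```

Both spellings map to the same list, so a lookup under either key yields both variants.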
4.11.2. Search
Now let’s use the prepared synonym table.
First, use the select command without the query_expander parameter.
Execution example:
select Doc --match_columns body --query "theater"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
Each of the above queries returns only the record whose body contains the query string exactly as written.
Next, specify the body column of the Synonym table as the query_expander parameter.
Execution example:
select Doc --match_columns body --query "theater" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
In both cases, the query string is replaced with “(theater OR theatre)”, so synonyms are taken into account in the full text search.
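The replacement step can be approximated in a few lines of Python. This is a minimal sketch, not Groonga's implementation: `expand_query` and the `synonyms` dictionary are hypothetical, the query is split on whitespace only (Groonga's real query parser also handles operators and quoting), and the output format simply follows the “(theater OR theatre)” form quoted in this section.

```python
def expand_query(query, synonyms):
    """Replace each whitespace-separated word that has a synonym entry
    with a parenthesized OR group of its replacement strings.

    Words without an entry are kept as they are.
    """
    expanded = []
    for word in query.split():
        alternatives = synonyms.get(word)
        if alternatives is None:
            expanded.append(word)
        else:
            expanded.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(expanded)


# Hypothetical copy of the Synonym table contents used in this section.
synonyms = {
    "theater": ["theater", "theatre"],
    "theatre": ["theater", "theatre"],
}

print(expand_query("theater", synonyms))
print(expand_query("theatre", synonyms))
```

Both calls produce the same expanded string, which is why the two select commands with query_expander return the same two records.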