Groonga 13.0.9 has been released
Groonga 13.0.9 has been released!
How to install: Install
Changes
Here are important changes in this release:
Improvements
- [select] Changed the default value of --fuzzy_max_expansions from 0 to 10.

  --fuzzy_max_expansions limits the number of words with a close edit distance that are used in the search process. This argument helps balance the number of hits against search performance. When --fuzzy_max_expansions is 0, the search uses all words in the vocabulary list whose edit distance is within --fuzzy_max_distance. Because --fuzzy_max_expansions 0 (unlimited) may slow down a search, the default value of --fuzzy_max_expansions is 10 from this release.

- [select] Added a new argument --fuzzy_with_transposition (experimental).

  With this argument, we can choose whether a transposition counts as edit distance 1 or 2. If this parameter is yes, the edit distance of a transposition is 1. Otherwise, it is 2.

- [select] Added a new argument --fuzzy_tokenize.

  When --fuzzy_tokenize is yes, Groonga uses the tokenizer specified in --default_tokenizer for typo tolerance search. The default value of --fuzzy_tokenize is no. --fuzzy_tokenize is useful in the following case:

  - Search targets are only Japanese data.
  - TokenMecab is specified in --default_tokenizer.
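The interaction between --fuzzy_max_distance, --fuzzy_max_expansions and --fuzzy_with_transposition can be sketched in plain Python. This is only an illustration of the idea, not Groonga's implementation: the function names, the vocabulary and the closest-first ordering used to pick which words survive the limit are all assumptions made for this example.

```python
def edit_distance(a: str, b: str, with_transposition: bool = True) -> int:
    """Edit distance where swapping two adjacent characters costs 1 or 2."""
    # Mirrors the idea of --fuzzy_with_transposition: yes -> 1, otherwise 2.
    swap_cost = 1 if with_transposition else 2
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, for example "ih" <-> "hi".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap_cost)
    return d[rows - 1][cols - 1]


def expand(query, vocabulary, max_distance=1, max_expansions=10,
           with_transposition=True):
    """Return vocabulary words close to query, optionally limited in number.

    max_expansions=0 means unlimited, like the old default.
    ASSUMPTION: closest words are kept first; Groonga may choose differently.
    """
    close = sorted(
        (edit_distance(query, word, with_transposition), word)
        for word in vocabulary
        if edit_distance(query, word, with_transposition) <= max_distance
    )
    words = [word for _, word in close]
    return words if max_expansions == 0 else words[:max_expansions]


# Hypothetical vocabulary, for illustration only.
VOCABULARY = ["this", "that", "they", "are", "pens", "pen", "is", "a"]

print(expand("thas", VOCABULARY))                    # ['that', 'this']
print(expand("thas", VOCABULARY, max_expansions=1))  # ['that']
print(edit_distance("tihs", "this", True))           # 1
print(edit_distance("tihs", "this", False))          # 2
```

With a large vocabulary, the unlimited case has to search with every close word, which is why a small limit can make a noticeable difference to search time.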
- [load] Added support for --ifexists even if we specify apache-arrow for input_type.

- [normalizers] Added a new option remove_blank_force to NormalizerNFKC*.

  When remove_blank_force is false, the normalizer doesn't ignore spaces, as below.

      table_create Entries TABLE_NO_KEY
      column_create Entries body COLUMN_SCALAR ShortText
      load --table Entries
      [
      {"body": "Groonga はとても速い"},
      {"body": "Groongaはとても速い"}
      ]
      select Entries --output_columns \
        'highlight(body, \
          "gaはとても", "<keyword>", "</keyword>", \
          {"normalizers": "NormalizerNFKC150(\\"remove_blank_force\\", false)"} \
        )'
      [
        [0, 0.0, 0.0],
        [
          [
            [2],
            [["highlight", null]],
            ["Groonga はとても速い"],
            ["Groon<keyword>gaはとても</keyword>速い"]
          ]
        ]
      ]
- [select] Added a new argument --output_trace_log (experimental).

  If we specify --output_trace_log yes together with --command_version 3, Groonga outputs an additional trace log as below.

      table_create Memos TABLE_NO_KEY
      column_create Memos content COLUMN_SCALAR ShortText
      table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenNgram --normalizer NormalizerNFKC150
      column_create Lexicon memos_content COLUMN_INDEX|WITH_POSITION Memos content
      load --table Memos
      [
      {"content": "This is a pen"},
      {"content": "That is a pen"},
      {"content": "They are pens"}
      ]
      select Memos \
        --match_columns content \
        --query "Thas OR ere" \
        --fuzzy_max_distance 1 \
        --output_columns *,_score \
        --command_version 3 \
        --output_trace_log yes \
        --output_type apache-arrow

      return_code: int32
      start_time: timestamp[ns]
      elapsed_time: double
      error_message: string
      error_file: string
      error_line: uint32
      error_function: string
      error_input_file: string
      error_input_line: int32
      error_input_command: string
      -- metadata --
      GROONGA:data_type: metadata
         return_code  start_time                 elapsed_time  error_message  error_file  error_line  error_function  error_input_file  error_input_line  error_input_command
      0  0            1970-01-01T09:00:00+09:00  0.000000      (null)         (null)      (null)      (null)          (null)            (null)            (null)
      ========================================
      depth: uint16
      sequence: uint16
      name: string
      value: dense_union<0: uint32=0, 1: string=1>
      elapsed_time: uint64
      -- metadata --
      GROONGA:data_type: trace_log
         depth  sequence  name                          value  elapsed_time
      0  1      0         ii.select.input               Thas   0
      1  2      0         ii.select.exact.n_hits        0      1
      2  2      0         ii.select.fuzzy.input         Thas   2
      3  2      1         ii.select.fuzzy.input.actual  that   3
      4  2      2         ii.select.fuzzy.input.actual  this   4
      5  2      3         ii.select.fuzzy.n_hits        2      5
      6  1      1         ii.select.n_hits              2      6
      7  1      0         ii.select.input               ere    7
      8  2      0         ii.select.exact.n_hits        2      8
      9  1      1         ii.select.n_hits              2      9
      ========================================
      content: string
      _score: double
      -- metadata --
      GROONGA:n_hits: 2
         content        _score
      0  This is a pen  1.000000
      1  That is a pen  1.000000

  --output_trace_log is valid only in command version 3.

  This will be useful for the following cases:

  - Detect the real words used by a fuzzy query.
  - Measure elapsed time without checking the query log.
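As a rough illustration of those two use cases, the trace log rows shown above can be post-processed with a few lines of Python. This sketch assumes the rows have already been read into tuples in the column order of the trace_log table (depth, sequence, name, value, elapsed_time). The helper names are hypothetical, and the unit of elapsed_time is not stated here, so the differences below are only relative values.

```python
# Rows copied from the trace_log table above:
# (depth, sequence, name, value, elapsed_time)
TRACE_LOG = [
    (1, 0, "ii.select.input", "Thas", 0),
    (2, 0, "ii.select.exact.n_hits", 0, 1),
    (2, 0, "ii.select.fuzzy.input", "Thas", 2),
    (2, 1, "ii.select.fuzzy.input.actual", "that", 3),
    (2, 2, "ii.select.fuzzy.input.actual", "this", 4),
    (2, 3, "ii.select.fuzzy.n_hits", 2, 5),
    (1, 1, "ii.select.n_hits", 2, 6),
    (1, 0, "ii.select.input", "ere", 7),
    (2, 0, "ii.select.exact.n_hits", 2, 8),
    (1, 1, "ii.select.n_hits", 2, 9),
]


def fuzzy_expansions(rows):
    """Map each fuzzy input to the real words that were searched for it."""
    expansions = {}
    current = None
    for _depth, _sequence, name, value, _elapsed in rows:
        if name == "ii.select.fuzzy.input":
            current = value
            expansions[current] = []
        elif name == "ii.select.fuzzy.input.actual" and current is not None:
            expansions[current].append(value)
    return expansions


def elapsed_per_input(rows):
    """Difference of elapsed_time between each input and its final n_hits."""
    result = {}
    term = None
    start = 0
    for _depth, _sequence, name, value, elapsed in rows:
        if name == "ii.select.input":
            term, start = value, elapsed
        elif name == "ii.select.n_hits" and term is not None:
            result[term] = elapsed - start
            term = None
    return result


print(fuzzy_expansions(TRACE_LOG))   # {'Thas': ['that', 'this']}
print(elapsed_per_input(TRACE_LOG))  # {'Thas': 6, 'ere': 2}
```

Here "ere" has no fuzzy entries because its exact search already found hits, which matches the rows in the trace log above.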
- [snippet] Added support for the normalizers option.

  We can use a normalizer with options. For example, when we don't want to ignore spaces in the snippet() function, we use this option as below.

      table_create Entries TABLE_NO_KEY
      column_create Entries content COLUMN_SCALAR ShortText
      load --table Entries
      [
      {"content": "Groonga and MySQL"},
      {"content": "Groonga and My SQL"}
      ]
      select Entries \
        --output_columns \
        ' snippet(content, "MySQL", "<keyword>", "</keyword>", {"normalizers": "NormalizerNFKC150(\\"remove_blank_force\\", false)"} )'
      [
        [0, 0.0, 0.0],
        [
          [
            [2],
            [["snippet", null]],
            [["Groonga and <keyword>MySQL</keyword>"]],
            [null]
          ]
        ]
      ]
Fixes
- Fixed a bug in Time OPERATOR Float{,32} comparison. GH-1624 [Reported by yssrku]

  Microsecond (a value smaller than a second) information in Float{,32} wasn't used. This happened only with Time OPERATOR Float{,32}.

  This happened in load --ifexists 'A OP B || C OP D' as below.

      table_create Reports TABLE_HASH_KEY ShortText
      column_create Reports content COLUMN_SCALAR Text
      column_create Reports modified_at COLUMN_SCALAR Time
      load --table Reports
      [
      {"_key": "a", "content": "", "modified_at": 1663989875.438}
      ]
      load \
        --table Reports \
        --ifexists 'content == "" && modified_at <= 1663989875.437'

  However, this didn't happen in select --filter.

- Fixed a bug that alnum(a-zA-Z0-9) + blank may be detected.

  If the input has 2 characters such as ab and the text contains those characters separated by blanks such as a b, a b was detected. However, it should not be detected in this case.

  For example, a i is detected when this bug occurs, as below.

      table_create Entries TABLE_NO_KEY
      column_create Entries body COLUMN_SCALAR ShortText
      load --table Entries
      [
      {"body": "Groonga is fast"}
      ]
      select Entries \
        --output_columns 'highlight(body, "ai", "<keyword>", "</keyword>")'
      [
        [0, 0.0, 0.0],
        [
          [
            [1],
            [["highlight", null]],
            ["Groong<keyword>a i</keyword>s fast"]
          ]
        ]
      ]

  However, the above result is unexpected. We don't want to detect a i in the above case.
Conclusion
Please refer to the following news for more details: News Release 13.0.9
Let's search by Groonga!