Groonga 13.0.9 has been released
Groonga 13.0.9 has been released!
How to install: Install
Changes
Here are important changes in this release:
Improvements
-
[select] Changed the default value of
--fuzzy_max_expansionsfrom 0 to 10.--fuzzy_max_expansionscan limit number of words that has close edit distance to use search process. This argument can help to balance hit numbers and performance of the search. When--fuzzy_max_expansionsis 0, the search use all words that the edit distance are under--fuzzy_max_distancein the vocabulary list.--fuzzy_max_expansionsis 0 (unlimited) may slow down a search. Therefore, the default value of--fuzzy_max_expansionsis 10 from this release. -
[select] Improved
selectarguments with addition new argument--fuzzy_with_transposition(experimental).We can choose edit distance
1or2for the transposition case by using this argument.If this parameter is
yes, the edit distance of this case is1. It's2otherwise. -
[select] Improved
selectarguments with addition new argument--fuzzy_tokenize.When
--fuzzy_tokenizeisyes, Gronga use tokenizer that specifies in--default_tokenizerin typo tolerance search.The default value of
--fuzzy_tokenizeisno. The useful case of--fuzzy_tokenizeis the following case.- Search targets are only Japanese data.
- Specify
TokenMecabin--default_tokenizer.
-
[load] Added support for
--ifexistseven if we specifiedapache-arrowintoinput_type. -
[normalizers] Improved
NormalizerNFKC*options with addition new optionremove_blank_force.When
remove_blank_forceisfalse, Normalizer doesn't ignore space as below.table_create Entries TABLE_NO_KEY column_create Entries body COLUMN_SCALAR ShortText load --table Entries [ {"body": "Groonga はとても速い"}, {"body": "Groongaはとても速い"} ] select Entries --output_columns \ 'highlight(body, \ "gaはとても", "<keyword>", "</keyword>", \ {"normalizers": "NormalizerNFKC150(\\"remove_blank_force\\", false)"} \ )' [ [ 0, 0.0, 0.0 ], [ [ [ 2 ], [ [ "highlight", null ] ], [ "Groonga はとても速い" ], [ "Groon<keyword>gaはとても</keyword>速い" ] ] ] ] -
[select] Improved
selectarguments with addition new argument--output_trace_log(experimental).If we specify
yesin--output_trace_logand--command_version 3, Groonga output addition new log as below.table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenNgram --normalizer NormalizerNFKC150 column_create Lexicon memos_content COLUMN_INDEX|WITH_POSITION Memos content load --table Memos [ {"content": "This is a pen"}, {"content": "That is a pen"}, {"content": "They are pens"} ] select Memos \ --match_columns content \ --query "Thas OR ere" \ --fuzzy_max_distance 1 \ --output_columns *,_score \ --command_version 3 \ --output_trace_log yes \ --output_type apache-arrow return_code: int32 start_time: timestamp[ns] elapsed_time: double error_message: string error_file: string error_line: uint32 error_function: string error_input_file: string error_input_line: int32 error_input_command: string -- metadata -- GROONGA:data_type: metadata return_code start_time elapsed_time error_message error_file error_line error_function error_input_file error_input_line error_input_command 0 0 1970-01-01T09:00:00+09:00 0.000000 (null) (null) (null) (null) (null) (null) (null) ======================================== depth: uint16 sequence: uint16 name: string value: dense_union<0: uint32=0, 1: string=1> elapsed_time: uint64 -- metadata -- GROONGA:data_type: trace_log depth sequence name value elapsed_time 0 1 0 ii.select.input Thas 0 1 2 0 ii.select.exact.n_hits 0 1 2 2 0 ii.select.fuzzy.input Thas 2 3 2 1 ii.select.fuzzy.input.actual that 3 4 2 2 ii.select.fuzzy.input.actual this 4 5 2 3 ii.select.fuzzy.n_hits 2 5 6 1 1 ii.select.n_hits 2 6 7 1 0 ii.select.input ere 7 8 2 0 ii.select.exact.n_hits 2 8 9 1 1 ii.select.n_hits 2 9 ======================================== content: string _score: double -- metadata -- GROONGA:n_hits: 2 content _score 0 This is a pen 1.000000 1 That is a pen 1.000000--output_trace_logis valid in only command version 3.This will be useful for the following cases:
- Detect real words used by fuzzy query.
- Measure elapsed timeout without seeing query log.
-
[snippet] Added support for
normalizersoption.We can use normalizer with option. For example, when we don't want to ignore space in
snippet()function, we use this option as below.table_create Entries TABLE_NO_KEY column_create Entries content COLUMN_SCALAR ShortText load --table Entries [ {"content": "Groonga and MySQL"}, {"content": "Groonga and My SQL"} ] select Entries \ --output_columns \ ' snippet(content, "MySQL", "<keyword>", "</keyword>", {"normalizers": "NormalizerNFKC150(\\"remove_blank_force\\", false)"} )' [ [ 0, 0.0, 0.0 ], [ [ [ 2 ], [ [ "snippet", null ] ], [ [ "Groonga and <keyword>MySQL</keyword>" ] ], [ null ] ] ] ]
Fixes
-
Fixed a bug in
Time OPERATOR Float{,32}comparison. GH-1624[Reported by yssrku]Microsecond (small value than second) information in
Float{,32}isn't used. This is happen only whenTime OPERATOR Float{,32}.This is happen in
load --ifexists 'A OP B || C OP D'as below.table_create Reports TABLE_HASH_KEY ShortText column_create Reports content COLUMN_SCALAR Text column_create Reports modified_at COLUMN_SCALAR Time load --table Reports [ {"_key": "a", "content": "", "modified_at": 1663989875.438} ] load \ --table Reports \ --ifexists 'content == "" && modified_at <= 1663989875.437'However, this isn't happen in
select --filter. -
Fixed a bug that
alnum(a-zA-Z0-9) + blankmay be detected.If the number of input is 2 such as
aband text with some blanks such asa bis matched,a bis detected. However, it should not be detected in this case.For example,
a iis detected when this bug occures as below.table_create Entries TABLE_NO_KEY column_create Entries body COLUMN_SCALAR ShortText load --table Entries [ {"body": "Groonga is fast"} ] select Entries \ --output_columns 'highlight(body, "ai", "<keyword>", "</keyword>")' [ [ 0,0.0,0.0 ], [ [ [ 1 ], [ [ "highlight", null ] ], [ "Groong<keyword>a i</keyword>s fast" ] ] ] ]However, the above result is unexpected result. We don't want to detect
a iin the above case.
Conclusion
Please refert to the following news for more details. News Release 13.0.9
Let's search by Groonga!