Groonga 12.0.8 has been released
Groonga 12.0.8 has been released!
How to install: Install
Changes
Here are important changes in this release:
Improvements
-
Changed specification of the
escalate()
function (Experimental).We changed to not use results out of
escalate()
.In the previous specification, users had to guess how many results would be passed to
escalate()
to determin the first threshold, which was incovenient.Here is a example for the previous
escalate()
.number_column > 10 && escalate(THRESHOLD_1, CONDITION_1, ..., THRESHOLD_N, CONDITION_N)
CONDITION1
was executed when the results ofnumber_column > 10
was less or equal toTHRESHOLD_1
. Users had to guess how many results would they get fromnumber_column > 10
to determineTHRESHOLD_1
.From this release, the users don't need to guess how many results will they get from
number_column > 10
, making it easier to set the thresholds.With this change, the syntax of
escalate()
changed as follow.The previous syntax
escalate(THRESHOLD_1, CONDITION_1,THRESHOLD_2, CONDITION_2, ..., THRESHOLD_N, CONDITION_N)
The new syntax
escalate(CONDITION_1, THRESHOLD_2, CONDITION_2, ..., THRESHOLD_N, CONDITION_N)
Here are details of the syntax changes.
- Don't require the threshold for the first condition.
- Don't allow empty arguments call. The first condition is required.
- Always execute the first condition.
This function is experimental. These behaviors may be changed in the future.
-
cmake Added a document about how to build Groonga with CMake.
-
others Added descriptions about how to enable/disable Apache Arrow support when building with GNU Autotools.
-
select Add a document about
drilldowns.table
. -
i18n Updated the translation procedure.
Fixes
-
Fixed a bug that Groonga could return incorrect results when we use NormalizerTable and it contains a non-idempotent (results can be changed when executed repeatedly) definition.
This was caused by that we normalized a search value multiple times: after the value was input and after the value was tokenized.
Groonga tokenizes and normalizes the data to be registered using the tokenizer and normalizer set in the index table when adding a record. The search value is also tokenized and normalized using the tokenizer and normalizer set in the index table, and then the search value and the index are matched. If the search value is the same as the data registered in the index, it will be in the same state as stored in the index because both use the same tokenizer and normalizer.
However, Groonga had normalized extra only the search value.
Built-in normalizers like NormalizerAuto did't cause this bug because they are idempotent (results aren't changed if they are executed repeatedly). On the other hand, NormalizerTable allows the users specify their own normalization definitions, so they can specify non-idempotent (results can be changed when executed repeatedly) definitions.
If there were non-idempotent definitions in NormalizerTable, the indexed data and the search value did not match in some cases because the search value was normalized extra.
In such cases, the data that should hit was not hit or the data that should not hit was hit.
Here is a example.
table_create ColumnNormalizations TABLE_NO_KEY column_create ColumnNormalizations target_column COLUMN_SCALAR ShortText column_create ColumnNormalizations normalized COLUMN_SCALAR ShortText load --table ColumnNormalizations [ {"target_column": "a", "normalized": "b"}, {"target_column": "b", "normalized": "c"} ] table_create Targets TABLE_PAT_KEY ShortText column_create Targets column_normalizations_target_column COLUMN_INDEX \ ColumnNormalizations target_column table_create Memos TABLE_NO_KEY column_create Memos content COLUMN_SCALAR ShortText load --table Memos [ {"content":"a"}, {"content":"c"}, ] table_create \ Terms \ TABLE_PAT_KEY \ ShortText \ --default_tokenizer 'TokenNgram' \ --normalizers 'NormalizerTable("normalized", \ "ColumnNormalizations.normalized", \ "target", \ "target_column")' column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content select Memos --query content:@a [[0,1664781132.892326,0.03527212142944336],[[[1],[["_id","UInt32"],["content","ShortText"]],[2,"c"]]]]
The expected result of
select Memos --query content:@a
isa
, but Groonga returnedc
as a result. This was because we normalized the inputa
tob
by definitions ofColumnNormalizations
, and after that, we normalized the normalizedb
again and it was normalized toc
. As a result, the inputa
was converted toc
and matched to{"content":"c"}
of theMemos
table.
Known Issues
-
Currently, Groonga has a bug that there is possible that data is corrupt when we execute many additions, delete, and update data to vector column.
-
*<
and*>
only valid when we usequery()
the right side of filter condition. If we specify as below,*<
and*>
work as&&
.'content @ "Groonga" *< content @ "Mroonga"'
-
Groonga may not return records that should match caused by
GRN_II_CURSOR_SET_MIN_ENABLE
.
Conclusion
Please refert to the following news for more details.
Let's search by Groonga!