BloGroonga

2022-10-03

Groonga 12.0.8 has been released

Groonga 12.0.8 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Changed specification of the escalate() function (Experimental).

    We changed to not use results out of escalate().

    In the previous specification, users had to guess how many results would be passed to escalate() to determin the first threshold, which was incovenient.

    Here is a example for the previous escalate().

    number_column > 10 && escalate(THRESHOLD_1, CONDITION_1,
                                   ...,
                                   THRESHOLD_N, CONDITION_N)
    

    CONDITION1 was executed when the results of number_column > 10 was less or equal to THRESHOLD_1 . Users had to guess how many results would they get from number_column > 10 to determine THRESHOLD_1.

    From this release, the users don't need to guess how many results will they get from number_column > 10, making it easier to set the thresholds.

    With this change, the syntax of escalate() changed as follow.

    The previous syntax

    escalate(THRESHOLD_1, CONDITION_1,THRESHOLD_2, CONDITION_2, ..., THRESHOLD_N, CONDITION_N)
    

    The new syntax

    escalate(CONDITION_1, THRESHOLD_2, CONDITION_2, ..., THRESHOLD_N, CONDITION_N)
    

    Here are details of the syntax changes.

    • Don't require the threshold for the first condition.
    • Don't allow empty arguments call. The first condition is required.
    • Always execute the first condition.

    This function is experimental. These behaviors may be changed in the future.

  • cmake Added a document about how to build Groonga with CMake.

  • others Added descriptions about how to enable/disable Apache Arrow support when building with GNU Autotools.

  • select Add a document about drilldowns.table.

  • i18n Updated the translation procedure.

Fixes

  • Fixed a bug that Groonga could return incorrect results when we use NormalizerTable and it contains a non-idempotent (results can be changed when executed repeatedly) definition.

    This was caused by that we normalized a search value multiple times: after the value was input and after the value was tokenized.

    Groonga tokenizes and normalizes the data to be registered using the tokenizer and normalizer set in the index table when adding a record. The search value is also tokenized and normalized using the tokenizer and normalizer set in the index table, and then the search value and the index are matched. If the search value is the same as the data registered in the index, it will be in the same state as stored in the index because both use the same tokenizer and normalizer.

    However, Groonga had normalized extra only the search value.

    Built-in normalizers like NormalizerAuto did't cause this bug because they are idempotent (results aren't changed if they are executed repeatedly). On the other hand, NormalizerTable allows the users specify their own normalization definitions, so they can specify non-idempotent (results can be changed when executed repeatedly) definitions.

    If there were non-idempotent definitions in NormalizerTable, the indexed data and the search value did not match in some cases because the search value was normalized extra.

    In such cases, the data that should hit was not hit or the data that should not hit was hit.

    Here is a example.

    table_create ColumnNormalizations TABLE_NO_KEY
    column_create ColumnNormalizations target_column COLUMN_SCALAR ShortText
    column_create ColumnNormalizations normalized COLUMN_SCALAR ShortText
    
    load --table ColumnNormalizations
    [
    {"target_column": "a", "normalized": "b"},
    {"target_column": "b", "normalized": "c"}
    ]
    
    table_create Targets TABLE_PAT_KEY ShortText
    column_create Targets column_normalizations_target_column COLUMN_INDEX \
      ColumnNormalizations target_column
    
    table_create Memos TABLE_NO_KEY
    column_create Memos content COLUMN_SCALAR ShortText
    
    load --table Memos
    [
    {"content":"a"},
    {"content":"c"},
    ]
    
    table_create \
      Terms \
      TABLE_PAT_KEY \
      ShortText \
      --default_tokenizer 'TokenNgram' \
      --normalizers 'NormalizerTable("normalized", \
                                    "ColumnNormalizations.normalized", \
                                    "target", \
                                    "target_column")'
    
    column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
    
    select Memos --query content:@a
    [[0,1664781132.892326,0.03527212142944336],[[[1],[["_id","UInt32"],["content","ShortText"]],[2,"c"]]]]
    

    The expected result of select Memos --query content:@a is a, but Groonga returned c as a result. This was because we normalized the input a to b by definitions of ColumnNormalizations, and after that, we normalized the normalized b again and it was normalized to c. As a result, the input a was converted to c and matched to {"content":"c"} of the Memos table.

Known Issues

  • Currently, Groonga has a bug that there is possible that data is corrupt when we execute many additions, delete, and update data to vector column.

  • *< and *> only valid when we use query() the right side of filter condition. If we specify as below, *< and *> work as &&.

    • 'content @ "Groonga" *< content @ "Mroonga"'
  • Groonga may not return records that should match caused by GRN_II_CURSOR_SET_MIN_ENABLE.

Conclusion

Please refert to the following news for more details.

News Release 12.0.8

Let's search by Groonga!