BloGroonga

2022-10-31

PGroonga (fast full text search module for PostgreSQL) 2.4.1 has been released

PGroonga 2.4.1 has been released! PGroonga makes PostgreSQL a fast full text search platform for all languages.

If you are a new user, see also About PGroonga.

Highlight

Here are highlights in PGroonga 2.4.1:

  • Added support for PostgreSQL 15.

  • Dropped support for PostgreSQL 10.

    This is because PostgreSQL 10 will reach EOL in November 2022.

  • [&@~ operator for jsonb type] Added a translation of the explanation of how to perform full text search with indexes against jsonb type values.

How to upgrade

This version is compatible with previous versions. You can upgrade by following the steps in "Compatible case" in the Upgrade document.

Conclusion

Try PGroonga when you want to perform fast full text search against all languages on PostgreSQL!

2022-10-31

Groonga 12.0.9 has been released

Groonga 12.0.9 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • [AlmaLinux] Added support for AlmaLinux 9.

    We added this support in Groonga 12.0.8 but had not announced it.

  • [escalate] Added documentation for the escalate() function.

  • [normalizers] Added NormalizerHTML. (Experimental)

    NormalizerHTML is a normalizer for HTML.

    Currently NormalizerHTML supports removing tags like <span> or </span> and expanding character references like &amp; or &#38;.

    Here are sample queries for NormalizerHTML.

    normalize NormalizerHTML "<span> Groonga &amp; Mroonga &#38; Rroonga </span>"
    [[0,1666923364.883798,0.0005481243133544922],{"normalized":" Groonga & Mroonga & Rroonga ","types":[],"checks":[]}]
    

    In this sample <span> and </span> are removed, and &amp; and &#38; are expanded to &.

    We can specify whether to remove tags with the remove_tag option. (The default value of the remove_tag option is true.)

    normalize 'NormalizerHTML("remove_tag", false)' "<span> Groonga &amp; Mroonga &#38; Rroonga </span>"
    [[0,1666924069.278549,0.0001978874206542969],{"normalized":"<span> Groonga & Mroonga & Rroonga </span>","types":[],"checks":[]}]
    

    In this sample, <span> and </span> are not removed.

    We can specify whether to expand character references with the expand_character_reference option. (The default value of the expand_character_reference option is true.)

    normalize 'NormalizerHTML("expand_character_reference", false)' "<span> Groonga &amp; Mroonga &#38; Rroonga </span>"
    [[0,1666924357.099782,0.0002346038818359375],{"normalized":" Groonga &amp; Mroonga &#38; Rroonga ","types":[],"checks":[]}]
    

    In this sample, &amp; and &#38; are not expanded.

  • [httpd] Updated bundled nginx to 1.23.2.

    This version includes security fixes for CVE-2022-41741 and CVE-2022-41742. Please refer to https://nginx.org/en/CHANGES for details about these security fixes.

  • Suppressed logging many identical messages when no memory is available.

    Groonga could log many "mmap failed!!!!" messages when no memory is available. We now log this message with as few duplicates as possible.
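The effect of the two NormalizerHTML options described above can be approximated outside Groonga. The following Python sketch is only an illustration, not Groonga's implementation: the helper name normalize_html and the simple tag regex are assumptions made for this example, and real HTML parsing is more involved.

```python
import html
import re

# Hypothetical helper that mimics the documented behavior of NormalizerHTML.
# It is NOT Groonga's implementation; the regex below is a simplification.
TAG_PATTERN = re.compile(r"<[^>]+>")


def normalize_html(text, remove_tag=True, expand_character_reference=True):
    """Approximate NormalizerHTML; both options default to true as in Groonga."""
    if remove_tag:
        # Drop anything that looks like an opening or closing tag.
        text = TAG_PATTERN.sub("", text)
    if expand_character_reference:
        # Expand references such as &amp; and &#38; to the characters they name.
        text = html.unescape(text)
    return text


sample = "<span> Groonga &amp; Mroonga &#38; Rroonga </span>"
print(repr(normalize_html(sample)))
print(repr(normalize_html(sample, remove_tag=False)))
print(repr(normalize_html(sample, expand_character_reference=False)))
```

For this sample input, the three calls produce the same strings as the "normalized" values in the Groonga responses shown for NormalizerHTML.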

Fixes

  • [select] Fixed a bug that Groonga could crash or return incorrect results when specifying n_workers.

    This bug occurred when using n_workers with a value greater than 1 together with drilldowns[{LABEL}].filter.

    The bug occurred because Groonga referenced incorrect values (objects) when performing internal parallel processing. So if the condition above was satisfied, Groonga sometimes crashed or returned incorrect results depending on the timing of the parallel processing.

Known Issues

  • Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.

  • *< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.

    • 'content @ "Groonga" *< content @ "Mroonga"'
  • Groonga may not return records that should match because of GRN_II_CURSOR_SET_MIN_ENABLE.

Conclusion

Please refer to the following news for more details.

News Release 12.0.9

Let's search with Groonga!

2022-10-07

PGroonga (fast full text search module for PostgreSQL) 2.4.0 has been released

PGroonga 2.4.0 has been released! PGroonga makes PostgreSQL a fast full text search platform for all languages.

If you are a new user, see also About PGroonga.

Highlight

Here are highlights in PGroonga 2.4.0:

  • [Ubuntu] Added support for PostgreSQL 10, 11, 12, 13, and 14 for PGDG packages on Ubuntu 22.04 (Jammy Jellyfish).

How to upgrade

This version is compatible with previous versions. You can upgrade by following the steps in "Compatible case" in the Upgrade document.

Conclusion

Try PGroonga when you want to perform fast full text search against all languages on PostgreSQL!

2022-10-03

Groonga 12.0.8 has been released

Groonga 12.0.8 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Changed the specification of the escalate() function (Experimental).

    escalate() no longer uses results from outside escalate().

    In the previous specification, users had to guess how many results would be passed to escalate() to determine the first threshold, which was inconvenient.

    Here is an example of the previous escalate().

    number_column > 10 && escalate(THRESHOLD_1, CONDITION_1,
                                   ...,
                                   THRESHOLD_N, CONDITION_N)
    

    CONDITION_1 was executed when the number of results of number_column > 10 was less than or equal to THRESHOLD_1. Users had to guess how many results they would get from number_column > 10 to determine THRESHOLD_1.

    From this release, users don't need to guess how many results they will get from number_column > 10, which makes it easier to set the thresholds.

    With this change, the syntax of escalate() changed as follows.

    The previous syntax

    escalate(THRESHOLD_1, CONDITION_1, THRESHOLD_2, CONDITION_2, ..., THRESHOLD_N, CONDITION_N)
    

    The new syntax

    escalate(CONDITION_1, THRESHOLD_2, CONDITION_2, ..., THRESHOLD_N, CONDITION_N)
    

    Here are the details of the syntax changes.

    • The threshold for the first condition is no longer required.
    • Calls with no arguments are no longer allowed. The first condition is required.
    • The first condition is always executed.

    This function is experimental. These behaviors may be changed in the future.

  • [cmake] Added documentation on how to build Groonga with CMake.

  • [others] Added descriptions of how to enable/disable Apache Arrow support when building with GNU Autotools.

  • [select] Added documentation for drilldowns.table.

  • [i18n] Updated the translation procedure.
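The new escalate() syntax described above can be modeled with a small Python sketch over sets of record IDs. This is a rough model, not Groonga's implementation. Two behaviors here are assumptions that the release note does not spell out: hits from later conditions are added to the earlier hits, and escalation stops as soon as the number of hits exceeds the next threshold. The function and variable names are hypothetical.

```python
def escalate(first_condition, *threshold_condition_pairs):
    """Model of the new escalate() syntax over sets of record IDs.

    Each condition is a zero-argument callable returning a set of IDs.
    Calling escalate() with no arguments fails because the first
    condition is a required parameter.
    """
    if len(threshold_condition_pairs) % 2 != 0:
        raise ValueError("thresholds and conditions must come in pairs")
    # The first condition has no threshold and is always executed.
    results = set(first_condition())
    pairs = zip(threshold_condition_pairs[0::2], threshold_condition_pairs[1::2])
    for threshold, condition in pairs:
        # Assumption: stop escalating once there are more hits than the threshold.
        if len(results) > threshold:
            break
        # Assumption: hits of later conditions are added to the earlier ones.
        results |= set(condition())
    return results


# Hypothetical conditions, from strictest to loosest.
exact = lambda: {1}
partial = lambda: {1, 2, 3}
fuzzy = lambda: {1, 2, 3, 4, 5}

print(sorted(escalate(exact, 1, partial, 2, fuzzy)))  # [1, 2, 3]
```

In this run the first condition yields one hit, which is less than or equal to the threshold 1, so the second condition runs. The three hits then exceed the threshold 2, so the third condition is skipped. Every threshold is compared only with hits produced inside escalate(), which is why users no longer need to estimate the result count of an outer condition such as number_column > 10.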

Fixes

  • Fixed a bug that Groonga could return incorrect results when we use NormalizerTable and it contains a non-idempotent definition (one whose results can change when it is applied repeatedly).

    This happened because we normalized a search value more than once: once after the value was input and again after the value was tokenized.

    When adding a record, Groonga tokenizes and normalizes the data to be registered using the tokenizer and normalizer set in the index table. The search value is also tokenized and normalized using the tokenizer and normalizer set in the index table, and then the search value is matched against the index. If the search value is the same as the data registered in the index, it ends up in the same state as the data stored in the index because both use the same tokenizer and normalizer.

    However, Groonga applied an extra normalization pass to the search value only.

    Built-in normalizers like NormalizerAuto didn't cause this bug because they are idempotent (results don't change if they are applied repeatedly). NormalizerTable, on the other hand, allows users to specify their own normalization definitions, so they can specify non-idempotent definitions.

    If there were non-idempotent definitions in NormalizerTable, the indexed data and the search value did not match in some cases because of the extra normalization of the search value.

    In such cases, records that should have matched did not, or records that should not have matched did.

    Here is an example.

    table_create ColumnNormalizations TABLE_NO_KEY
    column_create ColumnNormalizations target_column COLUMN_SCALAR ShortText
    column_create ColumnNormalizations normalized COLUMN_SCALAR ShortText
    
    load --table ColumnNormalizations
    [
    {"target_column": "a", "normalized": "b"},
    {"target_column": "b", "normalized": "c"}
    ]
    
    table_create Targets TABLE_PAT_KEY ShortText
    column_create Targets column_normalizations_target_column COLUMN_INDEX \
      ColumnNormalizations target_column
    
    table_create Memos TABLE_NO_KEY
    column_create Memos content COLUMN_SCALAR ShortText
    
    load --table Memos
    [
    {"content":"a"},
    {"content":"c"},
    ]
    
    table_create \
      Terms \
      TABLE_PAT_KEY \
      ShortText \
      --default_tokenizer 'TokenNgram' \
      --normalizers 'NormalizerTable("normalized", \
                                    "ColumnNormalizations.normalized", \
                                    "target", \
                                    "target_column")'
    
    column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
    
    select Memos --query content:@a
    [[0,1664781132.892326,0.03527212142944336],[[[1],[["_id","UInt32"],["content","ShortText"]],[2,"c"]]]]
    

    The expected result of select Memos --query content:@a is a, but Groonga returned c. This was because we normalized the input a to b according to the definitions in ColumnNormalizations, and then normalized that b again, which turned it into c. As a result, the input a was converted to c and matched {"content":"c"} in the Memos table.
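The double normalization in the example above can be reproduced with a few lines of Python. This is a simplified model, not Groonga code: it ignores tokenization, treats each record as a single term, and uses a plain dictionary in place of the ColumnNormalizations table. All names are hypothetical.

```python
# Mirrors the ColumnNormalizations table from the example: a -> b, b -> c.
NORMALIZATIONS = {"a": "b", "b": "c"}


def normalize(text):
    """Apply the table once per character; characters without a rule are kept."""
    return "".join(NORMALIZATIONS.get(char, char) for char in text)


def build_index(records):
    """Index each record under its once-normalized content."""
    index = {}
    for record_id, content in records.items():
        index.setdefault(normalize(content), []).append(record_id)
    return index


memos = {1: "a", 2: "c"}
index = build_index(memos)  # {"b": [1], "c": [2]}

query = "a"
fixed = index.get(normalize(query), [])             # one pass: "a" -> "b"
buggy = index.get(normalize(normalize(query)), [])  # two passes: "a" -> "b" -> "c"

print([memos[i] for i in fixed])  # ['a']
print([memos[i] for i in buggy])  # ['c']
```

Because normalize(normalize("a")) differs from normalize("a"), the table is non-idempotent, and the extra pass moves the query onto the term that belongs to a different record.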

Known Issues

  • Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.

  • *< and *> are only valid when we use query() on the right side of a filter condition. If we specify them as below, *< and *> work as &&.

    • 'content @ "Groonga" *< content @ "Mroonga"'
  • Groonga may not return records that should match because of GRN_II_CURSOR_SET_MIN_ENABLE.

Conclusion

Please refer to the following news for more details.

News Release 12.0.8

Let's search with Groonga!