BloGroonga

2024-07-04

PGroonga (fast full text search module for PostgreSQL) 3.2.1 has been released

PGroonga 3.2.1 has been released! PGroonga is a PostgreSQL extension that provides fast full text search for all languages.

Fixes

  • [&@~ operator] Fixed a crash bug that occurred when multiple conditions included a query condition consisting only of blank space.

    A crash occurred if any one of the multiple conditions contained only blank space, as below.

      CREATE TABLE memos (
        id integer,
        content text
      );
      INSERT INTO memos VALUES (1, 'PostgreSQL is a RDBMS.');
      INSERT INTO memos VALUES (2, 'Groonga is fast full text search engine.');
      INSERT INTO memos VALUES (3, 'PGroonga is a PostgreSQL extension that uses Groonga.');
      CREATE INDEX grnindex ON memos USING pgroonga (content);
      SELECT id, content
        FROM memos
       WHERE content &@~ pgroonga_condition('PGroonga') AND
             content &@~ pgroonga_condition(' ');
    

How to upgrade

If you're using PGroonga 2.0.0 or later, you can upgrade by following the steps in "Compatible case" in the Upgrade document.

If you're using PGroonga 1.Y.Z, you can upgrade by following the steps in "Incompatible case" in the Upgrade document.

Support service

If you need commercial support for PGroonga, contact us.

Conclusion

Try PGroonga when you want to perform fast full text search for all languages on PostgreSQL!

2024-07-04

Groonga 14.0.5 has been released

Groonga 14.0.5 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Added a new feature that removes objects (table or column) as much as possible, even if they are broken.

    This feature is mainly used by the crash-safe feature of PGroonga.

    PGroonga applies PGroonga’s WAL to a standby database automatically by using Custom WAL Resource Managers. However, when PGroonga uses Custom WAL Resource Managers, all replication stops if PGroonga fails to apply PGroonga’s WAL because of a broken Groonga object. So, if broken objects exist in a database, Groonga tries to remove them as much as possible by using this feature.
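
    For reference only (this is not necessarily the exact interface added in this release): Groonga's existing object_remove command already exposes a similar forced removal of a possibly broken object via its force parameter. The table name below is just an illustration.

    object_remove --name Users --force yes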

Fixes

  • [query()] Fixed a bug that the order of evaluation of 'A || query("...", "B C")' was wrong

    This problem occurs when all of the following conditions are met:

    • We use OR search and query().
    • We use AND search in query().
    • The order of condition expression is 'A || query("...", "B C")'.

    So, this problem doesn't occur if we use only query() or if we don't use AND search in query().

    We expect that {"name": "Alice", "memo": "Groonga user"} is hit in the following example. However, when this problem occurred, the following query did not hit it.

    table_create Users TABLE_NO_KEY
    column_create Users name COLUMN_SCALAR ShortText
    column_create Users memo COLUMN_SCALAR ShortText
    
    load --table Users
    [
    {"name": "Alice", "memo": "Groonga user"},
    {"name": "Bob",   "memo": "Rroonga user"}
    ]
    select Users \
      --output_columns 'name, memo, _score' \
      --filter 'memo @ "Groonga" || query("name", "Bob Rroonga")'
    [[0,0.0,0.0],[[[0],[["name","ShortText"],["memo","ShortText"],["_score","Int32"]]]]]
    

    After the fix, {"name": "Alice", "memo": "Groonga user"} is hit, as in the following example.

    select Users \
      --output_columns 'name, memo, _score' \
      --filter 'memo @ "Groonga" || query("name", "Bob Rroonga")'
    [
      [
        0,
        1719376617.537505,
        0.002481460571289062
      ],
      [
        [
          [
            1
          ],
          [
            [
              "name",
              "ShortText"
            ],
            [
              "memo",
              "ShortText"
            ],
            [
              "_score",
              "Int32"
            ]
          ],
          [
            "Alice",
            "Groonga user",
            1
          ]
        ]
      ]
    ]
    
  • [select] Fixed a bug that a condition that evaluates a prefix search first, such as --query "A* OR B", returned a wrong search result

    This problem may occur when a prefix search is evaluated first. It doesn't occur with a condition that evaluates a prefix search at the end, such as --query "A OR B*".

    If this problem occurs, both the Bo and the li of --query "Bo* OR li" are evaluated as prefix searches. As a result, the following query does not hit anything, because li is evaluated as a prefix search as mentioned above.

    table_create Users TABLE_NO_KEY
    column_create Users name COLUMN_SCALAR ShortText
    
    load --table Users
    [
    ["name"],
    ["Alice"]
    ]
    
    select Users \
      --match_columns name \
      --query "Bo* OR li"
    [
      [
        0,
        1719377505.628048,
        0.0007376670837402344
      ],
      [
        [
          [
            0
          ],
          [
            [
              "_id",
              "UInt32"
            ],
            [
              "name",
              "ShortText"
            ]
          ]
        ]
      ]
    ]
    

Conclusion

Please refer to the following news for more details: News Release 14.0.5

Let's search by Groonga!

2024-05-29

Groonga 14.0.4 has been released

Groonga 14.0.4 has been released!

How to install: Install

Changes

Here are important changes in this release:

Fixes

  • [query_parallel_or] Fixed a bug that the match_escalation_threshold or force_match_escalation options were ignored when using query_parallel_or().

    Before the fix, even when match_escalation_threshold was set to disable match escalation, matches still escalated when we used query_parallel_or(). This problem occurred only with query_parallel_or(); query() was not affected.

    Generally, we don't disable match escalation, because we want to get some search results; zero hits is an unwelcome result. Therefore, this problem doesn't affect many users. However, it does affect users who use stop words, as below.

    plugin_register token_filters/stop_word
    
    table_create Memos TABLE_NO_KEY
    column_create Memos content COLUMN_SCALAR ShortText
    
    table_create Terms TABLE_PAT_KEY ShortText \
      --default_tokenizer TokenBigram \
      --normalizer NormalizerAuto \
      --token_filters TokenFilterStopWord
    column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
    column_create Terms is_stop_word COLUMN_SCALAR Bool
    load --table Terms
    [
       {"_key": "and", "is_stop_word": true}
    ]
    
    load --table Memos
    [
      {"content": "Hello"},
      {"content": "Hello and Good-bye"},
      {"content": "Good-bye"}
    ]
    
    select Memos \
      --filter 'query_parallel_or(["content", "content", "content", "content"], \
                "and", \
                {"options": {"TokenFilterStopWord.enable": true}})' \
      --match_escalation_threshold -1 \
      --sort_keys -_score
    

    We don't want to match a keyword that is registered as a stop word. Therefore, we set match_escalation_threshold to -1 in the above example.

    We expect that Groonga doesn't return any records in the above example, because escalation is disabled and the search keyword (and) is registered as a stop word. However, if this problem occurs, Groonga returns matching records, because match_escalation_threshold doesn't work when we use query_parallel_or().

  • Fixed a bug that full text search against a reference column of a vector didn't work.

    This problem has occurred since Groonga 14.0.0. It affects full text search against a reference column of a vector.

    We expect that Groonga returns [1, "Linux MySQL"] and [2, "MySQL Groonga"] in the example below. However, before the fix, Groonga always returned 0 hits as below, because we executed full text search on a reference column of a vector.

    table_create bugs TABLE_PAT_KEY UInt32
    
    table_create tags TABLE_PAT_KEY ShortText --default_tokenizer TokenDelimit
    column_create tags name COLUMN_SCALAR ShortText
    
    column_create bugs tags COLUMN_VECTOR tags
    
    load --table bugs
    [
      ["_key", "tags"],
      [1, "Linux MySQL"],
      [2, "MySQL Groonga"],
      [3, "Mroonga"]
    ]
    
    column_create tags bugs_tags_index COLUMN_INDEX bugs tags
    
    select --table bugs --filter 'tags @ "MySQL"'
    [
      [
        0,
        0.0,
        0.0
      ],
      [
        [
          [
            0
          ],
          [
            [
              "_id",
              "UInt32"
            ],
            [
              "_key",
              "UInt32"
            ],
            [
              "tags",
              "tags"
            ]
          ]
        ]
      ]
    ]
    

Conclusion

Please refer to the following news for more details: News Release 14.0.4

Let's search by Groonga!

2024-05-09

Groonga 14.0.3 has been released

Groonga 14.0.3 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • We optimized performance as below.

    • We optimized performance of OR and AND search when the number of hits was large.

    • We optimized performance of prefix search (@^); a small usage sketch follows after this list.

    • We optimized performance of AND search when, in a condition A AND B, A matches more records than B.

    • We optimized performance of search when we used many dynamic columns.
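
    For instance, the prefix search operator @^ that was optimized here can be used as in the following minimal sketch (the Memos table and its data are hypothetical and only illustrate the operator):

    table_create Memos TABLE_PAT_KEY ShortText
    load --table Memos
    [
      {"_key": "Groonga"},
      {"_key": "PGroonga"},
      {"_key": "Mroonga"}
    ]
    select Memos --filter '_key @^ "Gro"'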

  • token_ngram Added a new option ignore_blank.

    We can replace TokenBigramIgnoreBlank with TokenNgram("ignore_blank", true) as below.

    Here is an example using TokenBigram.

    tokenize TokenBigram "! ! !" NormalizerAuto
    [
      [
        0,
        1715155644.64263,
        0.001013517379760742
      ],
      [
        {
          "value": "!",
          "position": 0,
          "force_prefix": false,
          "force_prefix_search": false
        },
        {
          "value": "!",
          "position": 1,
          "force_prefix": false,
          "force_prefix_search": false
        },
        {
          "value": "!",
          "position": 2,
          "force_prefix": false,
          "force_prefix_search": false
        }
      ]
    ]
    

    Here is an example using TokenBigramIgnoreBlank.

    tokenize TokenBigramIgnoreBlank "! ! !" NormalizerAuto
    [
      [
        0,
        1715155680.323451,
        0.0009913444519042969
      ],
      [
        {
          "value": "!!!",
          "position": 0,
          "force_prefix": false,
          "force_prefix_search": false
        }
      ]
    ]
    

    Here is an example using TokenNgram("ignore_blank", true).

    tokenize 'TokenNgram("ignore_blank", true)' "! ! !" NormalizerAuto
    [
      [
        0,
        1715155762.340685,
        0.001041412353515625
      ],
      [
        {
          "value": "!!!",
          "position": 0,
          "force_prefix": false,
          "force_prefix_search": false
        }
      ]
    ]
    
  • ubuntu Added support for Ubuntu 24.04 LTS (Noble Numbat).

Fixes

  • request_cancel Fixed a bug that Groonga may crash when the request_cancel command is executed while another query is running; see the usage sketch below.
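
    For context, request_cancel cancels a command that was started with a request_id; a minimal sketch (the Entries table and the ID value are arbitrary, and the request_cancel would normally be sent from another connection while the first command is still running):

    select Entries --request_id example-request-1
    request_cancel --id example-request-1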

  • Fixed an unexpected error when using --post_filter with an --offset greater than the number of post-filtered records

    In the same situation, using --filter with --offset doesn't raise the error. This inconsistency in behavior between --filter and --post_filter has now been resolved.

    table_create Users TABLE_PAT_KEY ShortText
    column_create Users age COLUMN_SCALAR UInt32
    load --table Users
    [
      ["_key", "age"],
      ["Alice", 21],
      ["Bob", 22],
      ["Chris", 23],
      ["Diana", 24],
      ["Emily", 25]
    ]
    select Users \
      --filter 'age >= 22' \
      --post_filter 'age <= 24' \
      --offset 3 \
      --sort_keys -age --output_pretty yes
    [
      [
        -68,
        1715224057.317582,
        0.001833438873291016,
        "[table][sort] grn_output_range_normalize failed",
        [
          [
            "grn_table_sort",
            "/home/horimoto/Work/free-software/groonga.tag/lib/sort.c",
            1052
          ]
        ]
      ]
    ]
    
  • Fixed a bug where an incorrect search result could be returned when not all phrases within (...) matched using a near phrase product search.

    For example, no record matches the (2) condition of --query '*NPP1"(a) (2)"'. In this case, the expected behavior would be to return no records. However, the actual behavior was the same as if only "(a)" had been queried, as below. This means that despite no records matching (2), records like ax1 and axx1 were incorrectly returned.

    table_create Entries TABLE_NO_KEY
    column_create Entries content COLUMN_SCALAR Text
    
    table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenNgram
    column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content
    load --table Entries
    [
      {"content": "ax1"},
      {"content": "axx1"}
    ]
    
    select Entries \
      --match_columns content \
      --query '*NPP1"(a) (2)"' \
      --output_columns 'content'
    [
      [
        0,
        1715224211.050228,
        0.001366376876831055
      ],
      [
        [
          [
            2
          ],
          [
            [
              "content",
              "Text"
            ]
          ],
          [
            "ax1"
          ],
          [
            "axx1"
          ]
        ]
      ]
    ]
    
  • Fixed a bug that rehash failed or data in a table broke when rehash occurred on a table with TABLE_HASH_KEY that has 2^28 or more records.

  • Fixed a bug that the highlight position slipped out of place in the following cases.

    • If a full-width space existed before the highlight target characters, as below.

      We expected that Groonga returned "Groonga <span class=\"keyword\">高</span>速!". However, Groonga returned "Groonga <span class=\"keyword\">高速</span>!" as below.

      table_create Entries TABLE_NO_KEY
      column_create Entries body COLUMN_SCALAR ShortText
      
      table_create Terms TABLE_PAT_KEY ShortText \
        --default_tokenizer 'TokenNgram("report_source_location", true)' \
        --normalizer 'NormalizerNFKC150("report_source_offset", true)'
      column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
      
      load --table Entries
      [
        {"body": "Groonga 高速!"}
      ]
      select Entries \
        --match_columns body \
        --query '高' \
        --output_columns 'highlight_html(body, Terms)'
      [
        [
          0,
          1715215640.979517,
          0.001608610153198242
        ],
        [
          [
            [
              1
            ],
            [
              [
                "highlight_html",
                null
              ]
            ],
            [
              "Groonga <span class=\"keyword\">高速</span>!"
            ]
          ]
        ]
      ]
      
    • If we used TokenNgram("loose_blank", true) and the highlight target characters included a full-width space, as below.

      We expected that Groonga returned "<span class=\"keyword\">山田 太郎</span>". However, Groonga returned "<span class=\"keyword\">山田 太</span>" as below.

      table_create Entries TABLE_NO_KEY
      column_create Entries body COLUMN_SCALAR ShortText
      
      table_create Terms TABLE_PAT_KEY ShortText \
        --default_tokenizer 'TokenNgram("loose_blank", true, "report_source_location", true)' \
        --normalizer 'NormalizerNFKC150("report_source_offset", true)'
      column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
      
      load --table Entries
      [
        {"body": "山田 太郎"}
      ]
      
      select Entries \
        --match_columns body --query '山田太郎' \
        --output_columns 'highlight_html(body, Terms)' --output_pretty yes
      [
        [
          0,
          1715220409.096246,
          0.0004854202270507812
        ],
        [
          [
            [
              1
            ],
            [
              [
                "highlight_html",
                null
              ]
            ],
            [
              "<span class=\"keyword\">山田 太</span>"
            ]
          ]
        ]
      ]
      
    • If white space existed in front of the highlight target characters, as below.

      We expected that Groonga returned " <span class=\"keyword\">山</span>田太郎". However, Groonga returned " <span class=\"keyword\">山</span>" as below.

      table_create Entries TABLE_NO_KEY
      column_create Entries body COLUMN_SCALAR ShortText
      
      table_create Terms TABLE_PAT_KEY ShortText \
        --default_tokenizer 'TokenNgram("report_source_location", true)' \
        --normalizer 'NormalizerNFKC150("report_source_offset", true)'
      column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
      
      load --table Entries
      [
        {"body": " 山田太郎"}
      ]
      
      select Entries \
        --match_columns body \
        --query '山' \
        --output_columns 'highlight_html(body, Terms)' --output_pretty yes
      [
        [
          0,
          1715221627.002193,
          0.001977920532226562
        ],
        [
          [
            [
              1
            ],
            [
              [
                "highlight_html",
                null
              ]
            ],
            [
              " <span class=\"keyword\">山</span>"
            ]
          ]
        ]
      ]
      
    • If the second character of the highlight target was a full-width space, as below.

      We expected that Groonga returned "<span class=\"keyword\">山 田</span>太郎". However, Groonga returned "<span class=\"keyword\">山 田太</span>郎" as below.

      table_create Entries TABLE_NO_KEY
      column_create Entries body COLUMN_SCALAR ShortText
      
      table_create Terms TABLE_PAT_KEY ShortText \
        --default_tokenizer 'TokenNgram("report_source_location", true)' \
        --normalizer 'NormalizerNFKC150("report_source_offset", true)'
      column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
      
      load --table Entries
      [
        {"body": "山 田太郎"}
      ]
      
      select Entries \
        --match_columns body \
        --query '山 田' \
        --output_columns 'highlight_html(body, Terms)'
      [
        [
          0,
          1715222501.496007,
          0.0005536079406738281
        ],
        [
          [
            [
              0
            ],
            [
              [
                "highlight_html",
                "<span class=\"keyword\">山 田太</span>郎"
              ]
            ]
          ]
        ]
      ]
      

Conclusion

Please refer to the following news for more details: News Release 14.0.3

Let's search by Groonga!

2024-03-29

Groonga 14.0.2 has been released

Groonga 14.0.2 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Reduced the log level of the log that Groonga outputs when setting normalizers/tokenizer/token_filters against a temporary table.

    For example, the target of this change is the following log.

    DDL:1234567890:set_normalizers NormalizerAuto
    

    PGroonga sets normalizers against a temporary table on startup, so this log becomes noise: it is output every time PGroonga starts, because PGroonga’s default log level is notice.

    Therefore, we reduced the log level of this log to debug in this release. Thus, this log is not output when PGroonga starts with the default log level.
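
    If you still want to see this log, you can raise the log verbosity yourself. A minimal sketch, assuming Groonga's log_level command (PGroonga users can set the pgroonga.log_level parameter instead):

    log_level debug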

Conclusion

Please refer to the following news for more details: News Release 14.0.2

Let's search by Groonga!