Groonga 14.0.3 has been released

BloGroonga

2024-05-09

Groonga 14.0.3 has been released

Groonga 14.0.3 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

We optimized performance as below.
- We optimized performance of OR and AND search when the number of hits were many.
- We optimized performance of prefix search(@^).
- We optimized performance of AND search when the number of records of A more than B in condition of A AND B.
- We optimized performance of search when we used many dynamic columns.

token_ngram Added new option ignore_blank.

We can replace TokenBigramIgnoreBlank with TokenNgram("ignore_blank", true) as below.

Here is example of use TokenBigram.

tokenize TokenBigram "! ! !" NormalizerAuto
[
  [
    0,
    1715155644.64263,
    0.001013517379760742
  ],
  [
    {
      "value": "!",
      "position": 0,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "!",
      "position": 1,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "!",
      "position": 2,
      "force_prefix": false,
      "force_prefix_search": false
    }
  ]
]

Here is example of use TokenBigramIgnoreBlank.

tokenize TokenBigramIgnoreBlank "! ! !" NormalizerAuto
[
  [
    0,
    1715155680.323451,
    0.0009913444519042969
  ],
  [
    {
      "value": "!!!",
      "position": 0,
      "force_prefix": false,
      "force_prefix_search": false
    }
  ]
]

Here is example of use TokenNgram("ignore_blank", true).

tokenize 'TokenNgram("ignore_blank", true)' "! ! !" NormalizerAuto
[
  [
    0,
    1715155762.340685,
    0.001041412353515625
  ],
  [
    {
      "value": "!!!",
      "position": 0,
      "force_prefix": false,
      "force_prefix_search": false
    }
  ]
]

ubuntu Add support for Ubuntu 24.04 LTS (Noble Numbat).

Fixes

request_cancel Fix a bug that Groonga may crash when we execute request_cancel command while we execute the other query.

Fixed the unexpected error when using --post_filter with --offset greater than the post-filtered result

In the same situation, using --filter with --offset doesn't raise the error. This inconsistency in behavior between --filter and --post-filter has now been resolved.

table_create Users TABLE_PAT_KEY ShortText
column_create Users age COLUMN_SCALAR UInt32
load --table Users
[
  ["_key", "age"],
  ["Alice", 21],
  ["Bob", 22],
  ["Chris", 23],
  ["Diana", 24],
  ["Emily", 25]
]
select Users \
  --filter 'age >= 22' \
  --post_filter 'age <= 24' \
  --offset 3 \
  --sort_keys -age --output_pretty yes
[
  [
    -68,
    1715224057.317582,
    0.001833438873291016,
    "[table][sort] grn_output_range_normalize failed",
    [
      [
        "grn_table_sort",
        "/home/horimoto/Work/free-software/groonga.tag/lib/sort.c",
        1052
      ]
    ]
  ]
]

Fixed a bug where incorrect search result could be returned when not all phrases within (...) matched using near phrase product.

For example, there is no record which matched (2) condition using --query '*NPP1"(a) (2)"'. In this case, the expected behavior would be return no record. However, the actual behavior was equal to the query --query '*NPP and "(a)" as below. This means that despite no records matched (2), records like ax1 and axx1 were incorrectly returned.

table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR Text

table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenNgram
column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content
load --table Entries
[
  {"content": "ax1"},
  {"content": "axx1"}
]

select Entries \
  --match_columns content \
  --query '*NPP1"(a) (2)"' \
  --output_columns 'content'
[
  [
    0,
    1715224211.050228,
    0.001366376876831055
  ],
  [
    [
      [
        2
      ],
      [
        [
          "content",
          "Text"
        ]
      ],
      [
        "ax1"
      ],
      [
        "axx1"
      ]
    ]
  ]
]

Fixed a bug that rehash failed or data in a table broke when rehash occurred that the table with TABLE_HASH_KEY has 2^28 or more records.

Fixed a bug that highlight position slipped out of place in the following cases.

If full width space existed before highlight target characters as below.

We expected that Groonga returned "Groonga　高速！". However, Groonga returned "Groonga　高速！" as below.

table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText

table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer 'TokenNgram("report_source_location", true)' \
  --normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body

load --table Entries
[
  {"body": "Groonga　高速！"}
]
select Entries \
  --output_columns \
  --match_columns body \
  --query '高' \
  --output_columns 'highlight_html(body, Terms)'
[
  [
    0,
    1715215640.979517,
    0.001608610153198242
  ],
  [
    [
      [
        1
      ],
      [
        [
          "highlight_html",
          null
        ]
      ],
      [
        "Groonga　<span class=\"keyword\">高速</span>！"
      ]
    ]
  ]
]

If we used TokenNgram("loose_blank", true) and if highlight target characters included full width space as below.

We expected that Groonga returned "山田太郎". However, Groonga returned "山田太" as below.

table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText

table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer 'TokenNgram("loose_blank", true, "report_source_location", true)' \
  --normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body

load --table Entries
[
  {"body": "山田 太郎"}
]

select Entries --output_columns \
  --match_columns body --query '山田太郎' \
  --output_columns 'highlight_html(body, Terms)' --output_pretty yes
[
  [
    0,
    1715220409.096246,
    0.0004854202270507812
  ],
  [
    [
      [
        1
      ],
      [
        [
          "highlight_html",
          null
        ]
      ],
      [
        "<span class=\"keyword\">山田 太</span>"
      ]
    ]
  ]
]

If white space existed in the front of highlight target characters as below.

We expected that Groonga returned " 山田太郎". However, Groonga returned " 山" as below.

table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText

table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer 'TokenNgram("report_source_location", true)' \
  --normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body

load --table Entries
[
  {"body": " 山田太郎"}
]

select Entries \
  --output_columns \
  --match_columns body \
  --query '山' \
  --output_columns 'highlight_html(body, Terms)' --output_pretty yes
[
  [
    0,
    1715221627.002193,
    0.001977920532226562
  ],
  [
    [
      [
        1
      ],
      [
        [
          "highlight_html",
          null
        ]
      ],
      [
        " <span class=\"keyword\">山</span>"
      ]
    ]
  ]
]

If the second character of highlight target was full width space as below.

We expected that Groonga returned "山　田太郎". However, Groonga returned "山　田太郎" as below.

table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText

table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer 'TokenNgram("report_source_location", true)' \
  --normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body

load --table Entries
[
  {"body": "山　田太郎"}
]

select Entries \
  --output_columns \
  --match_columns body \
  --query '山　田' \
  --output_columns 'highlight_html(body, Terms)'
[
  [
    0,
    1715222501.496007,
    0.0005536079406738281
  ],
  [
    [
      [
        0
      ],
      [
        [
          "highlight_html",
          "<span class=\"keyword\">山　田太</span>郎"
        ]
      ]
    ]
  ]
]

Conclusion

Please refert to the following news for more details. News Release 14.0.3

Let's search by Groonga!

BloGroonga