News - 15 series#

Release 15.1.7 - 2025-09-29#

In this release, NormalizerNFKC can now normalize Japanese iteration marks, and we fixed an installation failure on AlmaLinux 10.

Improvements#

[grndb] Improved error handling for large database files#

Previously, grndb terminated abnormally when processing database files that exceeded the filesystem stat limit (such as files larger than 2GB on Windows). In this release, when grndb processes such files in the Groonga database directory (the db file and related db.* files), it records an error for each problematic file and continues to completion without aborting.

[unify_iteration_mark] Added support for iteration marks with unify_iteration_mark option#

The unify_iteration_mark option now supports additional iteration mark characters. This option treats each of the following iteration marks as a repeat of the immediately preceding character.

  • Hiragana Iteration Mark ゝ (U+309D)

  • Hiragana Voiced Iteration Mark ゞ (U+309E)

  • Katakana Iteration Mark ヽ (U+30FD)

  • Katakana Voiced Iteration Mark ヾ (U+30FE)

  • Ideographic Iteration Mark 々 (U+3005) - limitation: only repeats the immediately preceding single character

  • Vertical Ideographic Iteration Mark 〻 (U+303B) - limitation: only repeats the immediately preceding single character

Here is an example of using the unify_iteration_mark option.

normalize \
  'NormalizerNFKC("unify_iteration_mark", true)' \
  "こゝろ"
[
  [
    0,
    1758763896.821301,
    0.0001749992370605469
  ],
  {
    "normalized": "こころ",
    "types": [
    ],
    "checks": [
    ]
  }
]

Note

For the Ideographic Iteration Mark (々) and the Vertical Ideographic Iteration Mark (〻), this feature only repeats the immediately preceding single character. Patterns that go beyond repeating the single preceding character, such as the following cases, are not supported.

Examples:

  • “部分々々” -> “部分部分”

  • “古々々米” -> “古古古米”
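The single-character rule can be sketched in Python as follows. This is a minimal illustration, not Groonga's implementation: the function name `unify_iteration_mark` is borrowed from the option name, the voiced marks (ゞ, ヾ) are left unchanged for simplicity, and a mark that directly follows another mark is also left unchanged, which is an assumption of this sketch rather than documented behavior.

```python
# Iteration marks handled by this sketch (code points from the list above).
PLAIN_MARKS = {"\u309D", "\u30FD", "\u3005", "\u303B"}  # ゝ ヽ 々 〻
# Voiced marks are recognized but deliberately left unchanged (assumption).
VOICED_MARKS = {"\u309E", "\u30FE"}  # ゞ ヾ
ALL_MARKS = PLAIN_MARKS | VOICED_MARKS


def unify_iteration_mark(text: str) -> str:
    """Replace each plain iteration mark with the single character before it."""
    out = []
    prev = None  # previous character of the input
    for ch in text:
        if ch in PLAIN_MARKS and prev is not None and prev not in ALL_MARKS:
            # Repeat only the immediately preceding single character.
            out.append(prev)
        else:
            # Ordinary characters, voiced marks, leading marks, and marks
            # that follow another mark are copied through unchanged.
            out.append(ch)
        prev = ch
    return "".join(out)


print(unify_iteration_mark("こゝろ"))  # こころ
```

Because only one preceding character is ever consulted, a two-character repeat such as the one intended by “部分々々” cannot be produced by this rule.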

Added new command to list available commands#

A new command_list command has been added that returns a list of all available Groonga commands. Currently, this command returns only the command ID and name for each command. Using this command could enable automatic generation of client library APIs and help implement Groonga MCP (Model Context Protocol) servers. In future releases, we plan to expand the output to include command summaries, descriptions, and detailed argument information.

command_list
[
  [
    0,
    1758764636.669152,
    0.0002362728118896484
  ],
  {
    "cache_limit": {
      "id": 150,
      "name": "cache_limit"
    },
    ...
  }
]
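As a rough idea of how a client library generator might consume this output, the Python sketch below parses a response of the shape shown above and renders one function stub per command. Only the `cache_limit` entry comes from the example above; `example_command`, its ID, the helper names, and the `client.execute` call in the generated stub are hypothetical placeholders.

```python
import json

# Response shaped like the command_list output above. Only "cache_limit" is
# taken from the release notes; "example_command" is a made-up placeholder.
RESPONSE = """
[
  [0, 1758764636.669152, 0.0002362728118896484],
  {
    "cache_limit": {"id": 150, "name": "cache_limit"},
    "example_command": {"id": 151, "name": "example_command"}
  }
]
"""


def command_names(raw: str) -> list[str]:
    """Return the command names from a command_list response, sorted."""
    header, body = json.loads(raw)
    if header[0] != 0:  # the first header element is the return code
        raise RuntimeError(f"command_list failed with return code {header[0]}")
    return sorted(entry["name"] for entry in body.values())


def render_stub(name: str) -> str:
    """Render a Python stub that forwards to a hypothetical client.execute()."""
    return (
        f"def {name}(client, **arguments):\n"
        f"    return client.execute({name!r}, arguments)\n"
    )


for name in command_names(RESPONSE):
    print(render_stub(name))
```

Since the current output carries only IDs and names, the stubs accept arbitrary keyword arguments; typed signatures would need the argument information planned for future releases.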

[TokenFilterStem] Added support for non-ASCII alphabets#

GH-2539

Reported by Tsai, Xing Wei

Previously, the TokenFilterStem filter only worked with ASCII alphabets. Now it supports stemming for non-ASCII alphabets such as Arabic.

Here is an example of using TokenFilterStem with Arabic text:

table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenNgram \
  --normalizer 'NormalizerNFKC("version", "16.0.0")' \
  --token_filters 'TokenFilterStem("algorithm", "arabic")'

table_tokenize Terms "الكتاب مفيد" --mode ADD
[
  [
    0,
    0.0,
    0.0
  ],
  [
    {
      "value": "كتاب",
      "position": 0,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "مفيد",
      "position": 1,
      "force_prefix": false,
      "force_prefix_search": false
    }
  ]
]

Fixes#

[AlmaLinux] Fixed installation failure on AlmaLinux 10#

Previously, installation could fail with a dnf GPG check error, shown below, because groonga-release contained an outdated RPM GPG key. We’ve removed the old key and now ship only the RSA4096 key, so installation works as expected.

$ dnf install -y --enablerepo=epel --enablerepo=crb groonga
...
error: Certificate 72A7496B45499429:
  Policy rejects 72A7496B45499429: No binding signature at time 2025-09-24T09:35:25Z
Key import failed (code 2). Failing package is: groonga-15.1.5-1.el10.x86_64
 GPG Keys are configured as: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-34839225, file:///etc/pki/rpm-gpg/RPM-GPG-KEY-45499429
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: GPG check FAILED

Who should upgrade?#

Most users do not need to upgrade. Only users who installed groonga-release on AlmaLinux 10 before 2025/09/24 need to upgrade to the latest package using the steps below. If you installed groonga-release on AlmaLinux 10 on or after 2025/09/24, no action is required.

$ sudo dnf upgrade --refresh groonga-release

After upgrading the package, groonga-release contains only the new key:

$ dnf repoquery -l --installed groonga-release
/etc/pki/rpm-gpg
/etc/pki/rpm-gpg/RPM-GPG-KEY-34839225
/etc/yum.repos.d
/etc/yum.repos.d/groonga-almalinux.repo
/etc/yum.repos.d/groonga-amazon-linux.repo

Thanks#

  • Tsai, Xing Wei

Release 15.1.5 - 2025-08-29#

In this release, we added support for the KEY_LARGE flag for TABLE_PAT_KEY!

Debian 13 “Trixie” was released on 9 August 2025. Groonga already supports Debian 13! Please try it!

Improvements#

[table_create] Added support for KEY_LARGE flag for TABLE_PAT_KEY#

You can now use the KEY_LARGE flag with TABLE_PAT_KEY tables to expand the maximum total key size from 4GiB to 1TiB, as you already can with TABLE_HASH_KEY tables. This allows you to store more keys in total. For example:

table_create LargePaths TABLE_PAT_KEY|KEY_LARGE ShortText
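To put the two limits in perspective, the short Python calculation below compares them and estimates how many keys fit for a given average key length. The 64-byte average is purely an illustrative assumption, and the estimate ignores any other limits a table may have (such as a maximum record count or per-key overhead).

```python
GIB = 2**30
TIB = 2**40

default_limit = 4 * GIB  # maximum total key size without KEY_LARGE
large_limit = 1 * TIB    # maximum total key size with KEY_LARGE

# How much larger the key budget becomes.
ratio = large_limit // default_limit
print(ratio)  # 256

# Rough key-count estimate for an assumed average key length of 64 bytes.
AVERAGE_KEY_BYTES = 64  # illustrative assumption, not a Groonga constant
print(default_limit // AVERAGE_KEY_BYTES)  # 67108864
print(large_limit // AVERAGE_KEY_BYTES)    # 17179869184
```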

[NormalizerNFKC] Added support for unify_hyphen_and_prolonged_sound_mark and remove_symbol combination#

Previously, when both the unify_hyphen_and_prolonged_sound_mark and remove_symbol options were enabled together, hyphen characters were not removed as expected because they were not properly treated as symbols to be removed.

This release fixes this issue, so hyphen characters are now properly removed from the normalized text as below.

normalize \
  'NormalizerNFKC("remove_symbol", true, \
  "unify_hyphen_and_prolonged_sound_mark", true)' \
  "090ー1234-5678"
[
  [
    0,
    1756363926.409565,
    0.0003023147583007812
  ],
  {
    "normalized": "09012345678",
    "types": [
    ],
    "checks": [
    ]
  }
]

[AlmaLinux] Added support for AlmaLinux 10#

AlmaLinux 10 packages are now available. You can install Groonga on AlmaLinux 10 using the standard package installation methods.

Fixes#

[Others: Build with CMake] Fixed how to build/install#

GH-2479

Patched by Tsutomu Katsube

The documentation included an incorrect -B option in the cmake --build and cmake --install commands, which caused build errors.

The corrected commands are now:

cmake --build <Build directory path>
cmake --install <Build directory path>

[truncate] Fixed a bug where KEY_LARGE flag was lost after executing truncate command#

This issue meant that when you executed the truncate command on a TABLE_HASH_KEY table with the KEY_LARGE flag, the table could no longer hold more than 4 GiB of total key data, because the KEY_LARGE flag was removed during the truncation.

[grndb] Fixed a bug where the grndb check command did not check all database files#

The issue occurred because the grndb check command did not check part of the database files.

Even before this fix, grndb check correctly returned the results of table and column checks. With this fix, grndb check now checks the whole database correctly.

Thanks#

  • Tsutomu Katsube

Release 15.1.4 - 2025-07-29#

In this release, we fixed a bug in the interval calculation between phrases in the *ONPP operator.

Fixes#

[Ordered near phrase product search] Fixed a bug in the interval calculation between phrases#

This problem may occur when we use *ONPP with MAX_ELEMENT_INTERVAL such as *ONPP-1,0,10"(abc bcd) (defg)". If you don’t use MAX_ELEMENT_INTERVAL, this problem doesn’t occur.

Please refer to the Groonga documentation for the usage and syntax of *ONPP.

If this problem occurs, the following things may happen.

  • Groonga may return records that shouldn’t be matched.

  • Groonga may not return records that should be matched.

Release 15.1.3 - 2025-07-18#

Improvements#

[Apache Arrow] Added support for Apache Arrow C++ 21.0.0#

Release 15.1.2 - 2025-07-07#

Improvements#

[Windows] Dropped support for Groonga packages built with Visual Studio 2019#

Starting with this release, we no longer provide the following packages.

  • groonga-xx.x.x-x64-vs2019.zip

  • groonga-xx.x.x-x64-vs2019-with-vcruntime.zip

Fixes#

[Near phrase search] Fixed a bug in the interval calculation between phrases#

This problem may occur when we use *NP, *NPP, or *ONP with MAX_ELEMENT_INTERVAL as below.

  • *NP-1,0,12"abc ef"

  • *NPP-1,0,10"(abc bcd) (ef)"

  • *ONP-1,0,5|6 "abc defghi jklmnop"

If you don’t use MAX_ELEMENT_INTERVAL, this problem doesn’t occur.

Please refer to the Groonga documentation for the usage and syntax of *NP, *NPP, and *ONP.

If this problem occurs, the following things may happen.

  • Groonga may return records that shouldn’t be returned as hits.

  • Groonga may not return records that should be returned as hits.

Release 15.1.1 - 2025-06-02#

This release updates TokenMecab to preserve user-defined entries with spaces as single tokens.

Improvements#

TokenMecab: Fix unintended splitting of user-defined entries with spaces#

Previously, TokenMecab split user-defined entries containing spaces (e.g., “search engine”) into separate tokens (“search” and “engine”). This release fixes this issue, so entries with embedded spaces are now preserved and handled as single tokens like “search engine” as follows.

tokenize TokenMecab "search engine" --output_pretty yes
[
  [
    0,
    1748413131.972704,
    0.0003032684326171875
  ],
  [
    {
      "value": "search engine",
      "position": 0,
      "force_prefix": false,
      "force_prefix_search": false
    }
  ]
]

Fixes#

Fixed many typos in documentation#

GH-2332, GH-2333, GH-2334, GH-2335, GH-2336, GH-2337, GH-2338

Patched by Vasilii Lakhin.

Thanks#

  • Vasilii Lakhin

Release 15.0.9 - 2025-05-08#

This release adds an output_style option to tokenize/table_tokenize that makes token inspection simpler, and improves the semantics of dividing unsigned integers by negative values.

Improvements#

tokenize/table_tokenize: Added output_style option#

The new output_style option for the tokenize and table_tokenize commands makes it easier to focus on the tokens when you don’t need the full attribute set.

Here is an example of using the output_style option.

tokenize TokenNgram "Fulltext Search" --output_style simple
[
  [
    0,
    1746573056.540744,
    0.0007045269012451172
  ],
  [
    "Fu",
    "ul",
    "ll",
    "lt",
    "te",
    "ex",
    "xt",
    "t ",
    " S",
    "Se",
    "ea",
    "ar",
    "rc",
    "ch",
    "h"
  ]
]
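Conceptually, the simple style keeps only each token's value. The Python sketch below shows that reduction on token objects shaped like the default tokenize output (with the "value", "position", "force_prefix", and "force_prefix_search" attributes); the function name `to_simple_style` and the sample data are illustrative, and this is not how Groonga produces the output internally.

```python
def to_simple_style(tokens):
    """Keep only the "value" of each token, dropping the other attributes."""
    return [token["value"] for token in tokens]


# Token objects shaped like the default (full) tokenize output.
full_style_tokens = [
    {"value": "Fu", "position": 0, "force_prefix": False, "force_prefix_search": False},
    {"value": "ul", "position": 1, "force_prefix": False, "force_prefix_search": False},
    {"value": "ll", "position": 2, "force_prefix": False, "force_prefix_search": False},
]

print(to_simple_style(full_style_tokens))  # ['Fu', 'ul', 'll']
```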

Clarified X / negative value semantics#

Previously, only dividing X by -1 or -1.0 was guaranteed to return -X for unsigned integers. Starting with this release, dividing by any negative value yields the mathematically expected negative result as follows.

  • Before: X / -2 might not return -(X / 2).

  • After: X / -2 always returns -(X / 2).
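The Python sketch below illustrates one common way this kind of surprise arises in C-like code: if the negative divisor is converted to an unsigned 32-bit value before dividing, it wraps around to a huge number. This is a generic illustration, not a description of Groonga's previous implementation; the function names and the use of truncating integer division are assumptions of the sketch.

```python
UINT32_MODULUS = 2**32


def wrapped_unsigned_divide(x: int, divisor: int) -> int:
    """Divide after forcing both operands into uint32, as naive C code might."""
    # A negative divisor wraps around, e.g. -2 becomes 4294967294.
    return (x % UINT32_MODULUS) // (divisor % UINT32_MODULUS)


def expected_divide(x: int, divisor: int) -> int:
    """Return -(x / |divisor|) for a negative divisor, x / divisor otherwise."""
    if divisor < 0:
        return -(x // -divisor)
    return x // divisor


print(wrapped_unsigned_divide(10, -2))  # 0
print(expected_divide(10, -2))          # -5
```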

This is a backward incompatible change but we assume that no user depends on this behavior.

Release 15.0.4 - 2025-03-29#

Improvements#

Clarified X / -1 and X / -1.0 semantics#

In many languages, X / -1 and X / -1.0 return -X. However, Groonga might not return -X when X was an unsigned integer.

X / -1 and X / -1.0 always return -X from this release.

This is a backward incompatible change but we assume that no user depends on this behavior.

Release 15.0.3 - 2025-03-10#

Improvements#

Offline index construction: Added support for parallel construction with TABLE_HASH_KEY lexicon#

Parallel offline index construction iterates over sorted terms internally. TABLE_PAT_KEY and TABLE_DAT_KEY can do this efficiently because they are tree-based. TABLE_HASH_KEY can’t do this efficiently because it is not tree-based, so we didn’t support parallel offline index construction with a TABLE_HASH_KEY lexicon.

This release adds support for parallel offline index construction with a TABLE_HASH_KEY lexicon. It sorts terms in a normal way, so it is not as efficient. Parallel offline index construction with a TABLE_HASH_KEY lexicon will be slower than with TABLE_PAT_KEY/TABLE_DAT_KEY, but it may be faster than sequential offline index construction with a TABLE_HASH_KEY lexicon.
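The difference can be pictured with a small Python analogy. A hash table has no inherent key order, so its terms need an explicit sort before ordered iteration, whereas an ordered structure keeps terms sorted as they are inserted. The sorted list below merely stands in for a tree-based lexicon, and both function names are illustrative; none of this reflects Groonga's internal data structures.

```python
import bisect


def sorted_terms_from_hash(terms):
    """Hash-style lexicon: collect terms first, then pay for an explicit sort."""
    table = {term: True for term in terms}  # no useful ordering by itself
    return sorted(table)


def sorted_terms_from_ordered(terms):
    """Ordered lexicon stand-in: terms stay sorted as they are inserted."""
    ordered = []
    for term in terms:
        index = bisect.bisect_left(ordered, term)
        if index == len(ordered) or ordered[index] != term:
            ordered.insert(index, term)  # skip duplicates, keep order
    return ordered


terms = ["search", "engine", "groonga", "search"]
print(sorted_terms_from_hash(terms))     # ['engine', 'groonga', 'search']
print(sorted_terms_from_ordered(terms))  # ['engine', 'groonga', 'search']
```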

Release 15.0.2 - 2025-02-21#

Fixes#

Offline index construction: Fixed a bug that options may be ignored in parallel construction#

Groonga may ignore options of Normalizers, Tokenizers and/or Token filters in the target index when offline index construction is executed in parallel.

This issue may occur when:

If NormalizerTable is used and this happens, offline index construction fails, because NormalizerTable has a required parameter and that parameter is missing when options are ignored.

Release 15.0.1 - 2025-02-20#

Improvements#

[Ubuntu] Dropped support for Ubuntu 20.04 (Focal Fossa)#

Ubuntu 20.04 will reach EOL in May 2025, so support for it has been dropped starting with this release.

Release 15.0.0 - 2025-02-09#

This is our annual major release! This release doesn’t have any backward incompatible changes! So you can upgrade Groonga without migrating your existing databases. You can still use your existing databases as-is.

Improvements#

TABLE_PAT_KEY: Added support for Float32 as key type#

GH-2211

TABLE_PAT_KEY encodes/decodes numeric keys internally for fast search, so it must know how to encode/decode each key type. Before this release, TABLE_PAT_KEY didn’t know how to encode/decode Float32. Now it can, so you can use Float32 as a TABLE_PAT_KEY key type like other numeric types such as Int32 and Float.
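To give a feel for what such an encoding can look like, the Python sketch below uses a widely known order-preserving transformation for IEEE 754 single-precision values: flip the sign bit of non-negative numbers and invert all bits of negative numbers, so that byte-wise comparison of the encoded keys matches numeric order. This is a generic technique shown under that assumption, not necessarily the encoding Groonga uses, and the function names are illustrative.

```python
import struct

SIGN_BIT = 0x80000000
ALL_BITS = 0xFFFFFFFF


def encode_float32_key(value: float) -> bytes:
    """Encode a float32 so that byte order matches numeric order."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    if bits & SIGN_BIT:
        bits ^= ALL_BITS  # negative: invert every bit
    else:
        bits ^= SIGN_BIT  # non-negative: flip only the sign bit
    return struct.pack(">I", bits)


def decode_float32_key(key: bytes) -> float:
    """Reverse encode_float32_key()."""
    (bits,) = struct.unpack(">I", key)
    if bits & SIGN_BIT:
        bits ^= SIGN_BIT  # encoded value came from a non-negative number
    else:
        bits ^= ALL_BITS  # encoded value came from a negative number
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


values = [2.25, -1.5, 0.0, 100.0, -0.25]
by_encoded_key = sorted(values, key=encode_float32_key)
print(by_encoded_key)  # [-1.5, -0.25, 0.0, 2.25, 100.0]
```

The sample values are all exactly representable in single precision, so decoding returns them unchanged.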