BloGroonga

2018-04-29

Groonga 8.0.2 has been released

Groonga 8.0.2 has been released!

In this release, you can "define" custom tokenizers and normalizers via options, without any programming. This helps you search sources that include many orthographical variants.

How to install: Install

Changes

Here are important changes in this release:

  • [logical_range_filter] Added sort_keys option.
  • Added a new function time_format(). You can format a column of Time type with the same format specifiers as strftime.
  • [tokenizers] Support new tokenizer TokenNgram. You can define its behavior dynamically.
  • [normalizers] Support new normalizer NormalizerNFKC100. It is based on Unicode NFKC for Unicode 10.0.
  • [normalizers] Support options for normalizers NormalizerNFKC51 and NormalizerNFKC100. You can change each normalizer's behavior dynamically.
  • [dump][schema] Add support for options of tokenizer and normalizer. As a result, Groonga 8.0.1 and earlier versions cannot import dump and schema outputs generated by Groonga 8.0.2 or later, and they will raise an error due to the unsupported information.

[logical_range_filter] Added sort_keys option

logical_range_filter now supports a new option sort_keys, corresponding to sort_keys in select.

Note that it works only for a single search target shard and doesn't work for multiple search target shards. For more details, see the command reference.
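For example, a query like the following sorts matched records within the target shard. This is a minimal sketch assuming a hypothetical sharded logical table Logs with shard key timestamp and a message column:

logical_range_filter --logical_table Logs --shard_key timestamp --filter 'message @ "error"' --sort_keys '-timestamp' --limit 10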

Added a new function time_format()

Now you can format a column of Time type with the same format specifiers as strftime.

For example, the following command line will output the _key column as both UNIX time and a human-readable format like 2018-04-29T10:30:00:

select Timestamps --sortby _id --limit -1 --output_columns '_key, time_format(_key, "%Y-%m-%dT%H:%M:%S")'
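The example above assumes a table whose keys are Time values, created with something like the following hypothetical setup:

table_create Timestamps TABLE_PAT_KEY Time
load --table Timestamps
[
{"_key": "2018-04-29 10:30:00"}
]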

[tokenizers] Support new tokenizer TokenNgram

Now a new tokenizer TokenNgram is available. You can define its behavior dynamically via its options. Options are given in the style 'TokenNgram("[name 1]", [value 1], "[name 2]", [value 2], ...)'. For example:

table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer 'TokenNgram("n", 2, "loose_symbol", true)' --normalizer NormalizerAuto
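To see how such options affect tokenization before using the table for search, you can pass the same tokenizer expression to the tokenize command. A minimal sketch (the phone-number-like input is just an illustration; with loose_symbol enabled, the symbol-separated text is expected to be tokenized loosely so that a query without symbols can also match):

tokenize 'TokenNgram("n", 2, "loose_symbol", true)' "090-1111-2222" NormalizerAuto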

[normalizers] Support new normalizer NormalizerNFKC100

Now a new normalizer NormalizerNFKC100, based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 10.0, is available.

Both it and NormalizerNFKC51 support options. For more details, see the next section.
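You can check its behavior directly with the normalize command, for example (a minimal sketch; the full-width input is just an illustration of NFKC normalization):

normalize NormalizerNFKC100 "ＧＲＯＯＮＧＡ"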

[normalizers] Support options for normalizers NormalizerNFKC51 and NormalizerNFKC100

Both normalizers NormalizerNFKC51 and NormalizerNFKC100 now support options to change their behavior dynamically. Options are given in the style 'NormalizerNFKC100("[name 1]", [value 1], "[name 2]", [value 2], ...)'. For example:

table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer 'NormalizerNFKC100("unify_kana", true, "unify_kana_case", true)'
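With unify_kana, hiragana and katakana are normalized to the same form, so both spellings match at search time. You can confirm the effect with the normalize command, for example (a minimal sketch; the mixed-kana input is just an illustration):

normalize 'NormalizerNFKC100("unify_kana", true)' "ぐるんがとグルンガ"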

[dump][schema] Add support for options of tokenizer and normalizer

dump and schema commands now report options for tokenizers (TokenNgram) and normalizers (NormalizerNFKC51 and NormalizerNFKC100), like:

table_create Site TABLE_HASH_KEY ShortText
column_create Site title COLUMN_SCALAR ShortText

table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer "NormalizerNFKC100(\"unify_kana\", true, \"unify_kana_case\", true)"

As a result, Groonga 8.0.1 and earlier versions cannot import results of dump and schema that include such options information.

Tokenizers and normalizers without options are still reported the same as in older versions, so you need to be careful only when you use the new tokenizer or normalizer features described above.

Conclusion

See Release 8.0.2 (2018-04-29) for detailed changes since 8.0.1.

Let's search by Groonga!