Groonga 8.0.2 has been released
In this release, you can define custom tokenizers and normalizers via options, without any programming. This helps you search sources that contain many orthographic variants.
How to install: Install
Changes
Here are important changes in this release:
- [logical_range_filter] Added the sort_keys option.
- Added a new function time_format(). You can format the value of a Time type column using the same format as strftime.
- [tokenizers] Added a new tokenizer, TokenNgram. You can define its behavior dynamically.
- [normalizers] Added a new normalizer, NormalizerNFKC100. It is based on Unicode NFKC for Unicode 10.0.
- [normalizers] Added options for the normalizers NormalizerNFKC51 and NormalizerNFKC100. You can change a normalizer's behavior dynamically.
- [dump][schema] Added support for options of tokenizers and normalizers. As a result, Groonga 8.0.1 and earlier versions cannot import dump and schema output generated by Groonga 8.0.2 or later when it includes such options; the import fails with an error because of the unsupported information.
[logical_range_filter] Added sort_keys option
logical_range_filter now supports a new option, sort_keys, corresponding to sort_keys in select.
Note that it works only when there is a single search target shard; it doesn't work with multiple search target shards. For more details, see the command reference.
Added a new function time_format()
Now you can format the value of a Time type column using the same format as strftime.
For example, the following command line outputs the _key column both as UNIX time and in a human-readable format like 2018-04-29T10:30:00:
select Timestamps --sortby _id --limit -1 --output_columns '_key, time_format(_key, "%Y-%m-%dT%H:%M:%S")'
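To see what that format string produces, here is a minimal Python sketch that applies the same strftime-style format to a UNIX time. It is an illustration only, not Groonga code: the helper name format_unix_time is hypothetical, and the sketch assumes UTC so that the result is reproducible, which may differ from the time zone Groonga uses on your system.

```python
from datetime import datetime, timezone


def format_unix_time(seconds, fmt="%Y-%m-%dT%H:%M:%S"):
    """Format seconds since the UNIX epoch with a strftime-style format.

    Hypothetical helper for illustration; UTC is an assumption made here
    so that the output does not depend on the local time zone.
    """
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime(fmt)


# 1524997800 is 2018-04-29 10:30:00 in UTC.
print(format_unix_time(1524997800))  # prints 2018-04-29T10:30:00
```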
[tokenizers] Support new tokenizer TokenNgram
A new tokenizer, TokenNgram, is now available.
You can define its behavior dynamically via its options.
Options are given in the style 'TokenNgram("[name 1]", [value 1], "[name 2]", [value 2], ...)'.
For example:
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer 'TokenNgram("n", 2, "loose_symbol", true)' --normalizer NormalizerAuto
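As a rough picture of what an N-gram size of 2 means, the following Python sketch splits text into overlapping character 2-grams. It is a simplified illustration of the general N-gram idea, not TokenNgram's actual algorithm: the function name character_ngrams is hypothetical, and the sketch ignores normalization, the loose_symbol option, and any special handling Groonga applies to particular character types.

```python
def character_ngrams(text, n=2):
    """Return the overlapping character n-grams of text.

    Simplified illustration only; Groonga's TokenNgram is assumed to do
    more than this (for example, option-dependent handling of symbols).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of n characters over the text, one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(character_ngrams("groonga", 2))
# prints ['gr', 'ro', 'oo', 'on', 'ng', 'ga']
```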
[normalizers] Support new normalizer NormalizerNFKC100
A new normalizer, NormalizerNFKC100, based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 10.0, is now available.
Both it and NormalizerNFKC51 support options.
For more details, see the next section.
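For readers unfamiliar with NFKC, the following Python sketch shows what plain NFKC normalization does to some orthographic variants, using the standard unicodedata module. It only illustrates the Unicode normalization form itself: the Unicode version depends on your Python build rather than being fixed at 10.0, and the output is not claimed to match NormalizerNFKC100, which may apply further steps.

```python
import unicodedata


def nfkc(text):
    """Apply plain Unicode NFKC normalization (illustration only)."""
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters are folded to their ASCII counterparts.
print(nfkc("Ｇｒｏｏｎｇａ"))  # prints Groonga

# Half-width katakana with voiced sound marks become full-width katakana.
print(nfkc("ｸﾞﾙﾝｶﾞ"))  # prints グルンガ
```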
[normalizers] Support options for normalizers NormalizerNFKC51 and NormalizerNFKC100
Both normalizers, NormalizerNFKC51 and NormalizerNFKC100, now support options to change their behavior dynamically.
Options are given in the style 'NormalizerNFKC100("[name 1]", [value 1], "[name 2]", [value 2], ...)'.
For example:
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer 'NormalizerNFKC100("unify_kana", true, "unify_kana_case", true)'
[dump][schema] Add support for options of tokenizer and normalizer
The dump and schema commands now report options for tokenizers (TokenNgram) and normalizers (NormalizerNFKC51 and NormalizerNFKC100), like:
table_create Site TABLE_HASH_KEY ShortText
column_create Site title COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer "NormalizerNFKC100(\"unify_kana\", true, \"unify_kana_case\", true)"
As a result, Groonga 8.0.1 and earlier versions cannot import dump and schema output that includes such option information.
Tokenizers and normalizers without options are still reported the same way as in older versions, so you need to be careful only when you use the new tokenizer or normalizer features described above.
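If you need to check whether an existing dump can still be imported by an older version, one possible approach is to scan it for tokenizer or normalizer values that carry options. The Python sketch below is a heuristic under stated assumptions, not an official tool: the function name lines_with_options and the regular expression are hypothetical, and the pattern only looks for the --default_tokenizer and --normalizer arguments in the form shown in the dump example above.

```python
import re

# Heuristic: an argument value with options looks like Name(...), so a "("
# right after the tokenizer or normalizer name is treated as a sign of options.
OPTION_PATTERN = re.compile(r"--(?:default_tokenizer|normalizer)\s+['\"]?\w+\(")


def lines_with_options(dump_text):
    """Return dump lines whose tokenizer or normalizer appears to use options."""
    return [line for line in dump_text.splitlines() if OPTION_PATTERN.search(line)]


sample = r'''table_create Site TABLE_HASH_KEY ShortText
column_create Site title COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer "NormalizerNFKC100(\"unify_kana\", true, \"unify_kana_case\", true)"
'''

for line in lines_with_options(sample):
    print(line)
```

With the sample above, only the table_create Terms line is printed. An empty result suggests, under these assumptions, that the dump does not use the new option syntax.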
Conclusion
See Release 8.0.2 2018-04-29 for detailed changes since 8.0.1.
Let's search by Groonga!