Groonga 8.0.9 has been released
Groonga 8.0.9 has been released!
How to install: Install
Changes
Here are important changes in this release:
- The TokenDelimit tokenizer now supports any delimiter, not only whitespace.
- Improvements around normalizers and token filters mainly for better internationalization.
The TokenDelimit tokenizer now supports any delimiter, not only whitespace
New options delimiter and pattern are now available for TokenDelimit to specify any delimiter, like:
% groonga
> tokenize 'TokenDelimit("delimiter", ",")' "A,B"
=> "A", "B"
> tokenize 'TokenDelimit("delimiter", ",")' "A , B"
=> "A ", " B" (whitespace still there)
> tokenize 'TokenDelimit("pattern", "\\\\s*,\\\\s*")' "A, B ,C"
=> "A", "B", "C"
Please note that characters not specified by the delimiter option are not treated as delimiters, as in the second example, where the whitespace around the comma stays in the tokens.
The pattern option accepts a regular expression, which is useful for input with irregular whitespace around the delimiter, as in the third example.
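The difference between the two options can be pictured with an analogy in Python. This is only a sketch of the splitting behavior described above, not how Groonga implements TokenDelimit; the function names split_by_delimiter and split_by_pattern are made up for this illustration.

```python
import re


def split_by_delimiter(text, delimiter):
    """Split on an exact delimiter string; surrounding whitespace is kept."""
    return text.split(delimiter)


def split_by_pattern(text, pattern):
    """Split on a regular expression, so whitespace around the comma
    can be consumed as part of the delimiter match."""
    return re.split(pattern, text)


print(split_by_delimiter("A,B", ","))            # ['A', 'B']
print(split_by_delimiter("A , B", ","))          # ['A ', ' B']
print(split_by_pattern("A, B ,C", r"\s*,\s*"))   # ['A', 'B', 'C']
```

The results mirror the three tokenize examples: a literal delimiter leaves stray whitespace in the tokens, while a pattern can absorb it.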
Improvements around normalizers and token filters mainly for better internationalization
The NormalizerNFKC100 normalizer now supports a new option unify_to_romaji to convert both hiragana and katakana to romaji, like:
% groonga
> normalize 'NormalizerNFKC100("unify_to_romaji", true)' "リンゴ みかん"
=> "ringo mikan"
A new built-in token filter, TokenFilterNFKC100, has also been added.
Like NormalizerNFKC100, it can convert katakana to hiragana with the unify_kana option:
% groonga
> tokenize TokenMecab "リンゴおいしい" --token_filters TokenFilterNFKC100
=> "リンゴ", "おいしい" ("リンゴ" was normalized)
> tokenize TokenMecab "リンゴおいしい" --token_filters 'TokenFilterNFKC100("unify_kana", true)'
=> "りんご", "おいしい" ("リンゴ" was normalized and converted)
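For readers unfamiliar with kana, the katakana-to-hiragana conversion in the last example can be sketched in Python. This is an illustration of the idea only, not Groonga's implementation, and the function name katakana_to_hiragana is hypothetical. It relies on the fact that the katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above their hiragana counterparts in Unicode.

```python
def katakana_to_hiragana(text):
    """Map each katakana letter to the matching hiragana letter.

    Characters outside U+30A1..U+30F6 (hiragana, Latin letters,
    the prolonged sound mark, and so on) are left unchanged.
    """
    result = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            result.append(chr(code - 0x60))
        else:
            result.append(ch)
    return "".join(result)


print(katakana_to_hiragana("リンゴ"))    # りんご
print(katakana_to_hiragana("おいしい"))  # おいしい (already hiragana)
```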
The TokenFilterStem filter now supports a new option algorithm for stemming not only in English but also in other languages: French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Russian, and Finnish.
The test for the option describes its usage.
The TokenFilterStopWord filter now supports a new option column to change the name of a column for stop words from is_stop_word to any other.
The test for the option describes its usage.
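As a rough mental model of the column option, the sketch below filters tokens against a lexicon whose stop-word flag lives in a configurable column. It is a Python analogy under assumed data structures, not Groonga code: the function remove_stop_words, the dictionary-based lexicon, and the column name "ignore" are all invented for this illustration.

```python
def remove_stop_words(tokens, lexicon, column="is_stop_word"):
    """Drop tokens whose lexicon entry has a true value in `column`.

    `lexicon` maps each term to a dict of column values. Terms that are
    missing, or that lack the column, are kept.
    """
    return [t for t in tokens if not lexicon.get(t, {}).get(column, False)]


# A lexicon that stores the stop-word flag in a column named "ignore"
# instead of the default "is_stop_word".
lexicon = {
    "and": {"ignore": True},
    "hello": {"ignore": False},
}
tokens = ["hello", "and", "world"]

print(remove_stop_words(tokens, lexicon, column="ignore"))  # ['hello', 'world']
print(remove_stop_words(tokens, lexicon))                   # ['hello', 'and', 'world']
```

With the default column name nothing is filtered here, because this lexicon has no is_stop_word column; pointing the filter at the right column is what the new option is for.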
Conclusion
See Release 8.0.9 2018-11-29 for detailed changes since 8.0.8.
Let's search with Groonga!