Groonga 8.0.9 has been released
Groonga 8.0.9 has been released!
How to install: Install
Changes
Here are important changes in this release:
- The TokenDelimit tokenizer now supports any delimiter not only whitespaces.
- Improvements around normalizers and token filters mainly for better internationalization.
The TokenDelimit tokenizer now supports any delimiter not only whitespaces
New options delimiter
and pattern
are now available for TokenDelimit
to specify any delimiter, like:
% groonga
> tokenize 'TokenDelimit("delimiter", ",")' "A,B"
=> "A", "B"
> tokenize 'TokenDelimit("delimiter", ",")' "A , B"
=> "A ", " B" (whitespace still there)
> tokenize 'TokenDelimit("pattern", "\\\\s*,\\\\s*")' "A, B ,C"
=> "A", "B", "C"
Please note that characters not specified by the delimiter
option are not treated as delimiters like as the second example.
The pattern
option accepts a regular experssion, and it will be useful for input like as the third example containing random whitespaces.
Improvements around normalizers and token filters mainly for better internationalization
The NormalizerNFKC100
normalizer now supports a new option unify_to_romaji
to convert both hiragana and katakana to romaji, like:
% groonga
> normalize 'NormalizerNFKC100("unify_to_romaji", true)' "リンゴ みかん"
=> "ringo mikan"
And a new built-in token filter TokenFilterNFKC100
is added.
It also can covert katakana to hiragana like NormalizerNFKC100
with the unify_kana
option, like:
% groonga
> tokenize TokenMecab "リンゴおいしい" --token_filters TokenFilterNFKC100
=> "リンゴ", "おいしい" ("リンゴ" was normalized)
> tokenize TokenMecab "リンゴおいしい" --token_filters 'TokenFilterNFKC100("unify_kana", true)'
=> "りんご", "おいしい" ("リンゴ" was normalized and converted)
The TokenFilterStem
filter now supports a new option algorithm
for stemming not only in English but also in other languages: French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Russian, and Finnish.
The test for the option describes its usage.
The TokenFilterStopWord
filter now supports a new option column
to change the name of a column for stop words from is_stop_word
to any other.
The test for the option describes its usage.
Conclusion
See Release 8.0.9 2018-11-29 about detailed changes since 8.0.8
Let's search by Groonga!