BloGroonga

2018-11-29

Groonga 8.0.9 has been released

Groonga 8.0.9 has been released!

How to install: Install

Changes

Here are important changes in this release:

The TokenDelimit tokenizer now supports any delimiter, not only whitespace

The new delimiter and pattern options let TokenDelimit split on any delimiter, for example:

% groonga
> tokenize 'TokenDelimit("delimiter", ",")' "A,B"
=> "A", "B"
> tokenize 'TokenDelimit("delimiter", ",")' "A , B"
=> "A ", " B" (whitespace still there)
> tokenize 'TokenDelimit("pattern", "\\\\s*,\\\\s*")' "A, B  ,C"
=> "A", "B", "C"

Please note that characters other than the one given in the delimiter option are not treated as delimiters, as the second example shows. The pattern option accepts a regular expression, which is useful for input with irregular whitespace around the delimiters, as in the third example.
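The difference between the two options can be modeled outside Groonga. The following Python sketch is only an illustration of the splitting behavior shown in the three examples, not Groonga's implementation; the function names split_by_delimiter and split_by_pattern are made up for this sketch, and dropping empty tokens is an assumption.

```python
import re


def split_by_delimiter(text, delimiter):
    """Split on an exact delimiter string; surrounding whitespace is kept."""
    return text.split(delimiter)


def split_by_pattern(text, pattern):
    """Split on a regular expression.

    Empty tokens are dropped here; that is an assumption of this sketch.
    """
    return [token for token in re.split(pattern, text) if token]


print(split_by_delimiter("A,B", ","))            # ['A', 'B']
print(split_by_delimiter("A , B", ","))          # ['A ', ' B']
print(split_by_pattern("A, B  ,C", r"\s*,\s*"))  # ['A', 'B', 'C']
```

An exact delimiter leaves the whitespace attached to each token, while the pattern consumes it as part of the separator.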

Improvements to normalizers and token filters, mainly for better internationalization

The NormalizerNFKC100 normalizer now supports a new option unify_to_romaji to convert both hiragana and katakana to romaji, like:

% groonga
> normalize 'NormalizerNFKC100("unify_to_romaji", true)' "リンゴ みかん"
=> "ringo mikan"

A new built-in token filter, TokenFilterNFKC100, has also been added. Like NormalizerNFKC100, it can convert katakana to hiragana with the unify_kana option:

% groonga
> tokenize TokenMecab "リンゴおいしい" --token_filters TokenFilterNFKC100
=> "リンゴ", "おいしい" ("リンゴ" was normalized)
> tokenize TokenMecab "リンゴおいしい" --token_filters 'TokenFilterNFKC100("unify_kana", true)'
=> "りんご", "おいしい" ("リンゴ" was normalized and converted)
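To make the two steps (normalization, then kana unification) concrete, here is a rough Python model. It is a sketch under assumptions, not Groonga's code: it uses Python's unicodedata NFKC, whose Unicode version depends on the Python build rather than being Unicode 10.0, and the half-width input "ﾘﾝｺﾞ" as well as the names filter_tokens and katakana_to_hiragana are chosen here purely for illustration.

```python
import unicodedata

# Katakana U+30A1..U+30F6 sit exactly 0x60 above the matching hiragana.
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
KANA_OFFSET = 0x60


def katakana_to_hiragana(text):
    """Map each katakana letter to hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def filter_tokens(tokens, unify_kana=False):
    """Hypothetical token filter: NFKC-normalize, optionally unify kana."""
    result = []
    for token in tokens:
        token = unicodedata.normalize("NFKC", token)
        if unify_kana:
            token = katakana_to_hiragana(token)
        result.append(token)
    return result


# Half-width katakana input is an assumption made for this sketch.
print(filter_tokens(["ﾘﾝｺﾞ", "おいしい"]))
print(filter_tokens(["ﾘﾝｺﾞ", "おいしい"], unify_kana=True))
```

Without unify_kana the half-width token only becomes full-width katakana; with it, the token is additionally converted to hiragana.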

The TokenFilterStem filter now supports a new option algorithm for stemming not only in English but also in other languages: French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Russian, and Finnish. The test for the option describes its usage.

The TokenFilterStopWord filter now supports a new option column to change the name of a column for stop words from is_stop_word to any other. The test for the option describes its usage.
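The idea behind a configurable stop word column can be pictured with a small Python model. This is a conceptual sketch only and does not reflect how Groonga stores or looks up terms; the lexicon dictionary, the remove_stop_words function, and the column name "ignore" are all hypothetical.

```python
def remove_stop_words(tokens, lexicon, column="is_stop_word"):
    """Drop tokens whose lexicon record has a true value in `column`.

    `lexicon` is a hypothetical stand-in for a lexicon table: it maps
    each term to a dict of column values.
    """
    return [
        token
        for token in tokens
        if not lexicon.get(token, {}).get(column, False)
    ]


# The stop word flag lives in a column named "ignore" (hypothetical name)
# instead of the default "is_stop_word".
lexicon = {
    "and": {"ignore": True},
    "hello": {"ignore": False},
}

print(remove_stop_words(["hello", "and", "world"], lexicon, column="ignore"))
# ['hello', 'world']
```

If the filter kept looking at the default is_stop_word column in this model, no term would be flagged and "and" would pass through.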

Conclusion

See Release 8.0.9 2018-11-29 for detailed changes since 8.0.8.

Let's search by Groonga!