BloGroonga

2018-11-29

Groonga 8.0.9 has been released

Groonga 8.0.9 has been released!

How to install: Install

Changes

Here are important changes in this release:

The TokenDelimit tokenizer now supports any delimiter, not only whitespace

New options delimiter and pattern are now available for TokenDelimit to specify any delimiter, like:

% groonga
> tokenize 'TokenDelimit("delimiter", ",")' "A,B"
=> "A", "B"
> tokenize 'TokenDelimit("delimiter", ",")' "A , B"
=> "A ", " B" (whitespace still there)
> tokenize 'TokenDelimit("pattern", "\\\\s*,\\\\s*")' "A, B  ,C"
=> "A", "B", "C"

Please note that characters not specified by the delimiter option are not treated as delimiters, as the second example shows. The pattern option accepts a regular expression, which is useful for input with irregular whitespace around the delimiters, as in the third example.
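The difference between the two options can be illustrated outside Groonga. The following Python sketch is not Groonga's implementation; it only mimics the three examples above with str.split and re.split. The helper names split_by_delimiter and split_by_pattern are hypothetical, and dropping empty tokens is an assumption.

```python
import re


def split_by_delimiter(text, delimiter):
    """Split on a literal delimiter only; surrounding whitespace is kept."""
    # ASSUMPTION: empty tokens are dropped.
    return [token for token in text.split(delimiter) if token]


def split_by_pattern(text, pattern):
    """Split wherever the regular expression matches."""
    # ASSUMPTION: empty tokens are dropped.
    return [token for token in re.split(pattern, text) if token]


print(split_by_delimiter("A,B", ","))            # ['A', 'B']
print(split_by_delimiter("A , B", ","))          # ['A ', ' B']
print(split_by_pattern("A, B  ,C", r"\s*,\s*"))  # ['A', 'B', 'C']
```

The regular expression swallows the whitespace around each comma, which is why the third call returns clean tokens while the second one does not.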

Improvements around normalizers and token filters mainly for better internationalization

The NormalizerNFKC100 normalizer now supports a new option unify_to_romaji to convert both hiragana and katakana to romaji, like:

% groonga
> normalize 'NormalizerNFKC100("unify_to_romaji", true)' "リンゴ みかん"
=> "ringo mikan"

A new built-in token filter, TokenFilterNFKC100, has also been added. Like NormalizerNFKC100, it can convert katakana to hiragana with the unify_kana option:

% groonga
> tokenize TokenMecab "リンゴおいしい" --token_filters TokenFilterNFKC100
=> "リンゴ", "おいしい" ("リンゴ" was normalized)
> tokenize TokenMecab "リンゴおいしい" --token_filters 'TokenFilterNFKC100("unify_kana", true)'
=> "りんご", "おいしい" ("リンゴ" was normalized and converted)
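To make the katakana-to-hiragana conversion concrete, here is a minimal Python sketch of the idea. It is not how Groonga implements unify_kana; it assumes only that the katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above their hiragana counterparts in Unicode. The function name katakana_to_hiragana is hypothetical.

```python
def katakana_to_hiragana(text):
    """Map katakana letters (U+30A1..U+30F6) to hiragana by code point offset."""
    converted = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            # Hiragana letters are 0x60 code points below katakana letters.
            converted.append(chr(code - 0x60))
        else:
            # Everything else (hiragana, kanji, ASCII, ...) is left untouched.
            converted.append(ch)
    return "".join(converted)


print(katakana_to_hiragana("リンゴ"))    # りんご
print(katakana_to_hiragana("おいしい"))  # おいしい
```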

The TokenFilterStem filter now supports a new option algorithm for stemming not only in English but also in other languages: French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Russian, and Finnish. The test for the option describes its usage.

The TokenFilterStopWord filter now supports a new option column to change the name of a column for stop words from is_stop_word to any other. The test for the option describes its usage.

Conclusion

See Release 8.0.9 2018-11-29 for detailed changes since 8.0.8

Let's search by Groonga!

2018-10-29

Groonga 8.0.8 has been released

Groonga 8.0.8 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • New options for the TokenMecab tokenizer.
  • Supported locking of a database during an io_flush.

New options for the TokenMecab tokenizer

TokenMecab now accepts the target_class option:

The target_class option limits the search targets to tokens with the specified part of speech. It can also specify subclasses, and it can add or exclude specific parts of speech with a + or - prefix.

  • + adds a part of speech to the search targets.
    • If you specify only + or ``, all tokens are search targets.
  • - excludes a part of speech from the search targets.

For example, you can search all tokens except pronouns as follows:

'TokenMecab("target_class", "-名詞/代名詞", "target_class", "+")'
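The following Python sketch shows one way such + and - rules could be evaluated. It is only an illustration, not Groonga's code. It assumes that rules are checked in the order given, that the first matching rule decides, that a rule matches when the token's class starts with the rule's class, and that a rule without a sign counts as +. The post does not spell out these details, and the function name is_search_target is hypothetical.

```python
def is_search_target(token_class, rules):
    """Return True if a token with this class/subclass is a search target.

    ASSUMPTION: rules are checked in order and the first match decides;
    a token that matches no rule is not a search target.
    """
    for rule in rules:
        if rule[:1] in ("+", "-"):
            sign, class_prefix = rule[0], rule[1:]
        else:
            # ASSUMPTION: a rule without a sign is treated like "+".
            sign, class_prefix = "+", rule
        # An empty prefix (a bare "+" or "-") matches every class.
        if token_class.startswith(class_prefix):
            return sign == "+"
    return False


rules = ["-名詞/代名詞", "+"]
print(is_search_target("名詞/代名詞", rules))  # False
print(is_search_target("名詞/一般", rules))    # True
print(is_search_target("動詞/自立", rules))    # True
```

With the rules from the example above, pronouns hit the exclusion first, and every other token falls through to the bare +.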

Supported locking of a database during an io_flush

This feature was added to fix a bug in which Groonga crashed when a table targeted by io_flush was deleted while io_flush was running. io_flush now locks the Groonga database while flushing, so you can't run the following commands during io_flush:

  • column_create
  • column_remove
  • column_rename
  • logical_table_remove
  • object_remove
  • plugin_register
  • plugin_unregister
  • table_create
  • table_remove
  • table_rename

Conclusion

See Release 8.0.8 2018-10-29 for detailed changes since 8.0.7

Let's search by Groonga!

2018-09-29

Groonga 8.0.7 has been released

Groonga 8.0.7 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • New options for the TokenMecab tokenizer.
  • New options for the TokenNgram tokenizer.
  • Groonga now can grab plugins from multiple directories.

New options for the TokenMecab tokenizer

TokenMecab now accepts these new options:

  • include_class: outputs MeCab's metadata class and subclass.
  • include_reading: outputs MeCab's metadata reading.
  • include_form: outputs MeCab's metadata inflected_type, inflected_form, and base_form.
  • use_reading: lets you search terms by their readings written in kana. This option helps you find orthographical variants by kana.

For more details, see the reference.

New options for the TokenNgram tokenizer

TokenNgram now accepts these new options:

  • unify_alphabet: TokenNgram("unify_alphabet", false) works the same as TokenBigramSplitAlpha.
  • unify_symbol: TokenNgram("unify_symbol", false) works the same as TokenBigramSplitSymbol.
  • unify_digit: TokenNgram("unify_digit", false) works the same as TokenBigramSplitDigit.

For more details, see the reference.

Groonga now can grab plugins from multiple directories

A new environment variable, GRN_PLUGINS_PATH, is available for loading plugins from multiple directories. It is a list of directory paths, separated by ; (on Windows) or : (on other platforms).

GRN_PLUGINS_PATH has priority higher than the existing GRN_PLUGINS_DIR.

Note that this does not work on Windows currently.
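As a rough illustration of how such a path list could be resolved, here is a Python sketch. It is not Groonga's lookup code. It assumes that the directories in GRN_PLUGINS_PATH are tried first, in order, and that GRN_PLUGINS_DIR is still consulted afterwards; the post only says that GRN_PLUGINS_PATH has higher priority. The function name plugin_search_dirs and the example paths are hypothetical.

```python
import os


def plugin_search_dirs(environ, pathsep=os.pathsep):
    """Return plugin directories in the order they would be searched."""
    # os.pathsep is ";" on Windows and ":" on other platforms.
    plugins_path = environ.get("GRN_PLUGINS_PATH", "")
    dirs = [path for path in plugins_path.split(pathsep) if path]
    # ASSUMPTION: GRN_PLUGINS_DIR is kept as a lower-priority fallback.
    plugins_dir = environ.get("GRN_PLUGINS_DIR")
    if plugins_dir and plugins_dir not in dirs:
        dirs.append(plugins_dir)
    return dirs


env = {
    "GRN_PLUGINS_PATH": "/opt/a/plugins:/opt/b/plugins",  # hypothetical paths
    "GRN_PLUGINS_DIR": "/usr/lib/groonga/plugins",        # hypothetical path
}
print(plugin_search_dirs(env, pathsep=":"))
# ['/opt/a/plugins', '/opt/b/plugins', '/usr/lib/groonga/plugins']
```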

Conclusion

See Release 8.0.7 2018-09-29 for detailed changes since 8.0.6

Let's search by Groonga!

2018-08-29

Groonga 8.0.6 has been released

Groonga 8.0.6 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • Optimizer is a built-in feature now.
  • Enable sequential search for enough filtered case by default.
  • load command now supports the lock_table option.

Optimizer is a built-in feature now

The optimizer plugin has become a built-in feature. It is disabled by default; to activate it, set GRN_EXPR_OPTIMIZE=yes (or use the expression_rewriters plugin as before).

Enable sequential search for enough filtered case by default

Groonga now uses sequential search once the results have been narrowed down enough. This may be faster than a regular index search when the results are already very narrow, for example fewer than 1000 records, which is 1% of all records.

You can disable this feature by setting an environment variable: GRN_TABLE_SELECT_ENOUGH_FILTERED_RATIO=0.0

load command now supports the lock_table option

The load command now accepts --lock_table yes, which locks the target table while loading data, including while updating columns and applying --each. This avoids conflicts between load and delete, but it reduces loading performance.

Conclusion

See Release 8.0.6 2018-08-29 for detailed changes since 8.0.5

Let's search by Groonga!

2018-07-29

Groonga 8.0.5 has been released

Groonga 8.0.5 has been released!

How to install: Install

Changes

Here are important changes in this release:

Added a new function time_classify_day_of_week()

You can now get the day of the week from the time information of each search result. The new function time_classify_day_of_week() accepts a time value as its only parameter and returns the day of the week for that value. It returns a UInt8 value, where 0 means Sunday and 6 means Saturday.

Before using the function, you need to register the functions/time plugin with a command like:

plugin_register functions/time
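The numbering convention (0 for Sunday through 6 for Saturday) can be reproduced in Python as follows. This is only a sketch of the mapping, not Groonga's implementation, and the function name classify_day_of_week is hypothetical.

```python
from datetime import datetime


def classify_day_of_week(value):
    """Return 0 for Sunday through 6 for Saturday."""
    # datetime.weekday() numbers Monday as 0 and Sunday as 6,
    # so shift by one to get Sunday as 0 and Saturday as 6.
    return (value.weekday() + 1) % 7


print(classify_day_of_week(datetime(2018, 7, 29)))   # 0 (Sunday)
print(classify_day_of_week(datetime(2018, 11, 29)))  # 4 (Thursday)
```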

Added a new function time_format_iso8601()

You can now get any time value formatted in ISO 8601 form. The new function time_format_iso8601() accepts a time value as its only parameter and returns a string in ISO 8601 form, such as 2018-07-29T23:59:59.999999+09:00.

As with the previous function, you need to register the functions/time plugin to use this function.
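For comparison, the same output shape can be produced with Python's standard library. This sketch is not Groonga's implementation; the fixed +09:00 offset and the function name format_iso8601 are assumptions chosen to match the example string above.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: a fixed +09:00 offset, matching the example string.
JST = timezone(timedelta(hours=9))


def format_iso8601(value):
    """Format a timezone-aware datetime with microsecond precision."""
    # timespec="microseconds" keeps the fractional part even when it is zero.
    return value.isoformat(timespec="microseconds")


print(format_iso8601(datetime(2018, 7, 29, 23, 59, 59, 999999, tzinfo=JST)))
# 2018-07-29T23:59:59.999999+09:00
```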

Dropped support of old Ubuntu and Debian versions

As you know, both Ubuntu 17.10 (Artful Aardvark) and Debian jessie are now outdated. Groonga 8.0.5 and later won't be released for those old versions.

Conclusion

See Release 8.0.5 2018-07-29 for detailed changes since 8.0.4

Let's search by Groonga!