BloGroonga

2014-10-29

Groonga 4.0.7 has been released

Groonga 4.0.7 has been released!

How to install: see Install.

Changes

This release includes two big experimental features:

  • Column value compression support
  • Token filter support

Column value compression support

In this release, Groonga supports column value compression with zlib or LZ4.

This feature lets you save disk space. It was implemented by @naoa_y. Thanks!

Here are the pros and cons:

  • zlib: better compression ratio than LZ4, but slower.
  • LZ4: faster than zlib, but with a lower compression ratio.

To enable compression, add the COMPRESS_ZLIB or COMPRESS_LZ4 flag to the flags of column_create:

column_create Entries content COLUMN_SCALAR|COMPRESS_ZLIB Text
column_create Entries content COLUMN_SCALAR|COMPRESS_LZ4 Text
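Because compression is just a column flag, loading and searching work the same way as with an uncompressed column: values are compressed when stored and decompressed when read. The following is a minimal sketch, assuming a Groonga build with zlib support enabled and a hypothetical Entries table with no key; the sample record is made up for illustration.

```groonga
table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR|COMPRESS_ZLIB Text
load --table Entries
[
{"content": "Groonga is a full-text search engine."}
]
select Entries --output_columns content
```

To try LZ4 instead, replacing COMPRESS_ZLIB with COMPRESS_LZ4 should be the only change needed, assuming your build also has LZ4 support.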

Here are brief benchmarks provided by @naoa_y. They show the characteristics of each compression method.

  • https://github.com/groonga/groonga/pull/221#issuecomment-59627302
  • https://github.com/groonga/groonga/pull/223#issue-46381569

Token filter support

In this release, Groonga supports token filters.

There are two token filters.

TokenFilterStopWord

With TokenFilterStopWord, tokens that are registered as stop words are ignored in search queries.

Here are sample commands to set up the lexicon and the stop word:

table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto \
  --token_filters TokenFilterStopWord
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
column_create Terms is_stop_word COLUMN_SCALAR Bool
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]

You need to create the is_stop_word column and register each stop word as a record key (_key). Note that the is_stop_word value of each stop word must be true.

Since "and" is registered as a stop word, when you search for "Hello and", Groonga ignores "and" in the query and returns the same results as a search for just "Hello".
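To try this out, you can load a few records into Memos and search for "Hello and". This sketch assumes the setup commands above have been run and that TokenFilterStopWord is available in your database (it may need to be loaded from its plugin first); the records are made up for illustration.

```groonga
load --table Memos
[
{"content": "Hello"},
{"content": "Hello and Good-bye"},
{"content": "Good-bye"}
]
select Memos --match_columns content --query "Hello and"
```

If "and" is ignored as described, the result should contain the same records as a search for "Hello", that is, the first two records.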

See the TokenFilterStopWord documentation for details.

TokenFilterStem

TokenFilterStem provides stemming. It is implemented with libstemmer.

Here is a sample that uses TokenFilterStem:

table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto \
  --token_filters TokenFilterStem

When creating the lexicon table, you need to specify --token_filters TokenFilterStem.

Now, when the search query is "develop", the stemming library treats "developing" and "developed" as the same word, so the search results also contain records with those words.
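A minimal sketch of that behavior, assuming a fresh database where only the Terms table above has been created, and a Groonga build where TokenFilterStem (and therefore libstemmer) is available. The Memos table, its ShortText content column and the sample records are hypothetical:

```groonga
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
load --table Memos
[
{"content": "I develop Groonga"},
{"content": "I'm developing Groonga"},
{"content": "I developed Groonga"}
]
select Memos --match_columns content --query "develop"
```

Since "develop", "developing" and "developed" should all be reduced to the same stem, all three records are expected to match.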

See the TokenFilterStem documentation for details.

For Debian users

The Groonga package is now included in the official Debian repository, so sid users can install Groonga from it.

For Ubuntu users

The Groonga project provides packages for Ubuntu 14.10.

Conclusion

See Release 4.0.7 2014/10/29 for detailed changes since 4.0.6.

Let's search by Groonga!