Groonga 4.0.7 has been released
Groonga 4.0.7 has been released!
How to install: Install
Changes
In this release, two big experimental features:
- Column value compression support
- Token filter support
Column value compression support
In this release, Groonga has supported column compression by zlib or LZ4.
You can save disk spaces by this feature. This feature is implemented by @naoa_y. Thanks!
Here is the pros/cons:
- zlib: compression rate is better than LZ4, but performance is not.
- LZ4: performance is good, but compression rate is not better than Zlib.
Add COMPRESS_ZLIB
or COMPRESS_LZ4
flag to flags
for column_create
.
column_create Entries content COLUMN_SCALAR|COMPRESS_ZLIB Text
column_create Entries content COLUMN_SCALAR|COMPRESS_LZ4 Text
Here is the brief benchmark provided by @naoa_y. It shows characteristics of each compression.
- https://github.com/groonga/groonga/pull/221#issuecomment-59627302
- https://github.com/groonga/groonga/pull/223#issue-46381569
Token filter support
In this release, Groonga has supported token filters.
There are two token filters.
TokenFilterStopWord
If you use TokenFilterStopWord
, token which is registered as stop word is just ignored.
Here is the sample query to setup the lexicon and stop word:
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
column_create Terms is_stop_word COLUMN_SCALAR Bool
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
You need to create is_stop_word
column and register stop word as _key
column. Note that the value of is_stop_word
must be true.
Since "and"
is registered as stop word, if you search "Hello and"
, Groonga ignores "and"
from query, then it returns search results as if you just search "Hello"
.
See documentation about TokenFilterStopWord
.
TokenFilterStem
TokenFilterStem
supports stemming feature. This feature is accomplished by using libstemmer.
Here is the sample to use TokenFilterStem
:
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStem
When creating the lexicon table, you need to specify --token_filters TokenFilterStem
.
Now, even though search query is "develop"
, the stemming library regards as same as "developing"
or "developed"
, search result contains those keywords.
See documentation about TokenFilterStem
.
For Debian users
Groonga package has been included in the official Debian repository. sid users can install Groonga from the official Debian repository.
For Ubuntu users
Groonga project provides package for Ubuntu 14.10.
Conclusion
See Release 4.0.7 2014/10/29 about detailed changes since 4.0.6.
Let's search by Groonga!