BloGroonga

2015-03-29

Groonga 5.0.1 has been released

Groonga 5.0.1 has been released!

How to install: Install

Changes

This release covers the following topics:

  • Incompatible changes
    • Two incompatible changes for DB API users
  • Improvements
    • Support for customizing the score function
    • [windows] Support for a more compact default database size on Windows
  • Experimental improvements
    • Support for regular expressions
    • Support for skipping posting lists

Incompatible changes - Two incompatible changes for DB API users

In this release, the internal type of _score was changed from a 32-bit integer to a floating-point number. This is an incompatible change for DB API users, but not for query API users: users who only use the select command are not affected. The following code works with both older and newer Groonga:

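grn_obj *score; /* the _score value obtained from a record */
double score_value;

if (score->header.domain == GRN_DB_FLOAT) {
  /* Newer Groonga: _score is a floating-point number */
  score_value = GRN_FLOAT_VALUE(score);
} else {
  /* Older Groonga: _score is a 32-bit integer */
  score_value = (double)GRN_INT32_VALUE(score);
}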

Also, the custom score function feature introduced API and ABI incompatibilities in the DB API layer. If you use grn_search_optarg, check that your code initializes it with zeros, like the following:

grn_search_optarg options;
memset(&options, 0, sizeof(grn_search_optarg));

If your code already does this, it is API compatible but ABI incompatible. You only need to rebuild your code, without modification.

If your code doesn't do this, you need to add the initialization above to your code.
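
To illustrate why the zero initialization matters, here is a minimal, self-contained sketch. The example_optarg struct and its members are hypothetical stand-ins, not the real grn_search_optarg definition; the sketch only assumes that a newer header appends a member to an options struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options struct. Imagine that "scorer" was appended by a
   newer release. This is NOT the real grn_search_optarg layout. */
typedef struct {
  int mode;
  int max_size;
  void *scorer; /* appended member that older callers never set */
} example_optarg;

/* Zero every member, including members added after the caller was written. */
void example_optarg_init(example_optarg *options) {
  memset(options, 0, sizeof(*options));
}

/* The caller sets only the members it knows about. */
example_optarg example_optarg_make(int mode) {
  example_optarg options;
  example_optarg_init(&options);
  options.mode = mode;
  /* The appended member starts out as NULL instead of stack garbage
     (all-bits-zero is a null pointer on mainstream platforms). */
  assert(options.scorer == NULL);
  return options;
}
```

Because memset uses sizeof, rebuilding against the newer header zeroes the appended member as well, which is why no source change is needed in that case.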

Improvements - Support for customizing the score function

This release adds support for customizing the score function.

There are two built-in score functions.

  • scorer_tf_idf

    • The frequency of an "important" term matters for the score
  • scorer_tf_at_most

    • The number of matches matters, but the score doesn't exceed the limit

Groonga uses the value of TF (Term Frequency) as the score by default. For example, consider a search against multiple columns: a title column and a message column.

Usually, the content of the title column summarizes the message column, so it is reasonable to give the title column a larger weight. As a result, more important records are easier to find.

Note, however, that the weighted score of the title column can still be lower than the score of a message column that matches many times. This is not the intended search result.

If you use the scorer_tf_at_most score function, you can cap the score at a specific value even when there are many matches.
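
The effect of capping can be sketched in C. This is a conceptual illustration only, not Groonga's implementation: the function name example_tf_at_most and the sample numbers in the note below are hypothetical.

```c
#include <assert.h>

/* Conceptual sketch: use the term frequency as the score, but never let it
   exceed max. This only mirrors the idea behind scorer_tf_at_most. */
double example_tf_at_most(double tf, double max) {
  assert(max >= 0.0); /* a negative cap makes no sense in this sketch */
  return tf > max ? max : tf;
}
```

With a cap of 3.0, a message that matches 20 times contributes 3.0 instead of 20.0, so a single title match with a weight of, say, 10 stays ahead of it.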

See the scorer documentation for details.

Improvements - [windows] Support for a more compact default database size on Windows

In this release, a more compact default database size is enabled by default on Windows.

In previous versions, Groonga used a larger default database size (128MiB) on Windows. The size of the database increased every time you created a new column.

That was not efficient. Groonga now expands the database size gradually, the same as in other environments such as GNU/Linux.

This feature was introduced in Groonga 4.1.1, but it was not enabled by default because it was experimental at that time.

Thank you to the Groonga users who gave us feedback on this feature!

Experimental improvements - Support for regular expressions

In this release, Groonga supports regular expressions.

You can use a regular expression in the select command with the query option or the filter option. Note that the syntax differs slightly between the two.

query:

${COLUMN}:~${REGULAR_EXPRESSION}

filter:

${COLUMN} @~ ${REGULAR_EXPRESSION}

The difference between ":~" and "@~" comes from the design of Groonga.

"query" accepts query syntax, whereas "filter" accepts script syntax.

"filter" is designed for processing more complex search conditions.

You can search by regular expression using an index. Create an index that meets the following conditions:

  • The lexicon must be a TABLE_PAT_KEY table.
  • The lexicon must use the TokenRegexp tokenizer.
  • The index column must have the WITH_POSITION flag.

For example, for the message column of a Logs table, create the following index column.

table_create RegexpLexicon TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenRegexp \
  --normalizer NormalizerAuto
column_create RegexpLexicon logs_message_index \
  COLUMN_INDEX|WITH_POSITION Logs message

Then you can search the message column with a regular expression using the index.

See the regular expression documentation for details.

Experimental improvements - Support for skipping posting lists

In this release, Groonga supports skipping posting lists when searching for a popular term and a rare term at the same time.

This improves performance. Set the GRN_II_CURSOR_SET_MIN_ENABLE environment variable to 1 to enable the feature.

GRN_II_CURSOR_SET_MIN_ENABLE=1

The feature is disabled by default because it is experimental.

Conclusion

See Release 5.0.1 2015-03-29 for the detailed changes since 5.0.0.

Let's search by Groonga!