BloGroonga

2014-03-29

Groonga 4.0.1 has been released

Groonga 4.0.1 has been released!

How to install: Install

This release includes the following topics:

  • Resolved the database size growth issue
  • Supported weight vector column
  • Supported adjuster option in select command

Resolved the database size growth issue

This release resolves the issue that the database size keeps increasing as records are updated.

Here is the history of earlier efforts to suppress database growth:

  • 3.1.0 - Added GRN_JA_SKIP_SAME_VALUE_PUT environment variable

It skips updating the database if the new value is the same as the stored one. This feature was marked as experimental.

  • 3.1.2 - Enabled GRN_JA_SKIP_SAME_VALUE_PUT=yes by default

The flag above was enabled by default because it proved reasonably effective at suppressing database growth.
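The effect of skipping same-value updates can be illustrated with a toy model. This is only a sketch: AppendOnlyStore and size_after_repeated_puts are hypothetical names, and the assumption that every update appends new bytes is a simplification for illustration, not a description of Groonga's actual storage format.

```python
class AppendOnlyStore:
    """Toy append-only value store (hypothetical, not Groonga's real storage)."""

    def __init__(self, skip_same_value=False):
        self.skip_same_value = skip_same_value
        self.latest = {}      # record id -> current value
        self.used_bytes = 0   # total bytes ever appended

    def put(self, record_id, value):
        # With the skip enabled, rewriting an identical value costs nothing.
        if self.skip_same_value and self.latest.get(record_id) == value:
            return
        self.latest[record_id] = value
        self.used_bytes += len(value.encode("utf-8"))


def size_after_repeated_puts(skip_same_value, repeats=100):
    """Write the same value `repeats` times and report the bytes consumed."""
    store = AppendOnlyStore(skip_same_value=skip_same_value)
    for _ in range(repeats):
        store.put(1, "unchanged value")
    return store.used_bytes
```

In this model, repeated writes of an unchanged value consume space only once when the skip is enabled, whereas without it the consumed space grows with every write.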

In contrast to the previous approach, Groonga 4.0.1 manages variable-length data more efficiently, which fixes the database growth issue.

Note that you need to recreate the database to benefit from this improvement.

Here is the summary:

  • Groonga 4.0.1 or later can open databases created by previous versions.
  • However, previous versions cannot open a database created by Groonga 4.0.1.
  • To benefit from this improvement, you need to recreate the database.

In fact, ongaeshi, the developer of Milkode, tested the impact of this feature. Here is the resulting graph of database size growth. (Thanks to ongaeshi!!!)

It shows that if you keep using the existing database as is (4.0.0-72-continue), the database size keeps increasing, but if you recreate the database (4.0.0-72-new), the growth is suppressed.

Supported weight vector column

As of Groonga 4.0.1, a vector column can store multiple key-value pairs. This is called a weight vector column.

For example, if you want to store a user's attributes as tags, you use COLUMN_VECTOR as follows:

column_create Users tags COLUMN_VECTOR ShortText

However, this is not enough when the attributes differ in importance. As a workaround, you needed additional columns to store the weight of each attribute. Here is an example schema definition for that workaround:

column_create Users tags COLUMN_VECTOR ShortText
column_create Users tags_A COLUMN_SCALAR Int32
column_create Users tags_B COLUMN_SCALAR Int32
column_create Users tags_C COLUMN_SCALAR Int32
...

With weight vector column support, Groonga can unify such columns into one. Use the WITH_WEIGHT flag in the column definition:

column_create Users tags COLUMN_VECTOR|WITH_WEIGHT ShortText

You can then store pairs of key and weight in the vector column:

{"Tag A":weight1, "Tag B":weight2, "Tag C":weight3, ...}
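As a rough sketch of how a client might prepare such data, the following Python snippet pairs each tag with its weight and builds a load command string. The helper names (to_weight_vector, build_load_command), the "alice" record, and the weight values are hypothetical and only for illustration. The command text follows the load syntax shown in this post, but the snippet does not send anything to Groonga.

```python
import json


def to_weight_vector(tags, weights):
    """Pair each tag with its weight, in the {"tag": weight} form shown above."""
    return {tag: weights[tag] for tag in tags}


def build_load_command(table, records):
    """Return the text of a `load` command followed by its JSON payload."""
    return "load --table {}\n{}".format(table, json.dumps(records, indent=2))


# Hypothetical record: one user whose two tags have different weights.
record = {
    "_key": "alice",
    "tags": to_weight_vector(["Tag A", "Tag B"], {"Tag A": 30, "Tag B": 20}),
}
command = build_load_command("Users", [record])
```

The separate per-tag columns of the workaround schema collapse into a single JSON object per record.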

Supported adjuster option in select command

This release adds support for the adjuster option in the select command.

In previous versions, you could assign a weight to each column by using match_columns.

Here is the difference between the match_columns and adjuster options:

  • match_columns - assigns a weight to each matched column
  • adjuster - assigns a weight to a specific key in a column

In combination with weight vector column support, you can customize search results.

For example, consider listing the people who are skilled with Groonga. Assume that each person's rating is stored in a weight vector column.

Here is the sample schema definition:

table_create User TABLE_HASH_KEY ShortText
column_create User weight COLUMN_VECTOR|WITH_WEIGHT ShortText
column_create User tags COLUMN_VECTOR ShortText

table_create Weight TABLE_HASH_KEY ShortText
column_create Weight weight_index COLUMN_INDEX|WITH_WEIGHT User weight

table_create Tag TABLE_PAT_KEY ShortText
column_create Tag tags_index COLUMN_INDEX User tags

Here is how to load the sample data:

load --table User
[
  {
    "_key":"alice",
    "weight":{"Groonga":30, "Mroonga":20},
    "tags": ["Groonga", "Mroonga"]
  },
  {
    "_key":"bob",
    "weight":{"Groonga":50},
    "tags": ["Groonga"]
  },
  {
    "_key":"carol",
    "weight":{"Groonga":40,"Mroonga":30},
    "tags": ["Groonga", "Mroonga"]
  }
]

The simple way is to use the filter option to get the people who use "Groonga":

select User --output_columns _key,_score,* --sortby -_score --filter 'tags @ "Groonga"'

However, we want to take the rating into account in this case, so we use the adjuster option:

select User --output_columns _key,_score,* --sortby -_score --filter 'tags @ "Groonga"' --adjuster 'weight @ "Groonga" * 10'

Here is the parameter for adjuster:

'weight @ "Groonga" * 10'

It means: look up the key "Groonga" in the weight column and, if it exists, multiply its weight by 10 to adjust the score.

As a result, "bob" comes first:

["bob",511,["Groonga"],{"Groonga":50}],
["carol",411,["Groonga","Mroonga"],{"Groonga":40,"Mroonga":30}],
["alice",311,["Groonga","Mroonga"],{"Groonga":30,"Mroonga":20}]

If you want to favor people who use not only "Groonga" but also "Mroonga", specify "Mroonga" in the adjuster:

select User --output_columns _key,_score,* --sortby -_score --filter 'tags @ "Groonga"' --adjuster 'weight @ "Mroonga" * 10'

As a result, "carol" comes first:

["carol",311,["Groonga","Mroonga"],{"Groonga":40,"Mroonga":30}],
["alice",211,["Groonga","Mroonga"],{"Groonga":30,"Mroonga":20}],
["bob",1,["Groonga"],{"Groonga":50}]
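The scores in both result sets are consistent with a simple formula: 1 for the filter match, plus (weight + 1) * factor when the adjuster key exists. The Python sketch below reproduces the numbers under that assumption. The formula is inferred from the sample output in this post, not taken from Groonga's implementation, and adjusted_scores is a hypothetical helper.

```python
USERS = {
    "alice": {"tags": ["Groonga", "Mroonga"], "weight": {"Groonga": 30, "Mroonga": 20}},
    "bob": {"tags": ["Groonga"], "weight": {"Groonga": 50}},
    "carol": {"tags": ["Groonga", "Mroonga"], "weight": {"Groonga": 40, "Mroonga": 30}},
}


def adjusted_scores(filter_tag, adjuster_key, factor):
    """Return (name, score) pairs sorted by descending score."""
    rows = []
    for name, user in USERS.items():
        if filter_tag not in user["tags"]:
            continue  # the record does not pass the filter
        score = 1  # assumed base score for the filter match
        if adjuster_key in user["weight"]:
            # Assumed adjustment, inferred from the sample output.
            score += (user["weight"][adjuster_key] + 1) * factor
        rows.append((name, score))
    return sorted(rows, key=lambda row: -row[1])


groonga_ranking = adjusted_scores("Groonga", "Groonga", 10)
mroonga_ranking = adjusted_scores("Groonga", "Mroonga", 10)
```

Under this model, a record that passes the filter but lacks the adjuster key keeps only the base score, which matches the score of 1 for "bob" in the second result.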

Conclusion

See Release 4.0.1 2014/03/29 for detailed changes since 4.0.0.

Let's search by Groonga!