BloGroonga

2015-02-09

Groonga 5.0.0 has been released

Groonga 5.0.0 has been released!

How to install: Install

Since Groonga 4.0.0 had been released, many improvements, changes, or bug fixes was shipped.

Changes

In this release, there is following topic:

  • Experimental improvements
    • Supported plugin which enable sharding feature

Experimental improvement - Supported plugin which enable sharding feature

In this release, we added plugin which supports experimental sharding feature. Register this plugin in advance.

$ groonga testdb/db
> register sharding

Here is the simple example which shows how to use this feature. Let's consider to count specified logs which are stored into multiple tables.

Here is the schema and data.

table_create Logs_20150203 TABLE_NO_KEY
column_create Logs_20150203 timestamp COLUMN_SCALAR Time
column_create Logs_20150203 message COLUMN_SCALAR Text

table_create Logs_20150204 TABLE_NO_KEY
column_create Logs_20150204 timestamp COLUMN_SCALAR Time
column_create Logs_20150204 message COLUMN_SCALAR Text

table_create Logs_20150205 TABLE_NO_KEY
column_create Logs_20150205 timestamp COLUMN_SCALAR Time
column_create Logs_20150205 message COLUMN_SCALAR Text

load --table Logs_20150203
[
{"timestamp": "2015-02-03 23:59:58", "message": "Start"},
{"timestamp": "2015-02-03 23:59:58", "message": "Shutdown"},
{"timestamp": "2015-02-03 23:59:59", "message": "Start"},
{"timestamp": "2015-02-03 23:59:59", "message": "Shutdown"}
]

load --table Logs_20150204
[
{"timestamp": "2015-02-04 00:00:00", "message": "Start"},
{"timestamp": "2015-02-04 00:00:00", "message": "Shutdown"},
{"timestamp": "2015-02-04 00:00:01", "message": "Start"},
{"timestamp": "2015-02-04 00:00:01", "message": "Shutdown"},
{"timestamp": "2015-02-04 23:59:59", "message": "Start"},
{"timestamp": "2015-02-04 23:59:59", "message": "Shutdown"}
]

load --table Logs_20150205
[
{"timestamp": "2015-02-05 00:00:00", "message": "Start"},
{"timestamp": "2015-02-05 00:00:00", "message": "Shutdown"},
{"timestamp": "2015-02-05 00:00:01", "message": "Start"},
{"timestamp": "2015-02-05 00:00:01", "message": "Shutdown"}
]

There are three tables which are mapped each day from 2015 Feb 03 to 2015 Feb 05.

  • Logs_20150203
  • Logs_20150204
  • Logs_20150205

Then, it loads data into each table which correspond to.

Let's count logs which contains "Shutdown" in timestamp column and the value of timestamp is "2015-02-04 00:00:00" or later.

Here is the query to achieve above purpose.

logical_count Logs timestamp \
  --filter 'message == "Shutdown"' \
  --min "2015-02-04 00:00:00" \
  --min_border "include"

Pros. You don't need to consider limitation of table.

There is a limitation about the number of records. By sharding feature, you can overcome such limitations.

Cons. You need to create each table and load to it explicitly

There is no convenient query such as PARTITIONING BY in SQL. Thus, you must create table by table_create for each tables which contains _YYYYMMDD postfix in table name.

And more, you must load data into proper table carefully.

Conclusion

See Release 5.0.0 2015-02-09 about detailed changes since 4.1.1.

Let's search by Groonga!