BloGroonga

2012-12-29

Many improvements, changes, and bug fixes have shipped since groonga 2.0.0 was released, so it is time to increment the minor version (2.1.0) instead of the micro version (2.0.x)! :-)

Groonga 2.1.0 has been released

Groonga 2.1.0 has been released!

How to install: Install

There are three topics for this release.

  • Supported the expression as snippet_html() function arguments
  • Supported --normalizer option for table_create command
  • Supported continuous line in command list

Supported the expression as snippet_html() function arguments

This release adds support for expressions as snippet_html() function arguments.

Note that this is an experimental API, so it may change in the future.

In the previous release, the snippet_html() function accepted only the following syntax:

  snippet_html(column name)

In this release, the snippet_html() function also accepts an expression, for example:

  snippet_html("STRING" + "STRING")

Here is a more concrete example of what this change means.

Schema definition:

  table_create Documents TABLE_NO_KEY
  column_create Documents title COLUMN_SCALAR ShortText
  column_create Documents content COLUMN_SCALAR Text

  table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram
  column_create Terms document_title_index COLUMN_INDEX|WITH_POSITION Documents title
  column_create Terms document_content_index COLUMN_INDEX|WITH_POSITION Documents content

Sample data:

  load --table Documents
  [
  ["title", "content"],
  ["Groonga overview", "Groonga is a fast and accurate full text search engine based on inverted index."],
  ["Full text search and Instant update", "In widely used DBMSs, updates are immediately processed, for example, a newly registered record appears in the result of the next query."],
  ["Column store and aggregate query", "People can collect more than enough data in the Internet era."]
  ]

In the previous release, you could not specify multiple columns as the argument of the snippet_html() function, even if you wanted to search for 'Groonga' in both the title and content columns at once and extract 'Groonga' with its surrounding text from the Documents table.

The snippet_html() function accepted either the title column or the content column as its argument, but not both.

In this release, you can pass an expression that concatenates column names and literals as the argument of snippet_html().

Here is a query that searches for 'Groonga' in the title column and the content column at once.

  select Documents \
    --match_columns title||content --query 'Groonga' \
    --output_columns 'snippet_html(title + " " + content)' \
    --command_version 2

  [
    [0,1356406051.43579,0.000200510025024414],
    [
      [
        [1],
        [
          ["snippet_html","null"]
        ],
        [
          ["<span class=\"keyword\">Groonga</span> overview <span class=\"keyword\">Groonga</span> is a fast and accurate full text search engine based on inverted index."]
        ]
      ]
    ]
  ]

As a result, the specified keyword is surrounded by a <span> tag, and the keyword 'Groonga' and its surrounding text are extracted from both the title column and the content column.

The literal " " inserts a space between the two column values so that the snippet is formatted properly.

You can get highlighted search results easily.
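
To make the idea of snippeting a concatenated expression concrete, here is a minimal Python sketch. It is not groonga's implementation: the highlight() helper is hypothetical, the <span class="keyword"> markup is an assumption about the output format, and real snippet extraction also trims the text around each match, which this sketch does not do.

```python
import html
import re


def highlight(text, keyword,
              tag_open='<span class="keyword">', tag_close="</span>"):
    """Escape text for HTML and wrap every case-insensitive keyword match.

    Hypothetical helper for illustration only; the tag markup is an assumption.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text before the match, then wrap the match itself.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(tag_open + html.escape(match.group(0)) + tag_close)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


title = "Groonga overview"
content = ("Groonga is a fast and accurate full text search engine "
           "based on inverted index.")

# Same shape as the expression title + " " + content in the query above.
snippet = highlight(title + " " + content, "Groonga")
print(snippet)
```

Without the " " literal, the two column values would run together as "overviewGroonga", which is why the query inserts it.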

See the snippet_html documentation for details.

Supported --normalizer option for table_create command

This release adds a normalizer plugin API.

groonga supports NFKC as its Unicode normalization method.
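
As a rough illustration of what NFKC normalization does to text, the following Python sketch uses the standard unicodedata module. It only demonstrates the Unicode normalization form itself; a groonga normalizer may apply further processing (for example case folding) that is not shown here, and the nfkc() helper name is made up for this example.

```python
import unicodedata


def nfkc(text):
    """Return text normalized to Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters, half-width katakana, and a circled digit are
# folded to their compatibility equivalents.
samples = ["Ｇｒｏｏｎｇａ", "ｸﾞﾙﾝｶﾞ", "①"]
for sample in samples:
    print(repr(sample), "->", repr(nfkc(sample)))
```

Normalizing both the indexed keys and the query text in the same way is what lets differently encoded variants of the same word match each other.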

In this release, you can specify a normalizer for each table.

With this feature, mroonga may be able to support normalizers that correspond to MySQL COLLATIONs in the future.

Here is the syntax for specifying a normalizer.

  table_create Terms TABLE_PAT_KEY ShortText --normalizer NormalizerAuto

Specify the --normalizer option with a normalizer name when you create a table.

Specifying the KEY_NORMALIZE flag is equivalent to specifying --normalizer NormalizerAuto.

There is no sample normalizer plugin yet, but the API is published in groonga/normalizer.h for developers.

Supported continuous line in command list

This release adds support for continuation lines in command lists.

In the previous release, continuation lines were not accepted, so you had to write each command on a single line.

Before:

  table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram

After:

  table_create --name Terms \
               --flags TABLE_PAT_KEY \
               --key_type ShortText \
               --default_tokenizer TokenBigram

Now you can split a command across multiple lines by ending each continued line with a '\' character.
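
For readers who preprocess command lists themselves, the continuation rule can be sketched in a few lines of Python. This is an illustrative sketch, not groonga's parser: it assumes a trailing backslash marks a continued line, the join_continued_lines() name is hypothetical, and it does not handle backslashes inside quoted values or JSON data.

```python
def join_continued_lines(text):
    """Merge physical lines that end in a backslash into logical commands."""
    commands = []
    pending = []  # fragments of the command currently being assembled

    def flush():
        command = " ".join(part for part in pending if part)
        if command:
            commands.append(command)
        pending.clear()

    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Drop the backslash and keep collecting fragments.
            pending.append(line[:-1].strip())
            continue
        pending.append(line.strip())
        flush()
    flush()  # handles input that ends with a dangling continuation
    return commands


command_list = "\n".join([
    "table_create --name Terms \\",
    "             --flags TABLE_PAT_KEY \\",
    "             --key_type ShortText \\",
    "             --default_tokenizer TokenBigram",
])

commands = join_continued_lines(command_list)
print(commands)
```

Joined this way, the multi-line form yields the same single-line command shown in the "Before" example.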

Conclusion

See Release 2.0.9 2012/11/29 for detailed changes since 2.0.9.

Let's search by groonga!