Since groonga 2.0.0 was released, many improvements, changes, and bug fixes have shipped. So it is time to increment the minor version (2.1.0) instead of the micro version (2.0.x)! :-)
Groonga 2.1.0 has been released
How to install: Install
There are three topics for this release.
- Supported the expression as snippet_html() function arguments
- Supported --normalizer option for table_create command
- Supported continuous line in command list
Supported the expression as snippet_html() function arguments
This release adds support for expressions as snippet_html() function arguments.
Note that this API is experimental, so it may change in the future.
In the previous release, the snippet_html() function accepted only the following syntax:
snippet_html(column name)
In this release, the snippet_html() function also accepts expressions such as the following:
snippet_html("STRING" + "STRING")
Here is a more concrete example of what this change means.
Schema definition:
table_create Documents TABLE_NO_KEY
column_create Documents title COLUMN_SCALAR ShortText
column_create Documents content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram
column_create Terms document_title_index COLUMN_INDEX|WITH_POSITION Documents title
column_create Terms document_content_index COLUMN_INDEX|WITH_POSITION Documents content
Sample data:
load --table Documents
[
["title", "content"],
["Groonga overview", "Groonga is a fast and accurate full text search engine based on inverted index."],
["Full text search and Instant update", "In widely used DBMSs, updates are immediately processed, for example, a newly registered record appears in the result of the next query."],
["Column store and aggregate query", "People can collect more than enough data in the Internet era."]
]
In the previous release, you couldn't specify multiple columns as the argument of the snippet_html() function, even if you wanted to search for 'Groonga' in both the title and content columns at once and extract 'Groonga' and its surrounding text from the Documents table.
The snippet_html() function accepted either the title column or the content column as its argument, but not both.
In this release, you can specify an expression that concatenates column names and literals as the argument of snippet_html().
Here is a query that searches for 'Groonga' in the title column and the content column at once.
select Documents \
--match_columns title||content --query 'Groonga' \
--output_columns 'snippet_html(title + " " + content)' \
--command_version 2
[
[0,1356406051.43579,0.000200510025024414],
[
[
[1],
[
["snippet_html","null"]
],
[
["Groonga overview Groonga is a fast and accurate full text search engine based on inverted index."]
]
]
]
]
As a result, the specified keyword is surrounded by a <span> tag, and the keyword 'Groonga' and its surrounding text are extracted from both the title column and the content column.
The literal " " inserts a space between the two values to format the snippet.
This makes it easy to get highlighted search results.
See the snippet_html documentation for details.
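To make the idea of snippeting a concatenated expression more concrete, here is a minimal Python sketch. It is not groonga's implementation: the function name snippet_html_sketch is hypothetical, the class="keyword" attribute is an assumption, and the matching is a plain case-sensitive substring match. It only shows why joining title and content with " " lets one call highlight the keyword in both values.

```python
import html
import re


def snippet_html_sketch(text, keyword):
    """Escape text for HTML and wrap each keyword occurrence in a <span> tag.

    Hypothetical helper for illustration only; groonga's real snippet_html()
    also normalizes text and cuts out the surrounding text.
    """
    if not keyword:
        return html.escape(text)
    # With a capturing group, re.split keeps the matches at odd indexes.
    parts = re.split("(" + re.escape(keyword) + ")", text)
    pieces = []
    for index, part in enumerate(parts):
        escaped = html.escape(part)
        if index % 2 == 1:
            # The class name "keyword" is an assumption made for this sketch.
            pieces.append('<span class="keyword">' + escaped + "</span>")
        else:
            pieces.append(escaped)
    return "".join(pieces)


title = "Groonga overview"
content = "Groonga is a fast and accurate full text search engine based on inverted index."

# Same shape as snippet_html(title + " " + content) in the query.
print(snippet_html_sketch(title + " " + content, "Groonga"))
```

Without the " " literal, the end of the title and the start of the content would run together in the snippet.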
Supported --normalizer option for table_create command
This release adds support for the normalizer plugin API.
groonga supports NFKC as its Unicode normalization method.
In this release, you can specify a normalizer for each table.
With this feature, mroonga may be able to support normalizers that are equivalent to MySQL COLLATION in the future.
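For readers unfamiliar with NFKC, the following Python sketch shows what this kind of Unicode normalization does to text. It uses Python's standard unicodedata module, not groonga, and the helper name nfkc is hypothetical. It illustrates NFKC itself, so it should not be read as the exact behavior of any groonga normalizer.

```python
import unicodedata


def nfkc(text):
    """Return text in Unicode Normalization Form KC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters are folded into their ordinary ASCII forms.
print(nfkc("Ｇｒｏｏｎｇａ") == "Groonga")

# Half-width katakana plus a voiced sound mark is composed into one character.
print(nfkc("ｶﾞ") == "ガ")
```

Normalizing keys this way lets differently encoded spellings of the same word match the same key.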
Here is the syntax for specifying a normalizer.
table_create Terms TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
Specify the --normalizer option with a normalizer name when you create a table.
Specifying the KEY_NORMALIZE flag is equivalent to specifying --normalizer NormalizerAuto.
There is no sample normalizer plugin, but the API is published in groonga/normalizer.h for developers.
Supported continuous line in command list
This release adds support for continuous lines in command lists.
In the previous release, continuous lines were not accepted, so you had to write each command on a single line.
Before:
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram
After:
table_create --name Terms \
--flags TABLE_PAT_KEY \
--key_type ShortText \
--default_tokenizer TokenBigram
Now you can write a command list with continuous lines, which are represented by the '\' character.
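As a rough illustration of how continuous lines relate to the single-line form, here is a small Python sketch that joins continued lines back into one command. It assumes the continuation marker is a trailing backslash. The function join_continued_lines is hypothetical and is not groonga's parser; it is handy only if you need to feed a multi-line command list to a tool that expects one command per line.

```python
def join_continued_lines(text):
    """Merge each line that ends with a backslash with the lines after it."""
    commands = []
    pending = []  # pieces of the command currently being assembled

    def flush():
        command = " ".join(piece for piece in pending if piece)
        if command:
            commands.append(command)
        pending.clear()

    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if line.endswith("\\"):
            # Continuation: keep this piece and wait for the next line.
            pending.append(line[:-1].strip())
            continue
        pending.append(line.strip())
        flush()
    # A trailing backslash on the last line leaves an unfinished command.
    flush()
    return commands


multi_line = """table_create --name Terms \\
  --flags TABLE_PAT_KEY \\
  --key_type ShortText \\
  --default_tokenizer TokenBigram
"""
single_line = (
    "table_create --name Terms --flags TABLE_PAT_KEY "
    "--key_type ShortText --default_tokenizer TokenBigram"
)

# The joined form matches the "Before" command.
print(join_continued_lines(multi_line) == [single_line])
```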
Conclusion
See Release 2.0.9 2012/11/29 for detailed changes since 2.0.9.
Let's search by groonga!