Groonga 2.0.9 has been released
How to install: Install
There are four topics for this release.
- Supported snippet_html() function
- Supported nested index search among related tables by column index
- Supported range search by using index
- Supported calculation across the meridian, the equator, and the date line by geo_distance() function
Supported snippet_html() function
This release adds the snippet_html() function, which extracts a keyword and its surrounding text. Note that this API is experimental, so it may change in the future.
Use the snippet_html() function with the following syntax:
snippet_html(column name)
Here is a more concrete example.
Schema definition:
table_create Documents TABLE_NO_KEY
column_create Documents content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY|KEY_NORMALIZE ShortText --default_tokenizer TokenBigram
column_create Terms documents_content_index COLUMN_INDEX|WITH_POSITION Documents content
Sample data:
load --table Documents
[
["content"],
["Groonga is a fast and accurate full text search engine based on inverted index."],
["Groonga is also a column-oriented database management system (DBMS)."],
["Mroonga was called groonga storage engine."]
]
To search for 'groonga' in the Documents table and extract the keyword together with its surrounding text, use the snippet_html function in a query like the following:
select Documents --output_columns "snippet_html(content)" --command_version 2 --match_columns content --query "groonga"
[
[0,1353893385.5454,0.000486850738525391],
[
[
[3],
[["snippet_html","null"]],
[["Groonga is a fast and accurate full text search engine based on inverted index."]],
[["Groonga is also a column-oriented database management system (DBMS)."]],
[["Mroonga was called groonga storage engine."]]
]
]
]
As a result, the specified keyword is surrounded by a <span> tag, and the keyword 'groonga' and its surrounding text are extracted like highlighted search results.
Note that you need to specify '--command_version 2' in the query. The reason is that function calls in --output_columns are supported from version 2.0.9.
See the following documentation for details about snippet_html.
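To make the idea of "keyword plus surrounding text" concrete, here is a minimal Python sketch of snippet extraction. It is not Groonga's implementation: the function name `snippet_html_sketch`, the `width` parameter, and the `class="keyword"` attribute are assumptions chosen for illustration only.

```python
import html
import re


def snippet_html_sketch(text, keyword, width=20):
    """Return one HTML snippet per case-insensitive hit of `keyword`.

    Hypothetical helper for illustration; it does not reproduce
    Groonga's snippet_html() behavior in detail.
    """
    snippets = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        # Keep up to `width` characters on each side of the hit.
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Escape the surrounding text so it is safe to embed in HTML.
        before = html.escape(text[start:match.start()])
        hit = html.escape(match.group(0))
        after = html.escape(text[match.end():end])
        snippets.append(f'{before}<span class="keyword">{hit}</span>{after}')
    return snippets


doc = "Mroonga was called groonga storage engine."
print(snippet_html_sketch(doc, "groonga"))
```

The sketch matches plain substrings, whereas the query above relies on the index built with TokenBigram and KEY_NORMALIZE, so the two can differ on what counts as a hit.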
Supported nested index search among related tables by column index
This release adds nested index search among related tables by column index.
If multiple tables are related through a column index, you can search across them by specifying the column index name.
Here is a concrete example.
Suppose there are tables that store blog articles and the comments on those articles. The articles table has columns for the article content and for a comment, and the comment column refers to the comments table. The comments table has a column for the comment content and a column index on the articles table.
In previous releases of groonga, to find the articles whose comments contain a specified keyword, you had to run a full-text search on the comments table and then search for the article records that refer to the matched comments.
Now you can search those records in a single query by specifying the reference column index.
Here is a sample showing how to use this feature.
Schema definition:
table_create Comments TABLE_HASH_KEY UInt32
column_create Comments content COLUMN_SCALAR ShortText
table_create Articles TABLE_NO_KEY
column_create Articles content COLUMN_SCALAR Text
column_create Articles comment COLUMN_SCALAR Comments
table_create Lexicon TABLE_PAT_KEY|KEY_NORMALIZE ShortText --default_tokenizer TokenBigram
column_create Lexicon articles_content COLUMN_INDEX|WITH_POSITION Articles content
column_create Lexicon comments_content COLUMN_INDEX|WITH_POSITION Comments content
column_create Comments article COLUMN_INDEX Articles comment
Sample data:
load --table Comments
[
{"_key": 1, "content": "I'm using groonga too!"},
{"_key": 2, "content": "I'm using groonga and mroonga!"},
{"_key": 3, "content": "I'm using mroonga too!"}
]
load --table Articles
[
{"content": "Groonga is fast!", "comment": 1},
{"content": "Groonga is useful!"},
{"content": "Mroonga is fast!", "comment": 3}
]
You can write a query that searches for the comments containing a specified keyword and then fetches the articles that refer to them.
select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"
You need to concatenate the comment column of the Articles table and the content column of the Comments table with a period (.) as the --match_columns argument.
This query first executes a full-text search on the content of the Comments table, then fetches the records of the Articles table that refer to the matched Comments records. (Because of this, if you omit the command that creates the column index 'article' on the Comments table, you will not get the intended search results.)
[
[0,1353903149.81632,0.000459432601928711],
[
[
[1],
[["_id","UInt32"],["_score","Int32"],["comment","Comments"],["content","Text"]],
[1,1,1,"Groonga is fast!"]
]
]
]
Now you can search for articles that have specific keywords in their comments.
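The two-step lookup described in this section can be sketched in Python with the same sample data. This is only a model of the idea, not Groonga's internals: the dictionaries standing in for `Lexicon.comments_content` and `Comments.article`, the `tokenize` helper, and the function name `search_articles_by_comment` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

comments = {
    1: "I'm using groonga too!",
    2: "I'm using groonga and mroonga!",
    3: "I'm using mroonga too!",
}
articles = {
    1: {"content": "Groonga is fast!", "comment": 1},
    2: {"content": "Groonga is useful!", "comment": None},
    3: {"content": "Mroonga is fast!", "comment": 3},
}


def tokenize(text):
    # Simplified stand-in for a normalizing tokenizer: lowercase words.
    return re.findall(r"[a-z0-9]+", text.lower())


# Stand-in for Lexicon.comments_content: token -> comment keys.
lexicon = defaultdict(set)
for key, content in comments.items():
    for token in tokenize(content):
        lexicon[token].add(key)

# Stand-in for Comments.article: comment key -> article ids referring to it.
comment_to_articles = defaultdict(set)
for article_id, article in articles.items():
    if article["comment"] is not None:
        comment_to_articles[article["comment"]].add(article_id)


def search_articles_by_comment(keyword):
    # Step 1: full-text search on the comments.
    matched_comments = lexicon.get(keyword.lower(), set())
    # Step 2: follow the reverse index from comments back to articles.
    article_ids = set()
    for comment_key in matched_comments:
        article_ids |= comment_to_articles.get(comment_key, set())
    return sorted(article_ids)


print(search_articles_by_comment("groonga"))  # [1]
```

Without the second mapping there is no way to get from a matched comment back to its article, which mirrors why the 'article' column index on the Comments table is required.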
Supported range search by using index
This release adds range search using an index. As a result, range searches complete in much less time than in previous releases.
Here is a sample showing how to use this feature.
Schema definition:
table_create Shops TABLE_HASH_KEY ShortText
column_create Shops ranking COLUMN_SCALAR UInt32
table_create Rankings TABLE_PAT_KEY UInt32
column_create Rankings shops_ranking COLUMN_INDEX Shops ranking
Sample data (ranking data about 10,000,000 shops):
load --table Shops
[
{"_key": "Shop1", "ranking": 1},
{"_key": "Shop2", "ranking": 2},
{"_key": "Shop3", "ranking": 3},
{"_key": "Shop4", "ranking": 4},
{"_key": "Shop5", "ranking": 5},
{"_key": "Shop6", "ranking": 6},
{"_key": "Shop7", "ranking": 7},
{"_key": "Shop8", "ranking": 8},
{"_key": "Shop9", "ranking": 9},
{"_key": "Shop10", "ranking": 10},
{"_key": "Shop11", "ranking": 11},
...
]
The shop name is registered as the key, and the ranking as the value.
Here is a sample query that searches for the top 10 shops by ranking.
In a range search, 'Top 10' can be expressed as 'ranking <= 10' in this case.
Here are the search results with groonga 2.0.8.
select Shops --filter 'ranking <= 10'
[
[0,1355465886.15137,1.39784264564514],
[
[
[10],
[
["_id","UInt32"],["_key","ShortText"],["ranking","UInt32"]
],
[1,"Shop1",1],
[2,"Shop2",2],
[3,"Shop3",3],
[4,"Shop4",4],
[5,"Shop5",5],
[6,"Shop6",6],
[7,"Shop7",7],
[8,"Shop8",8],
[9,"Shop9",9],
[10,"Shop10",10]
]
]
]
Here are the search results with groonga 2.0.9.
select Shops --filter 'ranking <= 10'
[
[0,1355465837.0779,0.00165677070617676],
[
[
[10],
[
["_id","UInt32"],["_key","ShortText"],["ranking","UInt32"]
],
[1,"Shop1",1],
[2,"Shop2",2],
[3,"Shop3",3],
[4,"Shop4",4],
[5,"Shop5",5],
[6,"Shop6",6],
[7,"Shop7",7],
[8,"Shop8",8],
[9,"Shop9",9],
[10,"Shop10",10]
]
]
]
The search results are the same, but the execution time differs.
[0,1355465886.15137,1.39784264564514],
In groonga 2.0.8, the query takes 1.39784264564514 seconds.
[0,1355465837.0779,0.00165677070617676],
In groonga 2.0.9, the query takes 0.00165677070617676 seconds.
See Output Format for details about the output of groonga commands.
Version of groonga | groonga 2.0.8 | groonga 2.0.9 |
---|---|---|
Execution time(seconds) | 1.39784264564514 | 0.00165677070617676 |
By upgrading from 2.0.8 to 2.0.9, the execution time in this measurement drops from about 1.4 seconds to about 1.7 milliseconds.
Here is the measurement environment:
CPU | Intel® Core i7-2640M CPU @ 2.80GHz |
---|---|
Memory | 8GB |
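The Python sketch below illustrates, under stated assumptions, why an ordered index can help a range condition such as 'ranking <= 10': a linear scan checks every record, while a sorted index can locate the end of the range directly. The sorted list with `bisect` is only a stand-in for the ordered Rankings table; it is not how Groonga is implemented, and the data set is scaled down to 100,000 shops.

```python
import bisect

# Scaled-down sample data: Shop1..Shop100000 with ranking 1..100000.
shops = {f"Shop{ranking}": ranking for ranking in range(1, 100_001)}


def top_by_scan(limit):
    # Without an index: test the condition against every record.
    return [name for name, ranking in shops.items() if ranking <= limit]


# Stand-in for the index column: (ranking, shop name) pairs kept sorted.
index = sorted((ranking, name) for name, ranking in shops.items())
index_keys = [ranking for ranking, _ in index]


def top_by_index(limit):
    # With an ordered index: binary-search the upper bound, then slice.
    end = bisect.bisect_right(index_keys, limit)
    return [name for _, name in index[:end]]


print(top_by_index(10))
print(top_by_scan(10) == top_by_index(10))
```

Both functions return the same shops; only the amount of work differs, which is consistent with the identical results and different execution times shown above.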
Supported calculation across the meridian, the equator, and the date line by geo_distance() function
This release adds support for calculating distances across the meridian, the equator, and the date line with the geo_distance() function.
This enhancement applies when the approximation method is 'rectangle'.
There are several methods for approximating the distance.
Groonga supports the following three methods, which trade off speed against accuracy.
- Rectangle: treats the surface between the specified points as a plane. The distance is fast to calculate, but the error increases as the points approach the poles.
- Sphere: treats the surface between the specified points as a sphere. It is slower than rectangle, but its error is smaller.
- Ellipsoid: treats the surface between the specified points as an ellipsoid. It is slower than sphere, but its error is smaller.
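As a rough illustration of the first two methods, the Python sketch below compares a flat (equirectangular) approximation with the haversine formula on a sphere. These are generic textbook formulas, not Groonga's code: the function names, the Earth radius constant, and the approximate Paris and Madrid coordinates are assumptions for this example, and the sketch does not handle longitude differences that wrap around the date line.

```python
import math

# Mean Earth radius in meters; an assumption, Groonga may use another value.
EARTH_RADIUS_M = 6_371_000


def rectangle_distance(lat1, lon1, lat2, lon2):
    """Flat approximation: scale longitude by the cosine of the mean latitude."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    x = math.radians(lon2 - lon1) * math.cos((phi1 + phi2) / 2)
    y = phi2 - phi1
    return EARTH_RADIUS_M * math.hypot(x, y)


def sphere_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


paris = (48.8622, 2.3511)    # approximate latitude, longitude in degrees
madrid = (40.4189, -3.6919)  # approximate latitude, longitude in degrees
print(round(rectangle_distance(*paris, *madrid)))
print(round(sphere_distance(*paris, *madrid)))
```

The flat version needs one cosine and one square root, while the spherical version needs several trigonometric calls, which gives a feel for the speed and accuracy trade-off described in the list.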
Here is a sample showing how to calculate the distance across the meridian.
This sample shows the distance between Paris (France) and Madrid (Spain). The surface is approximated as a plane (rectangle).
"175904000x8464000" is the location of Paris (France) expressed in milliseconds. "145508000x-13291000" is the location of Madrid (Spain) expressed in milliseconds.
select Geo --output_columns distance --scorer 'distance = geo_distance("175904000x8464000", "145508000x-13291000", "rectangle")'
[
[
0,
1337566253.89858,
0.000355720520019531
],
[
[
[
1
],
[
[
"distance",
"Int32"
]
],
[
1051293
]
]
]
]
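The coordinate strings used in the query above follow a "latitude x longitude" pattern in milliseconds of arc, where one degree is 3,600,000 milliseconds. The small Python sketch below converts between that form and decimal degrees; the helper names `parse_geo_point_ms` and `to_geo_point_ms` are hypothetical and exist only for this illustration.

```python
# One degree = 60 minutes * 60 seconds * 1000 milliseconds of arc.
MS_PER_DEGREE = 60 * 60 * 1000


def parse_geo_point_ms(value):
    """Convert "LATxLON" in milliseconds to (latitude, longitude) in degrees."""
    lat_ms, lon_ms = value.split("x")
    return int(lat_ms) / MS_PER_DEGREE, int(lon_ms) / MS_PER_DEGREE


def to_geo_point_ms(lat, lon):
    """Convert decimal degrees back to the "LATxLON" millisecond form."""
    return f"{round(lat * MS_PER_DEGREE)}x{round(lon * MS_PER_DEGREE)}"


print(parse_geo_point_ms("175904000x8464000"))    # Paris
print(parse_geo_point_ms("145508000x-13291000"))  # Madrid
```

The negative longitude for Madrid shows that the two points lie on opposite sides of the meridian, which is the case this release addresses.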
See the following documentation for how to express longitude and latitude in milliseconds.
See the following documentation for how to use geo_distance.
Conclusion
See Release 2.0.9 2012/11/29 for detailed changes since 2.0.8.
Let's search by groonga!