News¶
Release 11.0.0 - 2021-02-09¶
This is a major version upgrade, but it keeps backward compatibility. We can upgrade to 11.0.0 without rebuilding the database.
Improvements¶
[select] Added support for outputting values of scalar columns and vector columns via a nested index.
A nested index has a structure such as the following.
table_create Products TABLE_PAT_KEY ShortText
table_create Purchases TABLE_NO_KEY
column_create Purchases product COLUMN_SCALAR Products
column_create Purchases tag COLUMN_SCALAR ShortText
column_create Products purchases COLUMN_INDEX Purchases product
The Products.purchases column is an index of the Purchases.product column in the above example. Also, Purchases.product is a reference to the Products table.
Until now, we did not get the correct search result when searching via such a nested index.
Until now, the result was as follows. We can see that {"product": "apple", "tag": "man"} is not output.

table_create Products TABLE_PAT_KEY ShortText
table_create Purchases TABLE_NO_KEY
column_create Purchases product COLUMN_SCALAR Products
column_create Purchases tag COLUMN_SCALAR ShortText
column_create Products purchases COLUMN_INDEX Purchases product
load --table Products
[
{"_key": "apple"},
{"_key": "banana"},
{"_key": "cacao"}
]
load --table Purchases
[
{"product": "apple", "tag": "man"},
{"product": "banana", "tag": "man"},
{"product": "cacao", "tag": "woman"},
{"product": "apple", "tag": "many"}
]
select Products \
  --output_columns _key,purchases.tag
[[0,1612504193.380738,0.0002026557922363281],[[[3],[["_key","ShortText"],["purchases.tag","ShortText"]],["apple","many"],["banana","man"],["cacao","man"]]]]
From this release, the result is as follows. We can see that {"product": "apple", "tag": "man"} is output.

select Products \
  --output_columns _key,purchases.tag
[[0,0.0,0.0],[[[3],[["_key","ShortText"],["purchases.tags","Tags"]],["apple",[["man","one"],["child","many"]]],["banana",[["man","many"]]],["cacao",[["woman"]]]]]]
[Windows] Dropped support for the Windows packages that we cross-compiled with MinGW on Linux, because there are probably not many people who use them.
Until now, we had provided these packages under the following names:
groonga-x.x.x-x86.exe
groonga-x.x.x-x86.zip
groonga-x.x.x-x64.exe
groonga-x.x.x-x64.zip
From now on, we use the following packages for Windows.
groonga-latest-x86-vs2019-with-vcruntime.zip
groonga-latest-x64-vs2019-with-vcruntime.zip
If the Microsoft Visual C++ Runtime Library is already installed on the system, we suggest using the following packages.
groonga-latest-x86-vs2019.zip
groonga-latest-x64-vs2019.zip
Fixes¶
Fixed a bug that an index can be corrupted when Groonga executes many additions, deletions, and updates against it.
This bug occurs when we execute many deletions against an index. It doesn't occur when we only execute many additions.
We can repair an index corrupted by this bug by reconstructing it.
This bug isn't detected unless we access the broken index. Therefore, some of our existing indexes may already be broken.
We can use the [index_column_diff] command to confirm whether an index is broken or not.
Release 10.1.1 - 2021-01-25¶
Improvements¶
[select] Added support for outputting UInt64 values in Apache Arrow format.
[select] Added support for outputting the number of hits in Apache Arrow format as metadata, as below.

-- metadata --
GROONGA:n-hits: 10
[query] Added support for optimization of “order by estimated size”.
Normally, we can search at high speed when the condition with fewer matches is evaluated first:
"B (matches few) && A (matches many)" is faster than "A (matches many) && B (matches few)".
This is a well-known optimization. However, until now we needed to reorder the conditions ourselves.
With the "order by estimated size" optimization, Groonga reorders them automatically.
This optimization is enabled by setting GRN_ORDER_BY_ESTIMATED_SIZE_ENABLE=yes.
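The effect of this optimization can be sketched in Python. This is a hypothetical illustration of the reordering idea, not Groonga's implementation: conditions are sorted by their estimated result size so the cheapest one runs first.

```python
# Hypothetical sketch of "order by estimated size": evaluate the condition
# with the fewest estimated matches first so that later, more expensive
# conditions only run against an already narrowed record set.

def order_by_estimated_size(conditions):
    # conditions: list of (estimated_size, predicate) pairs
    return sorted(conditions, key=lambda c: c[0])

def evaluate(records, conditions):
    for _, predicate in order_by_estimated_size(conditions):
        records = [r for r in records if predicate(r)]
    return records

records = list(range(100))
conditions = [
    (90, lambda r: r % 2 == 0),  # A: matches many records
    (5, lambda r: r < 5),        # B: matches few records
]
# B is evaluated first, so A only scans 5 records instead of 100.
result = evaluate(records, conditions)
```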
[between] Improved performance by the following improvements:
Improved the accuracy of the decision whether between() uses a sequential search or not.
Improved setting the results of between() into the result set in bulk.
[select] Improved performance for a prefix search.
For example, the performance of the following prefix search using "*" is improved.

table_create Memos TABLE_PAT_KEY ShortText
table_create Contents TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
column_create Contents entries_key_index COLUMN_INDEX Memos _key
load --table Memos
[
{"_key": "(groonga) Start to try!"},
{"_key": "(mroonga) Installed"},
{"_key": "(groonga) Upgraded!"}
]
select \
  --table Memos \
  --match_columns _key \
  --query '\\(groonga\\)*'
[Tokenizers][TokenMecab] Improved performance of parallel construction of token columns. [GitHub#1158][Patched by naoa]
Fixes¶
[sub_filter] Fixed a bug that sub_filter doesn't work in slices[].filter.
For example, due to this bug, the result of sub_filter was 0 records with the following query.

table_create Users TABLE_HASH_KEY ShortText
column_create Users age COLUMN_SCALAR UInt8
table_create Memos TABLE_NO_KEY
column_create Memos user COLUMN_SCALAR Users
column_create Memos content COLUMN_SCALAR Text
load --table Users
[
{"_key": "alice", "age": 9},
{"_key": "bob", "age": 29}
]
load --table Memos
[
{"user": "alice", "content": "hello"},
{"user": "bob", "content": "world"},
{"user": "alice", "content": "bye"},
{"user": "bob", "content": "yay"}
]
select \
  --table Memos \
  --slices[adult].filter '_id > 1 && sub_filter(user, "age >= 18")'
Fixed a bug that we may fail to add data and Groonga may crash when we repeat many additions and deletions of data against a hash table.
Thanks¶
naoa
Release 10.1.0 - 2020-12-29¶
Improvements¶
[highlight_html] Added support for removing leading full-width spaces from the highlight target. [PGroonga#GitHub#155][Reported by Hung Nguyen V]
Until now, leading full-width spaces were also included in the highlight target as below.

table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "Groonga 高速!"}
]
select Entries \
  --match_columns body --query '高速' \
  --output_columns 'highlight_html(body)'
[[0,0.0,0.0],[[[1],[["highlight_html",null]],["Groonga<span class=\"keyword\"> 高速</span>!"]]]]

However, such spaces are needless in the highlight target. Therefore, from this release, highlight_html() removes leading full-width spaces.
[status] Added a new item features.
We can display which of Groonga's features are enabled, as below.
status --output_pretty yes
[
  [0, 0.0, 0.0],
  {
    "alloc_count": 361,
    "starttime": 1608087311,
    "start_time": 1608087311,
    "uptime": 35,
    "version": "10.1.0",
    "n_queries": 0,
    "cache_hit_rate": 0.0,
    "command_version": 1,
    "default_command_version": 1,
    "max_command_version": 3,
    "n_jobs": 0,
    "features": {
      "nfkc": true,
      "mecab": true,
      "message_pack": true,
      "mruby": true,
      "onigmo": true,
      "zlib": true,
      "lz4": false,
      "zstandard": false,
      "kqueue": false,
      "epoll": true,
      "poll": false,
      "rapidjson": false,
      "apache_arrow": false,
      "xxhash": false
    }
  }
]
[status] Added a new item apache_arrow.
We can display the Apache Arrow version that Groonga uses, as below.
[
  [0, 1608088628.440753, 0.0006628036499023438],
  {
    "alloc_count": 360,
    "starttime": 1608088617,
    "start_time": 1608088617,
    "uptime": 11,
    "version": "10.0.9-39-g5a4c6f3",
    "n_queries": 0,
    "cache_hit_rate": 0.0,
    "command_version": 1,
    "default_command_version": 1,
    "max_command_version": 3,
    "n_jobs": 0,
    "features": {
      "nfkc": true,
      "mecab": true,
      "message_pack": true,
      "mruby": true,
      "onigmo": true,
      "zlib": true,
      "lz4": true,
      "zstandard": false,
      "kqueue": false,
      "epoll": true,
      "poll": false,
      "rapidjson": false,
      "apache_arrow": true,
      "xxhash": false
    },
    "apache_arrow": {
      "version_major": 2,
      "version_minor": 0,
      "version_patch": 0,
      "version": "2.0.0"
    }
  }
]
This item is only displayed when Apache Arrow is enabled in Groonga.
[Window function] Added support for processing all target tables at once even if they span multiple shards. (experimental)
Until now, if the target tables spanned multiple shards, the window function processed each shard separately.
Therefore, if we used multiple group keys in a window function, every group key after the first could only hold one kind of value.
With this improvement, we can use multiple kinds of values for them, as below.
plugin_register sharding
plugin_register functions/time
table_create Logs_20170315 TABLE_NO_KEY
column_create Logs_20170315 timestamp COLUMN_SCALAR Time
column_create Logs_20170315 price COLUMN_SCALAR UInt32
column_create Logs_20170315 item COLUMN_SCALAR ShortText
table_create Logs_20170316 TABLE_NO_KEY
column_create Logs_20170316 timestamp COLUMN_SCALAR Time
column_create Logs_20170316 price COLUMN_SCALAR UInt32
column_create Logs_20170316 item COLUMN_SCALAR ShortText
table_create Logs_20170317 TABLE_NO_KEY
column_create Logs_20170317 timestamp COLUMN_SCALAR Time
column_create Logs_20170317 price COLUMN_SCALAR UInt32
column_create Logs_20170317 item COLUMN_SCALAR ShortText
load --table Logs_20170315
[
{"timestamp": "2017/03/15 10:00:00", "price": 1000, "item": "A"},
{"timestamp": "2017/03/15 11:00:00", "price": 900, "item": "A"},
{"timestamp": "2017/03/15 12:00:00", "price": 300, "item": "B"},
{"timestamp": "2017/03/15 13:00:00", "price": 200, "item": "B"}
]
load --table Logs_20170316
[
{"timestamp": "2017/03/16 10:00:00", "price": 530, "item": "A"},
{"timestamp": "2017/03/16 11:00:00", "price": 520, "item": "B"},
{"timestamp": "2017/03/16 12:00:00", "price": 110, "item": "A"},
{"timestamp": "2017/03/16 13:00:00", "price": 410, "item": "A"},
{"timestamp": "2017/03/16 14:00:00", "price": 710, "item": "B"}
]
load --table Logs_20170317
[
{"timestamp": "2017/03/17 10:00:00", "price": 800, "item": "A"},
{"timestamp": "2017/03/17 11:00:00", "price": 400, "item": "B"},
{"timestamp": "2017/03/17 12:00:00", "price": 500, "item": "B"},
{"timestamp": "2017/03/17 13:00:00", "price": 300, "item": "A"}
]
table_create Times TABLE_PAT_KEY Time
column_create Times logs_20170315 COLUMN_INDEX Logs_20170315 timestamp
column_create Times logs_20170316 COLUMN_INDEX Logs_20170316 timestamp
column_create Times logs_20170317 COLUMN_INDEX Logs_20170317 timestamp
logical_range_filter Logs \
  --shard_key timestamp \
  --filter 'price >= 300' \
  --limit -1 \
  --columns[offsetted_timestamp].stage filtered \
  --columns[offsetted_timestamp].type Time \
  --columns[offsetted_timestamp].flags COLUMN_SCALAR \
  --columns[offsetted_timestamp].value 'timestamp - 39600000000' \
  --columns[offsetted_day].stage filtered \
  --columns[offsetted_day].type Time \
  --columns[offsetted_day].flags COLUMN_SCALAR \
  --columns[offsetted_day].value 'time_classify_day(offsetted_timestamp)' \
  --columns[n_records_per_day_and_item].stage filtered \
  --columns[n_records_per_day_and_item].type UInt32 \
  --columns[n_records_per_day_and_item].flags COLUMN_SCALAR \
  --columns[n_records_per_day_and_item].value 'window_count()' \
  --columns[n_records_per_day_and_item].window.group_keys 'offsetted_day,item' \
  --output_columns "_id,time_format_iso8601(offsetted_day),item,n_records_per_day_and_item"
[[0,0.0,0.0],[[["_id","UInt32"],["time_format_iso8601",null],["item","ShortText"],["n_records_per_day_and_item","UInt32"]],[1,"2017-03-14T00:00:00.000000+09:00","A",1],[2,"2017-03-15T00:00:00.000000+09:00","A",2],[3,"2017-03-15T00:00:00.000000+09:00","B",1],[1,"2017-03-15T00:00:00.000000+09:00","A",2],[2,"2017-03-16T00:00:00.000000+09:00","B",2],[4,"2017-03-16T00:00:00.000000+09:00","A",2],[5,"2017-03-16T00:00:00.000000+09:00","B",2],[1,"2017-03-16T00:00:00.000000+09:00","A",2],[2,"2017-03-17T00:00:00.000000+09:00","B",2],[3,"2017-03-17T00:00:00.000000+09:00","B",2],[4,"2017-03-17T00:00:00.000000+09:00","A",1]]]
This feature requires Apache Arrow 3.0.0, which is not released yet.
Added support for sequential search against a reference column.
This feature is used only when an index search would match many records and the current result set is small enough, because in that case a sequential search is faster than an index search.
It is disabled by default.
It is enabled by setting the GRN_II_SELECT_TOO_MANY_INDEX_MATCH_RATIO_REFERENCE environment variable, which is the threshold for switching from an index search to a sequential search.
For example, if we set GRN_II_SELECT_TOO_MANY_INDEX_MATCH_RATIO_REFERENCE=0.7 as below, a sequential search is used when the number of records in the current result set is less than 70% of the total number of records.

$ export GRN_II_SELECT_TOO_MANY_INDEX_MATCH_RATIO_REFERENCE=0.7
table_create Tags TABLE_HASH_KEY ShortText
table_create Memos TABLE_HASH_KEY ShortText
column_create Memos tag COLUMN_SCALAR Tags
load --table Memos
[
{"_key": "Rroonga is fast!", "tag": "Rroonga"},
{"_key": "Groonga is fast!", "tag": "Groonga"},
{"_key": "Mroonga is fast!", "tag": "Mroonga"},
{"_key": "Groonga sticker!", "tag": "Groonga"},
{"_key": "Groonga is good!", "tag": "Groonga"}
]
column_create Tags memos_tag COLUMN_INDEX Memos tag
select Memos --query '_id:>=3 tag:@Groonga' --output_columns _id,_score,_key,tag
[[0,0.0,0.0],[[[2],[["_id","UInt32"],["_score","Int32"],["_key","ShortText"],["tag","Tags"]],[4,2,"Groonga sticker!","Groonga"],[5,2,"Groonga is good!","Groonga"]]]]
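The switching rule described above can be sketched as follows. This is a hypothetical illustration of the threshold semantics, not Groonga's actual code.

```python
def choose_search_strategy(n_current_results, n_total_records, ratio):
    # ratio mirrors GRN_II_SELECT_TOO_MANY_INDEX_MATCH_RATIO_REFERENCE:
    # when the current result set is already small relative to the whole
    # table, scanning it sequentially beats probing the index.
    if n_current_results < n_total_records * ratio:
        return "sequential"
    return "index"

# With ratio 0.7 and 5 total records, a 2-record result set (2 < 3.5)
# uses a sequential search.
strategy = choose_search_strategy(2, 5, 0.7)
```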
[tokenizers] Added support for using a token column with TokenDocumentVectorTFIDF and TokenDocumentVectorBM25.
If there is a token column that has the same source as the index column, these tokenizers use the token IDs of the token column.
A token column already holds tokenized data. Therefore, these tokenizers improve performance by using the token column.
For example, we can use this feature by making a token column named content_tokens as below.

table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR Text
load --table Memos
[
{"content": "a b c a"},
{"content": "c b c"},
{"content": "b b a"},
{"content": "a c c"},
{"content": "a"}
]
table_create Tokens TABLE_PAT_KEY ShortText \
  --normalizer NormalizerNFKC121 \
  --default_tokenizer TokenNgram
column_create Tokens memos_content COLUMN_INDEX|WITH_POSITION Memos content
column_create Memos content_tokens COLUMN_VECTOR Tokens content
table_create DocumentVectorBM25 TABLE_HASH_KEY Tokens \
  --default_tokenizer \
  'TokenDocumentVectorBM25("index_column", "memos_content", \
  "df_column", "df")'
column_create DocumentVectorBM25 df COLUMN_SCALAR UInt32
column_create Memos content_feature COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32 \
  DocumentVectorBM25 content
select Memos
[[0,0.0,0.0],[[[5],[["_id","UInt32"],["content","Text"],["content_feature","DocumentVectorBM25"],["content_tokens","Tokens"]],[1,"a b c a",{"a":0.5095787,"b":0.6084117,"c":0.6084117},["a","b","c","a"]],[2,"c b c",{"c":0.8342565,"b":0.5513765},["c","b","c"]],[3,"b b a",{"b":0.9430448,"a":0.3326656},["b","b","a"]],[4,"a c c",{"a":0.3326656,"c":0.9430448},["a","c","c"]],[5,"a",{"a":1.0},["a"]]]]]
TokenDocumentVectorTFIDF and TokenDocumentVectorBM25 give a weight to each token.
TokenDocumentVectorTFIDF calculates token weights using TF-IDF. Please refer to https://radimrehurek.com/gensim/models/tfidfmodel.html about TF-IDF.
TokenDocumentVectorBM25 calculates token weights using Okapi BM25. Please refer to https://en.wikipedia.org/wiki/Okapi_BM25 about Okapi BM25.
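As a rough illustration of TF-IDF weighting, here is a simplified sketch; Groonga's actual formula and smoothing may differ.

```python
import math

def tfidf_weights(doc_tokens, all_docs):
    # Weight each token by how often it appears in this document (TF),
    # damped by how many documents contain it at all (IDF).
    n_docs = len(all_docs)
    weights = {}
    for token in set(doc_tokens):
        tf = doc_tokens.count(token)
        df = sum(1 for d in all_docs if token in d)
        weights[token] = tf * math.log(n_docs / df)
    return weights

docs = [["a", "b", "c", "a"], ["c", "b", "c"], ["b", "b", "a"]]
w = tfidf_weights(docs[0], docs)
# "a" appears twice here and in 2 of 3 documents: 2 * log(3/2).
# "b" appears in every document, so its IDF (and weight) is 0.
```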
Improved performance for cases like the following:
(column @ "value") && (column @ "value")
[Ubuntu] Added support for Ubuntu 20.10 (Groovy Gorilla).
[Debian GNU/Linux] Dropped stretch support.
It reached EOL.
[CentOS] Dropped CentOS 6.
It reached EOL.
[httpd] Updated bundled nginx to 1.19.6.
Fixes¶
Fixed a bug that Groonga crashes when we use a drilldown with multiple keys together with multiple accessors as below. [GitHub#1153][Patched by naoa]
table_create Tags TABLE_PAT_KEY ShortText
table_create Memos TABLE_HASH_KEY ShortText
column_create Memos tags COLUMN_VECTOR Tags
column_create Memos year COLUMN_SCALAR Int32
load --table Memos
[
{"_key": "Groonga is fast!", "tags": ["full-text-search"], "year": 2019},
{"_key": "Mroonga is fast!", "tags": ["mysql", "full-text-search"], "year": 2019},
{"_key": "Groonga sticker!", "tags": ["full-text-search", "sticker"], "year": 2020},
{"_key": "Rroonga is fast!", "tags": ["full-text-search", "ruby"], "year": 2020},
{"_key": "Groonga is good!", "tags": ["full-text-search"], "year": 2020}
]
select Memos \
  --filter '_id > 0' \
  --drilldowns[tags].keys 'tags,year >= 2020' \
  --drilldowns[tags].output_columns _key[0],_key[1],_nsubrecs
select Memos \
  --filter '_id > 0' \
  --drilldowns[tags].keys 'tags,year >= 2020' \
  --drilldowns[tags].output_columns _key[1],_nsubrecs
Fixed a bug that a near phrase search did not match when the same phrase occurred multiple times, as below.
table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenNgram \
  --normalizer NormalizerNFKC121
column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content
load --table Entries
[
{"content": "a x a x b x x"},
{"content": "a x x b x"}
]
select Entries \
  --match_columns content \
  --query '*NP2"a b"' \
  --output_columns '_score, content'
Thanks¶
Hung Nguyen V
naoa
timgates42 [Provided the patch at GitHub#1155]
Release 10.0.9 - 2020-12-01¶
Improvements¶
Improved performance when we specify -1 for limit.
[reference_acquire] Added a new option --auto_release_count.
Groonga decreases the reference count automatically when the number of processed requests reaches the value specified in --auto_release_count.
For example, the acquired reference of Users is released automatically after the second status is processed as below.

reference_acquire --target_name Users --auto_release_count 2
status # Users is still referred.
status # Users' reference is released after this command is processed.

We can prevent leaks caused by forgetting to release an acquired reference by using this option.
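The --auto_release_count semantics can be sketched like this. This is a hypothetical model of the behavior described above, not Groonga's implementation.

```python
# Hypothetical sketch of --auto_release_count: the acquired reference is
# dropped automatically once the specified number of requests has been
# processed, so a client that forgets to release it cannot leak it forever.

class AutoReleaseReference:
    def __init__(self, target, auto_release_count):
        self.target = target
        self.remaining = auto_release_count
        self.released = False

    def on_request_processed(self):
        if self.released:
            return
        self.remaining -= 1
        if self.remaining == 0:
            self.released = True  # reference count is decremented here

ref = AutoReleaseReference("Users", auto_release_count=2)
ref.on_request_processed()  # first status: still referred
first = ref.released
ref.on_request_processed()  # second status: reference released
second = ref.released
```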
Modified the behavior when Groonga evaluates an empty vector or uvector.
Empty vector and uvector values are evaluated as false in command version 3.
This behavior is valid only for command version 3. Note that it differs from the previous behavior.
[Normalizers] Added a new normalizer NormalizerNFKC130 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 13.0.
[Token filters] Added a new token filter TokenFilterNFKC130 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 13.0.
[select] Improved performance for "_score = column - X".
[reference_acquire] Improved so that reference_acquire doesn't acquire unnecessary references to index columns when the --recursive dependent option is specified.
From this release, the targets of --recursive dependent are only the target table's key and the index columns that are set for its data columns.
[select] Added support for ordered near phrase search.
Until now, the near phrase search only looked for records in which the specified phrases are near each other.
The ordered near phrase search looks for records that satisfy both of the following conditions:
The specified phrases are near each other.
The specified phrases appear in the specified order.
This feature uses *ONP as its operator. (Note that the operator isn't *NP.)
$ is handled as itself in the query syntax; it isn't a special character there.
With script syntax, we use this feature as below.
table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer 'TokenNgram("unify_alphabet", false, \
  "unify_digit", false)' \
  --normalizer NormalizerNFKC121
column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content
load --table Entries
[
{"content": "abcXYZdef"},
{"content": "abebcdXYZdef"},
{"content": "abcdef"},
{"content": "defXYZabc"},
{"content": "XYZabc"},
{"content": "abc123456789def"},
{"content": "abc12345678def"},
{"content": "abc1de2def"}
]
select Entries --filter 'content *ONP "abc def"' --output_columns '_score, content'
[[0,0.0,0.0],[[[4],[["_score","Int32"],["content","Text"]],[1,"abcXYZdef"],[1,"abcdef"],[1,"abc12345678def"],[1,"abc1de2def"]]]]
With query syntax, we use this feature as below.
select Entries --query 'content:*ONP "abc def"' --output_columns '_score, content'
[[0,0.0,0.0],[[[4],[["_score","Int32"],["content","Text"]],[1,"abcXYZdef"],[1,"abcdef"],[1,"abc12345678def"],[1,"abc1de2def"]]]]
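The difference between *NP and *ONP can be sketched as a position check. This is a simplified model in which each phrase is reduced to one position per occurrence; real matching works on token positions.

```python
def near_phrase_match(positions_a, positions_b, max_distance, ordered):
    # positions_*: positions where each phrase occurs in a record.
    for a in positions_a:
        for b in positions_b:
            if abs(a - b) > max_distance:
                continue
            if ordered and b < a:
                continue  # *ONP additionally requires phrase A before B
            return True
    return False

# Phrases are near each other, but in the wrong order for *ONP
# (like "defXYZabc" in the example above).
unordered_hit = near_phrase_match([5], [0], max_distance=10, ordered=False)
ordered_hit = near_phrase_match([5], [0], max_distance=10, ordered=True)
```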
[httpd] Updated bundled nginx to 1.19.5.
Fixes¶
[Groonga HTTP server] Fixed a bug that the Groonga HTTP server shut down without waiting for all worker threads to finish completely.
Until now, the Groonga HTTP server started its shutdown after worker threads had finished their own processing. However, the worker threads may not have finished completely at that point, so the Groonga HTTP server could crash depending on the timing: it could free memory that worker threads were still using.
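The essence of the fix can be sketched with threads in Python. This illustrates the shutdown ordering only; it is not the actual server code.

```python
import threading

# Join every worker thread before tearing down shared state, so that no
# worker can touch memory that shutdown has already freed.
def shutdown(threads, shared):
    for t in threads:
        t.join()  # wait until ALL workers have really finished
    n_responses = len(shared["responses"])
    shared.clear()  # only now is it safe to free shared resources
    return n_responses

shared = {"responses": []}
lock = threading.Lock()

def worker():
    with lock:
        shared["responses"].append("ok")

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
n = shutdown(threads, shared)
```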
Release 10.0.8 - 2020-10-29¶
Improvements¶
[select] Added support for large drilldown keys.
The maximum key size of Groonga's tables is 4KiB. However, if we specify multiple keys in a drilldown, the total size of the drilldown keys may be larger than 4KiB.
For example, if the total size of the tag key and the n_likes key is larger than 4KiB in the following case, the drilldown used to fail.

select Entries \
  --limit -1 \
  --output_columns tag,n_likes \
  --drilldowns[tag.n_likes].keys tag,n_likes \
  --drilldowns[tag.n_likes].output_columns _value.tag,_value.n_likes,_nsubrecs

This is because a drilldown packs all of the specified keys into one key. So if each drilldown key is large, the packed key can be larger than 4KiB.
This feature requires xxHash. However, if we install Groonga from a package, we can use this feature without doing anything special, because Groonga's packages already bundle xxHash.
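The role of the hash here can be sketched as "hash the packed key when it would exceed the 4KiB limit". This is a hypothetical model; Groonga's actual packing format is internal, and the sketch uses Python's built-in hashlib instead of xxHash so it stays self-contained.

```python
import hashlib

MAX_KEY_SIZE = 4096  # 4KiB table key limit

def pack_drilldown_keys(values):
    # Pack all drilldown key values into a single table key.  If the
    # packed form is too large for a table key, fall back to a
    # fixed-size digest of it (Groonga uses xxHash; SHA-256 stands in
    # for it in this sketch).
    packed = b"\x00".join(v.encode("utf-8") for v in values)
    if len(packed) <= MAX_KEY_SIZE:
        return packed
    return hashlib.sha256(packed).digest()

small = pack_drilldown_keys(["tag", "10"])
large = pack_drilldown_keys(["x" * 3000, "y" * 3000])  # > 4KiB, hashed
```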
[select] Added support for handling dynamic columns as the same column even if they refer to different tables.
Until now, we couldn't handle such columns as the same dynamic column when they refer to different tables, because the types of the columns differ.
From this version, we can, by casting them to a built-in type as below.

table_create Store_A TABLE_HASH_KEY ShortText
table_create Store_B TABLE_HASH_KEY ShortText
table_create Customers TABLE_HASH_KEY Int32
column_create Customers customers_A COLUMN_VECTOR Store_A
column_create Customers customers_B COLUMN_VECTOR Store_B
load --table Customers
[
{"_key": 1604020694, "customers_A": ["A", "B", "C"]},
{"_key": 1602724694, "customers_B": ["Z", "V", "Y", "T"]}
]
select Customers \
  --filter '_key == 1604020694' \
  --columns[customers].stage output \
  --columns[customers].flags COLUMN_VECTOR \
  --columns[customers].type ShortText \
  --columns[customers].value 'customers_A' \
  --output_columns '_key, customers'
Until now, we needed to set Store_A or Store_B as the type of the customers column.
In the above example, the values of the customers_A column are cast to ShortText.
Thereby, we can also set the values of customers_B as the value of the customers column, because the keys of both customers_A and customers_B are of ShortText type.
[select] Improved performance when the number of records in the search result is huge.
This optimization works in the below cases:
--filter 'column <= "value"' or --filter 'column >= "value"'
--filter 'column == "value"'
--filter 'between(...)' or --filter 'between(_key, ...)'
--filter 'sub_filter(reference_column, ...)'
Comparisons against _key such as --filter '_key > "value"'
--filter 'geo_in_circle(...)'
Updated bundled LZ4 to 1.9.2 from 1.8.2.
Added support for xxHash 0.8.
[httpd] Updated bundled nginx to 1.19.4.
Fixes¶
Fixed the following bugs related to the browser-based administration tool. [GitHub#1139][Reported by sutamin]
The problem that Groonga's logo was not displayed.
The problem that the throughput chart was not displayed on the index page for Japanese.
[between] Fixed a bug that between(_key, ...) was always evaluated by sequential search.
Thanks¶
sutamin
Release 10.0.7 - 2020-09-29¶
Improvements¶
[highlight], [highlight_full] Added support for normalizer options.
[Return code] Added a new return code GRN_CONNECTION_RESET for a reset connection.
It is returned when an existing connection is forcibly closed by the remote host.
Dropped Ubuntu 19.10 (Eoan Ermine) support because it has reached EOL.
[httpd] Updated bundled nginx to 1.19.2.
[grndb] Added support for detecting duplicate keys.
grndb check can also detect duplicate keys since this release.
This check is valid for all tables except TABLE_NO_KEY tables.
If a table in which grndb check detected duplicate keys has only index columns, we can recover it with grndb recover.
[table_create], [column_create] Added a new option --path.
We can store a specific table or column at any path by using this option.
This option is useful if we want to store frequently used tables or columns on fast storage (e.g. SSD) and rarely used ones on slow storage (e.g. HDD).
We can specify either a relative path or an absolute path in this option.
If we specify a relative path, it is resolved with the location of the groonga process as the origin.
However, if we specify --path, the result of the dump command includes the --path information.
Therefore, if we specify --path, we can't restore the dump to a host with a different environment, because the directory configuration and the location of the groonga process differ between environments.
If we don't want to include the --path information in a dump, we need to specify --dump_paths no in the dump command.
[dump] Added a new option --dump_paths.
The --dump_paths option controls whether --path is dumped or not. Its default value is yes.
If we specified --path when creating tables or columns and we don't want to include the --path information in a dump, we specify no for --dump_paths when we execute the dump command.
[functions] Added a new function string_toknize().
It tokenizes the column value specified in the second argument with the tokenizer specified in the first argument.
[tokenizers] Added a new tokenizer TokenDocumentVectorTFIDF (experimental).
It automatically generates document vectors by TF-IDF.
[tokenizers] Added a new tokenizer TokenDocumentVectorBM25 (experimental).
It automatically generates document vectors by BM25.
[select] Added support for near search within the same sentence.
Fixes¶
[load] Fixed a bug that load didn't return a response when we executed it against 257 columns.
This bug may occur in 10.0.4 or later.
This bug only occurs when we load data using the [a, b, c, ...] format.
If we load data using [{...}], this bug doesn't occur.
[MessagePack] Fixed a bug that float32 value wasn’t unpacked correctly.
Fixed the following bugs related to multi column indexes:
_score may be broken with full text search.
Records that should not match might have matched.
Please refer to the following URL for details about the conditions under which these bugs occur.
Release 10.0.6 - 2020-08-29¶
Improvements¶
[logical_range_filter] Improved the search plan for large data.
Normally, logical_range_filter is faster than logical_select. However, it was slower than logical_select in the below case:
If Groonga can't easily collect the number of required records, it switches from a sequential search to an index search. (Normally, logical_range_filter uses a sequential search when the search target has many records.)
When this switch occurs, the search process is almost the same as that of logical_select. So for large data, logical_range_filter was severalfold slower than logical_select in this case, because logical_range_filter also executes a sort after the search.
Since this release, Groonga keeps using a sequential search more readily than before when searching large data.
Therefore, the performance of logical_range_filter improves, because the case in which its search process is almost the same as logical_select's decreases.
[httpd] Updated bundled nginx to 1.19.1.
Modified how Groonga is installed on Debian GNU/Linux.
We now use groonga-apt-source instead of groonga-archive-keyring, because the lintian command recommends using apt-source for a package that puts files under /etc/apt/sources.lists.d/.
The lintian command checks for many common packaging errors.
Please also refer to the following URL for details about the installation procedure.
[logical_select] Added support for highlight_html and highlight_full.
Added support for recycling the IDs of deleted records when deleting from an array without value space. [GitHub#mroonga/mroonga#327][Reported by gaeeyo]
Until now, when records were deleted from an array that doesn't have value space, the deleted IDs were never recycled. Groonga therefore used large storage space, because large IDs themselves require large storage space.
For example, large IDs are caused by many additions and deletions, as with Mroonga's mroonga_operations.
[select] Improved performance of full text search without an index.
[Function] Improved performance of calling a function whose arguments are all variable references or literals.
[Indexing] Improved performance of offline index construction by using a token column. [GitHub#1126][Patched by naoa]
Improved performance for "_score = func(...)".
The performance of calculating the _score value using only a function call, as in "_score = func(...)", is improved.
Fixes¶
Fixed a bug that garbage may be included in a response after a response send error.
It may occur if a client didn't read the whole response and closed the connection.
Thanks¶
gaeeyo
naoa
Release 10.0.5 - 2020-07-30¶
Improvements¶
[select] Added support for storing references in the table specified with --load_table.
--load_table is a feature that stores search results in a prepared table.
If the same searches are executed multiple times, we can cache the search results by storing them in this table, and shorten the time of every search after the first one.
Since this release, we can store references to other tables in the key of this table as below.
This makes the table smaller, because we store only references, not column values.
If we search against this table, we can search using the indexes of the referenced tables.
table_create Logs TABLE_HASH_KEY ShortText
column_create Logs timestamp COLUMN_SCALAR Time
table_create Times TABLE_PAT_KEY Time
column_create Times logs_timestamp COLUMN_INDEX Logs timestamp
table_create LoadedLogs TABLE_HASH_KEY Logs
load --table Logs
[
{"_key": "2015-02-03:1", "timestamp": "2015-02-03 10:49:00"},
{"_key": "2015-02-03:2", "timestamp": "2015-02-03 12:49:00"},
{"_key": "2015-02-04:1", "timestamp": "2015-02-04 00:00:00"}
]
select \
  Logs \
  --load_table LoadedLogs \
  --load_columns "_key" \
  --load_values "_key" \
  --limit 0
select \
  --table LoadedLogs \
  --filter 'timestamp >= "2015-02-03 12:49:00"'
[[0,0.0,0.0],[[[2],[["_id","UInt32"],["_key","ShortText"],["timestamp","Time"]],[2,"2015-02-03:2",1422935340.0],[3,"2015-02-04:1",1422975600.0]]]]
[select] Improved sort performance in the below cases:
When many sort keys need ID resolution.
For example, the following expression needs ID resolution:
--filter true --sort_keys column
The following expression doesn't need ID resolution, because the _score pseudo column exists in the result table, not the source table:
--filter true --sort_keys _score
When the sort target table has a key.
Therefore, TABLE_NO_KEY tables aren't supported by this improvement.
[select] Improved performance a bit in the following cases:
When a search matches many records.
When a drilldown processes many records.
[aggregator] Added support for score accessor for target. [GitHub#1120][Patched by naoa]
For example, we can pass _score to aggregator_* functions
as below.table_create Items TABLE_HASH_KEY ShortText column_create Items price COLUMN_SCALAR UInt32 column_create Items tag COLUMN_SCALAR ShortText load --table Items [ {"_key": "Book", "price": 1000, "tag": "A"}, {"_key": "Note", "price": 1000, "tag": "B"}, {"_key": "Box", "price": 500, "tag": "B"}, {"_key": "Pen", "price": 500, "tag": "A"}, {"_key": "Food", "price": 500, "tag": "C"}, {"_key": "Drink", "price": 300, "tag": "B"} ] select Items \ --filter true \ --drilldowns[tag].keys tag \ --drilldowns[tag].output_columns _key,_nsubrecs,score_mean \ --drilldowns[tag].columns[score_mean].stage group \ --drilldowns[tag].columns[score_mean].type Float \ --drilldowns[tag].columns[score_mean].flags COLUMN_SCALAR \ --drilldowns[tag].columns[score_mean].value 'aggregator_mean(_score)' [ [ 0, 0.0, 0.0 ], [ [ [ 6 ], [ [ "_id", "UInt32" ], [ "_key", "ShortText" ], [ "price", "UInt32" ], [ "tag", "ShortText" ] ], [ 1, "Book", 1000, "A" ], [ 2, "Note", 1000, "B" ], [ 3, "Box", 500, "B" ], [ 4, "Pen", 500, "A" ], [ 5, "Food", 500, "C" ], [ 6, "Drink", 300, "B" ] ], { "tag": [ [ 3 ], [ [ "_key", "ShortText" ], [ "_nsubrecs", "Int32" ], [ "score_mean", "Float" ] ], [ "A", 2, 1.0 ], [ "B", 3, 1.0 ], [ "C", 1, 1.0 ] ] } ] ]
[Indexing] Improved performance of offline index construction on VC++ version.
[select] Use null instead of NaN, Infinity, and -Infinity when Groonga outputs results in JSON format, because JSON doesn't support these values.
[select] Added support for aggregating the standard deviation.
For example, we can calculate a standard deviation for every group as below.
table_create Items TABLE_HASH_KEY ShortText column_create Items price COLUMN_SCALAR UInt32 column_create Items tag COLUMN_SCALAR ShortText load --table Items [ {"_key": "Book", "price": 1000, "tag": "A"}, {"_key": "Note", "price": 1000, "tag": "B"}, {"_key": "Box", "price": 500, "tag": "B"}, {"_key": "Pen", "price": 500, "tag": "A"}, {"_key": "Food", "price": 500, "tag": "C"}, {"_key": "Drink", "price": 300, "tag": "B"} ] select Items \ --drilldowns[tag].keys tag \ --drilldowns[tag].output_columns _key,_nsubrecs,price_sd \ --drilldowns[tag].columns[price_sd].stage group \ --drilldowns[tag].columns[price_sd].type Float \ --drilldowns[tag].columns[price_sd].flags COLUMN_SCALAR \ --drilldowns[tag].columns[price_sd].value 'aggregator_sd(price)' \ --output_pretty yes [ [ 0, 1594339851.924836, 0.002813816070556641 ], [ [ [ 6 ], [ [ "_id", "UInt32" ], [ "_key", "ShortText" ], [ "price", "UInt32" ], [ "tag", "ShortText" ] ], [ 1, "Book", 1000, "A" ], [ 2, "Note", 1000, "B" ], [ 3, "Box", 500, "B" ], [ 4, "Pen", 500, "A" ], [ 5, "Food", 500, "C" ], [ 6, "Drink", 300, "B" ] ], { "tag": [ [ 3 ], [ [ "_key", "ShortText" ], [ "_nsubrecs", "Int32" ], [ "price_sd", "Float" ] ], [ "A", 2, 250.0 ], [ "B", 3, 294.3920288775949 ], [ "C", 1, 0.0 ] ] } ] ]
We can also calculate the sample standard deviation by specifying aggregator_sd(target, {"unbiased": true}).
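As a cross-check of the numbers above, here is a small Python sketch of both aggregations (population vs. sample standard deviation); the prices for the "B" group are taken from the example data. This is an analogy for what aggregator_sd computes, not Groonga's implementation.

```python
import math

def sd(values, unbiased=False):
    # Population SD by default; the unbiased (sample) variant divides
    # by n - 1, mirroring aggregator_sd's "unbiased" option.
    n = len(values)
    mean = sum(values) / n
    divisor = n - 1 if unbiased else n
    variance = sum((v - mean) ** 2 for v in values) / divisor
    return math.sqrt(variance)

prices_b = [1000, 500, 300]         # the "tag": "B" group from the example
print(sd(prices_b))                 # matches price_sd for "B": 294.3920...
print(sd(prices_b, unbiased=True))  # sample standard deviation
```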
[Windows] Dropped Visual Studio 2013 support.
Fixes¶
[Groonga HTTP server] Fixed a bug that a request couldn't be halted by shutdown?mode=immediate when the response had already been halted by an error.
Fixed a crash bug when an error occurred during a request.
It only occurred when we used the Apache Arrow format.
Groonga crashed when we sent another request after the previous request was halted by an error.
[between] Fixed a crash bug when a temporary table is used.
For example, Groonga crashed if we specified a dynamic column as the first argument of between.
Fixed a bug that a procedure created by a plugin was freed unexpectedly.
It only occurred in the reference count mode.
It didn't occur if we didn't use plugin_register.
It didn't occur in the process that executed plugin_register.
It occurred in processes that didn't execute plugin_register.
Fixed a bug that a normalization error occurred during static index construction with token_column. [GitHub#1122][Reported by naoa]
Thanks¶
naoa
Release 10.0.4 - 2020-06-29¶
Improvements¶
[Tables] Added support for registering 400M records into a hash table.
[select] Improved scorer performance when _score doesn't need to be resolved recursively.
Groonga resolves the value of _score recursively when a search result is itself a search target.
For example, the search targets of slices are search results. Therefore, this improvement is ineffective if we use slices in a query.
[Log] Drilldown keys are now output in the query log.
[reference_acquire], [reference_release] Added new commands for reference count mode.
If we need to call load multiple times in a short period, the auto close of the reference count mode will degrade performance.
We can avoid this performance degradation by calling reference_acquire before the load calls and calling reference_release after them. Between reference_acquire and reference_release, auto close is disabled, because reference_acquire holds a reference to the target objects.
We must call reference_release after we finish the performance-sensitive operations.
If we don't call reference_release, the reference count mode doesn't work.
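The call pattern looks roughly like this (a sketch; the table name and the exact parameter spelling are assumptions based on the description above):

```
reference_acquire --target_name Logs
load --table Logs ...
load --table Logs ...
load --table Logs ...
reference_release --target_name Logs
```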
[select] Added support for computing multiple aggregations in one drilldown.
We can now calculate sums or arithmetic means over multiple different columns in a single drilldown
as below.table_create Items TABLE_HASH_KEY ShortText column_create Items price COLUMN_SCALAR UInt32 column_create Items quantity COLUMN_SCALAR UInt32 column_create Items tag COLUMN_SCALAR ShortText load --table Items [ {"_key": "Book", "price": 1000, "quantity": 100, "tag": "A"}, {"_key": "Note", "price": 1000, "quantity": 10, "tag": "B"}, {"_key": "Box", "price": 500, "quantity": 15, "tag": "B"}, {"_key": "Pen", "price": 500, "quantity": 12, "tag": "A"}, {"_key": "Food", "price": 500, "quantity": 111, "tag": "C"}, {"_key": "Drink", "price": 300, "quantity": 22, "tag": "B"} ] select Items \ --drilldowns[tag].keys tag \ --drilldowns[tag].output_columns _key,_nsubrecs,price_sum,quantity_sum \ --drilldowns[tag].columns[price_sum].stage group \ --drilldowns[tag].columns[price_sum].type UInt32 \ --drilldowns[tag].columns[price_sum].flags COLUMN_SCALAR \ --drilldowns[tag].columns[price_sum].value 'aggregator_sum(price)' \ --drilldowns[tag].columns[quantity_sum].stage group \ --drilldowns[tag].columns[quantity_sum].type UInt32 \ --drilldowns[tag].columns[quantity_sum].flags COLUMN_SCALAR \ --drilldowns[tag].columns[quantity_sum].value 'aggregator_sum(quantity)' [ [ 0, 0.0, 0.0 ], [ [ [ 6 ], [ [ "_id", "UInt32" ], [ "_key", "ShortText" ], [ "price", "UInt32" ], [ "quantity", "UInt32" ], [ "tag", "ShortText" ] ], [ 1, "Book", 1000, 100, "A" ], [ 2, "Note", 1000, 10, "B" ], [ 3, "Box", 500, 15, "B" ], [ 4, "Pen", 500, 12, "A" ], [ 5, "Food", 500, 111, "C" ], [ 6, "Drink", 300, 22, "B" ] ], { "tag": [ [ 3 ], [ [ "_key", "ShortText" ], [ "_nsubrecs", "Int32" ], [ "price_sum", "UInt32" ], [ "quantity_sum", "UInt32" ] ], [ "A", 2, 1500, 112 ], [ "B", 3, 1800, 47 ], [ "C", 1, 500, 111 ] ] } ] ]
[groonga executable file] Added support for
--pid-path
in standalone mode.Because
--pid-path
had been ignored in standalone mode in before version.
[io_flush] Added support for reference count mode.
[logical_range_filter], [logical_count] Added support for reference count mode.
[Groonga HTTP server] Stopped adding headers after the last chunk, because some HTTP clients may ignore headers that come after the last chunk.
[vector_slice] Added support for a vector that has the value of the
Float32
type. [GitHub#1112 patched by naoa]
Added support for parallel offline index construction using a token column.
We can now construct an offline index in parallel threads from data that is tokenized in advance.
We can tune parallel offline construction with the following environment variables:
GRN_TOKEN_COLUMN_PARALLEL_CHUNK_SIZE: specifies how many records are processed per thread. The default value is 1024 records.
GRN_TOKEN_COLUMN_PARALLEL_TABLE_SIZE_THRESHOLD: specifies how many source records are required to enable parallel construction. The default value is 102400 records.
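For illustration, a Python sketch of setting these variables in the environment before launching groonga (the values are illustrative, not recommendations):

```python
import os

# Hypothetical tuning: larger chunks per thread, and a higher record
# threshold before parallel construction kicks in.
os.environ["GRN_TOKEN_COLUMN_PARALLEL_CHUNK_SIZE"] = "2048"
os.environ["GRN_TOKEN_COLUMN_PARALLEL_TABLE_SIZE_THRESHOLD"] = "204800"

# A real setup would now start groonga (e.g. via subprocess) so the
# child process inherits these variables; here we just show the result.
print(os.environ["GRN_TOKEN_COLUMN_PARALLEL_CHUNK_SIZE"])  # prints 2048
```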
[select] Improved performance for load_table in the reference count mode.
Fixes¶
Fixed a bug that the Groonga database was broken when we searched across shards using dynamic columns without specifying --filter.
Fixed a bug that the Float32 type wasn't displayed in the result of the schema command.
Fixed a bug that _nsubrecs was over-counted when a reference uvector had no elements.
Thanks¶
naoa
Release 10.0.3 - 2020-05-29¶
Improvements¶
We can now construct an inverted index from data that is tokenized in advance.
This speeds up index construction.
We need to prepare a token column to use this improvement.
A token column is an automatically generated value column, like an index column.
A token column's value is generated by tokenizing the value of its source column.
We can create a token column by setting its source column as below.

table_create Terms TABLE_PAT_KEY ShortText \
  --normalizer NormalizerNFKC121 \
  --default_tokenizer TokenNgram
table_create Notes TABLE_NO_KEY
column_create Notes title COLUMN_SCALAR Text
# The last "title" is the source column.
column_create Notes title_terms COLUMN_VECTOR Terms title
[select] We can now specify a vector as the argument of a function.
For example, the flags option of query can be written as a vector as below.

select \
  --table Memos \
  --filter 'query("content", "-content:@mroonga", \
            { \
              "expander": "QueryExpanderTSV", \
              "flags": ["ALLOW_LEADING_NOT", "ALLOW_COLUMN"] \
            })'
[select] Added a new stage
result_set
for dynamic columns.
This stage generates a column in the result set table. Therefore, the column isn't generated if neither query nor filter exists, because Groonga doesn't make a result set table in that case.
We can't use _value with this stage. The result_set stage is for storing values via score_column.
[vector_slice] Added support for weight vector that has weight of
Float32
type. [GitHub#1106 patched by naoa][select] Added support for
filtered
stage andoutput
stage of dynamic columns on drilldowns. [GitHub#1101 patched by naoa][GitHub#1100 patched by naoa]
We can use the filtered and output stages of dynamic columns on drilldowns, as in drilldowns[Label].stage filtered and drilldowns[Label].stage output.
[select] Added support for
Float
type values when aggregating in a drilldown.
We can aggregate the maximum, minimum, and sum of Float type values using MAX, MIN, and SUM.
[query] [geo_in_rectangle] [geo_in_circle] Added a new option
score_column
forquery()
,geo_in_rectangle()
, andgeo_in_circle()
We can store a per-condition score value using score_column.
Normally, Groonga calculates a score by adding the scores of all conditions. However, we sometimes want the score of an individual condition.
For example, if we only want to use the distance from a central coordinate as the score, as below, we use score_column.
table_create LandMarks TABLE_NO_KEY column_create LandMarks name COLUMN_SCALAR ShortText column_create LandMarks category COLUMN_SCALAR ShortText column_create LandMarks point COLUMN_SCALAR WGS84GeoPoint table_create Points TABLE_PAT_KEY WGS84GeoPoint column_create Points land_mark_index COLUMN_INDEX LandMarks point load --table LandMarks [ {"name": "Aries" , "category": "Tower" , "point": "11x11"}, {"name": "Taurus" , "category": "Lighthouse", "point": "9x10" }, {"name": "Gemini" , "category": "Lighthouse", "point": "8x8" }, {"name": "Cancer" , "category": "Tower" , "point": "12x12"}, {"name": "Leo" , "category": "Tower" , "point": "11x13"}, {"name": "Virgo" , "category": "Temple" , "point": "22x10"}, {"name": "Libra" , "category": "Tower" , "point": "14x14"}, {"name": "Scorpio" , "category": "Temple" , "point": "21x9" }, {"name": "Sagittarius", "category": "Temple" , "point": "43x12"}, {"name": "Capricorn" , "category": "Tower" , "point": "33x12"}, {"name": "Aquarius" , "category": "mountain" , "point": "55x11"}, {"name": "Pisces" , "category": "Tower" , "point": "9x9" }, {"name": "Ophiuchus" , "category": "mountain" , "point": "21x21"} ] select LandMarks \ --sort_keys 'distance' \ --columns[distance].stage initial \ --columns[distance].type Float \ --columns[distance].flags COLUMN_SCALAR \ --columns[distance].value 0.0 \ --output_columns 'name, category, point, distance, _score' \ --limit -1 \ --filter 'geo_in_circle(point, "11x11", "11x1", {"score_column": distance}) && category == "Tower"' [ [ 0, 1590647445.406149, 0.0002503395080566406 ], [ [ [ 5 ], [ [ "name", "ShortText" ], [ "category","ShortText" ], [ "point", "WGS84GeoPoint" ], [ "distance", "Float" ], [ "_score", "Int32" ] ], [ "Aries", "Tower", "11x11", 0.0, 1 ], [ "Cancer", "Tower", "12x12", 0.0435875803232193, 1 ], [ "Leo", "Tower", "11x13", 0.06164214760065079, 1 ], [ "Pisces", "Tower", "9x9", 0.0871751606464386, 1 ], [ "Libra", "Tower", "14x14", 0.1307627409696579, 1 ] ] ] ]
Sorting by _score is meaningless in the above example, because the value of _score is always 1 due to category == "Tower". However, we can sort by the distance from the central coordinate using score_column.
[Windows] Groonga can now output a backtrace when an error occurs, even if it doesn't crash.
[Windows] Dropped support for old Windows.
Groonga for Windows requires Windows 8 (Windows Server 2012) or later from 10.0.3.
[select] Improved sort performance when referable sort keys and other sort keys are mixed.
Sort performance improves when the keys are mixed and there are two or more referable keys.
Referable sort keys are all sort keys except the following:
Compressed columns
_value against the result of a drilldown whose key is specified with multiple values.
_key against a patricia trie table whose key is not of ShortText type.
_score
The more sort keys other than strings there are, the less memory is used for sorting.
[select] Improved sort performance when all sort keys are referable.
[select] Improved scorer performance for cases like _score = column1*X + column2*Y + ....
This optimization is effective when there are many + or * operators in the _score expression.
At the moment, it is only effective for + and *.
[select] Added support for near phrase search.
We can perform a near search phrase by phrase.
The query syntax for near phrase search is *NP"Phrase1 phrase2 ...".
The script syntax for near phrase search is column *NP "phrase1 phrase2 ...".
If the search target phrase includes spaces, we can search for it by surrounding it with
"
as below.table_create Entries TABLE_NO_KEY column_create Entries content COLUMN_SCALAR Text table_create Terms TABLE_PAT_KEY ShortText \ --default_tokenizer 'TokenNgram("unify_alphabet", false, \ "unify_digit", false)' \ --normalizer NormalizerNFKC121 column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content load --table Entries [ {"content": "I started to use Groonga. It's very fast!"}, {"content": "I also started to use Groonga. It's also very fast! Really fast!"} ] select Entries --filter 'content *NP "\\"I started\\" \\"use Groonga\\""' --output_columns 'content' [ [ 0, 1590469700.715882, 0.03997230529785156 ], [ [ [ 1 ], [ [ "content", "Text" ] ], [ "I started to use Groonga. It's very fast!" ] ] ] ]
[Vector column] Added support for
float32
weight vector.We can store weight as
float32
instead ofuint32
We need to add the WEIGHT_FLOAT32 flag when executing column_create to use this feature.

column_create Records tags COLUMN_VECTOR|WITH_WEIGHT|WEIGHT_FLOAT32 Tags
However, the WEIGHT_FLOAT32 flag isn't available with the COLUMN_INDEX flag for now.
Added the following APIs.
Added
grn_obj_is_xxx
functions. For more information as below.grn_obj_is_weight_vector(grn_ctx *ctx, grn_obj *obj)
It returns as a
bool
whether the object is a weight vector.
grn_obj_is_uvector(grn_ctx *ctx, grn_obj *obj)
It returns as a bool whether the object is a uvector.
A uvector is a vector whose elements have a fixed size.
grn_obj_is_weight_uvector(grn_ctx *ctx, grn_obj *obj)
It returns as a
bool
whether the object is a weight uvector.
Added
grn_type_id_size(grn_ctx *ctx, grn_id id)
.It returns the size of Groonga data type as a
size_t
.
Added
grn_selector_data_get_xxx
functions. For more information as below.These functions return selector related data.
These functions are supposed to call in selector. If they are called except in selector, they return
NULL
.grn_selector_data_get(grn_ctx *ctx)
It returns all information related to the calling selector as a
grn_selector_data *
structure.
grn_selector_data_get_selector(grn_ctx *ctx, grn_selector_data *data)
It returns selector itself as
grn_obj *
.
grn_selector_data_get_expr(grn_ctx *ctx, grn_selector_data *data)
It returns the --filter condition or --query condition in which the selector is used, as grn_obj *.
grn_selector_data_get_table(grn_ctx *ctx, grn_selector_data *data)
It returns the target table as grn_obj *.
grn_selector_data_get_index(grn_ctx *ctx, grn_selector_data *data)
It returns the index used by the selector as
grn_obj *
.
grn_selector_data_get_args(grn_ctx *ctx, grn_selector_data *data, size_t *n_args)
It returns the arguments of the function that called the selector as
grn_obj **
.
grn_selector_data_get_result_set(grn_ctx *ctx, grn_selector_data *data)
It returns the result table as
grn_obj *
.
grn_selector_data_get_op(grn_ctx *ctx, grn_selector_data *data)
It returns how to perform the set operation on existing result set as
grn_operator
.
Added
grn_plugin_proc_xxx
functions. For more information as below.grn_plugin_proc_get_value_operator(grn_ctx *ctx, grn_obj *value, grn_operator default_operator, const char *context)
It returns the operator of a query as a
grn_operator
.For example,
&&
is returned as aGRN_OP_AND
.
grn_plugin_proc_get_value_bool(grn_ctx *ctx, grn_obj *value, bool default_value, const char *tag)
It returns a value that is specified as true or false, like the with_transposition argument of the function below, as a bool (bool is the C language data type).

fuzzy_search(column, query, {"max_distance": 1, "prefix_length": 0, "max_expansion": 0, "with_transposition": true})
Added
grn_proc_options_xxx
functions. For more information as below.query()
only uses them for now.grn_proc_options_parsev(grn_ctx *ctx, grn_obj *options, const char *tag, const char *name, va_list args)
This function parses options.
Until now, we had to implement option parsing ourselves; from this version, we can parse options by just calling this function.
grn_proc_options_parse(grn_ctx *ctx, grn_obj *options, const char *tag, const char *name, ...)
It calls grn_proc_options_parsev(). Therefore, the features of this function are the same as those of grn_proc_options_parsev(); only the interface differs.
Added
grn_text_printfv(grn_ctx *ctx, grn_obj *bulk, const char *format, va_list args)
grn_text_vprintf
is deprecated from 10.0.3. We usegrn_text_printfv
instead.
Added
grn_type_id_is_float_family(grn_ctx *ctx, grn_id id)
.It returns whether
grn_type_id
isGRN_DB_FLOAT32
orGRN_DB_FLOAT
or not as abool
.
Added
grn_dat_cursor_get_max_n_records(grn_ctx *ctx, grn_dat_cursor *c)
It returns the maximum number of records the cursor can have as a
size_t
. (This API is for the DAT table)
Added
grn_table_cursor_get_max_n_records(grn_ctx *ctx, grn_table_cursor *cursor)
It returns the maximum number of records the cursor can have as a
size_t
It can be used against all table types (
TABLE_NO_KEY
,TABLE_HASH_KEY
,TABLE_DAT_KEY
, andTABLE_PAT_KEY
).
Added
grn_result_set_add_xxx
functions. For more information as below.grn_result_set_add_record(grn_ctx *ctx, grn_hash *result_set, grn_posting *posting, grn_operator op)
It adds a record into the table of result sets.
grn_ii_posting_add_float
is deprecated from 10.0.3. We usegrn_rset_add_records()
instead.
grn_result_set_add_table(grn_ctx *ctx, grn_hash *result_set, grn_obj *table, double score, grn_operator op)
It adds a table into the result sets.
grn_result_set_add_table_cursor(grn_ctx *ctx, grn_hash *result_set, grn_table_cursor *cursor, double score, grn_operator op)
It adds records that a table cursor has into the result sets.
Added
grn_vector_copy(grn_ctx *ctx, grn_obj *src, grn_obj *dest)
.It copies a
vector
object. It returns whether the vector object was copied successfully.
Added
grn_obj_have_source(grn_ctx *ctx, grn_obj *obj)
.It returns whether the column has a source column as a
bool
.
Added
grn_obj_is_token_column(grn_ctx *ctx, grn_obj *obj)
.It returns whether the column is a token column as a
bool
.
Added
grn_hash_add_table_cursor(grn_ctx *ctx, grn_hash *hash, grn_table_cursor *cursor, double score)
.It’s for bulk result set insert. It’s faster than inserting records by
grn_ii_posting_add()
.
Fixes¶
Fixed a crash bug if the modules (tokenizers, normalizers, and token filters) are used at the same time from multiple threads.
Fixed the output precision of Float32 values.
The precision changes from 8 digits to 7 digits in 10.0.3.
Fixed a bug that Groonga used a wrong cache when executing queries that differed only in their dynamic column parameters. [GitHub#1102 patched by naoa]
Thanks¶
naoa
Release 10.0.2 - 2020-04-29¶
Improvements¶
Added support for
uvector
fortime_classify_*
functions. [GitHub#1089][Patched by naoa]
A uvector is a vector whose elements have a fixed size. For example, a vector that has Time type values as its elements is a uvector.
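Conceptually, the time_classify_* functions map each timestamp in a vector to the start of its unit. A Python sketch of an hour-level classification (an analogy for time_classify_hour, not Groonga's implementation):

```python
from datetime import datetime

def classify_hour(t: datetime) -> datetime:
    # Truncate a timestamp to the start of its hour, analogous to
    # what time_classify_hour() does for each element of a (u)vector.
    return t.replace(minute=0, second=0, microsecond=0)

times = [datetime(2020, 4, 29, 10, 49, 30), datetime(2020, 4, 29, 10, 5, 0)]
print([classify_hour(t) for t in times])  # both map to 10:00:00
```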
Improved sort performance when sort keys whose values can't be referred to with zero-copy are mixed in.
Some sort key values (e.g. _score) can't be referred to with zero-copy.
Previously, if at least one non-referable sort key was included, all sort keys were copied.
With this change, we only copy the sort keys that can't be referred to. Referable sort keys are referred to without a copy.
However, this change may cause a performance regression when all sort keys are referable.
Added support for loading weight vector as a JSON string.
We can load a weight vector as a JSON string as in the example below.

table_create Tags TABLE_PAT_KEY ShortText
table_create Data TABLE_NO_KEY
column_create Data tags COLUMN_VECTOR|WITH_WEIGHT Tags
column_create Tags data_tags COLUMN_INDEX|WITH_WEIGHT Data tags
load --table Data
[
{"tags": "{\"fruit\": 10, \"apple\": 100}"},
{"tags": "{\"fruit\": 200}"}
]
Added support for
Float32
type.Groonga already has a
Float
type. However, it is a double precision floating point number, which is inefficient when we only need single precision.
We can now select the more suitable type thanks to the new Float32 type.
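A Python sketch of why the distinction matters: round-tripping a value through IEEE 754 single precision loses bits that double precision keeps (the struct module stands in for storage in a Float32 column; this is an illustration, not Groonga code):

```python
import struct

def to_float32(x: float) -> float:
    # Pack as IEEE 754 single precision and unpack again,
    # simulating storage of a double in a single-precision column.
    return struct.unpack("f", struct.pack("f", x))[0]

value = 0.1
print(to_float32(value) == value)            # False: single precision loses bits
print(abs(to_float32(value) - value) < 1e-7) # True: but the error is tiny
```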
Added following APIs
grn_obj_unref(grn_ctx *ctx, grn_obj *obj)
This API is only used on the reference count mode (The reference count mode is a state of
GRN_ENABLE_REFERENCE_COUNT=yes
.).It calls
grn_obj_unlink()
only on the reference count mode. It doesn’t do anything except when the reference count mode.We useful it when we need only call
grn_obj_unlink()
on the referece count mode.Because as the following example, we don’t write condition that whether the reference count mode or not.
An example without grn_obj_unref():

if (grn_enable_reference_count) {
  grn_obj_unlink(ctx, obj);
}

An example with grn_obj_unref():

grn_obj_unref(ctx, obj);
grn_get_version_major(void)
grn_get_version_minor(void)
grn_get_version_micro(void)
They return Groonga's major, minor, and micro version numbers as a
uint32_t
.
grn_posting_get_record_id(grn_ctx *ctx, grn_posting *posting)
grn_posting_get_section_id(grn_ctx *ctx, grn_posting *posting)
grn_posting_get_position(grn_ctx *ctx, grn_posting *posting)
grn_posting_get_tf(grn_ctx *ctx, grn_posting *posting)
grn_posting_get_weight(grn_ctx *ctx, grn_posting *posting)
grn_posting_get_weight_float(grn_ctx *ctx, grn_posting *posting)
grn_posting_get_rest(grn_ctx *ctx, grn_posting *posting)
They return information on the posting list.
These APIs return value as a
uint32_t
exceptgrn_posting_get_weight_float
.grn_posting_get_weight_float
returns value as afloat
.grn_posting_get_section_id(grn_ctx *ctx, grn_posting *posting)
Section id is the internal representation of the column name.
If column names were stored in the posting list as strings, they would take a large amount of space.
Therefore, Groonga keeps storage usage small by storing the column name in the posting list as a number called the section id.
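A toy Python sketch of the idea (names and numbers are illustrative, not Groonga's actual on-disk layout):

```python
# Instead of repeating a column-name string in every posting,
# store a small integer section id that indexes a shared name table.
columns = ["title", "body"]               # shared table, stored once
section_id = columns.index("body") + 1    # section ids start at 1 in this sketch

# A posting then carries just the integer:
posting = {"record_id": 42, "section_id": section_id, "position": 7}
print(columns[posting["section_id"] - 1])  # prints "body"
```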
grn_posting_get_tf(grn_ctx *ctx, grn_posting *posting)
tf
ofgrn_posting_get_tf
is Term Frequency score.
grn_posting_get_weight_float(grn_ctx *ctx, grn_posting *posting)
It returns weight of token as a
float
We suggest using this API to get the weight of a token from now on, because we will change the internal representation of the weight from uint32_t to float in the near future.
Fixes¶
Fixed a bug that Groonga for 32bit on GNU/Linux may crash.
Fixed a bug that unrelated column values could be cleared. [GitHub#1087][Reported by sutamin]
Fixed a memory leak when we dumped records with
dump
command.Fixed a memory leak when we specified invalid value into
output_columns
.Fixed a memory leak when we executed
snippet
function.Fixed a memory leak when we filled the below conditions.
If we used dynamic columns on the
initial
stage.If we used
slices
argument withselect
command.
Fixed a memory leak when we deleted tables with
logical_table_remove
.Fixed a memory leak when we use the reference count mode.
The reference count mode is a
GRN_ENABLE_REFERENCE_COUNT=yes
state.
This mode is experimental and may degrade performance.
Fixed a bug that Groonga unlinked the _key accessor too many times when we loaded data in Apache Arrow format.
Thanks¶
sutamin
naoa
Release 10.0.1 - 2020-03-30¶
We have released Groonga 10.0.1, because the Ubuntu and Windows (VC++ version) packages of Groonga 10.0.0 were broken.
If we already use Groonga 10.0.0 for CentOS, Debian, or Windows (MinGW version), there is no problem with continuing to use it.
Fixes¶
Added a missing runtime (vcruntime140_1.dll) to the package for the Windows VC++ version.
Release 10.0.0 - 2020-03-29¶
Improvements¶
[httpd] Updated bundled nginx to 1.17.9.
[httpd] Added support for specifying output type as an extension.
For example, we can write
load.json
instead ofload?output_type=json
.
[Log] Output the path of opened or closed files into the dump-level log on Linux.
[Log] Output the path of closed files into the debug-level log on Windows.
Added the following API and macros.
grn_timeval_from_double(grn_ctx, double)
This API converts
double
type togrn_timeval
type.It returns value of
grn_timeval
type.
GRN_TIMEVAL_TO_NSEC(timeval)
This macro converts value of
grn_timeval
type to nanosecond as the value ofuint64_t
type.
GRN_TIME_USEC_TO_SEC(usec)
This macro converts microsecond to second.
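A Python sketch of the conversion this macro performs (the use of integer division here is an assumption about the macro's exact behavior):

```python
def usec_to_sec(usec: int) -> int:
    # Convert microseconds to whole seconds,
    # analogous to GRN_TIME_USEC_TO_SEC.
    return usec // 1_000_000

print(usec_to_sec(1_500_000))  # prints 1
```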
Deprecated the following macro.
GRN_OBJ_FORMAT_FIN(grn_ctx, grn_obj_format)
We use grn_obj_format_fin(grn_ctx, grn_obj_format) instead since 10.0.0.
[logical_range_filter],[dump] Added support for stream output.
This feature requires
command_version 3
or later. The header content is output after the body content. Currently, this feature supports only
dump
andlogical_range_filter
.logical_range_filter
always returns the output as a stream oncommand_version 3
or later.This feature has the following limitations.
-1 is only allowed for negative
limit
MessagePack output isn’t supported
We changed the JSON response contents a little with this modification.
The key order differs from previous versions as below.
The key order in previous versions:
{ "header": {...}, "body": {...} }
The key order in this version (10.0.0):
{ "body": {...}, "header": {...} }
Disabled caches of
dump
andlogical_range_filter
when they are executed with command_version 3. Because dump and logical_range_filter return a stream with command_version 3 since 10.0.0, Groonga can't cache the whole response.
[logical_range_filter] Added support for outputting response as Apache Arrow format.
Supported data type as below.
UInt8
Int8
UInt16
Int16
UInt32
Int32
UInt64
Int64
Time
ShortText
Text
LongText
Vector
ofInt32
Reference vector
Supported Ubuntu 20.04 (Focal Fossa).
Dropped Ubuntu 19.04 (Disco Dingo).
Because that version has reached EOL.
Release 9.1.2 - 2020-01-29¶
Improvements¶
[tools] Added a script for copying only the files of specified tables or columns.
The script's name is copy-related-files.rb.
This script is useful if we want to extract specific tables or columns from a huge database.
The files related to specific tables or columns may be needed to reproduce a fault.
If it is difficult to provide the whole database, we can extract the related files of the target tables or columns with this tool.
[shutdown] Accept
/d/shutdown?mode=immediate
immediately even when all threads are in use. This feature can only be used on the Groonga HTTP server.
Unused objects are freed immediately by using GRN_ENABLE_REFERENCE_COUNT=yes.
This feature is experimental, and performance may degrade when it is used.
If our loads span many tables, we can expect this feature to keep memory usage down.
[CentOS] We now provide groonga-release per version.
Note that the installation procedure has changed slightly.
[Debian GNU/Linux] We use
groonga-archive-keyring
for adding the Groonga apt repository.We can easy to add the Groonga apt repository to our system by this improvement.
groonga-archive-keyring
includes all information for using the Groonga apt repository. Thus, we need not be conscious of changing of repository information or PGP key by installing this package.groonga-archive-keyring
is deb package. Thus, we can easy to update byapt update
.
Release 9.1.1 - 2020-01-07¶
Improvements¶
[load] Added support for Apache Arrow format data.
If we use Apache Arrow format data, we may reduce the parse cost. Therefore, data might load faster than with other formats.
With this improvement, Groonga can also directly receive Apache Arrow format data from other data analysis systems.
However, the Apache Arrow format can be used in the HTTP interface only. We can't use it in the command line interface.
[load] Added how to load Apache Arrow format data in the document.
[load] Improved error messages.
The response of the load command now also includes an error message.
If data loading fails, Groonga outputs the error details of the load command with this improvement.
[httpd] Updated bundled nginx to 1.17.7.
[Groonga HTTP server] Added support for sending command parameters by body of HTTP request.
We must set Content-Type to application/x-www-form-urlencoded in this case.
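For illustration, a Python sketch that builds (but does not send) such a request; the host, port, and parameters are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Command parameters go in the POST body, URL-encoded, with the
# Content-Type header the Groonga HTTP server expects.
params = urlencode({"table": "Entries", "limit": "5"})
req = Request(
    "http://localhost:10041/d/select",
    data=params.encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
print(req.get_method(), req.headers["Content-type"])
```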
[Groonga HTTP server] Added how to use HTTP POST in the document.
Release 9.1.0 - 2019-11-29¶
Improvements¶
Improved the performance of the “&&” operation.
For example, the performance of condition expressions such as the following is improved.
( A || B ) && ( C || D ) && ( E || F) …
[TokenMecab] Added a new option
use_base_form
We can search using the base form of a token by this option.
For example, if we search “支えた” using this option, “支える” is also hit.
Fixes¶
Fixed a bug that performance decreased when the accessor is an index.
For example, it occurred with queries that include the following conditions:
accessor @ query
accessor == query
Fixed a bug that the estimated size of a search result overflowed when the buffer was big enough. [PGroonga#GitHub#115][Reported by Albert Song]
Improved the portability of test(1). [GitHub#1065][Patched by OBATA Akio]
Added missing tools.
Because
index-column-diff-all.sh
andobject-inspect-all.sh
had not been bundled in the previous version.
Thanks¶
Albert Song
OBATA Akio
Release 9.0.9 - 2019-10-30¶
Note
Performance may decrease from this version. Therefore, if performance is lower than before, please report it to us with reproducible steps.
Improvements¶
[Log] The sending time of the response is now output into the query log.
[status] Added the number of current jobs to the
status
command response.[groonga-httpd] Added support for
$request_time
in log.In the previous version, even if we specified the
$request_time
in thelog_format
directive, it was always 0.00.If we specify the
$request_time
, groonga-httpd output the correct time form this version.
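A hedged sketch of the corresponding configuration (the directive syntax is standard nginx, which groonga-httpd is based on; the format name and log path are hypothetical):

```
log_format timing '$remote_addr "$request" $status $request_time';
access_log /var/log/groonga/httpd/access.log timing;
```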
[groonga-httpd] Added documentation on how to set
$request_time
.
Supported Ubuntu 19.10 (Eoan Ermine)
Supported CentOS 8 (experimental)
The package for CentOS 8 can’t use some features (e.g. we can’t use
TokenMecab
and can’t cast to an int32 vector from a JSON string) because some development packages are missing.
[tools] Added a script for executing the
index_column_diff
command simply.
This script's name is index-column-diff-all.sh.
This script extracts index columns from Groonga’s database and executes the
index_column_diff
command on them.
[tools] Added a script for executing
object_inspect
against all objects.
This script's name is object-inspect-all.sh.
Fixes¶
Fixed a bug that Groonga crashed when we specified a value as the first argument of between. [GitHub#1045][Reported by yagisumi]
Thanks¶
yagisumi
Release 9.0.8 - 2019-09-27¶
Improvements¶
[log_reopen] Added a supplementary explanation for when we use
groonga-httpd
with 2 or more workers.
Groonga now ignores an index that is being built.
We can get correct search results even if the index is under construction.
However, the search is slow in this case because Groonga doesn't use the index.
[sub_filter] Added a feature that
sub_filter
executes a sequential search when Groonga is building indexes for the target column or the target column has no index.
sub_filter
raised an error in this situation in previous versions. From this version,
sub_filter
returns search results in this situation. However, in this situation
sub_filter
is slow, because it is executed as a sequential search.
[CentOS] Dropped 32-bit package support on CentOS 6.
Fixes¶
[logical_range_filter] Fixed a bug that an exception about closing the same object twice occurred when there were enough records and the number of records that didn't match the filter criteria was more than its estimated value.
Release 9.0.7 - 2019-08-29¶
Improvements¶
[httpd] Updated bundled nginx to 1.17.3.
Contains security fix for CVE-2019-9511, CVE-2019-9513, and CVE-2019-9516.
Fixes¶
Fixed a bug that Groonga crashed when posting lists were huge.
However, this bug rarely occurs with typical data, because posting lists don't grow that large with it.
Fixed a bug that an empty result was returned when we specified
initial
as the stage of a dynamic column and searched using an index. [GitHub#683]
Fixed a bug that the configure phase didn’t detect libedit despite it being installed. [GitHub#1030][Patched by yu]
Fixed a bug that
--offset
and--limit
options didn’t work with--sort_keys
and--slices
options. [clear-code/redmine_full_text_search#70][Reported by a9zawa]
Fixed a bug that the search result was empty when the result of the
select
command was huge. [groonga-dev,04770][Reported by Yutaro Shimamura]
Fixed a bug that a suitable index wasn't used for prefix search and suffix search. [GitHub#1007, PGroonga#GitHub#96][Reported by oknj]
Thanks¶
oknj
Yutaro Shimamura
yu
a9zawa
Release 9.0.6 - 2019-08-05¶
Improvements¶
Added support for Debian 10 (buster).
Fixes¶
[select] Fixed a bug that a search raised an error when search escalation occurred.
[select] Fixed a bug that could return wrong search results when we used a nested equal condition.
[geo_distance_location_rectangle] Fixed an example that had a wrong
load
format. [GitHub#1023][Patched by yagisumi]
[Let’s create micro-blog] Fixed an example that had wrong search results. [GitHub#1024][Patched by yagisumi]
Thanks¶
yagisumi
Release 9.0.5 - 2019-07-30¶
Warning
Some critical bugs were found in this release: the select
command returns wrong search results.
We will release a new version (9.0.6) which fixes the issues.
Please do not use Groonga 9.0.5; we recommend upgrading to 9.0.6 once it is available.
The details of these issues are explained at http://groonga.org/en/blog/2019/07/30/groonga-9.0.5.html.
Improvements¶
[logical_range_filter] An optimization is now applied only when the search target shard is large enough.
This feature reduces duplicate search results between offsets when we use the same sort key.
The “large enough” threshold is 10000 records by default.
[Normalizers] Added new option
unify_to_katakana
forNormalizerNFKC100
.This option normalizes hiragana to katakana.
For example,
ゔぁゔぃゔゔぇゔぉ
is normalized toヴァヴィヴヴェヴォ
.
[select] Added drilldowns support in the slices parameter.
[select] Added columns support in the slices parameter.
[select] We can now refer to
_score
in the initial stage of the slices parameter.
[highlight_html], [snippet_html] Keywords are now also extracted from the expression evaluated before a slice is executed when we specify the slices parameter.
Scores are now also collected from the expression evaluated before a slice is executed when we specify the slices parameter.
Stopped adding 1 to the score automatically when adding a posting to a posting list.
grn_ii_posting_add
is backward incompatible due to this change. * Callers must increase the score to maintain compatibility.
Added support for index search with a nested equal condition like
XXX.YYY.ZZZ == AAA
.
Reduced the rehash interval when we use a hash table.
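The nested equal support might be used as follows (a hedged sketch; the Logs/Users/Groups schema and the filter value are hypothetical):

```
table_create Groups TABLE_PAT_KEY ShortText
table_create Users TABLE_PAT_KEY ShortText
column_create Users group COLUMN_SCALAR Groups
table_create Logs TABLE_NO_KEY
column_create Logs user COLUMN_SCALAR Users

# An XXX.YYY.ZZZ == AAA style condition can now use indexes
select Logs --filter 'user.group._key == "admin"'
```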
This feature improves performance of result output.
We can now add a tag prefix in the query log.
This makes it easy to understand which condition filtered the records.
Added support for Apache Arrow 1.0.0.
However, this version has not been released yet.
Added support for Amazon Linux 2.
Fixes¶
Fixed a bug that vector values of JSON like
"[1, 2, 3]"
were not indexed.
Fixed a wrong parameter name in
table_create
tests. [GitHub#1000][Patch by yagisumi]
Fixed a bug that the drilldown label was empty when a drilldown command was executed with
command_version=3
. [GitHub#1001][Reported by yagisumi]Fixed build error for Windows package on MinGW.
Fixed install missing COPYING for Windows package on MinGW.
Fixed a bug that don’t highlight when specifing non-text query as highlight target keyword.
Fixed a bug that broken output of MessagePack format of [object_inspect]. [GitHub#1009][Reported by yagisumi]
Fixed a bug that broken output of MessagePack format of
index_column_diff
. [GitHub#1010][Reported by yagisumi]Fixed a bug that broken output of MessagePack format of [suggest]. [GitHub#1011][Reported by yagisumi]
Fixed a bug that the size allocated by realloc wasn't enough when searching a patricia trie table and so on. [Reported by Shimadzu Corporation]
Groonga could crash because of this bug.
Fixed a bug that
groonga.repo
was removed when updating
groonga-release
to 1.5.0 from a version before 1.5.0-1. [groonga-talk:429][Reported by Josep Sanz]
Thanks¶
yagisumi
Shimadzu Corporation
Josep Sanz
Release 9.0.4 - 2019-06-29¶
Improvements¶
Added support for array literals with multiple elements.
Added support for the equivalence operation on vectors.
[logical_range_filter] Increased the logs output into the query log.
The
logical_range_filter
command now outputs a log at the following timings:
After filtering by
logical_range_filter
.
After sorting by
logical_range_filter
.
After applying dynamic columns.
After outputting results.
We can see how much of this command has finished thanks to this feature.
[Tokenizers] Added document for
TokenPattern
description.[Tokenizers] Added document for
TokenTable
description.[Tokenizers] Added document for
TokenNgram
description.[grndb] Added output operation log into groonga.log
grndb
command comes to output execution result and execution process.
[grndb] Added support for checking empty files.
We can check whether empty files exist with this feature.
[grndb] Added support for a new option
--since
.
We can specify the scope of an inspection.
[grndb] Added documentation about the new option
--since
.
Bundled RapidJSON.
We can partly use RapidJSON as Groonga’s JSON parser. (This feature is still partial.)
We can parse JSON more exactly by using it.
Added support for casting to int32 vector from JSON string.
This feature requires RapidJSON.
[query] Added
default_operator
.
We can customize the operator used between keywords like “keyword1 keyword2”.
“keyword1 keyword2” is an AND operation by default.
We can change the operator of “keyword1 keyword2” to something other than AND.
Fixes¶
[optimizer] Fixed a bug that caused an execution error when multiple filter conditions like
xxx.yyy=="keyword"
were specified.
Added missing LICENSE files to the Groonga package for Windows (VC++ version).
Added the UCRT runtime to the Groonga package for Windows (VC++ version).
[Window function] Fixed a memory leak.
This occurred when multiple windows with sort keys were used. [Patched by Takashi Hashida]
Thanks¶
Takashi Hashida
Release 9.0.3 - 2019-05-29¶
Improvements¶
[select] Added more query logs.
The
select
command now outputs a log at the following timings:
After sorting by drilldown.
After filtering by drilldown.
We can see how much of this command has finished thanks to this feature.
[logical_select] Added more query logs.
The
logical_select
command now outputs a log at the following timings:
After making dynamic columns.
After grouping by drilldown.
After sorting by drilldown.
After filtering by drilldown.
After sorting by
logical_select
.
We can see how much of this command has finished thanks to this feature.
[logical_select] Slightly improved sort performance when we use the
limit
option.
[index_column_diff] Improved performance.
We have greatly shortened the execution time of this command.
[index_column_diff] Invalid references are now ignored.
[index_column_diff] Added support for the duplicated vector element case.
[Normalizers] Added a new Normalizer
NormalizerNFKC121
based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 12.1.[TokenFilters] Added a new TokenFilter
TokenFilterNFKC121
based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 12.1.[grndb] Added a new option
--log-flags
We can specify output items of a log as with groonga executable file.
See [groonga executable file] to know about supported log flags.
[snippet_html] Added a new option for changing a return value when no match by search.
[plugin_unregister] Added support for Windows full paths.
Added support for multiline log messages.
Multiline log messages are easier to read thanks to this feature.
The key is now output in Groonga’s log when we search by index.
[match_columns parameter] Added a document for indexes with weight.
[logical_range_filter] Added an explanation for the
order
parameter.[object_inspect] Added an explanation for new statistics
INDEX_COLUMN_VALUE_STATISTICS_NEXT_PHYSICAL_SEGMENT_ID
andINDEX_COLUMN_VALUE_STATISTICS_N_PHYSICAL_SEGMENTS
.Dropped Ubuntu 14.04 support.
Fixes¶
[index_column_diff] Fixed a bug that too many
remains
are reported.Fixed a build error when we use
--without-onigmo
option. [GitHub#951] [Reported by Tomohiro KATO]Fixed a vulnerability of “CVE: 2019-11675”. [Reported by Wolfgang Hotwagner]
Removed the extended path prefix
\\?\
in the Windows version of Groonga. [GitHub#958] [Reported by yagisumi]
This extended prefix caused a bug that plugins couldn’t be found correctly.
Thanks¶
Tomohiro KATO
Wolfgang Hotwagner
yagisumi
Release 9.0.2 - 2019-04-29¶
We provide a package for Windows made from VC++ from this release.
We also provide a package for Windows made from MinGW as in the past. However, we will switch to providing only the VC++ package instead of the MinGW one sooner or later.
Improvements¶
[column_create] Added a new flag
INDEX_LARGE
for index columns.
We can make an index column that has twice the default space with this flag.
However, note that it also uses twice the memory.
This flag is useful when the index target data are large.
Large data usually have many records (normally at least 10 million records) and at least one of the following characteristics:
Index targets are multiple columns
Index table has a tokenizer
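A hedged sketch of creating an index column with this flag (the schema and column names are hypothetical):

```
table_create Memos TABLE_HASH_KEY ShortText
column_create Memos content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto
column_create Terms memos_content \
  COLUMN_INDEX|WITH_POSITION|INDEX_LARGE Memos content
```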
[object_inspect] Added new statistics
next_physical_segment_id
and
max_n_physical_segments
for physical segment information.
We can confirm the used space and the maximum space of an index column with this information.
[logical_select] Added support for window function over shard.
[logical_range_filter] Added support for window function over shard.
[logical_count] Added support for window function over shard.
We provided a package for Windows made from VC++.
[io_flush] Added a new option
--recursive dependent
.
With it, the specified flush target object, its child objects, the table corresponding to an index column, and the corresponding index columns all become flush targets.
Fixes¶
Fixed “unknown type name ‘bool’” compilation error in some environments.
Fixed a bug that numbers over Int32 were output incorrectly by commands executed via mruby (e.g.
logical_select
,
logical_range_filter
,
logical_count
, etc.). [GitHub#936] [Patch by HashidaTKS]
Thanks¶
HashidaTKS
Release 9.0.1 - 2019-03-29¶
Improvements¶
Added support for accepting null as a vector value.
You can use select ... --columns[vector].flags COLUMN_VECTOR --columns[vector].value "null"
[dump] Translated document into English.
Added more checks and logging for invalid indexes. It helps to clarify the index related bugs.
Improved an explanation about
GRN_TABLE_SELECT_ENOUGH_FILTERED_RATIO
behavior in news at Release 8.0.6 - 2018-08-29.[select] Added new argument
--load_table
,--load_columns
and--load_values
.You can store a result of
select
in a table that specifying--load_table
.--load_values
option specifies columns of result ofselect
.--load_columns
options specifies columns of table that specifying--load_table
.In this way, you can store values of columns that specifying with
--load_values
into columns that specifying with--load_columns
.
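A hedged sketch of these arguments (the Entries/Results tables and column names are hypothetical):

```
# Store _key and content of matching records into the Results table
select Entries \
  --filter 'content @ "groonga"' \
  --load_table Results \
  --load_columns 'entry_key, entry_content' \
  --load_values '_key, content'
```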
[select] Added documentation about
load_table
,load_columns
andload_values
.
[load] Added support for displaying the load destination table in a query log.
The name of the load destination table is displayed as a string in
[]
as below.
:000000000000000 load(3): [LoadedLogs][3]
Added a new API:
grn_ii_get_flags()
grn_index_column_diff()
grn_memory_get_usage()
Added
index_column_diff
command to check for broken index columns. If you want to log the progress of command execution, set the log level to debug.
Fixes¶
[snippet_html] Changed to return an empty vector for no match.
In such a case, an empty vector
[]
is returned instead ofnull
.
Fixed a warning about a possible thread-count overflow. In the real world, it doesn't affect users because an enormous number of threads is not used. [GitHub#904]
Fixed build error on macOS [GitHub#909] [Reported by shiro615]
Fixed a stop word handling bug.
This bug occurs when we set the first token as a stop word in our query.
If this bug occurs, our search query isn’t hit.
[Global configurations] Fixed a typo about parameter name of
grn_lock_set_timeout
Fixed a bug that deleted records could be matched because indexes were updated incorrectly.
It could occur when a large number of records were added or deleted.
Fixed a memory leak when
logical_range_filter
returns no records. [GitHub#911] [Patch by HashidaTKS]Fixed a bug that query will not match because of loading data is not normalized correctly. [PGroonga#GitHub#93, GitHub#912,GitHub#913] [Reported by kamicup and dodaisuke]
This bug occurs when load data contains whitespace after KATAKANA and
unify_kana
option is used for normalizer.
Fixed a bug that an indexes is broken during updating indexes.
It may occurs when repeating to add large number of records or delete them for a long term.
Fixed a crash bug that allocated working area is not enough size when updating indexes.
Thanks¶
shiro615
HashidaTKS
kamicup
dodaisuke
Release 9.0.0 - 2019-02-09¶
This is a major version up! But it keeps backward compatibility. You can upgrade to 9.0.0 without rebuilding your database.
Improvements¶
[Tokenizers] Added a new tokenizer
TokenPattern
You can extract tokens with a regular expression.
This tokenizer extracts only tokens that match the regular expression.
You can also specify multiple regular expression patterns.
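Conceptually, pattern-based token extraction works like the following Python sketch (this is not Groonga's implementation, only an illustration of keeping the substrings that match the given regular expressions):

```python
import re

def pattern_tokenize(text, patterns):
    """Collect only the substrings that match one of the regular expressions."""
    tokens = []
    for pattern in patterns:  # multiple patterns may be specified
        tokens.extend(re.findall(pattern, text))
    return tokens

print(pattern_tokenize("model-x123 and model-y456", [r"model-[a-z]\d+"]))
# → ['model-x123', 'model-y456']
```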
[Tokenizers] Added a new tokenizer
TokenTable
.You can extract tokens based on the column values of an existing table.
[dump] Added support for dumping binary data.
[select] Added support for similar search against an index column.
If you use a multi-column index, you can run a similar search against all source columns with this feature.
[Normalizers] Added new option
remove_blank
forNormalizerNFKC100
.This option removes white spaces.
[groonga executable file] Improved the display of thread ids in the log.
Because it was easy to confuse thread ids and process ids in the Windows version, it is now clear which is the thread id and which is the process id.
Release 8.1.1 - 2019-01-29¶
Improvements¶
[logical_select] Added new arguments
--load_table
,
--load_columns
and
--load_values
.
You can store the result of
logical_select
in the table specified by
--load_table
.
The
--load_values
option specifies columns of the result of
logical_select
.
The
--load_columns
option specifies columns of the table specified by
--load_table
.
In this way, you can store the values of the columns specified with
--load_values
into the columns specified with
--load_columns
.
Improved the error log for index update errors.
Added more information to the log.
For example, the source buffer and chunk are output when a posting list merge error occurs.
Also, the free space size and requested size of a buffer are logged when a buffer allocation error occurs.
[groonga executable file] Added a new option
--log-flags
We can specify the output items of Groonga's log.
We can output the items below.
Timestamp
Log message
Location(the location where the log was output)
Process id
Thread id
We can specify a prefix as below.
+
This prefix means “add the flag”.
-
This prefix means “remove the flag”.
No prefix means “replace the existing flags”.
Specifically, we can specify flags as below.
none
Output nothing into the log.
time
Output a timestamp into the log.
message
Output log messages into the log.
location
Output the location where the log was output (a file name, a line and a function name) and the process id.
process_id
Output a process id into the log.
pid
This flag is an alias of
process_id
.
thread_id
Output thread id into the log.
all
This flag specifies all flags except
none
anddefault
flags.
default
Output a timestamp and log messages into the log.
We can also specify multiple log flags by separating flags with
|
.
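For example, the flags and prefixes above might be combined like this (a sketch; the database path is hypothetical):

```
# Replace the flags: timestamp, message and thread id
groonga --log-flags "time|message|thread_id" -s /tmp/db/db

# Keep the default flags and add the process id
groonga --log-flags "+process_id" -s /tmp/db/db
```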
Fixes¶
Fixed a memory leak that occurred on an index update error.
[Normalizers] Fixed a bug that stateless normalizers and stateful normalizers return wrong results when we use them at the same time.
Stateless normalizers are below.
unify_kana
unify_kana_case
unify_kana_voiced_sound_mark
unify_hyphen
unify_prolonged_sound_mark
unify_hyphen_and_prolonged_sound_mark
unify_middle_dot
Stateful normalizers are below.
unify_katakana_v_sounds
unify_katakana_bu_sound
unify_to_romaji
Release 8.1.0 - 2018-12-29¶
Improvements¶
[httpd] Updated bundled nginx to 1.15.8.
Fixes¶
Fixed a bug that an unlock against the DB was always executed after a flush when executing an
io_flush
command.
Fixed a bug that the
reindex
command didn't finish when executed against a table that has records without references.
Release 8.0.9 - 2018-11-29¶
Improvements¶
[Tokenizers] The tokenizer name is now output in the error message when creating a tokenizer fails.
[Tokenizers][TokenDelimit] Supported customizing the token delimiter.
You can use a delimiter other than whitespace.
[Tokenizers][TokenDelimit] Added new option
pattern
.You can specify the delimiter with a regular expression using this option.
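A hedged sketch of the option (the pattern itself is hypothetical):

```
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer 'TokenDelimit("pattern", ",+")'
```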
[Tokenizers] Added force_prefix_search value to each token information.
“force_prefix” is kept for backward compatibility.
[Token filters] Added built-in token filter
TokenFilterNFKC100
.You can convert katakana to hiragana like NormalizerNFKC100 with a
unify_kana
option.
[Token filters][TokenFilterStem] Added new option
algorithm
.You can also stem languages other than English (French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Russian, Finnish) with this option.
[Token filters][TokenFilterStopWord] Added new option
column
.You can specify stop words in a column other than the is_stop_word column with this option.
[dump] Supported outputting token filter options.
If you specify a tokenizer like
TokenNgram
or
TokenMecab
that has options, you can output these options with the
table_list
command.
[truncate] Supported tables that have token filter options.
You can
truncate
even a table that has a token filter with options, like
TokenFilterStem
or
TokenFilterStopWord
.
[schema] Supported outputting token filter options.
[Normalizers] Added a new option
unify_to_romaji
for
NormalizerNFKC100
.
You can normalize hiragana and katakana to romaji with this option.
[query-log][show-condition] Supported “func() > 0” case.
[Windows] Flushing on unmap is now ensured.
Improved the error message for errors opening an input file.
[httpd] Updated bundled nginx to 1.15.7.
Contains security fixes for CVE-2018-16843 and CVE-2018-16844.
Fixes¶
Fixed a memory leak when evaluating window function.
[groonga-httpd] Fixed a bug that log contents could be mixed.
Fixed a bug that invalid JSON was generated when a slice error occurred on output_columns.
Fixed a memory leak when getting a nested reference vector column value.
Fixed a crash bug when outputting warning logs about index corruption.
Fixed a crash bug when a temporary vector was reused in expression evaluation.
For example, it crashed when evaluating an expression that uses a vector as below.
_score = _score + (vector_size(categories) > 0)
Fixed a bug that values of vector columns deleted by a delete command were hit. [GitHub PGroonga#85][Reported by dodaisuke]
Thanks¶
dodaisuke
Release 8.0.8 - 2018-10-29¶
Improvements¶
[table_list] Supported outputting default tokenizer options.
If you specify a tokenizer like
TokenNgram
or
TokenMecab
that has options, you can output these options with the
table_list
command.
[select] Supported normalizer options in sequential match with
record @ 'query'
.
[truncate] Supported tables that have tokenizer options.
You can
truncate
even a table that has a tokenizer with options, like
TokenNgram
or
TokenMecab
.
[Tokenizers][TokenMecab] Added new option
target_class
This option extracts tokens of a specified part-of-speech. For example, you can search only nouns.
This option can also specify subclasses, and exclude or add a specific part-of-speech using
+
or
-
. So you can also search everything except pronouns, as below.
'TokenMecab("target_class", "-名詞/代名詞", "target_class", "+")'
[io_flush] Supported locking of a database during an
io_flush
.
This is because Groonga could crash when a table targeted by an
io_flush
was deleted during its execution.
[cast_loose] Added a new function
cast_loose
.This function casts a value to the specified type. If the value can't be cast, it becomes the specified default value.
Added optimization of the evaluation order of conditional expressions. (experimental)
You can activate this feature by setting an environment variable as below.
GRN_EXPR_OPTIMIZE=yes
Supported
(?-mix:XXX)
form for index searchable regular expression. [groonga-dev,04683][Reported by Masatoshi SEKI](?-mix:XXX)
form is treated the same as XXX.
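A small Python illustration of this scoped-flags form (Python's re module accepts the same local-flag syntax; this is only an analogy, not Groonga's regex engine):

```python
import re

# "(?-mix:gr)" disables the m, i and x flags inside the group; with none of
# them set, it matches exactly like the plain pattern "gr".
scoped = re.findall(r"(?-mix:gr)", "groonga programs")
plain = re.findall(r"gr", "groonga programs")
print(scoped == plain)  # → True
```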
[httpd] Updated bundled nginx to 1.15.5.
Supported Ubuntu 18.10 (Cosmic Cuttlefish)
Fixes¶
Fixed a bug that the Groonga GQTP server may fail to accept a new connection. [groonga-dev,04688][Reported by Yutaro Shimamura]
It was caused when a client process was interrupted without using quit.
Thanks¶
Masatoshi SEKI
Yutaro Shimamura
Release 8.0.7 - 2018-09-29¶
Improvements¶
[Tokenizers][TokenMecab] Supported outputting MeCab metadata.
Added new option
include_class
forTokenMecab
.This option outputs
class
andsubclass
in Mecab’s metadata.Added new option
include_reading
forTokenMecab
.This option outputs
reading
in Mecab’s metadata.Added new option
include_form
forTokenMecab
.This option outputs
inflected_type
,inflected_form
andbase_form
in Mecab’s metadata.Added new option
use_reading
forTokenMecab
.This option supports a search by kana.
This option is useful as a countermeasure for orthographic variants because it searches by kana.
[plugin] Groonga can now load plugins from multiple directories.
You can specify multiple directories in
GRN_PLUGINS_PATH
, separated with “:” on non-Windows and “;” on Windows.
GRN_PLUGINS_PATH
has higher priority than the existing
GRN_PLUGINS_DIR
. Currently, this option is not supported on Windows.
[Tokenizers][TokenNgram] Added new option
unify_alphabet
forTokenNgram
.If we use
unify_alphabet
asfalse
,TokenNgram
uses the bigram tokenization method for ASCII characters.
[Tokenizers][TokenNgram] Added new option
unify_symbol
forTokenNgram
.TokenNgram("unify_symbol", false)
has the same behavior as
TokenBigramSplitSymbol
.[Tokenizers][TokenNgram] Added new option
unify_digit
forTokenNgram
.If we use
unify_digit
asfalse
,
TokenNgram
uses the bigram tokenization method for digits.
[httpd] Updated bundled nginx to 1.15.4.
Fixes¶
Fixed wrong score calculations in some cases.
It was caused when adding, multiplying or dividing a numeric value and a bool value.
It’s caused when comparing a scalar and vector columns using
!=
or==
.
Release 8.0.6 - 2018-08-29¶
Improvements¶
[Tokenizers][TokenMecab] Added
chunked_tokenize
andchunk_size_threshold
options.
[optimizer] Supported estimation for query family expressions. It generates a more effective execution plan for query family expressions such as
column @ query
,column @~ pattern
and so on.[optimizer] plug-in -> built-in It’s disabled by default for now. We can enable it by defining
GRN_EXPR_OPTIMIZE=yes
environment variable or usingexpression_rewriters
table as before.Enable sequential search for enough filtered case by default. If the current result is enough filtered, sequential search is faster than index search. If the current result has only 1% records of all records in a table and less than 1000 records, sequential search is used even when index search is available.
Currently, this optimization is applied when searching by
==
,>
,<
,>=
, or<=
.When a key of a table that has columns specified by the filter is
ShortText
, you must setNormalizerAuto
as the normalizer of the table to apply this optimization.
You can disable this feature with the
GRN_TABLE_SELECT_ENOUGH_FILTERED_RATIO=0.0
environment variable.
[load] Improved the error message. The table name is now included.
[load] Added the
lock_table
option. If--lock_table yes
is specified,load
locks the target table while updating columns and applying--each
. This option avoidsload
anddelete
conflicts but it’ll reduce load performance.
[vector_find] Avoided crashing with unsupported modes.
Fixes¶
[index] Fixed a bug in offline index construction for text vectors with
HASH_KEY
. It created an index with an invalid section ID.
Fixed a bug that
--match_columns 'index[0] || index[9]'
uses wrong section.[highlighter] fix a wrong highlight bug It’s caused when lexicon is hash table and keyword is less than N of N-gram.
[mruby] Fixed a bug that the real error was hidden. mruby doesn't support error propagation by a no-argument raise. https://github.com/mruby/mruby/issues/290
[Tokenizers][TokenNgram loose]: Fixed a not-found bug when a query has only loose types.
highlight_html()
with a lexicon was also broken.
Fixed a bug that text-to-number casts ignored trailing garbage. “0garbage” should be a cast error.
Fixed an optimization bug for the
reference_column >= 'key_value'
case.
Release 8.0.5 - 2018-07-29¶
Improvements¶
[Script syntax] Added complementary explain about similar search against Japanese documents. [GitHub#858][Patch by Yasuhiro Horimoto]
[time_classify_day_of_week] Added a new API:
time_classify_day_of_week()
.Suppressed a warning with
-fstack-protector
. Suggested by OBATA Akio.Added a new API:
time_format_iso8601()
.Exported a struct
grn_raw_string
.Added a new API:
grn_obj_clear_option_values()
. It allows you to clear option values on remove (for persistent) / close (for temporary.)[log] Reported index column name for error message
[ii][update][one]
.[httpd] Updated bundled nginx to 1.15.2.
[Ubuntu] Dropped Ubuntu 17.10 (Artful Aardvark) support. It has reached EOL at July 19, 2018.
[Debian GNU/Linux] Dropped jessie support. Debian’s security and release team will no longer produce updates for jessie.
Fixes¶
Fixed returning a wrong result after an unfinished
/d/load
data by POST.
Fixed a wrong function call around KyTea.
[grndb] Added a missing label for the
--force-truncate
option.Fixed crash on closing of a database, when a normalizer provided by a plugin (ex.
groonga-normalizer-mysql
) is used with any option.
Fixed a bug that normalizer/tokenizer options could be ignored. It occurred when the same object ID was reused.
Release 8.0.4 - 2018-06-29¶
Improvements¶
[log] Added a sub-error for the error message
[ii][update][one]
.Added a new API:
grn_highlighter_clear_keywords()
.Added a new predicate:
grn_obj_is_number_family_bulk()
.Added a new API:
grn_plugin_proc_get_value_mode()
.[vector_find] Added a new function
vector_find()
Suppressed memcpy warnings in msgpack.
Updated mruby from 1.0.0 to 1.4.1.
[doc][grn_obj] Added API reference for
grn_obj_is_index_column()
[windows] Suppressed printf format warnings.
[windows] Suppressed warnings from msgpack.
[grn_obj][Plugin] Added encoding converter. rules:
grn_ctx::errbuf: grn_encoding
grn_logger_put: grn_encoding
mruby: UTF-8
path: locale
[mrb] Added
LocaleOutput
.[windows] Supported converting image path to grn_encoding.
[Tokenizers][TokenMecab] Convert error message encoding.
[window_sum] Supported dynamic column as a target column.
[doc][grn_obj] Added API reference for
grn_obj_is_vector_column()
.[column_create] Added more validations.
1: A full text search index for a vector column must have the
WITH_SECTION
flag. (Note that TokenDelimit with
WITH_POSITION
withoutWITH_SECTION
is permitted. It’s a useful pattern for tag search.)
2: A full text search index for a vector column must not be a multi-column index. Detail: https://github.com/groonga/groonga/commit/08e2456ba35407e3d5172f71a0200fac2a770142
[grndb] Temporarily disabled the log check because it's not completed yet.
Fixes¶
[sub_filter] Fixed too much score with a too filtered case.
Fixed build error if KyTea is installed.
[grndb] Fixed output channel.
[query-log][show-condition] Maybe fixed a crash bug.
[highlighter][lexicon] Fixed a bug that a keyword wasn't highlighted if its length was less than N (of the N-gram; in many cases it's bigram, so “less than 2”).
[windows] Fixed a base path detection bug. If system locale DLL path includes 0x5c (
\
in ASCII) such as “U+8868 CJK UNIFIED IDEOGRAPH-8868” in CP932, the base path detection is buggy.[Tokenizers][TokenNgram] Fixed wrong first character length. It’s caused for “PARENTHESIZED IDEOGRAPH” characters such as “U+3231 PARENTHESIZED IDEOGRAPH STOCK”.
Release 8.0.3 - 2018-05-29¶
Improvements¶
[highlight_html] Supported highlighting of results of searches using
NormalizerNFKC100
orTokenNgram
.[Tokenizers] Added new option for
TokenNgram
thatreport_source_location option
. This option is used when highlighting with
highlight_html
using a lexicon.
[Normalizers] Added new option for
NormalizerNFKC100
thatunify_middle_dot option
. This option normalizes middle dot. You can search with or without・
(middle dot) and regardless of・
position.[Normalizers] Added new option for
NormalizerNFKC100
thatunify_katakana_v_sounds option
. This option normalizesヴァヴィヴヴェヴォ
(katakana) toバビブベボ
(katakana). For example, you can searchバイオリン
(violin) inヴァイオリン
(violin).[Normalizers] Added new option for
NormalizerNFKC100
thatunify_katakana_bu_sound option
. This option normalizesヴァヴィヴゥヴェヴォ
(katakana) toブ
(katakana). For example, you can searchセーブル
(katakana) andセーヴル
inセーヴェル
(katakana).[sub_filter] Supported
sub_filter
optimization for the highly filtered case. This optimization is valid when records are narrowed down enough before
sub_filter
execution, as below.
[groonga-httpd] Made each worker's context address unique. The context address is
#{ID}
of the query log below.
#{TIME_STAMP}|#{MESSAGE}
#{TIME_STAMP}|#{ID}|>#{QUERY}
#{TIME_STAMP}|#{ID}|:#{ELAPSED_TIME} #{PROGRESS}
#{TIME_STAMP}|#{ID}|<#{ELAPSED_TIME} #{RETURN_CODE}
[delete] Added a new option
limit
. You can limit the number of deleted records, as in the example below.
delete --table Users --filter '_key @^ "b"' --limit 4
[httpd] Updated bundled nginx to 1.14.0.
Fixes¶
[logical_select] Fixed memory leak when an error occurs in filtered dynamic columns.
[logical_count] Fixed memory leak on initial dynamic column error.
[logical_range_filter] Fixed memory leak when an error occurs in dynamic column evaluation.
[Tokenizers] Fixed a bug that the wrong source_offset was used when a loose tokenizing option such as loose_symbol is used.
[Normalizers] Fixed a bug that FULLWIDTH LATIN CAPITAL LETTERs such as U+FF21 FULLWIDTH LATIN CAPITAL LETTER A weren’t normalized to LATIN SMALL LETTERs such as U+0061 LATIN SMALL LETTER A. If you have been using NormalizerNFKC100, you must recreate your indexes.
Release 8.0.2 - 2018-04-29¶
Improvements¶
[grndb][--force-truncate] Improved the grndb recover --force-truncate option so that a table can be truncated even if locks are left on it.
[logical_range_filter] Added a sort_keys option.
Added a new function time_format(). You can format a value of a Time type column. You can specify the format in strftime syntax.
[Tokenizers] Supported a new tokenizer TokenNgram. You can change its behavior dynamically via options. Here is a list of available options:
n: “N” of Ngram. For example, “3” for trigram.
loose_symbol: Tokenize keywords including symbols, to be searched by queries both with and without symbols. For example, a keyword “090-1111-2222” will be found by any of “09011112222”, “090”, “1111”, “2222” and “090-1111-2222”.
loose_blank: Tokenize keywords including blanks, to be searched by queries both with and without blanks. For example, a keyword “090 1111 2222” will be found by any of “09011112222”, “090”, “1111”, “2222” and “090 1111 2222”.
remove_blank: Tokenize keywords including blanks, to be searched by queries without blanks. For example, a keyword “090 1111 2222” will be found by any of “09011112222”, “090”, “1111” or “2222”. Note that the keyword won’t be found by a query including blanks like “090 1111 2222”.
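As a sketch of how such tokenizer options are passed, the tokenize command accepts a tokenizer with options in function-call style (the input strings here are illustrative):

```
tokenize 'TokenNgram("n", 3)' "Groonga"
tokenize 'TokenNgram("loose_symbol", true)' "090-1111-2222"
```

The whole tokenizer expression must be quoted so that the option list is parsed as one argument.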
[Normalizers] Support new normalizer “NormalizerNFKC100” based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 10.0.
[Normalizers] Support options for “NormalizerNFKC51” and “NormalizerNFKC100” normalizers. You can change their behavior dynamically. Here is a list of available options:
unify_kana: Characters with the same pronunciation in full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character.
unify_kana_case: Large and small versions of the same letters in full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character.
unify_kana_voiced_sound_mark: Letters with and without the voiced sound mark and semi-voiced sound mark in full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character.
unify_hyphen: Hyphen-like characters are regarded as the hyphen.
unify_prolonged_sound_mark: Prolonged-sound-mark-like characters are regarded as the prolonged sound mark.
unify_hyphen_and_prolonged_sound_mark: Hyphen-like and prolonged-sound-mark-like characters are regarded as the hyphen.
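A minimal sketch of trying such normalizer options with the normalize command (the input string is illustrative):

```
normalize 'NormalizerNFKC100("unify_kana", true)' "グルンガ"
normalize 'NormalizerNFKC100("unify_kana_case", true)' "ぁぃぅ"
```

As with tokenizer options, the whole normalizer expression is quoted so that the option list is parsed as one argument.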
[dump] Support output of tokenizer options and normalizer options. Groonga 8.0.1 and earlier versions cannot import a dump that includes options for tokenizers or normalizers generated by Groonga 8.0.2 or later; it causes an error due to the unsupported information.
[schema] Support output of tokenizer options and normalizer options. Groonga 8.0.1 and earlier versions cannot import a schema that includes options for tokenizers or normalizers generated by Groonga 8.0.2 or later; it causes an error due to the unsupported information.
Supported Ubuntu 18.04 (Bionic Beaver)
Fixes¶
Fixed a bug that an unexpected record is matched by a space-only query. [groonga-dev,04609][Reported by satouyuzh]
Fixed a bug that a wrong scorer may be used. It’s caused when multiple scorers are used as below:
--match_columns 'title || scorer_tf_at_most(content, 2.0)'
Fixed a bug that it may take too much time to change “thread_limit”.
Thanks¶
satouyuzh
Release 8.0.1 - 2018-03-29¶
Improvements¶
[Log] Show filter conditions in the query log. It’s disabled by default. To enable it, set the environment variable GRN_QUERY_LOG_SHOW_CONDITION=yes.
Install *.pdb into the directory where *.dll and *.exe are installed.
[logical_count] Support filtered stage dynamic columns.
[logical_count] [post_filter] Added a new filter timing. It’s executed after filtered stage columns are generated.
[logical_select] [post_filter] Added a new filter timing. It’s executed after filtered stage columns are generated.
Support LZ4/Zstd/zlib compression for vector data.
Support aliases to accessors such as _key.
[logical_range_filter] Optimized window functions for large result sets. If enough matched records are found, the window function isn’t applied to the remaining windows.
TODO: Disable this optimization for small result set if its overhead is not negligible. The overhead is not evaluated yet.
[select] Added a match_escalation parameter. You can forcibly enable match escalation with --match_escalation yes. It’s stronger than --match_escalation_threshold 99999....999 because --match_escalation yes also works with SOME_CONDITIONS && column @ 'query'. --match_escalation_threshold isn’t used in this case.
The default is --match_escalation auto. It doesn’t change the current behavior.
You can disable match escalation with --match_escalation no. It’s the same as --match_escalation_threshold -1.
[httpd] Updated bundled nginx to 1.13.10.
Fixes¶
Fixed memory leak that occurs when a prefix query doesn’t match any token. [GitHub#820][Patch by Naoya Murakami]
Fixed a bug that a cache for different databases is used when multiple databases are opened in the same process.
Fixed a bug that a wrong index is constructed. This occurs only when the source of an index column is a vector column and WITH_SECTION isn’t specified.
Fixed a bug that a constant value can overflow or underflow in comparisons (>, >=, <, <=, ==, !=).
Thanks¶
Naoya Murakami
Release 8.0.0 - 2018-02-09¶
This is a major version up! But it keeps backward compatibility. You can upgrade to 8.0.0 without rebuilding your databases.
Improvements¶
[select] Added --drilldown_adjuster and --drilldowns[LABEL].adjuster. You can adjust scores against drilldown results.
[Online index construction] Changed the environment variable name GRN_II_REDUCE_EXPIRE_ENABLE to GRN_II_REDUCE_EXPIRE_THRESHOLD.
GRN_II_REDUCE_EXPIRE_THRESHOLD=0 is equivalent to GRN_II_REDUCE_EXPIRE_ENABLE=no.
GRN_II_REDUCE_EXPIRE_THRESHOLD=-1 uses ii->chunk->max_map_seg / 2 as the threshold.
GRN_II_REDUCE_EXPIRE_THRESHOLD > 0 uses MIN(ii->chunk->max_map_seg / 2, GRN_II_REDUCE_EXPIRE_THRESHOLD) as the threshold.
GRN_II_REDUCE_EXPIRE_THRESHOLD=32 is the default.
[between] Accept between() without borders. If the number of arguments passed to between() is 3, the 2nd and 3rd arguments are handled as the inclusive edges. [GitHub#685]
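A minimal sketch of the two between() forms (the Users table and age column are hypothetical):

```
# 3-argument form: 18 and 65 are handled as inclusive edges.
select Users --filter 'between(age, 18, 65)'
# 5-argument form: borders are specified explicitly.
select Users --filter 'between(age, 18, "include", 65, "exclude")'
```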
Fixes¶
Fixed a memory leak for normal hash table. [GitHub:mroonga/mroonga#190][Reported by fuku1]
Fix a memory leak for normal array.
[select] Stopped caching when output_columns uses a non-stable function.
[Windows] Fixed a wrong value being reported on WSASend error.
Thanks¶
fuku1
Release 7.1.1 - 2018-01-29¶
Improvements¶
[Ubuntu] Dropped Ubuntu 17.04 (Zesty Zapus) support. It reached EOL on January 13, 2018.
Added quorum match support. You can use quorum match in both script syntax and query syntax. [groonga-talk,385][Suggested by 付超群]
TODO: Add documents for quorum match syntax and link to them.
Added custom similarity threshold support in script syntax. You can use custom similarity threshold in script syntax.
TODO: Add document for the syntax and link to it.
[grndb][--force-lock-clear] Added a --force-lock-clear option. With this option, grndb forcibly clears locks on the database, tables and data columns. You can use your database again even if locks remain on them.
But this option is very risky. Normally, you should not use it. If your database is broken, it is still broken. This option just ignores locks.
[load] Added surrogate pair support in escape syntax. For example, \uD83C\uDF7A is processed as 🍺.
[Windows] Changed to use sparse files on Windows. It reduces disk space with no performance demerit.
[Online index construction] Added the GRN_II_REDUCE_EXPIRE_THRESHOLD environment variable to control when memory maps are expired in an index column. It’s -1 by default, which means that expire timing depends on the index column size: the smaller the index column, the more often expiration happens; the larger the index column, the less often.
You can get the previous behavior with 0. It means that Groonga always tries to expire.
[logical_range_filter] [post_filter] Added a new filter timing. It’s executed after filtered stage columns are generated.
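A sketch of the post_filter timing: a dynamic column is generated at the filtered stage, and post_filter is then evaluated against it (the Logs table, timestamp shard key, and level/elapsed columns are hypothetical):

```
logical_range_filter \
  --logical_table Logs \
  --shard_key timestamp \
  --filter 'level >= 3' \
  --columns[elapsed_msec].stage filtered \
  --columns[elapsed_msec].type Float \
  --columns[elapsed_msec].flags COLUMN_SCALAR \
  --columns[elapsed_msec].value 'elapsed * 1000' \
  --post_filter 'elapsed_msec > 100'
```

Because post_filter runs after the filtered stage, it can reference elapsed_msec, which a plain filter could not.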
Fixes¶
Reduced resource usage for creating index for reference vector. [GitHub#806][Reported by Naoya Murakami]
[table_create] Fixed a bug that a table is created even when
token_filters
is invalid. [GitHub#266]
Thanks¶
付超群
Naoya Murakami
Release 7.1.0 - 2017-12-29¶
Improvements¶
[load] Improved the load’s query-log format. The following details were added to the load’s query log:
outputs the number of loaded records.
outputs the number of error records and columns.
outputs the number of total records.
[logical_count] Improved the logical_count’s query-log format. The following details were added to the logical_count’s query log:
outputs the number of counted records.
[logical_select] Improved the logical_select’s query-log format. The following details were added to the logical_select’s query log:
logs each of the N outputs.
outputs plain drilldown.
outputs labeled drilldown.
outputs what is selected in each shard.
uses “[…]” for target information.
[delete] Improved the delete’s query-log format. The following details were added to the delete’s query log:
outputs the number of deleted and error records.
outputs the number of remaining records.
[Groonga HTTP server] The server executed by groonga -s now ensures stopping on C-c.
Used NaN, Infinity and -Infinity instead of the Lisp representations (#<nan>, #i1/0 and #-i1/0).
Supported vectors as drilldown calc targets.
Partially supported keyword extraction from regexp search. It enables highlight_html and snippet_html for regexp search. [GitHub#787][Reported by takagi01]
[bulk] Reduced the number of realloc() calls. The grn_bulk_*() API supports it. It improves performance for large output on Windows, because realloc() is heavy on Windows. For example, it can be 100x faster for output over 100MB.
Enabled GRN_II_OVERLAP_TOKEN_SKIP_ENABLE only when its value is “yes”.
Deprecated GRN_NGRAM_TOKENIZER_REMOVE_BLANK_DISABLE. Use GRN_NGRAM_TOKENIZER_REMOVE_BLANK_ENABLE=no instead.
Added a new function index_column_source_records. It gets the source records of an index column. [Patch by Naoya Murakami]
[select] Supported negative “offset” when “offset + size - limit” >= 0.
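A minimal sketch of a negative offset, which counts from the end of the result set (the Memos table is hypothetical):

```
# Return the last 3 records in _id order.
select Memos --sort_keys _id --offset -3 --limit 3
```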
Added grn_column_cache. It improves the performance of getting fixed-size column values.
[groonga executable file] Added a --listen-backlog option. You can customize listen(2)’s backlog with this option.
[httpd] Updated bundled nginx to 1.13.8.
Fixes¶
Fixed a memory leak in highlight_full.
Fixed a crash bug caused by early unlink. It’s not caused by instructions in grn_expr_parse() but when a libgroonga user such as Mroonga uses the following instructions:
grn_expr_append_const("_id")
grn_expr_append_op(GRN_OP_GET_VALUE)
Thanks¶
takagi01
Naoya Murakami
Release 7.0.9 - 2017-11-29¶
Improvements¶
Supported newer version of Apache Arrow. In this release, 0.8.0 or later is required for Apache Arrow support.
[sharding] Added new API for dynamic columns.
Groonga::LabeledArguments
[sharding] Added a convenient Table#select_all method.
[logical_range_filter] Supported dynamic columns. Note that only the initial and filtered stages are supported.
[logical_range_filter] Added documentation about the cache parameter and dynamic columns.
[logical_count] Supported dynamic columns. Note that only the initial stage is supported.
[logical_count] Added documentation about named parameters.
[select] Supported --match_columns _key without an index.
[in_values] Supported specifying more than 126 values. [GitHub#760] [GitHub#781] [groonga-dev,04449] [Reported by Murata Satoshi]
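A minimal sketch of in_values, which matches a column against a list of candidate values (the Memos table and tag column are hypothetical):

```
select Memos --filter 'in_values(tag, "groonga", "mroonga", "pgroonga")'
```

From this release the number of listed values may exceed 126.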
[httpd] Updated bundled nginx to 1.13.7.
Fixes¶
[httpd] Fixed build error when old Groonga is already installed. [GitHub#775] [Reported by myamanishi3]
[in_values] Fixed a bug that in_values with too many arguments can cause a crash. This bug was found while adding support for more than 126 values. [GitHub#780]
[cmake] Fixed LZ4 and MessagePack detection. [Reported by Sergei Golubchik]
[Offline index construction] Fixed a bug that offline index construction for a vector column consumes unnecessary resources. If you have a lot of elements in one vector column and many records, Groonga will crash. [groonga-dev,04533][Reported by Toshio Uchiyama]
Thanks¶
Murata Satoshi
myamanishi3
Sergei Golubchik
Toshio Uchiyama
Release 7.0.8 - 2017-10-29¶
Improvements¶
[windows] Supported backtrace on crash. With this feature, not only the function call history but also source file names and line numbers are displayed as much as possible. This makes problem solving easier.
Supported ( ) (empty block) only queries (--query "( )") for QUERY_NO_SYNTAX_ERROR. In the previous version, it caused an error. [GitHub#767]
Supported (+) (AND-only block) only queries (--query "(+)") for QUERY_NO_SYNTAX_ERROR. In the previous version, it caused an error. [GitHub#767]
Supported queries starting with “~” such as ~foo (--query "~y") for QUERY_NO_SYNTAX_ERROR. In the previous version, it caused an error. [GitHub#767]
Modified the log level of expired from info to debug.
2017-10-29 14:05:34.123456|i| <0000000012345678:0> expired i=000000000B123456 max=10 (2/2)
This message is logged when a memory mapped area for an index is unmapped. It is useful information for debugging but unnecessary in normal operation, so we changed the log level from info to debug.
Supported Ubuntu 17.10 (Artful Aardvark)
Fixes¶
[dat] Fixed a bug that a large file is created unexpectedly in the worst case during the database expansion process. This bug may occur when you create/delete index columns very frequently. In the 7.0.7 release, a related bug was fixed - “table_create command fails when there are many deleted keys” - but it turned out not to be enough in the worst case.
[logical_select] Fixed a bug that when offset and limit were applied to multiple shards at the same time, in some cases fewer records than expected were returned.
Release 7.0.7 - 2017-09-29¶
Improvements¶
Supported + only queries (--query "+") for QUERY_NO_SYNTAX_ERROR. In the previous version, it caused an error.
[httpd] Updated bundled nginx to 1.13.5.
[dump] Added the default argument values to the syntax section.
[Command version] Supported --default-command-version 3.
Supported caching select results with function calls. Now, most existing functions support this feature. There are two exceptions: when now() or rand() is used in a query, the select result is not cached. Because of this default behavior change, new APIs are introduced:
grn_proc_set_is_stable()
grn_proc_is_stable()
Note that if you add a new function that may return different results for the same arguments, you must call grn_proc_set_is_stable(ctx, proc, GRN_FALSE). If you don’t call it, the select result with the function call is cached and is wrong for subsequent requests.
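A sketch of the caching rule above (the Entries table and its columns are hypothetical):

```
# Cached: every function and operator involved is stable.
select Entries --filter 'title == "Groonga"'
# Not cached: now() may return a different value on each request.
select Entries --filter 'created_at < now()'
```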
Fixes¶
[windows] Fixed cleaning up file handles correctly on failure when database_unmap is executed. There is a case that the critical section is not initialized when a request is canceled before executing database_unmap. In such a case, it caused a crash bug.
[Tokenizers] Fixed wrong tokenizer names in the documentation. They should be TokenBigramIgnoreBlankSplitSymbolAlpha and TokenBigramIgnoreBlankSplitSymbolAlphaDigit.
Changed not to keep a created empty file on error.
In the previous versions, there is a case that an empty file remains on error.
Here is the scenario to reproduce:
creating a new file by grn_fileinfo_open succeeds
mapping the file by DO_MAP() fails
In such a case, it causes another error such as “file already exists” because of the file which isn’t under control, so these files should be removed during the cleanup process.
Fixed a bug that Groonga may crash when a search is executed while many updates are running in a short time.
[table_create] Fixed a bug that table_create failed when there are many deleted keys.
Release 7.0.6 - 2017-08-29¶
Improvements¶
Supported prefix match search using multiple indexes (e.g. --query "Foo*" --match_columns "TITLE_INDEX_COLUMN||BODY_INDEX_COLUMN").
[window_count] Supported the window_count function to add count data to a result set. It is useful for additional analysis or filtering.
Added the following APIs:
grn_obj_get_disk_usage()
GRN_EXPR_QUERY_NO_SYNTAX_ERROR
grn_expr_syntax_expand_query_by_table()
grn_table_find_reference_object()
[object_inspect] Supported showing the disk usage of the specified object.
Supported a falling-back query parse feature. It is enabled when the QUERY_NO_SYNTAX_ERROR flag is set to query_flags (this feature is disabled by default). If this flag is set, a query never causes a syntax error. For example, “A +” is parsed and escaped automatically into “A +”. This behavior is useful when an application uses user input directly and doesn’t want to show syntax errors to the user or in the log.
Supported adjusting the score of a term in a query. The “>”, “<”, and “~” operators are supported. For example, “>Groonga” increments the score of “Groonga” and “<Groonga” decrements it. “~Groonga” decreases the score of matched documents in the current search result; the “~” operator doesn’t change the search result itself.
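A minimal sketch of the score adjust operators in query syntax (the Entries table and content column are hypothetical):

```
# ">" boosts the score of matches for "Groonga";
# "~" lowers the score of records matching "noise" without
# changing which records are in the result set.
select Entries --match_columns content --query ">Groonga ~noise"
```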
Improved the performance of table removal. thread_limit=1 is no longer needed for it. Checking whether referencing tables exist is now done without opening objects; as a result, performance is improved.
[httpd] Updated bundled nginx to 1.13.4.
Fixes¶
[dump] Fixed a bug that the 7th unnamed parameter for the --sort_hash_table option is ignored.
[schema] Fixed a typo in command line parameter name. It should be source instead of sources. [groonga-dev,04449] [Reported by murata satoshi]
[ruby_eval] Fixed crash when ruby_eval returned syntax error. [GitHub#751] [Patch by ryo-pinus]
Thanks¶
murata satoshi
ryo-pinus
Release 7.0.5 - 2017-07-29¶
Improvements¶
[httpd] Updated bundled nginx to 1.13.3. Note that this version contains security fix for CVE-2017-7529.
[load] Supported loading the max UInt64 value. In the previous versions, the max UInt64 value was converted into 0 unexpectedly.
Added the following API
grn_window_get_size()
[GitHub#725] [Patch by Naoya Murakami]
[math_abs] Supported the math_abs() function to calculate absolute values. [GitHub#721]
Made grn_default_logger_set_path() and grn_default_query_logger_set_path() thread safe.
[windows] Updated bundled pcre library to 8.41.
[normalize] Improved not to output a redundant empty string "" on error. [GitHub#730]
[functions/time] Supported showing an error message when division by zero happens. [GitHub#733] [Patch by Naoya Murakami]
[windows] Changed to map ERROR_NO_SYSTEM_RESOURCES to GRN_RESOURCE_TEMPORARILY_UNAVAILABLE. In the previous versions, it returned rc=-1 as a result code, which is not helpful for investigating what actually happened. With this fix, it returns rc=-12.
[functions/min][functions/max] Supported vector columns. Now you don’t need to care whether the target is a scalar column or a vector column. [GitHub#735] [Patch by Naoya Murakami]
[dump] Supported a --sort_hash_table option to sort hash tables by _key. Specify --sort_hash_table yes to use it.
[between] Supported specifying an index column. [GitHub#740] [Patch by Naoya Murakami]
[load] Supported Apache Arrow 0.5.0 or later.
[How to analyze error messages] Added howto article to analyze error message in Groonga.
[Debian GNU/Linux] Updated required package list to build from source.
[Ubuntu] Dropped Ubuntu 16.10 (Yakkety Yak) support. It reached EOL on July 20, 2017.
Fixes¶
Fixed constructing correct full-text indexes against vector columns whose type belongs to the text family (ShortText and so on). This fix resolves that full-text search doesn’t work well against a text vector column after updating indexes. [GitHub#494]
[thread_limit] Fixed a bug that a deadlock occurs when thread_limit?max=1 is requested at once.
[groonga-httpd] Fixed a mismatch between the default pid file path and the one the restart command assumed. This mismatch blocked restarting groonga-httpd. [GitHub#743] [Reported by sozaki]
Thanks¶
Naoya Murakami
Release 7.0.4 - 2017-06-29¶
Improvements¶
Added physical create/delete operation logs to help identify problems in troubleshooting. [GitHub#700,#701]
[in_records] Improved performance for fixed sized column. It may reduce 50% execution time.
[grndb] Added a --log-path option. [GitHub#702,#703]
[grndb] Added a --log-level option. [GitHub#706,#708]
Added the following APIs:
grn_operator_to_exec_func()
grn_obj_is_corrupt()
Improved performance for “FIXED_SIZE_COLUMN OP CONSTANT”. Supported operators are ==, !=, <, >, <= and >=.
Improved performance for “COLUMN OP VALUE && COLUMN OP VALUE && …”.
[grndb] Supported corrupted object detection with grndb check.
[io_flush] Supported an --only_opened option which flushes only opened database objects.
[grndb] Supported detecting/deleting an orphan “inspect” object. The orphaned “inspect” object was left behind when the command was renamed from inspect to object_inspect.
Fixes¶
[rpm][centos] Fixed unexpected macro expansion problem with customized build. This bug only affects when rebuilding Groonga SRPM with customized
additional_configure_options
parameter in the spec file.
Fixed a missing null check for grn_table_setoperation(). There is a possibility of a crash bug when indexes are broken. [GitHub#699]
Thanks¶
Release 7.0.3 - 2017-05-29¶
Improvements¶
[select] Added documentation about full-text search with a specific index name.
[index] Supported logging a warning message about which record causes a posting list overflow.
[cmake] Supported linking lz4 in embedded static library builds. [Original patch by Sergei Golubchik]
[delete] Supported canceling.
[httpd] Updated bundled nginx to 1.13.0.
Exported the following API
grn_plugin_proc_get_caller()
Added index column related function and selector.
Added new selector: index_column_df_ratio_between()
Added new function: index_column_df_ratio()
Fixes¶
[delete] Fixed a bug that an error isn’t cleared correctly. It affected subsequent deletions, causing unexpected behavior.
[windows] Fixed a bug that the IO version is not detected correctly when the file is opened with the O_CREAT flag.
[vector_slice] Fixed a bug that vector columns whose element size isn’t 4 bytes can’t be sliced. [GitHub#695] [Patch by Naoya Murakami]
Fixed a bug that fixed-size vector columns whose element size isn’t 4 bytes can’t be matched sequentially by specifying an index of the vector. [GitHub#696] [Patch by Naoya Murakami]
[logical_select] Fixed a bug that “argument out of range” occurs when the last day of a month is set as the min. [GitHub#698]
Thanks¶
Sergei Golubchik
Naoya Murakami
Release 7.0.2 - 2017-04-29¶
Improvements¶
[logical_select] Supported multiple drilldowns[${LABEL}].columns[${NAME}].window.sort_keys and drilldowns[${LABEL}].columns[${NAME}].window.group_keys.
[windows] Updated bundled LZ4 to 1.7.5.
[cache] Supported persistent cache feature.
[log_level] Updated the English documentation.
Added the following APIs:
grn_set_default_cache_base_path()
grn_get_default_cache_base_path()
grn_persistent_cache_open()
grn_cache_default_open()
[groonga --cache-base-path] Added a new option to use the persistent cache.
[windows] Updated bundled msgpack to 2.1.1.
[object_inspect] Supported not only column inspection, but also index column statistics.
Supported index search for the “.*” regexp pattern. This feature is enabled by default. Set the GRN_SCAN_INFO_REGEXP_DOT_ASTERISK_ENABLE=no environment variable to disable this feature.
[in_records] Added a function to use an existing table as condition patterns.
[Ubuntu] Dropped Ubuntu 12.04 (Precise Pangolin) support because of EOL.
Fixes¶
[logical_select] Fixed a bug that a wrong cache is used. This bug occurred when a dynamic column parameter is used.
[logical_select] Fixed a bug that dynamic columns aren’t created when there are no matching records.
[reindex] Fixed a bug that data is lost by reindex. [GitHub#646]
[httpd] Fixed a bug that response of quit and shutdown is broken JSON when worker is running as another user. [GitHub ranguba/groonga-client#12]
Release 7.0.1 - 2017-03-29¶
Improvements¶
Exported the following API
grn_ii_cursor_next_pos()
grn_table_apply_expr()
grn_obj_is_data_column()
grn_obj_is_expr()
grn_obj_is_scalar_column()
[dump] Supported to dump weight reference vector.
[load] Supported to load
array<object>
style weight vector column. The example ofarray<object>
style is:[{"key1": weight1}, {"key2": weight2}]
.Supported to search
!(XXX OPERATOR VALUE)
by index. Supported operator is not only>
but also>=
,<
,<=
,==
and!=
.Supported index search for “!(column == CONSTANT)”. The example in this case is:
!(column == 29)
and so on.Supported more “!” optimization in the following patterns.
!(column @ "X") && (column @ "Y")
(column @ "Y") && !(column @ "X")
(column @ "Y") &! !(column @ "X")
Supported to search
XXX || !(column @ "xxx")
by index.[dump] Changed to use
'{"x": 1, "y": 2}'
style for not referenced weight vector. This change doesn’t affect to old Groonga because it already supports one.[experimental] Supported
GRN_ORDER_BY_ESTIMATED_SIZE_ENABLE environment variable. This variable controls whether the query optimization based on estimated size is applied or not. This feature is disabled by default. Set GRN_ORDER_BY_ESTIMATED_SIZE_ENABLE=yes if you want to try it.
[select] Added query logs for columns and drilldown evaluation.
[select] Changed the query log format for drilldown. This is a backward incompatible change, but it only affects users who convert query logs with their own programs.
[table_remove] Reduced temporary memory usage. It’s enabled when the number of max threads is 0.
[select] columns[LABEL](N) is used for the query log format instead of columns(N)[LABEL].
[Query expansion] Updated the example to use a vector column because it is the recommended way. [Reported by Gurunavi, Inc]
Supported detecting a canceled request while locking. It fixes the problem that request_cancel is ignored unexpectedly while locking.
[logical_select] Supported initial and filtered stage dynamic columns. The examples are: --columns[LABEL].stage initial or --columns[LABEL].stage filtered.
[logical_select] Supported the match_columns, query and drilldown_filter options.
[highlight_html] Supported similar search.
[logical_select] Supported initial stage dynamic columns in labeled drilldowns. The example is: --drilldowns[LABEL].stage initial.
[logical_select] Supported window functions in dynamic columns.
[select] Added documentation about dynamic columns.
[Window function] Added section about window functions.
[CentOS] Dropped CentOS 5 support because of EOL.
[httpd] Updated bundled nginx to 1.11.12
Supported disabling the AND match optimization via an environment variable. You can disable this feature with GRN_TABLE_SELECT_AND_MIN_SKIP_ENABLE=no. This feature is enabled by default.
[vector_new] Added a new function to create a new vector.
[select] Added documentation about drilldown_filter.
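A minimal sketch of the vector_new() function mentioned above (the Memos table is hypothetical):

```
# Attach a constant vector to each output record.
select Memos --output_columns '_key, vector_new("a", "b", "c")' --limit 1
```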
Fixes¶
[lock_clear] Fixed a crash bug against temporary database.
Fixed a problem that the dynamically updated index size was increased for natural language since Groonga 6.1.4.
[select] Fixed a bug that “A && B.C @ X” may not return records that should be matched.
Fixed a conflict between grn_io_flush() and grn_io_expire(). Without this change, if the io_flush and load commands are executed simultaneously with specific timing, it causes a crash bug by access violation.
[logical_table_remove] Fixed a crash bug when the max number of threads is 1.
Thanks¶
Gurunavi, Inc.
Release 7.0.0 - 2017-02-09¶
Improvements¶
[in_values] Supported sequential search for reference vector column. [Patch by Naoya Murakami] [GitHub#629]
[select] Changed to report an error instead of ignoring invalid drilldown[LABEL].sort_keys.
[select] Removed needless metadata updates on the DB. It reduces the cases that a database lock remains even though only the select command is executed. [Reported by aomi-n]
[lock_clear] Changed to clear the metadata lock by lock_clear against the DB.
[CentOS] Enabled EPEL by default to install Groonga on Amazon Linux.
[query] Supported “@X” style in script syntax for prefix (“@^”), suffix (“@$”) and regexp (“@~”) search.
[query] Added documentation about available list of mode. The default mode is
MATCH
(“@”) mode which executes full text search.[rpm][centos] Supported groonga-token-filter-stem package which provides stemming feature by
TokenFilterStem
token filter on CentOS 7. [GitHub#633] [Reported by Tim Bellefleur][window_record_number] Marked
record_number
as deprecated. Usewindow_record_number
instead.record_number
is still available for backward compatibility.[window_sum] Added
window_sum
window function. It’s similar behavior to window function sum() on PostgreSQL.Supported to construct offline indexing with in-memory (temporary)
TABLE_DAT_KEY
table. [GitHub#623] [Reported by Naoya Murakami][onigmo] Updated bundled Onigmo to 6.1.1.
Supported
columns[LABEL].window.group_keys
. It’s used to apply window function for every group.[load] Supported to report error on invalid key. It enables you to detect mismatch type of key.
[load] Supported
--output_errors yes
option. If you specify “yes”, you can get errors for each load failed record. Note that this feature requires command version 3.[load] Improve error message on table key cast failure. Instead of “cast failed”, type of table key and target type of table key are also contained in error message.
[httpd] Updated bundled nginx to 1.11.9.
Fixes¶
Fixed a bug that nonexistent sort keys for drilldowns[LABEL] or slices[LABEL] cause an invalid JSON parse error. [Patch by Naoya Murakami] [GitHub#627]
Fixed a bug that access to nonexistent sub records for a group causes a crash. For example, this bug affects the case when you use drilldowns[LABEL].sort_keys _sum without specifying calc_types. [Patch by Naoya Murakami] [GitHub#625]
Fixed a crash bug when a tokenizer has an error. It’s caused when a tokenizer and a token filter are registered and the tokenizer has an error.
[window_record_number] Fixed a bug that arguments for the window function are not passed correctly. [GitHub#634][Patch by Naoya Murakami]
Thanks¶
Naoya Murakami
aomi-n
The old releases¶
- News - 6.x
- Release 6.1.5 - 2017-01-23
- Release 6.1.4 - 2017-01-18
- Release 6.1.3 - 2017-01-06
- Release 6.1.2 - 2016-12-31
- Release 6.1.1 - 2016-11-29
- Release 6.1.0 - 2016-10-29
- Release 6.0.9 - 2016-09-29
- Release 6.0.8 - 2016-08-29
- Release 6.0.7 - 2016-07-29
- Release 6.0.5 - 2016-06-29
- Release 6.0.4 - 2016-06-06
- Release 6.0.3 - 2016-05-29
- Release 6.0.2 - 2016-04-29
- Release 6.0.1 - 2016-03-29
- Release 6.0.0 - 2016-02-29
- News - 5.x
- Release 5.1.2 - 2016-01-29
- Release 5.1.1 - 2015-12-29
- Release 5.1.0 - 2015-11-29
- Release 5.0.9 - 2015-10-29
- Release 5.0.8 - 2015-09-29
- Release 5.0.7 - 2015-08-31
- Release 5.0.6 - 2015-07-29
- Release 5.0.5 - 2015-06-29
- Release 5.0.4 - 2015-05-29
- Release 5.0.3 - 2015-04-29
- Release 5.0.2 - 2015-03-31
- Release 5.0.1 - 2015-03-29
- Release 5.0.0 - 2015-02-09
- News - 4.x
- Release 4.1.1 - 2015-01-29
- Release 4.1.0 - 2015-01-09
- Release 4.0.9 - 2014-12-29
- Release 4.0.8 - 2014-11-29
- Release 4.0.7 - 2014-10-29
- Release 4.0.6 - 2014-09-29
- Release 4.0.5 - 2014-08-29
- Release 4.0.4 - 2014-07-29
- Release 4.0.3 - 2014-06-29
- Release 4.0.2 - 2014-05-29
- Release 4.0.1 - 2014-03-29
- Release 4.0.0 - 2014-02-09
- News - 3.x
- Release 3.1.2 - 2014-01-29
- Release 3.1.1 - 2013-12-29
- Release 3.1.0 - 2013-11-29
- Release 3.0.9 - 2013-10-29
- Release 3.0.8 - 2013-09-29
- Release 3.0.7 - 2013-08-29
- Release 3.0.6 - 2013-07-29
- Release 3.0.5 - 2013-06-29
- Release 3.0.4 - 2013-05-29
- Release 3.0.3 - 2013-04-29
- Release 3.0.2 - 2013-03-29
- Release 3.0.1 - 2013-02-28
- Release 3.0.0 - 2013-02-09
- News - 2.x
- Release 2.1.2 - 2013-01-29
- Release 2.1.1 - 2012-12-29
- Release 2.1.0 - 2012-12-29
- Release 2.0.9 - 2012-11-29
- Release 2.0.8 - 2012-10-29
- Release 2.0.7 - 2012-09-29
- Release 2.0.6 - 2012-08-29
- Release 2.0.5 - 2012-07-29
- Release 2.0.4 - 2012-06-29
- Release 2.0.3 - 2012-05-29
- Release 2.0.2 - 2012-04-29
- Release 2.0.1 - 2012-03-29
- Release 2.0.0 - 2012-02-29
- News - 1.3.x
- News - 1.2.x
- バージョン1.1.xのお知らせ
- バージョン1.0.xのお知らせ
- バージョン0.xのお知らせ
- News in Senna period