2024-07-04
PGroonga (fast full text search module for PostgreSQL) 3.2.1 has been released
PGroonga 3.2.1 has been released! PGroonga makes PostgreSQL a fast full text search platform for all languages.
Improvements
Fixes
How to upgrade
If you're using PGroonga 2.0.0 or later, you can upgrade by following the steps in the "Compatible case" section of the Upgrade document.
If you're using PGroonga 1.Y.Z, you can upgrade by following the steps in the "Incompatible case" section of the Upgrade document.
Support service
If you need commercial support for PGroonga, contact us.
Conclusion
Try PGroonga when you want fast full text search for all languages on PostgreSQL!
2024-07-04
Groonga 14.0.5 has been released
Groonga 14.0.5 has been released!
How to install: Install
Changes
Here are important changes in this release:
Improvements
-
Added a new feature that removes broken objects (tables or columns) as much as possible.
The crash safe feature of PGroonga will be the main user of this feature.
PGroonga applies PGroonga's WAL to standby databases automatically by using Custom WAL Resource Managers. However, when PGroonga uses Custom WAL Resource Managers, all replication stops if PGroonga fails to apply PGroonga's WAL because of a broken Groonga object. So, if broken objects exist in a database, Groonga now tries to remove as many of them as possible by using this feature.
Fixes
-
[query()] Fixed a bug that the order of evaluation of 'A || query("...", "B C")' was wrong.
This problem occurs when all of the following conditions are met:
- An OR search and query() are used.
- An AND search is used inside query().
- The condition expression has the form 'A || query("...", "B C")'.
So, this problem doesn't occur if we use only query(), or if we don't use an AND search inside query().
We expect that {"name": "Alice", "memo": "Groonga user"} is hit in the following example. However, when this problem occurred, the following query didn't hit it.
table_create Users TABLE_NO_KEY
column_create Users name COLUMN_SCALAR ShortText
column_create Users memo COLUMN_SCALAR ShortText
load --table Users
[
{"name": "Alice", "memo": "Groonga user"},
{"name": "Bob", "memo": "Rroonga user"}
]
select Users \
--output_columns 'name, memo, _score' \
--filter 'memo @ "Groonga" || query("name", "Bob Rroonga")'
[[0,0.0,0.0],[[[0],[["name","ShortText"],["memo","ShortText"],["_score","Int32"]]]]]
After the fix, {"name": "Alice", "memo": "Groonga user"} is hit as in the following example.
select Users \
--output_columns 'name, memo, _score' \
--filter 'memo @ "Groonga" || query("name", "Bob Rroonga")'
[
[
0,
1719376617.537505,
0.002481460571289062
],
[
[
[
1
],
[
[
"name",
"ShortText"
],
[
"memo",
"ShortText"
],
[
"_score",
"Int32"
]
],
[
"Alice",
"Groonga user",
1
]
]
]
]
-
[select] Fixed a bug that a condition that evaluates a prefix search first, such as --query "A* OR B", returned a wrong search result.
This problem may occur when a prefix search is evaluated first. It doesn't occur with a condition that evaluates the prefix search last, such as --query "A OR B*".
When this problem occurs, both Bo and li in --query "Bo* OR li" are evaluated as prefix searches. As a result, the following query does not hit, because li is evaluated as a prefix search as mentioned above.
table_create Users TABLE_NO_KEY
column_create Users name COLUMN_SCALAR ShortText
load --table Users
[
["name"],
["Alice"]
]
select Users \
--match_columns name \
--query "Bo* OR li"
[
[
0,
1719377505.628048,
0.0007376670837402344
],
[
[
[
0
],
[
[
"_id",
"UInt32"
],
[
"name",
"ShortText"
]
]
]
]
]
Conclusion
Please refer to the following news for more details.
News Release 14.0.5
Let's search by Groonga!
2024-05-29
Groonga 14.0.4 has been released
Groonga 14.0.4 has been released!
How to install: Install
Changes
Here are important changes in this release:
Fixes
-
[query_parallel_or] Fixed a bug that the match_escalation_threshold or force_match_escalation options were ignored when using query_parallel_or().
Before the fix, even when match_escalation_threshold was set to disable match escalation, the results still escalated when query_parallel_or() was used. This problem occurred only with query_parallel_or(); query() was not affected.
Generally, we don't disable match escalation, because we want to get some search results: zero hits is an unwelcome result for us. Therefore, this problem doesn't affect many users. However, it does affect users who use stop words as below.
plugin_register token_filters/stop_word
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
column_create Terms is_stop_word COLUMN_SCALAR Bool
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
load --table Memos
[
{"content": "Hello"},
{"content": "Hello and Good-bye"},
{"content": "Good-bye"}
]
select Memos \
--filter 'query_parallel_or(["content", "content", "content", "content"], \
"and", \
{"options": {"TokenFilterStopWord.enable": true}})' \
--match_escalation_threshold -1 \
--sort_keys -_score
We don't want to match a keyword that is registered as a stop word. Therefore, we set match_escalation_threshold to -1 in the above example.
We expect that Groonga doesn't return any records in the above example, because escalation is disabled and the search keyword (and) is registered as a stop word.
However, when this problem occurs, Groonga returns a matching record, because match_escalation_threshold doesn't work when query_parallel_or() is used.
-
Fixed a bug that full-text search against a reference column of a vector didn't work.
This problem has existed since Groonga 14.0.0. It affects full-text search executed against a vector reference column.
We expect that Groonga returns [1, "Linux MySQL"] and [2, "MySQL Groonga"] in the example below. However, before the fix, Groonga always returned 0 hits, as below, because the full-text search was executed on a vector reference column.
table_create bugs TABLE_PAT_KEY UInt32
table_create tags TABLE_PAT_KEY ShortText --default_tokenizer TokenDelimit
column_create tags name COLUMN_SCALAR ShortText
column_create bugs tags COLUMN_VECTOR tags
load --table bugs
[
["_key", "tags"],
[1, "Linux MySQL"],
[2, "MySQL Groonga"],
[3, "Mroonga"]
]
column_create tags bugs_tags_index COLUMN_INDEX bugs tags
select --table bugs --filter 'tags @ "MySQL"'
[
[
0,
0.0,
0.0
],
[
[
[
0
],
[
[
"_id",
"UInt32"
],
[
"_key",
"UInt32"
],
[
"tags",
"tags"
]
]
]
]
]
Conclusion
Please refer to the following news for more details.
News Release 14.0.4
Let's search by Groonga!
2024-05-09
Groonga 14.0.3 has been released
Groonga 14.0.3 has been released!
How to install: Install
Changes
Here are important changes in this release:
Improvements
-
We optimized performance as below.
-
We optimized the performance of OR and AND searches when the number of hits is large.
-
We optimized the performance of prefix search (@^).
-
We optimized the performance of AND searches when A has more records than B in the condition A AND B.
-
We optimized the performance of searches that use many dynamic columns.
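As a minimal sketch of the prefix search that was optimized: the @^ operator performs prefix search against a TABLE_PAT_KEY table. The table and data below are hypothetical examples for illustration, not from this release.

```
table_create Tags TABLE_PAT_KEY ShortText
load --table Tags
[
{"_key": "groonga"},
{"_key": "grn"},
{"_key": "mroonga"}
]
select Tags --filter '_key @^ "gr"'
```

Only records whose key starts with "gr" ("groonga" and "grn") are expected to match.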
-
token_ngram Added a new option ignore_blank.
We can replace TokenBigramIgnoreBlank with TokenNgram("ignore_blank", true) as below.
Here is an example that uses TokenBigram.
tokenize TokenBigram "! ! !" NormalizerAuto
[
[
0,
1715155644.64263,
0.001013517379760742
],
[
{
"value": "!",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
},
{
"value": "!",
"position": 1,
"force_prefix": false,
"force_prefix_search": false
},
{
"value": "!",
"position": 2,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Here is an example that uses TokenBigramIgnoreBlank.
tokenize TokenBigramIgnoreBlank "! ! !" NormalizerAuto
[
[
0,
1715155680.323451,
0.0009913444519042969
],
[
{
"value": "!!!",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Here is an example that uses TokenNgram("ignore_blank", true).
tokenize 'TokenNgram("ignore_blank", true)' "! ! !" NormalizerAuto
[
[
0,
1715155762.340685,
0.001041412353515625
],
[
{
"value": "!!!",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
}
]
]
-
ubuntu Added support for Ubuntu 24.04 LTS (Noble Numbat).
Fixes
-
request_cancel Fixed a bug that Groonga may crash when the request_cancel command is executed while another query is running.
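As a hedged sketch of how request_cancel is used: a client starts a command with a request ID attached, and another connection cancels it by that ID. The table name and ID below are hypothetical.

```
# Connection 1: run a potentially long query, attaching a request ID.
select Entries --match_columns content --query "Groonga" --request_id "slow-query-1"
# Connection 2: cancel the request with that ID.
request_cancel --id "slow-query-1"
```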
-
Fixed an unexpected error when using --post_filter with an --offset greater than the number of post-filtered results.
In the same situation, using --filter with --offset doesn't raise an error. This inconsistency in behavior between --filter and --post_filter has now been resolved.
table_create Users TABLE_PAT_KEY ShortText
column_create Users age COLUMN_SCALAR UInt32
load --table Users
[
["_key", "age"],
["Alice", 21],
["Bob", 22],
["Chris", 23],
["Diana", 24],
["Emily", 25]
]
select Users \
--filter 'age >= 22' \
--post_filter 'age <= 24' \
--offset 3 \
--sort_keys -age --output_pretty yes
[
[
-68,
1715224057.317582,
0.001833438873291016,
"[table][sort] grn_output_range_normalize failed",
[
[
"grn_table_sort",
"/home/horimoto/Work/free-software/groonga.tag/lib/sort.c",
1052
]
]
]
]
-
Fixed a bug where an incorrect search result could be returned when not all phrases within (...) matched while using a near-phrase product.
For example, no record matches the (2) condition of --query '*NPP1"(a) (2)"'. In this case, the expected behavior is to return no records. However, the actual behavior was equivalent to the query --query '*NPP1"(a)"' as below.
This means that despite no records matching (2), records like ax1 and axx1 were incorrectly returned.
table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenNgram
column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content
load --table Entries
[
{"content": "ax1"},
{"content": "axx1"}
]
select Entries \
--match_columns content \
--query '*NPP1"(a) (2)"' \
--output_columns 'content'
[
[
0,
1715224211.050228,
0.001366376876831055
],
[
[
[
2
],
[
[
"content",
"Text"
]
],
[
"ax1"
],
[
"axx1"
]
]
]
]
-
Fixed a bug that rehash failed, or data in the table was broken, when rehash occurred on a TABLE_HASH_KEY table with 2^28 or more records.
-
Fixed a bug that the highlight position was shifted in the following cases.
-
If a full-width space exists before the highlight target characters, as below.
We expect that Groonga returns "Groonga <span class=\"keyword\">高</span>速!". However, Groonga returned "Groonga <span class=\"keyword\">高速</span>!" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "Groonga 高速!"}
]
select Entries \
--match_columns body \
--query '高' \
--output_columns 'highlight_html(body, Terms)'
[
[
0,
1715215640.979517,
0.001608610153198242
],
[
[
[
1
],
[
[
"highlight_html",
null
]
],
[
"Groonga <span class=\"keyword\">高速</span>!"
]
]
]
]
-
If TokenNgram("loose_blank", true) is used and the highlight target characters include a full-width space, as below.
We expect that Groonga returns "<span class=\"keyword\">山田 太郎</span>". However, Groonga returned "<span class=\"keyword\">山田 太</span>" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("loose_blank", true, "report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "山田 太郎"}
]
select Entries \
--match_columns body --query '山田太郎' \
--output_columns 'highlight_html(body, Terms)' --output_pretty yes
[
[
0,
1715220409.096246,
0.0004854202270507812
],
[
[
[
1
],
[
[
"highlight_html",
null
]
],
[
"<span class=\"keyword\">山田 太</span>"
]
]
]
]
-
If white space exists in front of the highlight target characters, as below.
We expect that Groonga returns " <span class=\"keyword\">山</span>田太郎". However, Groonga returned " <span class=\"keyword\">山</span>" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": " 山田太郎"}
]
select Entries \
--match_columns body \
--query '山' \
--output_columns 'highlight_html(body, Terms)' --output_pretty yes
[
[
0,
1715221627.002193,
0.001977920532226562
],
[
[
[
1
],
[
[
"highlight_html",
null
]
],
[
" <span class=\"keyword\">山</span>"
]
]
]
]
-
If the second character of the highlight target is a full-width space, as below.
We expect that Groonga returns "<span class=\"keyword\">山 田</span>太郎". However, Groonga returned "<span class=\"keyword\">山 田太</span>郎" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "山 田太郎"}
]
select Entries \
--match_columns body \
--query '山 田' \
--output_columns 'highlight_html(body, Terms)'
[
[
0,
1715222501.496007,
0.0005536079406738281
],
[
[
[
0
],
[
[
"highlight_html",
"<span class=\"keyword\">山 田太</span>郎"
]
]
]
]
]
Conclusion
Please refer to the following news for more details.
News Release 14.0.3
Let's search by Groonga!
2024-03-29
Groonga 14.0.2 has been released
Groonga 14.0.2 has been released!
How to install: Install
Changes
Here are important changes in this release:
Improvements
-
Reduced the log level of the log that is output when Groonga sets normalizers/tokenizers/token_filters on a temporary table.
For example, the target log of this modification is the following:
DDL:1234567890:set_normalizers NormalizerAuto
PGroonga sets normalizers on temporary tables at startup, so this log becomes noise: it is output every time PGroonga starts, because PGroonga's default log level is notice.
Therefore, we reduced the log level of this log to debug in this release. Thus, this log is no longer output by default when PGroonga starts.
Conclusion
Please refer to the following news for more details.
News Release 14.0.2
Let's search by Groonga!