2020-12-29
Groonga 10.1.0 has been released
Groonga 10.1.0 has been released!
How to install: Install
Changes
Here are important changes in this release:
-
[highlight_html] Added support for removing leading full-width spaces from the highlight target.
-
[status] Added a new item features.
-
[status] Added a new item apache_arrow.
-
[Window function] Added support for processing all target tables at once even if they straddle shards (experimental).
-
Added support for sequential search against a reference column.
-
[tokenizers] Added support for the token column in TokenDocumentVectorTFIDF and TokenDocumentVectorBM25.
-
Improved performance in the following case:
(column @ "value") && (column @ "value")
-
[Ubuntu] Added support for Ubuntu 20.10 (Groovy Gorilla).
-
[Debian] Dropped stretch support.
-
[CentOS] Dropped CentOS 6 support.
-
[httpd] Updated bundled nginx to 1.19.6.
-
Fixed a bug that Groonga crashed when we used a drilldown with multiple keys together with multiple accessors.
-
Fixed a bug that the near phrase search did not match when the same phrase occurs multiple times.
Conclusion
Please refer to the following news for more details.
News Release 10.1.0
Let's search by Groonga!
2020-12-01
Groonga 10.0.9 has been released
Groonga 10.0.9 has been released!
How to install: Install
Changes
Here are important changes in this release:
-
[select] Improved performance when we specify -1 for limit.
-
[reference_acquire] Added a new option --auto_release_count.
-
Changed the behavior when Groonga evaluates an empty vector and uvector.
- An empty vector and uvector are evaluated to false in command version 3.
-
[Normalizers] Added a new normalizer NormalizerNFKC130 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 13.0.
-
[Token filters] Added a new token filter TokenFilterNFKC130 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 13.0.
-
[select] Improved performance for "_score = column - X".
-
[reference_acquire] Changed reference_acquire so that it doesn't acquire unnecessary references to index columns when we specify the --recursive dependent option.
-
[select] Added support for ordered near phrase search.
-
[httpd] Updated bundled nginx to 1.19.5.
-
[Groonga HTTP server] Fixed a bug that the Groonga HTTP server exited without waiting for all worker threads to finish completely.
Conclusion
Please refer to the following news for more details.
News Release 10.0.9
Let's search by Groonga!
2020-11-10
PGroonga (fast full text search module for PostgreSQL) 2.2.7 has been released
PGroonga 2.2.7 has been released! PGroonga makes PostgreSQL fast full text search for all languages.
If you are a new user, see also About PGroonga.
Highlight
Here are the highlights of PGroonga 2.2.7:
-
Provided packages for PostgreSQL 13.
-
PGroonga has already supported PostgreSQL 13 since an earlier version.
However, we didn't provide packages for PostgreSQL 13.
-
In this release, we started providing them.
-
[Windows] Upgraded bundled Groonga to 10.0.8.
-
[Ubuntu], [Debian] Fixed a bug that WAL support was disabled.
-
Fixed a bug that PGroonga might crash when PostgreSQL wrote WAL.
- It only occurs in environments where pgroonga.enable_wal is enabled.
How to upgrade
This version is compatible with previous versions. You can upgrade by following the steps in "Compatible case" in the Upgrade document.
Announce
Session
This session is for people who already use PGroonga's WAL or will try to use it in the future.
Conclusion
Try PGroonga when you want to perform fast full text search against all languages on PostgreSQL!
2020-10-29
Groonga 10.0.8 has been released
Groonga 10.0.8 has been released!
How to install: Install
Changes
Here are important changes in this release:
-
[select] Added support for large drilldown keys.
-
[select] Added support for handling dynamic columns as the same column even if they refer to different tables.
-
[select] Improved performance when the search result contains a huge number of records.
-
Updated bundled LZ4 to 1.9.2 from 1.8.2.
-
Added support for xxHash 0.8.
-
[httpd] Updated bundled nginx to 1.19.4.
-
Fixed the following bugs related to the browser-based administration tool.
-
[between] Fixed a bug that between(_key, ...) was always evaluated by sequential search.
Conclusion
Please refer to the following news for more details.
News Release 10.0.8
Let's search by Groonga!
2020-09-29
Groonga 10.0.7 has been released
Groonga 10.0.7 has been released!
How to install: Install
Changes
Here are important changes in this release:
-
[highlight], [highlight_full] Added support for normalizer options.
-
[return code] Added a new return code GRN_CONNECTION_RESET for resetting connection.
- It is returned when an existing connection was forcibly closed by the remote host.
-
Dropped Ubuntu 19.10 (Eoan Ermine).
- Because this version has reached EOL.
-
[httpd] Updated bundled nginx to 1.19.2.
-
[grndb] Added support for detecting duplicate keys.
- grndb check is also able to detect duplicate keys since this release.
- This check is valid for all tables except TABLE_NO_KEY tables.
- If a table in which grndb check detected duplicate keys has only index columns, we can recover it with grndb recover.
-
[table_create], [column_create] Added a new option --path.
-
[dump] Added a new option --dump_paths.
-
Added a new function string_tokenize().
- It tokenizes the column value that is specified in the second argument with the tokenizer that is specified in the first argument.
-
[tokenizer] Added a new tokenizer TokenDocumentVectorTFIDF (experimental).
- It automatically generates a document vector by TF-IDF.
-
[tokenizer] Added a new tokenizer TokenDocumentVectorBM25 (experimental).
- It automatically generates a document vector by BM25.
-
[select] Added support for near search in same sentence.
-
Fixed a bug that load didn't return a response when we executed it against 257 columns.
-
[MessagePack] Fixed a bug that a float32 value wasn't unpacked correctly.
-
Fixed the following bugs related to multi-column indexes.
- _score may be broken with full text search.
- The records that couldn't hit might hit.
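As a rough illustration of what a TF-IDF document vector (mentioned above for TokenDocumentVectorTFIDF) looks like, here is a minimal Python sketch. It applies the textbook formulas (term frequency multiplied by the logarithm of the inverse document frequency) to documents that are already tokenized. The function name tfidf_vectors and the sample data are hypothetical, and the weighting that the Groonga tokenizer actually applies may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {token: weight} dict per document (textbook TF-IDF).

    This is an illustration only, not Groonga's implementation.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each token.
    df = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return vectors


# Hypothetical, already tokenized documents.
docs = [["groonga", "search"], ["groonga", "index"]]
vectors = tfidf_vectors(docs)
```

In this sample, a token that appears in every document ("groonga") gets weight 0, while tokens that appear in only one document get a positive weight.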
[highlight], [highlight_full] Added support for normalizer options.
- We can also specify normalizer options in highlight() and highlight_full().
-
Please refer to the following for the options that we can set.
- https://groonga.org/docs/reference/normalizers/normalizer_nfkc100.html#parameters
-
For example, we can treat hyphens that have different code points as the same character by using unify_hyphen.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
load --table Entries
[
{"body": "full-text-search. Use U+002D HYPHEN-MINUS"},
{"body": "full֊text֊search. Use U+058A ARMENIAN HYPHEN"},
{"body": "full˗text˗search. Use U+02D7 MODIFIER LETTER MINUS SIGN"}
]
select Entries --output_columns \
'highlight_full(body, \
"NormalizerNFKC121(\\"unify_hyphen\\", true)", \
true, \
"full-text-search", \
"<span class=\\"keyword1\\">", \
"</span>")' --output-pretty yes
[
[
0,
0.0,
0.0
],
[
[
[
3
],
[
[
"highlight_full",
null
]
],
[
"<span class=\"keyword1\">full-text-search</span>. Use U+002D HYPHEN-MINUS"
],
[
"<span class=\"keyword1\">full֊text֊search</span>. Use U+058A ARMENIAN HYPHEN"
],
[
"<span class=\"keyword1\">full˗text˗search</span>. Use U+02D7 MODIFIER LETTER MINUS SIGN"
]
]
]
]
-
If we don't specify the unify_hyphen option, only {"body": "full-text-search. Use U+002D HYPHEN-MINUS"} is highlighted, as below.
- This is because the hyphens in the other records have different code points from the hyphen that is included in the search keyword.
select Entries --output_columns \
'highlight_full(body, \
"NormalizerNFKC121()", \
true, \
"full-text-search", \
"<span class=\\"keyword1\\">", \
"</span>")'
[
[
0,
0.0,
0.0
],
[
[
[
3
],
[
[
"highlight_full",
null
]
],
[
"<span class=\"keyword1\">full-text-search</span>. Use U+002D HYPHEN-MINUS"
],
[
"full֊text֊search. Use U+058A ARMENIAN HYPHEN"
],
[
"full˗text˗search. Use U+02D7 MODIFIER LETTER MINUS SIGN"
]
]
]
]
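The difference between the two results above comes down to code points. The following Python sketch shows the idea on plain strings. The mapping HYPHEN_LIKE is a hypothetical, deliberately tiny stand-in that covers only the hyphens used in this example; it is not the character set that the unify_hyphen option of Groonga's normalizers actually handles.

```python
# Hypothetical mapping for illustration: only the two non-ASCII hyphens
# from the example records are folded into U+002D HYPHEN-MINUS.
HYPHEN_LIKE = {
    0x058A: "-",  # ARMENIAN HYPHEN
    0x02D7: "-",  # MODIFIER LETTER MINUS SIGN
}


def unify_hyphen(text):
    """Replace the hyphen-like characters above with U+002D."""
    return text.translate(HYPHEN_LIKE)


bodies = [
    "full-text-search",
    "full\u058Atext\u058Asearch",
    "full\u02D7text\u02D7search",
]
keyword = "full-text-search"

# Without unification, only the first body contains the keyword.
plain_hits = [keyword in body for body in bodies]
# After unification, all three bodies contain it.
unified_hits = [keyword in unify_hyphen(body) for body in bodies]
```

Here plain_hits corresponds to the result without unify_hyphen and unified_hits to the result with it.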
[table_create], [column_create] Added a new option --path.
-
We can store a specified table or column at any path by using this option.
-
This option is useful if we want to store tables or columns that
we use often on fast storage (e.g. SSD) and those that we don't use often
on slow storage (e.g. HDD).
-
We can specify either a relative path or an absolute path in this option.
- If we specify a relative path in this option, the path is resolved with the path of the groonga process as the origin.
-
However, if we specify --path, the result of the dump command includes the --path information.
- Therefore, if we specify --path, we can't restore the dump on a host in a different environment.
- If we don't want to include the --path information in a dump, we need to specify --dump_paths no in the dump command.
[dump] Added a new option --dump_paths.
-
The --dump_paths option controls whether --path is dumped or not.
-
Its default value is yes.
-
If we specify --path when we create tables or columns and we don't want to include the --path information in a dump, we specify no for --dump_paths when we execute the dump command.
[select] Added support for near search in same sentence.
-
Until now, near search couldn't be restricted to the same sentence.
-
From this release, it can search within the same sentence as below.
table_create Memos TABLE_PAT_KEY ShortText
column_create Memos content COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
load --table Memos
[
{"_key":"alphabets1", "content": "a c d ."},
{"_key":"alphabets2", "content": "a b c d e f ."},
{"_key":"alphabets3", "content": "a b x c d e f ."},
{"_key":"alphabets4", "content": "a b x x c d e f ."}
]
select \
--table Memos \
--match_columns content \
--query '*NP3,-1"a c .$"' \
--output_columns _score,_key,content
[
[
0,
0.0,
0.0
],
[
[
[
2
],
[
[
"_score",
"Int32"
],
[
"_key",
"ShortText"
],
[
"content",
"ShortText"
]
],
[
1,
"alphabets1",
"a c d ."
],
[
1,
"alphabets2",
"a b x c ."
]
]
]
]
-
We use the following syntax for near search in the same sentence.
-
'*NP${MAX_INTERVAL},${ADDITIONAL_LAST_INTERVAL}"${FIRST_PHRASE} ${LAST_PHRASE} ${SEPARATOR}$"'
-
If we specify -1 for ${ADDITIONAL_LAST_INTERVAL}, a record hits when the interval between the first phrase and the last phrase is less than or equal to ${MAX_INTERVAL}.
- In this case, the record hits however far apart the phrases and the separator are.
-
If we specify an integer not smaller than 1 for ${ADDITIONAL_LAST_INTERVAL}, a record that satisfies both of the following conditions hits.
- The interval between the first phrase and the last phrase is less than or equal to ${MAX_INTERVAL}.
- The interval between the first phrase and the separator is less than or equal to ${MAX_INTERVAL}+${ADDITIONAL_LAST_INTERVAL}.
-
If we specify 0 for ${ADDITIONAL_LAST_INTERVAL}, the near search behaves the same as before.
- The default value of ${ADDITIONAL_LAST_INTERVAL} is 0.
-
We can specify any character in ${SEPARATOR}.
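The hit rules above can be written down as a small predicate. The following Python sketch is only an illustration of those rules: it takes the two intervals as precomputed inputs and does not model how Groonga measures an interval. The function and parameter names are hypothetical, and treating 0 with the same rule as values not smaller than 1 is an assumption of this sketch.

```python
def same_sentence_near_hit(phrase_interval, separator_interval,
                           max_interval, additional_last_interval=0):
    """Apply the hit rules described above to precomputed intervals.

    phrase_interval: interval between the first phrase and the last phrase.
    separator_interval: interval between the first phrase and the separator.
    Values of additional_last_interval below -1 are not covered by the
    text above and are not handled specially here.
    """
    # In every case the phrases themselves must be close enough.
    if phrase_interval > max_interval:
        return False
    if additional_last_interval == -1:
        # The separator may be arbitrarily far away.
        return True
    # Assumption: 0 follows the same rule as values not smaller than 1.
    return separator_interval <= max_interval + additional_last_interval


# Hypothetical intervals, for illustration only.
unlimited = same_sentence_near_hit(2, 10, 3, -1)
too_far = same_sentence_near_hit(2, 5, 3, 1)
within = same_sentence_near_hit(2, 4, 3, 1)
```

With these sample values, unlimited and within are hits, while too_far is not because the separator is farther than ${MAX_INTERVAL}+${ADDITIONAL_LAST_INTERVAL}.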
Fixed the following bugs related to multi-column indexes.
Conclusion
Let's search by Groonga!