PGroonga 1.0.0 has been released
PGroonga (píːzí:lúnɡά) 1.0.0 has been released! It's the first major release!
PGroonga is a PostgreSQL extension that turns PostgreSQL into a fast full text search platform for all languages!
pg_trgm doesn't support languages that use non-alphanumeric characters, such as Japanese and Chinese.
pg_bigm supports those languages, but it's slow.
PGroonga supports all languages, provides rich full text search features, and is very fast, because it uses Groonga, a full-fledged full text search engine, as its backend.
For example, PGroonga is often a few times faster than pg_bigm, and in some cases more than 10 times faster.
Here are benchmark results comparing PGroonga and pg_bigm on Japanese Wikipedia data.
Here is a benchmark result for creating an index:
| Extension | Index creation time |
|---|---|
| pg_bigm | 5h 56m 15s |
In this case, PGroonga is about 14 times faster than pg_bigm.
Here is a benchmark result for full text search:
| Search keywords | Number of hits | PGroonga | pg_bigm |
|---|---|---|---|
| "PostgreSQL" or "MySQL" | 368 | 0.030s | 0.107s |
| データベース (database in Japanese) | 17172 | 0.121s | 1.224s |
| テレビアニメ (TV animation in Japanese) | 22885 | 0.179s | 2.472s |
| 日本 (Japan in Japanese) | 625792 | 0.646s | 0.556s |
In the "日本" (Japan in Japanese) case, pg_bigm is a bit faster(*) than PGroonga. In the other cases, however, PGroonga is about 3 to 14 times faster than pg_bigm. The results show that PGroonga delivers consistently fast full text search across keywords.
(*) pg_bigm searches faster for keywords of two or fewer characters than for keywords of three or more characters.
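As a quick check of the "3 to 14 times" figure, the ratios can be computed directly from the search table. This is a minimal Python sketch. The timings are copied from the table, and `speedup` is just an illustrative helper name.

```python
# Search times in seconds copied from the benchmark table: (PGroonga, pg_bigm)
timings = {
    '"PostgreSQL" or "MySQL"': (0.030, 0.107),
    "データベース": (0.121, 1.224),
    "テレビアニメ": (0.179, 2.472),
    "日本": (0.646, 0.556),
}

def speedup(pgroonga_s, pg_bigm_s):
    # How many times faster PGroonga is; a value below 1 means pg_bigm was faster
    return pg_bigm_s / pgroonga_s

for keyword, (pgroonga_s, pg_bigm_s) in timings.items():
    print(f"{keyword}: {speedup(pgroonga_s, pg_bigm_s):.1f}x")
```

This prints ratios of 3.6x, 10.1x, 13.8x and 0.9x, so the "日本" row is the only one where pg_bigm comes out ahead.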
PGroonga provides the following features that aren't provided by other extensions:
- Normalize feature
- Custom tokenizer feature
- Snippet feature
The normalize feature converts text written in different notations into a single unified notation.
As a simple example, both "ABC" and "abc" are converted to "abc".
As a more complex example, both "ﾎﾟｽｸﾞﾚ" (HALFWIDTH KATAKANA) and "ポスグレ" (FULLWIDTH KATAKANA) are converted to "ポスグレ" (FULLWIDTH KATAKANA). ("ポスグレ" is a Japanese abbreviation of PostgreSQL.)
This normalization is based on Unicode NFKC.
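The NFKC part of this idea can be sketched with Python's standard `unicodedata` module. This is an approximation for illustration only, not Groonga's normalizer, which applies its own rules on top of NFKC. The lowercasing step is an assumption added here to reproduce the "ABC" to "abc" example, because NFKC by itself does not change case.

```python
import unicodedata

def normalize(text):
    # NFKC folds compatibility variants: halfwidth katakana becomes fullwidth
    # (composing the separate voiced/semi-voiced marks) and fullwidth ASCII
    # becomes plain ASCII. NFKC does not change case, so lowercase afterwards.
    return unicodedata.normalize("NFKC", text).lower()

print(normalize("ABC"))     # abc
print(normalize("ﾎﾟｽｸﾞﾚ"))  # ポスグレ
```

Because both spellings map to the same string, indexing and searching the normalized form lets either notation find the other.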
The custom tokenizer feature lets you customize how search keywords are extracted from text (tokenization). By customizing tokenization, you can control the trade-off between search precision and search performance.
For example, if you use a tokenizer based on a morphological analyzer, you get better search precision and search performance, but some texts may not be found.
FYI: No other extension supports a tokenizer based on a morphological analyzer. PGroonga is the only one.
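The trade-off can be illustrated with a toy Python sketch. It compares a bigram (2-gram) tokenizer, the kind of N-gram splitting pg_bigm is named after, with a dictionary-based word tokenizer. The three-word dictionary and the greedy longest-match segmentation are hypothetical simplifications standing in for a real morphological analyzer. They are not how Groonga's tokenizers work.

```python
# Toy dictionary standing in for a morphological analyzer's lexicon (hypothetical)
DICTIONARY = {"東京", "京都", "都"}

def bigram_tokenize(text):
    # Overlapping two-character tokens: "東京都" -> ["東京", "京都"]
    if len(text) < 2:
        return [text]
    return [text[i:i + 2] for i in range(len(text) - 1)]

def word_tokenize(text, dictionary=DICTIONARY):
    # Greedy longest-match segmentation; unknown characters become 1-char tokens
    longest = max(len(word) for word in dictionary)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens

def matches(document, query, tokenize):
    # A document hits when every query token is among the document's tokens
    document_tokens = set(tokenize(document))
    return all(token in document_tokens for token in tokenize(query))

# "東京都" (Tokyo Metropolis) is not about "京都" (Kyoto)
print(matches("東京都", "京都", bigram_tokenize))  # True: an unwanted hit
print(matches("東京都", "京都", word_tokenize))    # False
```

The bigram tokenizer reports "東京都" as a hit for "京都" only because the two characters happen to be adjacent. The word tokenizer avoids this false hit. The flip side is that a word tokenizer can miss texts whose tokens don't line up with the query.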
The snippet feature shows the text around each keyword. Web search engines, including Google, use it: it is the excerpt shown under each page title in the list of hits. PGroonga provides a function that implements it.
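The general technique behind snippets can be sketched in Python. This is not PGroonga's snippet function. The `width` and `max_snippets` parameters, the `<b>` markup and the case-insensitive matching are assumptions chosen for the example.

```python
import html
import re

def snippets(text, keyword, width=20, max_snippets=3):
    # Collect up to max_snippets fragments with about `width` characters of
    # context on each side of a keyword hit, highlighting the hit itself
    results = []
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        fragment = (html.escape(text[start:match.start()])
                    + "<b>" + html.escape(match.group()) + "</b>"
                    + html.escape(text[match.end():end]))
        results.append(fragment)
        if len(results) == max_snippets:
            break
    return results

text = "PGroonga is a PostgreSQL extension. It makes PostgreSQL fast."
print(snippets(text, "postgresql", width=5))
```

The surrounding text is HTML-escaped, so a snippet can be embedded in a results page without the document's own characters being interpreted as markup.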
There are more features:
- A query language with Web-search-engine-like syntax
- JSON search
  - You can use each value in a condition. You can also perform full text search against all text in a JSON value. No other extension, such as JsQuery, provides full text search against JSON.
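Conceptually, full text search against all text in a JSON value means looking at every string nested anywhere in the document. A minimal Python sketch of that idea follows. Case-insensitive substring matching and ignoring object keys are choices made for this example, not PGroonga's documented behavior.

```python
import json

def iter_strings(value):
    # Recursively yield every string value in decoded JSON (object keys excluded)
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)

def contains_text(document, keyword):
    # True if any string anywhere in the JSON document contains the keyword
    data = json.loads(document)
    needle = keyword.lower()
    return any(needle in text.lower() for text in iter_strings(data))

doc = '{"title": "PGroonga", "tags": ["PostgreSQL", "search"], "meta": {"lang": "ja"}}'
print(contains_text(doc, "postgres"))
```

A condition on a single value would instead look up one path, such as `data["meta"]["lang"]`. Searching every string covers matches at any depth.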
Here are features that will be implemented in the future. They are already implemented in Groonga.
- Query expansion feature
- Weight feature
- Stemming feature
You can use PGroonga without any full text search knowledge. You just create an index and put a condition into the `WHERE` clause. For example, with a hypothetical `memos` table that has a `content` text column:

```sql
CREATE INDEX pgroonga_content_index ON memos USING pgroonga (content);
SELECT * FROM memos WHERE content @@ 'PostgreSQL';
```

You can also use PGroonga through `LIKE`. PGroonga can perform `LIKE` with an index, and `LIKE` with a PGroonga index is faster than `LIKE` without an index. This means you can improve performance without changing an application that runs SQL like the following:

```sql
SELECT * FROM memos WHERE content LIKE '%PostgreSQL%';
```
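One common way an index can speed up a `LIKE '%keyword%'` query works in three steps. First, split text into N-grams. Then look up candidate rows in an inverted index. Finally, recheck each candidate. The Python sketch below shows this general technique with bigrams. It is an assumption for illustration and does not describe how PGroonga implements `LIKE` internally.

```python
from collections import defaultdict

def bigrams(text):
    # Set of overlapping two-character substrings
    return {text[i:i + 2] for i in range(len(text) - 1)}

def build_index(rows):
    # Inverted index: bigram -> ids of the rows that contain it
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for gram in bigrams(text):
            index[gram].add(row_id)
    return index

def like_contains(rows, index, keyword):
    # Emulates `column LIKE '%keyword%'` for a keyword without % or _ wildcards
    if len(keyword) < 2:
        # Too short to form a bigram: fall back to a full scan
        return [i for i, text in enumerate(rows) if keyword in text]
    postings = [index.get(gram, set()) for gram in bigrams(keyword)]
    candidates = set.intersection(*postings)
    # Recheck, because sharing every bigram does not guarantee a substring match
    return sorted(i for i in candidates if keyword in rows[i])

rows = ["PGroonga is fast", "PostgreSQL and PGroonga", "MySQL only", "I like PostgreSQL"]
index = build_index(rows)
print(like_contains(rows, index, "PostgreSQL"))  # [1, 3]
```

Only rows that share every bigram with the keyword are rechecked, so most rows are never scanned. The recheck matters because a row such as "abcab" contains every bigram of "cabc" without containing "cabc" itself.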
You can install PGroonga easily because it provides packages for major platforms, including binaries for Windows.
A new PGroonga version has been released. PGroonga is a PostgreSQL extension that turns PostgreSQL into a fast full text search platform for all languages.
It's the first major release. If you want fast full text search for all languages on PostgreSQL, try PGroonga!