BloGroonga

2015-10-29

PGroonga 1.0.0 has been released

PGroonga (píːzí:lúnɡά) 1.0.0 has been released! It's the first major release!

About PGroonga

PGroonga is a PostgreSQL extension that makes PostgreSQL a fast full text search platform for all languages!

There are some PostgreSQL extensions that improve the full text search feature of PostgreSQL, such as pg_trgm and pg_bigm.

pg_trgm doesn't support languages that use non-alphanumeric characters, such as Japanese and Chinese.

pg_bigm supports languages that use non-alphanumeric characters, but it's slow.

PGroonga supports all languages, provides rich full text search features, and is very fast, because it uses Groonga, a full-fledged full text search engine, as its backend.

For example, PGroonga is a few times faster than pg_bigm. In some cases, PGroonga is more than 10 times faster than pg_bigm.

Here are benchmark results comparing PGroonga and pg_bigm. They use Japanese Wikipedia data.

Here is a benchmark result for creating an index:

Extension   Index creation time
PGroonga    25m 37s
pg_bigm     5h 56m 15s

In this case, PGroonga is about 14 times faster than pg_bigm.

Here is a benchmark result for full text search:

Search keywords | N hits | PGroonga | pg_bigm
"PostgreSQL" or "MySQL" | 368 | 0.030s | 0.107s
データベース (database in Japanese) | 17172 | 0.121s | 1.224s
テレビアニメ (TV animation in Japanese) | 22885 | 0.179s | 2.472s
日本 (Japan in Japanese) | 625792 | 0.646s | 0.556s

In the "日本" (Japan in Japanese) case, pg_bigm is a bit faster(*) than PGroonga. But PGroonga is 3 to 14 times faster than pg_bigm in the other cases. The results show that PGroonga provides stable, fast full text search for all keywords.

(*) pg_bigm performs full text search faster for keywords that have 2 or fewer characters than for keywords that have 3 or more characters.
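As a rough check of the ratios quoted above, the following Python sketch recomputes them from the two benchmark tables. The timings are copied from the tables; the speedup helper and the variable names are illustrative only and are not part of PGroonga.

```python
# Timings copied from the benchmark tables above, converted to seconds.
index_seconds = {
    "PGroonga": 25 * 60 + 37,
    "pg_bigm": 5 * 3600 + 56 * 60 + 15,
}

# keyword: (PGroonga seconds, pg_bigm seconds)
search_seconds = {
    '"PostgreSQL" or "MySQL"': (0.030, 0.107),
    "データベース": (0.121, 1.224),
    "テレビアニメ": (0.179, 2.472),
    "日本": (0.646, 0.556),
}


def speedup(pgroonga: float, pg_bigm: float) -> float:
    """Return how many times faster PGroonga is (below 1 means pg_bigm is faster)."""
    return pg_bigm / pgroonga


index_speedup = speedup(index_seconds["PGroonga"], index_seconds["pg_bigm"])
search_speedups = {
    keyword: speedup(pgroonga, pg_bigm)
    for keyword, (pgroonga, pg_bigm) in search_seconds.items()
}

print(f"index creation: {index_speedup:.1f}x")
for keyword, ratio in search_speedups.items():
    print(f"{keyword}: {ratio:.1f}x")
```

A ratio below 1, as in the "日本" row, means pg_bigm was the faster one for that keyword.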

PGroonga provides the following features that aren't provided by other extensions:

  • Normalize feature
  • Custom tokenizer feature
  • Snippet feature

The normalize feature converts text written in different notations into a unified notation.

For a simple example, both "ABC" and "abc" are converted to "abc".

For a more complex example, both "ﾎﾟｽｸﾞﾚ" (HALFWIDTH KATAKANA) and "ポスグレ" (FULLWIDTH KATAKANA) are converted to "ポスグレ" (FULLWIDTH KATAKANA). ("ポスグレ" is an abbreviation of PostgreSQL in Japanese.)

This normalization is based on Unicode NFKC.
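The two examples above can be reproduced with Python's standard unicodedata module. This is only a sketch of NFKC-style normalization: the normalize function is a hypothetical helper, and the lowercasing step is an assumption made to match the "ABC" example. It is not Groonga's normalizer.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFKC, then lowercase (an approximation, not Groonga's normalizer)."""
    return unicodedata.normalize("NFKC", text).lower()


# The halfwidth katakana form is written with escapes so that it cannot be
# silently converted to fullwidth when this file is copied around.
halfwidth = "\uff8e\uff9f\uff7d\uff78\uff9e\uff9a"
fullwidth = "\u30dd\u30b9\u30b0\u30ec"  # ポスグレ

print(normalize("ABC") == normalize("abc"))
print(normalize(halfwidth) == normalize(fullwidth))
```

Because both notations map to the same normalized text, a search for either one can match documents written in the other.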

The custom tokenizer feature customizes the process of extracting search keywords (tokenization). If you can customize tokenization, you can control the trade-off between search precision and search performance.

For example, if you use a tokenizer based on a morphological analyzer, you can get better search precision and search performance but may not find some texts.

FYI: PGroonga is the only extension that supports a tokenizer based on a morphological analyzer.
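The trade-off can be illustrated with a toy comparison in Python. The bigrams function is a minimal 2-character tokenizer written for this example, and word_tokens is a hand-written stand-in for what a morphological analyzer might produce. Neither is PGroonga's actual tokenizer.

```python
def bigrams(text: str) -> list[str]:
    """Split text into overlapping 2-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


document = "東京都"  # "Tokyo Metropolis"
query = "京都"       # "Kyoto"

bigram_tokens = bigrams(document)

# Hand-written stand-in for the output of a morphological analyzer
# (an assumption for illustration, not real analyzer output).
word_tokens = ["東京", "都"]

# The bigram index matches "Kyoto" inside "Tokyo Metropolis": nothing is
# missed, but this hit is noise. The word-based index does not match.
print(query in bigram_tokens)
print(query in word_tokens)
```

The word-based tokens avoid the noisy hit and produce fewer tokens to index, but any text that the analyzer splits differently from the query will not be found.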

The snippet feature shows the text around a keyword. Web search engines, including Google, use it: you can find it under the page title in the list of hit pages. PGroonga provides a function that implements it.
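To make the idea concrete, here is a minimal keyword-in-context sketch in Python. The snippet function, its width parameter and the "..." markers are all assumptions made for this illustration. PGroonga's own snippet function is not shown here and may behave differently.

```python
def snippet(text: str, keyword: str, width: int = 20) -> str:
    """Return the text around the first occurrence of keyword, or "" if absent."""
    position = text.find(keyword)
    if position == -1:
        return ""
    # Keep up to `width` characters on each side of the keyword.
    start = max(0, position - width)
    end = min(len(text), position + len(keyword) + width)
    # Mark the sides where text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


text = (
    "PGroonga is a PostgreSQL extension that makes PostgreSQL "
    "a fast full text search platform for all languages."
)
print(snippet(text, "extension", width=10))
```

A search result page would typically show one such snippet per hit, with the keyword highlighted.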

There are more features:

  • A query language with Web search engine-like syntax
  • JSON search
    • You can use each value in a condition. You can also perform full text search against all texts in JSON. No other extension, such as JsQuery, provides full text search against JSON.

Here are features that will be implemented in the future. They are already implemented in Groonga.

  • Query expansion feature
  • Weight feature
  • Stemming feature

Usage

You can use PGroonga without full text search knowledge. You just create an index and put a condition in the WHERE clause (index_name, table_name and column_name are placeholders):

CREATE INDEX index_name ON table_name USING pgroonga (column_name);

SELECT * FROM table_name WHERE column_name @@ 'PostgreSQL';

You can also use PGroonga through LIKE. PGroonga can perform LIKE with an index, and LIKE with a PGroonga index is faster than LIKE without an index. This means that you can improve performance without changing an application that uses SQL such as the following:

SELECT * FROM table_name WHERE column_name LIKE '%PostgreSQL%';

Are you interested in PGroonga? Please install it and try the tutorial. The tutorial covers all PGroonga features.

You can install PGroonga easily because it provides packages for major platforms. There are also binaries for Windows.

Conclusion

A new PGroonga version has been released. PGroonga is a PostgreSQL extension that makes PostgreSQL a fast full text search platform for all languages.

It's the first major release. If you want to use fast full text search for all languages on PostgreSQL, try PGroonga!