BloGroonga

2017-11-29

Groonga 7.0.9 has been released

Groonga 7.0.9 has been released!

How to install: Install

Changes

Here are important changes in this release:

The in_values function now supports more than 126 arguments

In previous versions, in_values had a limit on the maximum number of arguments.

This limit prevented you from using in_values to simplify a query that uses too many OR and == conditions.

In this release, this limit on the maximum number of arguments has been removed.
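To show the kind of simplification this enables, here is a minimal Python sketch that builds both forms of a filter from a list of values. The column name `tag`, the helper names, and the use of `||` between the `==` conditions in a `--filter` expression are assumptions for illustration, not part of the release.

```python
def quote(value):
    """Quote a string literal for a filter expression (escapes \\ and ")."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def or_filter(column, values):
    """Long form: column == "v1" || column == "v2" || ..."""
    return " || ".join(f"{column} == {quote(v)}" for v in values)


def in_values_filter(column, values):
    """Short form: in_values(column, "v1", "v2", ...)"""
    args = ", ".join(quote(v) for v in values)
    return f"in_values({column}, {args})"


# 200 values: more than the old limit of 126 arguments.
tags = [f"tag{i}" for i in range(200)]
long_form = or_filter("tag", tags)
short_form = in_values_filter("tag", tags)
print(len(long_form), len(short_form))
```

The short form passes the column once and then only the values, so it stays readable even with hundreds of values.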

Dynamic columns are now supported in logical_range_filter and logical_count

Dynamic columns are now supported not only in the select and logical_select commands but also in the logical_range_filter and logical_count commands.

The logical_range_filter command is faster than the logical_select command when many records match and the number of requested records is small enough.

Conclusion

See Release 7.0.9 2017-11-29 for detailed changes since 7.0.8.

Let's search by Groonga!

2017-10-29

Groonga 7.0.8 has been released

Groonga 7.0.8 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • [Windows] Supported backtrace on crash
  • Fixed some cases where the QUERY_NO_SYNTAX_ERROR flag doesn't work
  • Supported Ubuntu 17.10 (Artful Aardvark)

[Windows] Supported backtrace on crash

With this feature, the backtrace shows not only the function call history but also the source file name and line number wherever possible. This makes problem solving easier.

Example of backtrace

2017-10-29 16:27:02.371000|C| db.c:12352:0: 00000000657E9BD9: grn_table_sort(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.402000|C| window_function.c:374:0: 00000000659D08FB: grn_table_apply_window_function(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.434000|C| mrb_table.c:437:0: 00000000659EEFE0: mrb_grn_table_apply_window_function_raw(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.449000|C| ..\mruby-source\src\vm.c:1266:0: 0000000065A573BF: mrb_vm_exec(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.465000|C| ..\mruby-source\src\vm.c:821:19: 0000000065A59CA7: mrb_vm_run(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.481000|C| ..\mruby-source\src\vm.c:2619:0: 0000000065A5347C: mrb_run(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.481000|C| (unknown):-1:-1: 0000000065A5389D: mrb_funcall_with_block(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.496000|C| ..\mruby-source\src\vm.c:360:0: 0000000065A53CA9: mrb_funcall_with_block(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.512000|C| ..\mruby-source\src\vm.c:345:0: 0000000065A53E9F: mrb_funcall(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.543000|C| mrb_command.c:90:0: 00000000659E11FA: mrb_grn_command_run_wrapper(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.559000|C| command.c:199:0: 00000000657BA61C: grn_command_run(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.574000|C| expr.c:2635:0: 0000000065807B5B: grn_expr_exec(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.590000|C| ctx.c:1255:14: 00000000657BE8AC: grn_ctx_qe_exec(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.590000|C| ctx.c:1361:14: 00000000657BF36D: grn_ctx_send(): <libgroonga-0>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\libgroonga-0.dll>
2017-10-29 16:27:02.606000|C| groonga.c:402:0: 0000000000410A61: main(): <groonga>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\groonga.exe>
2017-10-29 16:27:02.606000|C| .\mingw-w64-crt\crt\crtexe.c:336:0: 00000000004013F8: __tmainCRTStartup(): <groonga>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\groonga.exe>
2017-10-29 16:27:02.606000|C| .\mingw-w64-crt\crt\crtexe.c:214:0: 000000000040151B: mainCRTStartup(): <groonga>: <c:\Users\groonga\groonga-7.0.8-x64-2017-10-29\bin\groonga.exe>
2017-10-29 16:27:02.606000|C| (unknown):-1:-1: 000000003B44168D: (unknown)(): <(unknown)>: <(unknown)>

Fixed some cases where the QUERY_NO_SYNTAX_ERROR flag doesn't work

The QUERY_NO_SYNTAX_ERROR flag was introduced in a previous version. If this flag is set, a query never causes a syntax error.

But there were cases where it caused an error when this flag was used with --query '( )', --query '(+)', and --query '~foo'. In this release, this bug has been fixed.

Supported Ubuntu 17.10 (Artful Aardvark)

In this release, Ubuntu 17.10 (Artful Aardvark) is supported!

Conclusion

See Release 7.0.8 2017-10-29 for detailed changes since 7.0.7.

Let's search by Groonga!

2017-10-10

PGroonga (fast full text search module for PostgreSQL) 2.0.2 has been released

PGroonga 2.0.2 has been released! PGroonga makes PostgreSQL a fast full text search platform for all languages. This is a major version upgrade!

About PGroonga

Because this is the first release announcement after the major version upgrade, I will explain what PGroonga is first. The highlights are summarized after that.

PGroonga is a PostgreSQL extension that makes PostgreSQL a fast full text search platform for all languages. There are several PostgreSQL extensions that improve the full text search feature of PostgreSQL. Among them, PGroonga provides rich full text search related features and is very fast, because it uses Groonga, a real full text search engine, as its backend.

Performance

PGroonga is faster at index creation than both pg_bigm and textsearch, which is bundled in PostgreSQL. For full text search, PGroonga is faster than pg_bigm for most keywords and about as fast as textsearch, as the benchmark results below show.

Here is a benchmark result between PGroonga and pg_bigm. It uses Japanese Wikipedia data.

Extension | Index creation time
PGroonga  | About 19m
pg_bigm   | About 33m

In this case, PGroonga is about 1.7 times faster than pg_bigm (33m / 19m).

Here is a benchmark result between PGroonga and textsearch. It uses English Wikipedia data because textsearch doesn't support Japanese.

You can't compare this result directly with the one above because the amount of data is different.

Extension  | Index creation time
PGroonga   | About 1h 24m
textsearch | About 2h 53m

In this case, PGroonga is about 2 times faster than textsearch.
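The speed-up figures can be checked by dividing the index creation times. Here is a small Python sketch of that arithmetic. It uses only the numbers from the two tables above; the function and variable names are just for illustration.

```python
def speedup(slower_minutes, faster_minutes):
    """How many times faster the faster extension is."""
    return slower_minutes / faster_minutes


# Japanese Wikipedia: pg_bigm about 33m, PGroonga about 19m.
vs_pg_bigm = speedup(33, 19)

# English Wikipedia: textsearch about 2h 53m, PGroonga about 1h 24m.
vs_textsearch = speedup(2 * 60 + 53, 1 * 60 + 24)

print(f"vs pg_bigm:    {vs_pg_bigm:.2f}x")     # 1.74x
print(f"vs textsearch: {vs_textsearch:.2f}x")  # 2.06x
```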

Here is a benchmark result for full text search between PGroonga and pg_bigm:

Search keywords | N hits | PGroonga | pg_bigm
"PostgreSQL" or "MySQL" | About 300 | About 2ms | About 49ms
データベース (database in Japanese) | About 15000 | About 49ms | About 1300ms
テレビアニメ (TV animation in Japanese) | About 20000 | About 65ms | About 2800ms
日本 (Japan in Japanese) | About 530000 | About 560ms | About 480ms

In the "日本" (Japan in Japanese) case, pg_bigm is a bit faster (*1) than PGroonga. In the other cases, PGroonga is 24 to 43 times faster than pg_bigm. The result shows that PGroonga provides stable, fast full text search for all keywords.

(*1) pg_bigm can perform faster full text search for keywords that have 2 or fewer characters than for keywords that have 3 or more characters.
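The "24 to 43 times" range follows from the table above. Here is a short Python sketch that computes the ratio for each keyword from the approximate timings in the table; the dictionary layout is just for illustration.

```python
# keyword: (PGroonga ms, pg_bigm ms), taken from the table above
results_ms = {
    '"PostgreSQL" or "MySQL"': (2, 49),
    "データベース": (49, 1300),
    "テレビアニメ": (65, 2800),
    "日本": (560, 480),
}

# A ratio above 1 means PGroonga is faster; below 1 means pg_bigm is faster.
ratios = {
    keyword: pg_bigm / pgroonga
    for keyword, (pgroonga, pg_bigm) in results_ms.items()
}

for keyword, ratio in ratios.items():
    print(f"{keyword}: pg_bigm / PGroonga = {ratio:.1f}")
```

Only the 2-character keyword "日本" gives a ratio below 1, which matches note (*1).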

Here is a benchmark result for full text search between PGroonga and textsearch:

Search keywords | N hits | PGroonga | textsearch | Groonga
"PostgreSQL" or "MySQL" | About 1600 | About 6ms | About 3ms | About 3ms
database | About 210000 | About 698ms | About 602ms | About 19ms
animation | About 40000 | About 173ms | About 1000ms (*2) | About 6ms
America | About 470000 | About 1300ms | About 1200ms | About 45ms

(*2) textsearch is slow because it hits about 420 thousand items (about 10 times as many as PGroonga) with "animation". This is caused by stemming: "animation" is stemmed to "anim".

The search performance of PGroonga and textsearch is almost the same. textsearch is slower for "animation" because of the difference in the number of hits, not because of an essential difference in search performance.

Groonga's results are shown for reference. Groonga is the full text search engine behind PGroonga. Groonga can search every case in less than 50ms. This shows that full text search itself isn't the main cost for PGroonga and textsearch in these cases: there is common overhead in PostgreSQL, and it has a greater impact than full text search.

You can see more details of these benchmark results:

PGroonga provides the following features that aren't provided by other extensions:

  • Normalize feature
  • Custom tokenizer feature
  • Custom token filter feature
  • Search using query language
  • HTML highlight feature
  • HTML snippet feature
  • JSON search
  • Auto complete feature
  • Similar document search feature
  • Synonym expansion feature

The normalize feature unifies texts written in different notations into a single notation.

As a simple example, both "POSTGRESQL" (uppercase only) and "PostgreSQL" (mixed case) are converted to "postgresql" (lowercase only). You can search "PostgreSQL" (mixed case) by "postgresql" (lowercase).

As a more complex example, "¼" (U+00BC VULGAR FRACTION ONE QUARTER) is converted to "1/4" ("1", "/" and "4"). This normalization is based on Unicode NFKC.
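You can get a feel for this kind of normalization with Python's standard library. This is only a rough sketch using `unicodedata` plus lowercasing, not PGroonga's normalizer, and the helper name `normalize` is hypothetical. Note that Python's NFKC puts U+2044 FRACTION SLASH between "1" and "4", so the result for "¼" is close to, but not byte-for-byte the same as, the one described above.

```python
import unicodedata


def normalize(text):
    """NFKC normalization followed by lowercasing (a rough approximation)."""
    return unicodedata.normalize("NFKC", text).lower()


print(normalize("POSTGRESQL"))          # postgresql
print(normalize("PostgreSQL"))          # postgresql
print(normalize("ＰｏｓｔｇｒｅＳＱＬ"))  # full-width letters also become postgresql

# Show the code points that "¼" is expanded to.
print([f"U+{ord(c):04X}" for c in normalize("¼")])
```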

The custom tokenizer feature customizes the search keyword extraction process (tokenization). If you can customize tokenization, you can control the trade-off between search precision and search performance.

For example, if you use a tokenizer based on character and character type based N-gram instead of a tokenizer based on character based N-gram, you can get better search precision and search performance but may not find some texts. You can search "123" by "2" with character based N-gram but not with character and character type based N-gram.
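The "123" and "2" example can be modeled with a toy Python sketch. It is a simplified model under assumptions, not Groonga's tokenizers: `char_bigrams` splits every character position into a 2-gram, `type_grouped_tokens` keeps a run of digits or ASCII letters as one token, and `hits` treats a keyword as found when some token starts with it. All three names are hypothetical.

```python
import re


def char_bigrams(text):
    """Character based 2-gram: a token starts at every position."""
    return [text[i:i + 2] for i in range(len(text))]


def type_grouped_tokens(text):
    """Keep runs of digits or ASCII letters whole; 2-gram everything else."""
    tokens = []
    for run in re.findall(r"[0-9]+|[A-Za-z]+|[^0-9A-Za-z\s]+", text):
        if run[0].isascii() and run[0].isalnum():
            tokens.append(run)
        else:
            tokens.extend(char_bigrams(run))
    return tokens


def hits(keyword, tokens):
    """Toy lookup: the keyword is found if any token starts with it."""
    return any(token.startswith(keyword) for token in tokens)


print(char_bigrams("123"))         # ['12', '23', '3']
print(type_grouped_tokens("123"))  # ['123']
print(hits("2", char_bigrams("123")))         # True
print(hits("2", type_grouped_tokens("123")))  # False
```

In this model the grouped tokenizer produces fewer tokens, which is where the better precision and performance come from, and it is also why "2" no longer finds "123".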

The custom token filter feature customizes how keywords extracted by the tokenizer are processed. textsearch has the same feature under the name "dictionary". Both PGroonga and textsearch implement stemming with this mechanism.

Search using query language lets users specify AND/OR/NOT searches with a mini language such as "A OR (B - 1)". The syntax is similar to Google's.

The HTML highlight feature marks up search keywords with <span class="keyword">...</span>. The result is safe to use in HTML as it is. It's useful for Web application development.

The HTML snippet feature returns texts around the search keyword. This kind of feature is used in Google search results too. The keyword is marked up with <span class="keyword">...</span>. It's safe to use the result in HTML as it is.
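To illustrate why such a result is safe to use in HTML as it is, here is a minimal Python sketch of the idea: escape the text first, then wrap each keyword in <span class="keyword">. It is not PGroonga's implementation, and `highlight_html` is a hypothetical helper name.

```python
import html
import re


def highlight_html(text, keywords):
    """Escape text for HTML and wrap each keyword in a span."""
    if not keywords:
        return html.escape(text)
    # Longer keywords first so they win over shorter ones they contain.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        "(" + "|".join(re.escape(k) for k in ordered) + ")",
        re.IGNORECASE,
    )
    pieces = []
    # With one capture group, split() alternates non-match / match.
    for i, part in enumerate(pattern.split(text)):
        escaped = html.escape(part)
        if i % 2 == 1:
            pieces.append(f'<span class="keyword">{escaped}</span>')
        else:
            pieces.append(escaped)
    return "".join(pieces)


print(highlight_html("<b>PostgreSQL</b> & Groonga", ["postgresql"]))
```

Because every piece of the original text is escaped before the span tags are added, markup in the source text can't leak into the page.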

The JSON search feature searches JSON contents flexibly. You can index a jsonb type column as is; you don't need to use an expression for indexing. You can perform full text search against all texts in JSON. It's useful for inserting logs that have different structures as JSON and searching them later.

The auto complete feature completes input in a text box for entering a search keyword. Google implements it too. You can support completion by romaji as Google does.

The similar document search feature searches for texts whose contents are similar. You can use this feature to show similar entries in a blog system.

The synonym expansion feature searches for keywords that have the same meaning but different expressions. For example, you can find "PostgreSQL" by searching for either "PostgreSQL" or "PG".
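The idea behind synonym expansion can be sketched in a few lines of Python: each term in the query is replaced by an OR group of its synonyms before the search runs. The synonym table, the `expand_query` helper, and the exact output format are assumptions for illustration, not PGroonga's actual behavior.

```python
def expand_query(query, synonyms):
    """Replace each term that has synonyms with an OR group."""
    expanded = []
    for term in query.split():
        candidates = synonyms.get(term, [term])
        if len(candidates) == 1:
            expanded.append(candidates[0])
        else:
            expanded.append("(" + " OR ".join(candidates) + ")")
    return " ".join(expanded)


# Hypothetical synonym table: term -> all expressions with the same meaning
synonyms = {
    "PG": ["PG", "PostgreSQL"],
    "PostgreSQL": ["PostgreSQL", "PG"],
}

print(expand_query("PG index", synonyms))  # (PG OR PostgreSQL) index
```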

See the reference manual and how-to for details of these features.

Here are features that will be implemented in the future. They are already implemented in Groonga.

  • Weight feature

Usage

You can use PGroonga without full text search knowledge. You just create an index and put a condition into WHERE:

CREATE INDEX index_name ON table USING pgroonga (column);

SELECT * FROM table WHERE column &@~ 'PostgreSQL';

You can also use PGroonga with LIKE. PGroonga provides a feature that performs LIKE with an index. LIKE with a PGroonga index is faster than LIKE without an index. It means that you can improve performance without changing an application that uses SQL like the following:

SELECT * FROM table WHERE column LIKE '%PostgreSQL%';

It's recommended that you migrate from LIKE to &@~, an operator for full text search, because &@~ is faster than LIKE.

Are you interested in PGroonga? Please install it and try the tutorial. You can learn about all PGroonga features there.

You can install PGroonga easily because PGroonga provides packages for major platforms. There are also binaries for Windows.

Highlight

Here are highlights after PGroonga 1.2.3:

  • Support PostgreSQL 10

  • Improve accuracy of query execution plan (performance is improved)

  • pgroonga schema is deprecated

    • You can still use the pgroonga schema because PGroonga 2.x is backward compatible with PGroonga 1.x.

Support PostgreSQL 10

PGroonga supports PostgreSQL 10. You can use PGroonga in the latest PostgreSQL!

PGroonga supports logical replication. Logical replication is a new feature in PostgreSQL 10. You can also use physical replication for replication. You can choose physical replication or logical replication in PostgreSQL 10.

Physical replication uses more disk space, but crash recovery works in many cases.

On the other hand, you can use a more flexible schema with logical replication. For example, you can use the master only for updates and use the slaves for all searches. You just create PGroonga indexes on the slaves. It improves update performance and supports scale-out for search.

We'll publish benchmark results for each replication. Stay tuned!

If you need commercial support for a PostgreSQL cluster and PGroonga, contact us.

Improve accuracy of query execution plan (performance is improved)

The PostgreSQL planner estimates execution cost based on information from each index, including PGroonga indexes, and selects the best execution plan. If PGroonga returns more accurate information, PostgreSQL can select a more effective execution plan.

In this release, PGroonga improves the index information for expressions that use a STABLE function or an IMMUTABLE function. PGroonga provides pgroonga_query_expand(), a function that expands a query, as an IMMUTABLE function. PostgreSQL will select a good execution plan when you use it as in the following SQL:

SELECT *
  FROM diaries
 WHERE content &@~ pgroonga_query_expand('synonyms', 'term', 'synonyms',
                                         'SEARCH QUERY BY THE USER');

pgroonga schema is deprecated

PGroonga 1.x defines functions and operator classes in the pgroonga schema. Some users say that it's more useful to use the current schema (public in most cases) with prefixed names than to use the pgroonga schema. The pgroonga schema is deprecated in PGroonga 2.x. PGroonga 2.x defines functions and operator classes with the pgroonga_ prefix.

pgroonga schema is deprecated but you can still use pgroonga schema. You can upgrade to PGroonga 2.x safely. pgroonga schema is maintained at least in PGroonga 2.x. Please migrate to pgroonga_ prefixed name gradually.

How to upgrade

This version is compatible with 1.0 or later. You can upgrade by steps in "Compatible case" in Upgrade document.

Announce

Sessions

Both sessions are about PGroonga 2. The session of PostgreSQL Conference Japan 2017 is for people who are not using PGroonga yet. The session of PGConf.ASIA 2017 is for people who already use PGroonga.

Support service

If you need commercial support for PGroonga, contact us.

Conclusion

A new PGroonga version has been released. PGroonga 2 provides a more PostgreSQL friendly interface.

See also the release notes for all changes.

Try PGroonga when you want to perform fast full text search against all languages on PostgreSQL!

2017-09-29

Groonga 7.0.7 has been released

Groonga 7.0.7 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • Fixed the case that --query '+' doesn't work for QUERY_NO_SYNTAX_ERROR flag
  • --default-command-version 3 has been supported
  • Caching select result with function call has been supported.

Fixed the case that --query '+' doesn't work for QUERY_NO_SYNTAX_ERROR flag

The QUERY_NO_SYNTAX_ERROR flag was introduced in the previous version. If this flag is set, a query never causes a syntax error.

But there was a case where it caused an error when this flag was used with --query '+'. In this release, this bug has been fixed.

--default-command-version 3 has been supported

In this release, the groonga executable supports --default-command-version 3. In previous versions, the groonga executable supported --command_version 3 but not --default-command-version 3.

Caching select result with function call has been supported.

In this release, caching the select result is supported for queries that contain function calls.

Most existing functions support this feature.

But there are two exceptions: when now() or rand() is used in a query, the select result is not cached.
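The rule above can be modeled with a small Python sketch. It only illustrates the rule as stated here, assuming a result is cacheable unless the command calls now() or rand(). It says nothing about how Groonga implements it, and `is_cacheable` and the sample commands are hypothetical.

```python
import re

# Functions whose result changes between calls, as listed above.
NON_CACHEABLE_FUNCTIONS = ("now", "rand")

_CALL = re.compile(
    r"\b(" + "|".join(NON_CACHEABLE_FUNCTIONS) + r")\s*\("
)


def is_cacheable(select_command):
    """True unless the command calls one of the non-cacheable functions."""
    return _CALL.search(select_command) is None


print(is_cacheable("select Logs --filter 'timestamp < now()'"))         # False
print(is_cacheable("select Sites --filter 'in_values(tag, \"groonga\")'"))  # True
```

now() and rand() are excluded because they return a different value on each call, so a cached result could be stale or wrong.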

Conclusion

See Release 7.0.7 2017-09-29 for detailed changes since 7.0.6.

Let's search by Groonga!

2017-08-29

Groonga 7.0.6 has been released

Groonga 7.0.6 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • object_inspect command has been supported to show disk usage
  • Fallback feature when parsing query has been supported
  • The score adjusting about keyword in query has been supported

object_inspect command has been supported to show disk usage

In this release, the object_inspect command shows disk usage.

In previous versions, there was no easy way to calculate the disk usage of each object such as a table or an index column.

The object_inspect command now returns a disk_usage parameter in its response. The value is the size in bytes.

table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText
column_create --table Site --name title --type ShortText
load --table Site
[
{"_key":"http://example.org/","title":"This is test record 1!"},
{"_key":"http://example.net/","title":"test record 2."},
{"_key":"http://example.com/","title":"test test record three."},
{"_key":"http://example.net/afr","title":"test record four."},
{"_key":"http://example.org/aba","title":"test test test record five."},
{"_key":"http://example.com/rab","title":"test test test test record six."},
{"_key":"http://example.net/atv","title":"test test test record seven."},
{"_key":"http://example.org/gat","title":"test test record eight."},
{"_key":"http://example.com/vdw","title":"test test record nine."},
]
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create --table Terms --name blog_title --flags COLUMN_INDEX|WITH_POSITION --type Site --source title

Execute the following command to check the disk usage of the Terms table.

object_inspect --output_pretty yes Terms

Then, object_inspect command returns the following result.

{
  "id": 258,
  "name": "Terms",
  "type": {
    "id": 49,
    "name": "table:pat_key"
  },
  "key": {
    "type": {
      "id": 14,
      "name": "ShortText",
      "type": {
        "id": 32,
        "name": "type"
      },
      "size": 4096
    },
    "total_size": 21,
    "max_total_size": 4294967294
  },
  "value": {
    "type": null
  },
  "n_records": 15,
  "disk_usage": 8437760
}

It turns out that the disk usage is "disk_usage": 8437760.

Let's check the disk usage about index column.

Execute the following command to check blog_title column in Terms table.

object_inspect --output_pretty yes Terms.blog_title

object_inspect command returns the following result.

{
  "id": 259,
  "name": "blog_title",
  "table": {
    "id": 258,
    "name": "Terms",
    "type": {
      "id": 49,
      "name": "table:pat_key"
    },
  (omitted)
  ],
  "disk_usage": 5283840
}

It turns out that the disk usage is "disk_usage": 5283840.
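Since disk_usage is in bytes, it is often handy to convert it to a larger unit. Here is a small Python sketch that converts the two values shown above to MiB. The JSON strings are shortened to the fields needed, and `to_mib` is a hypothetical helper name.

```python
import json

# Shortened versions of the two object_inspect responses shown above.
terms = json.loads('{"name": "Terms", "disk_usage": 8437760}')
blog_title = json.loads('{"name": "blog_title", "disk_usage": 5283840}')


def to_mib(size_in_bytes):
    """Convert bytes to MiB (1 MiB = 1024 * 1024 bytes)."""
    return size_in_bytes / (1024 * 1024)


for obj in (terms, blog_title):
    print(f'{obj["name"]}: {to_mib(obj["disk_usage"]):.2f} MiB')
# Terms: 8.05 MiB
# blog_title: 5.04 MiB
```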

Fallback feature when parsing query has been supported

This feature is enabled when the QUERY_NO_SYNTAX_ERROR flag is set in query_flags.

It is disabled by default.

If this flag is set, a query never causes a syntax error. For example, "A +" is parsed and escaped automatically into "A" and "+". This behavior is useful when an application uses user input directly and doesn't want to show a syntax error to the user or in the log.

Here is an example of how to use QUERY_NO_SYNTAX_ERROR.

table_create --name Magazine --flags TABLE_HASH_KEY --key_type ShortText
column_create --table Magazine --name title --type ShortText
load --table Magazine
[
{"_key":"http://gihyo.jp/magazine/wdpress","title":"WEB+DB PRESS"},
{"_key":"http://gihyo.jp/magazine/SD","title":"Software Design"},
]
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create --table Terms --name title --flags COLUMN_INDEX|WITH_POSITION --type Magazine --source title

Let's search with the keyword WEB +.

select Magazine --output_pretty yes --query 'WEB +' --match_columns title

It causes a syntax error.

[
  [
    -63,
    1503902587.063566,
    0.0007965564727783203,
    "Syntax error: <WEB +||>",
    [
      [
        "yy_syntax_error",
        "grn_ecmascript.lemon",
        37
      ]
    ]
  ]
]

Let's try with QUERY_NO_SYNTAX_ERROR flag.

select Magazine --output_pretty yes --match_columns title --query 'WEB +'  --query_flags ALLOW_PRAGMA|ALLOW_COLUMN|QUERY_NO_SYNTAX_ERROR

It turns out that there is no syntax error.

[
  [
    0,
    1503902343.382929,
    0.0419621467590332
  ],
  [
    [
      [
        1
      ],
      [
        [
          "_id",
          "UInt32"
        ],
        [
          "_key",
          "ShortText"
        ],
        [
          "title",
          "ShortText"
        ]
      ],
      [
        1,
        "http://gihyo.jp/magazine/wdpress",
        "WEB+DB PRESS"
      ]
    ]
  ]
]

With the QUERY_NO_SYNTAX_ERROR flag in query_flags, the keyword in the above query is parsed into WEB and +, so it doesn't cause a syntax error.
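An application can tell the two responses apart by looking at the header, which is the first element of the response. Here is a minimal Python sketch based only on the two responses shown above: the first header value is the return code (-63 for the syntax error, 0 for success), and the error header carries a message as its fourth value. `check_response` is a hypothetical helper, and the body of the successful response is shortened.

```python
import json


def check_response(response_text):
    """Return (return_code, error_message) from a JSON response."""
    header = json.loads(response_text)[0]
    return_code = header[0]
    # In the error response above, the message is the 4th header value.
    message = header[3] if len(header) > 3 else None
    return return_code, message


error_response = """
[[-63, 1503902587.063566, 0.0007965564727783203,
  "Syntax error: <WEB +||>",
  [["yy_syntax_error", "grn_ecmascript.lemon", 37]]]]
"""

# Body shortened; only the header matters here.
ok_response = "[[0, 1503902343.382929, 0.0419621467590332], []]"

print(check_response(error_response))  # (-63, 'Syntax error: <WEB +||>')
print(check_response(ok_response))     # (0, None)
```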

The score adjusting about keyword in query has been supported

In this release, adjusting the score of a term in a query is supported. Specifically, the >, <, and ~ operators are supported.

For example, >Groonga increments the score of Groonga and <Groonga decrements the score of Groonga. ~Groonga decreases the score of matched documents in the current search result. The ~ operator doesn't change the search result itself.

Here is the sample to show usage.

table_create --name Shops --flags TABLE_NO_KEY
column_create --table Shops --name keyword --type ShortText
load --table Shops
[
{"keyword":"restraunt western food"},
{"keyword":"restraunt japanese food"},
{"keyword":"restraunt chinese food"},
{"keyword":"cafe western food"},
]

Let's search restraunt by the following query.

select Shops --output_pretty yes --match_columns keyword --output_columns keyword,_score --sort_keys -_score --query 'restraunt'

It returns the following result.

[
  [
    3
  ],
  [
    [
      "keyword",
      "ShortText"
    ],
    [
      "_score",
      "Int32"
    ]
  ],
  [
    "restraunt western food",
    1
  ],
  [
    "restraunt chinese food",
    1
  ],
  [
    "restraunt japanese food",
    1
  ]
]

Every record in the response has the same score, 1.

Let's search with > on japanese to adjust the score.

select Shops --output_pretty yes --match_columns keyword --output_columns keyword,_score --sort_keys -_score --query 'restraunt (>japanese OR western OR chinese)'

[
  [
    3
  ],
  [
    [
      "keyword",
      "ShortText"
    ],
    [
      "_score",
      "Int32"
    ]
  ],
  [
    "restraunt japanese food",
    8
  ],
  [
    "restraunt chinese food",
    2
  ],
  [
    "restraunt western food",
    2
  ]
]

Now the score of japanese food is the largest of the three restraunt records.

Then, try to adjust score with < to raise western food.

select Shops --output_pretty yes --match_columns keyword --output_columns keyword,_score --sort_keys -_score --query 'restraunt (>japanese OR <western OR chinese)'

[
  [
    3
  ],
  [
    [
      "keyword",
      "ShortText"
    ],
    [
      "_score",
      "Int32"
    ]
  ],
  [
    "restraunt japanese food",
    8
  ],
  [
    "restraunt western food",
    7
  ],
  [
    "restraunt chinese food",
    2
  ]
]

As you can see, you can adjust the score by combining < and > with each keyword.

Conclusion

See Release 7.0.6 2017-08-29 for detailed changes since 7.0.5.

Let's search by Groonga!