BloGroonga

2018-03-29

Groonga 8.0.1 has been released

Groonga 8.0.1 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • [log] Show filter conditions in query log.
  • [Windows] Install *.pdb into the directory where *.dll and *.exe are installed.
  • [logical_count] Support filtered stage dynamic columns.
  • [logical_count] Added a new filter timing.
  • [logical_select] Added a new filter timing.
  • [logical_range_filter] Optimize window function for large result set.
  • [select] Added --match_escalation parameter.
  • [httpd] Updated bundled nginx to 1.13.10.
  • Fixed memory leak that occurs when a prefix query doesn't match any token.
  • Fixed a bug that a cache for different databases is used when multiple databases are opened in the same process.
  • Fixed a bug that a constant value can overflow or underflow in comparison (>,>=,<,<=,==,!=).

[log] Show filter conditions in query log.

With this change, you can see in the query log how many records were narrowed down by each filter condition. For example:

2018-02-15 19:04:02.303809|0x7ffd9eedf6f0|:000000013837058 filter(17): product equal "test_product"

In the above example, we can see that 17 records were narrowed down by product == "test_product". This feature is disabled by default. To enable it, you need to set the environment variable below.

GRN_QUERY_LOG_SHOW_CONDITION=yes
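For example, assuming a database at /tmp/db and a query log written to query.log (both paths are just examples), you can start Groonga with this feature enabled like this:

 % GRN_QUERY_LOG_SHOW_CONDITION=yes groonga --query-log-path query.log /tmp/db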

[logical_count] Support filtered stage dynamic columns.

Until now, logical_count supported only initial stage dynamic columns. From this release, you can also use filtered stage dynamic columns in logical_count.

[logical_count][logical_select] Added a new filter timing.

The new filter timing is executed after filtered stage dynamic columns are generated. For example:

logical_select \
    --logical_table Entries \
    --shard_key created_at \
    --columns[n_likes_sum_per_tag].stage filtered \
    --columns[n_likes_sum_per_tag].type UInt32 \
    --columns[n_likes_sum_per_tag].value 'window_sum(n_likes)' \
    --columns[n_likes_sum_per_tag].window.group_keys 'tag' \
    --filter 'content @ "system" || content @ "use"' \
    --post_filter 'n_likes_sum_per_tag > 10' \
    --output_columns _key,n_likes,n_likes_sum_per_tag

  # [
  #   [
  #     0, 
  #     1519030779.410312,
  #     0.04758048057556152
  #   ], 
  #   [
  #     [
  #       [
  #         2
  #       ], 
  #       [
  #         [
  #           "_key", 
  #           "ShortText"
  #         ], 
  #         [
  #           "n_likes", 
  #           "UInt32"
  #         ], 
  #         [
  #           "n_likes_sum_per_tag", 
  #           "UInt32"
  #         ]
  #       ],
  #       [
  #         "Groonga", 
  #         10, 
  #         25
  #       ], 
  #       [
  #         "Mroonga", 
  #         15, 
  #         25
  #       ]
  #     ]
  #   ]
  # ]

The point of this feature is that filtered stage dynamic columns can be used in --post_filter. The above example uses logical_select, but the same feature is available in logical_count as well.
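For reference, here is a sketch of the corresponding logical_count query (untested; based on the logical_select example above). It returns only the number of records that remain after --post_filter:

logical_count \
    --logical_table Entries \
    --shard_key created_at \
    --columns[n_likes_sum_per_tag].stage filtered \
    --columns[n_likes_sum_per_tag].type UInt32 \
    --columns[n_likes_sum_per_tag].value 'window_sum(n_likes)' \
    --columns[n_likes_sum_per_tag].window.group_keys 'tag' \
    --filter 'content @ "system" || content @ "use"' \
    --post_filter 'n_likes_sum_per_tag > 10'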

[logical_range_filter] Optimize window function for large result set.

If enough matched records are found, we don't apply the window function to the remaining windows. This optimization is disabled for small result sets, for which its overhead would not be negligible.

[select] Added --match_escalation parameter.

You can force match escalation to be enabled with --match_escalation yes. It's stronger than --match_escalation_threshold 99999....999 because --match_escalation yes also works with SOME_CONDITIONS && column @ 'query'; --match_escalation_threshold isn't used in that case.

The default is --match_escalation auto. It doesn't change the current behavior.

You can disable match escalation by --match_escalation no. It's the same as --match_escalation_threshold -1.
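For example, with a hypothetical Entries table that has n_likes and content columns, the following query keeps match escalation enabled even though the @ condition is combined with another condition:

select Entries \
    --filter 'n_likes >= 1 && content @ "system"' \
    --match_escalation yes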

Fixed memory leak that occurs when a prefix query doesn't match any token.

Fixed a memory leak that occurs when a prefix query used in a fuzzy search doesn't match any token, as in the example below.

table_create Users TABLE_NO_KEY
[[0,0.0,0.0],true]
column_create Users name COLUMN_SCALAR ShortText
[[0,0.0,0.0],true]
table_create Names TABLE_PAT_KEY ShortText
[[0,0.0,0.0],true]
column_create Names user COLUMN_INDEX Users name
[[0,0.0,0.0],true]
load --table Users
[
{"name": "Tom"},
{"name": "Tomy"},
{"name": "Pom"},
{"name": "Tom"}
]
[[0,0.0,0.0],4]
select Users --filter 'fuzzy_search(name, "Atom", {"prefix_length": 1})'   --output_columns 'name, _score'   --match_escalation_threshold -1
[[0,0.0,0.0],[[[0],[["name","ShortText"],["_score","Int32"]]]]]

Fixed a bug that a cache for different databases is used when multiple databases are opened in the same process.

Fixed a bug that when multiple databases are opened in the same process, results are returned from the cache of another database because the cache was shared within the process.

Fixed a bug that a constant value can overflow or underflow in comparison (>,>=,<,<=,==,!=).

Fixed a bug that a constant value can overflow or underflow in comparison as below example.

table_create Values TABLE_NO_KEY
[[0,0.0,0.0],true]
column_create Values number COLUMN_SCALAR Int16
[[0,0.0,0.0],true]
load --table Values
[
{"number": 3},
{"number": 4},
{"number": -1}
]
[[0,0.0,0.0],3]
select Values   --filter 'number > 32768'   --output_columns 'number'
[[0,1522305525.361629,0.0003235340118408203],[[[3],[["number","Int16"]],[3],[4],[-1]]]]

An overflow occurred because 32768 is outside the range of Int16 (-32,768 to 32,767): number > 32768 was evaluated as number > -32768. From this release, when such an overflow or underflow occurs, no results are returned.

Conclusion

See Release 8.0.1 2018-03-29 for detailed changes since 8.0.0.

Let's search by Groonga!

2018-02-09

Groonga 8.0.0 has been released

Groonga 8.0.0 has been released!

This is a major version up! But it keeps backward compatibility, so you can upgrade to 8.0.0 without rebuilding your database.

How to install: Install

Changes

Here are important changes in this release:

  • select added --drilldown_adjuster and --drilldowns[label].adjuster.

  • between Accept between() without borders.

  • Fixed a memory leak for normal hash table.

select added --drilldown_adjuster and --drilldowns[label].adjuster.

Added --drilldown_adjuster and --drilldowns[LABEL].adjuster to select's arguments. You can adjust scores in drilldown results.

For example:

table_create Categories TABLE_PAT_KEY ShortText

table_create Tags TABLE_PAT_KEY ShortText
column_create Tags categories COLUMN_VECTOR|WITH_WEIGHT Categories

table_create Memos TABLE_HASH_KEY ShortText
column_create Memos tags COLUMN_VECTOR Tags

column_create Categories tags_categories COLUMN_INDEX|WITH_WEIGHT \
  Tags categories

load --table Tags
[
{"_key": "groonga", "categories": {"full-text-search": 100}},
{"_key": "mroonga", "categories": {"mysql": 100, "full-text-search": 80}},
{"_key": "ruby", "categories": {"language": 100}}
]

load --table Memos
[
{
  "_key": "Groonga is fast",
  "tags": ["groonga"]
},
{
  "_key": "Mroonga is also fast",
  "tags": ["mroonga", "groonga"]
},
{
  "_key": "Ruby is an object oriented script language",
  "tags": ["ruby"]
}
]

select Memos \
  --limit 0 \
  --output_columns _id \
  --drilldown tags \
  --drilldown_adjuster 'categories @ "full-text-search" * 2 + categories @ "mysql"' \
  --drilldown_output_columns _key,_nsubrecs,_score
[
  [
    0,
    0.0,
    0.0
  ],
  [
    [
      [
        3
      ],
      [
        [
          "_id",
          "UInt32"
        ]
      ]
    ],
    [
      [
        3
      ],
      [
        [
          "_key",
          "ShortText"
        ],
        [
          "_nsubrecs",
          "Int32"
        ],
        [
          "_score",
          "Int32"
        ]
      ],
      [
        "groonga",
        2,
        203
      ],
      [
        "mroonga",
        1,
        265
      ],
      [
        "ruby",
        1,
        0
      ]
    ]
  ]
]

In the above example, we adjust the score of records that have full-text-search or mysql in categories.

between Accept between() without borders.

From this release, max_border and min_border are now optional. If the number of arguments passed to between() is 3, the 2nd and 3rd arguments are handled as the inclusive edges.

For example:

table_create Users TABLE_HASH_KEY ShortText
column_create Users age COLUMN_SCALAR Int32

table_create Ages TABLE_PAT_KEY Int32
column_create Ages users_age COLUMN_INDEX Users age

load --table Users
[
{"_key": "alice",  "age": 17},
{"_key": "bob",    "age": 18},
{"_key": "calros", "age": 19},
{"_key": "dave",   "age": 20},
{"_key": "eric",   "age": 21}
]

select Users --filter 'between(age, 18, 20)'
[
  [
    0,
    0.0,
    0.0
  ],
  [
    [
      [
        3
      ],
      [
        [
          "_id",
          "UInt32"
        ],
        [
          "_key",
          "ShortText"
        ],
        [
          "age",
          "Int32"
        ]
      ],
      [
        2,
        "bob",
        18
      ],
      [
        3,
        "calros",
        19
      ],
      [
        4,
        "dave",
        20
      ]
    ]
  ]
]

Fixed a memory leak for normal hash table.

Fixed a bug that you sometimes could not connect to Groonga after it had simply continued to receive queries, caused by a memory leak in the normal hash table.

Conclusion

See Release 8.0.0 2018-02-09 for detailed changes since 7.1.1.

Let's search by Groonga!

2018-01-29

Groonga 7.1.1 has been released

Groonga 7.1.1 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • Added quorum match support.

  • filter Added custom similarity threshold support in script syntax.

  • grndb recover Added --force-lock-clear option.

  • load added surrogate pairs support in escape syntax.

  • Added environment variable to disable reducing expire.

  • logical_range_filter Added a new filter timing.

Added quorum match support.

You can use quorum match in both script syntax and query syntax. Quorum match is useful for fuzzy searching: it matches records that contain at least a set threshold of the query's tokens. For example, if "I have a pen" is split into four tokens and the threshold is 3, records that contain any three or more of these tokens will match.

For example:

--filter column *Q${THRESHOLD} "I have a pen"

--query *Q${THRESHOLD}"I have a pen"
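For example, assuming a content column and a threshold of 3, a concrete script syntax condition looks like the following; records containing at least three of the four tokens match:

--filter 'content *Q3 "I have a pen"'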

filter Added custom similarity threshold support in script syntax.

You can now do a similarity search with a custom similarity threshold in script syntax, as below. A similarity search finds documents similar to the given "document".

--filter column *S${SIMILARITY_THRESHOLD} "document"

grndb recover Added --force-lock-clear option.

With this option, grndb forcibly clears locks on the database, tables, and data columns. You can use your database again even if locks remain on the database, tables, or data columns.

If your database is broken, your database is still broken. This option just ignores locks.

For example:

 % grndb recover --force-lock-clear DB_PATH

load added surrogate pairs support in escape syntax.

You can use surrogate pairs in escape syntax in load. For example, \uD83C\uDF7A is processed as 🍺.
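For example, with a hypothetical Memos table that has a content column, the following load stores a record whose content contains 🍺:

load --table Memos
[
{"_key": "beer", "content": "I want \uD83C\uDF7A"}
]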

Added environment variable to disable reducing expire.

GRN_II_REDUCE_EXPIRE_ENABLE=no disables reducing expire. It's enabled by default.

logical_range_filter Added a new filter timing.

You can apply an additional filter after filtered stage dynamic columns are generated.
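For example, here is a sketch (untested; the table and column names are illustrative) that filters by a filtered stage dynamic column via --post_filter:

logical_range_filter \
    --logical_table Entries \
    --shard_key created_at \
    --columns[n_likes_sum_per_tag].stage filtered \
    --columns[n_likes_sum_per_tag].type UInt32 \
    --columns[n_likes_sum_per_tag].value 'window_sum(n_likes)' \
    --columns[n_likes_sum_per_tag].window.group_keys 'tag' \
    --filter 'content @ "system"' \
    --post_filter 'n_likes_sum_per_tag > 10'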

Conclusion

See Release 7.1.1 2018-01-29 for detailed changes since 7.1.0.

Let's search by Groonga!

2017-12-29

Groonga 7.1.0 has been released

Groonga 7.1.0 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • load Improved the load's query-log format.

  • logical_count Improved the logical_count's query-log format.

  • logical_select Improved the logical_select's query-log format.

  • delete Improved the delete's query-log format.

  • Supported vector for drilldown calc target.

  • [bulk] Reduced the number of realloc().

  • Added new function index_column_source_records.

load Improved the load's query-log format

The following details were added to the load's query log:

  • outputs number of loaded records.
  • outputs number of error records and columns.
  • outputs number of total records.

For example:

2017-12-29 15:23:47.049299|0x7ffe8af29a50|:000000001209848 load(3): [1][2][3]
2017-12-29 15:23:47.049311|0x7ffe8af29a50|<000000001221494 rc=-22

The number in parentheses after load is the number of loaded records. The number in the first [] is the number of error columns, the number in the second [] is the number of error records, and the number in the third [] is the total number of records.

logical_count Improved the logical_count's query-log format.

The following detail was added to the logical_count's query log:

  • outputs the number of counted records.

For example:

2017-12-29 15:25:06.068077|0x7fffedde8460|:000000001276405 count(2)
2017-12-29 15:25:06.068107|0x7fffedde8460|<000000001305264 rc=0

The number in parentheses after count is the number of counted records.

logical_select Improved the logical_select's query-log format.

The following details were added to the logical_select's query log:

  • log N outputs.
  • outputs plain drilldown.
  • outputs labeled drilldown.

For example:

2017-12-29 15:19:53.703472|0x7ffe0ce4e650|:000000001372833 filter(1)
2017-12-29 15:19:53.703499|0x7ffe0ce4e650|:000000001397623 select(1)[Logs_20170315]
2017-12-29 15:19:53.703796|0x7ffe0ce4e650|:000000001695440 filter(2)
2017-12-29 15:19:53.703813|0x7ffe0ce4e650|:000000001711123 select(2)[Logs_20170316]
2017-12-29 15:19:53.704024|0x7ffe0ce4e650|:000000001923225 filter(2)
2017-12-29 15:19:53.704040|0x7ffe0ce4e650|:000000001937931 select(2)[Logs_20170317]
2017-12-29 15:19:53.704198|0x7ffe0ce4e650|:000000002096788 output(5)
2017-12-29 15:19:53.704354|0x7ffe0ce4e650|<000000002253133 rc=0

The number in parentheses after select is the number of matched records, or the result of a plain or labeled drilldown; its meaning differs depending on the executed query. These numbers are displayed for each shard. The above example has three shards, so three select entries are displayed. The trailing [] shows the name of the searched table.

delete Improved the delete's query-log format.

The following details were added to the delete's query log:

  • outputs number of deleted and error records.
  • outputs the number of remaining records.

The number in parentheses after delete is the number of deleted records. The number in the first [] is the number of error records, and the number in the second [] is the number of remaining records.
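By analogy with the load log format above, a delete log line would look like the following (the timestamp and IDs are illustrative): one record deleted, no error records, two records remaining.

2018-01-29 15:30:00.000000|0x7ffe8af29a50|:000000001209848 delete(1): [0][2]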

Supported vector for drilldown calc target

You can now drill down against vector columns. As below, you can specify a vector column in drilldown_calc_target, so you can get the min, max, sum, and average of the elements of vector columns.

table_create Tags TABLE_PAT_KEY ShortText

table_create Memos TABLE_HASH_KEY ShortText
column_create Memos tag COLUMN_SCALAR Tags
column_create Memos scores COLUMN_VECTOR Int64

load --table Memos
[
{"_key": "Groonga1", "tag": "Groonga", "scores": [10, 29]},
{"_key": "Groonga2", "tag": "Groonga", "scores": [20]},
{"_key": "Groonga3", "tag": "Groonga", "scores": [60, 71]},
{"_key": "Mroonga1", "tag": "Mroonga", "scores": [61, 62, 63]},
{"_key": "Mroonga2", "tag": "Mroonga", "scores": [24, 20, 16]},
{"_key": "Mroonga3", "tag": "Mroonga", "scores": [8, 5, 2]},
{"_key": "Rroonga1", "tag": "Rroonga", "scores": [3]},
{"_key": "Rroonga2", "tag": "Rroonga", "scores": [-9, 0, 9]},
{"_key": "Rroonga3", "tag": "Rroonga", "scores": [0]}
]

When you execute the query below against the above table, you can get the min, max, sum, and average for each of the following groups:

  • Group with Groonga in tag.
  • Group with Mroonga in tag.
  • Group with Rroonga in tag.

select Memos \
  --limit 0 \
  --drilldowns[tag].keys tag \
  --drilldowns[tag].calc_types 'MAX, MIN, SUM, AVG' \
  --drilldowns[tag].calc_target scores \
  --drilldowns[tag].output_columns _key,_max,_min,_sum,_avg

From left to right, the values are _key, _max, _min, _sum, and _avg.

["Groonga", 71, 10, 190, 38.0],
["Mroonga", 63, 2, 261, 29.0],
["Rroonga", 9, -9, 3, 0.6],

[bulk] Reduced the number of realloc()

This improves performance for large outputs on Windows. For example, producing output over 100MB becomes about 100x faster.

Added new function index_column_source_records

As below, this function gets the source records of an index column.

plugin_register functions/index_column

table_create Memos TABLE_HASH_KEY ShortText

table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto
column_create Terms index COLUMN_INDEX|WITH_POSITION Memos _key

load --table Memos
[
{"_key": "Groonga is a fast full text search engine."},
{"_key": "Mroonga is a MySQL storage engine based on Groonga."},
{"_key": "Rroonga is a Ruby bindings for Groonga."}
]

When you execute the query below against the above table, you can get, for each token registered in the Terms table, the source records that contain that token.

select Terms \
  --limit -1 \
  --sort_keys _id \
  --columns[index_records].stage output \
  --columns[index_records].type Memos \
  --columns[index_records].flags COLUMN_VECTOR \
  --columns[index_records].value 'index_column_source_records("index")' \
  --output_columns '_id, _key, index_records'

The rightmost value is the result of index_column_source_records.

[ 1, "groonga", [ "Groonga is a fast full text search engine.", "Mroonga is a MySQL storage engine based on Groonga.", "Rroonga is a Ruby bindings for Groonga." ] ],
[ 2, "is", [ "Groonga is a fast full text search engine.", "Mroonga is a MySQL storage engine based on Groonga.", "Rroonga is a Ruby bindings for Groonga." ] ],
(Abbreviation)

Conclusion

See Release 7.1.0 2017-12-29 for detailed changes since 7.0.9.

Let's search by Groonga!