BloGroonga

2019-08-05

Groonga 9.0.6 has been released

Groonga 9.0.6 has been released!

How to install: Install

IMPORTANT NOTICE

This release contains fixes for some critical bugs which affect search results.

Changes

Here are important changes in this release:

  • Fixed a bug that a search error occurred when search escalation was executed.

  • Fixed a bug that slices returned records which should not be matched when using a nested equal condition.

  • Added support for Debian 10 (buster)

Fixed a bug that a search error occurred when search escalation was executed

In this release, a bug related to search escalation was fixed. This bug caused an error, and matched records were not returned.

This bug is caused when the following conditions are met:

  • The lexicon table is TABLE_HASH_KEY
  • The @ operator is used
  • Search escalation occurs

Fixed a bug that slices returned records which should not be matched when using a nested equal condition

In this release, a bug related to the slices parameter was fixed. This bug could return records which should not be matched. This bug occurs when the following condition is met:

  • The select command is used with the slices parameter.

Added support for Debian 10 (buster)

In this release, Debian 10 (buster, released on July 6, 2019) is supported. You can now install Groonga on Debian 10 in the same way as on Debian 9 (stretch).

Conclusion

See Release 9.0.6 2019-08-05 for detailed changes since 9.0.5.

Let's search by Groonga!

2019-07-30

Groonga 9.0.5 has been released

Groonga 9.0.5 has been released!

How to install: Install

IMPORTANT NOTICE

After Groonga 9.0.5 was released, some critical bugs which affect search results were found. We will release a new version which fixes the following bugs. Please do not use Groonga 9.0.5; we recommend upgrading to the fixed version once it is released.

Here are the found bugs:

  • A search query causes an error and doesn't return matched records. This bug occurs when the following conditions are met.

    • The lexicon table is TABLE_HASH_KEY
    • The @ operator is used
    • Search escalation occurs
  • slices returns records which should not be matched.

    • The select command is used with the slices parameter.

Changes

Here are important changes in this release:

  • logical_range_filter Improved to apply an optimization only when the search target shard is large enough.

  • normalizers Added new option unify_to_katakana for NormalizerNFKC100.

  • select Added drilldowns support as a slices parameter.

  • select Added columns support as a slices parameter.

  • select Improved so that we can reference _score in the initial stage of the slices parameter.

  • highlight_html, snippet_html Improved to also extract keywords from the expression evaluated before slices is executed when the slices parameter is specified.

  • Improved to also collect scores from the expression evaluated before slices is executed when the slices parameter is specified.

  • Stopped adding 1 to the score automatically when adding a posting to a posting list.

  • Added support for index search for nested equal like XXX.YYY.ZZZ == AAA.

  • Reduced the rehash interval when using a hash table.

  • Improved so that we can add a tag prefix to the query log.

  • Added support for Apache Arrow 1.0.0.

  • Added support for Amazon Linux 2.

  • Fixed a bug that JSON vector values like "[1, 2, 3]" were not indexed.

  • Fixed wrong parameter name in table_create tests.

  • Fixed a bug that the drilldown label was empty when a drilldown command was executed with command_version=3.

  • Fixed build error for Windows package on MinGW.

  • Fixed a missing COPYING file in the Windows package on MinGW.

  • Fixed a bug that highlighting was not performed when specifying a non-text query as the highlight target keyword.

  • Fixed broken MessagePack format output of object_inspect.

  • Fixed broken MessagePack format output of index_column_diff.

  • Fixed broken MessagePack format output of suggest.

  • Fixed a bug that the size allocated by realloc wasn't enough when searching a table such as a patricia trie.

  • Fixed a bug that groonga.repo was removed when updating groonga-release to 1.5.0 from a version before 1.5.0-1.

logical_range_filter Improved to apply an optimization only when the search target shard is large enough.

This feature reduces duplicated search results between offsets when the same sort key is used. The "large enough" threshold is 10000 records by default.

normalizers Added new option unify_to_katakana for NormalizerNFKC100

This option normalizes hiragana to katakana. For example, ゔぁゔぃゔゔぇゔぉ is normalized to ヴァヴィヴヴェヴォ.

We can treat the terms below as the same term by combining unify_to_katakana and unify_katakana_v_sounds.

  • ゔぁゔぃゔゔぇゔぉ
  • ばびぶべぼ
  • ヴァヴィヴヴェヴォ
  • バビブベボ

  • First, we apply unify_to_katakana.

    • ゔぁゔぃゔゔぇゔぉ -> ヴァヴィヴヴェヴォ
    • ばびぶべぼ -> バビブベボ
    • ヴァヴィヴヴェヴォ -> ヴァヴィヴヴェヴォ
    • バビブベボ -> バビブベボ
  • Second, we apply unify_katakana_v_sounds.

    • ヴァヴィヴヴェヴォ -> バビブベボ
    • バビブベボ -> バビブベボ
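
The two steps above can be sketched in Python. This is an illustrative sketch of the normalization rules inferred from the examples above, not Groonga's actual implementation; the codepoint-shift trick and the replacement table are assumptions.

```python
def unify_to_katakana(text):
    # Hiragana (U+3041-U+3096) maps to katakana at a fixed
    # codepoint offset of 0x60; other characters pass through.
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c
        for c in text
    )

def unify_katakana_v_sounds(text):
    # Replace multi-character sequences first so a lone ヴ does
    # not consume the small vowel mark that follows it.
    for v_sound, b_sound in [("ヴァ", "バ"), ("ヴィ", "ビ"),
                             ("ヴェ", "ベ"), ("ヴォ", "ボ"),
                             ("ヴ", "ブ")]:
        text = text.replace(v_sound, b_sound)
    return text

# All four spellings collapse to the same term:
terms = ["ゔぁゔぃゔゔぇゔぉ", "ばびぶべぼ", "ヴァヴィヴヴェヴォ", "バビブベボ"]
normalized = {unify_katakana_v_sounds(unify_to_katakana(t)) for t in terms}
```

Applying unify_to_katakana first and unify_katakana_v_sounds second leaves every spelling as バビブベボ, which is why the four terms above match each other.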

Conclusion

See Release 9.0.5 2019-07-30 for detailed changes since 9.0.4.

Let's search by Groonga!

2019-06-29

Groonga 9.0.4 has been released

Groonga 9.0.4 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • Added support for array literal with multiple elements.

  • Added support for equivalence operations on vectors.

  • logical_range_filter Increased the logs output to the query log.

  • grndb Added support for a new option --since

  • query Added default_operator.

  • [optimizer] Fixed a bug that caused an execution error when multiple filter conditions such as xxx.yyy == "keyword" were specified.

  • Added missing LICENSE files to the Groonga package for Windows (VC++ version).

  • Added the UCRT runtime to the Groonga package for Windows (VC++ version).

  • window_function Fixed a memory leak.

    • This occurs when multiple windows with sort keys are used.

Added support for array literal with multiple elements.

We can use an array literal with multiple elements in a filter condition as below.

table_create Values TABLE_NO_KEY

column_create Values numbers COLUMN_VECTOR Int32

load --table Values
[
{"numbers": [2, 1, 3]},
{"numbers": [2, 3, 4]},
{"numbers": [8, 9, -1]}
]

select Values  \
  --filter 'numbers == [2, 3, 4]'  \
  --output_columns 'numbers'
[[0,0.0,0.0],[[[1],[["numbers","Int32"]],[[2,3,4]]]]]
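
The filter above matches only the record whose vector is exactly [2, 3, 4]. A minimal Python sketch of the same exact-match semantics, using the records loaded above:

```python
# The same records as in the load command above.
records = [
    {"numbers": [2, 1, 3]},
    {"numbers": [2, 3, 4]},
    {"numbers": [8, 9, -1]},
]

# numbers == [2, 3, 4] compares the whole vector: same elements,
# same order, same length. [2, 1, 3] does not match even though
# it contains 2 and 3.
matched = [r["numbers"] for r in records if r["numbers"] == [2, 3, 4]]
```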

Added support for equivalence operations on vectors.

We can use an equivalence operation on a vector as below.

table_create Values TABLE_NO_KEY

column_create Values numbers COLUMN_VECTOR Int32

load --table Values
[
{"numbers": [2, 1, 3]},
{"numbers": [2, 3, 4]},
{"numbers": [8, 9, -1]}
]

select Values  \
  --filter 'numbers == [2, 3, 4]'  \
  --output_columns 'numbers'
[[0,0.0,0.0],[[[1],[["numbers","Int32"]],[[2,3,4]]]]]

logical_range_filter Increased the logs output to the query log.

The logical_range_filter command now outputs a log at the following timings:

  • After filtering by logical_range_filter.
  • After sorting by logical_range_filter.
  • After applying dynamic columns.
  • After outputting results.

This feature lets us see how much of the command has finished.

grndb Added support for a new option --since

We can specify the scope of an inspection.

We can specify the modified time in ISO 8601 format or in -NUNIT format such as -3days or -2.5weeks.

Here is an example that specifies the --since option in ISO 8601 format:

% grndb check --since=2019-06-24T18:16:22 /var/lib/groonga/db/db

In the above example, the objects which were modified after 2019-06-24T18:16:22 are checked.

Here is an example that specifies the --since option in -NUNIT format:

% grndb check --since=-7d /var/lib/groonga/db/db

In the above example, the objects which were modified within the last 7 days are checked.

Please also refer to grndb#since.
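
The two --since formats can be sketched as follows. This is an illustrative parser, not grndb's actual implementation; the accepted unit spellings are assumptions, so see grndb#since for the exact grammar.

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit table for -NUNIT values like "-3days" or "-7d".
UNIT_SECONDS = {
    "d": 86400, "day": 86400, "days": 86400,
    "w": 604800, "week": 604800, "weeks": 604800,
}

def parse_since(value, now=None):
    """Return the threshold time: objects modified after it are checked."""
    now = now or datetime.now()
    match = re.fullmatch(r"-(\d+(?:\.\d+)?)(\w+)", value)
    if match:
        # -NUNIT format: the threshold is "now" minus the duration.
        number, unit = match.groups()
        return now - timedelta(seconds=float(number) * UNIT_SECONDS[unit])
    # Otherwise treat the value as an ISO 8601 timestamp.
    return datetime.fromisoformat(value)
```

For example, parse_since("-7d") yields a time 7 days before now, matching the second grndb example above.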

query Added default_operator.

We can customize the operator used for a query such as "keyword1 keyword2". By default, "keyword1 keyword2" is an AND operation.

We can change the operator of "keyword1 keyword2" to something other than AND as below.

table_create Products TABLE_NO_KEY

column_create Products name COLUMN_SCALAR ShortText

load --table Products
[
["name"],
["Groonga"],
["Mroonga"],
["Rroonga"],
["PGroonga"],
["Ruby"],
["PostgreSQL"]
]

select \
  --table Products \
  --filter 'query("name", "Groonga Mroonga", {"default_operator": "OR"})'
[
  [
    0,
    0.0,
    0.0
  ],
  [
    [
      [
        3
      ],
      [
        [
          "_id",
          "UInt32"
        ],
        [
          "name",
          "ShortText"
        ]
      ],
      [
        1,
        "Groonga"
      ],
      [
        4,
        "PGroonga"
      ],
      [
        2,
        "Mroonga"
      ]
    ]
  ]
]

Conclusion

See Release 9.0.4 2019-06-29 for detailed changes since 9.0.3.

Let's search by Groonga!

2019-05-29

Groonga 9.0.3 has been released

Groonga 9.0.3 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • select Added more query logs.

  • logical_select Added more query logs.

  • logical_select Improved sort performance a little when using the limit option.

  • [index_column_diff] Improved performance.

  • [Normalizers] Added a new Normalizer NormalizerNFKC121 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 12.1.

  • [TokenFilters] Added a new TokenFilter TokenFilterNFKC121 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 12.1.

  • grndb Added a new option --log-flags

  • snippet_html Added a new option for changing a return value when no match by search.

  • plugin_unregister Added support for full paths on Windows.

  • Added support for multiline log messages.

  • Output the search key to Groonga's log when searching by index.

  • document for match_columns Added a document for indexes with weight.

  • document for logical_range_filter Added an explanation for the order parameter.

  • document for object_inspect Added an explanation for new statistics INDEX_COLUMN_VALUE_STATISTICS_NEXT_PHYSICAL_SEGMENT_ID and INDEX_COLUMN_VALUE_STATISTICS_N_PHYSICAL_SEGMENTS.

  • Dropped Ubuntu 14.04 support.

  • [index_column_diff] Fixed a bug that too much remains are reported.

  • Fixed a build error when we use --without-onigmo option.

  • Fixed a vulnerability (CVE-2019-11675).

  • Removed the extended path prefix \\?\ in the Windows version of Groonga.

    • This extended prefix caused a bug that plugins couldn't be found correctly.

select Added more query logs.

The select command now outputs a log at the following timings:

  • After sorting by drilldown.
  • After filtering by drilldown.

This feature lets us see how much of the command has finished.

logical_select Added more query logs.

The logical_select command now outputs a log at the following timings:

  • After making dynamic columns.
  • After grouping by drilldown.
  • After sorting by drilldown.
  • After filtering by drilldown.
  • After sorting by logical_select.

This feature lets us see how much of the command has finished.

[index_column_diff] Improved performance.

We have greatly shortened the execution time of this command.

Depending on the data, this command now executes about ten to a hundred times faster than before and also uses less memory.

This improvement makes the command practical enough for real use.

We can see how to use this command in Groonga 9.0.1 has been released.

grndb Added a new option --log-flags

We can specify the output items of a log, as with the groonga executable file.

See groonga executable file (/docs/reference/executables/groonga#cmdoption-groonga-log-flags) to learn about the supported log flags.

snippet_html Added a new option for changing a return value when no match by search

For example, we can set the return value to "[]" when a search doesn't match, as below.

table_create Documents TABLE_HASH_KEY ShortText
[[0,0.0,0.0],true]
column_create Documents content COLUMN_SCALAR Text
[[0,0.0,0.0],true]
table_create Terms TABLE_PAT_KEY|KEY_NORMALIZE ShortText --default_tokenizer TokenBigram
[[0,0.0,0.0],true]
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Documents content
[[0,0.0,0.0],true]
load --table Documents
[
["_key", "content"],
["Groonga", "Groonga can be used with MySQL."]
]
[[0,0.0,0.0],1]
select Documents   --match_columns content --query 'MySQL'   --output_columns '_key, snippet_html(_key, {"default": []})'
[
  [
    0,
    0.0,
    0.0
  ],
  [
    [
      [
        1
      ],
      [
        [
          "_key",
          "ShortText"
        ],
        [
          "snippet_html",
          null
        ]
      ],
      [
        "Groonga",
        [

        ]
      ]
    ]
  ]
]

Conclusion

See Release 9.0.3 2019-05-29 for detailed changes since 9.0.2.

Let's search by Groonga!

2019-04-29

Groonga 9.0.2 has been released

Groonga 9.0.2 has been released!

We provide a package for Windows built with VC++ from this release.

We also provide a package for Windows built with MinGW as in the past.

However, sooner or later we will provide only the VC++ package instead of the MinGW one.

How to install: Install

Changes

Here are important changes in this release:

  • column_create Added a new flag INDEX_LARGE for index column.

  • object_inspect Added new statistics next_physical_segment_id and max_n_physical_segments for physical segment information.

  • logical_select Added support for window function over shard.

  • logical_range_filter Added support for window function over shard.

  • logical_count Added support for window function over shard.

  • io_flush Added a new option --recursive dependent

  • Fixed "unknown type name 'bool'" compilation error in some environments.

  • Fixed a bug that numbers over Int32 were output incorrectly by commands executed via mruby (e.g. logical_select, logical_range_filter, logical_count, etc.).

column_create Added a new flag INDEX_LARGE for index column.

This flag lets an index column have twice the space of the default. However, note that it also uses twice the memory.

This flag is useful when the index target data are large. Large data typically have many records (normally at least 10 million) and at least one of the following features.

  • Index targets are multiple columns
  • Index table has tokenizer

Here is an example to create a large index column:

  column_create \
  --table Terms \
  --name people_roles_large_index \
  --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION|INDEX_LARGE \
  --type People \
  --source roles
  [[0, 1337566253.89858, 0.000355720520019531], true]

object_inspect Added new statistics next_physical_segment_id and max_n_physical_segments for physical segment information.

next_physical_segment_id is the ID of the segment that the inspected index column will use next. In other words, this number shows the current segment usage.

max_n_physical_segments is the maximum number of segments for the inspected index column.

The maximum number of segments depends on the index column size:

Index column size   Maximum number of segments
INDEX_SMALL         2**9 (512)
INDEX_MEDIUM        2**16 (65536)
INDEX_LARGE         2**17 * 2 (262144)
Default             2**17 (131072)
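
The table values follow directly from the exponents; a quick Python check of the arithmetic (the dictionary keys here are just illustrative labels):

```python
# Maximum physical segment counts per index column size flag.
MAX_N_PHYSICAL_SEGMENTS = {
    "INDEX_SMALL": 2 ** 9,        # 512
    "INDEX_MEDIUM": 2 ** 16,      # 65536
    "INDEX_LARGE": 2 ** 17 * 2,   # 262144: twice the default
    "default": 2 ** 17,           # 131072
}
```

Note that INDEX_LARGE is exactly twice the default, matching the "twice the space, twice the memory" trade-off described for the INDEX_LARGE flag.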

logical_select Added support for window function over shard.

We can apply a window function over multiple tables. However, we need to align the same order for the shard key and the leading group key or sort key.

For example, we can apply the window function over multiple tables as in the case below, because it aligns the same order for the shard key and the leading group key.

The leading group key is price and the shard key is timestamp in the example below:

  plugin_register sharding
  
  table_create Logs_20170415 TABLE_NO_KEY
  column_create Logs_20170415 timestamp COLUMN_SCALAR Time
  column_create Logs_20170415 price COLUMN_SCALAR UInt32
  column_create Logs_20170415 n_likes COLUMN_SCALAR UInt32
  
  table_create Logs_20170416 TABLE_NO_KEY
  column_create Logs_20170416 timestamp COLUMN_SCALAR Time
  column_create Logs_20170416 price COLUMN_SCALAR UInt32
  column_create Logs_20170416 n_likes COLUMN_SCALAR UInt32
  
  load --table Logs_20170415
  [
  {"timestamp": "2017/04/15 00:00:00", "n_likes": 2, "price": 100},
  {"timestamp": "2017/04/15 01:00:00", "n_likes": 1, "price": 100},
  {"timestamp": "2017/04/15 01:00:00", "n_likes": 2, "price": 200}
  ]
  
  load --table Logs_20170416
  [
  {"timestamp": "2017/04/16 10:00:00", "n_likes": 1, "price": 200},
  {"timestamp": "2017/04/16 11:00:00", "n_likes": 2, "price": 300},
  {"timestamp": "2017/04/16 11:00:00", "n_likes": 1, "price": 300}
  ]
  
  logical_select Logs \
    --shard_key timestamp \
    --columns[count].stage initial \
    --columns[count].type UInt32 \
    --columns[count].flags COLUMN_SCALAR \
    --columns[count].value 'window_count()' \
    --columns[count].window.group_keys price \
    --output_columns price,count
  [
    [
      0,
      0.0,
      0.0
    ],
    [
      [
        [
          6
        ],
        [
          [
            "price",
            "UInt32"
          ],
          [
            "count",
            "UInt32"
          ]
        ],
        [
          100,
          2
        ],
        [
          100,
          2
        ],
        [
          200,
          2
        ],
        [
          200,
          2
        ],
        [
          300,
          2
        ],
        [
          300,
          2
        ]
      ]
    ]
  ]
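
The result above can be reproduced with a small Python sketch of what window_count() with window.group_keys does once the two shards are read in shard-key order: each record gets the number of records sharing its price.

```python
from collections import Counter

# Records from both shards, already aligned by the shard key
# (timestamp) and the leading group key (price).
logs = [
    {"timestamp": "2017/04/15 00:00:00", "n_likes": 2, "price": 100},
    {"timestamp": "2017/04/15 01:00:00", "n_likes": 1, "price": 100},
    {"timestamp": "2017/04/15 01:00:00", "n_likes": 2, "price": 200},
    {"timestamp": "2017/04/16 10:00:00", "n_likes": 1, "price": 200},
    {"timestamp": "2017/04/16 11:00:00", "n_likes": 2, "price": 300},
    {"timestamp": "2017/04/16 11:00:00", "n_likes": 1, "price": 300},
]

# window_count() per price group: every record in a group carries
# the group's record count.
counts = Counter(log["price"] for log in logs)
result = [(log["price"], counts[log["price"]]) for log in logs]
```

This yields (100, 2), (100, 2), (200, 2), (200, 2), (300, 2), (300, 2), matching the logical_select output above even though the records span two tables.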

logical_range_filter Added support for window function over shard.

We can apply a window function over multiple tables. However, we need to align the same order for the shard key and the leading group key or sort key, as with logical_select.

Here is an example to apply the window function to over multiple tables by logical_range_filter:

  plugin_register sharding
  
  table_create Logs_20170415 TABLE_NO_KEY
  column_create Logs_20170415 timestamp COLUMN_SCALAR Time
  column_create Logs_20170415 price COLUMN_SCALAR UInt32
  column_create Logs_20170415 n_likes COLUMN_SCALAR UInt32
  
  table_create Logs_20170416 TABLE_NO_KEY
  column_create Logs_20170416 timestamp COLUMN_SCALAR Time
  column_create Logs_20170416 price COLUMN_SCALAR UInt32
  column_create Logs_20170416 n_likes COLUMN_SCALAR UInt32
  
  load --table Logs_20170415
  [
  {"timestamp": "2017/04/15 00:00:00", "n_likes": 2, "price": 100},
  {"timestamp": "2017/04/15 01:00:00", "n_likes": 1, "price": 100},
  {"timestamp": "2017/04/15 01:00:00", "n_likes": 2, "price": 200}
  ]
  
  load --table Logs_20170416
  [
  {"timestamp": "2017/04/16 10:00:00", "n_likes": 1, "price": 200},
  {"timestamp": "2017/04/16 11:00:00", "n_likes": 2, "price": 300},
  {"timestamp": "2017/04/16 11:00:00", "n_likes": 1, "price": 300}
  ]
  
  logical_range_filter Logs \
    --shard_key timestamp \
    --columns[count].stage initial \
    --columns[count].type UInt32 \
    --columns[count].flags COLUMN_SCALAR \
    --columns[count].value 'window_count()' \
    --columns[count].window.group_keys price \
    --output_columns price,count
  [
    [
      0,
      0.0,
      0.0
    ],
    [
      [
        [
          6
        ],
        [
          [
            "price",
            "UInt32"
          ],
          [
            "count",
            "UInt32"
          ]
        ],
        [
          100,
          2
        ],
        [
          100,
          2
        ],
        [
          200,
          2
        ],
        [
          200,
          2
        ],
        [
          300,
          2
        ],
        [
          300,
          2
        ]
      ]
    ]
  ]

logical_count Added support for window function over shard.

We can apply a window function over multiple tables. However, we need to align the same order for the shard key and the leading group key or sort key, as with logical_select.

Here is an example to apply the window function to over multiple tables by logical_count:

  plugin_register sharding
  
  table_create Logs_20170415 TABLE_NO_KEY
  column_create Logs_20170415 timestamp COLUMN_SCALAR Time
  column_create Logs_20170415 price COLUMN_SCALAR UInt32
  column_create Logs_20170415 n_likes COLUMN_SCALAR UInt32
  
  table_create Logs_20170416 TABLE_NO_KEY
  column_create Logs_20170416 timestamp COLUMN_SCALAR Time
  column_create Logs_20170416 price COLUMN_SCALAR UInt32
  column_create Logs_20170416 n_likes COLUMN_SCALAR UInt32
  
  load --table Logs_20170415
  [
  {"timestamp": "2017/04/15 00:00:00", "n_likes": 2, "price": 100},
  {"timestamp": "2017/04/15 01:00:00", "n_likes": 1, "price": 100},
  {"timestamp": "2017/04/15 01:00:00", "n_likes": 2, "price": 200}
  ]
  
  load --table Logs_20170416
  [
  {"timestamp": "2017/04/16 10:00:00", "n_likes": 1, "price": 200},
  {"timestamp": "2017/04/16 11:00:00", "n_likes": 2, "price": 300},
  {"timestamp": "2017/04/16 11:00:00", "n_likes": 1, "price": 300}
  ]
  
  logical_count Logs \
    --shard_key timestamp \
    --columns[count].stage initial \
    --columns[count].type UInt32 \
    --columns[count].flags COLUMN_SCALAR \
    --columns[count].value 'window_count()' \
    --columns[count].window.group_keys price \
    --filter 'count >= 1'
  [
    [
      0,
      0.0,
      0.0
    ],
    [
      4
    ]
  ]

io_flush Added a new option --recursive dependent

With this option, we can flush not only the target object and its child objects but also related objects.

The related objects are:

  • A referenced table
  • A related index column (one whose source column is in the target TABLE_NAME)
  • The table of a related index column (one whose source column is in the target TABLE_NAME)

Here is an example to use this option:

  io_flush --recursive "dependent" --target_name "Users"
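
The "dependent" behavior can be sketched as a closure over a dependency graph. The graph contents and the helper name below are illustrative assumptions, not Groonga's internal API:

```python
# Hypothetical dependency graph: each object maps to the objects that
# io_flush --recursive dependent would also need to flush
# (child columns, referenced tables, related index columns).
DEPENDENCIES = {
    "Users": ["Users.name", "Roles"],          # child column, referenced table
    "Users.name": ["Terms.users_name_index"],  # related index column
    "Terms.users_name_index": ["Terms"],       # table of the index column
    "Roles": [],
    "Terms": [],
}

def flush_targets(name, graph):
    # Breadth-first closure: the target, its children, and all
    # related objects reachable through the graph.
    seen, queue = set(), [name]
    while queue:
        obj = queue.pop(0)
        if obj in seen:
            continue
        seen.add(obj)
        queue.extend(graph.get(obj, []))
    return seen

targets = flush_targets("Users", DEPENDENCIES)
```

With a plain recursive flush only Users and its child objects would be flushed; the dependent variant also reaches the referenced table and the related index column and its table.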

Conclusion

See Release 9.0.2 2019-04-29 for detailed changes since 9.0.1.

Let's search by Groonga!