BloGroonga

2018-09-29

Groonga 8.0.7 has been released

Groonga 8.0.7 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • New options for the TokenMecab tokenizer.
  • New options for the TokenNgram tokenizer.
  • Groonga can now load plugins from multiple directories.

New options for the TokenMecab tokenizer

TokenMecab now accepts these new options:

  • include_class: outputs MeCab's metadata class and subclass.
  • include_reading: outputs MeCab's metadata reading.
  • include_form: outputs MeCab's metadata inflected_type, inflected_form, and base_form.
  • use_reading: lets you search for terms by their reading written in kana. This option helps you find orthographic variants by searching in kana.

For more details, see the reference.

New options for the TokenNgram tokenizer

TokenNgram now accepts these new options:

  • unify_alphabet: TokenNgram("unify_alphabet", false) works the same as TokenBigramSplitAlpha.
  • unify_symbol: TokenNgram("unify_symbol", false) works the same as TokenBigramSplitSymbol.
  • unify_digit: TokenNgram("unify_digit", false) works the same as TokenBigramSplitDigit.

For more details, see the reference.

Groonga can now load plugins from multiple directories

A new environment variable, GRN_PLUGINS_PATH, is available for detecting plugins in multiple directories. It is a list of directory paths, separated with ; (on Windows) or : (on other platforms).

GRN_PLUGINS_PATH takes priority over the existing GRN_PLUGINS_DIR.

Note that this does not work on Windows currently.
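
As a rough illustration of the list format, the following Python sketch splits a GRN_PLUGINS_PATH-style value with the platform separator and places its entries ahead of GRN_PLUGINS_DIR. The function names are hypothetical, and both dropping empty entries and modelling "priority" as search order are assumptions of this sketch. It does not reproduce how Groonga itself looks up plugins.

```python
import os


def split_plugins_path(value, sep=os.pathsep):
    """Split a GRN_PLUGINS_PATH-style value into directory paths.

    os.pathsep is ";" on Windows and ":" on other platforms.
    Dropping empty entries is an assumption of this sketch.
    """
    return [entry for entry in value.split(sep) if entry]


def plugin_dirs(env, sep=os.pathsep):
    """Return candidate plugin directories in search order (hypothetical helper).

    Entries from GRN_PLUGINS_PATH come first, then GRN_PLUGINS_DIR.
    Treating "priority" as search order is an assumption.
    """
    dirs = split_plugins_path(env.get("GRN_PLUGINS_PATH", ""), sep)
    single = env.get("GRN_PLUGINS_DIR")
    if single:
        dirs.append(single)
    return dirs
```

Calling plugin_dirs(os.environ) would list the directories configured in the current process environment.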

Conclusion

See Release 8.0.7 2018-09-29 for detailed changes since 8.0.6.

Let's search by Groonga!

2018-08-29

Groonga 8.0.6 has been released

Groonga 8.0.6 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • Optimizer is a built-in feature now.
  • Sequential search is now enabled by default when results are already filtered enough.
  • load command now supports the lock_table option.

Optimizer is a built-in feature now

The optimizer plugin has become a built-in feature. It is disabled by default. To activate it, set GRN_EXPR_OPTIMIZE=yes (or use the expression_rewriters plugin as before).

Sequential search is now enabled by default when results are already filtered enough

Groonga now uses sequential search when the results have already been narrowed down enough. This may be faster than a regular index search for heavily narrowed results, for example fewer than 1000 records amounting to 1% of all records.

You can disable this feature by setting an environment variable: GRN_TABLE_SELECT_ENOUGH_FILTERED_RATIO=0.0
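
The decision can be pictured with a small Python sketch. This is only a model: the function name is hypothetical, the 0.01 and 1000 defaults are taken from the "1%" and "1000 records" figures above, and requiring both conditions at once is an assumption rather than a statement about Groonga's internals.

```python
def use_sequential_search(n_filtered, n_total, ratio=0.01, max_records=1000):
    """Decide whether already-narrowed results count as "enough filtered".

    ratio plays the role of GRN_TABLE_SELECT_ENOUGH_FILTERED_RATIO.
    Combining the ratio check and the record-count check with "and"
    is an assumption of this sketch.
    """
    if n_total <= 0:
        return False
    return (n_filtered / n_total) < ratio and n_filtered < max_records
```

With ratio=0.0 the ratio check can never pass, which matches the way the feature is disabled.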

load command now supports the lock_table option

load --lock_table yes now locks the target table while it updates columns and applies --each. This avoids conflicts between load and delete, but it reduces loading performance.

Conclusion

See Release 8.0.6 2018-08-29 for detailed changes since 8.0.5.

Let's search by Groonga!

2018-07-29

Groonga 8.0.5 has been released

Groonga 8.0.5 has been released!

How to install: Install

Changes

Here are important changes in this release:

Added a new function time_classify_day_of_week()

You can now get the day of the week from the time value of each search result. The new function time_classify_day_of_week() accepts a time value as its only parameter and returns the day of the week for that value. It returns a UInt8 value, where 0 means Sunday and 6 means Saturday.

Before using the function, you need to register the functions/time plugin with a command like:

plugin_register functions/time
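
To make the numbering convention concrete, here is a minimal Python sketch that maps a date to the same 0 (Sunday) to 6 (Saturday) range. The function name is hypothetical and time zone handling is left out. It mirrors the convention only, not Groonga's implementation.

```python
from datetime import datetime


def classify_day_of_week(value):
    """Return 0 for Sunday through 6 for Saturday.

    isoweekday() is 1 (Monday) .. 7 (Sunday), so taking it modulo 7
    moves Sunday to 0 and leaves Saturday at 6.
    """
    return value.isoweekday() % 7


# 2018-07-29, the release date of Groonga 8.0.5, was a Sunday.
release_day = classify_day_of_week(datetime(2018, 7, 29))
```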

Added a new function time_format_iso8601()

You can now format any time value in ISO 8601 form. The new function time_format_iso8601() accepts a time value as its only parameter and returns a string in ISO 8601 form, such as 2018-07-29T23:59:59.999999+09:00.

As with the previous function, you need to register the functions/time plugin to use this function.
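
For readers who want to check the output shape outside Groonga, the following Python sketch produces a string in the same form with the standard datetime module. The helper name and the fixed +09:00 offset are assumptions chosen to match the example value above.

```python
from datetime import datetime, timedelta, timezone

# Fixed +09:00 offset, chosen only to match the example value.
JST = timezone(timedelta(hours=9))


def format_iso8601(value):
    """Format an aware datetime with microseconds and a UTC offset."""
    return value.isoformat(timespec="microseconds")


formatted = format_iso8601(datetime(2018, 7, 29, 23, 59, 59, 999999, tzinfo=JST))
```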

Dropped support of old Ubuntu and Debian versions

As you know, both Ubuntu 17.10 (Artful Aardvark) and Debian jessie are now outdated. Groonga 8.0.5 and later won't be released for those old versions.

Conclusion

See Release 8.0.5 2018-07-29 for detailed changes since 8.0.4.

Let's search by Groonga!

2018-06-29

Groonga 8.0.4 has been released

Groonga 8.0.4 has been released!

How to install: Install

Changes

Here are important changes in this release:

Added more validations for column_create

Added a new function vector_find()

It returns the first element that matches the given condition from the given vector. See the documentation for details.

7.15.29. vector_find — Groonga v8.0.4 documentation
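
The "first matching element" behaviour can be sketched in Python as follows. The helper name find_first and the predicate argument are illustrative only, and returning None when nothing matches is an assumption. The real function's arguments are described in the reference.

```python
def find_first(vector, predicate):
    """Return the first element for which predicate is true.

    Returning None when no element matches is an assumption of this sketch.
    """
    for element in vector:
        if predicate(element):
            return element
    return None
```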

Future of Debian jessie support plan

As you know, the last Debian jessie point release has been released.

We therefore plan to stop providing new Groonga packages for Debian jessie starting next month. We recommend upgrading to Debian stretch.

Conclusion

See Release 8.0.4 2018-06-29 for detailed changes since 8.0.3.

Let's search by Groonga!

2018-05-29

Groonga 8.0.3 has been released

Groonga 8.0.3 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • [highlight_html] Supported highlighting of search results when NormalizerNFKC100 or TokenNgram is used.
  • [normalizers] Added a new unify_middle_dot option to NormalizerNFKC100.
  • [normalizers] Added a new unify_katakana_v_sounds option to NormalizerNFKC100.
  • [normalizers] Added a new unify_katakana_bu_sound option to NormalizerNFKC100.
  • [sub_filter] Supported sub_filter optimization for the case where records are already heavily filtered.
  • [delete] Added a new limit option.
  • [normalizers] Fixed a bug where FULLWIDTH LATIN CAPITAL LETTERs such as U+FF21 FULLWIDTH LATIN CAPITAL LETTER A weren't normalized to LATIN SMALL LETTERs such as U+0061 LATIN SMALL LETTER A. If you have been using NormalizerNFKC100, you must recreate your indexes.

[highlight_html] Supported highlighting of search results when NormalizerNFKC100 or TokenNgram is used

You can highlight keywords found by searches that use NormalizerNFKC100 or TokenNgram, as in the following example.

table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
  --default_tokenizer 'TokenNgram("report_source_location", true)' \
  --normalizer 'NormalizerNFKC100'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "ア㌕Az"}
]
[[0,0.0,0.0],1]
select Entries \
  --match_columns body \
  --query 'グラム' \
  --output_columns 'highlight_html(body, Terms)'
[
  [
    0,
    0.0,
    0.0
  ],
  [
    [
      [
        1
      ],
      [
        [
          "highlight_html",
          null
        ]
      ],
      [
        "ア<span class=\"keyword\">㌕</span>Az"
      ]
    ]
  ]
]

[normalizers] Added a new unify_middle_dot option to NormalizerNFKC100

This option normalizes middle dot characters, as in the following example.

normalize \
  'NormalizerNFKC100("unify_middle_dot", true)' \
  "·ᐧ•∙⋅⸱・・" \
  WITH_TYPES
[
  [
    0,
    0.0,
    0.0
  ],
  {
    "normalized": "········",
    "types": [
      "symbol",
      "symbol",
      "symbol",
      "symbol",
      "symbol",
      "symbol",
      "symbol",
      "symbol"
    ],
    "checks": [

    ]
  }
]

With this option, you can search with or without ・ (middle dot), regardless of its position.

[normalizers] Added a new unify_katakana_v_sounds option to NormalizerNFKC100

This option normalizes ヴァヴィヴヴェヴォ (katakana) to バビブベボ (katakana), as in the following example.

normalize \
  'NormalizerNFKC100("unify_katakana_v_sounds", true)' \
  "ヴァヴィヴヴェヴォヴ" \
  WITH_TYPES
[
  [
    0,
    0.0,
    0.0
  ],
  {
    "normalized": "バビブベボブ",
    "types": [
      "katakana",
      "katakana",
      "katakana",
      "katakana",
      "katakana",
      "katakana"
    ],
    "checks": [

    ]
  }
]

For example, a search for バイオリン (violin) matches ヴァイオリン (violin).
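
The mapping shown in the output above can be imitated with a few string replacements. The Python sketch below is an approximation for illustration, with a hypothetical table and function name. It covers only the characters that appear in the example and is not NormalizerNFKC100 itself.

```python
# Two-character sequences come first so that the bare ヴ rule
# does not consume them.
V_SOUND_MAP = [
    ("ヴァ", "バ"),
    ("ヴィ", "ビ"),
    ("ヴェ", "ベ"),
    ("ヴォ", "ボ"),
    ("ヴ", "ブ"),
]


def unify_katakana_v_sounds(text):
    """Replace katakana v sounds with the corresponding b sounds."""
    for source, target in V_SOUND_MAP:
        text = text.replace(source, target)
    return text
```

Because both spellings normalize to the same string, a query and a stored value that differ only in this respect compare equal after normalization.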

[normalizers] Added a new unify_katakana_bu_sound option to NormalizerNFKC100

This option normalizes ヴァヴィヴゥヴェヴォ (katakana) to ブ (katakana), as in the following example.

normalize \
  'NormalizerNFKC100("unify_katakana_bu_sound", true)' \
  "ヴァヴィヴヴェヴォヴ" \
  WITH_TYPES
[
  [
    0,
    0.0,
    0.0
  ],
  {
    "normalized": "ブブブブブブ",
    "types": [
      "katakana",
      "katakana",
      "katakana",
      "katakana",
      "katakana",
      "katakana"
    ],
    "checks": [

    ]
  }
]

For example, searches for セーブル (katakana) and セーヴル (katakana) both match セーヴェル (katakana).
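
A rough Python equivalent of this collapsing rule is a single regular expression. As before, this is an illustrative sketch with hypothetical names, assuming the rule is "ヴ plus an optional small vowel becomes ブ". It is not the normalizer's actual code.

```python
import re

# ヴ optionally followed by one small vowel (ァ ィ ゥ ェ ォ).
BU_SOUND = re.compile("ヴ[ァィゥェォ]?")


def unify_katakana_bu_sound(text):
    """Collapse every katakana v sound into ブ."""
    return BU_SOUND.sub("ブ", text)
```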

[sub_filter] Supported sub_filter optimization for the case where records are already heavily filtered

For example, this optimization applies when records have been narrowed down enough before sub_filter is executed, as below.

table_create Files TABLE_PAT_KEY ShortText
column_create Files revision COLUMN_SCALAR UInt32

table_create Packages TABLE_PAT_KEY ShortText
column_create Packages files COLUMN_VECTOR Files

column_create Files packages_files_index COLUMN_INDEX Packages files

table_create Revisions TABLE_PAT_KEY UInt32
column_create Revisions files_revision COLUMN_INDEX Files revision

load --table Files
[
{"_key": "include/groonga.h", "revision": 100},
{"_key": "src/groonga.c",     "revision": 29},
{"_key": "lib/groonga.rb",    "revision": 12},
{"_key": "README.textile",    "revision": 24},
{"_key": "ha_mroonga.cc",     "revision": 40},
{"_key": "ha_mroonga.hpp",    "revision": 6}
]

load --table Packages
[
{"_key": "groonga", "files": ["include/groonga.h", "src/groonga.c"]},
{"_key": "rroonga", "files": ["lib/groonga.rb", "README.textile"]},
{"_key": "mroonga", "files": ["ha_mroonga.cc", "ha_mroonga.hpp"]}
]

select Packages \
  --filter '_key == "rroonga" && \
            sub_filter(files, "revision >= 10 && revision < 40")' \
  --output_columns '_key, files, files.revision'
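
To show what this query selects, the following Python sketch models the same data and evaluates the _key condition first, then applies the revision range only to the records that remain. The function and variable names are hypothetical. The sketch describes the query's meaning and the "narrow first" idea, not Groonga's optimizer.

```python
# Revision per file, copied from the load command above.
FILES = {
    "include/groonga.h": 100,
    "src/groonga.c": 29,
    "lib/groonga.rb": 12,
    "README.textile": 24,
    "ha_mroonga.cc": 40,
    "ha_mroonga.hpp": 6,
}

PACKAGES = {
    "groonga": ["include/groonga.h", "src/groonga.c"],
    "rroonga": ["lib/groonga.rb", "README.textile"],
    "mroonga": ["ha_mroonga.cc", "ha_mroonga.hpp"],
}


def select_packages(key, low, high):
    """Return package keys matching key with a file revision in [low, high)."""
    # Step 1: narrow down by _key before touching any file records.
    candidates = {name: files for name, files in PACKAGES.items() if name == key}
    # Step 2: apply the revision range only to the narrowed records.
    return [
        name
        for name, files in candidates.items()
        if any(low <= FILES[path] < high for path in files)
    ]


matched = select_packages("rroonga", 10, 40)
```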

[delete] Added a new limit option

You can limit the number of records to delete with this option, as in the following example.

table_create Users TABLE_PAT_KEY ShortText
[[0,0.0,0.0],true]
load --table Users
[
{"_key": "alice"},
{"_key": "bob"},
{"_key": "bill"},
{"_key": "brian"}
]
[[0,0.0,0.0],4]
delete --table Users --filter '_key @^ "b"' --limit 2
[[0,0.0,0.0],true]
#>delete --filter "_key @^ \"b\"" --limit "2" --table "Users"
#:000000000000000 filter(3)
#:000000000000000 delete(2): [0][2]
#<000000000000000 rc=0
select Users
[
  [
    0,
    0.0,
    0.0
  ],
  [
    [
      [
        2
      ],
      [
        [
          "_id",
          "UInt32"
        ],
        [
          "_key",
          "ShortText"
        ]
      ],
      [
        1,
        "alice"
      ],
      [
        3,
        "bill"
      ]
    ]
  ]
]

Conclusion

See Release 8.0.3 2018-05-29 for detailed changes since 8.0.2.

Let's search by Groonga!