BloGroonga

2021-03-31

Groonga 11.0.1 has been released

Groonga 11.0.1 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Debian GNU/Linux Added support for an ARM64 package.

  • select Added support for customizing the weight adjustment for each keyword.

    • Until now, we needed to specify < or > for all keywords to adjust scores, because the default weight adjustment (6 or 4) is larger than the default score (1).

      • For example, in A <B, "A"'s weight is 1 and "B"'s weight is 4. The decremented weight of "B" (4) is still larger than the unadjusted weight of "A" (1), so this doesn't work as expected. We needed to specify >A <B to give "B" a smaller weight than "A": in >A <B, "A"'s weight is 6 and "B"'s weight is 4.
    • Since this release, we can customize the weight adjustment per keyword just by specifying <${WEIGHT} or >${WEIGHT} for the target keywords. For example, in A <0.1B, "A"'s weight is 1 and "B"'s weight is 0.9 ("B"'s weight is decremented by 0.1). See the sketch after this list.

    • However, note that these forms ( >${WEIGHT}..., <${WEIGHT}..., and ~${WEIGHT}... ) are an incompatible change.
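
    A minimal sketch of the new per-keyword weight form, assuming a hypothetical Memos table with an indexed content column (neither is part of the examples in this post); the query string A <0.1B is the one discussed above:

        select Memos \
          --match_columns content \
          --query "A <0.1B" \
          --output_columns "_key, _score"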

  • select Added support for outputting Float and Float32 values in Apache Arrow format.

  • select Added support for getting reference destination data via an index column when outputting a result.

    • Until now, Groonga had returned an unintended value when we specified an output value like index_column.xxx. For example, the value of --columns[tags].value purchases.tag was ["apple",["many"]],["banana",["man"]],["cacao",["man"]] in the following example. In this case, the expected values were ["apple",["man","many"]],["banana",["man"]],["cacao",["woman"]]. In this release, we can get the correct reference destination data via an index column as below.

        table_create Products TABLE_PAT_KEY ShortText
      
        table_create Purchases TABLE_NO_KEY
        column_create Purchases product COLUMN_SCALAR Products
        column_create Purchases tag COLUMN_SCALAR ShortText
      
        column_create Products purchases COLUMN_INDEX Purchases product
      
        load --table Products
        [
        {"_key": "apple"},
        {"_key": "banana"},
        {"_key": "cacao"}
        ]
      
        load --table Purchases
        [
        {"product": "apple",  "tag": "man"},
        {"product": "banana", "tag": "man"},
        {"product": "cacao",  "tag": "woman"},
        {"product": "apple",  "tag": "many"}
        ]
      
        select Products \
          --columns[tags].stage output \
          --columns[tags].flags COLUMN_VECTOR \
          --columns[tags].type ShortText \
          --columns[tags].value purchases.tag \
          --output_columns _key,tags
        [
          [
            0,
            0.0,
            0.0
          ],
          [
            [
              [
                3
              ],
              [
                [
                  "_key",
                  "ShortText"
                ],
                [
                  "tags",
                  "ShortText"
                ]
              ],
              [
                "apple",
                [
                  "man",
                  "many"
                ]
              ],
              [
                "banana",
                [
                  "man"
                ]
              ],
              [
                "cacao",
                [
                  "woman"
                ]
              ]
            ]
          ]
        ]
      
  • select Added support for specifying an index column directly as part of a nested index.

    • We can search the source table after filtering by using index_column.except_source_column. For example, we specify comments.content when searching in the following example. In this case, this query first executes a full text search against the content column of the Comments table, then fetches the records of the Articles table that refer to the already found records of the Comments table.

         table_create Articles TABLE_HASH_KEY ShortText
      
         table_create Comments TABLE_NO_KEY
         column_create Comments article COLUMN_SCALAR Articles
         column_create Comments content COLUMN_SCALAR ShortText
      
         column_create Articles content COLUMN_SCALAR Text
         column_create Articles comments COLUMN_INDEX Comments article
      
         table_create Terms TABLE_PAT_KEY ShortText \
           --default_tokenizer TokenBigram \
           --normalizer NormalizerNFKC130
         column_create Terms articles_content COLUMN_INDEX|WITH_POSITION \
           Articles content
         column_create Terms comments_content COLUMN_INDEX|WITH_POSITION \
           Comments content
      
         load --table Articles
         [
         {"_key": "article-1", "content": "Groonga is fast!"},
         {"_key": "article-2", "content": "Groonga is useful!"},
         {"_key": "article-3", "content": "Mroonga is fast!"}
         ]
      
         load --table Comments
         [
         {"article": "article-1", "content": "I'm using Groonga too!"},
         {"article": "article-3", "content": "I'm using Mroonga!"},
         {"article": "article-1", "content": "I'm using PGroonga!"}
         ]
      
         select Articles --match_columns comments.content --query groonga \
           --output_columns "_key, _score, comments.content
         [
           [
             0,
             0.0,
             0.0
           ],
           [
             [
               [
                 1
               ],
               [
                 [
                   "_key",
                   "ShortText"
                 ],
                 [
                   "_score",
                   "Int32"
                 ],
                 [
                   "comments.content",
                   "ShortText"
                 ]
               ],
               [
                 "article-1",
                 1,
                 [
                   "I'm using Groonga too!",
                   "I'm using PGroonga!"
                 ]
               ]
             ]
           ]
         ]
      
  • load Added support for loading a reference vector with an inline object literal.

    • For example, we can load data like "key" : [ { "key" : "value", ..., "key" : "value" } ] as below.

        table_create Purchases TABLE_NO_KEY
        column_create Purchases item COLUMN_SCALAR ShortText
        column_create Purchases price COLUMN_SCALAR UInt32
      
        table_create Settlements TABLE_HASH_KEY ShortText
        column_create Settlements purchases COLUMN_VECTOR Purchases
        column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases
      
        load --table Settlements
        [
        {
          "_key": "super market",
          "purchases": [
             {"item": "apple", "price": 100},
             {"item": "milk",  "price": 200}
          ]
        },
        {
          "_key": "shoes shop",
          "purchases": [
             {"item": "sneakers", "price": 3000}
          ]
        }
        ]
      
    • This feature makes it easier to load JSON data into reference columns.
    • Currently, this feature is only supported with JSON input.
  • load Added support for loading a reference vector from JSON text.

    • We can load data into a reference vector from JSON text as below; the referenced records are added to the source table.

        table_create Purchases TABLE_HASH_KEY ShortText
        column_create Purchases item COLUMN_SCALAR ShortText
        column_create Purchases price COLUMN_SCALAR UInt32
      
        table_create Settlements TABLE_HASH_KEY ShortText
        column_create Settlements purchases COLUMN_VECTOR Purchases
      
        column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases
      
        load --table Settlements
        [
        {
          "_key": "super market",
          "purchases": "[{\"_key\": \"super market-1\", \"item\": \"apple\", \"price\": 100}, {\"_key\": \"super market-2\", \"item\": \"milk\",  \"price\": 200}]"
        },
        {
          "_key": "shoes shop",
          "purchases": "[{\"_key\": \"shoes shop-1\", \"item\": \"sneakers\", \"price\": 3000}]"
        }
        ]
      
        dump \
          --dump_plugins no \
          --dump_schema no
        load --table Purchases
        [
        ["_key","item","price"],
        ["super market-1","apple",100],
        ["super market-2","milk",200],
        ["shoes shop-1","sneakers",3000]
        ]
      
        load --table Settlements
        [
        ["_key","purchases"],
        ["super market",["super market-1","super market-2"]],
        ["shoes shop",["shoes shop-1"]]
        ]
      
        column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases
      
    • Currently, this feature doesn't support nested reference records.

  • [Windows] Added support for UNIX epoch for time_classify_* functions.

  • query_parallel_or Added a new function for processing queries in parallel.
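
    A minimal sketch of how this function might be used from --filter, assuming a hypothetical Entries table with title and content columns; the signature shown here (match columns followed by one or more query strings, ORed together) is an assumption, not confirmed by this post:

        select Entries \
          --filter 'query_parallel_or("title || content", "Groonga", "Mroonga")' \
          --output_columns "_key, _score"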

  • select Added support for ignoring nonexistent sort keys.

    • Until now, Groonga output an error when we specified nonexistent sort keys. Since this release, Groonga ignores nonexistent sort keys and doesn't output an error.
    • This is implemented for consistency, because we already just ignore invalid values in output_columns and most invalid values in sort_keys. See the sketch after this list.
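
    A minimal sketch, assuming a hypothetical Memos table; nonexistent is not a column of Memos, and since this release the sort key is simply ignored instead of causing an error:

        select Memos \
          --sort_keys nonexistent \
          --output_columns _key
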
  • select Added support for ignoring nonexistent tables in drilldowns[].table.

    • Until now, Groonga output an error when we specified nonexistent tables in drilldowns[].table. Since this release, Groonga ignores nonexistent tables in drilldowns[].table and doesn't output an error.
    • This is implemented for consistency, because we already just ignore invalid values in output_columns and most invalid values in sort_keys.
  • [httpd] Updated bundled nginx to 1.19.8.

Fixes

  • reference_acquire Fixed a bug that Groonga crashed when a table's reference was acquired and a column was added to the table before auto release happened.

    • This is because the added column's reference wasn't acquired, but it was released on auto release.
  • [Windows] Fixed a bug that one or more processes failed to output a backtrace on SEGV when a new backtrace logging process started while another backtrace logging process was running in another thread.

Known Issues

  • Currently, Groonga has a bug that data may be corrupted when we execute many additions, deletions, and updates against a vector column.

Conclusion

Please refer to the following news for more details.

News Release 11.0.1

Let's search by Groonga!

2021-02-09

Groonga 11.0.0 has been released

Groonga 11.0.0 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • select Added support for outputting values of a scalar column and a vector column via a nested index.

    • A nested index is an index that has a structure like the one below.

      table_create Products TABLE_PAT_KEY ShortText
      
      table_create Purchases TABLE_NO_KEY
      column_create Purchases product COLUMN_SCALAR Products
      column_create Purchases tag COLUMN_SCALAR ShortText
      
      column_create Products purchases COLUMN_INDEX Purchases product
      
    • The Products.purchases column is an index of the Purchases.product column in the above example. Also, Purchases.product is a reference to the Products table. See the sketch after this list.
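
      A minimal sketch of the new output support; once records are loaded into Products and Purchases above, purchases.tag in output_columns should reach the tag column of Purchases through the Products.purchases index column:

      select Products \
        --output_columns "_key, purchases.tag"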

  • [Windows] Dropped support for the Windows packages that we had cross compiled by using MinGW on Linux.

    • From now on, we use the following packages for Windows.

      • groonga-latest-x86-vs2019-with-vcruntime.zip
      • groonga-latest-x64-vs2019-with-vcruntime.zip
    • If Microsoft Visual C++ Runtime Library is already installed on the system, we suggest using the following packages.

      • groonga-latest-x86-vs2019.zip
      • groonga-latest-x64-vs2019.zip
  • Fixed a bug that an index could be corrupted when Groonga executed many additions, deletions, and updates of information in it.

    • This bug occurs even when we only execute many deletions of information from an index. However, it doesn't occur when we only execute many additions of information into an index.

    • We can repair an index that was corrupted by this bug by rebuilding it.

    • This bug isn't detected unless we reference the broken index. Therefore, some of our indexes may already be broken.

    • We can use the index_column_diff command to confirm whether an index has already been broken or not, as sketched below.
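
      A minimal sketch of such a check, assuming a hypothetical lexicon table Terms with an index column memos_content; index_column_diff takes the lexicon table and the index column name, and an empty result body means no difference (no corruption) was detected:

      index_column_diff Terms memos_content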

Conclusion

Please refer to the following news for more details.

News Release 11.0.0

Let's search by Groonga!

2021-01-25

Groonga 10.1.1 has been released

Groonga 10.1.1 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • select Added support for outputting UInt64 value in Apache Arrow format.

  • select Added support for outputting the number of hits in Apache Arrow format.

  • select Improved performance for a prefix search.

  • query Added support for optimization of "order by estimated size".

  • between Improved performance.

  • TokenMecab Improved performance for parallel construction of a token column.

  • sub_filter Fixed a bug that sub_filter didn't work in slices[].filter.

  • Fixed a bug that we might fail to add data and Groonga might crash when we repeated many additions and deletions of data against a hash table.

Conclusion

Please refer to the following news for more details.

News Release 10.1.1

Let's search by Groonga!

2020-12-29

Groonga 10.1.0 has been released

Groonga 10.1.0 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • highlight_html Added support for removing leading full-width spaces from the highlight target.

  • status Added a new item features.

  • status Added a new item apache_arrow.

  • Window function Added support for processing all tables at once even if the target tables straddle shards. (experimental)

  • Added support for sequential search against a reference column.

  • [tokenizers] Added support for the token column into TokenDocumentVectorTFIDF and TokenDocumentVectorBM25.

  • Improved performance for the case below (see the sketch after it).

    • (column @ "value") && (column @ "value")
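
    A minimal sketch of the kind of filter this improvement targets, assuming a hypothetical Memos table with a content column:

      select Memos \
        --filter '(content @ "groonga") && (content @ "mroonga")'
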
  • Ubuntu Added support for Ubuntu 20.10 (Groovy Gorilla).

  • Debian Dropped stretch support.

  • CentOS Dropped support for CentOS 6.

  • [httpd] Updated bundled nginx to 1.19.6.

  • Fixed a bug that Groonga crashed when we used a drilldown with multiple keys and multiple accessors.

  • Fixed a bug that the near phrase search did not match when the same phrase occurred multiple times.

Conclusion

Please refer to the following news for more details.

News Release 10.1.0

Let's search by Groonga!

2020-12-01

Groonga 10.0.9 has been released

Groonga 10.0.9 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • select Improved performance when we specify -1 for limit.

  • reference_acquire Added a new option --auto_release_count.
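
    A minimal sketch of the new option, assuming a hypothetical Users table; with --auto_release_count 2, the acquired reference should be released automatically after the next two commands are processed:

      reference_acquire --target_name Users --auto_release_count 2
      # the reference to Users should be released automatically after these two commands
      select Users --limit 0
      select Users --limit 0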

  • Modified the behavior when Groonga evaluates an empty vector and uvector.

    • An empty vector and uvector are evaluated as false in command version 3. See the sketch after this item.
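
    A minimal sketch, assuming a hypothetical Memos table with a tags vector column; under command version 3, records whose tags vector is empty should be treated as false by this filter and therefore excluded:

      select Memos \
        --command_version 3 \
        --filter tags
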
  • Normalizers Added a new Normalizer NormalizerNFKC130 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 13.0.

  • Token filters Added a new TokenFilter TokenFilterNFKC130 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 13.0.

  • select Improved performance for "_score = column - X".
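
    A minimal sketch of the kind of expression this improvement targets, assuming a hypothetical Memos table with a numeric rating column; the expression is passed via select's scorer parameter:

      select Memos \
        --match_columns content \
        --query groonga \
        --scorer '_score = rating - 1'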

  • reference_acquire Improved so that reference_acquire doesn't acquire unnecessary references of index columns when we specify the --recursive dependent option.

  • select Added support for an ordered near phrase search.

    • Until now, the near phrase search has only looked for records in which the specified phrases appear near each other.
    • This feature looks for records that satisfy both of the following conditions (see the sketch after this list):

      • The specified phrases appear near each other.
      • The specified phrases appear in the specified order.
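
    A minimal sketch, assuming a hypothetical Entries table with a full text index on its content column; the *ONP"..." form of the query syntax is assumed here to be the way to request the ordered near phrase search:

      select Entries \
        --match_columns content \
        --query '*ONP"groonga fast"'
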
  • [httpd] Updated bundled nginx to 1.19.5.

  • Groonga HTTP server Fixed a bug that the Groonga HTTP server finished without waiting for all worker threads to finish completely.

Conclusion

Please refer to the following news for more details.

News Release 10.0.9

Let's search by Groonga!