BloGroonga

2023-12-26

Groonga 13.1.0 has been released

Groonga 13.1.0 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • [select] Groonga also cached trace log.

  • Added support for outputting dict<string> in a responce of Apache Arrow format.

  • [Groonga HTTP server] Added support for new content type application/vnd.apache.arrow.stream.

  • [query] Added support empty input as below.

    table_create Users TABLE_NO_KEY
    column_create Users name COLUMN_SCALAR ShortText
    
    table_create Lexicon TABLE_HASH_KEY ShortText   --default_tokenizer TokenBigramSplitSymbolAlphaDigit   --normalizer NormalizerAuto
    column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name
    load --table Users
    [
    {"name": "Alice"},
    {"name": "Alisa"},
    {"name": "Bob"}
    ]
    
    select Users   --output_columns name,_score   --filter 'query("name", "  	")'
    [
      [
        0,
        0.0,
        0.0
      ],
      [
        [
          [
            0
          ],
          [
            [
              "name",
              "ShortText"
            ],
            [
              "_score",
              "Int32"
            ]
          ]
        ]
      ]
    ]
    
  • Added support for BFloat16(experimental)

    We can just load and select BFloat16. We can’t use arithmetic operations such as bfloat16_value - 1.2.

  • [column_create] Added new flag WEIGHT_BFLOAT16.

Fixes

  • [select] Fixed a bug that when Groonga cached output_pretty=yes result, Groonga returned a query with output_pretty even if we sent a query without output_pretty.

  • Fixed a wrong data created bug.

    In general, users can’t do this explicitly because the command API doesn’t accept GRN_OBJ_{APPEND,PREPEND}. This may be used internally when a dynamic numeric vector column is created and a temporary result set is created (OR is used).

    For example, the following query may create wrong data:

    select TABLE \
      --match_columns TEXT_COLUMN \
      --query 'A B OR C' \
      --columns[NUMERIC_DYNAMIC_COLUMN].stage result_set \
      --columns[NUMERIC_DYNAMIC_COLUMN].type Float32 \
      --columns[NUMERIC_DYNAMIC_COLUMN].flags COLUMN_VECTOR
    

    If this is happen, NUMERIC_DYNAMIC_COLUMN contains many garbage elements. It also causes too much memory consumption.

    Note that this is caused by an uninitialized variable on stack. So this may or may not be happen.

  • Fixed a bug that may fail to set valid normalizers/token_filters.

  • [fuzzy_search] Fixed a crash bug when the following three conditions established.

    1. Query has 2 or more multi-byte characters.

    2. ${ASCII}${ASCII}${MULTIBYTE}* characters in a patricia trie table.

    3. WITH_TRANSPOSITION is enabled.

    For example, “aaあ” in a patricia trie table with query “あiう” pair has this problem as below.

    table_create Users TABLE_NO_KEY
    column_create Users name COLUMN_SCALAR ShortText
    
    table_create Names TABLE_PAT_KEY ShortText
    column_create Names user COLUMN_INDEX Users name
    load --table Users
    [
    {"name": "aaあ"},
    {"name": "あうi"},
    {"name": "あう"},
    {"name": "あi"},
    {"name": "iう"}
    ]
    select Users
      --filter 'fuzzy_search(name, "あiう", {"with_transposition": true, "max_distance": 3})'
      --output_columns 'name, _score'
      --match_escalation_threshold -1
    

Conclusion

Please refert to the following news for more details. News Release 13.1.0

Let's search by Groonga!