BloGroonga

2024-03-14

Groonga 14.0.1 has been released

Groonga 14.0.1 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • [load] Stopped reporting an error when we load a key that becomes an empty key by normalization.

    "-" becomes "" with NormalizerNFKC150("remove_symbol", true). So the following case reported an "empty key" error.

    table_create Values TABLE_HASH_KEY ShortText \
      --normalizers 'NormalizerNFKC150("remove_symbol", true)'
    table_create Data TABLE_NO_KEY
    column_create Data value COLUMN_SCALAR Values
    load --table Data
    [
    {"value": "-"}
    ]
    

    However, if we load a lot of such data, many error logs are generated, because Groonga outputs an "empty key" error whenever it can't register an empty string to the index.

    It is not a problem that an empty string can't be registered to the index in this case, because nothing matches even if we search by an empty string. So, we stopped reporting an "empty key" error in this case.
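
The effect of this change can be sketched in Python as follows. This is only an illustration, not Groonga's implementation: normalize_key() and load_keys() are hypothetical names, and dropping every character in a Unicode punctuation or symbol category is an assumption that only approximates NormalizerNFKC150("remove_symbol", true).

```python
import unicodedata


def normalize_key(key: str) -> str:
    """Approximate NFKC normalization with symbol removal (assumption)."""
    normalized = unicodedata.normalize("NFKC", key)
    # Drop characters whose Unicode general category is punctuation (P*)
    # or symbol (S*). This only approximates the "remove_symbol" option.
    return "".join(
        ch for ch in normalized
        if unicodedata.category(ch)[0] not in ("P", "S")
    )


def load_keys(keys):
    """Register non-empty normalized keys and silently skip empty ones."""
    registered = []
    skipped = 0
    for key in keys:
        normalized = normalize_key(key)
        if normalized == "":
            # No error is reported: an empty key can never match a search.
            skipped += 1
            continue
        registered.append(normalized)
    return registered, skipped


registered, skipped = load_keys(["-", "Groonga", "--", "a-b"])
print(registered, skipped)
```

In this sketch, "-" and "--" are skipped without an error, while "a-b" is still registered because it normalizes to "ab".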

Fixes

  • Fixed a crash bug that occurred if a request was canceled during a between or range search.

    This bug doesn't necessarily occur. It occurs only when we cancel a request at a specific timing. It occurs more easily when the search time is long, such as in a sequential search.

  • Fixed a bug that highlight_html may return an invalid result when the following conditions are met.

    • We use multiple normalizers such as NormalizerTable and NormalizerNFKC150.
    • We highlight a string that includes whitespace.

    For example, this bug occurs in the following case.

    table_create NormalizationsIndex TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
    
    table_create Normalizations TABLE_HASH_KEY UInt64
    column_create Normalizations normalized COLUMN_SCALAR LongText
    column_create Normalizations target COLUMN_SCALAR NormalizationsIndex
    
    column_create NormalizationsIndex index COLUMN_INDEX Normalizations target
    
    
    table_create Lexicon TABLE_PAT_KEY ShortText \
      --normalizers 'NormalizerTable("normalized", \
                                     "Normalizations.normalized", \
                                     "target", \
                                     "target"), NormalizerNFKC150'
    
    table_create Names TABLE_HASH_KEY UInt64
    column_create Names name COLUMN_SCALAR Lexicon
    
    load --table Names
    [
    ["_key","name"],
    [1,"Sato Toshio"]
    ]
    
    select Names \
      --query '_key:1 OR name._key:@"Toshio"' \
      --output_columns 'highlight_html(name._key, Lexicon)'
    
    [
      [
        0,
        1710401574.332274,
        0.001911401748657227
      ],
      [
        [
          [
            1
          ],
          [
            [
              "highlight_html",
              null
            ]
          ],
          [
            "sato <span class=\"keyword\">toshi</span>o"
          ]
        ]
      ]
    ]
    
  • [Ubuntu] We can provide packages for Ubuntu again.

    We didn't provide packages for Ubuntu in Groonga 14.0.0, because we failed to build the Groonga package for Ubuntu due to a problem in the build environment for Ubuntu packages.

    We fixed the problem in the build environment for Ubuntu packages in 14.0.1. So, we can provide packages for Ubuntu again from this release.

  • Fixed a build error that occurred when building from source with clang.

Conclusion

Please refer to the following news for more details. News Release 14.0.1

Let's search by Groonga!

2024-02-29

Groonga 14.0.0 has been released

Groonga 14.0.0 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Added a new tokenizer TokenH3Index (experimental).

    TokenH3Index tokenizes WGS84GeoPoint to UInt64 (H3 index).

  • Added support for offline and online index construction with a non-text-based tokenizer (experimental).

    TokenH3Index is one of the non-text-based tokenizers.

  • [select] Added support for searching by index with a non-text-based tokenizer (experimental).

    TokenH3Index is one of the non-text-based tokenizers.

  • Added new functions distance_cosine(), distance_inner_product(), distance_l2_norm_squared(), distance_l1_norm().

    By using these functions together with limit N, we can get only the records with a small vector distance.

    These functions calculate distance in the output stage.

    However, we haven't optimized these functions yet.

    • distance_cosine(): Calculate cosine similarity.
    • distance_inner_product(): Calculate inner product.
    • distance_l2_norm_squared(): Calculate squared Euclidean distance.
    • distance_l1_norm(): Calculate Manhattan distance.
  • Added a new function number_round().

  • [load] Added support for parallel load.

    This feature is only enabled when the input_type of load is apache-arrow.

    This feature uses one thread per column. If there are many target columns, it reduces load time.

  • [select] We now use uvector as much as possible for array literals in --filter.

    uvector is a vector of fixed-size elements.

    If all elements have the same type, we use uvector instead of vector.

  • [status] Added n_workers to the output of status.

  • Optimized dynamic column creation.

  • [WAL] Added support for rebuilding broken indexes in parallel.

  • [select] Added support for Int64 in output_type=apache-arrow for columns that reference another table.
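
The quantities behind distance_cosine(), distance_inner_product(), distance_l2_norm_squared() and distance_l1_norm() can be sketched in plain Python as follows. These are the textbook definitions written with hypothetical helper names. How Groonga's functions scale or sign their return values (for example, whether distance_cosine() returns the similarity itself or a distance derived from it) is not stated in this post, so treat this as a sketch under that assumption. nearest() imitates sorting by distance and applying limit N.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product divided by the product of the vector lengths."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def l2_norm_squared(a, b):
    """Squared Euclidean distance: no square root is taken."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l1_norm(a, b):
    """Manhattan distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(records, query, distance, n):
    """Return the n record keys with the smallest distance to query."""
    return sorted(records, key=lambda key: distance(records[key], query))[:n]


# Hypothetical sample data: record key -> vector.
records = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
query = [1.0, 0.0]
print(nearest(records, query, l2_norm_squared, 2))
print(nearest(records, query, l1_norm, 2))
```

Skipping the square root in l2_norm_squared() does not change the ordering of the records, which is why a squared distance is enough when we only want the closest N.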

Fixes

  • [Windows] Fixed the path for documents of groonga-normalizer-mysql in the package for Windows.

    Documents of groonga-normalizer-mysql are put under share/ in this release.

  • [select] Fixed a bug that Groonga may crash when we use bitwise operations.

Conclusion

Please refer to the following news for more details. News Release 14.0.0

Let's search by Groonga!

2024-01-10

PGroonga (fast full text search module for PostgreSQL) 3.1.6 has been released

PGroonga 3.1.6 has been released! PGroonga makes PostgreSQL a fast full text search platform for all languages.

Improvements

  • Added a new option pgroonga.enable_row_level_security.

    This option enables or disables PGroonga's RLS (row level security) support. It is enabled by default. Disabling PGroonga's RLS support may improve performance. However, PGroonga's RLS support should not be disabled where PostgreSQL's RLS feature is in use, because disabling it in that environment would increase security risk.

    Thus, make sure in advance that PostgreSQL's RLS feature is not in use when you plan to disable PGroonga's RLS support with this option.

    To change the setting for a specific session only, use the following SQL.

    • Disable RLS support

      SET pgroonga.enable_row_level_security = off
      
    • Enable RLS support

      SET pgroonga.enable_row_level_security = on
      

    To make the setting persistent, use the following configuration entries (for example, in postgresql.conf).

    • Disable RLS support

      pgroonga.enable_row_level_security = off
      
    • Enable RLS support

      pgroonga.enable_row_level_security = on
      
  • Added a new type pgroonga_condition. Also added the related new function pgroonga_condition().

    The pgroonga_full_text_search_condition type and the pgroonga_full_text_search_condition_with_scorers type are now deprecated. Use the pgroonga_condition type instead.

    Queries that use the pgroonga_full_text_search_condition type or the pgroonga_full_text_search_condition_with_scorers type change as follows.

    (Before changes):

      column &@~ ('query', weights, 'scorers', index_name)::pgroonga_full_text_search_condition_with_scorers
      column &@~ ('query', weights, index_name)::pgroonga_full_text_search_condition
    

    (After changes):

      column &@~ pgroonga_condition('query', weights, 'scorers', index_name => 'index_name')
      column &@~ pgroonga_condition('query', weights, index_name => 'index_name')
    

    Note that index_name must be specified in the argument name => 'value' style, such as index_name => 'index_name'. Here is why.

    The signature of pgroonga_condition() is as follows. Arguments that you don't need can be omitted. However, once arguments are omitted, the position of a value no longer tells which argument it is for. Therefore, arguments whose position differs from the following signature must be specified in the argument name => 'value' style.

      pgroonga_condition(query text,
                         weights int[],
                         scorers text[],
                         schema_name text,
                         index_name text,
                         column_name text)
    
  • [For Developers] Added a new script to set up the build environment. [GitHub#358][Patched by askdkc.]

    Here is how to use it. It works on Debian/Ubuntu and doesn't work on distributions derived from Red Hat Enterprise Linux such as AlmaLinux.

      $ git clone https://github.com/pgroonga/pgroonga.git
      $ cd pgroonga
      $ ./setup.sh # Create an environment to build PGroonga.
      $ ./build.sh SOURCE_DIRECTORY BUILD_DIRECTORY # Build PGroonga.
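
Why pgroonga_condition() needs index_name => 'index_name' can be illustrated with a Python analogy, because Python keyword arguments behave much like PostgreSQL's named notation in this respect. The function below is a hypothetical stand-in that only reports how its arguments were bound. It is not PGroonga code, and the None defaults are an assumption made for the sketch.

```python
def pgroonga_condition(query=None, weights=None, scorers=None,
                       schema_name=None, index_name=None, column_name=None):
    """Hypothetical stand-in: parameters follow the signature order
    shown in this post and the bound values are simply echoed back."""
    return {
        "query": query,
        "weights": weights,
        "scorers": scorers,
        "schema_name": schema_name,
        "index_name": index_name,
        "column_name": column_name,
    }


# Third positional value: it is bound to scorers, not to index_name.
positional = pgroonga_condition("query", [1], "my_index")

# Naming the argument binds it to index_name even though scorers and
# schema_name were left out.
named = pgroonga_condition("query", [1], index_name="my_index")

print(positional["scorers"], positional["index_name"])
print(named["scorers"], named["index_name"])
```

In the positional call, "my_index" lands in scorers because that is the third parameter. Only the named call reaches index_name, which is the same reason the SQL form uses index_name => 'index_name'.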
    

Fixes

  • Fixed a problem that pgroonga_snippet_html() could not be used after upgrading PGroonga from 2.4.1 to 2.4.2. [Reported by takadat]

  • Fixed a problem that PGroonga crashed when the first argument of pgroonga_query_expand() specified something other than a regular PostgreSQL table.

    For example, PGroonga would crash if the first argument of pgroonga_query_expand() specified a foreign table as follows.

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    
    CREATE SERVER remote_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5432', dbname 'remote_database');
    
    CREATE FOREIGN TABLE synonym_groups (
      synonyms text[]
    ) SERVER remote_server;
    
    SELECT pgroonga_query_expand('synonym_groups',
                                 'synonyms',
                                 'synonyms',
                                 'groonga');
    
    server closed the connection unexpectedly
    	This probably means the server terminated abnormally
    	before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    
  • Fixed a problem that PostgreSQL raised a PANIC because the stack for recording errors was used up when too many errors occurred within PGroonga.

    This problem occurs with PGroonga 2.3.3 or later.

Thanks

  • askdkc
  • takadat

How to upgrade

If you're using PGroonga 2.0.0 or later, you can upgrade by steps in "Compatible case" in Upgrade document.

If you're using PGroonga 1.Y.Z, you can upgrade by steps in "Incompatible case" in Upgrade document.

Support service

If you need commercial support for PGroonga, contact us.

Conclusion

Try PGroonga when you want to perform fast full text search against all languages on PostgreSQL!

2024-01-09

Groonga 13.1.1 has been released

Groonga 13.1.1 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Dropped support for the Windows package built by MinGW32. [GitHub#1654]

  • Added support for index search of vector_column[N] OPERATOR literal with --match_columns and --query.

    We have long been able to search with vector_column[N] OPERATOR literal in --filter. With vector_column[N] OPERATOR literal, we can search only a specific element of a vector column.

Fixes

  • [Windows] Bundled groonga-normalizer-mysql again. [GitHub#1655]

    Groonga 13.1.0 for Windows didn't include groonga-normalizer-mysql. This problem only occurred in Groonga 13.1.0.

Conclusion

Please refer to the following news for more details. News Release 13.1.1

Let's search by Groonga!

2023-12-26

Groonga 13.1.0 has been released

Groonga 13.1.0 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • [select] Groonga now also caches the trace log.

  • Added support for outputting dict<string> in a response in Apache Arrow format.

  • [Groonga HTTP server] Added support for the new content type application/vnd.apache.arrow.stream.

  • [query] Added support for empty input as below.

    table_create Users TABLE_NO_KEY
    column_create Users name COLUMN_SCALAR ShortText
    
    table_create Lexicon TABLE_HASH_KEY ShortText   --default_tokenizer TokenBigramSplitSymbolAlphaDigit   --normalizer NormalizerAuto
    column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name
    load --table Users
    [
    {"name": "Alice"},
    {"name": "Alisa"},
    {"name": "Bob"}
    ]
    
    select Users   --output_columns name,_score   --filter 'query("name", "  	")'
    [
      [
        0,
        0.0,
        0.0
      ],
      [
        [
          [
            0
          ],
          [
            [
              "name",
              "ShortText"
            ],
            [
              "_score",
              "Int32"
            ]
          ]
        ]
      ]
    ]
    
  • Added support for BFloat16 (experimental).

    We can only load and select BFloat16 values. We can’t use arithmetic operations such as bfloat16_value - 1.2.

  • [column_create] Added new flag WEIGHT_BFLOAT16.
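
To give a feel for the precision of BFloat16, the following Python sketch converts a value to bfloat16 and back. bfloat16 uses the same sign and exponent bits as a 32-bit float and keeps only the upper 7 bits of the fraction, so it corresponds to the upper 16 bits of the float32 encoding. The helper names are hypothetical, and truncating the lower bits is an assumption made for simplicity. How Groonga rounds when it stores a BFloat16 value is not covered here.

```python
import struct


def to_bfloat16_bits(value: float) -> int:
    """Return the upper 16 bits of the IEEE 754 binary32 encoding."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    # Truncation (assumption): real converters often round to nearest.
    return bits32 >> 16


def from_bfloat16_bits(bits16: int) -> float:
    """Expand 16 bfloat16 bits back to a float by zero-filling the low bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def roundtrip(value: float) -> float:
    """Convert to bfloat16 and back to show what precision survives."""
    return from_bfloat16_bits(to_bfloat16_bits(value))


for sample in (1.0, 1.2, 3.14159):
    print(sample, hex(to_bfloat16_bits(sample)), roundtrip(sample))
```

Values such as 1.0 survive unchanged, while 1.2 comes back as 1.1953125 in this truncating sketch, so only about two to three significant decimal digits are kept.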

Fixes

  • [select] Fixed a bug that once Groonga cached an output_pretty=yes result, Groonga returned the pretty-printed result even if we sent a query without output_pretty.

  • Fixed a bug that may create wrong data.

    In general, users can’t do this explicitly because the command API doesn’t accept GRN_OBJ_{APPEND,PREPEND}. This may be used internally when a dynamic numeric vector column is created and a temporary result set is created (OR is used).

    For example, the following query may create wrong data:

    select TABLE \
      --match_columns TEXT_COLUMN \
      --query 'A B OR C' \
      --columns[NUMERIC_DYNAMIC_COLUMN].stage result_set \
      --columns[NUMERIC_DYNAMIC_COLUMN].type Float32 \
      --columns[NUMERIC_DYNAMIC_COLUMN].flags COLUMN_VECTOR
    

    If this happens, NUMERIC_DYNAMIC_COLUMN contains many garbage elements. It also causes excessive memory consumption.

    Note that this is caused by an uninitialized variable on the stack. So this may or may not happen.

  • Fixed a bug that may fail to set valid normalizers/token_filters.

  • [fuzzy_search] Fixed a crash bug that occurred when the following three conditions were all met.

    1. The query has 2 or more multi-byte characters.

    2. A patricia trie table has a key of the form ${ASCII}${ASCII}${MULTIBYTE}*.

    3. WITH_TRANSPOSITION is enabled.

    For example, the pair of the key “aaあ” in a patricia trie table and the query “あiう” has this problem, as below.

    table_create Users TABLE_NO_KEY
    column_create Users name COLUMN_SCALAR ShortText
    
    table_create Names TABLE_PAT_KEY ShortText
    column_create Names user COLUMN_INDEX Users name
    load --table Users
    [
    {"name": "aaあ"},
    {"name": "あうi"},
    {"name": "あう"},
    {"name": "あi"},
    {"name": "iう"}
    ]
    select Users \
      --filter 'fuzzy_search(name, "あiう", {"with_transposition": true, "max_distance": 3})' \
      --output_columns 'name, _score' \
      --match_escalation_threshold -1
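
For readers unfamiliar with with_transposition, the following Python sketch shows an edit distance that optionally counts a swap of two adjacent characters as one edit (the optimal string alignment variant). It is an illustration only: edit_distance() is a hypothetical helper, it works on Unicode code points, and it does not claim to match Groonga's internal implementation of fuzzy_search.

```python
def edit_distance(a: str, b: str, with_transposition: bool = False) -> int:
    """Edit distance; optionally treat an adjacent swap as a single edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (with_transposition and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # The last two characters are swapped: count one edit.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


query = "あiう"
for name in ["aaあ", "あうi", "あう", "あi", "iう"]:
    print(name, edit_distance(query, name, with_transposition=True))
```

With transposition enabled, "あうi" is one edit away from the query instead of two. Every name in the sample data is within distance 3 of the query in this sketch, which matches the max_distance used in the example.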
    

Conclusion

Please refer to the following news for more details. News Release 13.1.0

Let's search by Groonga!