Groonga 12.0.3 has been released

BloGroonga

2022-04-29

Groonga 12.0.3 has been released

Groonga 12.0.3 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

logical_count Improved memory usage while logical_count executed.

Up to now, Groonga had been keeping objects(objects are tables and columns and indexes, and so on) and temporary tables that were allocated while logical_count executed until the execution of logical_count finished.

Groonga reduces reference immediately after processing a shard by this feature. Therefore, Groonga can release memory while logical_count executed. The usage of memory of Groonga may reduce because of these reasons.

This improvement is only valid for the reference count mode. We can valid the reference count mode by setting GRN_ENABLE_REFERENCE_COUNT=yes.

In addition, Groonga releases temporary tables dynamically while logical_count is executed by this feature. Therefore, the usage of memory of Groonga reduces. This improvement is valid even if we don't set the reference count mode.
dump Added support for MISSING_IGNORE/MISSING_NIL.

If columns had MISSING_IGNORE/MISSING_NIL, the dump of these columns had failed until now. dump command supports these columns since this release.

snippet,snippet_html Added support for text vector as input.

For example, we can extract snippets of target text around search keywords against vector in JSON data as below.

   table_create Entries TABLE_NO_KEY
   column_create Entries title COLUMN_SCALAR ShortText
   column_create Entries contents COLUMN_VECTOR ShortText

   table_create Tokens TABLE_PAT_KEY ShortText   --default_tokenizer TokenNgram   --normalizer NormalizerNFKC130
   column_create Tokens entries_title COLUMN_INDEX|WITH_POSITION Entries title
   column_create Tokens entries_contents COLUMN_INDEX|WITH_SECTION|WITH_POSITION   Entries contents

   load --table Entries
   [
   {
     "title": "Groonga and MySQL",
     "contents": [
       "Groonga is a full text search engine",
       "MySQL is a RDBMS",
       "Mroonga is a MySQL storage engine based on Groonga"
     ]
   }
   ]

   select Entries\
     --output_columns 'snippet_html(contents), contents'\
     --match_columns 'title'\
     --query Groonga
   [
     [
       0,
       0.0,
       0.0
     ],
     [
       [
         [
           1
         ],
         [
           [
             "snippet_html",
             null
           ],
           [
             "contents",
             "ShortText"
           ]
         ],
         [
           [
             "<span class=\"keyword\">Groonga</span> is a full text search engine",
             "Mroonga is a MySQL storage engine based on <span class=\"keyword\">Groonga</span>"
           ],
           [
             "Groonga is a full text search engine",
             "MySQL is a RDBMS",
             "Mroonga is a MySQL storage engine based on Groonga"
           ]
         ]
       ]
     ]
   ]

Until now, if we specified snippet* like --output_columns 'snippet_html(contents[1]), we could extract snippets of target text around search keywords against the vector as below. However, we didn't know which we should output elements. Because we didn't know which element was hit on search.

   select Entries\
     --output_columns 'snippet_html(contents[0]), contents'\
     --match_columns 'title'\
     --query Groonga
   [
     [
       0,
       0.0,
       0.0
     ],
     [
       [
         [
           1
         ],
         [
           [
             "snippet_html",
             null
           ],
           [
             "contents",
             "ShortText"
           ]
         ],
         [
           [
             "<span class=\"keyword\">Groonga</span> is a full text search engine"
           ],
           [
             "Groonga is a full text search engine",
             "MySQL is a RDBMS",
             "Mroonga is a MySQL storage engine based on Groonga"
           ]
         ]
       ]
     ]
   ]

[vector_join] Added a new function vector_join().

This function can concatenate each elements. We can specify delimiter in the second argument in this function.

For example, we could execute snippet() and snippet_html() against vector that concatenate each elements as below.

   plugin_register functions/vector

   table_create Entries TABLE_NO_KEY
   column_create Entries title COLUMN_SCALAR ShortText
   column_create Entries contents COLUMN_VECTOR ShortText

   table_create Tokens TABLE_PAT_KEY ShortText   --default_tokenizer TokenNgram   --normalizer NormalizerNFKC130
   column_create Tokens entries_title COLUMN_INDEX|WITH_POSITION Entries title
   column_create Tokens entries_contents COLUMN_INDEX|WITH_SECTION|WITH_POSITION   Entries contents

   load --table Entries
   [
   {
     "title": "Groonga and MySQL",
     "contents": [
       "Groonga is a full text search engine",
       "MySQL is a RDBMS",
       "Mroonga is a MySQL storage engine based on Groonga"
     ]
   }
   ]

   select Entries\
     --output_columns 'snippet_html(vector_join(contents, "\n")), contents'\
     --match_columns 'title'\
     --query Groonga --output_pretty yes
   [
     [
       0,
       1650849001.524027,
       0.0003361701965332031
     ],
     [
       [
         [
           1
         ],
         [
           [
             "snippet_html",
             null
           ],
           [
             "contents",
             "ShortText"
           ]
         ],
         [
           [
             "<span class=\"keyword\">Groonga</span> is a full text search engine\nMySQL is a RDBMS\nMroonga is a MySQL storage engine based on <span class=\"keyword\">Groonga</span>"
           ],
           [
             "Groonga is a full text search engine","MySQL is a RDBMS","Mroonga is a MySQL storage engine based on Groonga"
           ]
         ]
       ]
     ]
   ]

Indexing Ignore too large a token like online index construction.

Groonga doesn't occur error, but Groonga ignores too large a token when we execute static index construction. However, Groonga output warning in this case.

Fixes

Fixed a bug that we may be not able to add a key to a table of patricia trie.

This bug occurs in the following conditon.
- If a table of patricia trie already has a key.
- If the additional key is 4096 bytes.

Known Issues

Currently, Groonga has a bug that there is possible that data is corrupt when we execute many additions, delete, and update data to vector column.
*< and *> only valid when we use query() the right side of filter condition. If we specify as below, *< and *> work as &&.
- 'content @ "Groonga" *< content @ "Mroonga"'
Groonga may not return records that should match caused by GRN_II_CURSOR_SET_MIN_ENABLE.

Conclusion

Please refert to the following news for more details.

News Release 12.0.3

Let's search by Groonga!

BloGroonga