BloGroonga

2021-06-29

Groonga 11.0.4 has been released

Groonga 11.0.4 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • [Normalizer] Added support for customized normalizer.

  • Added a new command object_warm.

    This command loads Groonga's database into the OS's page cache.

    If Groonga has never been started since the OS booted, Groonga's database is not in the OS's page cache when Groonga runs for the first time. Therefore, the first operation against Groonga is slow.

    If we execute this command in advance, the first operation against Groonga is fast. On Linux, we can get the same effect by executing cat *.db > /dev/null. However, until now there was no way to do the same thing on Windows.

    By using this command, we can load Groonga's database into the OS's page cache on both Linux and Windows. We can also do this per table, column, or index, so we can load only the tables, columns, and indexes that we often use into the OS's page cache.
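    A minimal sketch of how this command might be used (the --name argument for warming a single object is an assumption based on the description above; please check the object_warm reference for the exact syntax):

     ```
     # Load the whole database into the OS's page cache.
     object_warm

     # Load only a specific object such as a table, column, or index.
     # The --name argument and the Memos table here are assumptions.
     object_warm --name Memos
     ```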

  • select Added support for adjusting the score of a specific record in --filter.

    We can adjust the score of a specific record by using a new operator named *~. *~ is a logical operator like && and ||, so we can use *~ in the same places as && and ||. The default weight of *~ is -1.

    Therefore, for example, 'content @ "Groonga" *~ content @ "Mroonga"' means the following operations.

    1. Extract records that match 'content @ "Groonga"' and 'content @ "Mroonga"'.
    2. Add a score as below.
    a. Calculate the score of 'content @ "Groonga"'.
    b. Calculate the score of 'content @ "Mroonga"'.
    c. b's score is multiplied by -1 by *~.
    d. The score of this record is a + b.
       Therefore, if a's score is 1 and b's score is 1, the score of this record is 1 + (1 * -1) = 0.
    

    We can also specify the score quantity as *~${score_quantity}.

    For example, the following query adjusts the score of matching records with the condition 'content @ "Groonga" *~2.5 content @ "Mroonga"'.

     ```
     table_create Memos TABLE_NO_KEY
     column_create Memos content COLUMN_SCALAR ShortText
    
     table_create Terms TABLE_PAT_KEY ShortText \
       --default_tokenizer TokenBigram \
       --normalizer NormalizerAuto
     column_create Terms index COLUMN_INDEX|WITH_POSITION Memos content
    
     load --table Memos
     [
     {"content": "Groonga is a full text search engine."},
     {"content": "Rroonga is the Ruby bindings of Groonga."},
     {"content": "Mroonga is a MySQL storage engine based of Groonga."}
     ]
    
     select Memos \
       --command_version 3 \
       --filter 'content @ "Groonga" *~2.5 content @ "Mroonga"' \
       --output_columns 'content, _score' \
       --sort_keys -_score,_id
     {
       "header": {
         "return_code": 0,
         "start_time": 1624605205.641078,
         "elapsed_time": 0.002965450286865234
       },
       "body": {
         "n_hits": 3,
         "columns": [
           {
             "name": "content",
             "type": "ShortText"
           },
           {
             "name": "_score",
             "type": "Float"
           }
         ],
         "records": [
           [
             "Groonga is a full text search engine.",
             1.0
           ],
           [
             "Rroonga is the Ruby bindings of Groonga.",
             1.0
           ],
           [
             "Mroonga is a MySQL storage engine based of Groonga.",
             -1.5
           ]
         ]
       }
     }
     ```
    

    We can also do the same thing by using adjuster. However, with adjuster we need to build both a --filter condition and an --adjuster condition in our application; with this improvement, we only need to build a --filter condition.

    We can also describe the filter condition as below by using query().

    • --filter 'content @ "Groonga" *~2.5 content @ "Mroonga"'
  • select Added support for && with weight.

    We can use && with weight by using *< or *>. The default weight of *< is 0.5. The default weight of *> is 2.0.

    We can specify the score quantity as *<${score_quantity} and *>${score_quantity}. Note that with *<${score_quantity}, the sign of ${score_quantity} is reversed.

    For example, 'content @ "Groonga" *<2.5 query("content", "MySQL")' works as below.

    1. Extract records that match 'content @ "Groonga"' and 'query("content", "MySQL")'.
    2. Add a score as below.
    a. Calculate the score of 'content @ "Groonga"'.
    b. Calculate the score of 'query("content", "MySQL")'.
    c. b's score is multiplied by -2.5 by *<.
    d. The score of this record is a + b.
       Therefore, if a's score is 1 and b's score is 1, the score of this record is 1 + (1 * -2.5) = -1.5.
    

    For example, the following query adjusts the score of matching records with the condition 'content @ "Groonga" *<2.5 query("content", "Mroonga")'.

     ```
     table_create Memos TABLE_NO_KEY
     column_create Memos content COLUMN_SCALAR ShortText
    
     table_create Terms TABLE_PAT_KEY ShortText \
       --default_tokenizer TokenBigram \
       --normalizer NormalizerAuto
     column_create Terms index COLUMN_INDEX|WITH_POSITION Memos content
    
     load --table Memos
     [
     {"content": "Groonga is a full text search engine."},
     {"content": "Rroonga is the Ruby bindings of Groonga."},
     {"content": "Mroonga is a MySQL storage engine based of Groonga."}
     ]
    
     select Memos \
       --command_version 3 \
       --filter 'content @ "Groonga" *<2.5 query("content", "Mroonga")' \
       --output_columns 'content, _score' \
       --sort_keys -_score,_id
     {
       "header": {
         "return_code": 0,
         "start_time": 1624605205.641078,
         "elapsed_time": 0.002965450286865234
       },
       "body": {
         "n_hits": 3,
         "columns": [
           {
             "name": "content",
             "type": "ShortText"
           },
           {
             "name": "_score",
             "type": "Float"
           }
         ],
         "records": [
           [
             "Groonga is a full text search engine.",
             1.0
           ],
           [
             "Rroonga is the Ruby bindings of Groonga.",
             1.0
           ],
           [
             "Mroonga is a MySQL storage engine based of Groonga.",
             -1.5
           ]
         ]
       }
     }
     ```
    
  • Log Added support for outputting to stdout and stderr.

    The process log and the query log now support output to stdout and stderr.

    • If we specify --log-path - or --query-log-path -, Groonga outputs the log to stdout.
    • If we specify --log-path + or --query-log-path +, Groonga outputs the log to stderr.

    The process log covers all of Groonga's work, while the query log covers only query processing.

    This feature is useful when we run Groonga on Docker. Docker records stdout and stderr by default, so we don't need to log in to the Docker environment to get Groonga's logs.

    For example, this feature is useful in the following case:

    • If we want to analyze slow queries of Groonga on Docker.

      If we specify --query-log-path - when starting Groonga, we can analyze slow queries just by executing the following command:

      • docker logs ${container_name} | groonga-query-log-analyze

    In this way, we can easily analyze slow queries using the query log output from Groonga on Docker.
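    For example, a startup command line might look like the following sketch (the database path and the -s server option are illustrative; please check the groonga executable reference for the exact options):

     ```
     # Output both the process log and the query log to stdout.
     groonga --log-path - --query-log-path - -s /tmp/groonga/db
     ```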

  • [Documentation] Filled missing documentation of string_substring.

Known Issues

  • Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.

  • [The browser based administration tool] Currently, Groonga has a bug where a search query entered in non-administration mode is sent even if we check the checkbox for administration mode on the record list.

  • *< and *> are only valid when we use query() on the right side of the filter condition. If we specify a condition like the following, *< and *> work as &&:

    • 'content @ "Groonga" *< content @ "Mroonga"'

Conclusion

Please refer to the following news for more details.

News Release 11.0.4

Let's search by Groonga!

2021-05-29

Groonga 11.0.3 has been released

Groonga 11.0.3 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • query Added support for ignoring TokenFilterStem by the query.

  • query Added support for ignoring TokenFilterStopWord by the query.

  • NormalizerNFKC Added a new option remove_new_line.

  • string_slice() Added a new function string_slice().

    • string_slice() extracts a substring of a string.
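      As a minimal sketch, assuming the (target, position, length) argument order, string_slice() might be used in --output_columns like this (the Memos table and column names are illustrative):

       ```
       select Memos \
         --output_columns 'content, string_slice(content, 0, 7)'
       ```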
  • Ubuntu Dropped support for Ubuntu 16.04 LTS (Xenial Xerus).

  • Added EditorConfig for Visual Studio.

    • Most settings are for Visual Studio only.
  • [httpd] Updated bundled nginx to 1.20.1.

    • Contains a security fix for CVE-2021-23017.

Fixes

  • Fixed a bug where Groonga might not return the result of a search query if we sent many search queries while a tokenizer, normalizer, or token filter that supports options was used.

Known Issues

  • Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.

  • [The browser based administration tool] Currently, Groonga has a bug where a search query entered in non-administration mode is sent even if we check the checkbox for administration mode on the record list.

Conclusion

Please refer to the following news for more details.

News Release 11.0.3

Let's search by Groonga!

2021-05-10

Groonga 11.0.2 has been released

Groonga 11.0.2 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • [Documentation] Removed a reference to the ruby_load command.

    • Because this command has already been deleted.
  • Debian GNU/Linux Added support for Debian 11 (Bullseye).

  • select Added support for --post_filter.

  • select Added support for --slices[].post_filter.

  • select Added support for describing expression into --sort_keys.

  • Token filters Added support for multiple token filters with options.

    • We can specify multiple token filters with options like --token_filters 'TokenFilterStopWord("column", "ignore"), TokenFilterNFKC130("unify_kana", true)'.
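      For example, these token filters can be specified when creating a lexicon table, as in the following sketch (the table name and tokenizer below are illustrative):

       ```
       table_create Terms TABLE_PAT_KEY ShortText \
         --default_tokenizer TokenBigram \
         --normalizer NormalizerAuto \
         --token_filters 'TokenFilterStopWord("column", "ignore"), TokenFilterNFKC130("unify_kana", true)'
       ```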
  • query Added support for a dynamic column at the result_set stage with a complex expression.

    • A complex expression is one that needs temporary result sets internally, like the following expression:

      '(true && query("name * 10", "ali", {"score_column": "ali_score"})) || \
      (true && query("name * 2", "li", {"score_column": "li_score"}))'
      
      • In the above expression, temporary result sets are used to store the result of evaluating true.
      • For example, in the following expression, a value of a dynamic column at the result_set stage could already be used, because the expression below doesn't need temporary result sets internally:

        '(query("name * 10", "ali", {"score_column": "ali_score"})) || \
        (query("name * 2", "li", {"score_column": "li_score"}))'
        
    • In this release, for example, we can set a value to li_score as below. (The value of li_score was 0 in previous versions, because the second expression could not get the dynamic column.)

      table_create Users TABLE_NO_KEY
      column_create Users name COLUMN_SCALAR ShortText
      
      table_create Lexicon TABLE_HASH_KEY ShortText \
        --default_tokenizer TokenBigramSplitSymbolAlphaDigit \
        --normalizer NormalizerAuto
      column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name
      
      load --table Users
      [
      {"name": "Alice"},
      {"name": "Alisa"},
      {"name": "Bob"}
      ]
      
      select Users \
        --columns[ali_score].stage result_set \
        --columns[ali_score].type Float \
        --columns[ali_score].flags COLUMN_SCALAR \
        --columns[li_score].stage result_set \
        --columns[li_score].type Float \
        --columns[li_score].flags COLUMN_SCALAR \
        --output_columns name,_score,ali_score,li_score \
        --filter '(true && query("name * 10", "ali", {"score_column": "ali_score"})) || \
                  (true && query("name * 2", "li", {"score_column": "li_score"}))'
      [
        [
          0,
          0.0,
          0.0
        ],
        [
          [
            [
              2
            ],
            [
              [
                "name",
                "ShortText"
              ],
              [
                "_score",
                "Int32"
              ],
              [
                "ali_score",
                "Float"
              ],
              [
                "li_score",
                "Float"
              ]
            ],
            [
              "Alice",
              14,
              10.0,
              2.0
            ],
            [
              "Alisa",
              14,
              10.0,
              2.0
            ]
          ]
        ]
      ]
      
    • We also support a dynamic vector column at the result_set stage, as below.

      table_create Users TABLE_NO_KEY
      column_create Users name COLUMN_SCALAR ShortText
      
      table_create Lexicon TABLE_HASH_KEY ShortText \
        --default_tokenizer TokenBigramSplitSymbolAlphaDigit \
        --normalizer NormalizerAuto
      column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name
      
      load --table Users
      [
      {"name": "Alice"},
      {"name": "Alisa"},
      {"name": "Bob"}
      ]
      
      select Users \
        --columns[tags].stage result_set \
        --columns[tags].type ShortText \
        --columns[tags].flags COLUMN_VECTOR \
        --output_columns name,tags \
        --filter '(true && query("name", "al", {"tags": ["al"], "tags_column": "tags"})) || \
                  (true && query("name", "sa", {"tags": ["sa"], "tags_column": "tags"}))'
      [
        [
          0,
          0.0,
          0.0
        ],
        [
          [
            [
              2
            ],
            [
              [
                "name",
                "ShortText"
              ],
              [
                "tags",
                "ShortText"
              ]
            ],
            [
              "Alice",
              [
                "al"
              ]
            ],
            [
              "Alisa",
              [
                "al",
                "sa"
              ]
            ]
          ]
        ]
      ]
      
      • If we use a dynamic vector column, the stored value is the appended values of each element.
  • Ubuntu Added support for Ubuntu 21.04 (Hirsute Hippo).

  • [httpd] Updated bundled nginx to 1.19.10.

Known Issues

  • Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.

  • [The browser based administration tool] Currently, Groonga has a bug where a search query entered in non-administration mode is sent even if we check the checkbox for administration mode on the record list.

Conclusion

Please refer to the following news for more details.

News Release 11.0.2

Let's search by Groonga!

2021-03-31

Groonga 11.0.1 has been released

Groonga 11.0.1 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • Debian GNU/Linux Added support for an ARM64 package.

  • select Added support for customizing the weight adjustment for each keyword.

    • Until now, we needed to specify < or > for all keywords to adjust scores, because the default weight adjustment (6 or 4) is larger than the default score (1).

      • For example, in A <B, "A"'s weight is 1 and "B"'s weight is 4. The decremented weight of "B" (4) is still larger than the unadjusted weight of "A" (1), which does not work as expected. We needed to specify >A <B to give "B" a smaller weight than "A": in >A <B, "A"'s weight is 6 and "B"'s weight is 4.
    • Since this release, we can customize the weight adjustment for each keyword by specifying <${WEIGHT} or >${WEIGHT} only for the target keywords. For example, in A <0.1B, "A"'s weight is 1 and "B"'s weight is 0.9 ("B"'s weight is decremented by 0.1).

    • However, note that these forms (>${WEIGHT}..., <${WEIGHT}..., and ~${WEIGHT}...) are incompatible with earlier versions.

  • select Added support for outputting Float and Float32 value in Apache Arrow format.

  • select Added support for getting reference destination data via an index column when we output a result.

    • Until now, Groonga had returned an unintended value when we specified an output value like index_column.xxx. For example, in the following example, the value of --columns[tags].value purchases.tag was ["apple",["many"]],["banana",["man"]],["cacao",["man"]], while the expected values were ["apple",["man","many"]],["banana",["man"]],["cacao",["woman"]]. In this release, we can get the correct reference destination data via the index column, as below.

        table_create Products TABLE_PAT_KEY ShortText
      
        table_create Purchases TABLE_NO_KEY
        column_create Purchases product COLUMN_SCALAR Products
        column_create Purchases tag COLUMN_SCALAR ShortText
      
        column_create Products purchases COLUMN_INDEX Purchases product
      
        load --table Products
        [
        {"_key": "apple"},
        {"_key": "banana"},
        {"_key": "cacao"}
        ]
      
        load --table Purchases
        [
        {"product": "apple",  "tag": "man"},
        {"product": "banana", "tag": "man"},
        {"product": "cacao",  "tag": "woman"},
        {"product": "apple",  "tag": "many"}
        ]
      
        select Products \
          --columns[tags].stage output \
          --columns[tags].flags COLUMN_VECTOR \
          --columns[tags].type ShortText \
          --columns[tags].value purchases.tag \
          --output_columns _key,tags
        [
          [
            0,
            0.0,
            0.0
          ],
          [
            [
              [
                3
              ],
              [
                [
                  "_key",
                  "ShortText"
                ],
                [
                  "tags",
                  "ShortText"
                ]
              ],
              [
                "apple",
                [
                  "man",
                  "many"
                ]
              ],
              [
                "banana",
                [
                  "man"
                ]
              ],
              [
                "cacao",
                [
                  "woman"
                ]
              ]
            ]
          ]
        ]
      
  • select Added support for specifying an index column directly as a part of a nested index.

    • We can search the source table after filtering by using index_column.except_source_column. For example, we specify comments.content when searching in the following example. In this case, this query first executes a full text search against the content column of the Comments table, and then fetches the records of the Articles table that refer to the matched records of the Comments table.

         table_create Articles TABLE_HASH_KEY ShortText
      
         table_create Comments TABLE_NO_KEY
         column_create Comments article COLUMN_SCALAR Articles
         column_create Comments content COLUMN_SCALAR ShortText
      
         column_create Articles content COLUMN_SCALAR Text
         column_create Articles comments COLUMN_INDEX Comments article
      
         table_create Terms TABLE_PAT_KEY ShortText \
           --default_tokenizer TokenBigram \
           --normalizer NormalizerNFKC130
         column_create Terms articles_content COLUMN_INDEX|WITH_POSITION \
           Articles content
         column_create Terms comments_content COLUMN_INDEX|WITH_POSITION \
           Comments content
      
         load --table Articles
         [
         {"_key": "article-1", "content": "Groonga is fast!"},
         {"_key": "article-2", "content": "Groonga is useful!"},
         {"_key": "article-3", "content": "Mroonga is fast!"}
         ]
      
         load --table Comments
         [
         {"article": "article-1", "content": "I'm using Groonga too!"},
         {"article": "article-3", "content": "I'm using Mroonga!"},
         {"article": "article-1", "content": "I'm using PGroonga!"}
         ]
      
         select Articles --match_columns comments.content --query groonga \
           --output_columns "_key, _score, comments.content"
         [
           [
             0,
             0.0,
             0.0
           ],
           [
             [
               [
                 1
               ],
               [
                 [
                   "_key",
                   "ShortText"
                 ],
                 [
                   "_score",
                   "Int32"
                 ],
                 [
                   "comments.content",
                   "ShortText"
                 ]
               ],
               [
                 "article-1",
                 1,
                 [
                   "I'm using Groonga too!",
                   "I'm using PGroonga!"
                 ]
               ]
             ]
           ]
         ]
      
  • load Added support for loading a reference vector with inline object literals.

    • For example, we can load data like "key": [{"key": "value", ..., "key": "value"}] as below.

        table_create Purchases TABLE_NO_KEY
        column_create Purchases item COLUMN_SCALAR ShortText
        column_create Purchases price COLUMN_SCALAR UInt32
      
        table_create Settlements TABLE_HASH_KEY ShortText
        column_create Settlements purchases COLUMN_VECTOR Purchases
        column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases
      
        load --table Settlements
        [
        {
          "_key": "super market",
          "purchases": [
             {"item": "apple", "price": 100},
             {"item": "milk",  "price": 200}
          ]
        },
        {
          "_key": "shoes shop",
          "purchases": [
             {"item": "sneakers", "price": 3000}
          ]
        }
        ]
      
    • This feature makes it easier to load JSON data into reference columns.
    • Currently, this feature is only supported with JSON input.
  • load Added support for loading a reference vector from JSON text.

    • We can load data into a reference vector from the source table with JSON text, as below.

        table_create Purchases TABLE_HASH_KEY ShortText
        column_create Purchases item COLUMN_SCALAR ShortText
        column_create Purchases price COLUMN_SCALAR UInt32
      
        table_create Settlements TABLE_HASH_KEY ShortText
        column_create Settlements purchases COLUMN_VECTOR Purchases
      
        column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases
      
        load --table Settlements
        [
        {
          "_key": "super market",
          "purchases": "[{\"_key\": \"super market-1\", \"item\": \"apple\", \"price\": 100}, {\"_key\": \"super market-2\", \"item\": \"milk\",  \"price\": 200}]"
        },
        {
          "_key": "shoes shop",
          "purchases": "[{\"_key\": \"shoes shop-1\", \"item\": \"sneakers\", \"price\": 3000}]"
        }
        ]
      
        dump \
          --dump_plugins no \
          --dump_schema no
        load --table Purchases
        [
        ["_key","item","price"],
        ["super market-1","apple",100],
        ["super market-2","milk",200],
        ["shoes shop-1","sneakers",3000]
        ]
      
        load --table Settlements
        [
        ["_key","purchases"],
        ["super market",["super market-1","super market-2"]],
        ["shoes shop",["shoes shop-1"]]
        ]
      
        column_create Purchases settlements_purchases COLUMN_INDEX Settlements purchases
      
    • Currently, this feature doesn't support nested reference records.

  • [Windows] Added support for UNIX epoch for time_classify_* functions.

  • query_parallel_or Added a new function for processing queries in parallel.

  • select Added support for ignoring nonexistent sort keys.

    • Until now, Groonga reported an error when we specified nonexistent sort keys. Since this release, Groonga ignores nonexistent sort keys instead of reporting an error.
    • This change was made for consistency: we already ignore invalid values in output_columns and most invalid values in sort_keys.
  • select Added support for ignoring nonexistent tables in drilldowns[].table.

    • Until now, Groonga reported an error when we specified nonexistent tables in drilldowns[].table. Since this release, Groonga ignores nonexistent tables in drilldowns[].table instead of reporting an error.
    • This change was made for consistency: we already ignore invalid values in output_columns and most invalid values in sort_keys.
  • [httpd] Updated bundled nginx to 1.19.8.

Fixes

  • reference_acquire Fixed a bug where Groonga crashed when a table's reference was acquired and a column was added to the table before the auto release happened.

    • This is because the added column's reference wasn't acquired, but it was released on auto release.
  • [Windows] Fixed a bug where one or more processes failed to output a backtrace on SEGV when a new backtrace logging process started while another backtrace logging process was running in another thread.

Known Issues

  • Currently, Groonga has a bug where data may be corrupted when we execute many additions, deletions, and updates against a vector column.

Conclusion

Please refer to the following news for more details.

News Release 11.0.1

Let's search by Groonga!

2021-02-09

Groonga 11.0.0 has been released

Groonga 11.0.0 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • select Added support for outputting values of scalar columns and vector columns via a nested index.

    • A nested index has a structure like the following:

      table_create Products TABLE_PAT_KEY ShortText
      
      table_create Purchases TABLE_NO_KEY
      column_create Purchases product COLUMN_SCALAR Products
      column_create Purchases tag COLUMN_SCALAR ShortText
      
      column_create Products purchases COLUMN_INDEX Purchases product
      
    • In the above example, the Products.purchases column is an index of the Purchases.product column. Also, Purchases.product is a reference to the Products table.

  • [Windows] Dropped support for the Windows packages that we had cross compiled by using MinGW on Linux.

    • From now on, we use the following packages for Windows.

      • groonga-latest-x86-vs2019-with-vcruntime.zip
      • groonga-latest-x64-vs2019-with-vcruntime.zip
    • If Microsoft Visual C++ Runtime Library is already installed on the system, we suggest using the following packages.

      • groonga-latest-x86-vs2019.zip
      • groonga-latest-x64-vs2019.zip
  • Fixed a bug where an index may be corrupted when Groonga executes many additions, deletions, and updates against it.

    • This bug occurs when we execute many deletions against an index. However, it doesn't occur when we only execute many additions to an index.

    • We can repair an index corrupted by this bug by rebuilding it.

    • This bug isn't detected unless we reference the broken index. Therefore, some of our indexes may already be broken.

    • We can use the index_column_diff command to confirm whether an index has already been broken.
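      For example, a check might look like the following sketch (the positional table and index-column arguments are an assumption based on the index_column_diff reference; Terms.articles_content is illustrative):

       ```
       # Report differences between the index and its source data.
       # No reported differences suggests the index is consistent.
       index_column_diff Terms articles_content
       ```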

Conclusion

Please refer to the following news for more details.

News Release 11.0.0

Let's search by Groonga!