BloGroonga

2021-06-29

Groonga 11.0.4 has been released

Groonga 11.0.4 has been released!

How to install: Install

Changes

Here are important changes in this release:

Improvements

  • [Normalizer] Added support for customized normalizer.

  • Added a new command object_warm.

    This commnad ship Groonga's DB to OS's page cache.

    If we never startup Groonga after OS startup, Groonga's DB doesn't exist on OS's page cache When Groonga on the first run. Therefore, the first operation to Groonga is slow.

    If we execute this command in advance, the first operation to Groonga is fast. In Linux, we can do the same by also executing cat *.db > dev/null. However, we could not do the same thing in Windows until now.

    By using this command, we can ship Groonga's DB to OS's page cache in both Linux and Windows. Then, we can also do that in units of table, column, and index. Therefore, we can ship only table, column, and index that we often use to OS's page cache.

  • select Added support for adjusting the score of a specific record in --filter.

    We can adjust the score of a specific record by using a oprtator named *~. *~ is logical operator same as && and ||. Therefore, we can use *~ like as && ans ||. Default weight of *~ is -1.

    Therefore, for example, 'content @ "Groonga" *~ content @ "Mroonga"' mean the following operations.

    1. Extract records that match 'content @ "Groonga" and content @ "Mroonga"'.
    2. Add a score as below.
    a. Calculate the score of 'content @ "Groonga"'.
    b. Calculate the score of 'content @ "Mroonga"'.
    c. b's score multiplied by -1 by *~.
    d. The socre of this record is a + b
       Therefore, if a's socre is 1 and b's score is 1, the score of this record  is 1 + (1 * -1) = 0.
    

    Then, we can specify score quantity by *~${score_quantity}.

    In particular, the following query adjust the score of match records by the following condition('content @ "Groonga" *~2.5 content @ "Mroonga")' ).

     ```
     table_create Memos TABLE_NO_KEY
     column_create Memos content COLUMN_SCALAR ShortText
    
     table_create Terms TABLE_PAT_KEY ShortText \
       --default_tokenizer TokenBigram \
       --normalizer NormalizerAuto
     column_create Terms index COLUMN_INDEX|WITH_POSITION Memos content
    
     load --table Memos
     [
     {"content": "Groonga is a full text search engine."},
     {"content": "Rroonga is the Ruby bindings of Groonga."},
     {"content": "Mroonga is a MySQL storage engine based of Groonga."}
     ]
    
     select Memos \
       --command_version 3 \
       --filter 'content @ "Groonga" *~2.5 content @ "Mroonga"' \
       --output_columns 'content, _score' \
       --sort_keys -_score,_id
     {
       "header": {
         "return_code": 0,
         "start_time": 1624605205.641078,
         "elapsed_time": 0.002965450286865234
       },
       "body": {
         "n_hits": 3,
         "columns": [
           {
             "name": "content",
             "type": "ShortText"
           },
           {
             "name": "_score",
             "type": "Float"
           }
         ],
         "records": [
           [
             "Groonga is a full text search engine.",
             1.0
           ],
           [
             "Rroonga is the Ruby bindings of Groonga.",
             1.0
           ],
           [
             "Mroonga is a MySQL storage engine based of Groonga.",
             -1.5
           ]
         ]
       }
     }
     ```
    

    We can do the same by also useing adjuster . If we use adjuster , we need to make --filter condition and --adjuster conditon on our application, but we make only --filter condition on it by this improvement.

    We can also describe filter condition as below by using query().

    • --filter 'content @ "Groonga" *~2.5 content @ "Mroonga"'
  • select Added support for && with weight.

    We can use && with weight by using *< or *>. Default weight of *< is 0.5. Default weight of *> is 2.0.

    We can specify score quantity by *<${score_quantity} and *>${score_quantity}. Then, if we specify *<${score_quantity}, a plus or minus sign of ${score_quantity} is reverse.

    For example, 'content @ "Groonga" *<2.5 query("content", "MySQL")' is as below.

    1. Extract records that match 'content @ "Groonga" and content @ "Mroonga"'.
    2. Add a score as below.
    a. Calculate the score of 'content @ "Groonga"'.
    b. Calculate the score of 'query("content", "MySQL")'.
    c. b's score multiplied by -2.5 by *<.
    d. The socre of this record is a + b
       Therefore, if a's socre is 1 and b's score is 1, the score of this record is 1 + (1 * -2.5) = -1.5.
    

    In particular, the following query adjust the score of match records by the following condition( 'content @ "Groonga" *<2.5 query("content", "Mroonga")' ).

     ```
     table_create Memos TABLE_NO_KEY
     column_create Memos content COLUMN_SCALAR ShortText
    
     table_create Terms TABLE_PAT_KEY ShortText \
       --default_tokenizer TokenBigram \
       --normalizer NormalizerAuto
     column_create Terms index COLUMN_INDEX|WITH_POSITION Memos content
    
     load --table Memos
     [
     {"content": "Groonga is a full text search engine."},
     {"content": "Rroonga is the Ruby bindings of Groonga."},
     {"content": "Mroonga is a MySQL storage engine based of Groonga."}
     ]
    
     select Memos \
       --command_version 3 \
       --filter 'content @ "Groonga" *<2.5 query("content", "Mroonga")' \
       --output_columns 'content, _score' \
       --sort_keys -_score,_id
     {
       "header": {
         "return_code": 0,
         "start_time": 1624605205.641078,
         "elapsed_time": 0.002965450286865234
       },
       "body": {
         "n_hits": 3,
         "columns": [
           {
             "name": "content",
             "type": "ShortText"
           },
           {
             "name": "_score",
             "type": "Float"
           }
         ],
         "records": [
           [
             "Groonga is a full text search engine.",
             1.0
           ],
           [
             "Rroonga is the Ruby bindings of Groonga.",
             1.0
           ],
           [
             "Mroonga is a MySQL storage engine based of Groonga.",
             -1.5
           ]
         ]
       }
     }
     ```
    
  • Log Added support for outputting to stdout and stderr.

    Process-log and Query-log supported output to stdout and stderr.

    • If we specify as --log-path -, --query-log-path -, Groonga output log to stdout.
    • If we specify as --log-path +, --query-log-path +, Groonga output log to stderr.

    Process-log is for all of Groonga works. Query-log is just for query processing.

    This feature is useful when we execute Groonga on Docker. Docker has the feature that records stdout and stderr in standard. Therefore, we don't need to login into the environment of Docker to get Groonga's log.

    For example, this feature is useful as he following case.

    • If we want to analyze slow queries of Groonga on Docker.

      If we specify --query-log-path - when startup Groonga, we can analyze slow queries by only execution the following commands.

      • docker logs ${container_name} | groonga-query-log-analyze

    By this, we can analyze slow query with the query log that output from Groonga on Docker simply.

  • [Documentation] Filled missing documentation of string_substring.

Known Issues

  • Currently, Groonga has a bug that there is possible that data is corrupt when we execute many additions, delete, and update data to vector column.

  • [The browser based administration tool] Currently, Groonga has a bug that a search query that is inputted to non-administration mode is sent even if we input checks to the checkbox for the administration mode of a record list.

  • *< and *> only valid when we use query() the right side of filter condition. If we specify as below, *< and *> work as &&.

    • 'content @ "Groonga" *< content @ "Mroonga"'

Conclusion

Please refert to the following news for more details.

News Release 11.0.4

Let's search by Groonga!