BloGroonga

2017-08-29

Groonga 7.0.6 has been released

Groonga 7.0.6 has been released!

How to install: Install

Changes

Here are important changes in this release:

  • object_inspect command has been supported to show disk usage
  • Falllback feature when parsing query has been supported
  • The score adjusting about keyword in query has been supported

object_inspect command has been supported to show disk usage

In this release, object_inspect command has been supported to show disk usage.

In the previous versions, there is no easy way to calculate the disk usage about each objects such as tables, index columns and so on.

object_inspect command returns disk_usage parameter in response. It returns size in bytes.

table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText
column_create --table Site --name title --type ShortText
load --table Site
[
{"_key":"http://example.org/","title":"This is test record 1!"},
{"_key":"http://example.net/","title":"test record 2."},
{"_key":"http://example.com/","title":"test test record three."},
{"_key":"http://example.net/afr","title":"test record four."},
{"_key":"http://example.org/aba","title":"test test test record five."},
{"_key":"http://example.com/rab","title":"test test test test record six."},
{"_key":"http://example.net/atv","title":"test test test record seven."},
{"_key":"http://example.org/gat","title":"test test record eight."},
{"_key":"http://example.com/vdw","title":"test test record nine."},
]
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create --table Terms --name blog_title --flags COLUMN_INDEX|WITH_POSITION --type Site --source title

Execute the following command to check the disk usage about Terms table.

object_inspect --output_pretty yes Terms

Then, object_inspect command returns the following result.

{
  "id": 258,
  "name": "Terms",
  "type": {
    "id": 49,
    "name": "table:pat_key"
  },
  "key": {
    "type": {
      "id": 14,
      "name": "ShortText",
      "type": {
        "id": 32,
        "name": "type"
      },
      "size": 4096
    },
    "total_size": 21,
    "max_total_size": 4294967294
  },
  "value": {
    "type": null
  },
  "n_records": 15,
  "disk_usage": 8437760
}

It turns out that the disk usage is "disk_usage": 8437760.

Let's check the disk usage about index column.

Execute the following command to check blog_title column in Terms table.

object_inspect --output_pretty yes Terms.blog_title

object_inspect command returns the following result.

{
  "id": 259,
  "name": "blog_title",
  "table": {
    "id": 258,
    "name": "Terms",
    "type": {
      "id": 49,
      "name": "table:pat_key"
    },
  (省略)
  ],
  "disk_usage": 5283840
}

It turns out that the disk usage is "disk_usage": 5283840.

Falllback feature when parsing query has been supported

It is enabled when QUERY_NO_SYNTAX_ERROR flag is set to query_flags.

This feature is disabled by default.

If this flag is set, query never causes syntax error. For example, "A +" is parsed and escaped automatically into "A" and "+". This behavior is useful when application uses user input directly and doesn't want to show syntax error to user and in log.

Here is the example how to use QUERY_NO_SYNTAX_ERROR.

table_create --name Magazine --flags TABLE_HASH_KEY --key_type ShortText
column_create --table Magazine --name title --type ShortText
load --table Magazine
[
{"_key":"http://gihyo.jp/magazine/wdpress","title":"WEB+DB PRESS"},
{"_key":"http://gihyo.jp/magazine/SD","title":"Software Design"},
]
table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create --table Terms --name title --flags COLUMN_INDEX|WITH_POSITION --type Magazine --source title

Let's search by keyword - WEB +.

select Magazine --output_pretty yes --query 'WEB +' --match_columns title"

It causes an syntax error.

[
  [
    -63,
    1503902587.063566,
    0.0007965564727783203,
    "Syntax error: <WEB +||>",
    [
      [
        "yy_syntax_error",
        "grn_ecmascript.lemon",
        37
      ]
    ]
  ]
]

Let's try with QUERY_NO_SYNTAX_ERROR flag.

select Magazine --output_pretty yes --match_columns title --query 'WEB +'  --query_flags ALLOW_PRAGMA|ALLOW_COLUMN|QUERY_NO_SYNTAX_ERROR

It turns out that there is no syntax error.

[
  [
    0,
    1503902343.382929,
    0.0419621467590332
  ],
  [
    [
      [
        1
      ],
      [
        [
          "_id",
          "UInt32"
        ],
        [
          "_key",
          "ShortText"
        ],
        [
          "title",
          "ShortText"
        ]
      ],
      [
        1,
        "http://gihyo.jp/magazine/wdpress",
        "WEB+DB PRESS"
      ]
    ]
  ]
]

With QUERY_NO_SYNTAX_ERROR flag in query, The keyword in above query is parsed into WEB and +. So, it doesn't cause an syntax error.

The score adjusting about keyword in query has been supported

In this release, The feature which adjusts score for term in query has been supported. Actually, >, <, and ~ operators are supported.

For example, >Groonga increments score of Groonga, <Groonga decrements score of Groonga. ~Groonga decreases score of matched document in the current search result. ~ operator doesn't change search result itself.

Here is the sample to show usage.

table_create --name Shops --flags TABLE_NO_KEY
column_create --table Shops --name keyword --type ShortText
load --table Shops
[
{"keyword":"restraunt western food"},
{"keyword":"restraunt japanese food"},
{"keyword":"restraunt chinese food"},
{"keyword":"cafe western food"},
]

Let's search restraunt by the following query.

select Shops --output_pretty yes --match_columns keyword --output_columns keyword,_score --sort_keys -_score --query 'restraunt'

It returns the following result.

[
  [
    3
  ],
  [
    [
      "keyword",
      "ShortText"
    ],
    [
      "_score",
      "Int32"
    ]
  ],
  [
    "restraunt western food",
    1
  ],
  [
    "restraunt chinese food",
    1
  ],
  [
    "restraunt japanese food",
    1
  ]
]

The query returns response which contains same score - 1.

Let's search japanese food with > to adjust score.

select Shops --output_pretty yes --match_columns keyword --output_columns keyword,_score --sort_keys -_score --query 'restraunt (>japanese OR western OR chinese)'

[
  [
    3
  ],
  [
    [
      "keyword",
      "ShortText"
    ],
    [
      "_score",
      "Int32"
    ]
  ],
  [
    "restraunt japanese food",
    8
  ],
  [
    "restraunt chinese food",
    2
  ],
  [
    "restraunt western food",
    2
  ]
]

Now that score of japanese food is largest in the tree restraunt.

Then, try to adjust score with < to raise western food.

select Shops --output_pretty yes --match_columns keyword --output_columns keyword,_score --sort_keys -_score --query 'restraunt (>japanese OR <western OR chinese)'

[
  [
    3
  ],
  [
    [
      "keyword",
      "ShortText"
    ],
    [
      "_score",
      "Int32"
    ]
  ],
  [
    "restraunt japanese food",
    8
  ],
  [
    "restraunt western food",
    7
  ],
  [
    "restraunt chinese food",
    2
  ]
]

As you can see, the score is adjustable by <, and > combination with each keyword.

Conclusion

See Release 7.0.6 2017-08-29 about detailed changes since 7.0.5.

Let's search by Groonga!