4.7. match_columns parameter#

4.7.1. Full-text search against multiple columns#

Groonga supports full-text search against multiple columns. Let’s consider blog site. Usually, blog site has a table which contains title column and content column. How do you search the blog entry which contains specified keywords in title or content?

In such a case, there are two ways to create indexes. One way is creating column index against each column. The other way is creating one column index against multiple columns. Either way, Groonga supports similar full-text search syntax.

4.7.1.1. Creating column index against each column#

Here is the example which create column index against each column.

First, create Blog1 table, add title column which stores title string, message column which stores content of blog entry.

Then create IndexBlog1 table for column indexes, add index_title column for title column, index_message column for message column.

Execution example:

table_create --name Blog1 --flags TABLE_HASH_KEY --key_type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table Blog1 --name title --flags COLUMN_SCALAR --type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table Blog1 --name message --flags COLUMN_SCALAR --type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create --name IndexBlog1 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table IndexBlog1 --name index_title --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source title
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table IndexBlog1 --name index_message --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source message
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Blog1
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0,1337566253.89858,0.000355720520019531],3]

match_columns option of select command accepts multiple columns as search target. Specify query string to query option. Then you can do full-text search title and content of blog entries.

Let’s try to search blog entries.

Execution example:

select --table Blog1 --match_columns title||message --query groonga
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "message",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "grn1",
#         "Groonga message",
#         "Groonga test"
#       ],
#       [
#         3,
#         "grn3",
#         "none",
#         "Groonga message"
#       ],
#       [
#         2,
#         "grn2",
#         "rakutan eggs 4 - 4 Groonga moritars",
#         "baseball result"
#       ]
#     ]
#   ]
# ]
select --table Blog1 --match_columns title||message --query message
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "message",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "grn3",
#         "none",
#         "Groonga message"
#       ],
#       [
#         1,
#         "grn1",
#         "Groonga message",
#         "Groonga test"
#       ]
#     ]
#   ]
# ]
select --table Blog1 --match_columns title --query message
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "message",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "grn3",
#         "none",
#         "Groonga message"
#       ]
#     ]
#   ]
# ]

4.7.1.2. Creating one column index against multiple columns#

Groonga also supports one column index against multiple columns.

The difference for previous example is only one column index exists. Thus, There is one common column index against title and message column.

Even though same column index is used, Groonga supports to search against title column only, message column only and title or message column.

Execution example:

table_create --name Blog2 --flags TABLE_HASH_KEY --key_type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table Blog2 --name title --flags COLUMN_SCALAR --type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table Blog2 --name message --flags COLUMN_SCALAR --type ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
table_create --name IndexBlog2 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create --table IndexBlog2 --name index_blog --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Blog2 --source title,message
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Blog2
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0,1337566253.89858,0.000355720520019531],3]

Let’s search same query in previous section. You can get same search results.

Execution example:

select --table Blog2 --match_columns title||message --query groonga
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "message",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "grn1",
#         "Groonga message",
#         "Groonga test"
#       ],
#       [
#         2,
#         "grn2",
#         "rakutan eggs 4 - 4 Groonga moritars",
#         "baseball result"
#       ],
#       [
#         3,
#         "grn3",
#         "none",
#         "Groonga message"
#       ]
#     ]
#   ]
# ]
select --table Blog2 --match_columns title||message --query message
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         2
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "message",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "grn1",
#         "Groonga message",
#         "Groonga test"
#       ],
#       [
#         3,
#         "grn3",
#         "none",
#         "Groonga message"
#       ]
#     ]
#   ]
# ]
select --table Blog2 --match_columns title --query message
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "message",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         3,
#         "grn3",
#         "none",
#         "Groonga message"
#       ]
#     ]
#   ]
# ]

Note

There may be a question that “which is the better solution for indexing.” It depends on the case.

  • Indexes for each column - The update performance tends to be better than multiple colum index because there is enough buffer for updating. On the other hand, the efficiency of disk usage is not so good.

  • Indexes for multiple column - It saves disk usage because it shares common buffer. On the other hand, the update performance is not so good.

4.7.2. Full text search with specific index name#

Groonga also supports full text search with specific index name.

In this section, you learn how to use specific index column efficiently.

Here is the concrete example about specific index name.

Execution example:

table_create Entries TABLE_HASH_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries title COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Entries body COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
load --table Entries
[
{"_key": "http://example.com/entry1", "title":"Hello Groonga.", "body":"This is my first entry."},
{"_key": "http://example.com/entry2", "title":"Hello world.", "body":"I love Groonga!"},
{"_key": "http://example.com/entry3", "title":"Hello Mroonga, bye Groonga.", "body":"I use Mroonga."},
{"_key": "http://example.com/entry4", "title":"Say, Hello Groonga!", "body":"I'm back."}
]
# [[0,1337566253.89858,0.000355720520019531],4]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_title COLUMN_INDEX|WITH_POSITION Entries title
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_body COLUMN_INDEX|WITH_POSITION Entries body
# [[0,1337566253.89858,0.000355720520019531],true]
column_create Terms entries_whole COLUMN_INDEX|WITH_POSITION|WITH_SECTION Entries title,body
# [[0,1337566253.89858,0.000355720520019531],true]

The table which stores entries has columns for title and body. And the terms table has index columns for title and body to entries table.

There are three index columns in terms table.

  • entries_title: index column for title

  • entries_body: index column for body

  • entries_whole: index column for title and body

If you specify index column which is related to specific data column, related data column is searched with that index implicitly.

For example, if you want to search title or body only, specify Terms.entries_title or Terms.entries_body index column.

Execution example:

select --table Entries --output_columns title --match_columns Terms.entries_title --query "Groonga"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         "Hello Groonga."
#       ],
#       [
#         "Hello Mroonga, bye Groonga."
#       ],
#       [
#         "Say, Hello Groonga!"
#       ]
#     ]
#   ]
# ]

This example uses Terms.entries_title as index, then search “Groonga” against title data column.

Execution example:

select --table Entries --output_columns body --match_columns Terms.entries_body --query "Groonga"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "body",
#           "ShortText"
#         ]
#       ],
#       [
#         "I love Groonga!"
#       ]
#     ]
#   ]
# ]

This example uses Terms.entries_body as index, then search “Groonga” against body data column.

If you specify multiple index column which is related to specific data columns, you can also specify data column name as suffix. It means that “Use specific index and search specific data column explicitly”.

For example, if you want to search title or body only with entries_whole index, specify Terms.entries_whole.title or Terms.entries_whole.body. It uses Terms.entries_whole index and search title column or body column explicitly.

Execution example:

select --table Entries --output_columns title --match_columns Terms.entries_whole.title --query "Groonga"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         "Hello Groonga."
#       ],
#       [
#         "Hello Mroonga, bye Groonga."
#       ],
#       [
#         "Say, Hello Groonga!"
#       ]
#     ]
#   ]
# ]

This example uses Terms.entries_whole as index, then search “Groonga” against title data column.

Execution example:

select --table Entries --output_columns body --match_columns Terms.entries_whole.body --query "Groonga"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "body",
#           "ShortText"
#         ]
#       ],
#       [
#         "I love Groonga!"
#       ]
#     ]
#   ]
# ]

This example uses Terms.entries_whole as index, then search “Groonga” against body data column.

4.7.4. Indexes with Weight#

If index columns are created for data columns, you can search by indexes with weight. For example, let’s try to search blog entries by indexes with weight which contains Groonga as important keyword in Blog1 table. Generally speaking, an important keyword tend to be included in blog title, so if title column contains Groonga, its score ( _score ) must be raised in contrast to message column. The indexes with weight is used for such a purpose.

Here is the example which search blog entries with Groonga as important keyword in title or message columns.

The sample schema and data is same as Creating column index against each column.

Execution example:

select --table Blog1 --match_columns 'IndexBlog1.index_title * 10 || IndexBlog1.index_message' --query 'Groonga' --output_columns "_id, _score, *"
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_score",
#           "Int32"
#         ],
#         [
#           "message",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         11,
#         "Groonga message",
#         "Groonga test"
#       ],
#       [
#         3,
#         10,
#         "none",
#         "Groonga message"
#       ],
#       [
#         2,
#         1,
#         "rakutan eggs 4 - 4 Groonga moritars",
#         "baseball result"
#       ]
#     ]
#   ]
# ]

In above query, 'IndexBlog1.index_title * 10 || IndexBlog1.index_message' is specified for --match_columns. It means that if title column (search title column using IndexBlog1.index_title index) matches to Groonga, its weight is multiplied to 10 and if message column (search message column using IndexBlog1.index_message index) matches to Groonga, its weight is 1 (default). If Groonga matches to title and message, its weight is 11 (10 + 1) in this case.

As a result, Groonga test blog entry is listed in first.