7.3.11. column_create

7.3.11.1. Summary

column_create creates a new column in a table.

You need to create one or more columns to store multiple data in one record.

Groonga provides an index as a column. It's different from other systems. An index is just an index in other systems. Implementing an index as a column provides flexibility. For example, you can add metadata to each token.

See Column for column details.

7.3.11.2. Syntax

This command takes many parameters.

Most parameters are required:

column_create table
              name
              flags
              type
              [source=null]

7.3.11.3. Usage

This section describes about the followings:

Here is the People table definition. The People table is used in examples:

Execution example:

table_create \
  --name People \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

7.3.11.3.1. Create a scalar column

Groonga provides scalar column to store one value. For example, scalar column should be used for storing age into a person record. Because a person record must have only one age.

If you want to store multiple values into a record, scalar column isn't suitable. Use Create a vector column instead.

You must specify COLUMN_SCALAR to the flags parameter to create a scalar column.

Here is an example to create the age column to the People table. age column is a scalar column. It can store one UInt8 (0-255) value as its value:

Execution example:

column_create \
  --table People \
  --name age \
  --flags COLUMN_SCALAR \
  --type UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can store one value (7) by the following load command:

Execution example:

load --table People
[
{"_key": "alice", "roles": ["adventurer", "younger-sister"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can confirm the stored one value (7) by the following select command:

Execution example:

select --table People
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7
#       ]
#     ]
#   ]
# ]

7.3.11.3.2. Create a vector column

Groonga provides vector column to store multiple values. For example, vector column may be used for storing roles into a person record. Because a person record may have multiple roles.

If you want to store only one value into a record, vector column isn't suitable. Use Create a scalar column instead.

You must specify COLUMN_VECTOR to the flags parameter to create a vector column.

Here is an example to create the roles column to the People table. roles column is a vector column. It can store zero or more ShortText values as its value:

Execution example:

column_create \
  --table People \
  --name roles \
  --flags COLUMN_VECTOR \
  --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can store multiple values (["adventurer", "younger-sister"]) by the following load command:

Execution example:

load --table People
[
{"_key": "alice", "roles": ["adventurer", "younger-sister"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can confirm the stored multiple values (["adventurer", "younger-sister"]) by the following select command:

Execution example:

select --table People
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ],
#         [
#           "roles",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]

7.3.11.3.3. Create a weight vector column

TODO: See also Weight vector column and adjuster.

7.3.11.3.4. Create a column that refers a table's record

Both scalar column and vector column can store reference to record of an existing table as column value. It's useful to store relationship between records.

For example, using a column that refers a person record is better for storing a character into a book record. Because one person may be appeared in some books.

You must specify table name to be referenced to the type parameter to create a column that refers a table's record.

Here is an example to create the character column to the Books table. The character column refers the People table. It can store one People table's record.

Here is the Books table definition:

Execution example:

table_create \
  --name Books \
  --flags TABLE_HASH_KEY \
  --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the character column definition in the Books table. --type People is important:

Execution example:

column_create \
  --table Books \
  --name character \
  --flags COLUMN_SCALAR \
  --type People
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can store one reference ("alice") by the following load command. You can use key value (People._key value) for referring a record:

Execution example:

load --table Books
[
{"_key": "Alice's Adventure in Wonderland", "character": "alice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can confirm the stored reference ("alice" record) by the following select command. It retrieves the age column and the roles column values:

Execution example:

select \
 --table Books \
 --output_columns _key,character._key,character.age,character.roles
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "character._key",
#           "ShortText"
#         ],
#         [
#           "character.age",
#           "UInt8"
#         ],
#         [
#           "character.roles",
#           "ShortText"
#         ]
#       ],
#       [
#         "Alice's Adventure in Wonderland",
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]

7.3.11.3.5. Create an index column

Groonga provides index column for fast search. It doesn't store your data. It stores data for fast search.

You don't need to update index column by yourself. Index column is updated automatically when you store data into a data column (scalar column or vector column) that is marked as index target column. You can set multiple columns as index target columns to one index column.

If Groonga has an index column for the age column of the People table, Groonga can do fast equal search, fast comparison search and fast range search against age column values.

You must specify the following parameters to create an index column:

  • The flags parameter: COLUMN_INDEX
  • The type parameter: The table name of index target column such as People
  • The source parameter: The index target column name such as age

You don't need additional flags to the flags parameter for equal search, comparison search and range search index column. You need additional flags to the flags parameter for full text search index column or multiple column index column. See Create an index column for full text search and Create a multiple columns index column for details.

Here is an example to create an index column for the age column of the People table.

First, you need to create a table for range index column. See Create range index table for details. This example creates the Ages table as TABLE_PAT_KEY:

Execution example:

table_create \
  --name Ages \
  --flags TABLE_PAT_KEY \
  --key_type UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]

Now, you can create an index column for the age column of the People table. COLUMN_INDEX in the flags parameter, People in the type parameter and age in the source parameter are important:

Execution example:

column_create \
  --table Ages \
  --name people_age_index \
  --flags COLUMN_INDEX \
  --type People \
  --source age
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that age > 5 is evaluated by the Ages.people_age_index newly created index column from log. Groonga reports used index columns in info log level. You can change log level dynamically by log_level command.

Execution example:

log_level --level info
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
  --table People \
  --filter 'age > 5'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ],
#         [
#           "roles",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]
# log: 2016-10-13 19:10:02.292711|i| [table][select][index][range] <Ages.people_age_index>
log_level --level notice
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that the Ages.people_age_index is used from the following log:

[table][select][index][range] <Ages.people_age_index>

The log says Ages.people_age_index index column is used for range search.

7.3.11.3.7. Create a multiple columns index column

You can create an index column for multiple columns. It means that you can do fast search for multiple columns with one index column. Multiple columns index column has better space efficiency than single column index column only when multiple columns have many same tokens. Multiple columns index column may be slower than single column index column. Because multiple columns index column will be a bigger index column.

You can't use multiples columns in different tables as index target columns in the same multiple columns index column. You must specify columns in the same tables as index target columns to one multiple columns index column. For example, you can't create a multiple columns index for People._key and Books._key because they are columns of different tables. You can create a multiple columns index column for People._key and People.roles because they are columns of the same table.

There is a difference between for single column index column and for multiple columns index column. You need to add WITH_SECTION to the flags parameter. It means that you need to specify COLUMN_INDEX|WITH_SECTION to the flags parameter. It's the difference.

If you want to create a multiple columns index column for full text search, you need to specify COLUMN_INDEX|WITH_POSITION|WITH_SECTION to the flags parameter. See Create an index column for full text search for full text search index column details.

Here is an example to create a multiple columns full text search index column for the key of the People table and the roles column of the People table.

There is no difference between index table for single column index column and multiple columns index column. In this example, the Terms table created at Create an index column for full text search is used.

You can create a multiple columns full text search index column for the key of the People table and roles column of the People table. COLUMN_INDEX|WITH_POSITION|WITH_SECTION in the flags parameter, People in the type parameter and _key,roles in the source parameter are important:

Execution example:

column_create \
  --table Terms \
  --name people_key_roles_index \
  --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION \
  --type People \
  --source _key,roles
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that --match_columns _key and --query Alice are evaluated by the Terms.people_key_roles_index newly created multiple columns full text search index column from log. Groonga reports used index columns in info log level. You can change log level dynamically by log_level command.

Execution example:

log_level --level info
# [[0, 1337566253.89858, 0.000355720520019531], true]
select \
  --table People \
  --match_columns _key,roles \
  --query Alice
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_id",
#           "UInt32"
#         ],
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "age",
#           "UInt8"
#         ],
#         [
#           "roles",
#           "ShortText"
#         ]
#       ],
#       [
#         1,
#         "alice",
#         7,
#         [
#           "adventurer",
#           "younger-sister"
#         ]
#       ]
#     ]
#   ]
# ]
# log: 2016-10-13 19:10:03.513171|i| [object][search][index][key][exact] <Terms.people_key_roles_index>
# log: 2016-10-13 19:10:03.513194|i| grn_ii_sel > (Alice)
# log: 2016-10-13 19:10:03.513219|i| n=1 (Alice)
# log: 2016-10-13 19:10:03.513247|i| exact: 1
# log: 2016-10-13 19:10:03.513251|i| hits=1
log_level --level notice
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that the Terms.people_roles_index is used from the following log:

[object][search][index][key][exact] <Terms.people_key_roles_index>

The log says Terms.people_roles_index index column is used for full text search. (To be precise, the index column is used for exact term search by inverted index.)

7.3.11.3.8. Create a small index column

If you know index target data are small, you can reduce memory usage for the index column. Memory usage is 1/256 of the default index column.

How many data are small? It depends on data. Small index column can't handle 1 billion records at least. If index target is only one scalar column with no text family type (ShortText, Text or LongText), the maximum handleable records are depends of the number of kinds of index target data. If index target column has 1, 1, 2 and 3, the number of kinds of them are 3 (1 and 2 and 3). The following table shows the relationship between the number of kinds of index target data and the number of handleable records:

The number of kinds of index target data and the number of handleable records in a small index column
The number of kinds of index target data The number of hanleable records
1 16779234
2 4648070
4 7238996
8 8308622
16 11068624
32 12670817
64 18524211
128 38095511
256 51265384

You need to add INDEX_SMALL to the flags parameter such as COLUMN_INDEX|INDEX_SMALL to create a small index column.

If the People table has only 1 million records, you can use a small index column for the age column:

Execution example:

column_create \
  --table Ages \
  --name people_age_small_index \
  --flags COLUMN_INDEX|INDEX_SMALL \
  --type People \
  --source age
# [[0, 1337566253.89858, 0.000355720520019531], true]

7.3.11.3.9. Create a medium index column

If you know index target data are medium, you can reduce memory usage for the index column. Memory usage is 5/24 of the default index column.

How many data are medium? It depends on data.

If index target is only one scalar column, a medium index column can handle all records.

A medium index column may not handle all records at the following cases:

  • Index target is one text family (ShortText, Text or LongText) scalar column
  • Index target is one vector column
  • Index targets are multiple columns
  • Index table has tokenizer

You need to add INDEX_MEDIUM to the flags parameter such as COLUMN_INDEX|INDEX_MEDIUM to create a medium index column.

You can use a medium index column for an index column of the age column of the People table safely. Because it's one scalar column with UInt8 type.

Here is an example to create a medium index column:

Execution example:

column_create \
  --table Ages \
  --name people_age_medium_index \
  --flags COLUMN_INDEX|INDEX_MEDIUM \
  --type People \
  --source age
# [[0, 1337566253.89858, 0.000355720520019531], true]

7.3.11.4. Parameters

This section describes all parameters.

7.3.11.4.1. Required parameters

There are some required parameters.

7.3.11.4.1.1. table

Specifies an existing table name for the new column.

7.3.11.4.1.2. name

Specifies the column name to be created.

The column name must be unique in the same table.

Here are available characters:

  • 0 .. 9 (digit)
  • a .. z (alphabet, lower case)
  • A .. Z (alphabet, upper case)
  • # (hash)
  • @ (at mark)
  • - (hyphen)
  • _ (underscore) (NOTE: Underscore can't be used as the first character.)

You need to create a name with one or more the above characters. Note that you can't use _ as the first character such as _name.

7.3.11.4.1.3. flags

Specifies the column type and column customize options.

Here are available flags:

Flag Description
COLUMN_SCALAR Scalar column. It can store one value. See also Scalar column or Create a scalar column.
COLUMN_VECTOR Vector column. It can store multiple values. See also Vector column or Create a vector column.
COLUMN_INDEX Index column. It stores data for fast search. See also Index column or Create an index column.
COMPRESS_ZLIB

It enables column value compression by zlib. You need Groonga that enables zlib support.

Compression by zlib is higher space efficiency than compression by LZ4. But compression by zlib is slower than compression by LZ4.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

COMPRESS_LZ4

It enables column value compression by LZ4. You need Groonga that enables LZ4 support.

Compression by LZ4 is faster than compression by zlib. But compression by LZ4 is lower space efficiency than compression by zlib.

This flag is available only for COLUMN_SCALAR and COLUMN_VECTOR.

WITH_SECTION

It enables section support to index column.

If section support is enabled, you can support multiple documents in the same index column.

You must specify this flag to create a multiple columns index column. See also Create a multiple columns index column for details.

Section support requires additional spaces. If you don't need section support, you should not enable section support.

This flag is available only for COLUMN_INDEX.

WITH_WEIGHT

It enables weight support to vector column or index column.

If weight support is enabled for vector column, you can add weight for each element. If weight support is enabled for index column, you can add weight for each posting. They are useful to compute suitable search score.

You must specify this flag to use adjuster. See also Create a weight vector column for details.

Weight support requires additional spaces. If you don't need weight support, you should not enable weight support.

This flag is available only for COLUMN_VECTOR or COLUMN_INDEX.

WITH_POSITION

It enables position support to index column. It means that the index column is full inverted index. (Index column is implemented as inverted index.)

If position support is enabled, you can add position in the document for each posting. It's required for phrase search. It means that index column for full text search must enable position support because most full text search uses phrase search.

You must specify this flag to create a full text search index column. See also Create an index column for full text search for details.

Position support requires additional spaces. If you don't need position support, you should not enable position support.

This flag is available only for COLUMN_INDEX.

INDEX_SMALL

New in version 6.0.8.

It requires to create a small index column.

If index target data are small, small index column is enough. Small index column uses fewer memory than a normal index column or a medium index column. See also Create a small index column for knowing what are "small data" and how to use this flag.

This flag is available only for COLUMN_INDEX.

INDEX_MEDIUM

New in version 6.0.8.

It requires to create a medium index column.

If index target data are medium, medium index column is enough. Medium index column uses fewer memory than a normal index column. See also Create a medium index column for knowing what are "medium data" and how to use this flag.

This flag is available only for COLUMN_INDEX.

You must specify one of COLUMN_${TYPE} flags. You can't specify two or more COLUMN_${TYPE} flags. For example, COLUMN_SCALAR|COLUMN_VECTOR is invalid.

You can combine flags with | (vertical bar) such as COLUMN_INDEX|WITH_SECTION|WITH_POSITION.

7.3.11.4.1.4. type

Specifies type of the column value.

If the column is scalar column or vector column, here are available types:

  • Builtin types described in Data types
  • Tables defined by users

If the column is index column, here are available types:

  • Tables defined by users

See also the followings:

7.3.11.4.2. Optional parameters

There is an optional parameter.

7.3.11.4.2.1. source

Specifies index target columns. You can specify one or more columns to the source parameter.

This parameter is only available for index column.

You can only specify columns of the table specified as type. You can also use the _key pseudo column for specifying the table key as index target.

If you specify multiple columns to the source parameter, separate columns with , (comma) such as _key,roles.

7.3.11.5. Return value

column_create returns true as body on success such as:

[HEADER, true]

If column_create fails, column_create returns false as body:

[HEADER, false]

Error details are in HEADER.

See Output format for HEADER.

7.3.11.6. See also