7.3.2. Parallel execution#

7.3.2.1. Summary#

Groonga executes commands serially by default. You can make it execute them in parallel by specifying the n_workers option.

The next section shows how to set up parallel execution. Please read the notes before using this option.

7.3.2.2. How to use#

7.3.2.2.1. Set per Groonga command#

This setting applies only to the command being executed.

7.3.2.2.1.1. Specified by --n_workers option of Groonga command#

Groonga command example:

load --table Data --n_workers -1
[
{"_key": "value"}
]

7.3.2.2.2. Set the default value#

If you set a default value, you do not need to specify it for each Groonga command. The default value is used for all Groonga commands.

7.3.2.2.2.1. Specified by --default-n-workers option of groonga executable file#

Execution example:

$ groonga --default-n-workers -1 DB_PATH status

7.3.2.2.2.2. Specified by environment variable GRN_N_WORKERS_DEFAULT#

Execution example:

$ GRN_N_WORKERS_DEFAULT=-1 groonga DB_PATH status

7.3.2.2.3. Available Values#

The value sets the degree of parallelism. If you specify -1, or 2 or more, Groonga executes in parallel.

n_workers | Behavior
--- | ---
0 or 1 | Execute serially.
2 or more | Execute in parallel with at most the specified number of threads.
-1 | Execute in parallel with at most as many threads as there are CPU cores.
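The mapping from n_workers to an upper bound on threads can be sketched in Python as follows. This is only an illustrative model of the values listed above, not Groonga's implementation; the function name `max_threads` and its `cpu_cores` parameter are hypothetical, and rejecting negative values other than -1 is an assumption, since this section does not describe them.

```python
import os
from typing import Optional


def max_threads(n_workers: int, cpu_cores: Optional[int] = None) -> int:
    """Return the upper bound on threads for a given n_workers value.

    Illustrative model of the documented values, not Groonga's own code.
    """
    if cpu_cores is None:
        # os.cpu_count() may return None; fall back to 1 in that case.
        cpu_cores = os.cpu_count() or 1
    if n_workers in (0, 1):
        return 1  # serial execution
    if n_workers >= 2:
        return n_workers  # at most the specified number of threads
    if n_workers == -1:
        return cpu_cores  # at most the number of CPU cores
    # Assumption: other negative values are not documented here, so reject them.
    raise ValueError(f"unsupported n_workers value: {n_workers}")
```

For example, `max_threads(4)` gives 4, and `max_threads(-1, cpu_cores=6)` gives 6.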

7.3.2.3. Check the settings#

You can check the current settings by looking at the n_workers and default_n_workers values in the output of the status command.

Execution example:

status
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "alloc_count": 29,
#     "starttime": 1696558618,
#     "start_time": 1696558618,
#     "uptime": 1,
#     "version": "2.9.1",
#     "n_queries": 0,
#     "cache_hit_rate": 0.0,
#     "command_version": 1,
#     "default_command_version": 1,
#     "max_command_version": 3,
#     "n_jobs": 0,
#     "features": {
#       "nfkc": true,
#       "mecab": true,
#       "message_pack": true,
#       "mruby": true,
#       "onigmo": true,
#       "zlib": true,
#       "lz4": true,
#       "zstandard": true,
#       "kqueue": false,
#       "epoll": true,
#       "poll": false,
#       "rapidjson": false,
#       "apache_arrow": true,
#       "xxhash": true,
#       "blosc": true,
#       "bfloat16": true,
#       "h3": true,
#       "simdjson": true,
#       "llama.cpp": true,
#       "back_trace": true,
#       "reference_count": false
#     },
#     "apache_arrow": {
#       "version_major": 2,
#       "version_minor": 9,
#       "version_patch": 1,
#       "version": "2.9.1"
#     },
#     "memory_map_size": 2929,
#     "n_workers": 0,
#     "default_n_workers": 0,
#     "os": "Linux",
#     "cpu": "x86_64"
#   }
# ]

n_workers is the value set for the individual Groonga command. default_n_workers is the default value.
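If you check these values from a script, a minimal Python sketch might look like the following. It assumes the response has the two-element shape shown in the example above (a header array followed by a body object). The helper name `read_worker_settings` and the trimmed `SAMPLE_RESPONSE` are hypothetical and only for illustration.

```python
import json

# Trimmed sample of a status response: [header, body].
SAMPLE_RESPONSE = (
    '[[0, 1337566253.89858, 0.000355720520019531],'
    ' {"n_workers": 0, "default_n_workers": 0}]'
)


def read_worker_settings(response_text: str) -> tuple:
    """Return (n_workers, default_n_workers) from a status response."""
    _header, body = json.loads(response_text)
    return body["n_workers"], body["default_n_workers"]


n_workers, default_n_workers = read_worker_settings(SAMPLE_RESPONSE)
print(n_workers, default_n_workers)
```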

7.3.2.4. Notes#

7.3.2.4.1. Apache Arrow is required#

This feature requires that Apache Arrow is enabled in Groonga.

Whether Apache Arrow is enabled depends on the package provider.

To check whether Apache Arrow is enabled, run the status command and check whether apache_arrow under features is true in the result.
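This check can also be done programmatically. The following Python sketch assumes the flag appears as `apache_arrow` inside the `features` object of the status response body, as in the example output earlier in this section. The function name `apache_arrow_enabled` is hypothetical.

```python
import json


def apache_arrow_enabled(response_text: str) -> bool:
    """Return True only if features.apache_arrow is true in a status response."""
    _header, body = json.loads(response_text)
    # A missing "features" object or flag is treated as "not enabled".
    return body.get("features", {}).get("apache_arrow") is True
```

A script could call this before setting n_workers and fall back to serial execution when it returns False.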

7.3.2.4.2. For use as a daemon process#

For example, consider using Groonga HTTP server on a system with 6 CPUs.

Groonga HTTP server allocates 1 thread (= 1 CPU) for each request.

When the average number of concurrent connections is 6, there are no free CPU resources because all 6 CPUs are already in use processing requests.

When the average number of concurrent connections is 2, 4 CPUs are free because only 2 are in use. With n_workers set to 2, each request uses at most 3 CPUs, including the thread that processes the request. Therefore, if two requests with n_workers set to 2 arrive at the same time, they use at most 6 CPUs in total and are processed quickly by using all of the available resources. With a value greater than 2, the degree of parallelism can exceed the available CPU resources, which may actually slow down execution.
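The arithmetic in this example can be written as a small Python sketch. It follows the reasoning above (one request thread plus up to n_workers worker threads per request) and is only a rough capacity estimate, not a description of Groonga's scheduler. The function names are hypothetical, and only non-negative n_workers values are handled.

```python
def max_cpus_in_use(concurrent_requests: int, n_workers: int) -> int:
    """Upper bound on CPUs used by concurrent requests."""
    if concurrent_requests < 0 or n_workers < 0:
        raise ValueError("this sketch only handles non-negative values")
    # 0 or 1 means serial execution: only the request thread is used.
    # 2 or more adds up to n_workers threads on top of the request thread.
    threads_per_request = 1 if n_workers <= 1 else 1 + n_workers
    return concurrent_requests * threads_per_request


def is_oversubscribed(concurrent_requests: int, n_workers: int, cpus: int) -> bool:
    """Return True if the upper bound exceeds the number of CPUs."""
    return max_cpus_in_use(concurrent_requests, n_workers) > cpus
```

With 2 concurrent requests on 6 CPUs, n_workers set to 2 gives an upper bound of 6 CPUs, which fits. Setting n_workers to 3 gives an upper bound of 8, which exceeds the 6 available CPUs.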

7.3.2.5. Parallel execution support#