7.8.17. `TokenTrigram`

7.8.17.1. Summary

TokenTrigram is similar to TokenBigram. The difference between them is the token unit: TokenBigram uses 2 characters per token, while TokenTrigram uses 3 characters per token.

7.8.17.2. Syntax

TokenTrigram has no parameters:

TokenTrigram

7.8.17.3. Usage

If a normalizer is used, TokenTrigram tokenizes ASCII characters with a white-space-separated-like method and tokenizes non-ASCII characters with the trigram method.

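When TokenTrigram tokenizes non-ASCII characters, it uses 3 characters per token, as in the following example. The last two tokens are shorter because fewer than 3 characters remain at the end of the text.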

Execution example:

tokenize TokenTrigram "日本語の勉強" NormalizerAuto
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     {
#       "value": "日本語",
#       "position": 0,
#       "force_prefix": false,
#       "force_prefix_search": false
#     },
#     {
#       "value": "本語の",
#       "position": 1,
#       "force_prefix": false,
#       "force_prefix_search": false
#     },
#     {
#       "value": "語の勉",
#       "position": 2,
#       "force_prefix": false,
#       "force_prefix_search": false
#     },
#     {
#       "value": "の勉強",
#       "position": 3,
#       "force_prefix": false,
#       "force_prefix_search": false
#     },
#     {
#       "value": "勉強",
#       "position": 4,
#       "force_prefix": false,
#       "force_prefix_search": false
#     },
#     {
#       "value": "強",
#       "position": 5,
#       "force_prefix": false,
#       "force_prefix_search": false
#     }
#   ]
# ]
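
The token values in the output above follow a sliding-window pattern: one token starts at each character position and contains up to 3 characters. The following Python sketch only illustrates that pattern for non-ASCII text. It is not Groonga's implementation, the function name `trigram_tokens` is hypothetical, and it ignores normalization and the special handling of ASCII characters.

```python
def trigram_tokens(text, n=3):
    """Illustrative sketch: emit one token per start position,
    each holding up to n characters (shorter near the end of text)."""
    return [
        {"value": text[i:i + n], "position": i}
        for i in range(len(text))
    ]


tokens = trigram_tokens("日本語の勉強")
for token in tokens:
    # Prints each position with its token value, in order.
    print(token["position"], token["value"])
```

For the input "日本語の勉強", this sketch yields the same token values and positions as the execution example, including the shorter trailing tokens "勉強" and "強".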

© Copyright 2009-2024 Groonga Project.
