7.7.2.2. NormalizerNFKC100#

7.7.2.2.1. Summary#

Added in version 8.0.2.

NormalizerNFKC100 normalizes text by Unicode NFKC (Normalization Form Compatibility Composition) for Unicode version 10.0.

This normalizer can change behavior by specifying options.
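As a rough illustration of what plain NFKC does, independent of Groonga, the sketch below uses Python's standard unicodedata module. It is only a stand-in: Python bundles its own Unicode data version, which is not necessarily 10.0, and the helper name nfkc is hypothetical. None of the options described in this section are reproduced.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply plain Unicode NFKC using Python's bundled Unicode data."""
    return unicodedata.normalize("NFKC", text)


# Half-width katakana KA followed by the half-width voiced sound mark
# composes into the single full-width letter GA (U+30AC).
print(nfkc("\uff76\uff9e"))
# Full-width Latin letters fold to their ASCII counterparts.
print(nfkc("\uff21\uff22\uff23"))
# CIRCLED DIGIT ONE folds to the plain digit.
print(nfkc("\u2460"))
```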

7.7.2.2.2. Syntax#

NormalizerNFKC100 has optional parameters.

No options:

NormalizerNFKC100

NormalizerNFKC100 normalizes text by Unicode NFKC (Normalization Form Compatibility Composition) for Unicode version 10.0.

Specify option:

NormalizerNFKC100("unify_kana", true)

NormalizerNFKC100("unify_kana_case", true)

NormalizerNFKC100("unify_kana_voiced_sound_mark", true)

NormalizerNFKC100("unify_hyphen", true)

NormalizerNFKC100("unify_prolonged_sound_mark", true)

NormalizerNFKC100("unify_hyphen_and_prolonged_sound_mark", true)

NormalizerNFKC100("unify_middle_dot", true)

NormalizerNFKC100("unify_katakana_v_sounds", true)

NormalizerNFKC100("unify_katakana_bu_sound", true)

NormalizerNFKC100("unify_to_romaji", true)

NormalizerNFKC100("unify_katakana_gu_small_sounds", true)

NormalizerNFKC100("unify_katakana_di_sound", true)

NormalizerNFKC100("unify_katakana_wo_sound", true)

NormalizerNFKC100("unify_katakana_zu_small_sounds", true)

NormalizerNFKC100("unify_katakana_du_sound", true)

NormalizerNFKC100("unify_katakana_trailing_o", true)

NormalizerNFKC100("unify_katakana_du_small_sounds", true)

NormalizerNFKC100("unify_kana_prolonged_sound_mark", true)

NormalizerNFKC100("unify_kana_hyphen", true)

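Added in version 8.0.3: unify_middle_dot, unify_katakana_v_sounds and unify_katakana_bu_sound are added.

Added in version 8.0.9: unify_to_romaji is added.

Added in version 13.0.0: unify_katakana_gu_small_sounds, unify_katakana_di_sound, unify_katakana_wo_sound, unify_katakana_zu_small_sounds, unify_katakana_du_sound, unify_katakana_trailing_o and unify_katakana_du_small_sounds are added.

Added in version 13.0.1: unify_kana_prolonged_sound_mark and unify_kana_hyphen are added.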

Specify multiple options:

NormalizerNFKC100("unify_to_romaji", true, "unify_kana_case", true, "unify_hyphen_and_prolonged_sound_mark", true)

NormalizerNFKC100 also accepts multiple options, as above. You can combine options other than the ones used in this example as well.
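If the normalizer string is assembled by a client program, the option syntax above can be generated mechanically. The following is a minimal sketch; normalizer_spec is a hypothetical helper, not part of Groonga or any client library, and it only emits options that are set to true, because that is the only form shown in this section.

```python
def normalizer_spec(name, enabled_options):
    """Format a normalizer name and its enabled options in the syntax shown above."""
    if not enabled_options:
        # With no options, the bare normalizer name is used.
        return name
    # Each option becomes a quoted name followed by the literal true.
    args = ", ".join(f'"{option}", true' for option in enabled_options)
    return f"{name}({args})"


print(normalizer_spec("NormalizerNFKC100", ["unify_kana", "unify_hyphen"]))
```

The resulting string can then be passed wherever this section passes a quoted normalizer, for example as the first argument of the normalize command.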

7.7.2.2.3. Usage#

7.7.2.2.3.1. Simple usage#

Here is an example of NormalizerNFKC100. NormalizerNFKC100 normalizes text by Unicode NFKC (Normalization Form Compatibility Composition) for Unicode version 10.0.

Execution example:

normalize NormalizerNFKC100 "©" WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "©",
#     "types": [
#       "emoji",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_kana option.

This option treats characters with the same pronunciation in full-width Hiragana, full-width Katakana and half-width Katakana as the same character, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_kana", true)'   "あイウェおヽヾ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "あいうぇおゝゞ",
#     "types": [
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_kana_case option.

This option treats the large and small versions of the same letter in full-width Hiragana, full-width Katakana and half-width Katakana as the same character, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_kana_case", true)'   "ぁあぃいぅうぇえぉおゃやゅゆょよゎわゕかゖけ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ああいいううええおおややゆゆよよわわかかけけ",
#     "types": [
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Execution example:

normalize   'NormalizerNFKC100("unify_kana_case", true)'   "ァアィイゥウェエォオャヤュユョヨヮワヵカヶケ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "アアイイウウエエオオヤヤユユヨヨワワカカケケ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_kana_voiced_sound_mark option.

This option treats letters with and without a voiced sound mark or semi-voiced sound mark in full-width Hiragana, full-width Katakana and half-width Katakana as the same character, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_kana_voiced_sound_mark", true)'   "かがきぎくぐけげこごさざしじすずせぜそぞただちぢつづてでとどはばぱひびぴふぶぷへべぺほぼぽ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "かかききくくけけここささししすすせせそそたたちちつつててととはははひひひふふふへへへほほほ",
#     "types": [
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "hiragana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Execution example:

normalize   'NormalizerNFKC100("unify_kana_voiced_sound_mark", true)'   "カガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドハバパヒビピフブプヘベペホボポ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "カカキキククケケココササシシススセセソソタタチチツツテテトトハハハヒヒヒフフフヘヘヘホホホ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_hyphen option. This option normalizes hyphens to “-” (U+002D HYPHEN-MINUS), as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_hyphen", true)'   "-˗֊‐‑‒–⁃⁻₋−"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "-----------",
#     "types": [
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_prolonged_sound_mark option. This option normalizes prolonged sound marks to “ー” (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK), as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_prolonged_sound_mark", true)'   "ー—―─━ー"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ーーーーーー",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_hyphen_and_prolonged_sound_mark option. This option normalizes both hyphens and prolonged sound marks to “-” (U+002D HYPHEN-MINUS), as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_hyphen_and_prolonged_sound_mark", true)'   "-˗֊‐‑‒–⁃⁻₋− ﹣- ー—―─━ー"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "----------- -- ------",
#     "types": [
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "others",
#       "symbol",
#       "symbol",
#       "others",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_middle_dot option. This option normalizes middle dots to “·” (U+00B7 MIDDLE DOT), as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_middle_dot", true)'   "·ᐧ•∙⋅⸱・・"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "········",
#     "types": [
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "symbol",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_v_sounds option. This option normalizes “ヴァ”, “ヴィ”, “ヴ”, “ヴェ” and “ヴォ” to “バ”, “ビ”, “ブ”, “ベ” and “ボ” respectively, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_v_sounds", true)'   "ヴァヴィヴヴェヴォヴ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "バビブベボブ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_bu_sound option. This option normalizes each of “ヴァ”, “ヴィ”, “ヴゥ”, “ヴェ” and “ヴォ” to “ブ”, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_bu_sound", true)'   "ヴァヴィヴヴェヴォヴ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ブブブブブブ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_to_romaji option. This option normalizes hiragana and katakana to romaji, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_to_romaji", true)'   "アァイィウゥエェオォ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "axaixiuxuexeoxo",
#     "types": [
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_gu_small_sounds option. This option normalizes “グァグィグェグォ” to “ガギゲゴ”, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_gu_small_sounds", true)'   "グァグィグェグォ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ガギゲゴ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_di_sound option. This option normalizes “ヂ” to “ジ”, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_di_sound", true)'   "ヂ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ジ",
#     "types": [
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_wo_sound option. This option normalizes “ヲ” to “オ”, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_wo_sound", true)'   "ヲ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "オ",
#     "types": [
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_zu_small_sounds option. This option normalizes “ズァズィズェズォ” to “ザジゼゾ”, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_zu_small_sounds", true)'   "ズァズィズェズォ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ザジゼゾ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_du_sound option. This option normalizes “ヅ” to “ズ”, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_du_sound", true)'   "ヅ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ズ",
#     "types": [
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_trailing_o option. This option normalizes “オ” to “ウ” when the vowel of the previous letter is “オ”, as shown below.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_trailing_o", true)'   "オオコオソオトオノオ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "オウコウソウトウノウ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_katakana_du_small_sounds option. This option normalizes “ヅァヅィヅェヅォ” to “ザジゼゾ”.

Execution example:

normalize   'NormalizerNFKC100("unify_katakana_du_small_sounds", true)'   "ヅァヅィヅェヅォ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "ザジゼゾ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_kana_prolonged_sound_mark option. This option normalizes “ー” (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK) to the vowel of the previous kana letter.

If the previous kana letter is “ん”, “ー” is normalized to “ん”. If the previous kana letter is “ン”, “ー” is normalized to “ン”.

Execution example:

normalize   'NormalizerNFKC100("unify_kana_prolonged_sound_mark", true)'   "カーキークーケーコー"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "カアキイクウケエコオ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

Here is an example of the unify_kana_hyphen option. This option normalizes “-” (U+002D HYPHEN-MINUS) to the vowel of the previous kana letter.

If the previous kana letter is “ん”, “-” is normalized to “ん”. If the previous kana letter is “ン”, “-” is normalized to “ン”.

Execution example:

normalize   'NormalizerNFKC100("unify_kana_hyphen", true)'   "カ-キ-ク-ケ-コ-"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "カアキイクウケエコオ",
#     "types": [
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "katakana",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

7.7.2.2.3.2. Advanced usage#

You can output the romaji of a specific part of speech by combining TokenMecab and NormalizerNFKC100, as shown below.

First, extract the readings of nouns, excluding non-independent words and personal-name suffixes, with the target_class and include_reading options.

Next, normalize the extracted readings with the unify_to_romaji option of NormalizerNFKC100.

Execution example:

tokenize 'TokenMecab("target_class", "-名詞/非自立", "target_class", "-名詞/接尾/人名", "target_class", "名詞", "include_reading", true)' '彼の名前は山田さんのはずです。'
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     {
#       "value": "彼",
#       "position": 0,
#       "force_prefix": false,
#       "force_prefix_search": false,
#       "metadata": {
#         "reading": "カレ"
#       }
#     },
#     {
#       "value": "名前",
#       "position": 1,
#       "force_prefix": false,
#       "force_prefix_search": false,
#       "metadata": {
#         "reading": "ナマエ"
#       }
#     },
#     {
#       "value": "山田",
#       "position": 2,
#       "force_prefix": false,
#       "force_prefix_search": false,
#       "metadata": {
#         "reading": "ヤマダ"
#       }
#     }
#   ]
# ]
normalize   'NormalizerNFKC100("unify_to_romaji", true)'   "カレ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "kare",
#     "types": [
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]
normalize   'NormalizerNFKC100("unify_to_romaji", true)'   "ナマエ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "namae",
#     "types": [
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]
normalize   'NormalizerNFKC100("unify_to_romaji", true)'   "ヤマダ"   WITH_TYPES
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "normalized": "yamada",
#     "types": [
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "alpha",
#       "null"
#     ],
#     "checks": [
#
#     ]
#   }
# ]

7.7.2.2.4. Parameters#

7.7.2.2.4.1. Optional parameter#

The optional parameters are as follows.

7.7.2.2.4.1.1. unify_kana#

This option treats characters with the same pronunciation in full-width Hiragana, full-width Katakana and half-width Katakana as the same character.
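The idea can be sketched in Python as NFKC (which folds half-width katakana to full-width) followed by a katakana-to-hiragana shift. This is an illustration only, not Groonga's implementation; the function name is hypothetical, and folding toward hiragana is inferred from the execution example in the usage section.

```python
import unicodedata

# Katakana U+30A1..U+30F6 and the iteration marks U+30FD and U+30FE sit
# exactly 0x60 above their hiragana counterparts.
_KATAKANA_TO_HIRAGANA = {
    code: code - 0x60
    for code in [*range(0x30A1, 0x30F7), 0x30FD, 0x30FE]
}


def unify_kana_sketch(text: str) -> str:
    """Fold half-width and full-width katakana to hiragana."""
    folded = unicodedata.normalize("NFKC", text)
    return folded.translate(_KATAKANA_TO_HIRAGANA)


# The input mixes hiragana, full-width katakana and half-width katakana.
print(unify_kana_sketch("あイ\uff73\uff6aおヽヾ"))
```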

7.7.2.2.4.1.2. unify_kana_case#

This option treats the large and small versions of the same letter in full-width Hiragana, full-width Katakana and half-width Katakana as the same character.
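A minimal Python sketch of the same idea is a character translation table. The pairs below are exactly the ones that appear in the execution examples of this section; whether Groonga handles further small letters is not stated here, so none are assumed. The function name is hypothetical.

```python
# Small letters and their large counterparts, position by position.
_SMALL = "ぁぃぅぇぉゃゅょゎゕゖァィゥェォャュョヮヵヶ"
_LARGE = "あいうえおやゆよわかけアイウエオヤユヨワカケ"
_SMALL_TO_LARGE = str.maketrans(_SMALL, _LARGE)


def unify_kana_case_sketch(text: str) -> str:
    """Replace each small kana letter with its large version."""
    return text.translate(_SMALL_TO_LARGE)


print(unify_kana_case_sketch("ぁあぃいゃやヵカヶケ"))
```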

7.7.2.2.4.1.3. unify_kana_voiced_sound_mark#

This option treats letters with and without a voiced sound mark or semi-voiced sound mark in full-width Hiragana, full-width Katakana and half-width Katakana as the same character.
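One way to picture this in Python is to decompose the text, drop the combining voiced (U+3099) and semi-voiced (U+309A) sound marks, and recompose. This is a sketch under that assumption, with a hypothetical function name; it also strips the mark from letters outside the rows shown in the execution examples, which may or may not match Groonga.

```python
import unicodedata

# Combining voiced and semi-voiced sound marks, mapped to None so that
# str.translate deletes them.
_SOUND_MARKS = {0x3099: None, 0x309A: None}


def unify_kana_voiced_sound_mark_sketch(text: str) -> str:
    """Remove voiced and semi-voiced sound marks from kana letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.translate(_SOUND_MARKS))


print(unify_kana_voiced_sound_mark_sketch("かがはばぱ"))
```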

7.7.2.2.4.1.4. unify_hyphen#

This option normalizes hyphens to “-” (U+002D HYPHEN-MINUS).

The following hyphens are normalized.

  • “-” (U+002D HYPHEN-MINUS)

  • “֊” (U+058A ARMENIAN HYPHEN)

  • “˗” (U+02D7 MODIFIER LETTER MINUS SIGN)

  • “‐” (U+2010 HYPHEN)

  • “—” (U+2014 EM DASH)

  • “⁃” (U+2043 HYPHEN BULLET)

  • “⁻” (U+207B SUPERSCRIPT MINUS)

  • “₋” (U+208B SUBSCRIPT MINUS)

  • “−” (U+2212 MINUS SIGN)
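The list above amounts to a many-to-one character mapping. A minimal Python sketch of such a mapping follows; the code points are copied from the list as written, the function name is hypothetical, and no other normalization step is applied.

```python
# Code points copied from the list above; all map to U+002D HYPHEN-MINUS.
_HYPHEN_LIKE = [0x002D, 0x058A, 0x02D7, 0x2010, 0x2014, 0x2043, 0x207B, 0x208B, 0x2212]
_HYPHEN_TABLE = {code: "-" for code in _HYPHEN_LIKE}


def unify_hyphen_sketch(text: str) -> str:
    """Replace every listed hyphen-like character with U+002D."""
    return text.translate(_HYPHEN_TABLE)


# U+2212 MINUS SIGN and U+2010 HYPHEN both become "-".
print(unify_hyphen_sketch("10\u22123 and well\u2010known"))
```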

7.7.2.2.4.1.5. unify_prolonged_sound_mark#

This option normalizes prolonged sound marks to “ー” (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK).

The following prolonged sound marks are normalized.

  • “—” (U+2014 EM DASH)

  • “―” (U+2015 HORIZONTAL BAR)

  • “─” (U+2500 BOX DRAWINGS LIGHT HORIZONTAL)

  • “━” (U+2501 BOX DRAWINGS HEAVY HORIZONTAL)

  • “ー” (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK)

  • “ｰ” (U+FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK)

7.7.2.2.4.1.6. unify_hyphen_and_prolonged_sound_mark#

This option normalizes both hyphens and prolonged sound marks to “-” (U+002D HYPHEN-MINUS).

The following hyphens and prolonged sound marks are normalized.

  • “-” (U+002D HYPHEN-MINUS)

  • “֊” (U+058A ARMENIAN HYPHEN)

  • “˗” (U+02D7 MODIFIER LETTER MINUS SIGN)

  • “‐” (U+2010 HYPHEN)

  • “—” (U+2014 EM DASH)

  • “⁃” (U+2043 HYPHEN BULLET)

  • “⁻” (U+207B SUPERSCRIPT MINUS)

  • “₋” (U+208B SUBSCRIPT MINUS)

  • “−” (U+2212 MINUS SIGN)

  • “―” (U+2015 HORIZONTAL BAR)

  • “─” (U+2500 BOX DRAWINGS LIGHT HORIZONTAL)

  • “━” (U+2501 BOX DRAWINGS HEAVY HORIZONTAL)

  • “ー” (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK)

  • “ｰ” (U+FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK)

7.7.2.2.4.1.7. unify_middle_dot#

Added in version 8.0.3.

This option normalizes middle dots to “·” (U+00B7 MIDDLE DOT).

The following middle dots are normalized.

  • “·” (U+00B7 MIDDLE DOT)

  • “ᐧ” (U+1427 CANADIAN SYLLABICS FINAL MIDDLE DOT)

  • “•” (U+2022 BULLET)

  • “∙” (U+2219 BULLET OPERATOR)

  • “⋅” (U+22C5 DOT OPERATOR)

  • “⸱” (U+2E31 WORD SEPARATOR MIDDLE DOT)

  • “・” (U+30FB KATAKANA MIDDLE DOT)

  • “･” (U+FF65 HALFWIDTH KATAKANA MIDDLE DOT)

7.7.2.2.4.1.8. unify_katakana_v_sounds#

Added in version 8.0.3.

This option normalizes “ヴァ”, “ヴィ”, “ヴ”, “ヴェ” and “ヴォ” to “バ”, “ビ”, “ブ”, “ベ” and “ボ” respectively.
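Because some of the source sequences are two characters long and one is a bare “ヴ”, the longer sequences have to be matched first. The following Python sketch shows one way to do that with a regular expression; it is an illustration with a hypothetical function name, not Groonga's implementation.

```python
import re

_V_SOUNDS = {"ヴァ": "バ", "ヴィ": "ビ", "ヴェ": "ベ", "ヴォ": "ボ", "ヴ": "ブ"}
# Longer keys come first in the alternation, so a bare "ヴ" only matches
# when none of the listed small vowels follows it.
_V_PATTERN = re.compile("|".join(sorted(_V_SOUNDS, key=len, reverse=True)))


def unify_katakana_v_sounds_sketch(text: str) -> str:
    """Rewrite katakana v sounds as the corresponding b sounds."""
    return _V_PATTERN.sub(lambda match: _V_SOUNDS[match.group(0)], text)


print(unify_katakana_v_sounds_sketch("ヴァヴィヴヴェヴォヴ"))
```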

7.7.2.2.4.1.9. unify_katakana_bu_sound#

Added in version 8.0.3.

This option normalizes each of “ヴァ”, “ヴィ”, “ヴゥ”, “ヴェ” and “ヴォ” to “ブ”.

7.7.2.2.4.1.10. unify_to_romaji#

Added in version 8.0.9.

This option normalizes hiragana and katakana to romaji.

7.7.2.2.4.1.11. unify_katakana_gu_small_sounds#

Added in version 13.0.0.

This option normalizes “グァグィグェグォ” to “ガギゲゴ”.

7.7.2.2.4.1.12. unify_katakana_di_sound#

Added in version 13.0.0.

This option normalizes “ヂ” to “ジ”.

7.7.2.2.4.1.13. unify_katakana_wo_sound#

Added in version 13.0.0.

This option normalizes “ヲ” to “オ”.

7.7.2.2.4.1.14. unify_katakana_zu_small_sounds#

Added in version 13.0.0.

This option normalizes “ズァズィズェズォ” to “ザジゼゾ”.

7.7.2.2.4.1.15. unify_katakana_du_sound#

Added in version 13.0.0.

This option normalizes “ヅ” to “ズ”.

7.7.2.2.4.1.16. unify_katakana_trailing_o#

Added in version 13.0.0.

This option normalizes “オ” to “ウ” when the vowel of the previous letter is “オ”.

  • “ォオ” -> “ォウ”

  • “オオ” -> “オウ”

  • “コオ” -> “コウ”

  • “ソオ” -> “ソウ”

  • “トオ” -> “トウ”

  • “ノオ” -> “ノウ”

  • “ホオ” -> “ホウ”

  • “モオ” -> “モウ”

  • “ョオ” -> “ョウ”

  • “ヨオ” -> “ヨウ”

  • “ロオ” -> “ロウ”

  • “ゴオ” -> “ゴウ”

  • “ゾオ” -> “ゾウ”

  • “ドオ” -> “ドウ”

  • “ボオ” -> “ボウ”

  • “ポオ” -> “ポウ”

  • “ヺオ” -> “ヺウ”
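The rule in the list above can be sketched in Python with a lookbehind: an “オ” is rewritten only when the character before it is one of the listed letters. The letter set is copied from the left-hand sides of the list and the function name is hypothetical. Runs of three or more “オ” are not covered by this section, so the sketch's behavior on them is an unverified assumption.

```python
import re

# Katakana whose vowel is "オ", copied from the left-hand sides of the list above.
_O_VOWEL_KANA = "ォオコソトノホモョヨロゴゾドボポヺ"
# Match "オ" only when it directly follows one of those letters.
_TRAILING_O = re.compile(f"(?<=[{_O_VOWEL_KANA}])オ")


def unify_katakana_trailing_o_sketch(text: str) -> str:
    """Rewrite a trailing "オ" as "ウ" after an o-vowel katakana."""
    return _TRAILING_O.sub("ウ", text)


print(unify_katakana_trailing_o_sketch("オオコオソオトオノオ"))
```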

7.7.2.2.4.1.17. unify_katakana_du_small_sounds#

Added in version 13.0.0.

This option normalizes “ヅァヅィヅェヅォ” to “ザジゼゾ”.

7.7.2.2.4.1.18. unify_kana_prolonged_sound_mark#

Added in version 13.0.1.

This option normalizes “ー” (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK) to the vowel of the previous kana letter.

If the previous kana letter is “ん”, “ー” is normalized to “ん”. If the previous kana letter is “ン”, “ー” is normalized to “ン”.

ァー -> ァア, アー -> アア, ヵー -> ヵア, カー -> カア, ガー -> ガア, サー -> サア, ザー -> ザア,
ター -> タア, ダー -> ダア, ナー -> ナア, ハー -> ハア, バー -> バア, パー -> パア, マー -> マア,
ャー -> ャア, ヤー -> ヤア, ラー -> ラア, ヮー -> ヮア, ワー -> ワア, ヷー -> ヷア,
ィー -> ィイ, イー -> イイ, キー -> キイ, ギー -> ギイ, シー -> シイ, ジー -> ジイ, チー -> チイ,
ヂー -> ヂイ, ニー -> ニイ, ヒー -> ヒイ, ビー -> ビイ, ピー -> ピイ, ミー -> ミイ, リー -> リイ,
ヰー -> ヰイ, ヸー -> ヸイ,

ゥー -> ゥウ, ウー -> ウウ, クー -> クウ, グー -> グウ, スー -> スウ, ズー -> ズウ, ツー -> ツウ,
ヅー -> ヅウ, ヌー -> ヌウ, フー -> フウ, ブー -> ブウ, プー -> プウ, ムー -> ムウ, ュー -> ュウ,
ユー -> ユウ, ルー -> ルウ, ヱー -> ヱウ, ヴー -> ヴウ,

ェー -> ェエ, エー -> エエ, ヶー -> ヶエ, ケー -> ケエ, ゲー -> ゲエ, セー -> セエ, ゼー -> ゼエ,
テー -> テエ, デー -> デエ, ネー -> ネエ, ヘー -> ヘエ, ベー -> ベエ, ペー -> ペエ, メー -> メエ,
レー -> レエ, ヹー -> ヹエ,

ォー -> ォオ, オー -> オオ, コー -> コオ, ゴー -> ゴオ, ソー -> ソオ, ゾー -> ゾオ, トー -> トオ,
ドー -> ドオ, ノー -> ノオ, ホー -> ホオ, ボー -> ボオ, ポー -> ポオ, モー -> モオ, ョー -> ョオ,
ヨー -> ヨオ, ロー -> ロオ, ヲー -> ヲオ, ヺー -> ヺオ,

ンー -> ンン

ぁー -> ぁあ, あー -> ああ, ゕー -> ゕあ, かー -> かあ, がー -> があ, さー -> さあ, ざー -> ざあ,
たー -> たあ, だー -> だあ, なー -> なあ, はー -> はあ, ばー -> ばあ, ぱー -> ぱあ, まー -> まあ,
ゃー -> ゃあ, やー -> やあ, らー -> らあ, ゎー -> ゎあ, わー -> わあ

ぃー -> ぃい, いー -> いい, きー -> きい, ぎー -> ぎい, しー -> しい, じー -> じい, ちー -> ちい,
ぢー -> ぢい, にー -> にい, ひー -> ひい, びー -> びい, ぴー -> ぴい, みー -> みい, りー -> りい,
ゐー -> ゐい

ぅー -> ぅう, うー -> うう, くー -> くう, ぐー -> ぐう, すー -> すう, ずー -> ずう, つー -> つう,
づー -> づう, ぬー -> ぬう, ふー -> ふう, ぶー -> ぶう, ぷー -> ぷう, むー -> むう, ゅー -> ゅう,
ゆー -> ゆう, るー -> るう, ゑー -> ゑう, ゔー -> ゔう

ぇー -> ぇえ, えー -> ええ, ゖー -> ゖえ, けー -> けえ, げー -> げえ, せー -> せえ, ぜー -> ぜえ,
てー -> てえ, でー -> でえ, ねー -> ねえ, へー -> へえ, べー -> べえ, ぺー -> ぺえ, めー -> めえ,
れー -> れえ

ぉー -> ぉお, おー -> おお, こー -> こお, ごー -> ごお, そー -> そお, ぞー -> ぞお, とー -> とお,
どー -> どお, のー -> のお, ほー -> ほお, ぼー -> ぼお, ぽー -> ぽお, もー -> もお, ょー -> ょお,
よー -> よお, ろー -> ろお, をー -> をお

んー -> んん
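The table above is a lookup keyed on the previous letter. The following Python sketch covers only the katakana rows, which are copied from the table as written; hiragana would follow the same pattern. The function name is hypothetical, and the choice to look at the previous input character (rather than the previous output character) is an assumption that only matters for consecutive marks.

```python
# Replacement letter -> katakana that take it, copied from the table above.
_VOWEL_ROWS = {
    "ア": "ァアヵカガサザタダナハバパマャヤラヮワヷ",
    "イ": "ィイキギシジチヂニヒビピミリヰヸ",
    "ウ": "ゥウクグスズツヅヌフブプムュユルヱヴ",
    "エ": "ェエヶケゲセゼテデネヘベペメレヹ",
    "オ": "ォオコゴソゾトドノホボポモョヨロヲヺ",
    "ン": "ン",
}
_REPLACEMENT = {kana: vowel for vowel, row in _VOWEL_ROWS.items() for kana in row}


def unify_kana_prolonged_sound_mark_sketch(text: str) -> str:
    """Replace each prolonged sound mark with the vowel of the letter before it."""
    result = []
    previous = ""
    for char in text:
        if char == "ー" and previous in _REPLACEMENT:
            result.append(_REPLACEMENT[previous])
        else:
            # Marks with no known preceding letter are left unchanged.
            result.append(char)
        previous = char
    return "".join(result)


print(unify_kana_prolonged_sound_mark_sketch("カーキークーケーコー"))
```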

7.7.2.2.4.1.19. unify_kana_hyphen#

Added in version 13.0.1.

This option normalizes “-” (U+002D HYPHEN-MINUS) to the vowel of the previous kana letter.

If the previous kana letter is “ん”, “-” is normalized to “ん”. If the previous kana letter is “ン”, “-” is normalized to “ン”.

ァ- -> ァア, ア- -> アア, ヵ- -> ヵア, カ- -> カア, ガ- -> ガア, サ- -> サア, ザ- -> ザア,
タ- -> タア, ダ- -> ダア, ナ- -> ナア, ハ- -> ハア, バ- -> バア, パ- -> パア, マ- -> マア,
ャ- -> ャア, ヤ- -> ヤア, ラ- -> ラア, ヮ- -> ヮア, ワ- -> ワア, ヷ- -> ヷア,
ィ- -> ィイ, イ- -> イイ, キ- -> キイ, ギ- -> ギイ, シ- -> シイ, ジ- -> ジイ, チ- -> チイ,
ヂ- -> ヂイ, ニ- -> ニイ, ヒ- -> ヒイ, ビ- -> ビイ, ピ- -> ピイ, ミ- -> ミイ, リ- -> リイ,
ヰ- -> ヰイ, ヸ- -> ヸイ,

ゥ- -> ゥウ, ウ- -> ウウ, ク- -> クウ, グ- -> グウ, ス- -> スウ, ズ- -> ズウ, ツ- -> ツウ,
ヅ- -> ヅウ, ヌ- -> ヌウ, フ- -> フウ, ブ- -> ブウ, プ- -> プウ, ム- -> ムウ, ュ- -> ュウ,
ユ- -> ユウ, ル- -> ルウ, ヱ- -> ヱウ, ヴ- -> ヴウ,

ェ- -> ェエ, エ- -> エエ, ヶ- -> ヶエ, ケ- -> ケエ, ゲ- -> ゲエ, セ- -> セエ, ゼ- -> ゼエ,
テ- -> テエ, デ- -> デエ, ネ- -> ネエ, ヘ- -> ヘエ, ベ- -> ベエ, ペ- -> ペエ, メ- -> メエ,
レ- -> レエ, ヹ- -> ヹエ,

ォ- -> ォオ, オ- -> オオ, コ- -> コオ, ゴ- -> ゴオ, ソ- -> ソオ, ゾ- -> ゾオ, ト- -> トオ,
ド- -> ドオ, ノ- -> ノオ, ホ- -> ホオ, ボ- -> ボオ, ポ- -> ポオ, モ- -> モオ, ョ- -> ョオ,
ヨ- -> ヨオ, ロ- -> ロオ, ヲ- -> ヲオ, ヺ- -> ヺオ,

ン- -> ンン

ぁ- -> ぁあ, あ- -> ああ, ゕ- -> ゕあ, か- -> かあ, が- -> があ, さ- -> さあ, ざ- -> ざあ,
た- -> たあ, だ- -> だあ, な- -> なあ, は- -> はあ, ば- -> ばあ, ぱ- -> ぱあ, ま- -> まあ,
ゃ- -> ゃあ, や- -> やあ, ら- -> らあ, ゎ- -> ゎあ, わ- -> わあ

ぃ- -> ぃい, い- -> いい, き- -> きい, ぎ- -> ぎい, し- -> しい, じ- -> じい, ち- -> ちい,
ぢ- -> ぢい, に- -> にい, ひ- -> ひい, び- -> びい, ぴ- -> ぴい, み- -> みい, り- -> りい,
ゐ- -> ゐい

ぅ- -> ぅう, う- -> うう, く- -> くう, ぐ- -> ぐう, す- -> すう, ず- -> ずう, つ- -> つう,
づ- -> づう, ぬ- -> ぬう, ふ- -> ふう, ぶ- -> ぶう, ぷ- -> ぷう, む- -> むう, ゅ- -> ゅう,
ゆ- -> ゆう, る- -> るう, ゑ- -> ゑう, ゔ- -> ゔう

ぇ- -> ぇえ, え- -> ええ, ゖ- -> ゖえ, け- -> けえ, げ- -> げえ, せ- -> せえ, ぜ- -> ぜえ,
て- -> てえ, で- -> でえ, ね- -> ねえ, へ- -> へえ, べ- -> べえ, ぺ- -> ぺえ, め- -> めえ,
れ- -> れえ

ぉ- -> ぉお, お- -> おお, こ- -> こお, ご- -> ごお, そ- -> そお, ぞ- -> ぞお, と- -> とお,
ど- -> どお, の- -> のお, ほ- -> ほお, ぼ- -> ぼお, ぽ- -> ぽお, も- -> もお, ょ- -> ょお,
よ- -> よお, ろ- -> ろお, を- -> をお

ん- -> んん

7.7.2.2.5. See also#