7.15.12. html_untag
7.15.12.1. Summary
html_untag strips HTML tags from HTML text and outputs plain text.
html_untag is used in --output_columns, which is described in output_columns.
7.15.12.2. Syntax
html_untag requires exactly one argument, html.
html_untag(html)
7.15.12.3. Requirements
html_untag requires Groonga 3.0.5 or later.
html_untag requires command version 2 or later.
7.15.12.4. Usage
Here are a schema definition and sample data used to show the usage.
Sample schema:
Execution example:
table_create WebClips TABLE_HASH_KEY ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
column_create WebClips content COLUMN_SCALAR ShortText
# [[0,1337566253.89858,0.000355720520019531],true]
Sample data:
Execution example:
load --table WebClips
[
{"_key": "http://groonga.org", "content": "groonga is <span class='emphasize'>fast</span>"},
{"_key": "http://mroonga.org", "content": "mroonga is <span class=\"emphasize\">fast</span>"},
]
# [[0,1337566253.89858,0.000355720520019531],2]
Here is a simple usage of the html_untag function, which strips HTML tags from the content column.
Execution example:
select WebClips --output_columns "html_untag(content)" --command_version 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "html_untag",
# null
# ]
# ],
# [
# "groonga is fast"
# ],
# [
# "mroonga is fast"
# ]
# ]
# ]
# ]
When you execute the above query, you can see that the “span” tag, including its “class” attribute, is stripped from the output.
Note that you must specify --command_version 2 to use the html_untag function.
7.15.12.5. Parameters
There is only one required parameter.
7.15.12.5.1. html
Specifies HTML text to be untagged.
7.15.12.6. Return value
html_untag returns the plain text that remains after the HTML tags are stripped from the HTML text.
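To make the idea of "tags removed, text kept" concrete, here is a rough Python analogue built on the standard html.parser module. It is only an illustration and not Groonga's implementation: TagStripper and untag are hypothetical names, and the handling of entities, comments, or malformed markup may differ from what html_untag does.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text nodes and ignores every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text found between tags.
        self.parts.append(data)


def untag(html):
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any buffered trailing text
    return "".join(stripper.parts)


print(untag("groonga is <span class='emphasize'>fast</span>"))
# prints: groonga is fast
```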