HTML::MyHTML is a fast, multithreaded HTML parser with no outside dependencies.
This parser is based on the MyHTML C library (version 4.0.4 is bundled).
- Asynchronous parsing, tree building and indexing
- Fully conformant with the HTML5 specification
- Manipulation of elements: add, change, delete and others (available in the C library; coming soon in Perl)
- Manipulation of element attributes: add, change, delete and others (available in the C library; coming soon in Perl)
- Supports the 39 character encodings defined by the specification at encoding.spec.whatwg.org
- Supports character encoding detection
- Supports Single Mode parsing
- Supports fragment parsing
- Supports parsing by chunks
- No outside dependencies
- Passes all tree construction tests from html5lib-tests
Build and install the module:
perl Makefile.PL
make
make test
make install
use utf8;
use strict;
use HTML::MyHTML;
my $body = "<div><span>Best of Fragments</span><a>click to make happy</a></div><div some=value></div>";
# init
my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1);
my $tree = $myhtml->new_tree();
# parse
$myhtml->parse($tree, MyENCODING_UTF_8, $body);
# print result
print "Print HTML Tree:\n";
$tree->document->print_children(*STDOUT);
print "\nGet all DIV elements of HTML Tree:\n";
my $list = $tree->get_elements_by_tag_name("div");
# or my $list = $tree->body()->get_nodes_by_tag_id(MyHTML_TAG_DIV);
foreach my $node (@$list) {
my $info = $node->info();
print "Tag id: ", $info->{tag_id}, "\n";
print "Tag name: ", $info->{tag}, "\n";
print "Namespace: ", $info->{namespace}, "\n";
print "Namespace id: ", $info->{namespace_id}, "\n";
my $attr = $info->{attr};
if (keys %$attr) {
print "Attributes: \n";
foreach my $key (keys %$attr) {
print "\t", "$key=\"", $attr->{$key}, "\"\n";
}
}
print "\n";
}
# or you can get span
# tree -> document -> HTML -> BODY -> DIV -> SPAN
my $span = $tree->document->child->last_child->child->child;
my $info_of_span = $span->info();
$tree->destroy();
Create a MyHTML object: allocate and initialize its resources.
# $opt[in] work options; they determine how threads are used. Default: MyHTML_OPTIONS_PARSE_MODE_SEPARATELY
# $thread_count[in] thread count; its effect depends on the chosen work options. Default: 1
# $out_status[out] status
my $myhtml = HTML::MyHTML->new($opt, $thread_count, $out_status);
Return: HTML::MyHTML if successful, otherwise a UNDEF value
Destroy a MyHTML object.
$myhtml->destroy();
Create an HTML::MyHTML::Tree object: allocate and initialize its resources.
my $tree = $myhtml->new_tree($out_status);
Return: HTML::MyHTML::Tree object if successful, otherwise a UNDEF value
Parse HTML
# $tree[in] previously created HTML::MyHTML::Tree object
# $encoding[in] input character encoding. Default: MyENCODING_UTF_8 or MyENCODING_DEFAULT or 0
# $html[in] HTML
my $status = $myhtml->parse($tree, $encoding, $html);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
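A minimal sketch of checking the status returned by parse. It assumes, as the synopsis suggests, that the constants are exported by default; the sample markup is illustrative only.

```perl
use strict;
use warnings;
use HTML::MyHTML;

# Create the parser and a tree; both return undef on failure.
my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

# parse() returns a status code; compare it with MyHTML_STATUS_OK.
my $status = $myhtml->parse($tree, MyENCODING_UTF_8, "<p>Hello</p>");
die "parse failed with status $status\n" if $status != MyHTML_STATUS_OK;

# Release resources: the tree first, then the parser.
$tree->destroy();
$myhtml->destroy();
```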
Parse a fragment of HTML
# $tree[in] previously created HTML::MyHTML::Tree object
# $encoding[in] input character encoding. Default: MyENCODING_UTF_8 or MyENCODING_DEFAULT or 0
# $html[in] HTML
# $tag_id[in] tag id of the fragment base (root). Default: MyHTML_TAG_DIV if set to 0
# $my_namespace[in] fragment namespace. Default: MyHTML_NAMESPACE_HTML if set to 0
my $status = $myhtml->parse_fragment($tree, $encoding, $html, $tag_id, $my_namespace);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
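The base tag matters because fragment parsing treats the input as if it appeared inside that element. The sketch below parses a table cell with a TR base tag. The input string is illustrative, and the expectation that the td element is kept comes from the HTML specification's fragment rules, not from a run against this module.

```perl
use strict;
use warnings;
use HTML::MyHTML;

my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

# Parse a table cell as if it appeared inside a <tr> element.
my $status = $myhtml->parse_fragment(
    $tree, MyENCODING_UTF_8, "<td>cell</td>",
    MyHTML_TAG_TR, MyHTML_NAMESPACE_HTML,
);
die "parse_fragment failed with status $status\n"
    if $status != MyHTML_STATUS_OK;

# Count the <td> elements that ended up in the tree.
my $cells      = $tree->get_elements_by_tag_name("td") || [];
my $cell_count = scalar @$cells;
print "td elements: $cell_count\n";

$tree->destroy();
$myhtml->destroy();
```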
Parse HTML in Single Mode, regardless of the options given when the MyHTML object was created.
# $tree[in] previously created HTML::MyHTML::Tree object
# $encoding[in] input character encoding. Default: MyENCODING_UTF_8 or MyENCODING_DEFAULT or 0
# $html[in] HTML
my $status = $myhtml->parse_single($tree, $encoding, $html);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
Parse a fragment of HTML in Single Mode, regardless of the options given when the MyHTML object was created.
# $tree[in] previously created HTML::MyHTML::Tree object
# $encoding[in] input character encoding. Default: MyENCODING_UTF_8 or MyENCODING_DEFAULT or 0
# $html[in] HTML
# $tag_id[in] tag id of the fragment base (root). Default: MyHTML_TAG_DIV if set to 0
# $my_namespace[in] fragment namespace. Default: MyHTML_NAMESPACE_HTML if set to 0
my $status = $myhtml->parse_fragment_single($tree, $encoding, $html, $tag_id, $my_namespace);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
Parse a chunk of HTML. To finish parsing, call the parse_chunk_end method.
my $status = $myhtml->parse_chunk($tree, $html);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
Parse a chunk of an HTML fragment. To finish parsing, call the parse_chunk_end method.
my $status = $myhtml->parse_chunk_fragment($tree, $html, $tag_id, $my_namespace);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
Parse a chunk of HTML in Single Mode.
my $status = $myhtml->parse_chunk_single($tree, $html);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
Parse a chunk of an HTML fragment in Single Mode, regardless of the options given when the MyHTML object was created.
my $status = $myhtml->parse_chunk_fragment_single($tree, $html, $tag_id, $my_namespace);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
Finish parsing HTML chunks
my $status = $myhtml->parse_chunk_end($tree);
Return: MyHTML_STATUS_OK if successful, otherwise an error status
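The chunk methods are meant to be called in sequence, with parse_chunk_end closing the input. A sketch with hard-coded chunks follows; in practice they might come from a socket or a file read loop. The chunk contents are illustrative, and the first chunk deliberately ends in the middle of a tag name.

```perl
use strict;
use warnings;
use HTML::MyHTML;

my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

# Feed the document piece by piece, checking each status.
my @chunks = ("<div><sp", "an>chunked</span>", "</div>");
for my $chunk (@chunks) {
    my $status = $myhtml->parse_chunk($tree, $chunk);
    die "parse_chunk failed with status $status\n"
        if $status != MyHTML_STATUS_OK;
}

# Tell the parser that no more input will arrive.
my $end_status = $myhtml->parse_chunk_end($tree);
die "parse_chunk_end failed with status $end_status\n"
    if $end_status != MyHTML_STATUS_OK;

# Inspect the result before releasing the tree.
my $spans      = $tree->get_elements_by_tag_name("span") || [];
my $span_count = scalar @$spans;
print "span elements: $span_count\n";

$tree->destroy();
$myhtml->destroy();
```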
Get HTML::MyHTML::Tag from a HTML::MyHTML
my $tag = $myhtml->get_tag();
Return: HTML::MyHTML::Tag if exists, otherwise a UNDEF value
Clear the tree's resources before a new parse
$tree->clean();
Destroy an HTML::MyHTML::Tree object
my $tree = $tree->destroy();
Return: UNDEF if successful, otherwise an HTML::MyHTML::Tree structure
Set Parse Flags for Tree
$tree->parse_flags_set($parse_flags);
Example:
$tree->parse_flags_set( MyHTML_TREE_PARSE_FLAGS_WITHOUT_BUILD_TREE|MyHTML_TREE_PARSE_FLAGS_WITHOUT_DOCTYPE_IN_TREE|MyHTML_TREE_PARSE_FLAGS_SKIP_WHITESPACE_TOKEN );
Get Parse Flags of Tree
my $parse_flags = $tree->parse_flags();
Return: myhtml_tree_parse_flags_t
Get HTML::MyHTML from a HTML::MyHTML::Tree object
my $myhtml = $tree->get_myhtml();
Return: HTML::MyHTML if exists, otherwise a UNDEF value
Get HTML::MyHTML::Tag from a HTML::MyHTML::Tree
my $tag = $tree->get_tag();
Return: HTML::MyHTML::Tag if exists, otherwise a UNDEF value
Get HTML::MyHTML::Tag::Index from a HTML::MyHTML::Tree
my $tag_index = $tree->get_tag_index();
Return: HTML::MyHTML::Tag::Index if exists, otherwise a UNDEF value
Get Tree Document (Root of Tree)
my $node = $tree->document();
Return: HTML::MyHTML::Tree::Node if successful, otherwise a UNDEF value
Get node HTML (Document -> HTML, Root of HTML Document)
my $node = $tree->html();
Return: HTML::MyHTML::Tree::Node if successful, otherwise a UNDEF value
Get node HEAD (Document -> HTML -> HEAD)
my $node = $tree->head();
Return: HTML::MyHTML::Tree::Node if successful, otherwise a UNDEF value
Get node BODY (Document -> HTML -> BODY)
my $node = $tree->body();
Return: HTML::MyHTML::Tree::Node if successful, otherwise a UNDEF value
my $mchar_async_t = $tree->get_mchar();
Return: mchar_async_t* if exists, otherwise a UNDEF value
my $id = $tree->get_mchar_node_id();
Return: node id
my $res = $tree->get_elements_by_tag_id($tag_id);
Return: array list of elements HTML::MyHTML::Tree::Node
my $res = $tree->get_elements_by_tag_name($tag_name);
Return: array list of elements HTML::MyHTML::Tree::Node
Set callback for tokens before processing.
Important! Perl only: do not use this callback with threaded parsing. Either build without threads, use the methods parse_single, parse_fragment_single, parse_chunk_single or parse_chunk_fragment_single, or create the MyHTML object with the MyHTML_OPTIONS_PARSE_MODE_SINGLE option.
$tree->callback_before_token_done_set($sub_callback [, $ctx]);
Set callback for tokens after processing
Important! Perl only: do not use this callback with threaded parsing. Either build without threads, use the methods parse_single, parse_fragment_single, parse_chunk_single or parse_chunk_fragment_single, or create the MyHTML object with the MyHTML_OPTIONS_PARSE_MODE_SINGLE option.
$tree->callback_after_token_done_set($sub_callback [, $ctx]);
Set callback for tree node after inserted
Important! Perl only: do not use this callback with threaded parsing. Either build without threads, use the methods parse_single, parse_fragment_single, parse_chunk_single or parse_chunk_fragment_single, or create the MyHTML object with the MyHTML_OPTIONS_PARSE_MODE_SINGLE option.
$tree->callback_node_insert_set($sub_callback [, $ctx]);
Set callback for tree node after removed
Important! Perl only: do not use this callback with threaded parsing. Either build without threads, use the methods parse_single, parse_fragment_single, parse_chunk_single or parse_chunk_fragment_single, or create the MyHTML object with the MyHTML_OPTIONS_PARSE_MODE_SINGLE option.
$tree->callback_node_remove_set($sub_callback [, $ctx]);
Get information of attribute: key, value, namespace
my $res = $attr->info();
Return: hash ref
Get attribute name (key)
my $res = $attr->name();
Return: name (key) if exists, otherwise an UNDEF value
Get attribute value
my $res = $attr->value();
Return: value if exists, otherwise an UNDEF value
Get next sibling attribute of one node
my $attr = $attr->next();
Return: HTML::MyHTML::Tree::Attr if exists, otherwise an UNDEF value
Get previous sibling attribute of one node
my $attr = $attr->prev();
Return: HTML::MyHTML::Tree::Attr if exists, otherwise an UNDEF value
Get attribute namespace
my $namespace = $attr->namespace();
Return: namespace id
Remove attribute reference. Do not release the resources
my $attr = $attr->remove($node);
Return: HTML::MyHTML::Tree::Attr if successful, otherwise a UNDEF value
Remove attribute and release allocated resources
$attr->delete($tree, $node);
Release allocated resources
$attr->free($tree);
Get information of node: tag name, tag id, namespace, namespace id, attr
my $res = $node->info();
Return: hash ref
Get next sibling node
my $node = $node->next();
Return: HTML::MyHTML::Tree::Node if exists, otherwise an UNDEF value
Get previous sibling node
my $node = $node->prev();
Return: HTML::MyHTML::Tree::Node if exists, otherwise an UNDEF value
Get parent node
my $node = $node->parent();
Return: HTML::MyHTML::Tree::Node if exists, otherwise an UNDEF value
Get child (first child) of node
my $node = $node->child();
Return: HTML::MyHTML::Tree::Node if exists, otherwise an UNDEF value
Get last child of node
my $node = $node->last_child();
Return: HTML::MyHTML::Tree::Node if exists, otherwise an UNDEF value
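The navigation methods combine into a simple sibling walk: start at child() and follow next() until it returns undef. A sketch that lists the direct children of BODY; the markup is illustrative.

```perl
use strict;
use warnings;
use HTML::MyHTML;

my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

my $status = $myhtml->parse($tree, MyENCODING_UTF_8,
    "<ul><li>a</li><li>b</li></ul><p>c</p>");
die "parse failed with status $status\n" if $status != MyHTML_STATUS_OK;

# Visit each direct child of <body>, first to last.
my @names;
for (my $node = $tree->body()->child(); defined $node; $node = $node->next()) {
    push @names, $node->tag_name();
}
print join(", ", @names), "\n";

$tree->destroy();
$myhtml->destroy();
```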
Get token node
my $token_node = $node->token();
Return: HTML::MyHTML::Token::Node if exists, otherwise an UNDEF value
Get nodes by attribute key of current node
my $nodes = $node->get_nodes_by_attribute_key($key [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
Get nodes by attribute value; exactly equal; like a [foo="bar"]
# $case_insensitive: 1 or 0
# $key: may be undef to search in all keys
my $nodes = $node->get_nodes_by_attribute_value($case_insensitive, $key, $value [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
Get nodes by attribute value; whitespace separated; like a [foo~="bar"]
# $case_insensitive: 1 or 0
# $key: may be undef to search in all keys
my $nodes = $node->get_nodes_by_attribute_value_whitespace_separated($case_insensitive, $key, $value [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
Get nodes by attribute value; value begins exactly with the string; like a [foo^="bar"]
# $case_insensitive: 1 or 0
# $key: may be undef to search in all keys
my $nodes = $node->get_nodes_by_attribute_value_begin($case_insensitive, $key, $value [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
Get nodes by attribute value; value ends exactly with the string; like a [foo$="bar"]
# $case_insensitive: 1 or 0
# $key: may be undef to search in all keys
my $nodes = $node->get_nodes_by_attribute_value_end($case_insensitive, $key, $value [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
Get nodes by attribute value; value contains the substring; like a [foo*="bar"]
# $case_insensitive: 1 or 0
# $key: may be undef to search in all keys
my $nodes = $node->get_nodes_by_attribute_value_contain($case_insensitive, $key, $value [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
Get nodes by attribute value; attribute value is a hyphen-separated list of values beginning with the string; like a [foo|="bar"]
# $case_insensitive: 1 or 0
# $key: may be undef to search in all keys
my $nodes = $node->get_nodes_by_attribute_value_hyphen_separated($case_insensitive, $key, $value [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
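The attribute search methods mirror CSS attribute selectors. The sketch below runs three of them against the BODY node. The markup and search strings are illustrative, and because the methods are documented as returning UNDEF when nothing is found, each result falls back to an empty array reference.

```perl
use strict;
use warnings;
use HTML::MyHTML;

my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

my $html = '<a class="ext" href="https://example.org/">one</a>'
         . '<a class="int" href="/local">two</a>';
my $status = $myhtml->parse($tree, MyENCODING_UTF_8, $html);
die "parse failed with status $status\n" if $status != MyHTML_STATUS_OK;

my $body = $tree->body();

# Exact match on one key, like [class="ext"].
my $exact = $body->get_nodes_by_attribute_value(0, "class", "ext") || [];

# Prefix match, like [href^="https://"].
my $secure = $body->get_nodes_by_attribute_value_begin(0, "href", "https://") || [];

# Substring match in any attribute (key is undef), ignoring case.
my $any = $body->get_nodes_by_attribute_value_contain(1, undef, "LOCAL") || [];

printf "exact=%d prefix=%d substring=%d\n",
    scalar @$exact, scalar @$secure, scalar @$any;

$tree->destroy();
$myhtml->destroy();
```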
Get nodes by tag id in node scope
my $nodes = $node->get_nodes_by_tag_id($tag_id [, $out_status]);
Return: HTML::MyHTML::Tree::Node ARRAY if exists, otherwise an UNDEF value
Release allocated resources
$node->free();
Remove node of tree
my $node = $node->remove();
Return: HTML::MyHTML::Tree::Node if successful, otherwise a UNDEF value
Remove node of tree and release allocated resources
$node->delete();
Remove nodes of tree recursively and release allocated resources
$node->delete_recursive();
Get node tag id
my $tag_id = $node->tag_id();
Return: tag_id
Get node namespace
my $namespace = $node->namespace();
Return: namespace id
Get tag name of a node
my $res = $node->tag_name();
Return: tag name
Node has self-closing flag?
my $bool = $node->is_close_self();
Return: 1 (true) or 0 (false)
Get first attribute of a node
my $attr = $node->attr_first();
Return: HTML::MyHTML::Tree::Attr if exists, otherwise an UNDEF value
Get last attribute of a node
my $attr = $node->attr_last();
Return: HTML::MyHTML::Tree::Attr if exists, otherwise an UNDEF value
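Attributes form a linked list, so attr_first() together with the attribute's next() method is enough to enumerate them. A sketch, assuming a value-less attribute may report either an empty or an undefined value; the markup is illustrative.

```perl
use strict;
use warnings;
use HTML::MyHTML;

my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

my $status = $myhtml->parse($tree, MyENCODING_UTF_8,
    '<div id="main" class="box" hidden></div>');
die "parse failed with status $status\n" if $status != MyHTML_STATUS_OK;

my $divs = $tree->get_elements_by_tag_name("div") || [];
my $div  = $divs->[0] or die "no div element found\n";

# Follow the attribute list from the first entry until next() returns undef.
my @pairs;
for (my $attr = $div->attr_first(); defined $attr; $attr = $attr->next()) {
    my $value = $attr->value();
    push @pairs, defined $value
        ? $attr->name() . '="' . $value . '"'
        : $attr->name();
}
print join(" ", @pairs), "\n";

$tree->destroy();
$myhtml->destroy();
```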
Add attribute to tree node
my $attr = $node->attr_add($key, $value, $encoding);
Return: HTML::MyHTML::Tree::Attr if successful, otherwise an UNDEF value
Remove attribute by key reference. Do not release the resources
my $attr = $node->attr_remove_by_key($key);
Return: HTML::MyHTML::Tree::Attr if successful, otherwise an UNDEF value
Get attribute by key
my $attr = $node->attr_by_key($key);
Return: HTML::MyHTML::Tree::Attr if exists, otherwise an UNDEF value
Get text of a node. Only for MyHTML_TAG__TEXT or MyHTML_TAG__COMMENT tags
my $res = $node->text();
Return: text if exists, otherwise an UNDEF value
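Because text() applies only to text and comment nodes, reading the text of an element means visiting its child text nodes. A sketch that collects the direct text children of a SPAN, using the sample string from the synopsis:

```perl
use strict;
use warnings;
use HTML::MyHTML;

my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

my $status = $myhtml->parse($tree, MyENCODING_UTF_8,
    "<span>Best of Fragments</span>");
die "parse failed with status $status\n" if $status != MyHTML_STATUS_OK;

my $spans = $tree->get_elements_by_tag_name("span") || [];
my $span  = $spans->[0] or die "no span element found\n";

# Keep only the text nodes among the direct children of the span.
my @parts;
for (my $node = $span->child(); defined $node; $node = $node->next()) {
    next unless $node->tag_id() == MyHTML_TAG__TEXT;
    my $text = $node->text();
    push @parts, $text if defined $text;
}
my $span_text = join("", @parts);
print "$span_text\n";

$tree->destroy();
$myhtml->destroy();
```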
Get myhtml_string_t object by Tree node
my $string = $node->string();
Return: HTML::MyHTML::String if exists, otherwise a NULL value
Print a node
$node->print($fh);
Print tree of a node. Print excluding current node
$node->print_childs($fh);
Print tree of a node. Print including current node
$node->print_all($fh);
Get information of token node: tag name, tag id, attr
my $res = $token_node->info($tree);
Return: hash ref
Get token node tag id
my $tag_id = $token_node->tag_id();
Return: tag_id
Get tag name of a token node
my $res = $token_node->tag_name($tree);
Return: tag name
Node has self-closing flag?
my $bool = $token_node->is_close_self();
Return: 1 (true) or 0 (false)
Get first attribute of a token node
my $attr = $token_node->attr_first();
Return: HTML::MyHTML::Tree::Attr if exists, otherwise an UNDEF value
Get last attribute of a token node
my $attr = $token_node->attr_last();
Return: HTML::MyHTML::Tree::Attr if exists, otherwise an UNDEF value
Get text of a token node. Only for MyHTML_TAG__TEXT or MyHTML_TAG__COMMENT tags
my $res = $token_node->text();
Return: text if exists, otherwise an UNDEF value
Get myhtml_string_t object by token node
my $string = $token_node->string();
Return: HTML::MyHTML::String if exists, otherwise a NULL value
Wait until the token has been processed by all parsing stages. Needed if you use thread mode
$token_node->wait_for_done();
Detect the character encoding of HTML from its tags. See the HTML specification
# $text[in] text
# $len[in] optional length to search. If not set, len == length($text)
my $encoding = prescan_stream_to_determine_encoding($text, 1024);
Return: encoding if found, otherwise MyENCODING_NOT_DETERMINED
Detect character encoding.
Currently detects UTF-8, UTF-16LE, UTF-16BE and the Russian encodings windows-1251, koi8-r, iso-8859-5, x-mac-cyrillic and ibm866. Others are in progress
# $text[in] text
# $out_encoding[out] detected encoding
my $bool = $myhtml->encoding_detect($text, $out_encoding);
Return: 1 (true) if encoding found, otherwise 0 (false)
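Detection pairs naturally with parse: detect first, then pass the result as the encoding argument. The sketch assumes that the [out] parameter is written into the scalar passed as the second argument, and it falls back to MyENCODING_DEFAULT if that does not happen. The input bytes are illustrative.

```perl
use strict;
use warnings;
use HTML::MyHTML;

my $myhtml = HTML::MyHTML->new(MyHTML_OPTIONS_DEFAULT, 1)
    or die "cannot create HTML::MyHTML object\n";
my $tree = $myhtml->new_tree()
    or die "cannot create tree\n";

# Raw UTF-8 bytes of a short Cyrillic word wrapped in a paragraph.
my $html = "<p>\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82</p>";

# Assumption: encoding_detect stores the detected id in $encoding.
my $encoding;
my $found = $myhtml->encoding_detect($html, $encoding);
$encoding = MyENCODING_DEFAULT unless $found && defined $encoding;

my $status = $myhtml->parse($tree, $encoding, $html);
die "parse failed with status $status\n" if $status != MyHTML_STATUS_OK;

$tree->destroy();
$myhtml->destroy();
```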
Detect Russian character encoding
Currently detects windows-1251, koi8-r, iso-8859-5, x-mac-cyrillic and ibm866
# $text[in] text
# $out_encoding[out] detected encoding
my $bool = $myhtml->encoding_detect_russian($text, $out_encoding);
Return: 1 (true) if encoding found, otherwise 0 (false)
Detect Unicode character encoding
Currently detects UTF-8, UTF-16LE and UTF-16BE
# $text[in] text
# $out_encoding[out] detected encoding
my $bool = $myhtml->encoding_detect_unicode($text, $out_encoding);
Return: 1 (true) if encoding found, otherwise 0 (false)
Detect Unicode character encoding by BOM
Currently detects UTF-8, UTF-16LE and UTF-16BE
# $text[in] text
# $out_encoding[out] detected encoding
my $bool = $myhtml->encoding_detect_bom($text, $out_encoding);
Return: 1 (true) if encoding found, otherwise 0 (false)
Get Incoming Buffer by position
my $incoming_buffer = $incoming_buffer->find_by_position($begin_position);
Return: HTML::Incoming::Buffer if successful, otherwise a UNDEF value
Get data of Incoming Buffer
my $data = $incoming_buffer->data();
Return: text scalar if successful, otherwise a UNDEF value
Get data length of Incoming Buffer
my $length = $incoming_buffer->length();
Return: scalar length
Get data size of Incoming Buffer
my $size = $incoming_buffer->size();
Return: scalar size
Get the data offset of the Incoming Buffer: the global position at which the buffer begins.
my $offset = $incoming_buffer->offset();
Return: scalar offset
Get Relative Position for Incoming Buffer. Incoming Buffer should be prepared by find_by_position.
my $relative_begin = $incoming_buffer->relative_begin();
Return: scalar relative begin
Get the amount of data available in the Incoming Buffer. The buffer may be incomplete; see next().
my $available_length = $incoming_buffer->available_length();
Return: scalar available length
Get next buffer
my $next_incoming_buffer = $incoming_buffer->next();
Return: HTML::Incoming::Buffer if exists, otherwise a UNDEF value
Get previous buffer
my $prev_incoming_buffer = $incoming_buffer->prev();
Return: HTML::Incoming::Buffer if exists, otherwise a UNDEF value
Get namespace text by namespace type (id)
my $namespace_name = namespace_name_by_id($namespace_id);
Return: text if successful, otherwise a UNDEF value
Get namespace type (id) by namespace text
my $namespace_id = namespace_id_by_name($namespace_name);
Return: namespace id
MyHTML_TAG__UNDEF
MyHTML_TAG__TEXT
MyHTML_TAG__COMMENT
MyHTML_TAG__DOCTYPE
MyHTML_TAG_A
MyHTML_TAG_ABBR
MyHTML_TAG_ACRONYM
MyHTML_TAG_ADDRESS
MyHTML_TAG_ANNOTATION_XML
MyHTML_TAG_APPLET
MyHTML_TAG_AREA
MyHTML_TAG_ARTICLE
MyHTML_TAG_ASIDE
MyHTML_TAG_AUDIO
MyHTML_TAG_B
MyHTML_TAG_BASE
MyHTML_TAG_BASEFONT
MyHTML_TAG_BDI
MyHTML_TAG_BDO
MyHTML_TAG_BGSOUND
MyHTML_TAG_BIG
MyHTML_TAG_BLINK
MyHTML_TAG_BLOCKQUOTE
MyHTML_TAG_BODY
MyHTML_TAG_BR
MyHTML_TAG_BUTTON
MyHTML_TAG_CANVAS
MyHTML_TAG_CAPTION
MyHTML_TAG_CENTER
MyHTML_TAG_CITE
MyHTML_TAG_CODE
MyHTML_TAG_COL
MyHTML_TAG_COLGROUP
MyHTML_TAG_COMMAND
MyHTML_TAG_COMMENT
MyHTML_TAG_DATALIST
MyHTML_TAG_DD
MyHTML_TAG_DEL
MyHTML_TAG_DETAILS
MyHTML_TAG_DFN
MyHTML_TAG_DIALOG
MyHTML_TAG_DIR
MyHTML_TAG_DIV
MyHTML_TAG_DL
MyHTML_TAG_DT
MyHTML_TAG_EM
MyHTML_TAG_EMBED
MyHTML_TAG_FIELDSET
MyHTML_TAG_FIGCAPTION
MyHTML_TAG_FIGURE
MyHTML_TAG_FONT
MyHTML_TAG_FOOTER
MyHTML_TAG_FORM
MyHTML_TAG_FRAME
MyHTML_TAG_FRAMESET
MyHTML_TAG_H1
MyHTML_TAG_H2
MyHTML_TAG_H3
MyHTML_TAG_H4
MyHTML_TAG_H5
MyHTML_TAG_H6
MyHTML_TAG_HEAD
MyHTML_TAG_HEADER
MyHTML_TAG_HGROUP
MyHTML_TAG_HR
MyHTML_TAG_HTML
MyHTML_TAG_I
MyHTML_TAG_IFRAME
MyHTML_TAG_IMAGE
MyHTML_TAG_IMG
MyHTML_TAG_INPUT
MyHTML_TAG_INS
MyHTML_TAG_ISINDEX
MyHTML_TAG_KBD
MyHTML_TAG_KEYGEN
MyHTML_TAG_LABEL
MyHTML_TAG_LEGEND
MyHTML_TAG_LI
MyHTML_TAG_LINK
MyHTML_TAG_LISTING
MyHTML_TAG_MAIN
MyHTML_TAG_MAP
MyHTML_TAG_MARK
MyHTML_TAG_MARQUEE
MyHTML_TAG_MENU
MyHTML_TAG_MENUITEM
MyHTML_TAG_META
MyHTML_TAG_METER
MyHTML_TAG_MTEXT
MyHTML_TAG_NAV
MyHTML_TAG_NOBR
MyHTML_TAG_NOEMBED
MyHTML_TAG_NOFRAMES
MyHTML_TAG_NOSCRIPT
MyHTML_TAG_OBJECT
MyHTML_TAG_OL
MyHTML_TAG_OPTGROUP
MyHTML_TAG_OPTION
MyHTML_TAG_OUTPUT
MyHTML_TAG_P
MyHTML_TAG_PARAM
MyHTML_TAG_PLAINTEXT
MyHTML_TAG_PRE
MyHTML_TAG_PROGRESS
MyHTML_TAG_Q
MyHTML_TAG_RB
MyHTML_TAG_RP
MyHTML_TAG_RT
MyHTML_TAG_RTC
MyHTML_TAG_RUBY
MyHTML_TAG_S
MyHTML_TAG_SAMP
MyHTML_TAG_SCRIPT
MyHTML_TAG_SECTION
MyHTML_TAG_SELECT
MyHTML_TAG_SMALL
MyHTML_TAG_SOURCE
MyHTML_TAG_SPAN
MyHTML_TAG_STRIKE
MyHTML_TAG_STRONG
MyHTML_TAG_STYLE
MyHTML_TAG_SUB
MyHTML_TAG_SUMMARY
MyHTML_TAG_SUP
MyHTML_TAG_SVG
MyHTML_TAG_TABLE
MyHTML_TAG_TBODY
MyHTML_TAG_TD
MyHTML_TAG_TEMPLATE
MyHTML_TAG_TEXTAREA
MyHTML_TAG_TFOOT
MyHTML_TAG_TH
MyHTML_TAG_THEAD
MyHTML_TAG_TIME
MyHTML_TAG_TITLE
MyHTML_TAG_TR
MyHTML_TAG_TRACK
MyHTML_TAG_TT
MyHTML_TAG_U
MyHTML_TAG_UL
MyHTML_TAG_VAR
MyHTML_TAG_VIDEO
MyHTML_TAG_WBR
MyHTML_TAG_XMP
MyHTML_TAG_ALTGLYPH
MyHTML_TAG_ALTGLYPHDEF
MyHTML_TAG_ALTGLYPHITEM
MyHTML_TAG_ANIMATE
MyHTML_TAG_ANIMATECOLOR
MyHTML_TAG_ANIMATEMOTION
MyHTML_TAG_ANIMATETRANSFORM
MyHTML_TAG_CIRCLE
MyHTML_TAG_CLIPPATH
MyHTML_TAG_COLOR_PROFILE
MyHTML_TAG_CURSOR
MyHTML_TAG_DEFS
MyHTML_TAG_DESC
MyHTML_TAG_ELLIPSE
MyHTML_TAG_FEBLEND
MyHTML_TAG_FECOLORMATRIX
MyHTML_TAG_FECOMPONENTTRANSFER
MyHTML_TAG_FECOMPOSITE
MyHTML_TAG_FECONVOLVEMATRIX
MyHTML_TAG_FEDIFFUSELIGHTING
MyHTML_TAG_FEDISPLACEMENTMAP
MyHTML_TAG_FEDISTANTLIGHT
MyHTML_TAG_FEDROPSHADOW
MyHTML_TAG_FEFLOOD
MyHTML_TAG_FEFUNCA
MyHTML_TAG_FEFUNCB
MyHTML_TAG_FEFUNCG
MyHTML_TAG_FEFUNCR
MyHTML_TAG_FEGAUSSIANBLUR
MyHTML_TAG_FEIMAGE
MyHTML_TAG_FEMERGE
MyHTML_TAG_FEMERGENODE
MyHTML_TAG_FEMORPHOLOGY
MyHTML_TAG_FEOFFSET
MyHTML_TAG_FEPOINTLIGHT
MyHTML_TAG_FESPECULARLIGHTING
MyHTML_TAG_FESPOTLIGHT
MyHTML_TAG_FETILE
MyHTML_TAG_FETURBULENCE
MyHTML_TAG_FILTER
MyHTML_TAG_FONT_FACE
MyHTML_TAG_FONT_FACE_FORMAT
MyHTML_TAG_FONT_FACE_NAME
MyHTML_TAG_FONT_FACE_SRC
MyHTML_TAG_FONT_FACE_URI
MyHTML_TAG_FOREIGNOBJECT
MyHTML_TAG_G
MyHTML_TAG_GLYPH
MyHTML_TAG_GLYPHREF
MyHTML_TAG_HKERN
MyHTML_TAG_LINE
MyHTML_TAG_LINEARGRADIENT
MyHTML_TAG_MARKER
MyHTML_TAG_MASK
MyHTML_TAG_METADATA
MyHTML_TAG_MISSING_GLYPH
MyHTML_TAG_MPATH
MyHTML_TAG_PATH
MyHTML_TAG_PATTERN
MyHTML_TAG_POLYGON
MyHTML_TAG_POLYLINE
MyHTML_TAG_RADIALGRADIENT
MyHTML_TAG_RECT
MyHTML_TAG_SET
MyHTML_TAG_STOP
MyHTML_TAG_SWITCH
MyHTML_TAG_SYMBOL
MyHTML_TAG_TEXT
MyHTML_TAG_TEXTPATH
MyHTML_TAG_TREF
MyHTML_TAG_TSPAN
MyHTML_TAG_USE
MyHTML_TAG_VIEW
MyHTML_TAG_VKERN
MyHTML_TAG_MATH
MyHTML_TAG_MACTION
MyHTML_TAG_MALIGNGROUP
MyHTML_TAG_MALIGNMARK
MyHTML_TAG_MENCLOSE
MyHTML_TAG_MERROR
MyHTML_TAG_MFENCED
MyHTML_TAG_MFRAC
MyHTML_TAG_MGLYPH
MyHTML_TAG_MI
MyHTML_TAG_MLABELEDTR
MyHTML_TAG_MLONGDIV
MyHTML_TAG_MMULTISCRIPTS
MyHTML_TAG_MN
MyHTML_TAG_MO
MyHTML_TAG_MOVER
MyHTML_TAG_MPADDED
MyHTML_TAG_MPHANTOM
MyHTML_TAG_MROOT
MyHTML_TAG_MROW
MyHTML_TAG_MS
MyHTML_TAG_MSCARRIES
MyHTML_TAG_MSCARRY
MyHTML_TAG_MSGROUP
MyHTML_TAG_MSLINE
MyHTML_TAG_MSPACE
MyHTML_TAG_MSQRT
MyHTML_TAG_MSROW
MyHTML_TAG_MSTACK
MyHTML_TAG_MSTYLE
MyHTML_TAG_MSUB
MyHTML_TAG_MSUP
MyHTML_TAG_MSUBSUP
MyHTML_TAG__END_OF_FILE
MyHTML_TAG_FIRST_ENTRY
MyHTML_TAG_LAST_ENTRY
MyHTML_TREE_PARSE_FLAGS_CLEAN
MyHTML_TREE_PARSE_FLAGS_WITHOUT_BUILD_TREE
MyHTML_TREE_PARSE_FLAGS_WITHOUT_PROCESS_TOKEN
MyHTML_TREE_PARSE_FLAGS_SKIP_WHITESPACE_TOKEN
MyHTML_TREE_PARSE_FLAGS_WITHOUT_DOCTYPE_IN_TREE
MyHTML_NAMESPACE_UNDEF
MyHTML_NAMESPACE_HTML
MyHTML_NAMESPACE_MATHML
MyHTML_NAMESPACE_SVG
MyHTML_NAMESPACE_XLINK
MyHTML_NAMESPACE_XML
MyHTML_NAMESPACE_XMLNS
MyHTML_NAMESPACE_LAST_ENTRY
MyENCODING_DEFAULT
MyENCODING_NOT_DETERMINED
MyENCODING_UTF_8
MyENCODING_UTF_16LE
MyENCODING_UTF_16BE
MyENCODING_X_USER_DEFINED
MyENCODING_BIG5
MyENCODING_EUC_JP
MyENCODING_EUC_KR
MyENCODING_GB18030
MyENCODING_GBK
MyENCODING_IBM866
MyENCODING_ISO_2022_JP
MyENCODING_ISO_8859_10
MyENCODING_ISO_8859_13
MyENCODING_ISO_8859_14
MyENCODING_ISO_8859_15
MyENCODING_ISO_8859_16
MyENCODING_ISO_8859_2
MyENCODING_ISO_8859_3
MyENCODING_ISO_8859_4
MyENCODING_ISO_8859_5
MyENCODING_ISO_8859_6
MyENCODING_ISO_8859_7
MyENCODING_ISO_8859_8
MyENCODING_ISO_8859_8_I
MyENCODING_KOI8_R
MyENCODING_KOI8_U
MyENCODING_MACINTOSH
MyENCODING_SHIFT_JIS
MyENCODING_WINDOWS_1250
MyENCODING_WINDOWS_1251
MyENCODING_WINDOWS_1252
MyENCODING_WINDOWS_1253
MyENCODING_WINDOWS_1254
MyENCODING_WINDOWS_1255
MyENCODING_WINDOWS_1256
MyENCODING_WINDOWS_1257
MyENCODING_WINDOWS_1258
MyENCODING_WINDOWS_874
MyENCODING_X_MAC_CYRILLIC
MyENCODING_LAST_ENTRY
MyCORE_STATUS_OK
MyCORE_STATUS_ERROR
MyCORE_STATUS_ERROR_MEMORY_ALLOCATION
MyCORE_STATUS_THREAD_ERROR_MEMORY_ALLOCATION
MyCORE_STATUS_THREAD_ERROR_LIST_INIT
MyCORE_STATUS_THREAD_ERROR_ATTR_MALLOC
MyCORE_STATUS_THREAD_ERROR_ATTR_INIT
MyCORE_STATUS_THREAD_ERROR_ATTR_SET
MyCORE_STATUS_THREAD_ERROR_ATTR_DESTROY
MyCORE_STATUS_THREAD_ERROR_NO_SLOTS
MyCORE_STATUS_THREAD_ERROR_BATCH_INIT
MyCORE_STATUS_THREAD_ERROR_WORKER_MALLOC
MyCORE_STATUS_THREAD_ERROR_WORKER_SEM_CREATE
MyCORE_STATUS_THREAD_ERROR_WORKER_THREAD_CREATE
MyCORE_STATUS_THREAD_ERROR_MASTER_THREAD_CREATE
MyCORE_STATUS_THREAD_ERROR_SEM_PREFIX_MALLOC
MyCORE_STATUS_THREAD_ERROR_SEM_CREATE
MyCORE_STATUS_THREAD_ERROR_QUEUE_MALLOC
MyCORE_STATUS_THREAD_ERROR_QUEUE_NODES_MALLOC
MyCORE_STATUS_THREAD_ERROR_QUEUE_NODE_MALLOC
MyCORE_STATUS_THREAD_ERROR_MUTEX_MALLOC
MyCORE_STATUS_THREAD_ERROR_MUTEX_INIT
MyCORE_STATUS_THREAD_ERROR_MUTEX_LOCK
MyCORE_STATUS_THREAD_ERROR_MUTEX_UNLOCK
MyCORE_STATUS_PERF_ERROR_COMPILED_WITHOUT_PERF
MyCORE_STATUS_PERF_ERROR_FIND_CPU_CLOCK
MyCORE_STATUS_MCOBJECT_ERROR_CACHE_CREATE
MyCORE_STATUS_MCOBJECT_ERROR_CHUNK_CREATE
MyCORE_STATUS_MCOBJECT_ERROR_CHUNK_INIT
MyCORE_STATUS_MCOBJECT_ERROR_CACHE_REALLOC
MyCORE_STATUS_ASYNC_ERROR_LOCK
MyCORE_STATUS_ASYNC_ERROR_UNLOCK
MyCORE_STATUS_ERROR_NO_FREE_SLOT
MyHTML_STATUS_OK
MyHTML_STATUS_ERROR
MyHTML_STATUS_ERROR_MEMORY_ALLOCATION
MyHTML_STATUS_RULES_ERROR_MEMORY_ALLOCATION
MyHTML_STATUS_TOKENIZER_ERROR_MEMORY_ALLOCATION
MyHTML_STATUS_TOKENIZER_ERROR_FRAGMENT_INIT
MyHTML_STATUS_TAGS_ERROR_MEMORY_ALLOCATION
MyHTML_STATUS_TAGS_ERROR_MCOBJECT_CREATE
MyHTML_STATUS_TAGS_ERROR_MCOBJECT_MALLOC
MyHTML_STATUS_TAGS_ERROR_MCOBJECT_CREATE_NODE
MyHTML_STATUS_TAGS_ERROR_CACHE_MEMORY_ALLOCATION
MyHTML_STATUS_TAGS_ERROR_INDEX_MEMORY_ALLOCATION
MyHTML_STATUS_TREE_ERROR_MEMORY_ALLOCATION
MyHTML_STATUS_TREE_ERROR_MCOBJECT_CREATE
MyHTML_STATUS_TREE_ERROR_MCOBJECT_INIT
MyHTML_STATUS_TREE_ERROR_MCOBJECT_CREATE_NODE
MyHTML_STATUS_TREE_ERROR_INCOMING_BUFFER_CREATE
MyHTML_STATUS_ATTR_ERROR_ALLOCATION
MyHTML_STATUS_ATTR_ERROR_CREATE
MyHTML_STATUS_STREAM_BUFFER_ERROR_CREATE
MyHTML_STATUS_STREAM_BUFFER_ERROR_INIT
MyHTML_STATUS_STREAM_BUFFER_ENTRY_ERROR_CREATE
MyHTML_STATUS_STREAM_BUFFER_ENTRY_ERROR_INIT
MyHTML_STATUS_STREAM_BUFFER_ERROR_ADD_ENTRY
MyENCODING_STATUS_OK
MyENCODING_STATUS_ERROR
MyENCODING_STATUS_CONTINUE
MyENCODING_STATUS_DONE
MyHTML_OPTIONS_DEFAULT
MyHTML_OPTIONS_PARSE_MODE_SINGLE
MyHTML_OPTIONS_PARSE_MODE_ALL_IN_ONE
MyHTML_OPTIONS_PARSE_MODE_SEPARATELY
See the example directory in this module
$myhtml->destroy();
Free memory and destroy the object.
Alexander Borisov
Copyright (C) 2015-2018 Alexander Borisov
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA