HTMLPurifier_Lexer_PH5P
Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.
Occupies space in the HTML5 pseudo-namespace, which may cause conflicts.
- Full name:
\HTMLPurifier_Lexer_PH5P
- Parent class:
\HTMLPurifier_Lexer_DOMLex
Methods
tokenizeHTML
Lexes an HTML string into tokens.
public tokenizeHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): \HTMLPurifier_Token[]
Parameters:
Parameter | Type | Description |
---|---|---|
$html | string | |
$config | \HTMLPurifier_Config | |
$context | \HTMLPurifier_Context | |
Inherited methods
create
Retrieves or sets the default Lexer as a Prototype Factory.
By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$config | \HTMLPurifier_Config | |
Throws:
__construct
parseText
Parameters:
Parameter | Type | Description |
---|---|---|
$string | mixed | |
$config | mixed | |
parseAttr
Parameters:
Parameter | Type | Description |
---|---|---|
$string | mixed | |
$config | mixed | |
parseData
Parses special entities into the proper characters.
This method translates escaped versions of the special characters into the correct ones.
Parameters:
Parameter | Type | Description |
---|---|---|
$string | string | String character data to be parsed. |
$is_attr | mixed | |
$config | mixed | |
Return Value:
Parsed character data.
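As a rough illustration of translating special entities back into characters, the sketch below uses `strtr()` with a small lookup table. The function name `sketchParseSpecialEntities` and the choice of entities are assumptions made for this example; this is not the library's implementation, and it ignores the `$is_attr` and `$config` parameters.

```php
<?php
// Hypothetical helper, not part of HTML Purifier.
// strtr() with an array replaces the longest match first and never
// rescans replaced text, so "&amp;lt;" becomes "&lt;" and not "<".
function sketchParseSpecialEntities(string $string): string
{
    $table = [
        '&quot;' => '"',
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&#39;'  => "'",
        '&amp;'  => '&',
    ];
    return strtr($string, $table);
}

echo sketchParseSpecialEntities('1 &lt; 2 &amp;&amp; &quot;x&quot;'), "\n";
// prints: 1 < 2 && "x"
```

A single pass avoids double-decoding, which is why a table lookup is used here instead of chained `str_replace()` calls.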
tokenizeHTML
Lexes an HTML string into tokens.
public tokenizeHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): \HTMLPurifier_Token[]
Parameters:
Parameter | Type | Description |
---|---|---|
$html | string | |
$config | \HTMLPurifier_Config | |
$context | \HTMLPurifier_Context | |
escapeCDATA
Translates CDATA sections into regular sections (through escaping).
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$string | string | HTML string to process. |
Return Value:
HTML with CDATA sections escaped.
escapeCommentedCDATA
Special CDATA case that is especially convoluted for `<script>` elements.
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$string | string | HTML string to process. |
Return Value:
HTML with CDATA sections escaped.
removeIEConditional
Special Internet Explorer conditional comments should be removed.
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$string | string | HTML string to process. |
Return Value:
HTML with conditional comments removed.
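Removing conditional comments can be approximated with a single regular expression. The sketch below is an assumption about how such a step might look; the function name `sketchRemoveIeConditional` and the exact pattern are illustrative and not taken from the library.

```php
<?php
// Hypothetical helper, not part of HTML Purifier.
// Strips downlevel-hidden conditional comments of the form
// <!--[if ...]> ... <![endif]--> including their contents.
function sketchRemoveIeConditional(string $html): string
{
    return preg_replace(
        '#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si',
        '',
        $html
    );
}

echo sketchRemoveIeConditional('a<!--[if IE]><p>old</p><![endif]-->b'), "\n";
// prints: ab
```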
CDATACallback
Callback function for escapeCDATA() that does the work.
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section. |
Return Value:
Escaped internals of the CDATA section.
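The pairing of escapeCDATA() with its callback can be sketched with `preg_replace_callback()`: the pattern captures the inside of each CDATA section as index 1, and the callback returns it escaped. The function names and the pattern below are assumptions for illustration only.

```php
<?php
// Hypothetical stand-in for the callback: $matches[0] is the whole
// CDATA section, $matches[1] is its contents.
function sketchCdataCallback(array $matches): string
{
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

// Hypothetical stand-in for escapeCDATA(): each CDATA section is
// replaced by its escaped contents.
function sketchEscapeCdata(string $html): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        'sketchCdataCallback',
        $html
    );
}

echo sketchEscapeCdata('<p><![CDATA[a < b & c]]></p>'), "\n";
// prints: <p>a &lt; b &amp; c</p>
```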
normalize
Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
public normalize(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): string
Parameters:
Parameter | Type | Description |
---|---|---|
$html | string | HTML. |
$config | \HTMLPurifier_Config | |
$context | \HTMLPurifier_Context | |
extractBody
Takes a string of HTML (fragment or document) and returns the content
Parameters:
Parameter | Type | Description |
---|---|---|
$html | mixed | |
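One plausible way to handle both fragments and full documents is to look for a body element and fall back to the input when none is found. The sketch below assumes that behaviour; `sketchExtractBody` and its pattern are hypothetical and may differ from what the library does.

```php
<?php
// Hypothetical helper, not part of HTML Purifier.
// Returns the contents of <body>...</body> when present; otherwise
// the input is treated as a fragment and returned unchanged.
function sketchExtractBody(string $html): string
{
    if (preg_match('#<body[^>]*>(.*)</body>#is', $html, $m)) {
        return $m[1];
    }
    return $html;
}

echo sketchExtractBody('<html><body class="x"><p>hi</p></body></html>'), "\n";
// prints: <p>hi</p>
```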
tokenizeDOM
Iterative function that tokenizes a node, putting it into an accumulator.
To iterate is human, to recurse divine - L. Peter Deutsch
Parameters:
Parameter | Type | Description |
---|---|---|
$node | \DOMNode | DOMNode to be tokenized. |
$tokens | \HTMLPurifier_Token[] | Array-list of already tokenized tokens. |
$config | mixed | |
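Iterative tokenization of a DOM tree can be sketched with an explicit stack in place of recursion. The example below emits plain strings where the real lexer would emit token objects; `sketchTokenize` and its output format are assumptions for illustration.

```php
<?php
// Hypothetical helper, not part of HTML Purifier.
// Walks the tree depth-first without recursion. Each stack entry is
// [node, entering]; an element is pushed again with entering = false
// so that its end token is emitted after all of its children.
function sketchTokenize(DOMNode $root): array
{
    $tokens = [];
    $stack = [[$root, true]];
    while ($stack) {
        [$node, $entering] = array_pop($stack);
        if ($node instanceof DOMText) {
            $tokens[] = 'text:' . $node->data;
            continue;
        }
        if (!$node instanceof DOMElement) {
            continue; // comments and other node types are skipped here
        }
        if (!$entering) {
            $tokens[] = 'end:' . $node->tagName;
            continue;
        }
        $tokens[] = 'start:' . $node->tagName;
        $stack[] = [$node, false];
        // Push children in reverse so they pop in document order.
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), true];
        }
    }
    return $tokens;
}

$doc = new DOMDocument();
$doc->loadXML('<div>a<b>c</b>d</div>');
echo implode(' ', sketchTokenize($doc->documentElement)), "\n";
// prints: start:div text:a start:b text:c end:b text:d end:div
```

Using a stack keeps memory use independent of PHP's call depth, which matters for deeply nested input.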
getTagName
Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6
Parameters:
Parameter | Type | Description |
---|---|---|
$node | \DOMNode | |
getData
Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6
Parameters:
Parameter | Type | Description |
---|---|---|
$node | \DOMNode | |
createStartNode
protected createStartNode(\DOMNode $node, \HTMLPurifier_Token[]& $tokens, bool $collect, mixed $config): bool
Parameters:
Parameter | Type | Description |
---|---|---|
$node | \DOMNode | DOMNode to be tokenized. |
$tokens | \HTMLPurifier_Token[] | Array-list of already tokenized tokens. |
$collect | bool | Whether the start and end tokens are collected. Set to false on the first recursion, because that node is the implicit DIV wrapper. |
$config | mixed | |
Return Value:
True if the token needs an end token.
createEndNode
Parameters:
Parameter | Type | Description |
---|---|---|
$node | \DOMNode | |
$tokens | \HTMLPurifier_Token[] | |
transformAttrToAssoc
Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
Parameters:
Parameter | Type | Description |
---|---|---|
$node_map | \DOMNamedNodeMap | DOMNamedNodeMap of DOMAttr objects. |
Return Value:
Associative array of attributes.
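The conversion from a DOMNamedNodeMap to an associative array can be sketched with a plain `foreach`, since the map is traversable and each DOMAttr exposes `name` and `value`. The function name `sketchAttrsToAssoc` is hypothetical, and the library's version may treat edge cases differently.

```php
<?php
// Hypothetical helper, not part of HTML Purifier.
// Builds [attribute name => attribute value] from a node's attributes.
function sketchAttrsToAssoc(DOMNamedNodeMap $nodeMap): array
{
    $attrs = [];
    foreach ($nodeMap as $attr) {
        $attrs[$attr->name] = $attr->value;
    }
    return $attrs;
}

$doc = new DOMDocument();
$doc->loadXML('<a href="https://example.com/" title="Example"/>');
print_r(sketchAttrsToAssoc($doc->documentElement->attributes));
```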
muteErrorHandler
An error handler that mutes all errors
Parameters:
Parameter | Type | Description |
---|---|---|
$errno | int | |
$errstr | string | |
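A muting error handler is typically installed around a call that is expected to emit warnings, such as parsing malformed markup with DOMDocument. The sketch below shows that pattern under the assumption that the handler is used this way; `sketchMuteErrorHandler` and `sketchLoadQuietly` are hypothetical names.

```php
<?php
// Hypothetical handler, not part of HTML Purifier.
// Returning true tells PHP the error was handled, so nothing is reported.
function sketchMuteErrorHandler(int $errno, string $errstr): bool
{
    return true;
}

// Parses HTML while parser warnings are muted, then restores the
// previous handler even if parsing throws.
function sketchLoadQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    set_error_handler('sketchMuteErrorHandler');
    try {
        $doc->loadHTML($html);
    } finally {
        restore_error_handler();
    }
    return $doc;
}

$doc = sketchLoadQuietly('<div><p>unclosed<bogus></div>');
echo $doc->getElementsByTagName('p')->length, "\n";
```

Restoring the handler in a `finally` block keeps the muting scoped to the single parse call.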
callbackUndoCommentSubst
Callback function for undoing escaping of stray angled brackets in comments
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
callbackArmorCommentEntities
Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
wrapHTML
Wraps an HTML fragment in the necessary HTML
protected wrapHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context, mixed $use_div = true): string
Parameters:
Parameter | Type | Description |
---|---|---|
$html | string | |
$config | \HTMLPurifier_Config | |
$context | \HTMLPurifier_Context | |
$use_div | mixed | |
Automatically generated on 2025-03-18