HTMLPurifier_Lexer_DOMLex

Parser that uses PHP 5's DOM extension (part of the core).

In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM and then into tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex) and is the default choice for PHP 5.
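As a rough illustration of the "forgiving parser" idea, the sketch below loads a malformed fragment with PHP's DOM extension and returns the wrapper element. It is not HTML Purifier's code: parseFragment() is a hypothetical helper, the bare html/body/div wrapper is an assumption, and ASCII input is assumed because no charset is declared.

```php
<?php
// Illustrative sketch only; parseFragment() is a hypothetical helper.
function parseFragment(string $html): DOMElement
{
    $doc = new DOMDocument();
    // Collect libxml complaints internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<html><body><div>' . $html . '</div></body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is the wrapper added above.
    return $doc->getElementsByTagName('div')->item(0);
}

// Unclosed tags are repaired by the parser rather than rejected.
$div = parseFragment('<b>Bold<i>both');
echo $div->ownerDocument->saveHTML($div), "\n";
```

The wrapper element gives the caller a single known root to walk, which is why the fragment is not loaded on its own.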

Properties

factory

private $factory

Methods

__construct

public __construct(): mixed

tokenizeHTML

Lexes an HTML string into tokens.

public tokenizeHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): \HTMLPurifier_Token[]

Parameters:

Parameter Type Description
$html string
$config \HTMLPurifier_Config
$context \HTMLPurifier_Context

tokenizeDOM

Iterative function that tokenizes a node, putting it into an accumulator.

protected tokenizeDOM(\DOMNode $node, \HTMLPurifier_Token[]& $tokens, mixed $config): mixed

To iterate is human, to recurse divine - L. Peter Deutsch

Parameters:

Parameter Type Description
$node \DOMNode DOMNode to be tokenized.
$tokens \HTMLPurifier_Token[] Array-list of already tokenized tokens.
$config mixed
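The iterate-with-an-accumulator approach can be sketched with an explicit stack in place of recursion. This is a simplified stand-in, not the class's actual method: tokenizeNode() is a hypothetical name, tokens are plain arrays rather than HTMLPurifier_Token objects, and attributes are ignored.

```php
<?php
// Illustrative sketch only; tokenizeNode() and the array token shape are assumptions.
function tokenizeNode(DOMNode $root): array
{
    $tokens = [];
    $stack = [];
    // Push the root's children in reverse so they pop in document order.
    for ($i = $root->childNodes->length - 1; $i >= 0; $i--) {
        $stack[] = [$root->childNodes->item($i), false];
    }
    while ($stack) {
        [$node, $closing] = array_pop($stack);
        if ($closing) {
            // Second visit of an element: all its children are done.
            $tokens[] = ['end', $node->nodeName];
            continue;
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $tokens[] = ['text', $node->nodeValue];
        } elseif ($node->nodeType === XML_COMMENT_NODE) {
            $tokens[] = ['comment', $node->nodeValue];
        } elseif ($node->nodeType === XML_ELEMENT_NODE) {
            if (!$node->hasChildNodes()) {
                $tokens[] = ['empty', $node->nodeName];
                continue;
            }
            $tokens[] = ['start', $node->nodeName];
            // Schedule the end token, then the children above it on the stack.
            $stack[] = [$node, true];
            for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
                $stack[] = [$node->childNodes->item($i), false];
            }
        }
    }
    return $tokens;
}

$doc = new DOMDocument();
$doc->loadXML('<div><p>Hi <b>there</b></p><br/></div>');
print_r(tokenizeNode($doc->documentElement));
```

The root node itself produces no tokens here, which mirrors the idea of skipping the implicit wrapper element.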

getTagName

Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6

protected getTagName(\DOMNode $node): mixed

Parameters:

Parameter Type Description
$node \DOMNode
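One plausible way to retrieve a name portably is to try several DOM properties in turn and use the first one that is set. The sketch below assumes that fallback order; portableTagName() is a hypothetical name and the property list is an assumption, not a statement of what this class does.

```php
<?php
// Illustrative sketch only; the fallback order is an assumption.
function portableTagName(DOMNode $node): ?string
{
    foreach (['tagName', 'nodeName', 'localName'] as $property) {
        if (isset($node->$property)) {
            return $node->$property;
        }
    }
    return null; // none of the candidate properties is available
}

$doc = new DOMDocument();
$doc->loadXML('<p>text</p>');
echo portableTagName($doc->documentElement), "\n";
```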

getData

Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6

protected getData(\DOMNode $node): mixed

Parameters:

Parameter Type Description
$node \DOMNode

createStartNode

protected createStartNode(\DOMNode $node, \HTMLPurifier_Token[]& $tokens, bool $collect, mixed $config): bool

Parameters:

Parameter Type Description
$node \DOMNode DOMNode to be tokenized.
$tokens \HTMLPurifier_Token[] Array-list of already tokenized tokens.
$collect bool Says whether or not the start and close tokens are collected; set to false at the first recursion because the node being handled is the implicit DIV tag.
$config mixed

Return Value:

True if the token needs an end token.


createEndNode

protected createEndNode(\DOMNode $node, \HTMLPurifier_Token[]& $tokens): mixed

Parameters:

Parameter Type Description
$node \DOMNode
$tokens \HTMLPurifier_Token[]

transformAttrToAssoc

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

protected transformAttrToAssoc(\DOMNamedNodeMap $node_map): array

Parameters:

Parameter Type Description
$node_map \DOMNamedNodeMap DOMNamedNodeMap of DOMAttr objects.

Return Value:

Associative array of attributes.
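A minimal sketch of such a conversion, using only the standard DOM API, is shown below. attrsToAssoc() is a hypothetical name; the real method may treat edge cases differently.

```php
<?php
// Illustrative sketch only; attrsToAssoc() is a hypothetical helper.
function attrsToAssoc(DOMNamedNodeMap $nodeMap): array
{
    $attributes = [];
    foreach ($nodeMap as $attr) {
        // Each item is a DOMAttr exposing ->name and ->value.
        $attributes[$attr->name] = $attr->value;
    }
    return $attributes;
}

$doc = new DOMDocument();
$doc->loadXML('<a href="/home" title="Home">x</a>');
print_r(attrsToAssoc($doc->documentElement->attributes));
```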


muteErrorHandler

An error handler that mutes all errors

public muteErrorHandler(int $errno, string $errstr): mixed

Parameters:

Parameter Type Description
$errno int
$errstr string
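A muting handler is typically installed just around the parse call and removed straight afterwards. The sketch below shows that pattern with set_error_handler(); muteHandler() and loadQuietly() are hypothetical names, and how this class installs its own handler is not shown in this page.

```php
<?php
// Illustrative sketch only; both function names are hypothetical.
function muteHandler(int $errno, string $errstr): bool
{
    return true; // returning true tells PHP the error has been handled
}

function loadQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    set_error_handler('muteHandler');
    try {
        $doc->loadHTML($html); // parser warnings are swallowed here
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
    return $doc;
}

$doc = loadQuietly('<div><p>unclosed paragraph</div>');
echo $doc->getElementsByTagName('p')->length, "\n";
```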

callbackUndoCommentSubst

Callback function for undoing escaping of stray angle brackets in comments.

public callbackUndoCommentSubst(array $matches): string

Parameters:

Parameter Type Description
$matches array

callbackArmorCommentEntities

Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them

public callbackArmorCommentEntities(array $matches): string

Parameters:

Parameter Type Description
$matches array
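Callbacks of this kind receive the match array from preg_replace_callback(). The sketch below shows the general pattern of rewriting only the text inside comments; the regular expression, the chosen replacements, and the name escapeInsideComments() are assumptions for illustration, not the class's actual callbacks.

```php
<?php
// Illustrative sketch only; pattern and replacements are assumptions.
function escapeInsideComments(string $html): string
{
    return preg_replace_callback(
        // $m[1] is the comment body, $m[2] the terminator (or end of input).
        '/<!--(.*?)(-->|\z)/s',
        static function (array $m): string {
            // strtr() makes a single pass, so output is never re-escaped.
            return '<!--' . strtr($m[1], ['&' => '&amp;', '<' => '&lt;']) . $m[2];
        },
        $html
    );
}

echo escapeInsideComments('a <!-- x < y & z --> b'), "\n";
```

Text outside comments is left untouched because the pattern only matches from a comment opener onwards.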

wrapHTML

Wraps an HTML fragment in the necessary HTML

protected wrapHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context, mixed $use_div = true): string

Parameters:

Parameter Type Description
$html string
$config \HTMLPurifier_Config
$context \HTMLPurifier_Context
$use_div mixed
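The wrapping step can be pictured as plain string concatenation around the fragment. The sketch below is an assumption-laden simplification: wrapFragment() is a hypothetical name, the exact markup is illustrative, and any doctype handling driven by the configuration is left out.

```php
<?php
// Illustrative sketch only; the markup emitted here is an assumption.
function wrapFragment(string $html, bool $useDiv = true): string
{
    $out = '<html><head>'
         . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
         . '</head><body>';
    if ($useDiv) {
        $out .= '<div>'; // gives the tokenizer one known root element
    }
    $out .= $html;
    if ($useDiv) {
        $out .= '</div>';
    }
    return $out . '</body></html>';
}

echo wrapFragment('<b>hi</b>'), "\n";
```

Declaring the charset in the head is a common way to make the DOM parser read the body as UTF-8.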

Inherited methods

create

Retrieves or sets the default Lexer as a Prototype Factory.

public static create(\HTMLPurifier_Config $config): \HTMLPurifier_Lexer

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

  • This method is static.

Parameters:

Parameter Type Description
$config \HTMLPurifier_Config

Throws:


__construct

public __construct(): mixed

parseText

public parseText(mixed $string, mixed $config): mixed

Parameters:

Parameter Type Description
$string mixed
$config mixed

parseAttr

public parseAttr(mixed $string, mixed $config): mixed

Parameters:

Parameter Type Description
$string mixed
$config mixed

parseData

Parses special entities into the proper characters.

public parseData(string $string, mixed $is_attr, mixed $config): string

This method translates escaped versions of the special characters into the correct ones.

Parameters:

Parameter Type Description
$string string String character data to be parsed.
$is_attr mixed
$config mixed

Return Value:

Parsed character data.
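For the common special entities, the translation can be sketched as a single-pass table lookup. This stand-in uses strtr() with a small fixed table; decodeSpecialEntities() is a hypothetical name, and the real method's entity coverage and its $is_attr and $config handling are not reproduced.

```php
<?php
// Illustrative sketch only; the entity table is a deliberately small assumption.
function decodeSpecialEntities(string $string): string
{
    // strtr() with an array scans once, so "&amp;lt;" becomes "&lt;", not "<".
    return strtr($string, [
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&quot;' => '"',
        '&#39;'  => "'",
        '&amp;'  => '&',
    ]);
}

echo decodeSpecialEntities('1 &lt; 2 &amp;&amp; &quot;x&quot;'), "\n";
```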


tokenizeHTML

Lexes an HTML string into tokens.

public tokenizeHTML(mixed $string, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): \HTMLPurifier_Token[]

Parameters:

Parameter Type Description
$string mixed String HTML.
$config \HTMLPurifier_Config
$context \HTMLPurifier_Context

Return Value:

Array representation of the HTML.


escapeCDATA

Translates CDATA sections into regular sections (through escaping).

protected static escapeCDATA(string $string): string

  • This method is static.

Parameters:

Parameter Type Description
$string string HTML string to process.

Return Value:

HTML with CDATA sections escaped.
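Escaping a CDATA section amounts to replacing the section with its contents run through an HTML escaper. The sketch below shows one way to do that; escapeCdataSections() is a hypothetical name, and the pattern and escaping flags are assumptions rather than the class's exact implementation.

```php
<?php
// Illustrative sketch only; pattern and flags are assumptions.
function escapeCdataSections(string $html): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        static function (array $m): string {
            // The CDATA markers are dropped and the body becomes escaped text.
            return htmlspecialchars($m[1], ENT_COMPAT, 'UTF-8');
        },
        $html
    );
}

echo escapeCdataSections('<p><![CDATA[a < b & c]]></p>'), "\n";
```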


escapeCommentedCDATA

Special CDATA case that is especially convoluted for