HTMLPurifier_Lexer_DirectLex

Our in-house implementation of a parser.

A pure PHP parser, DirectLex has absolutely no dependencies, making it a reasonably good default for PHP4. Written with efficiency in mind, it can be four times faster than HTMLPurifier_Lexer_PEARSax3, although it pales in comparison to HTMLPurifier_Lexer_DOMLex.

Properties

tracksLineNumbers

Whether or not this lexer implements line-number/column-number tracking.

public $tracksLineNumbers

_whitespace

Whitespace characters passed to strspn() and strcspn().

protected $_whitespace
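
As a rough illustration of how a whitespace set is used with strspn() and strcspn(), the sketch below skips leading whitespace and then reads the next word. The constant value (space, tab, carriage return, line feed) and the helper names skipWhitespace() and readWord() are assumptions made for this example, not part of the documented API.

```php
<?php
// Assumed whitespace set: space, tab, carriage return, line feed.
const TOY_WHITESPACE = "\x20\x09\x0D\x0A";

// Hypothetical helper: returns the position of the first
// non-whitespace character at or after $pos.
function skipWhitespace(string $s, int $pos): int
{
    // strspn() counts the run of characters that ARE in the set.
    return $pos + strspn($s, TOY_WHITESPACE, $pos);
}

// Hypothetical helper: reads characters from $pos up to the next
// whitespace character (or the end of the string).
function readWord(string $s, int $pos): string
{
    // strcspn() counts the run of characters that are NOT in the set.
    return substr($s, $pos, strcspn($s, TOY_WHITESPACE, $pos));
}
```

With the input "  \t abc def", skipWhitespace($s, 0) returns 4 and readWord($s, 4) returns "abc".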

Methods

scriptCallback

Callback function for the script CDATA fudge.

protected scriptCallback(array $matches): string

Parameters:

Parameter Type Description
$matches array Matches in the form array(opening tag, contents, closing tag).
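
To show what such a callback might do, here is a minimal sketch that escapes the body of each script element while leaving its tags untouched. The function name escapeScriptBodies() and the regular expression are assumptions for illustration; the actual pattern and the conditions under which DirectLex applies the callback may differ.

```php
<?php
// Hypothetical sketch of a "script CDATA fudge": escape the contents of
// <script> elements so that characters such as "<" inside the script
// are not mistaken for markup by a later tokenizing pass.
function escapeScriptBodies(string $html): string
{
    $result = preg_replace_callback(
        '#(<script[^>]*>)(.*?)(</script>)#si',
        // $m[0] is the full match; $m[1], $m[2] and $m[3] are the
        // opening tag, the contents and the closing tag.
        function (array $m): string {
            return $m[1] . htmlspecialchars($m[2], ENT_COMPAT, 'UTF-8') . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error;
    // fall back to the unmodified input in that case.
    return $result ?? $html;
}
```

For example, escapeScriptBodies('<script>if (a < b) {}</script>') yields '<script>if (a &lt; b) {}</script>'.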

tokenizeHTML

Lexes an HTML string into tokens.

public tokenizeHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): array|\HTMLPurifier_Token[]

Parameters:

Parameter Type Description
$html string
$config \HTMLPurifier_Config
$context \HTMLPurifier_Context

substrCount

PHP 5.0.x-compatible substr_count() that supports the offset and length arguments.

protected substrCount(string $haystack, string $needle, int $offset, int $length): int

Parameters:

Parameter Type Description
$haystack string
$needle string
$offset int
$length int
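
A minimal sketch of one way to get offset and length support without relying on the four-argument form of substr_count(): slice the haystack first, then count. The function name substrCountCompat() is hypothetical, and the real method may choose between this fallback and the native call depending on the PHP version.

```php
<?php
// Hypothetical fallback: counts occurrences of $needle within the
// $length bytes of $haystack that start at $offset.
function substrCountCompat(string $haystack, string $needle, int $offset, int $length): int
{
    // Slice first, then count inside the slice.
    return substr_count(substr($haystack, $offset, $length), $needle);
}
```

A lexer that tracks line numbers could use such a helper to count the newlines in the span it has just consumed, for example substrCountCompat($html, "\n", $start, $consumed), where $start and $consumed are illustrative variable names.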

parseAttributeString

Takes the inside of an HTML tag and builds an associative array of attributes.

public parseAttributeString(string $string, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): array

Parameters:

Parameter Type Description
$string string Inside of tag excluding name.
$config \HTMLPurifier_Config
$context \HTMLPurifier_Context

Return Value:

Associative array of attributes.
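
The general idea can be sketched with a single regular expression. This is a simplified, hypothetical stand-in (toyParseAttributes() is not part of the library): it handles double-quoted, single-quoted, unquoted, and valueless attributes, performs no entity decoding or error collection, and maps a valueless attribute to its own name as an assumed convention.

```php
<?php
// Hypothetical, simplified attribute parser for illustration only.
function toyParseAttributes(string $string): array
{
    // Group 1: name; group 2: "double-quoted" value;
    // group 3: 'single-quoted' value; group 4: unquoted value.
    $pattern = '~([^\s=/]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+)))?~';

    $attrs = [];
    preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if (count($m) === 2) {
            // No "=value" part: treat as a boolean attribute.
            $attrs[$m[1]] = $m[1];
        } else {
            // Trailing unmatched groups are omitted from $m, so the
            // last group present is the one that actually matched.
            $attrs[$m[1]] = $m[4] ?? $m[3] ?? $m[2];
        }
    }
    return $attrs;
}
```

For the input 'href="a b" class=x disabled' this returns array('href' => 'a b', 'class' => 'x', 'disabled' => 'disabled').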


Inherited methods

create

Retrieves or sets the default Lexer as a Prototype Factory.

public static create(\HTMLPurifier_Config $config): \HTMLPurifier_Lexer

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

  • This method is static.

Parameters:

Parameter Type Description
$config \HTMLPurifier_Config

Throws:


__construct

public __construct(): mixed

parseText

public parseText(mixed $string, mixed $config): mixed

Parameters:

Parameter Type Description
$string mixed
$config mixed

parseAttr

public parseAttr(mixed $string, mixed $config): mixed

Parameters:

Parameter Type Description
$string mixed
$config mixed

parseData

Parses special entities into the proper characters.

public parseData(string $string, mixed $is_attr, mixed $config): string

This method translates escaped versions of the special characters into the correct ones.

Parameters:

Parameter Type Description
$string string String character data to be parsed.
$is_attr mixed
$config mixed

Return Value:

Parsed character data.
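
A minimal sketch of this kind of translation using strtr() with a lookup table. The function name decodeSpecialEntities() and the exact entity list are assumptions for illustration; the real method takes additional arguments and may handle other entities as well.

```php
<?php
// Hypothetical sketch: translate the escaped forms of the special
// characters back into the characters themselves.
function decodeSpecialEntities(string $string): string
{
    // Assumed lookup table of special entities.
    static $map = [
        '&quot;' => '"',
        '&amp;'  => '&',
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&#39;'  => "'",
    ];

    // strtr() with an array makes a single pass over the input, so
    // already-translated output is never translated a second time.
    return strtr($string, $map);
}
```

Because the replacement happens in a single pass, '&amp;lt;' becomes '&lt;' and not '<'.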


tokenizeHTML

Lexes an HTML string into tokens.

public tokenizeHTML(mixed $string, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context): \HTMLPurifier_Token[]

Parameters:

Parameter Type Description
$string mixed String HTML.
$config \HTMLPurifier_Config
$context \HTMLPurifier_Context

Return Value:

Array representation of the HTML.


escapeCDATA

Translates CDATA sections into regular sections (through escaping).

protected static escapeCDATA(string $string): string

  • This method is static.

Parameters:

Parameter Type Description
$string string HTML string to process.

Return Value:

HTML with CDATA sections escaped.
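
One way to picture the transformation is the sketch below, which replaces each CDATA section with its contents in escaped form. The function name escapeCdataSketch() and the regular expression are assumptions for illustration, not necessarily what the library uses.

```php
<?php
// Hypothetical sketch: turn <![CDATA[ ... ]]> sections into ordinary
// character data by escaping their contents.
function escapeCdataSketch(string $string): string
{
    $result = preg_replace_callback(
        // The "s" modifier lets "." match newlines inside the section.
        '/<!\[CDATA\[(.+?)\]\]>/s',
        function (array $m): string {
            return htmlspecialchars($m[1], ENT_COMPAT, 'UTF-8');
        },
        $string
    );

    // Fall back to the input if the regex engine reports an error.
    return $result ?? $string;
}
```

For example, 'a<![CDATA[<b> & c]]>d' becomes 'a&lt;b&gt; &amp; cd'.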


escapeCommentedCDATA

Special CDATA case that is especially convoluted for