Markdown
Markdown Parser Class
- Full name: \Michelf\Markdown
- This class implements: \Michelf\MarkdownInterface
Constants
Constant | Visibility | Type | Value |
---|---|---|---|
MARKDOWNLIB_VERSION | public | string | "2.0.0" |
Properties
empty_element_suffix
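Suffix used to close empty elements such as line breaks, horizontal rules and images; the default " />" produces XHTML. Change to ">" for HTML output.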
tab_width
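Number of columns a tab stands for (4 by default). Used when expanding tabs and when measuring the indentation of code blocks and nested list items in the input.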
no_markup
Change to true to disallow HTML markup in the input.
no_entities
Change to true to disallow character entities in the input.
hard_wrap
Change to true to turn every newline into a line break, without requiring two trailing spaces.
predef_urls
Predefined URLs and titles for reference links and images.
predef_titles
url_filter_func
Optional filter function for URLs
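A URL filter is a callable that takes a URL string and returns the URL to emit. The function below is a hypothetical example, not part of the library: it neutralizes javascript: links and prefixes root-relative paths with an assumed base URL (https://example.com). Assuming the property accepts any PHP callable, it could be attached by assigning 'exampleUrlFilter' to $parser->url_filter_func.

```php
<?php
// Hypothetical URL filter: takes one URL string and returns the
// (possibly rewritten) URL string.
function exampleUrlFilter(string $url): string
{
    // Neutralize javascript: URLs (case-insensitive, ignoring leading whitespace).
    if (preg_match('/^\s*javascript:/i', $url)) {
        return '#';
    }
    // Prefix root-relative paths (but not protocol-relative "//host" URLs)
    // with an assumed base URL.
    if ($url !== '' && $url[0] === '/' && substr($url, 0, 2) !== '//') {
        return 'https://example.com' . $url;
    }
    // Everything else passes through unchanged.
    return $url;
}
```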
header_id_func
Optional callback that receives a header's text and returns the value for its id attribute (see _generateIdFromHeaderValue).
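A minimal slug-style callback, assuming the callback receives the header text and returns a string, with an empty string meaning "no id". The name exampleHeaderId is hypothetical.

```php
<?php
// Hypothetical header id callback: header text in, id value out.
function exampleHeaderId(string $header): string
{
    $slug = strtolower($header);
    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Drop hyphens left at either end.
    return trim($slug, '-');
}
```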
code_block_content_func
Optional function for converting code block content to HTML
code_span_content_func
Optional function for converting code span content to HTML.
enhanced_ordered_list
Class attribute to toggle "enhanced ordered list" behaviour. Setting this to true allows an ordered list to start from the number of its first item.
For example, a list whose items are written as `2. List item two` and `3. List item three` is rendered with an `<ol start="2">` tag, so its numbering begins at 2 instead of 1.
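The start-number logic can be sketched as follows. This is a simplified, hypothetical helper (exampleOlOpenTag) that only inspects the marker of the first item and builds the opening tag.

```php
<?php
// Hypothetical sketch: build the opening tag for an ordered list, honouring
// the first item's number when the "enhanced" behaviour is enabled.
function exampleOlOpenTag(string $listSource, bool $enhanced): string
{
    $start = 1;
    // Read the number of the first item, e.g. "2" from "2. List item two".
    if ($enhanced && preg_match('/^\s*(\d+)\./', $listSource, $m)) {
        $start = (int) $m[1];
    }
    // Only emit a start attribute when numbering does not begin at 1.
    return $start > 1 ? '<ol start="' . $start . '">' : '<ol>';
}
```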
nested_brackets_depth
Regex to match balanced [brackets].
Needed to insert a maximum bracket depth while converting to PHP.
nested_brackets_re
nested_url_parenthesis_depth
nested_url_parenthesis_re
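Rather than a recursive pattern, a depth-limited pattern can be built by repeating a non-recursive fragment nested_brackets_depth times. A sketch, assuming this pattern shape (exampleNestedBracketsRe is a hypothetical name); the tests show that nesting deeper than the limit no longer matches.

```php
<?php
// Build a regex fragment matching text with balanced [brackets] nested up
// to $depth levels, by repeating a non-recursive, atomic-group pattern.
function exampleNestedBracketsRe(int $depth): string
{
    // Each level: a run of non-bracket characters, or "[" + next level + "]".
    return str_repeat('(?>[^\[\]]+|\[', $depth) . str_repeat('\])*', $depth);
}
```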
escape_chars
Characters that can be escaped with a backslash in Markdown text (used to build escape_chars_re).
escape_chars_re
urls
Internal hashes used during transformation.
titles
html_hashes
in_anchor
Status flag to avoid invalid nesting.
in_emphasis_processing
Status flag to avoid invalid nesting.
document_gamut
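Define the document gamut - transformations applied once to the whole document, such as stripping link definitions before the block gamut runs.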
block_gamut
Define the block gamut - these are all the transformations that form block-level tags like paragraphs, headers, and list items.
span_gamut
These are all the transformations that occur within block-level tags like paragraphs, headers, and list items.
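The gamuts can be pictured as ordered pipelines. The sketch below assumes each gamut is an array of method name => priority, sorted so lower priorities run first; the class ExampleGamutRunner and its three steps are hypothetical stand-ins, not the parser's real transformations.

```php
<?php
// Hypothetical mini "gamut": method name => priority. Lower priorities run
// first, mirroring how document, block and span gamuts are ordered.
final class ExampleGamutRunner
{
    /** @var array<string,int> */
    private array $gamut = [
        'wrapParagraph'  => 50,
        'trimText'       => 10,
        'collapseSpaces' => 20,
    ];

    public function __construct()
    {
        // Sort once by priority, keeping the method names as keys.
        asort($this->gamut);
    }

    public function run(string $text): string
    {
        // Feed the output of each step into the next one.
        foreach ($this->gamut as $method => $priority) {
            $text = $this->$method($text);
        }
        return $text;
    }

    private function trimText(string $t): string
    {
        return trim($t);
    }

    private function collapseSpaces(string $t): string
    {
        return preg_replace('/ {2,}/', ' ', $t);
    }

    private function wrapParagraph(string $t): string
    {
        return '<p>' . $t . '</p>';
    }
}
```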
list_level
Nesting tracker for list levels
em_relist
Define the emphasis operators with their regex matches
strong_relist
Define the strong operators with their regex matches
em_strong_relist
Define the emphasis + strong operators with their regex matches
em_strong_prepared_relist
Container for prepared regular expressions
utf8_strlen
String length function for detab. _initDetab will create a function to handle UTF-8 if the default function does not exist. Can be a string (a function name) or a function.
Methods
defaultTransform
Simple function interface - Initialize the parser and return the result of its transform method. This will work fine for derived classes too.
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
__construct
Constructor function. Initialize appropriate member variables.
setup
Called before the transformation process starts to set up the parser state.
teardown
Called after the transformation process to clear any variable that may be taking up memory unnecessarily.
transform
Main function. Performs some preprocessing on the input text and passes it through the document gamut.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
stripLinkDefinitions
Strips link definitions from the text and stores their URLs and titles in the urls and titles hashes.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_stripLinkDefinitions_callback
The callback to strip link definitions
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
hashHTMLBlocks
Hashify HTML blocks
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_hashHTMLBlocks_callback
The callback for hashing HTML blocks
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | string | |
hashPart
Called whenever a tag must be hashed, i.e. when a function inserts an atomic element in the text stream. Passing $text through this function gives a unique text token, which is reverted back when calling unhash.
The $boundary argument specifies which character surrounds the token. By convention, "B" is used for block elements that need not be wrapped in paragraph tags at the end, ":" is used for elements that are word separators, and "X" is used in the general case.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
$boundary | string | |
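A self-contained sketch of the hashing round trip described above. It assumes tokens have the form boundary character, \x1A (the ASCII SUB control character), a counter, then the boundary character again; the class ExampleHasher is hypothetical and keeps its counter per instance.

```php
<?php
// Hypothetical sketch of token hashing: hashPart() swaps a fragment for a
// unique placeholder and unhash() swaps the stored fragments back in.
final class ExampleHasher
{
    /** @var array<string,string> placeholder => original text */
    private array $hashes = [];
    private int $counter = 0;

    public function hashPart(string $text, string $boundary = 'X'): string
    {
        // Resolve any placeholders inside $text before storing it.
        $text = $this->unhash($text);
        // \x1A is unlikely to occur in real input; the boundary character
        // surrounds the token on both sides.
        $key = $boundary . "\x1A" . ++$this->counter . $boundary;
        $this->hashes[$key] = $text;
        return $key;
    }

    public function unhash(string $text): string
    {
        // Match a boundary char, \x1A, digits, and the same boundary char.
        return preg_replace_callback(
            '/(.)\x1A[0-9]+\1/',
            fn(array $m): string => $this->hashes[$m[0]] ?? $m[0],
            $text
        );
    }
}
```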
hashBlock
Shortcut function for hashPart with block-level boundaries.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
runBlockGamut
Run block gamut transformations.
We need to escape raw HTML in Markdown source before doing anything else. This needs to be done for each block, and not only at the beginning in the Markdown function, since hashed blocks can be part of list items and could have been indented. Indented blocks would have been seen as a code block in a previous pass of hashHTMLBlocks.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
runBasicBlockGamut
Run block gamut transformations without hashing HTML blocks. This is useful when HTML blocks are known to be already hashed, as in the first whole-document pass.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
doHorizontalRules
Convert horizontal rules
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
runSpanGamut
Run span gamut transformations
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
doHardBreaks
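Convert hard line breaks (two or more spaces before a newline, or any newline when hard_wrap is set) into line break tags.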
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doHardBreaks_callback
Trigger part hashing for the hard break (callback method)
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
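The matching rule can be sketched as a single regex replacement. This hypothetical helper (exampleHardBreaks) assumes the break is triggered by two or more spaces before a newline, or by any newline when hard wrapping is on, and takes the empty-element suffix as a parameter; unlike the parser, it does not hash the inserted tag.

```php
<?php
// Hypothetical sketch: a newline preceded by two or more spaces becomes a
// line break tag; with $hardWrap, every newline does.
function exampleHardBreaks(string $text, bool $hardWrap = false, string $suffix = ' />'): string
{
    // The trailing spaces are consumed together with the newline.
    $pattern = $hardWrap ? '/ *\n/' : '/ {2,}\n/';
    return preg_replace($pattern, '<br' . $suffix . "\n", $text);
}
```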
doAnchors
Turn Markdown link shortcuts into XHTML tags.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doAnchors_reference_callback
Callback method to parse referenced anchors
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
_doAnchors_inline_callback
Callback method to parse inline anchors
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
doImages
Turn Markdown image shortcuts into HTML image tags.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doImages_reference_callback
Callback to parse reference-style image tags
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
_doImages_inline_callback
Callback to parse inline image tags
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
doHeaders
Parse Markdown heading elements to HTML
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doHeaders_callback_setext
Setext header parsing callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
_doHeaders_callback_atx
ATX header parsing callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
_generateIdFromHeaderValue
If a header_id_func property is set, we can use it to automatically generate an id attribute.
This method returns a string in the form id="foo", or an empty string otherwise.
Parameters:
Parameter | Type | Description |
---|---|---|
$headerValue | string | |
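A sketch of that behaviour, assuming the callback result is escaped for a double-quoted attribute (htmlspecialchars stands in for encodeAttribute here) and prefixed with a space so it can follow the tag name. exampleIdAttribute is a hypothetical name.

```php
<?php
// Hypothetical sketch: turn a header's text into an id attribute string
// using an optional callback, returning '' when no id should be emitted.
function exampleIdAttribute(string $headerValue, ?callable $headerIdFunc): string
{
    // No callback configured: no id attribute.
    if ($headerIdFunc === null) {
        return '';
    }
    $id = call_user_func($headerIdFunc, $headerValue);
    // An empty (falsy) result also means no id attribute.
    if (!$id) {
        return '';
    }
    // Escape the value for a double-quoted attribute.
    return ' id="' . htmlspecialchars($id, ENT_QUOTES) . '"';
}
```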
doLists
Form HTML ordered (numbered) and unordered (bulleted) lists.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doLists_callback
List parsing callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
processListItems
Process the contents of a single ordered or unordered list, splitting it into individual list items.
Parameters:
Parameter | Type | Description |
---|---|---|
$list_str | string | |
$marker_any_re | string | |
_processListItems_callback
List item parsing callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
doCodeBlocks
Process Markdown <pre><code> blocks.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doCodeBlocks_callback
Code block parsing callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
makeCodeSpan
Create a code span markup for $code. Called from handleSpanToken.
Parameters:
Parameter | Type | Description |
---|---|---|
$code | string | |
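A sketch of code-span rendering, assuming the span is trimmed and HTML-escaped unless a callback in the style of code_span_content_func is supplied. exampleCodeSpan is hypothetical and returns the markup directly instead of hashing it.

```php
<?php
// Hypothetical sketch: convert code-span content to HTML and wrap it in a
// code element.
function exampleCodeSpan(string $code, ?callable $contentFunc = null): string
{
    // A custom converter takes over completely; otherwise trim and escape
    // <, > and & (quotes are left as-is with ENT_NOQUOTES).
    $code = $contentFunc !== null
        ? $contentFunc($code)
        : htmlspecialchars(trim($code), ENT_NOQUOTES);
    return '<code>' . $code . '</code>';
}
```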
prepareItalicsAndBold
Prepare regular expressions for searching emphasis tokens in any context.
doItalicsAndBold
Convert Markdown italics (emphasis) and bold (strong) to HTML
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
doBlockQuotes
Parse Markdown blockquotes to HTML
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doBlockQuotes_callback
Blockquote parsing callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
_doBlockQuotes_callback2
Blockquote callback that removes the extra two-space indentation added to <pre> blocks inside a blockquote
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
formParagraphs
Parse paragraphs
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | String to process in paragraphs |
$wrap_in_p | bool | Whether paragraphs should be wrapped in <p> tags |
encodeAttribute
Encode text for a double-quoted HTML attribute. This function is not suitable for attributes enclosed in single quotes.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
encodeURLAttribute
Encode text for a double-quoted HTML attribute containing a URL, applying the URL filter if set. Also generates the textual representation for the URL (removing mailto: or tel:) storing it in $text.
This function is not suitable for attributes enclosed in single quotes.
Parameters:
Parameter | Type | Description |
---|---|---|
$url | string | |
$text | string | Passed by reference |
Return Value:
URL
encodeAmpsAndAngles
Smart processing for ampersands and angle brackets that need to be encoded. Valid character entities are left alone unless the no-entities mode is set.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
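A sketch of the ampersand rule, assuming an ampersand is left alone only when it begins a named, decimal or hex entity. The rule for < (encode it unless it could start a tag) is an assumption made for illustration, and exampleEncodeAmpsAndAngles is a hypothetical name.

```php
<?php
// Hypothetical sketch: encode bare "&" and "<", leaving valid character
// entities alone unless $noEntities is set.
function exampleEncodeAmpsAndAngles(string $text, bool $noEntities = false): string
{
    if ($noEntities) {
        // Entities are not allowed: every ampersand is encoded.
        $text = str_replace('&', '&amp;', $text);
    } else {
        // Encode "&" unless it starts a named, decimal or hex entity.
        $text = preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/', '&amp;', $text);
    }
    // Assumption: a "<" followed by a letter, "/", "?", "$" or "!" may start
    // a tag and is kept; any other "<" is encoded.
    return preg_replace('/<(?![a-z\/?$!])/i', '&lt;', $text);
}
```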
doAutoLinks
Parse Markdown automatic links to anchor HTML tags
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_doAutoLinks_url_callback
Parse URL callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
_doAutoLinks_email_callback
Parse email address callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
encodeEntityObfuscatedAttribute
protected encodeEntityObfuscatedAttribute(string $text, string& $tail = null, int $head_length): string
Input: some text to obfuscate, e.g. "mailto:foo@example.com".
Output: the same text, but with most characters encoded as either a decimal or a hex entity, in the hope of foiling most address-harvesting spam bots. E.g.:
`&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;&#x66;o&#111;&#x40;&#101;&#x78;&#97;&#x6d;&#x70;&#108;&#x65;&#x2e;&#99;&#111;&#x6d;`
which a browser displays as mailto:foo@example.com.
Note: the additional output $tail is assigned the same value as the output, minus its first $head_length characters.
Based on a filter by Matthew Wickline, posted to BBEdit-Talk. With some optimizations by Milian Wolff. Forced encoding of HTML attribute special characters by Allan Odgaard.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
$tail | string | Passed by reference |
$head_length | int | |
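A sketch of the obfuscation, assuming a pseudo-random choice per character seeded from the input itself, so that the same address always encodes the same way. exampleObfuscate and its exact mix of raw, hex and decimal output are assumptions, not the library's guaranteed behaviour.

```php
<?php
// Hypothetical sketch of entity obfuscation with a deterministic seed.
function exampleObfuscate(string $text, ?string &$tail = null, int $headLength = 0): string
{
    if ($text === '') {
        return $tail = '';
    }
    $chars = str_split($text); // byte-wise split
    // Seed derived from the input, so output is stable across calls.
    $seed = (int) abs(crc32($text) / strlen($text));
    foreach ($chars as $i => $char) {
        $ord = ord($char);
        if ($ord >= 128) {
            continue; // leave non-ASCII bytes untouched
        }
        $r = ($seed * (1 + $i)) % 100; // pseudo-random value in 0..99
        if ($r > 90 && strpos('@"&>', $char) === false) {
            continue; // keep some characters raw, but never @ " & >
        }
        // Otherwise emit a hex or a decimal entity.
        $chars[$i] = $r < 45 ? '&#x' . dechex($ord) . ';' : '&#' . $ord . ';';
    }
    // $tail is the output without its first $headLength characters.
    $tail = implode('', $headLength ? array_slice($chars, $headLength) : $chars);
    return implode('', $chars);
}
```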
parseSpan
Take the string $str and parse it into tokens, hashing embedded HTML and escaped characters, and handling code spans.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | |
handleSpanToken
Handle $token provided by parseSpan by determining its nature and returning the corresponding value that should replace it.
Parameters:
Parameter | Type | Description |
---|---|---|
$token | string | |
$str | string | Passed by reference |
outdent
Remove one level of line-leading tabs or spaces
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
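One level of outdenting can be expressed as a multiline regex. A sketch, assuming a level is either one tab or up to tab_width spaces (exampleOutdent is hypothetical):

```php
<?php
// Hypothetical sketch: remove one level of indentation (a tab, or up to
// $tabWidth spaces) from the start of every line.
function exampleOutdent(string $text, int $tabWidth = 4): string
{
    // The /m modifier makes ^ match at the start of each line.
    return preg_replace('/^(\t|[ ]{1,' . $tabWidth . '})/m', '', $text);
}
```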
detab
Replace tabs with the appropriate amount of spaces.
For each line, we split the line into blocks delimited by tab characters, then rebuild it by inserting the appropriate number of spaces between consecutive blocks.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_detab_callback
Replace tabs callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | string | |
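A sketch of the detab procedure described above. It assumes columns are counted in characters via mb_strlen when the mbstring extension is available, falling back to byte counting otherwise; exampleDetab is a hypothetical name.

```php
<?php
// Hypothetical sketch of detab: expand each tab to the next multiple of
// $tabWidth columns.
function exampleDetab(string $text, int $tabWidth = 4): string
{
    // Count characters when possible; plain strlen counts bytes.
    $strlen = function_exists('mb_strlen')
        ? function (string $s): int { return mb_strlen($s, 'UTF-8'); }
        : 'strlen';
    // Only lines that contain a tab are rewritten.
    return preg_replace_callback('/^.*\t.*$/m', function (array $m) use ($strlen, $tabWidth): string {
        $blocks = explode("\t", $m[0]);
        $line = array_shift($blocks);
        foreach ($blocks as $block) {
            // Pad up to the next tab stop, then append the following block.
            $amount = $tabWidth - $strlen($line) % $tabWidth;
            $line .= str_repeat(' ', $amount) . $block;
        }
        return $line;
    }, $text);
}
```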
_initDetab
Check for the availability of the function in the utf8_strlen property (initially mb_strlen). If the function is not available, create a function that will loosely count the number of UTF-8 characters with a regular expression.
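The loose UTF-8 counter can be written as one preg_match_all call. A sketch, assuming valid UTF-8 input (exampleUtf8Strlen is hypothetical); malformed byte sequences are only counted approximately.

```php
<?php
// Hypothetical sketch of the fallback counter: each match is either one
// byte below 0xC0 (ASCII, or a stray continuation byte) or one lead byte
// followed by its continuation bytes, i.e. roughly one character.
function exampleUtf8Strlen(string $text): int
{
    return preg_match_all('/[\x00-\xBF]|[\xC0-\xFF][\x80-\xBF]*/', $text);
}
```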
unhash
Swap back in all the tags hashed by hashHTMLBlocks and hashPart.
Parameters:
Parameter | Type | Description |
---|---|---|
$text | string | |
_unhash_callback
Unhashing callback
Parameters:
Parameter | Type | Description |
---|---|---|
$matches | array | |
Automatically generated on 2025-03-18