Native C CommonMark + GitHub Flavored Markdown parser for PHP. ~10-20× faster than pure-PHP alternatives (Parsedown, cebe, michelf) on a clean optimized build, targeting CommonMark 0.31 (652/652 spec examples pass; see docs/spec-coverage.md). GFM extensions: tables, strikethrough, task lists, autolinks, tagfilter. Installable via PIE (the PHP Foundation's PECL successor); ships as a single .so. PHP 8.2 minimum, OO API with final classes and readonly options.

```sh
# PIE (PHP Foundation's extension installer; uses the composer.json
# at the repo root with type: "php-ext")
pie install iliaal/mdparser
```

On a minimal PHP image (e.g. php:8.x-cli from Docker Hub), PIE needs a few build tools installed first:

```sh
# Debian/Ubuntu
sudo apt install -y git bison libtool-bin
# macOS
brew install bison libtool
```

To build from source instead:

```sh
git clone https://github.com/iliaal/mdparser.git
cd mdparser
phpize && ./configure --enable-mdparser
make -j
sudo make install
echo 'extension=mdparser.so' | sudo tee /etc/php/conf.d/mdparser.ini
```

Pre-built DLLs for PHP 8.3, 8.4, and 8.5 (TS/NTS, x86/x64) are attached to each GitHub release.
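
After installing, it can help to confirm that PHP actually loaded the extension. The following is a minimal sketch: `mdparserStatus()` is a hypothetical helper, and the extension name `mdparser` is assumed from the `extension=mdparser.so` line above.

```php
<?php
// Hypothetical helper: reports whether the extension is loaded.
// The name 'mdparser' is an assumption taken from "extension=mdparser.so".
function mdparserStatus(): string
{
    if (!extension_loaded('mdparser')) {
        return 'mdparser: not loaded (check which ini files PHP reads with `php --ini`)';
    }
    // phpversion() returns false when an extension reports no version string.
    return 'mdparser: loaded, version ' . (phpversion('mdparser') ?: 'unknown');
}

echo mdparserStatus(), PHP_EOL;
```

Run it with the same PHP binary (CLI or FPM) that will serve your application, since each SAPI can read a different set of ini files.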

```php
use MdParser\Parser;
use MdParser\Options;

// Default parser: safe mode on, GFM extensions on.
$parser = new Parser();
echo $parser->toHtml('# Hello');
// <h1>Hello</h1>

// Custom options via named arguments. All fields readonly.
$parser = new Parser(new Options(
    smart: true,     // --- -> em dash, -- -> en dash, "..." -> curly
    footnotes: true, // enable [^ref] / [^ref]: syntax
    unsafe: false,   // raw HTML is escaped (default)
));
echo $parser->toHtml($markdown);

// Three output formats from one parser.
$html = $parser->toHtml($markdown);
$xml = $parser->toXml($markdown); // CommonMark XML, DOCTYPE-wrapped
$ast = $parser->toAst($markdown); // nested arrays, see below

// AST shape is documented in tests/006_ast.phpt. Brief example:
// [
//     'type' => 'document',
//     'children' => [
//         ['type' => 'heading', 'level' => 1, 'children' => [
//             ['type' => 'text', 'literal' => 'Hello'],
//         ]],
//     ],
// ]
```

Against the major pure-PHP Markdown libraries, on PHP 8.4 (clean optimized build, each parser in its default configuration):

| Corpus | mdparser ops/sec | Best pure-PHP ops/sec | Speedup |
|---|---|---|---|
| 200 B | ~530,000 | ~26,000 (Parsedown) | ~20× |
| 1.8 KB | ~110,000 | ~6,000 (cebe/GitHub) | ~19× |
| 200 KB | ~980 | ~95 (cebe/GitHub) | ~10× |

mdparser is ~10-20× faster across the corpora (up to ~45× vs the slowest), from small messages to full 200 KB spec documents. bench/README.md is the source of truth: methodology, all parsers, caveats, league/commonmark notes, and how to reproduce. (Always benchmark a clean optimized PHP build: a debug/ASan build inflates these numbers.)
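
To make the ops/sec and speedup columns concrete, here is a rough sketch of how such figures can be measured and compared. It is not the project's harness (that lives under bench/); `opsPerSecond()` and `speedup()` are hypothetical helpers, and the workload is a stand-in so the snippet runs without any Markdown library installed.

```php
<?php
/**
 * Run $fn repeatedly and return operations per second.
 * Hypothetical helper, not part of mdparser or its bench/ directory.
 */
function opsPerSecond(callable $fn, int $iterations = 10000): float
{
    $fn(); // warm-up call so one-time setup cost is not measured
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $elapsedNs = max(1, hrtime(true) - $start); // guard against a zero interval
    return $iterations / ($elapsedNs / 1e9);
}

/** Speedup of a candidate over a baseline, both given in ops/sec. */
function speedup(float $candidateOps, float $baselineOps): float
{
    return $candidateOps / $baselineOps;
}

// Stand-in workload; swap in a real parser call to compare libraries.
$input = str_repeat("# Title\n\nSome *text*.\n", 10);
printf("%.0f ops/sec\n", opsPerSecond(fn () => htmlspecialchars($input)));

// Rounded figures from the 200 B row of the table.
printf("%.1fx\n", speedup(530000, 26000)); // prints 20.4x
```

Because the table's figures are rounded, ratios recomputed from them will only approximately match the Speedup column.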
Comparison with the major pure-PHP Markdown libraries. "via ext" means the feature exists but requires opting in to a non-default extension; "Extra" means the feature ships in the library's Markdown Extra dialect, not its base mode; "β" means the feature is not supported at all.
| Feature | mdparser | Parsedown | league/cm core | cebe GFM | michelf Extra | Ciconia |
|---|---|---|---|---|---|---|
| CommonMark core | β | partial | β | partial | partial | partial |
| Fenced code blocks | β | β | β | β | β | β |
| GFM tables | β | β | via ext | β | via Extra | β |
| Strikethrough | β | β | via ext | β | β | β |
| Task lists | β | β | via ext | β | β | β |
| Autolinks (bare URL) | β | β | via ext | β | β | β |
| `<script>` tag filter | β (tagfilter) | β (escaped) | via ext | partial | β | β |
| Smart punctuation | β (Options::smart) | β | via ext | β | β | β |
| Footnotes | β (Options::footnotes) | Extra | via ext | β | β Extra | plugin |
| Hardbreaks/nobreaks | β | β | β | β | β | β |
| Sourcepos | β | β | β | β | β | β |
| Heading anchors | β (Options::headingAnchors) | β | via ext | β | β | β |
| `rel="nofollow"` | β (Options::nofollowLinks) | β | via ext | β | β | β |
| HTML output | β | β | β | β | β | β |
| XML output | β | β | β | β | β | β |
| AST output | β (arrays) | β | β (objects) | β | β | β |
Beyond CommonMark + GFM, md4c ships several dialect extensions, each exposed as an opt-in Options flag (all default off, so the standard CommonMark + GFM parse is unaffected): latexMath ($inline$, $$block$$), wikiLinks ([[target]]), spoilers (||text||), underline, highlight (==text==), superscript (^text^), subscript (~text~), and admonitions (GitHub-style > [!NOTE] alert blocks). Plus parser-behavior toggles (noIndentedCodeBlocks, permissiveAtxHeadings, collapseWhitespace). See docs/options.md for behavior and edge cases.
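
As a rough illustration of opting in to a few of these flags, the sketch below passes them as named arguments, in the same style as the `smart` / `footnotes` example earlier. It assumes each flag is accepted by the `Options` constructor under the name given above; `renderWithDialects()` is a hypothetical wrapper, and the class check lets the snippet run (returning null) where the extension is not installed.

```php
<?php
use MdParser\Options;
use MdParser\Parser;

/**
 * Hypothetical wrapper: render Markdown with a few dialect flags enabled.
 * Returns null when the mdparser extension is not loaded.
 */
function renderWithDialects(string $markdown): ?string
{
    if (!class_exists(Parser::class)) {
        return null;
    }
    // Assumption: the flags are constructor named arguments, like smart/footnotes.
    $options = new Options(
        latexMath: true,   // $inline$ and $$block$$
        wikiLinks: true,   // [[target]]
        admonitions: true, // > [!NOTE] alert blocks
    );
    return (new Parser($options))->toHtml($markdown);
}

$html = renderWithDialects("> [!NOTE]\n> See [[Home]] for \$x^2\$.\n");
echo $html ?? "mdparser extension not loaded\n";
```

The exact HTML emitted for each construct is described in docs/options.md, so no expected output is shown here.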
mdparser is deliberately scoped to CommonMark core plus the GFM extensions. It does not cover the "Markdown Extra" family of features that Parsedown Extra, michelf Markdown Extra, and league/commonmark's optional extensions offer. If you need any of the following, reach for league/commonmark, the most actively maintained pure-PHP option for extended Markdown:
- Definition lists (`Term :: definition`)
- Abbreviations (`*[HTML]: ...`)
- Attribute syntax (`{.class #id key="val"}`)
- Permalink anchor markup (we emit heading `id` slugs; we don't inject the inner `<a class="anchor">` element GitHub uses for permalinks)
- Table of contents
- YAML front matter
- Mentions (`@user`)
- Emoji (`:smile:`)
- Fenced admonition containers (`::: warning`); GitHub-style `> [!NOTE]` alert blocks are supported via `Options::admonitions`
These are real features. They're just out of scope for a CommonMark+GFM core parser.
Options::unsafe = true tells the renderer to pass raw HTML through verbatim instead of escaping or stripping it. The contract for this mode is that you own the input: it is yours, or it comes from a pipeline you trust. headingAnchors and nofollowLinks are applied in-stream as md4c parses the source, so they touch only Markdown-derived nodes; raw HTML you write directly is emitted verbatim and is never rewritten:
- Heading anchors apply to Markdown headings only. A `# heading` gets an `id` slug. A raw `<h1>x</h1>` block written directly in the source (possible under `unsafe: true, tagfilter: false`) is raw HTML, not a parsed heading node, so it is emitted untouched and gets no id. A raw heading and a later Markdown heading with the same text do not collide.
- `nofollowLinks` applies to Markdown links only. Inline links, reference links, and autolinks get `rel="nofollow noopener noreferrer"`; in-document fragment anchors (`href="#..."`, including footnote references and backrefs) are skipped. A raw `<a href="...">` written directly in the source is passed through verbatim rather than rewritten; sanitize raw HTML yourself if you allow it.
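
The difference between Markdown-derived and raw nodes can be checked with a small experiment. This is a sketch only: it assumes `tagfilter`, `headingAnchors`, and `nofollowLinks` are accepted as named `Options` arguments alongside `unsafe`, and `renderTrusted()` is a hypothetical wrapper that returns null where the extension is not installed.

```php
<?php
use MdParser\Options;
use MdParser\Parser;

/**
 * Hypothetical wrapper: render trusted input with raw HTML passed through.
 * Returns null when the mdparser extension is not loaded.
 */
function renderTrusted(string $markdown): ?string
{
    if (!class_exists(Parser::class)) {
        return null;
    }
    // Assumption: all four flags are constructor named arguments.
    $options = new Options(
        unsafe: true,         // raw HTML is emitted verbatim
        tagfilter: false,     // do not filter raw tags
        headingAnchors: true, // id slugs on Markdown headings
        nofollowLinks: true,  // rel attribute on Markdown links
    );
    return (new Parser($options))->toHtml($markdown);
}

// One Markdown heading and link, plus their raw-HTML counterparts.
$source = <<<'MD'
# Title

<h1>Title</h1>

[markdown link](https://example.com)
<a href="https://example.com">raw link</a>
MD;

echo renderTrusted($source) ?? "mdparser extension not loaded\n";
```

Per the rules above, only the `# Title` heading should carry an `id` and only the Markdown link should carry the `rel` attribute; the raw `<h1>` and `<a>` should come through unchanged.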
Parser::toXml() and Parser::toAst() return structural representations of the parsed document. Link / image url fields and html_block / html_inline literal text are preserved; XML output escapes those bytes as XML text, while AST output returns them byte-for-byte. The unsafe, tagfilter, and URL-scheme defenses do not make these structural outputs safe to transform back into HTML. If you build HTML out of XML or AST data yourself, you own the sanitization: apply a URL scheme allowlist before emitting href, and run HTML through a sanitizer before emitting raw html_block / html_inline literal text. See docs/ast.md for examples.
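
To illustrate the kind of sanitization this implies, here is a minimal sketch of a URL scheme allowlist applied while rendering links out of AST-shaped arrays. It needs no extension to run. The helper names, the default scheme list, and the assumption that link nodes look like `['type' => 'link', 'url' => ..., 'children' => [...]]` are all illustrative; docs/ast.md remains the reference for the real node shape.

```php
<?php
/**
 * True when $url is relative or uses an allowlisted scheme.
 * The default scheme list is an illustrative assumption, not mdparser policy.
 */
function isAllowedUrl(string $url, array $allowed = ['http', 'https', 'mailto']): bool
{
    // Browsers ignore tabs and newlines inside a scheme, so drop
    // control characters and spaces before inspecting it.
    $clean = (string) preg_replace('/[\x00-\x20\x7F]+/', '', $url);
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $clean, $m) !== 1) {
        return true; // no scheme: relative path, query, or #fragment
    }
    return in_array(strtolower($m[1]), $allowed, true);
}

/** Concatenate the 'literal' text of a node and all of its descendants. */
function textOf(array $node): string
{
    $text = $node['literal'] ?? '';
    foreach ($node['children'] ?? [] as $child) {
        $text .= textOf($child);
    }
    return $text;
}

/** Render only the links found in an AST array, replacing disallowed URLs with "#". */
function renderLinks(array $node): string
{
    if (($node['type'] ?? '') === 'link') { // assumed node type name
        $url = $node['url'] ?? '';
        $href = isAllowedUrl($url) ? $url : '#';
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars(textOf($node), ENT_QUOTES) . '</a>';
    }
    $out = '';
    foreach ($node['children'] ?? [] as $child) {
        $out .= renderLinks($child);
    }
    return $out;
}

// Hand-written AST fragment standing in for Parser::toAst() output.
$ast = ['type' => 'document', 'children' => [
    ['type' => 'link', 'url' => 'javascript:alert(1)', 'children' => [
        ['type' => 'text', 'literal' => 'x'],
    ]],
]];
echo renderLinks($ast), PHP_EOL; // <a href="#">x</a>
```

An allowlist is used rather than a blocklist so that unknown schemes are rejected by default. Raw `html_block` / `html_inline` literals are not handled here and still need a dedicated HTML sanitizer.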
Companion native PHP extensions for high-throughput PHP workloads:
- php_excel: native Excel I/O. 7-10× faster than PhpSpreadsheet, full XLS/XLSX with formulas, formatting, and styling. Powered by LibXL.
- php_clickhouse: native ClickHouse client speaking the wire protocol directly. Picks up where SeasClick left off.
- fastchart: native chart-rendering extension. 26 chart types behind one fluent OO API, SVG-canonical with PNG/JPG/WebP output (no libgd dependency).
Full background, design rationale, and benchmark methodology in the launch post: mdparser: A Native CommonMark + GFM Parser for PHP.
- Wrapper code (`mdparser*.c`, `php_mdparser.h`) under BSD 3-Clause.
- Embedded md4c sources under the MIT license. See `LICENSE` for aggregated notices.
Follow @iliaa on X • Blog • If this sped up your stack, ⭐ star it!
