This library provides HTML5 element definitions for HTML Purifier, compliant with the WHATWG spec.
Among the solutions based on HTML Purifier, it is the most complete HTML5-compliant one. Apart from providing the most extensive set of element definitions, it provides tidy/sanitization rules for transforming the input into valid HTML5 output.
Install with Composer by running the following command:
composer require xemlock/htmlpurifier-html5
The most basic usage is similar to the original HTML Purifier. Create an HTML5-compatible config using the HTMLPurifier_HTML5Config::createDefault() factory method, and then pass it to an HTMLPurifier instance:
$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html5 = $purifier->purify($dirty_html5);
To modify the config, you can either pass a configuration array to HTMLPurifier_HTML5Config::create() when instantiating it, or call the set method on an already existing config instance.
For example, to allow IFRAMEs with YouTube videos you can do the following:
$config = HTMLPurifier_HTML5Config::create(array(
    'HTML.SafeIframe' => true,
    'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
));
or equivalently:
$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^//www\.youtube\.com/embed/%');
Apart from HTML Purifier's built-in configuration directives, the following new directives are also supported:
- Attr.AllowedInputTypes
Version added: 0.1.12
Type: Lookup (or null)
Default: null
List of allowed input types, chosen from the types defined in the spec. By default, the setting is null, meaning there is no restriction on allowed types. An empty array means that no explicit type attributes are allowed, effectively making all inputs text inputs.
- HTML.Forms
Version added: 0.1.12
Type: Boolean
Default: false
Whether or not to permit form elements in the user input, regardless of %HTML.Trusted value. Please be very careful when using this functionality, as enabling forms in untrusted documents may allow for phishing attacks.
- HTML.IframeAllowFullscreen
Version added: 0.1.11
Type: Boolean
Default: false
Whether or not to permit the allowfullscreen attribute on iframe tags. It requires either %HTML.SafeIframe or %HTML.Trusted to be true.
- HTML.Link
Version added: 0.1.12
Type: Boolean
Default: false
Permit the link tags in the user input, regardless of %HTML.Trusted value. This effectively allows link tags without allowing other untrusted elements.
If enabled, URIs in link tags will not be matched against a whitelist specified in %URI.SafeLinkRegexp (unless %HTML.SafeLink is also enabled).
- HTML.SafeLink
Version added: 0.1.12
Type: Boolean
Default: false
Whether to permit link tags in untrusted documents. This directive must be accompanied by a whitelist of permitted URIs via %URI.SafeLinkRegexp, otherwise no link tags will be allowed.
- HTML.XHTML
Version added: 0.1.12
Type: Boolean
Default: false
While deprecated in HTML 4.01 / XHTML 1.0 context, in HTML5 it's used for enabling support for namespaced attributes and XML self-closing tags.
When enabled, it causes the xml:lang attribute to take precedence over lang when both attributes are present on the same element.
- URI.SafeLinkRegexp
Version added: 0.1.12
Type: String
Default: null
A PCRE regular expression that will be matched against a <link> URI. This directive only has an effect if %HTML.SafeLink is enabled. Here are some example values:
%^https?://localhost/% - Allow localhost URIs
Use Attr.AllowedRel to control permitted link relationship types.
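As an illustration of how several of these directives might be combined, here is a minimal configuration sketch. It assumes the package was installed with Composer (so vendor/autoload.php exists relative to the script), and the particular regular expressions and the chosen input types are example values only, not recommendations.

```php
<?php
// Assumption: installed via Composer, so the autoloader lives in ./vendor.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::create(array(
    // allowfullscreen on iframes requires HTML.SafeIframe (or HTML.Trusted).
    'HTML.SafeIframe'            => true,
    'URI.SafeIframeRegexp'       => '%^//www\.youtube\.com/embed/%',
    'HTML.IframeAllowFullscreen' => true,

    // Permit link tags, but only for URIs matching the whitelist regexp.
    'HTML.SafeLink'      => true,
    'URI.SafeLinkRegexp' => '%^https?://localhost/%',

    // Permit form elements, restricted to a few input types
    // (the types listed here are an arbitrary example).
    'HTML.Forms'             => true,
    'Attr.AllowedInputTypes' => array('text', 'email', 'checkbox'),
));
```

The resulting $config can then be passed to an HTMLPurifier instance in the same way as in the basic usage example. Keep in mind the warning on HTML.Forms: enabling forms for untrusted input may allow phishing attacks.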
Aside from HTML elements supported originally by HTML Purifier, this library adds support for the following HTML5 elements:
<article>, <aside>, <audio>, <bdi>, <data>, <details>, <dialog>, <figcaption>, <figure>, <footer>, <header>, <hgroup>, <main>, <mark>, <nav>, <picture>, <progress>, <section>, <source>, <summary>, <time>, <track>, <video>, <wbr>
as well as HTML5 attributes added to existing HTML elements, such as:
<a>, <del>, <fieldset>, <ins>, <script>
The MIT License (MIT). See the LICENSE file.