Skip to content

xemlock/htmlpurifier-html5

Repository files navigation

HTML5 Definitions for HTML Purifier

Build Status Coverage Status Latest Stable Version Total Downloads License

This library provides HTML5 element definitions for HTML Purifier, compliant with the WHATWG spec.

It is the most complete HTML5-compliant solution among all based on HTML Purifier. Apart from providing the most extensive set of element definitions, it provides tidy/sanitization rules for transforming the input into a valid HTML5 output.

Installation

Install with Composer by running the following command:

composer require xemlock/htmlpurifier-html5

Usage

The most basic usage is similar to the original HTML Purifier. Create a HTML5-compatible config using HTMLPurifier_HTML5Config::createDefault() factory method, and then pass it to an HTMLPurifier instance:

$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html5 = $purifier->purify($dirty_html5);

To modify the config you can either instantiate the config with a configuration array passed to HTMLPurifier_HTML5Config::create(), or by calling set method on an already existing config instance.

For example, to allow IFRAMEs with Youtube videos you can do the following:

$config = HTMLPurifier_HTML5Config::create(array(
  'HTML.SafeIframe' => true,
  'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
));

or equivalently:

$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^//www\.youtube\.com/embed/%');

Configuration

Apart from HTML Purifier's built-in configuration directives, the following new directives are also supported:

  • Attr.AllowedInputTypes

    Version added: 0.1.12
    Type: Lookup (or null)
    Default: null

    List of allowed input types, chosen from the types defined in the spec. By default, the setting is null, meaning there is no restriction on allowed types. Empty array means that no explicit type attributes are allowed, effectively making all inputs a text inputs.

  • HTML.Forms

    Version added: 0.1.12
    Type: Boolean
    Default: false

    Whether or not to permit form elements in the user input, regardless of %HTML.Trusted value. Please be very careful when using this functionality, as enabling forms in untrusted documents may allow for phishing attacks.

  • HTML.IframeAllowFullscreen

    Version added: 0.1.11
    Type: Boolean
    Default: false

    Whether or not to permit allowfullscreen attribute on iframe tags. It requires either %HTML.SafeIframe or %HTML.Trusted to be true.

  • HTML.Link

    Version added: 0.1.12
    Type: Boolean
    Default: false

    Permit the link tags in the user input, regardless of %HTML.Trusted value. This effectively allows link tags without allowing other untrusted elements.

    If enabled, URIs in link tags will not be matched against a whitelist specified in %URI.SafeLinkRegexp (unless %HTML.SafeIframe is also enabled).

  • HTML.SafeLink

    Version added: 0.1.12
    Type: Boolean
    Default: false

    Whether to permit link tags in untrusted documents. This directive must be accompanied by a whitelist of permitted URIs via %URI.SafeLinkRegexp, otherwise no link tags will be allowed.

  • HTML.XHTML

    Version added: 0.1.12
    Type: Boolean
    Default: false

    While deprecated in HTML 4.01 / XHTML 1.0 context, in HTML5 it's used for enabling support for namespaced attributes and XML self-closing tags.

    When enabled it causes xml:lang attribute to take precedence over lang, when both attributes are present on the same element.

  • URI.SafeLinkRegexp

    Version added: 0.1.12
    Type: String
    Default: null

    A PCRE regular expression that will be matched against a <link> URI. This directive only has an effect if %HTML.SafeLink is enabled. Here are some example values: %^https?://localhost/% - Allow localhost URIs

    Use Attr.AllowedRel to control permitted link relationship types.

Supported HTML5 elements

Aside from HTML elements supported originally by HTML Purifier, this library adds support for the following HTML5 elements:

<article>, <aside>, <audio>, <bdi>, <data>, <details>, <dialog>, <figcaption>, <figure>, <footer>, <header>, <hgroup>, <main>, <mark>, <nav>, <picture>, <progress>, <section>, <source>, <summary>, <time>, <track>, <video>, <wbr>

as well as HTML5 attributes added to existing HTML elements, such as:

<a>, <del>, <fieldset>, <ins>, <script>

License

The MIT License (MIT). See the LICENSE file.