WordPress Escape Functions

Escaping data is important: without it you are open to XSS and other naughty, unexpected things, and even perfectly legitimate data can break the format it is placed into.

Consider HTML attributes. Imagine you have the following simple code:

    $image_src = get_uploaded_image_src(); // not any specific function
    echo '<img src="' . $image_src . '" />';

What if the uploaded image is called "Horizons" by LTJ Bukem.jpg? You end up with broken HTML: <img src=""Horizons" by LTJ Bukem.jpg" />. Not to worry, though: WordPress comes with a dozen or so escape functions that take care of this sort of thing. With that many of them, however, it is often difficult to remember which is which, and whether an escape function exists for a specific case.
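Going back to the image example, here is a minimal sketch of the escaped version. The filename is a hard-coded stand-in for get_uploaded_image_src(), and the fallback esc_attr() is only a rough approximation so the snippet also runs outside WordPress; inside WordPress the real function is used instead.

```php
<?php
// Stand-in so the sketch runs outside WordPress. Inside WordPress the real
// esc_attr() is already defined and this definition is skipped.
if ( ! function_exists( 'esc_attr' ) ) {
    function esc_attr( $text ) {
        // Rough approximation: encode quotes and angle brackets, leave existing entities alone.
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8', false );
    }
}

// Hypothetical value standing in for get_uploaded_image_src().
$image_src = '"Horizons" by LTJ Bukem.jpg';

$html = '<img src="' . esc_attr( $image_src ) . '" />';
echo $html, "\n";
// <img src="&quot;Horizons&quot; by LTJ Bukem.jpg" />
```

The quotes in the filename now arrive as &quot; and can no longer close the attribute early.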

esc_attr

    $attr = '"there\'s nothing going on in here? Is there? >_<"';
    echo esc_attr($attr);
    // &quot;there&#039;s nothing going on in here? Is there? &gt;_&lt;&quot;

The esc_attr function escapes content that goes inside HTML attribute values: title, rel, and so on.

esc_url and esc_url_raw

    $attr = 'https://inval1d.com?one=490&t"""\\\o=-1&c\'ontent=<<<ONE>>!&amp;%00#one>';
    echo esc_url($attr);
    // https://inval1d.com?one=490&#038;to=-1&#038;c&#039;ontent=ONE!&#038;%00#one
    echo esc_url_raw($attr);
    // https://inval1d.com?one=490&to=-1&c'ontent=ONE!&amp;%00#one

esc_url escapes a URL for display on pages. Characters outside the allowed set (a-z A-Z 0-9 - _ ~ : / ? # [ ] @ ! $ & ' ( ) * + , . ; = %) are simply stripped out. Of the characters that remain, & and ' are encoded as HTML entities, as the example output shows (no, nothing is URL-encoded; you have to do that yourself).

esc_url_raw wraps esc_url but does not encode HTML entities, so its result is not meant to be echoed into a page as-is. It sanitizes URLs for storage, for example in the database.

Both functions reject URLs with non-whitelisted schemes and return an empty string for them. The default schemes/protocols that are allowed are: ‘http’, ‘https’, ‘ftp’, ‘ftps’, ‘mailto’, ‘news’, ‘irc’, ‘gopher’, ‘nntp’, ‘feed’, ‘telnet’, ‘mms’, ‘rtsp’, ‘svn’ (no, no ‘magnet’ or ‘file’ protocols by default).

So, which one do you use for the href attribute of a link? esc_url encodes the entities, so an & is transformed into &#038;. A user-fed URL should probably be passed through esc_url_raw( $url ) to filter out invalid URL characters and protocols, and then through esc_attr to encode the attribute value as per the HTML specification (see StackOverflow: Do I encode Ampersands in a href?).
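The href suggestion above can be sketched roughly as follows. demo_href() is a hypothetical helper name, and the two fallback functions are crude stand-ins (http and https only, no filters) that exist just so the snippet runs outside WordPress; they are not the real implementations.

```php
<?php
// Crude stand-ins so the sketch runs outside WordPress. Inside WordPress the
// real functions exist and these definitions are skipped.
if ( ! function_exists( 'esc_url_raw' ) ) {
    function esc_url_raw( $url ) {
        // Approximation: drop characters outside the allowed set, then
        // reject anything that does not start with http:// or https://.
        $url = preg_replace( '|[^a-z0-9-~+_.?#=!&;,/:%@$\|*\'()\[\]]|i', '', (string) $url );
        return preg_match( '#^https?://#i', $url ) ? $url : '';
    }
}
if ( ! function_exists( 'esc_attr' ) ) {
    function esc_attr( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8', false );
    }
}

// Hypothetical helper: filter the URL first, then encode it for the attribute.
function demo_href( $url ) {
    return esc_attr( esc_url_raw( $url ) );
}

$good = demo_href( 'https://example.com/?a=1&b=<2>' );
$bad  = demo_href( 'javascript:alert(1)' );

echo '<a href="' . $good . '">link</a>', "\n";
// <a href="https://example.com/?a=1&amp;b=2">link</a>
echo '<a href="' . $bad . '">link</a>', "\n";
// <a href="">link</a>
```

The order matters in this sketch: the URL is filtered while it is still a plain URL, and entity encoding happens last, at the point where the value enters the HTML.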

esc_html

    $attr = '<div class="the" rel="quick" onclick="brown(\'fox\')">jumped over...</div>';
    echo esc_html($attr);
    // &lt;div class=&quot;the&quot; rel=&quot;quick&quot; onclick=&quot;brown(&#039;fox&#039;)&quot;&gt;jumped over...&lt;/div&gt;

esc_html is simple: it escapes any and all HTML, so the browser displays the markup as text instead of interpreting it. This is particularly useful for outputting code samples, especially those that come from the outside, via comments, etc.

esc_textarea

    $attr = 'This is some very nasty <script type="text/javascript">alert("XSS");</script> here!';
    echo '<textarea>'.esc_textarea($attr).'</textarea>'; // make it safe

esc_textarea is another important function, although not as convoluted and complex as esc_attr. It escapes anything that is going to be displayed inside a textarea element (enabled or disabled) and is similar to esc_html. Internally, esc_textarea uses htmlspecialchars().
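One difference between the two that is easy to miss, as far as I can tell from the WordPress source: esc_html leaves entities that are already in the text alone, while esc_textarea, being a plain htmlspecialchars() call, encodes them again so the textarea shows exactly what was stored. The sketch below assumes that behaviour; the fallback definitions mirror only this one aspect and exist so the snippet runs outside WordPress.

```php
<?php
// Stand-ins so the sketch runs outside WordPress. They mirror only the
// double-encoding behaviour discussed here, nothing else.
if ( ! function_exists( 'esc_html' ) ) {
    function esc_html( $text ) {
        // Existing entities are not encoded a second time.
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8', false );
    }
}
if ( ! function_exists( 'esc_textarea' ) ) {
    function esc_textarea( $text ) {
        // Existing entities are encoded again.
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8', true );
    }
}

// Hypothetical stored value that already contains an entity.
$stored = 'Fish &amp; <b>chips</b>';

$as_html     = esc_html( $stored );
$as_textarea = esc_textarea( $stored );

echo $as_html, "\n";     // Fish &amp; &lt;b&gt;chips&lt;/b&gt;
echo $as_textarea, "\n"; // Fish &amp;amp; &lt;b&gt;chips&lt;/b&gt;
```

In a textarea the second form is what you want: the browser decodes &amp;amp; back to &amp;, so the user edits the text exactly as it was saved.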

esc_js

    $attr = "if ( !confirm('Are you sure you want to do this?') ) return false; alert('Done!');";
    echo '<a href="#" onclick="alert(\'The payload: '.esc_js($attr).'\');">clickme</a>';

esc_js escapes the quote manipulations that can lead to broken JavaScript strings. For it to work, the escaped value has to be placed inside a single-quoted JavaScript string. This can get confusing, especially when you’re echoing the JavaScript from PHP.

This function will not escape jQuery selectors like jQuery('input[name=array\[...\]]'). It only escapes the contents of single-quoted strings.

esc_sql and like_escape

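esc_sql is a convenient wrapper around the global $wpdb and its escape() method; it escapes data for use in SQL queries. esc_sql does NOT escape the wildcards in LIKE statements; a separate like_escape is available for that (deprecated since WordPress 4.0 in favor of $wpdb->esc_like()).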
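To see why LIKE needs its own escaping, consider the wildcards % and _: they are harmless in an ordinary string comparison but change what a LIKE pattern matches. The sketch below uses a hypothetical helper, demo_escape_like(), written in plain PHP; it is an assumption about what LIKE escaping amounts to, not the WordPress implementation.

```php
<?php
// Hypothetical helper, not a WordPress function: backslash-escapes the LIKE
// wildcards % and _ (and the backslash itself) so they match literally.
function demo_escape_like( $text ) {
    return addcslashes( $text, '_%\\' );
}

// A search term that happens to contain both wildcards.
$term    = '100%_off';
$escaped = demo_escape_like( $term );

echo $escaped, "\n"; // 100\%\_off

// The result is only wildcard-safe, not SQL-safe: it still has to go through
// SQL escaping or a prepared statement before it reaches the query.
```

Without this step, a search for 100%_off would also match titles such as "100 days off", because % and _ would still act as wildcards.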

tag_escape and sanitize_html_class

tag_escape strips anything other than a-zA-Z0-9_:, the set of valid HTML tag-name characters. sanitize_html_class does a similar job for HTML class names, filtering out invalid characters.


There are other escape functions that are used internally by WordPress, for key, username, title, filename sanitization. These can be used by themes and plugins; most are found inside wp-includes/formatting.php.

This escaping is quite confusing, isn’t it? Further contributing to the confusion is the fact that many built-in data generation methods may already escape data, like get_blogaddress_by_id. Ultimately, it’s up to you to check and sanitize/escape if necessary. And remember, future versions may remove built-in escaping from a function that you’re not escaping… 😕

So, when was the last time you used esc_url inside an href attribute?