WordPress Escape Functions
Escaping data is an important process, since the lack thereof can lead to XSS and other naughty and unexpected things, including perfectly legitimate data that simply breaks a specific data format.
Consider HTML attributes. Imagine you have the following simple code:
$image_src = get_uploaded_image_src(); // not any specific function
echo '<img src="' . $image_src . '" />';
What if the uploaded image is called “Horizons” by LTJ Bukem.jpg? You end up with broken HTML:
<img src=""Horizons" by LTJ Bukem.jpg" />
Not to worry though: WordPress comes with a dozen escape functions to take care of all these sorts of things. However, with the myriad of escaping functions provided in WordPress, it is often difficult to remember which is which and whether there is an escape function for a specific case.
esc_attr
$attr = '"there\'s nothing going on in here? Is there? >_<"';
echo esc_attr($attr); // &quot;there&#039;s nothing going on in here? Is there? &gt;_&lt;&quot;
The esc_attr function escapes content that is to be placed inside HTML attributes.
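To make the effect on the earlier image example concrete, here is a minimal standalone sketch. It does not load WordPress: sketch_esc_attr is a hypothetical stand-in built on PHP's htmlspecialchars, which only approximates what esc_attr does, and the file name is the one from the example above.

```php
<?php
// Hypothetical stand-in for esc_attr(), NOT WordPress's implementation:
// it only encodes the characters that are special inside a quoted attribute.
function sketch_esc_attr(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The problematic file name from the example above.
$image_src = '"Horizons" by LTJ Bukem.jpg';

// The double quotes become &quot;, so they can no longer close the attribute.
$html = '<img src="' . sketch_esc_attr($image_src) . '" />';
echo $html, "\n";
```

Inside WordPress itself you would call esc_attr (or esc_url for a src value) directly rather than a helper like this one.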
esc_url and esc_url_raw
$attr = 'https://inval1d.com?one=490&t"""\\\o=-1&c\'ontent=<<<ONE>>!&%00#one>';
echo esc_url($attr); // https://inval1d.com?one=490&#038;to=-1&#038;c&#039;ontent=ONE!&#038;%00#one
echo esc_url_raw($attr); // https://inval1d.com?one=490&to=-1&c'ontent=ONE!&%00#one
esc_url escapes a URL for display on pages. Invalid characters are simply stripped out; the ones that remain (a-z A-Z 0-9 - _ ~ : / ? # [ ] @ ! $ & ' ( ) * + , . ; = %) are encoded into valid HTML entities where necessary (no, they’re not URL encoded, you have to do that yourself).
esc_url_raw wraps around esc_url but does not encode HTML entities, and is not meant to return data that can be safely displayed on pages. The function strips invalid URLs for storage.
The two functions do not allow URLs that have non-whitelisted schemes. The default schemes/protocols that are allowed are: ‘http’, ‘https’, ‘ftp’, ‘ftps’, ‘mailto’, ‘news’, ‘irc’, ‘gopher’, ‘nntp’, ‘feed’, ‘telnet’, ‘mms’, ‘rtsp’, ‘svn’ (no, no ‘magnet’ or ‘file’ protocols by default).
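The scheme check can be pictured with a small standalone sketch. This is not how WordPress implements it: sketch_has_allowed_scheme and the SKETCH_ALLOWED_SCHEMES constant are hypothetical names, the list is a shortened copy of the defaults quoted above, and URLs without any scheme are simply rejected here for brevity.

```php
<?php
// Assumption: a shortened copy of the default scheme list quoted above.
const SKETCH_ALLOWED_SCHEMES = ['http', 'https', 'ftp', 'ftps', 'mailto', 'news', 'irc', 'feed'];

// Hypothetical helper: true only if the URL carries a whitelisted scheme.
function sketch_has_allowed_scheme(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        // No scheme (or an unparsable URL): rejected in this simplified sketch.
        return false;
    }
    return in_array(strtolower($scheme), SKETCH_ALLOWED_SCHEMES, true);
}

var_dump(sketch_has_allowed_scheme('https://example.com/?a=1'));
var_dump(sketch_has_allowed_scheme('magnet:?xt=urn:btih:abc'));
var_dump(sketch_has_allowed_scheme('javascript:alert(1)'));
```

The point of whitelisting rather than blacklisting is that a scheme nobody thought about, such as javascript:, is rejected by default.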
So, which one do you put as your href attribute in a link? esc_url encodes the entities, so an & is transformed into an &#038;. A user-fed URL should probably be passed through esc_url_raw( $url ) to filter invalid URL characters and protocols; additionally, esc_attr should be used to further encode the attribute as per the HTML specification (see StackOverflow: Do I encode Ampersands in a href?).
esc_html
$attr = '<div class="the" rel="quick" onclick="brown(\'fox\')">jumped over...</div>';
echo esc_html($attr); // &lt;div class=&quot;the&quot; rel=&quot;quick&quot; onclick=&quot;brown(&#039;fox&#039;)&quot;&gt;jumped over...&lt;/div&gt;
esc_html is simple: it escapes any and all HTML, so the browser displays the markup as text instead of interpreting it. This is particularly useful for outputting code samples, especially those that come from the outside, via comments, etc.
esc_textarea
esc_textarea is another important function, although not a convoluted or complex one. It sanitizes anything that is going to be displayed in a textarea element (enabled or disabled) and is similar to esc_html.
esc_js
$attr = "if ( !confirm('Are you sure you want to do this?') ) return false; alert('Done!');";
echo '<a href="#" onclick="alert(\'The payload: '.esc_js($attr).'\');">clickme</a>';
This function will not escape jQuery selectors like jQuery('input[name=array\[...\]]'). It only escapes single-quoted strings.
esc_sql and like_escape
esc_sql is a convenient wrapper around the global $wpdb and its escape() method; it escapes SQL. esc_sql does NOT escape LIKE statements; an additional like_escape function is available for that.
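Why LIKE needs its own escaping step: % and _ are wildcards inside a LIKE pattern, so a literal one in user input has to be neutralised before you add your own wildcards. The following standalone sketch shows the idea; sketch_like_escape is a hypothetical helper, not the WordPress function, and its result would still have to go through regular SQL escaping before reaching a query.

```php
<?php
// Hypothetical stand-in for a LIKE-escaping helper, NOT WordPress's own code.
function sketch_like_escape(string $text): string
{
    // Prefix the LIKE wildcards (% and _) and the backslash itself with a backslash.
    return addcslashes($text, '_%\\');
}

// A search term that contains both wildcard characters literally.
$term = '100%_done';

// Only the outer % signs act as wildcards in the final pattern.
$pattern = '%' . sketch_like_escape($term) . '%';
echo $pattern, "\n";
```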
tag_escape and sanitize_html_class
There are other escape functions that are used internally by WordPress, for key, username, title, filename sanitization. These can be used by themes and plugins; most are found inside wp-includes/formatting.php.
This escaping is quite confusing, isn’t it? Further contributing to the confusion is the fact that many built-in data generation methods, like get_blogaddress_by_id, may already escape data. Ultimately, it’s up to you to check and sanitize/escape if necessary. And remember, future versions may remove built-in escaping from a function that you’re not escaping…
So, when was the last time you used esc_url inside a