What `the_content` goes through

the_content is one of the best-known WordPress template tags. It wraps the get_the_content tag (a little lower in the source) and applies the the_content filter to its result.
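To make the wrapping concrete, here is a rough sketch of what the tag does, simplified from my reading of core. The name sketch_the_content is hypothetical; get_the_content and apply_filters are the real WordPress functions, so the function only does something useful when called from a template inside The Loop.

```php
<?php
// Simplified sketch of the_content(); sketch_the_content is a hypothetical name.
// Assumes WordPress is loaded and we are inside The Loop.
function sketch_the_content() {
    // Content of the current post, before any the_content callbacks run.
    $content = get_the_content();

    // Run every callback hooked to the 'the_content' filter, in priority order.
    $content = apply_filters( 'the_content', $content );

    // Escape CDATA terminators so the output stays safe inside feeds.
    $content = str_replace( ']]>', ']]&gt;', $content );

    echo $content;
}
```

Everything described in the rest of this article happens inside that single apply_filters call.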

the_content filters in WordPress

The the_content filter runs at least 10 default filter functions over the content before it is displayed. WordPress post content is usually just altered here and there, not too drastically (except for shortcodes, of course), but sometimes you simply have to know what to expect when displaying filtered content.

WP_Embed::run_shortcode()

This method temporarily removes all registered shortcodes, processes the content, inflating [embed] codes into HTML, and then reactivates all shortcodes again. This is done so that the [embed] shortcode is run before wpautop. It’s run alone, once.

WP_Embed::autoembed()

Another WP_Embed method that runs after run_shortcode (for in-depth information on how and why some filters run before others, check my Inside WordPress Actions and Filters article). Auto-embed, if enabled in the Settings/Media section of the Dashboard, will try to inflate URLs that sit on their own line into interactive HTML, like YouTube’s video player.
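The ordering comes down to filter priorities. The toy model below is not core code: sketch_apply_filters and the closures are hypothetical stand-ins, and the priorities (8 for the two WP_Embed methods, 10 for most defaults, 11 for do_shortcode) are what I remember from core's default filters, so treat them as an assumption.

```php
<?php
// Toy model of a prioritized filter chain; NOT WordPress core code.
// sketch_apply_filters and the closures below are hypothetical.
function sketch_apply_filters( array $hooks, $value ) {
    ksort( $hooks ); // lower priority numbers run first
    foreach ( $hooks as $callbacks ) {
        foreach ( $callbacks as $callback ) {
            $value = call_user_func( $callback, $value );
        }
    }
    return $value;
}

// Registration order is deliberately shuffled; priority decides the run order.
$hooks = array(
    11 => array( function ( $c ) { return $c . ' > do_shortcode'; } ),
    10 => array(
        function ( $c ) { return $c . ' > wptexturize'; },
        function ( $c ) { return $c . ' > wpautop'; },
    ),
    8  => array(
        function ( $c ) { return $c . ' > run_shortcode'; },
        function ( $c ) { return $c . ' > autoembed'; },
    ),
);

$trace = sketch_apply_filters( $hooks, 'content' );
echo $trace, "\n";
// prints: content > run_shortcode > autoembed > wptexturize > wpautop > do_shortcode
```

Callbacks sharing a priority run in the order they were registered, which is why the two WP_Embed methods keep their relative order.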

wptexturize()

wptexturize transforms some of the less-beautiful characters in text into more eye-pleasing variants. Single quotes, double quotes and trademark ™ symbols are just some of the many characters that get enhanced. wptexturize leaves the contents of pre, script and some other HTML tags alone, for obvious reasons.

This is sometimes one of my least loved manipulations. When copying and pasting text from a WordPress page into a plaintext context (something I don’t do often), these “nice” characters really get in the way. There are a lot of tutorials out there on how to disable wptexturize; here’s one of them.
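For the content filter alone, switching it off is usually a single remove_filter call. A minimal sketch, assuming it lives in a theme's functions.php or a small plugin; sketch_disable_texturize is a hypothetical name, and other hooks that also use wptexturize (titles, excerpts, comments) are left untouched.

```php
<?php
// Hypothetical helper: stops wptexturize from running on post content only.
// remove_filter() is the real WordPress function; the default priority (10)
// matches the priority wptexturize is registered with on the_content.
function sketch_disable_texturize() {
    remove_filter( 'the_content', 'wptexturize' );
}

// The guard only exists so this file can also be loaded outside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'init', 'sketch_disable_texturize' );
}
```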

convert_smilies()

:mrgreen: 😐 😈 ➡ 😯 🙂 😕 😎 👿 😀 💡 😳 😛 🙄 😉 😥 😮 😆 😡 🙁 😎 😯 🙁 🙂 😕 😀 😛 😮 😡 😐 😉 8) 😯 🙁 🙂 😕 😀 😛 😮 😡 😐 😉 ❗ ❓

Yes, you guessed it. This filter adds life to your content by turning text smilies into images. The content is only run through convert_smilies if the appropriate Settings/Writing option is enabled (it is by default).

And you’ll notice that by pasting all the icons I’ve found a bug in the beast of a regular expression that turns everything into magic:

/(?:\s|^);(?:\-\)|\))|(?:\s|^)\:(?:\||x|wink\:|twisted\:|smile\:|shock\:|sad\:|roll\:|razz\:|oops\:|o|neutral\:|mrgreen\:|mad\:|lol\:|idea\:|grin\:|evil\:|eek\:|cry\:|cool\:|arrow\:|P|D|\?\?\?\:|\?\:|\?|\-\||\-x|\-o|\-P|\-D|\-\?|\-\)|\-\(|\)|\(|\!\:)|(?:\s|^)8(?:O|\-O|\-\)|\))(?:\s|$)/m

A ticket has been submitted at the time of writing.

convert_chars()

convert_chars() (Codex entry) translates invalid Unicode references to valid ones, and transforms some metadata tags to conform to XHTML. Nothing too special.

wpautop()

wpautop wraps plaintext paragraphs separated by line breaks in <p> tags. Sounds easy, but the amount of work wpautop does is quite spectacular, with lots of mixing and matching.

shortcode_unautop()

Another formatting function, which tries to remove paragraph tags from shortcodes that stand alone in the content. If this filter is not applied, shortcodes may end up wrapped in unwanted and unexpected paragraphs.

prepend_attachment()

This function works on attachment post types only. It tries to “show the medium sized image representation of the attachment if available, and link to the raw file”.

capital_P_dangit()

Not “wordpress”, “Wordpress”, “wordPress” or even “wOrDpreSS”; it’s “WordPress”, with a capital “P”, dangit! This function tries to make sure your content doesn’t spell “WordPress” in the wrong character case.
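As far as I can tell from the core source, the function is a plain string replacement aimed at the one common misspelling “Wordpress”, so the more creative variants slip through. The sketch below models that behaviour under that assumption; sketch_capital_p is a hypothetical name and the list of prefixes is shortened.

```php
<?php
// Simplified model of capital_P_dangit() for post content; not the core code.
// sketch_capital_p is a hypothetical name.
function sketch_capital_p( $text ) {
    // Only the exact misspelling "Wordpress" is touched, and in this sketch
    // only when it follows a space, ">" or "(".
    return str_replace(
        array( ' Wordpress', '>Wordpress', '(Wordpress' ),
        array( ' WordPress', '>WordPress', '(WordPress' ),
        $text
    );
}

echo sketch_capital_p( 'I blog with Wordpress and wordpress.' ), "\n";
// prints: I blog with WordPress and wordpress.
```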

Here are some great resources for understanding the obsession:

do_shortcode()

Last (but not usually least) the whole content is swished through the do_shortcode function. It inflates any validly formatted shortcodes in the content by calling registered shortcode handlers. It’s simple.

…it’s far from over

It’s usually far from over. Many themes and plugins add more hooks to the_content, so it may be modified and remodified dozens of times before finally making it onto the page.
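Hooking in is the same one-liner that core itself uses. A minimal sketch of a theme or plugin adding its own step; sketch_append_notice and the markup it appends are hypothetical, while add_filter is the real WordPress function.

```php
<?php
// Hypothetical callback: appends a short notice to the post content.
function sketch_append_notice( $content ) {
    return $content . "\n<p class=\"notice\">Thanks for reading.</p>";
}

// Priority 20 runs after the default callbacks described above.
// The guard only exists so this file can also be loaded outside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_content', 'sketch_append_notice', 20 );
}
```

Because the callback runs after wpautop and wptexturize, the appended markup is not reformatted by them; a lower priority would send it through the whole default chain.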

If you’re brave enough and want the content more or less as it is stored in the database, use get_the_content, which does not apply the the_content filters (it still handles the <!--more--> tag and paged posts). Outside of the loop, where $post is often a local variable rather than the global one, you’ll also quite frequently see code reading the $post->post_content property directly.
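Outside the loop, that property access typically looks something like the sketch below. sketch_raw_content is a hypothetical helper; get_post is the real WordPress function and returns null when the post cannot be found.

```php
<?php
// Hypothetical helper: returns the unfiltered post_content for a post ID,
// or an empty string when the post does not exist. Assumes WordPress is loaded.
function sketch_raw_content( $post_id ) {
    $post = get_post( $post_id ); // real WordPress function
    return $post ? $post->post_content : '';
}
```

Shortcodes, bare URLs and plain line breaks all come back untouched this way, so escape or filter the result yourself before printing it.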