Remove Duplicate WordPress Category Pages

A rather peculiar problem came to my attention yesterday. WordPress serves a practically unlimited number of category URLs that all render exactly the same page.

This happens for all hierarchical taxonomies, and here is why: WordPress only queries the last term in the path and ignores every segment before it, so any prefix leads to the same archive.
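To make that concrete, here is a minimal plain-PHP sketch that runs without WordPress. The helper name last_term_in_path() and the sample slugs are hypothetical. It only mimics the "keep the last path segment" behavior described above and is not WordPress's own code.

```php
<?php
/** Hypothetical helper: keep only the last segment of a term path. */
function last_term_in_path( $path ) {
	return basename( trim( $path, '/' ) );
}

/** Three different paths that all end in the same term. */
$paths   = array( 'news', 'sports/news', 'anything/at/all/news' );
$queried = array_map( 'last_term_in_path', $paths );
```

All three paths reduce to the same queried term, which is why they end up showing the same archive.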

You might not like this at all, for one reason or another (mainly for duplicate content reasons), so here’s a snippet of code that forces a stricter taxonomy query for hierarchical objects. It returns a 404 on archive queries that do not strictly match all the terms.

add_action( 'parse_tax_query', function( $wp_query ) {
	$extra_queries = array();
	foreach ( get_taxonomies( array( 'hierarchical' => true ), 'objects' ) as $taxonomy ) {
		if ( empty( $taxonomy->rewrite['hierarchical'] ) )
			continue; /** Not a hierarchical rewrite (or rewrites are disabled) */
		if ( empty( $wp_query->query[$taxonomy->query_var] ) )
			continue; /** Not this query */
		$terms = explode( '/', $wp_query->query[$taxonomy->query_var] );
		array_pop( $terms ); /** Basename already exists in the tax_query */
		foreach ( $wp_query->tax_query->queries as $clause ) {
			if ( !is_array( $clause ) || !isset( $clause['taxonomy'] ) )
				continue; /** Skip entries that are not clauses, e.g. 'relation' */
			if ( $clause['taxonomy'] != $taxonomy->name )
				continue; /** Clause belongs to another taxonomy */
			foreach ( $terms as $term ) {
				$extra_query = $clause; /** Arrays are copied on assignment */
				$extra_query['terms'] = array( $term );
				$extra_queries[] = $extra_query;
			}
		}
	}
	/** Merge once, after the loop, so no clause is appended twice */
	$wp_query->tax_query->queries = array_merge( $wp_query->tax_query->queries, $extra_queries );
} );

Here’s how the code above works. We get the list of hierarchical taxonomies that have a hierarchical rewrite rule, find all the ancestor terms in the requested path for the taxonomy, and append each of them as an extra taxonomy query to the tax_query. This makes sure that only posts that are in all of the terms in the path are returned.
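The clause-building step can be tried on its own, outside WordPress. This is only a sketch under stated assumptions: the helper name ancestor_clauses(), the sample path, and the shape of the $base clause are made up for illustration and simply mirror what the snippet does with each matching tax_query entry.

```php
<?php
/**
 * Hypothetical helper: build one clause per ancestor segment of a term path.
 * $base stands in for the clause that already exists for the last term.
 */
function ancestor_clauses( $path, array $base ) {
	$terms = explode( '/', trim( $path, '/' ) );
	array_pop( $terms ); /** The last term is already covered by $base */
	$clauses = array();
	foreach ( $terms as $term ) {
		$clause = $base; /** Copy the clause, then swap in the ancestor slug */
		$clause['terms'] = array( $term );
		$clauses[] = $clause;
	}
	return $clauses;
}

/** Assumed clause shape, for illustration only. */
$base  = array( 'taxonomy' => 'category', 'field' => 'slug', 'terms' => array( 'news' ) );
$extra = ancestor_clauses( 'sports/local/news', $base );
```

For the sample path this yields two extra clauses, one for sports and one for local. A single-segment path yields none, so requests without a parent path are left untouched.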