Cleaning Up Bot Registrations in WooCommerce
…or Optimizing Slow Sub-Queries in WordPress
Bot registrations are a nuisance in many WooCommerce sites. Cleaning them up seems to be a trivial task: just delete all users without a placed order from a month ago and backwards.
select * from wp_users where user_registered < "2020-07-01 00:00:00"; 47665 rows in set (0.06 sec) select meta_value from wp_postmeta where meta_value = '_customer_user'; 51253 rows in set (0.44 sec)
Okay, so we almost 50 thousand customers and a bit over 50 thousand orders.
The query to delete all the users that have no order is seemingly a simple one:
delete from wp_users where user_registered < "2020-07-01 00:00:00" and id not in (select meta_value from wp_postmeta where meta_key = '_customer_user' group by meta_value);
Great. Yet there's a huge issue: Query OK, (59 min 7.22 sec)
Ooomph! This won't effing do!
Why the heck would it be so slow?
+------+--------------------+-------------+------+---------------+----------+---------+-------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+--------------------+-------------+------+---------------+----------+---------+-------+-------+----------+-------------+ | 1 | PRIMARY | wp_users | ALL | NULL | NULL | NULL | NULL | 53102 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | wp_postmeta | ref | meta_key | meta_key | 767 | const | 87152 | 100.00 | Using where | +------+--------------------+-------------+------+---------------+----------+---------+-------+-------+----------+-------------+
Obviously because the subquery is run every single time for each user. So 0.1 sec * 50k = 5 thousand seconds
pretty much. So no surprise why we get a query that is running for an hour or so.
What can we do? Caching.
Unfortunately built-in subquery caching does not work in the above case. DEPENDENT SUBQUERY
is reevaluated every single time.
This is because the query optimizer transforms a NOT IN
construct into a NOT EXISTS ... WHERE
construct. The "beauty" of antijoins.
show status where Variable_name like 'Subquery%'; +---------------------+-------+ | Variable_name | Value | +---------------------+-------+ | Subquery_cache_hit | 0 | | Subquery_cache_miss | 1927 | +---------------------+-------+
The query itself is pretty lightweight when selecting:
select id from wp_users where id not in (select id from wp_users left join wp_postmeta on meta_value=id where user_registered < "2020-07-01 00:00:00" and meta_key = '_customer_user' union select id from wp_users where user_registered >= "2020-07-01 00:00:00");
But to actually run a delete query you need to alias the subquery. Confusing.
delete from wp_users where id not in (select id from (select id from wp_users u left join wp_postmeta on meta_value=u.id where user_registered < "2020-07-01 00:00:00" and meta_key = '_customer_user' union select id from wp_users where user_registered >= "2020-07-01 00:00:00" union select user_id from wp_usermeta where meta_key = 'wp_capabilities' and meta_value like '%admin%' ) as u);
Done.
Query OK, 46140 rows affected (1.73 sec)
Bye.