Skip to content
Advertisement

Having trouble optimizing MySQL query with GROUP BY … HAVING

I’m trying to optimize quickly optimize the search functionality of some outdated forum software written in PHP. I’ve got my work down to a query that looks like this:

JavaScript

word1 and word2 are examples; there could be any number of words. The number at the very end of the query is the total number of words. The idea is that a thread most contain all words in the search query, spread out over any number of posts.

This query often exceeds 60 seconds with only two words, and times out. I’m stumped; I can’t figure out how to further optimize this horrid search engine.

As far as I can tell, everything is indexed properly, and I’ve run ANALYZE recently. Most of the database is running on InnoDB. Here’s the output of EXPLAIN:

JavaScript

Update

LIMIT 25 doesn’t seem to be helping much. It shaves off maybe second from a query that normally returns hundreds of results.

Clarification

The part that’s slowing down MySQL is the GROUP BY ... HAVING ... bit. With GROUP BY, the LIMIT is pretty much useless for improving performance. Without GROUP BY, and as long as the LIMIT remains, the queries are quite speedy.

SQL Info

Output of SHOW CREATE TABLE postindex;:

JavaScript

I didn’t make the table, so I have no idea why there’s a duplicate index on wordid; however, I’m not willing to delete it, since this is ancient, fickle software.

Advertisement

Answer

You can try several rewrites and compare execution plan and times.

Using 2 EXISTS subqueries (one for each word to be checked):

JavaScript

Using one EXISTS subquery:

JavaScript

A single query with many joins and GROUP BY only to remove the duplicate threadid:

JavaScript
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement