
Is there a way to force a MySQL select query to resolve the having clause before the where clause?

I have a somewhat complex MySQL SELECT query that joins multiple tables with several left outer joins, has some COUNTs and expressions in the HAVING clause, and has several expressions in the WHERE clause. It is run once each time a web page in my web app loads, and again via AJAX from that page every 15 seconds to automatically page through the data. Depending on which way the user is paging, values from the table on the web page (the first row for backward paging, the last row for forward paging) are sent in the AJAX call to load the previous/next page. The query is built with a PHP heredoc that uses variable substitution to set the values that control the WHERE and HAVING clauses, and it sorts the returned data first by the votes and then by the title and artist’s name, as follows:

WHERE ...
  AND ( ( `tracks`.`title`,  `artists`.`name` ) > ( ${title}, ${name} ) )
--   This line is not included in the initial query when the page is first created/loaded, but is
--   used in the AJAX call.

HAVING COUNT( `track_votes`.`id` )   <= ${votes}
   AND COUNT( `track_sales`.`id` )   <= ${sales}
   AND COUNT( `track_listens`.`id` ) <= ${listens}

ORDER BY
  -- Total_Songs_Votes, Track_Sales, Track_Votes, Track_Listens, Tracks_Title,     Artists_Name
  3 DESC,               5 DESC,      4 DESC,      6 DESC,        `tracks`.`title`, `artists`.`name`

The Select output columns are:

SELECT `artists`.`name`                AS artists_name,
       `tracks`.`title`                AS tracks_title,
       COUNT( `track_votes`.`id` ) +
         COUNT( `track_sales`.`id` ) +
         COUNT( `track_listens`.`id` ) AS 'total_song_votes', -- Order By column 3
       COUNT( `track_votes`.`id` )     AS 'track_votes',      -- Order By column 4
       COUNT( `track_sales`.`id` )     AS 'track_sales',      -- Order By column 5
       COUNT( `track_listens`.`id` )   AS 'track_listens'     -- Order By column 6

The problem is that, to control which rows the query returns, the counts above have to be filtered in a HAVING clause (because they are COUNTs), but HAVING is evaluated after the WHERE clause. So instead of ‘picking up’ after the last item by comparing total_song_votes, track_votes, track_sales, and track_listens first, and then the song title and artist’s name, the query first applies the WHERE clause, which can’t use the vote counts and so only has the title and artist’s name. That eliminates too many rows before the HAVING clause is applied.

How can I force the query to perform the HAVING clause filtering in the WHERE clause?

The output of the select query looks somewhat like the following:

<table border=1>
<tr>
<td bgcolor=silver class='medium'>artists_name</td>
<td bgcolor=silver class='medium'>tracks_title</td>
<td bgcolor=silver class='medium'>total_song_votes</td>
<td bgcolor=silver class='medium'>track_votes</td>
<td bgcolor=silver class='medium'>track_sales</td>
<td bgcolor=silver class='medium'>track_listens</td>
</tr>

<tr>
<td class='normal' valign='top'>DHF</td>
<td class='normal' valign='top'>T Song</td>
<td class='normal' valign='top'>4</td>
<td class='normal' valign='top'>2</td>
<td class='normal' valign='top'>2</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>DHF</td>
<td class='normal' valign='top'>H Song</td>
<td class='normal' valign='top'>2</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>DHF</td>
<td class='normal' valign='top'>A Song 2</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>DHF</td>
<td class='normal' valign='top'>A Song 1</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>DB</td>
<td class='normal' valign='top'>killer song</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>1</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>DB</td>
<td class='normal' valign='top'>Kills it</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>DB</td>
<td class='normal' valign='top'>scarry song</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>TB</td>
<td class='normal' valign='top'>Reggae All Day</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>TB</td>
<td class='normal' valign='top'>Reggae Just Today</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
</tr>

<tr>
<td class='normal' valign='top'>Howard's Band</td>
<td class='normal' valign='top'>test</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
<td class='normal' valign='top'>0</td>
</tr>
</table>

So if the first page only showed the first two rows, then the AJAX call to retrieve the next two songs would send ‘DHF’, ‘H Song’, 2, 1, 1, 0 back to the server and the WHERE and HAVING clauses would be changed to include the following:

  WHERE ...
    AND ( ( `tracks`.`title`,  `artists`.`name` ) > ( "H Song", "DHF" ) )

 HAVING COUNT( `track_votes`.`id` )   <= 1
    AND COUNT( `track_sales`.`id` )   <= 1
    AND COUNT( `track_listens`.`id` ) <= 0

So the DHF songs A Song 1 and A Song 2 would both be incorrectly eliminated.

The goal here is to retrieve the next two songs* after the last one shown on the web page, using the voting values, the song title, and the artist’s name, which need to be compared in that order in the WHERE clause. As I said above, because the WHERE clause runs first and only considers the song title and artist’s name, songs that have fewer votes but titles that sort earlier are incorrectly eliminated.

* I’m using only two rows here because it makes the example data easier to show. The real page displays 10 items at a time, but that doesn’t affect anything except where the data breaks between pages.

Please note that I expect the query to return a large number of items, so I’d like the query itself to remove as many rows as possible before I remove the remaining rows with PHP logic. Also note that the web page automatically re-queries the database by making AJAX calls every 15 seconds, and that the same page could be displayed to multiple users who are not necessarily all seeing the same data, as some could be paging backwards through the data or playing a song and no longer paging at the moment. So I want this query to be very efficient and not re-query the whole database each time it is run, but rather pick up after the last results for each user, independently.

FOLLOW-UP

Astax suggested that I enclose my query in a SELECT * FROM (…) query and then do the filtering there.

I left the HAVING clause in the inner SELECT so that it performs the voting filtering first and doesn’t pass any more data to the outer SELECT * FROM … than it has to. I really don’t want to have to wade through the entire dataset more than necessary.

Actually, I moved the track title and artist’s name to the outer SELECT’s WHERE clause, but then found that it still caused too many rows to be eliminated:

WHERE ( ( `tracks`.`title`,  `artists`.`name` ) > ( ${title}, ${name} ) )

So I changed this expression to:

WHERE ( ( `tracks`.`title`,  `artists`.`name` ) not in
            ( ( ${title0}, ${name0} ), ... ( ${title9}, ${name9} ) ) )

Where the title0, name0 through title9, name9 variables are substituted with the ten pairs of titles and names shown on the page being paged out, before the SQL query is submitted to the database engine.

The problem, however, is that if the vote counts are the same across more than ten songs (one web page of songs), then the songs from previous pages will show again and paging will stop working.

This issue makes using the votes, song titles, and artist’s name insufficient as a real-world solution. I need something else that I can use to track my page breaks that still allows changes made in the server’s database to be seen without a lot of overhead in terms of temporary tables that either live on the user’s device or on the server.

Any ideas on how I can manage paging and still support dynamic data as I described?

Thanks


Answer

The main reason why HAVING exists is to apply some filters after WHERE conditions and all aggregations. So the answer is no, you can’t make it run before WHERE.
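To make that ordering concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The two-table schema, the sample rows, and the `run` helper are assumptions made for illustration; the question's real query joins more tables and counts three things. The same grouped query is run with and without the title condition in WHERE.

```python
import sqlite3

# Hypothetical, simplified schema: a single vote table stands in for the
# question's track_votes / track_sales / track_listens tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE track_votes (id INTEGER PRIMARY KEY, track_id INTEGER);
    INSERT INTO tracks (id, title)
        VALUES (1, 'T Song'), (2, 'H Song'), (3, 'A Song 1');
    INSERT INTO track_votes (track_id) VALUES (1), (1), (2);
""")


def run(where_clause):
    """Run the grouped query with an optional WHERE clause spliced in."""
    sql = f"""
        SELECT tracks.title, COUNT(track_votes.id) AS votes
        FROM tracks
        LEFT JOIN track_votes ON track_votes.track_id = tracks.id
        {where_clause}                       -- filters joined rows before grouping
        GROUP BY tracks.id
        HAVING COUNT(track_votes.id) <= 1    -- filters groups after counting
        ORDER BY votes DESC, tracks.title
    """
    return conn.execute(sql).fetchall()


# WHERE discards 'A Song 1' before any counting happens, and HAVING then
# discards 'T Song' (2 votes), so nothing is left for the next page.
with_where = run("WHERE tracks.title > 'H Song'")
# Without the title condition, HAVING alone keeps the low-vote tracks.
without_where = run("")

print(with_where)
print(without_where)
```

In this sketch the title condition removes 'A Song 1' before its votes are ever counted, which is the behaviour the question describes.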

However, you may:

  • Use temporary tables. Put the result of one SELECT into a temporary table, add more data from other SELECTs, and then run a final query.
  • Use nested selects, something like SELECT ... FROM (SELECT .... WHERE ... HAVING ....) as t WHERE ... HAVING ...
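The temporary-table option could look roughly like the following. This is only a sketch: it uses Python's sqlite3 module in place of MySQL, and the simplified schema and the `track_totals` table name are invented for the example. The counts are materialized once, after which they are ordinary columns that a plain WHERE can filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema with one vote table.
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE track_votes (id INTEGER PRIMARY KEY, track_id INTEGER);
    INSERT INTO tracks (id, title)
        VALUES (1, 'T Song'), (2, 'H Song'), (3, 'A Song 1');
    INSERT INTO track_votes (track_id) VALUES (1), (1), (2);

    -- Step 1: store the per-track counts in a temporary table.
    CREATE TEMPORARY TABLE track_totals AS
        SELECT tracks.id AS track_id,
               tracks.title AS title,
               COUNT(track_votes.id) AS votes
        FROM tracks
        LEFT JOIN track_votes ON track_votes.track_id = tracks.id
        GROUP BY tracks.id;
""")

# Step 2: the count is now a regular column, so WHERE can filter on it.
rows = conn.execute(
    "SELECT title, votes FROM track_totals"
    " WHERE votes <= ? ORDER BY votes DESC, title",
    (1,),
).fetchall()
print(rows)
```

One trade-off to keep in mind: the temporary table is a snapshot, so it would have to be rebuilt for new votes to show up.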

Update:

You don’t need to use conditions in order to implement pagination. Use the LIMIT clause, which is what it’s made for.

SELECT `artists`.`name`                AS artists_name,
       `tracks`.`title`                AS tracks_title,
       COUNT( `track_votes`.`id` ) +
         COUNT( `track_sales`.`id` ) +
         COUNT( `track_listens`.`id` ) AS 'total_song_votes', -- Order By column 3
       COUNT( `track_votes`.`id` )     AS 'track_votes',      -- Order By column 4
       COUNT( `track_sales`.`id` )     AS 'track_sales',      -- Order By column 5
       COUNT( `track_listens`.`id` )   AS 'track_listens'     -- Order By column 6
LIMIT {$page_size} OFFSET {$page_start}
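As a rough illustration of LIMIT/OFFSET paging, the sketch below runs against an in-memory SQLite database through Python's sqlite3 module. The simplified schema, the sample data, and the `fetch_page` helper are assumptions for the example, not the asker's code. The ORDER BY ends in the track id so that rows with equal vote counts and titles still come back in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema with one vote table.
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE track_votes (id INTEGER PRIMARY KEY, track_id INTEGER);
    INSERT INTO tracks (id, title) VALUES
        (1, 'T Song'), (2, 'H Song'), (3, 'A Song 1'),
        (4, 'A Song 2'), (5, 'Kills it');
    INSERT INTO track_votes (track_id) VALUES (1), (1), (2);
""")

PAGE_QUERY = """
    SELECT tracks.title, COUNT(track_votes.id) AS votes
    FROM tracks
    LEFT JOIN track_votes ON track_votes.track_id = tracks.id
    GROUP BY tracks.id
    ORDER BY votes DESC, tracks.title, tracks.id   -- id makes the order total
    LIMIT ? OFFSET ?
"""


def fetch_page(page_number, page_size=2):
    """Return one page of rows; page_number starts at 0."""
    offset = page_number * page_size
    return conn.execute(PAGE_QUERY, (page_size, offset)).fetchall()


pages = [fetch_page(n) for n in range(3)]
for page in pages:
    print(page)
```

Because the offset counts positions rather than remembering a row, a vote that arrives between two requests can shift rows across a page boundary, which may or may not matter for a page that refreshes every 15 seconds.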

Or, if you have some strong reasons to use conditions, you can use a nested query:

SELECT * FROM 
   (SELECT `artists`.`name`                AS artists_name,
       `tracks`.`title`                AS tracks_title,
       COUNT( `track_votes`.`id` ) +
         COUNT( `track_sales`.`id` ) +
         COUNT( `track_listens`.`id` ) AS 'total_song_votes', -- Order By column 3
       COUNT( `track_votes`.`id` )     AS 'track_votes',      -- Order By column 4
       COUNT( `track_sales`.`id` )     AS 'track_sales',      -- Order By column 5
       COUNT( `track_listens`.`id` )   AS 'track_listens'     -- Order By column 6
   ) AS t
WHERE (t.tracks_title, t.artists_name, t.total_song_votes...) > ({$title}, {$name}, {$votes}...)
LIMIT {$page_size}

You may still add a WHERE to the inner query for non-grouped fields to process less data. Just use non-strict conditions (>= or <= instead of > and <). Further filtering will be done in the outer select.

Either way, check the output of EXPLAIN for your query. Very likely you’re not saving anything by using conditions instead of LIMIT.
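One way the nested-query idea might be fleshed out is sketched below, again with Python's sqlite3 module standing in for MySQL. Several things here are assumptions rather than part of the original answer: the simplified schema, the `fetch_after` helper, the use of the track id as a final tiebreaker, and the expanded form of the condition. Because votes sort descending while titles sort ascending, a single tuple comparison with `>` would not match the sort order, so the vote comparison is written out separately. The unique id keeps paging moving even when more than a page of songs share the same vote count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema with one vote table.
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE track_votes (id INTEGER PRIMARY KEY, track_id INTEGER);
    INSERT INTO tracks (id, title) VALUES
        (1, 'T Song'), (2, 'H Song'), (3, 'A Song 1'),
        (4, 'A Song 2'), (5, 'Kills it');
    INSERT INTO track_votes (track_id) VALUES (1), (1), (2);
""")

# Inner query: aggregate once, so the counts become plain columns of t.
TOTALS = """
    SELECT tracks.id AS track_id,
           tracks.title AS title,
           COUNT(track_votes.id) AS votes
    FROM tracks
    LEFT JOIN track_votes ON track_votes.track_id = tracks.id
    GROUP BY tracks.id
"""


def fetch_after(cursor=None, page_size=2):
    """cursor is the (votes, title, track_id) of the last row shown, or None."""
    where = ""
    params = {"page_size": page_size}
    if cursor is not None:
        # Fewer votes come later; on a tie, later (title, id) comes later.
        where = """
            WHERE t.votes < :votes
               OR (t.votes = :votes
                   AND (t.title, t.track_id) > (:title, :track_id))
        """
        params.update(votes=cursor[0], title=cursor[1], track_id=cursor[2])
    sql = f"""
        SELECT t.votes, t.title, t.track_id
        FROM ({TOTALS}) AS t
        {where}
        ORDER BY t.votes DESC, t.title, t.track_id
        LIMIT :page_size
    """
    return conn.execute(sql, params).fetchall()


first = fetch_after()
second = fetch_after(first[-1])    # pick up after the last row of page one
third = fetch_after(second[-1])
print(first, second, third, sep="\n")
```

In this sketch the client only has to send back the last row it displayed, so each user can page independently without any per-user state on the server.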

User contributions licensed under: CC BY-SA