I have revisited this problem many times, and I have never really found a proper answer.
Is it possible to perform a MySQL search which returns ACTUAL accurately sorted results by relevancy?
I am trying to create an ajax search form which makes suggestions as the user types into an input field, and have found no decent solution to this using only pure MySQL queries. I know there are search servers available such as ElasticSearch, I want to know how to do it with a raw MySQL query only.
I have a table of school subjects. There are less than 1200 rows and this will never change. Let’s perform a basic FULLTEXT search where the user starts typing “Bio”.
Query (“Bio…”) – FULLTEXT BOOLEAN MODE
SELECT name, MATCH(name) AGAINST('bio*' IN BOOLEAN MODE) AS relevance FROM subjects WHERE MATCH(name) AGAINST('bio*' IN BOOLEAN MODE) ORDER BY relevance DESC LIMIT 10
Results
name | relevance -------------------------------------------------------- Biomechanics, Biomaterials and Prosthetics | 1 Applied Biology | 1 Behavioural Biology | 1 Cell Biology | 1 Applied Cell Biology | 1 Developmental/Reproductive Biology | 1 Developmental Biology | 1 Reproductive Biology | 1 Environmental Biology | 1 Marine/Freshwater Biology | 1
To show how bad these results are, here is a comparison with a simple LIKE
query which shows all the more relevant results which weren’t shown:
Query (“Bio…”) – LIKE
SELECT id, name WHERE name LIKE 'bio%' ORDER BY name
Results
name | relevance -------------------------------------------------------- Bio-organic Chemistry | 1 Biochemical Engineering | 1 Biodiversity | 1 Bioengineering | 1 Biogeography | 1 Biological Chemistry | 1 Biological Sciences | 1 Biology | 1 Biomechanics, Biomaterials and Prosthetics | 1 Biometry | 1
And already you see how many subjects are not suggested, even though these are more likely what the user will be looking for.
The problem with using LIKE
however, is how to search across multiple words and in the middle of words like FULLTEXT
does.
The basic ordering I would want to implement is something like:
- First words starting with the search term
- Second words starting with the search term
- Words where the term is not at the start of the words
- Everything generally alphabetical if not further relevant
So my question is, how does one go about getting a sensibly sorted list of suggestions for the user with a MySQL search across multiple words?
Advertisement
Answer
You could use string functions, such as:
select id, name from subjects where name like concat('%', @search, '%') order by name like concat(@search, '%') desc, ifnull(nullif(instr(name, concat(' ', @search)), 0), 99999), ifnull(nullif(instr(name, @search), 0), 99999), name;
This gets you all entries containing @search. First those that have it at the beginning, then those that have it after a blank, then by the position of the occurrence, then alphabetical.
name like concat(@search, '%') desc
uses MySQL’s boolean logic by the way. 1 = true, 0 = false, so ordering this descending gives you true first.
SQL fiddle: http://sqlfiddle.com/#!9/c6321a/1