I am coding an app for a friend, and SQL is not my strong suit. I thought that I had laid this matter to rest with my previous question, which received an excellent answer.
However, my friend has moved the goal posts yet again (and swears that it is final this time).
Given these tables
mysql> describe skill_names; +------------+----------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+----------+------+-----+---------+----------------+ | skill_id | int(11) | NO | PRI | NULL | auto_increment | | skill_name | char(32) | NO | MUL | NULL | | +------------+----------+------+-----+---------+----------------+ mysql> describe skill_usage; +----------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+---------+------+-----+---------+-------+ | skill_id | int(11) | NO | MUL | NULL | | | job_id | int(11) | NO | MUL | NULL | | +----------+---------+------+-----+---------+-------+ mysql> describe jobs; +--------------+---------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+---------+------+-----+---------+----------------+ | job_id | int(11) | NO | PRI | NULL | auto_increment | | candidate_id | int(11) | NO | MUL | NULL | | | company_id | int(11) | NO | MUL | NULL | | | start_date | date | NO | MUL | NULL | | | end_date | date | NO | MUL | NULL | | +--------------+---------+------+-----+---------+----------------+ 5 rows in set (0.00 sec)
The user inputs a text for skill search.
Some examples might be
- C
- C or C++
- C++ AND UML
- (C AND kernel) OR (C++ AND UML)
And I want to parse that in PHP and generate an appropriate SQL query. Something along the lines of the answer to my previous question, but it will need to be tweaked.
Previously, I had believed that if the user input two skills and ANDed them e.g C++ and UML
, then I should return details of any job where BOTH skills were used.
Now, he says that he just wants candidates who have used both of those skills (hence the AND), but not necessarily on the same job. To clarify, my PHP API will return an array of candidates, each entry of which will have an array of jobs.
I want to tweak the code in the answer to my previous question to achieve this. I wondered if I would probably end up doing the whole thing in PHP, with a bunch of nested FOR loops, rather than offloading it onto the SQL engine.
Update
I can’t find a good fiddle site that allows both PHP and MySQL.
If anyone wants to test their own code on localhost, the main parts of the test database look like this:
Test search 3.1.5 (Python AND UML) OR (C++ OR UML)
Candidate name Company Job year Skills Overall match One Thales 2015 C No One BAe 2016 Python No One Google 2017 C++ No Two BAe 2015 C++ Yes Two Google 2020 Python Yes Two Google 2011 C++, UML Yes Three Thales 2019 Python, UML Yes
and here is an SQL script to create the test database:
CREATE TABLE `candidates` ( `candidate_id` INT(11) NOT NULL AUTO_INCREMENT, `candidate_name` CHAR(50), `candidate_city` CHAR(50), `latitude` DECIMAL(11,8), `longitude` DECIMAL(11,8), PRIMARY KEY (candidate_id)); CREATE TABLE `companies` ( `company_id` INT(11) NOT NULL AUTO_INCREMENT, `company_name` CHAR(50) NOT NULL, `company_city` CHAR(50) NOT NULL, `company_post_code` CHAR(50) NOT NULL, `latitude` DECIMAL(11,8) NOT NULL, `longitude` DECIMAL(11,8) NOT NULL, PRIMARY KEY (company_id)); CREATE TABLE `jobs` ( `job_id` INT(11) NOT NULL AUTO_INCREMENT, `candidate_id` INT(11) NOT NULL, `company_id` INT(11) NOT NULL, `start_date` DATE NOT NULL, `end_date` DATE NOT NULL, PRIMARY KEY (job_id)); CREATE TABLE `skill_names` ( `skill_id` INT(11) NOT NULL AUTO_INCREMENT, `skill_name` CHAR(32) NOT NULL, PRIMARY KEY (skill_id)); CREATE TABLE `skill_usage` ( `skill_id` INT(11) NOT NULL, `job_id` INT(11) NOT NULL); INSERT INTO `skill_names` (skill_name) VALUES("C"); INSERT INTO `skill_names` (skill_name) VALUES("Python"); INSERT INTO `skill_names` (skill_name) VALUES("C++"); INSERT INTO `skill_names` (skill_name) VALUES("UML"); INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("One", "Hastings", 50.8543, 0.5735); INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Two", "Slough", 51.5105, 0.5950); INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Three", "Stonehenge", 51.1789, -1.8262); INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Thales", "Crawley", "AB1 1CD", 51.1091, -0.1872); INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("BAe", "Rochester", "EF1 2GH", 51.3880, 0.5067); INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Google", "East Ham", "E6 0XX", 51.5334, 0.0499); INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 1, "2015-01-010", "2015-12-31"); INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 2, "2016-01-010", "2016-12-31"); INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 3, "2017-01-010", "2017-12-31"); INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 2, "2015-01-010", "2015-12-31"); INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2020-01-010", "2020-12-31"); INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2011-01-010", "2011-12-31"); INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(3, 1, "2019-01-010", "2019-12-31"); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(1, 1); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(2, 2); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(3, 3); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(4, 3); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(5, 2); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 3); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 4); INSERT INTO `skill_usage` (job_id, skill_id) VALUES(7, 2); INSERT INTO `skill_usage` (job_id, skill_id) VALUES (7, 4);
Advertisement
Answer
Example of working query in DB fiddle: https://www.db-fiddle.com/f/rQKazPgbtGS766WEiuiXyR/0
$string = '(C AND kernel) OR (C++ AND UML)'; //Adding spaces between parentheses to accept inputs without that space as well $string = str_replace('(', ' ( ', $string); $string = str_replace(')', ' ) ', $string); //Splitting input into separate strings, taken from previous question $tokens = preg_split('/[s]+/', $string); $query = ''; $args = []; $n = 0; //Transcripting all strings separately and each skill name into FIND_IN_SET function foreach ($tokens as $tok) { switch ($tok) { case '': # skip empty tokens case ';': # No, you should not! case '"': case "'": case ';': break; case '(': $query .= '('; break; case ')': $query .= ')'; break; case '&': case 'AND': $query .= ' AND '; break; case '|': case 'OR': $query .= ' OR '; break; case '!': case 'NOT': $query .= ' NOT '; break; default: $arg = 'arg_' . $n; $args[$arg] = $tok; $query .= "FIND_IN_SET(':{$arg}', skills)"; $n++; break; } } //Query grouping all of the skills used by candidates returning candidates //with grouped column skills with HAVING clause containing all of the //conditions they must match. So query returns only candidates who have ever //used desired skills. $sql = "SELECT candidate_id, GROUP_CONCAT(DISTINCT skill_names.skill_name) as skills FROM jobs LEFT JOIN skill_usage ON skill_usage.job_id = jobs.job_id LEFT JOIN skill_names ON skill_names.skill_id = skill_usage.skill_id GROUP BY candidate_id HAVING {$query} ORDER BY candidate_id DESC";
Example output for (C AND kernel) OR (C++ AND UML)
:
SELECT candidate_id, GROUP_CONCAT(DISTINCT skill_names.skill_name) as skills FROM jobs LEFT JOIN skill_usage ON skill_usage.job_id = jobs.job_id LEFT JOIN skill_names ON skill_names.skill_id = skill_usage.skill_id GROUP BY candidate_id HAVING (FIND_IN_SET('C', skills) AND FIND_IN_SET('kernel', skills)) OR (FIND_IN_SET('C++', skills) AND FIND_IN_SET('UML', skills)) ORDER BY candidate_id DESC
Array of arguments:
Array ( [arg_0] => C [arg_1] => kernel [arg_2] => C++ [arg_3] => UML )