Skip to content
Advertisement

Parsing a boolean expression into a MySQL query in PHP – part 2

I am coding an app for a friend, and SQL is not my strong suit. I thought that I had laid this matter to rest with my previous question, which received an excellent answer.

However, my friend has moved the goal posts yet again (and swears that it is final this time).

Given these tables

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| job_id       | int(11) | NO   | PRI | NULL    | auto_increment |
| candidate_id | int(11) | NO   | MUL | NULL    |                |
| company_id   | int(11) | NO   | MUL | NULL    |                |
| start_date   | date    | NO   | MUL | NULL    |                |
| end_date     | date    | NO   | MUL | NULL    |                |
+--------------+---------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

The user inputs a text for skill search.

Some examples might be

  • C
  • C or C++
  • C++ AND UML
  • (C AND kernel) OR (C++ AND UML)

And I want to parse that in PHP and generate an appropriate SQL query. Something along the lines of the answer to my previous question, but it will need to be tweaked.

Previously, I had believed that if the user input two skills and ANDed them e.g C++ and UML, then I should return details of any job where BOTH skills were used.

Now, he says that he just wants candidates who have used both of those skills (hence the AND), but not necessarily on the same job. To clarify, my PHP API will return an array of candidates, each entry of which will have an array of jobs.

I want to tweak the code in the answer to my previous question to achieve this. I wondered if I would probably end up doing the whole thing in PHP, with a bunch of nested FOR loops, rather than offloading it onto the SQL engine.

Update

I can’t find a good fiddle site that allows both PHP and MySQL.

If anyone wants to test their own code on localhost, the main parts of the test database look like this:

Test search 3.1.5 (Python AND UML) OR (C++ OR UML)

Candidate name  Company Job year    Skills     Overall match
One             Thales      2015    C           No
One             BAe         2016    Python      No
One             Google      2017    C++         No
Two             BAe         2015    C++         Yes
Two             Google      2020    Python      Yes
Two             Google      2011    C++, UML    Yes
Three           Thales      2019    Python, UML Yes

and here is an SQL script to create the test database:

CREATE TABLE `candidates` (
  `candidate_id` INT(11) NOT NULL AUTO_INCREMENT,
  `candidate_name` CHAR(50),
  `candidate_city` CHAR(50),
  `latitude` DECIMAL(11,8),
  `longitude` DECIMAL(11,8),
     PRIMARY KEY (candidate_id));

CREATE TABLE `companies` (
  `company_id` INT(11) NOT NULL AUTO_INCREMENT,
  `company_name` CHAR(50) NOT NULL,
  `company_city` CHAR(50) NOT NULL,
  `company_post_code` CHAR(50) NOT NULL,
  `latitude` DECIMAL(11,8) NOT NULL,
  `longitude` DECIMAL(11,8) NOT NULL,
     PRIMARY KEY (company_id));

CREATE TABLE `jobs` (
  `job_id` INT(11) NOT NULL AUTO_INCREMENT,
  `candidate_id` INT(11) NOT NULL,
  `company_id` INT(11) NOT NULL,
  `start_date` DATE NOT NULL,
  `end_date` DATE NOT NULL,
     PRIMARY KEY (job_id));


CREATE TABLE `skill_names` (
  `skill_id` INT(11) NOT NULL AUTO_INCREMENT,
  `skill_name` CHAR(32) NOT NULL,
     PRIMARY KEY (skill_id));


CREATE TABLE `skill_usage` (
  `skill_id` INT(11) NOT NULL,
  `job_id` INT(11) NOT NULL);

INSERT INTO `skill_names` (skill_name) VALUES("C");
INSERT INTO `skill_names` (skill_name) VALUES("Python");
INSERT INTO `skill_names` (skill_name) VALUES("C++");
INSERT INTO `skill_names` (skill_name) VALUES("UML");

INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("One",   "Hastings",   50.8543, 0.5735);
INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Two",   "Slough",  51.5105, 0.5950);
INSERT INTO `candidates` (candidate_name, candidate_city, latitude, longitude ) VALUES("Three", "Stonehenge", 51.1789, -1.8262);

INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Thales", "Crawley",   "AB1 1CD", 51.1091, -0.1872);
INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("BAe",    "Rochester", "EF1 2GH", 51.3880, 0.5067);
INSERT INTO `companies` (company_name, company_city, company_post_code, latitude, longitude ) VALUES("Google", "East Ham",  "E6  0XX", 51.5334, 0.0499);

INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 1, "2015-01-010", "2015-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 2, "2016-01-010", "2016-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(1, 3, "2017-01-010", "2017-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 2, "2015-01-010", "2015-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2020-01-010", "2020-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(2, 3, "2011-01-010", "2011-12-31");
INSERT INTO `jobs` (candidate_id, company_id, start_date, end_date ) VALUES(3, 1, "2019-01-010", "2019-12-31");

INSERT INTO `skill_usage` (job_id, skill_id) VALUES(1, 1);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(2, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(3, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(4, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(5, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 3);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(6, 4);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES(7, 2);
INSERT INTO `skill_usage` (job_id, skill_id) VALUES (7, 4);

Advertisement

Answer

Example of working query in DB fiddle: https://www.db-fiddle.com/f/rQKazPgbtGS766WEiuiXyR/0

$string = '(C AND kernel) OR (C++ AND UML)';

//Adding spaces between parentheses to accept inputs without that space as well
$string = str_replace('(', ' ( ', $string);
$string = str_replace(')', ' ) ', $string);

//Splitting input into separate strings, taken from previous question
$tokens = preg_split('/[s]+/', $string);

$query = '';

$args = [];
$n = 0;
//Transcripting all strings separately and each skill name into FIND_IN_SET function
foreach ($tokens as $tok) {
    switch ($tok) {
        case '':  # skip empty tokens
        case ';':  # No, you should not!
        case '"':
        case "'":
        case ';':
            break;
        case '(':
            $query .= '(';
            break;
        case ')':
            $query .= ')';
            break;
        case '&':
        case 'AND':
            $query .= ' AND ';
            break;
        case '|':
        case 'OR':
            $query .= ' OR ';
            break;
        case '!':
        case 'NOT':
            $query .= ' NOT ';
            break;
        default:
            $arg = 'arg_' . $n;
            $args[$arg] = $tok;
            $query .= "FIND_IN_SET(':{$arg}', skills)";
            $n++;
            break;
    }
}

//Query grouping all of the skills used by candidates returning candidates
//with grouped column skills with HAVING clause containing all of the
//conditions they must match. So query returns only candidates who have ever
//used desired skills.
$sql = "SELECT candidate_id, GROUP_CONCAT(DISTINCT skill_names.skill_name) as skills
FROM jobs
LEFT JOIN skill_usage ON skill_usage.job_id = jobs.job_id
LEFT JOIN skill_names ON skill_names.skill_id = skill_usage.skill_id
GROUP BY candidate_id
HAVING {$query}
ORDER BY candidate_id DESC";

Example output for (C AND kernel) OR (C++ AND UML):

SELECT candidate_id, GROUP_CONCAT(DISTINCT skill_names.skill_name) as skills
FROM jobs
    LEFT JOIN skill_usage ON skill_usage.job_id = jobs.job_id
    LEFT JOIN skill_names ON skill_names.skill_id = skill_usage.skill_id
GROUP BY candidate_id
HAVING (FIND_IN_SET('C', skills) AND FIND_IN_SET('kernel', skills)) OR (FIND_IN_SET('C++', skills) AND FIND_IN_SET('UML', skills))
ORDER BY candidate_id DESC

Array of arguments:

Array
(
    [arg_0] => C
    [arg_1] => kernel
    [arg_2] => C++
    [arg_3] => UML
)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement