These are the only two tables that are germane. No need to bother you with the others.
mysql> describe skill_usage; +----------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+---------+------+-----+---------+-------+ | skill_id | int(11) | NO | MUL | NULL | | | job_id | int(11) | NO | MUL | NULL | | +----------+---------+------+-----+---------+-------+ mysql> describe skill_names; +------------+----------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+----------+------+-----+---------+----------------+ | skill_id | int(11) | NO | PRI | NULL | auto_increment | | skill_name | char(32) | NO | MUL | NULL | | +------------+----------+------+-----+---------+----------------+
Basically, users enter a boolean search string, using skill names.
I will convert the skill snames to skill_id
, and then want to generate a MySql query to get all matching job_id
from table skill_usage
, by parsing the user’s search string.
Strings can contain skill name, the operators AND and OR, plus brackets for precedence.
Some examples might be
- C
- C or C++
- C++ AND UML
- (C AND kernel) OR (C++ AND UML)
But there is no limit to the complexity of expression allowed – and that’s my problem.
I’m no SQL guru, so correct me if I am wrong. I think that I want to start SELECT job_id FROM skill_usage
then parse, and build up the rest of the query.
For the first example, just skill name C, I want to add WHERE skillId = X
, where X is gotten from the table skill_names
.
For a simple OR
, like C OR C++
, I can use an IN
clause – WHERE skillId IN (X, Y)
(again, X & Y are lookups of the skill names to get a skill_id
).
For a simple AND
, like C++ AND UML
, I reckon I need an INNER JOIN
, something like WHERE skill_id = X INNER JOIN skill_usage ON skill_usage.skill_id = Y
(where X is the skill_id
for C++ and Y for UML).
I think that is roughly correct, for those simple cases (?).
But, when I get to even slightly more complex cases like (C AND kernel) OR (C++ AND UML)
, I get confused.
Would a regex or an algorithm be appropriate here?
@AnthonyVallée-Dubois answer to this question looks like I might be able to modify it, but it also seems very complex. I am hoping to make something simpler, but am unsure how to start (the PHP coding is not my problem, just the regex or algorithm).
Update
I am trying to separate the parsing from the queries, and am using this question to sort out the queries.
I am getting answers like
SELECT job_id FROM skill_usage WHERE skill_id IN (3, 4) GROUP BY job_id HAVING MIN(skill_id) <> MAX(skill_id);
and
select s1.job_id from skill_usage s1 where s1.skill_id = 3 and s1.job_id in ( select s2.job_id from skill_usage s2 where s2.skill_id = 4 )
Where the latter looks more extensible.
Whereas my pseudo-code for the PHP to convert a search string to an SQL query is roughly
fail if mis-matched brackets reduce multiple spaces to single removes spaces before and after closing/opening bracket "( " & " )" foreach c in string if c == ( else if c === ) else if AND else if OE else # it's a skill name
Advertisement
Answer
Simple query generator, assuming PDO
## for simple tokenisation, the terms are separated by space here. ## ############################################################### $string = "( C AND kernel ) OR ( C++ AND UML )"; function emit_term( $tag ) { $res = " EXISTS ( SELECT * FROM skill_usage su JOIN skill_names sn ON sn.skill_id = su.skill_id WHERE su.Job_id = j.job_id AND sn.skillname = :" . $tag . ")n"; return $res; } $fixed_part =" SELECT job_id, job_name FROM jobs j WHERE 1=1 AND n" ; # $tokens = explode( ' ' , $string ); #splits on any single space $tokens = preg_split( '/[s]+/' , $string ); # accepts multiple whitespace # print_r ( $tokens ); $query = $fixed_part; $args = array(); $num = 1; foreach ( $tokens as $tok ) { switch ($tok) { case '': # skip empty tokens case ';': # No, you should not! case '"': case "'": case ';': break; case '(': $query .= '('; break; case ')': $query .= ')'; break; case '&': case 'AND': $query .= ' AND '; break; case '|': case 'OR': $query .= ' OR '; break; case '!': case 'NOT': $query .= ' NOT '; break; default: $tag = '_q' . $num ; $query .= emit_term ( $tag ); $args[$tag] = $tok; $num += 1; break; } } $query .= ";nn"; echo "Query + parameters (for PDO):n" ; echo $query; print_r ( $args) ;
Output:
SELECT job_id, job_name FROM jobs j WHERE 1=1 AND ( EXISTS ( SELECT * FROM skill_usage su JOIN skill_names sn ON sn.skill_id = su.skill_id WHERE su.Job_id = j.job_id AND sn.skillname = :_q1) AND EXISTS ( SELECT * FROM skill_usage su JOIN skill_names sn ON sn.skill_id = su.skill_id WHERE su.Job_id = j.job_id AND sn.skillname = :_q2) ) OR ( EXISTS ( SELECT * FROM skill_usage su JOIN skill_names sn ON sn.skill_id = su.skill_id WHERE su.Job_id = j.job_id AND sn.skillname = :_q3) AND EXISTS ( SELECT * FROM skill_usage su JOIN skill_names sn ON sn.skill_id = su.skill_id WHERE su.Job_id = j.job_id AND sn.skillname = :_q4) ); Array ( [_q1] => C [_q2] => kernel [_q3] => C++ [_q4] => UML )