Skip to content
Advertisement

Parsing a boolean expression into a MySql query in PHP

These are the only two tables that are germane. No need to bother you with the others.

mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

Basically, users enter a boolean search string, using skill names.

I will convert the skill snames to skill_id, and then want to generate a MySql query to get all matching job_id from table skill_usage, by parsing the user’s search string.

Strings can contain skill name, the operators AND and OR, plus brackets for precedence.

Some examples might be

  • C
  • C or C++
  • C++ AND UML
  • (C AND kernel) OR (C++ AND UML)

But there is no limit to the complexity of expression allowed – and that’s my problem.

I’m no SQL guru, so correct me if I am wrong. I think that I want to start SELECT job_id FROM skill_usage then parse, and build up the rest of the query.

For the first example, just skill name C, I want to add WHERE skillId = X, where X is gotten from the table skill_names.

For a simple OR, like C OR C++, I can use an IN clause – WHERE skillId IN (X, Y) (again, X & Y are lookups of the skill names to get a skill_id).

For a simple AND, like C++ AND UML, I reckon I need an INNER JOIN, something like WHERE skill_id = X INNER JOIN skill_usage ON skill_usage.skill_id = Y (where X is the skill_id for C++ and Y for UML).

I think that is roughly correct, for those simple cases (?).

But, when I get to even slightly more complex cases like (C AND kernel) OR (C++ AND UML), I get confused.

Would a regex or an algorithm be appropriate here?

@AnthonyVallée-Dubois answer to this question looks like I might be able to modify it, but it also seems very complex. I am hoping to make something simpler, but am unsure how to start (the PHP coding is not my problem, just the regex or algorithm).

Update

I am trying to separate the parsing from the queries, and am using this question to sort out the queries.

I am getting answers like

SELECT job_id
FROM skill_usage
WHERE skill_id IN (3, 4)
GROUP BY job_id
HAVING MIN(skill_id) <> MAX(skill_id);

and

select s1.job_id
  from skill_usage s1
  where s1.skill_id = 3
    and s1.job_id in (
                       select s2.job_id
                         from skill_usage s2
                        where s2.skill_id = 4
                     )

Where the latter looks more extensible.

Whereas my pseudo-code for the PHP to convert a search string to an SQL query is roughly

fail if mis-matched brackets

reduce multiple spaces to single
removes spaces before and after closing/opening bracket  "( " & " )"

foreach c in string

   if c == (
   
   else
      if c === )
      
      else
         if AND
         
         else
           if OE
           
           else
              # it's a skill name

Advertisement

Answer

Simple query generator, assuming PDO


        ## for simple tokenisation, the terms are separated by space here.
        ## ###############################################################
$string = "( C AND kernel ) OR ( C++ AND UML )";

function emit_term( $tag ) {
$res = " EXISTS (
                SELECT *
                FROM skill_usage su
                JOIN skill_names sn ON sn.skill_id = su.skill_id
                WHERE su.Job_id = j.job_id
                AND sn.skillname = :" . $tag . ")n";
return $res;
}


$fixed_part ="
SELECT job_id, job_name
 FROM jobs j
 WHERE 1=1
 AND n" ;


# $tokens = explode( ' ' , $string ); #splits on any single space
$tokens = preg_split( '/[s]+/' , $string ); # accepts multiple whitespace
# print_r ( $tokens );

$query = $fixed_part;

$args = array();
$num = 1;
foreach ( $tokens as $tok ) {
        switch ($tok) {
        case '':  # skip empty tokens
        case ';':  # No, you should not!
        case '"':
        case "'":
        case ';':  break;
        case '(': $query .= '('; break;
        case ')': $query .= ')'; break;
        case '&':
        case 'AND': $query .= ' AND '; break;
        case '|':
        case 'OR': $query .= ' OR '; break;
        case '!':
        case 'NOT': $query .= ' NOT '; break;
        default:
                $tag = '_q' . $num ;
                $query .= emit_term ( $tag );
                $args[$tag] = $tok;
                $num += 1;
                 break;
                }
        }
$query .= ";nn";

echo "Query + parameters (for PDO):n" ;
echo $query;
print_r ( $args) ;
          

Output:


SELECT job_id, job_name
 FROM jobs j
 WHERE 1=1
 AND 
( EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q1)
 AND  EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q2)
) OR ( EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q3)
 AND  EXISTS (
        SELECT *
        FROM skill_usage su
        JOIN skill_names sn ON sn.skill_id = su.skill_id
        WHERE su.Job_id = j.job_id
        AND sn.skillname = :_q4)
);

Array
(
    [_q1] => C
    [_q2] => kernel
    [_q3] => C++
    [_q4] => UML
)

                     
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement