How to get the intersection of two strings in PHP

Problem description:

I have two strings and I need to find the length of their intersection.

Assume that both strings are ASCII and lowercase.

These are expected results:

$str1 = "lorem ipsum";
$str2 = "rem";
echo str_intersection($str1, $str2); // Expected result: 3

$str2 = "xzy";
echo str_intersection($str1, $str2); // Expected result: 0

My attempt to solve the problem:

I tried comparing the strings with the array_intersect() function, like this:

$str_intersection = function(string $str1, string $str2): int {
   $arr1 = str_split($str1); // ['l','o','r','e','m',' ','i','p','s','u','m']
   $arr2 = str_split($str2); // ['r','e','m']

   return count(array_intersect($arr1, $arr2));
};

echo $str_intersection($str1, $str2); // Result: 4 (because of lo*REM* ipsu*M*)

But this approach does not do what I need, because it counts individual matching characters rather than matching contiguous parts of the strings.

In addition, a str_intersection() function built this way is not only incorrect but also very slow when I need to compare thousands of strings.


Example of how I plan to use the function:

As requested, I wrote a small example of how I plan to use the string intersection function:

$strings = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur'];
$needle = 'lo';
$intersections = [];
foreach ($strings as $str) {
    $intersections[] = str_intersection($str, $needle);
}
print_r($intersections);

Expected result (intersection “highlighted” in uppercase):

Array (
    [0] => 1 // LOrem
    [1] => 0 // ipsum
    [2] => 1 // doLOr
    [3] => 0 // sit
    [4] => 0 // amet
    [5] => 0 // consectetur
)

Answer

This is my attempt.

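function str_intersection($str1, $str2)
{
   // Order the strings by length so that $short is never longer than $long.
   [$long, $short] = strlen($str1) > strlen($str2) ? [$str1, $str2] : [$str2, $str1];
   $shortLength = strlen($short);
   // Try the longest candidate substrings first, so the first hit is the answer.
   for ($length = $shortLength; $length > 0; $length--) {
       // Slide a window of exactly $length characters over $short, up to and including the last position.
       for ($offset = 0; $offset <= $shortLength - $length; $offset++) {
           if (strpos($long, substr($short, $offset, $length)) !== false) return $length;
       }
   }
   return 0;
}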

$str1 = "lorem ipsum";
$str2 = "rem";
echo str_intersection($str1, $str2) . PHP_EOL; // Expected result: 3

$str2 = "xzy";
echo str_intersection($str1, $str2) . PHP_EOL; // Expected result: 0

This outputs:

3
0

See: https://3v4l.org/7YW0R#v8.0.25

This function first orders the two input strings by length, so we know which one is shorter. It then looks for the longest substring of the shorter string that also occurs in the longer one, trying the longest candidates first. This is not very efficient. Can anyone improve it?
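
One possible improvement, offered as a sketch rather than a definitive answer, is the classic dynamic-programming approach to the longest common substring problem. It replaces the repeated strpos() calls with a single pass over both strings, taking time proportional to the product of the two lengths. The function name str_intersection_dp() is made up for this example, and the code assumes single-byte (ASCII) input, as stated in the question.

```php
<?php
// Hypothetical helper, not part of the original answer: returns the length of
// the longest contiguous substring shared by $a and $b (byte-wise comparison).
function str_intersection_dp(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    if ($lenA === 0 || $lenB === 0) {
        return 0;
    }

    // $prev[$j] holds the length of the longest common suffix of the
    // first ($i - 1) bytes of $a and the first $j bytes of $b.
    $prev = array_fill(0, $lenB + 1, 0);
    $best = 0;

    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array_fill(0, $lenB + 1, 0);
        for ($j = 1; $j <= $lenB; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Matching bytes extend the common suffix ending just before them.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                }
            }
        }
        $prev = $curr;
    }

    return $best;
}

echo str_intersection_dp("lorem ipsum", "rem") . PHP_EOL; // 3
echo str_intersection_dp("lorem ipsum", "xzy") . PHP_EOL; // 0
```

Only two rows of the table are kept at a time, so the extra memory is proportional to the length of the second string. Whether this is actually faster than the strpos() version for short needles would need to be measured on real data.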

User contributions licensed under: CC BY-SA