Skip to content
Advertisement

Questions on the possible fastest relation graph alghorithm in PHP

Good morning, so here is my problem. And sorry in advance if my english isn’t perfect. Anyway, I have a huge amount of relation between numbers formated as so :

$relation1 = [1, 2];
$relation2 = [2, 3];
$relation3 = [4, 5];
$relation4 = [6, 7];
$relation5 = [7, 2];

each array represente a relation between the 2 numbers. What I need to do is merge the relations to get a result such as :

$result1 = [1, 2, 3, 6, 7];
$result1 = [4, 5];

I already develloped a algorithm in PHP that do exactly that, the problem is that for a total of about more than 50000 relations, the program takes a long time to execute. When I’m done creating direct relations, which don’t require a lot of time :

$temporary_result1 = [1, 2, 3];
$temporary_result2 = [4, 5];
$temporary_result3 = [6, 7, 2];

I then need to merge group 1 and 3, but with around 20 000 groups previously created, the process takes a really long time, because i need to iterate 20 000 * 20 000 times (400 000 000) that takes around 30 minutes already. As the number of relation is expected to grow overtime, I’m worried the time to execute will grow exponentially since i need to execute the program every days.

I was just wondering if there was a existing algorithm that does this, while being more efficient, in any programming language that I could try and recode in PHP.

Advertisement

Answer

You can look at these relations as Composer packages. Every package has it’s own dependency.

Like for example,

$relation1 = [1, 2];
$relation2 = [2, 3];
  • In the above example, you can say that package 1 is dependent on package 2 and package 2 is dependent on package 3.

  • Likewise, we have to find all connected components. The relation between 2 packages need not necessarily be reflexive, meaning package A is dependent on package B doesn’t necessarily mean package B is dependent on package A.

  • But for the scope of this problem, we can make the relation reflexive to get the connected components via depth first search. If you want, you can adapt to a non-reflexive idea using union find by path compression to merge values with singular parents, but in my answer, I would stick with DFS.

Your Input:

$relation1 = [1, 2];
$relation2 = [2, 3];
$relation3 = [4, 5];
$relation4 = [6, 7];
$relation5 = [7, 2];

For the above input, the adjacency list would look like:

1 => 2
2 => 1,3,7
3 => 2
4 => 5
5 => 4
6 => 7
7 => 6,2

Snippet:

function getConnectedComponents($relations,$total_nodes){
    $result = [];
    $nodes = [];
    $visited = [];

    for($i=1;$i<=$total_nodes;++$i){
        $nodes[$i] = [];
        $visited[$i] = false;
    }

    foreach($relations as $relation){
        $nodes[$relation[0]][] = $relation[1];
        $nodes[$relation[1]][] = $relation[0];
    }

    for($i=1;$i<=$total_nodes;++$i){
        if(!$visited[$i]){
            $temp = [];
            dfs($nodes,$i,$visited,$temp);
            $result[] = $temp;
        }
    }

    return $result;
}

function dfs($nodes,$node,&$visited,&$temp){
    if($visited[$node]) return;
    $visited[$node] = true;
    $temp[] = $node;
    foreach($nodes[$node] as $child_node){
        dfs($nodes,$child_node,$visited,$temp);
    }
}

Demo: https://3v4l.org/JWAF6

In the above algorithm, we build the adjacency list and do a depth first search on each node. We mark nodes as visited along the way to make sure that we aren’t visiting something we already processed.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement