
Cluster PHP array

Let’s say I have an array of items where each item has a value. I’d like to create a new array in which the items are clustered by their distance to each other. When two items are at a distance of one from each other, they belong together.

$input = [
    'item-a' => 1,
    'item-b' => 2,
    'item-c' => 3,
    'item-d' => 5,
];

$output = [
    ['item-a', 'item-b'],
    ['item-b', 'item-c'],
    ['item-d'],
];

This creates an output of overlapping arrays. What I want instead is this: because item-a and item-b are related, and item-b is also related to item-c, item-a, item-b, and item-c should all be grouped together. The distance between item-c and item-d is greater than 1, so item-d will form a cluster of its own.

$output = [
    ['item-a', 'item-b', 'item-c'],
    ['item-d'],
];

How do I even start coding this?

Thanks in advance and have a nice day!


Answer

This can only be tested in your environment, but here is what it does:

  • It finds each item’s distance relative to the hash of the first array element.
  • It re-sorts the input array by those distances (assuming that at this stage some will be positive and some negative), which gives us the order for the hash array.
  • It takes this new array and puts the hashes back in.
  • It builds a final output array by measuring the distance between neighbouring items and starting a new level of the output array whenever that distance exceeds a threshold.

I put in a couple of dummy functions to return distances; obviously, replace them with your own. This might need tweaking, but at this point it’s in your hands.

<?php
// example code

$input = [
    'item-a' => 'a234234d',
    'item-f' => 'h234234e',
    'item-h' => 'e234234f',
    'item-b' => 'f234234g',
    'item-m' => 'd234234j',
    'item-d' => 'm234234s',
    'item-e' => 'n234234d',
    'item-r' => 's234234g',
    'item-g' => 'f234234f',
];

function getDistanceFrom($from, $to) {
    return rand(-3,3);
}

function getDistanceFrom2($from, $to) {
    return rand(0,7);
}

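// first sort by relative distance from the first one

$tmp = [];
$firstHash = reset($input); // hash of the first item, used as the reference
$firstKey = key($input);    // key of that same first item
foreach ($input as $item => $hash) {
    // the reference item is at distance 0 from itself, so it is kept in the result
    $tmp[$item] = ($item === $firstKey) ? 0 : getDistanceFrom($firstHash, $hash);
}

// sort ascending by distance, keeping the item keys
uasort($tmp, function ($a, $b) {
    return $a <=> $b;
});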

//now they're in order, ditch the relative distance and put the hash back in
$sortedinput = [];
foreach ($tmp as $item => $d) {
    $sortedinput[$item] = $input[$item];
}


$output = [];
$last = null;
$level = 0;
$thresh = 3; // if item is within 3 of the previous, group
foreach ($sortedinput as $item => $hash) {
    // the first item has no predecessor, so it always opens the first group
    $distance = ($last === null) ? 0 : getDistanceFrom2($last, $hash);
    if (abs($distance) > $thresh) {
        $level++;
    }
    $output[$level][] = ["item" => $item, "distance" => $distance, "hash" => $hash];
    $last = $hash;
}
print_r($output);
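
For the numeric values in the question, the same idea can probably be reduced to something deterministic: sort by value, then start a new cluster whenever the gap to the previous value is larger than the allowed distance. The following is only a minimal sketch under that assumption (plain numbers, distance taken as the difference between neighbouring sorted values); `clusterByGap()` and `$maxGap` are names made up for this example and do not appear in the code above.

```php
<?php
// Sketch: chain together items whose sorted values are at most $maxGap apart.
// clusterByGap() and $maxGap are hypothetical names used only for illustration.

function clusterByGap(array $items, $maxGap = 1): array
{
    asort($items);      // sort by value, keeping the item keys

    $clusters = [];
    $previous = null;   // value of the previously visited item

    foreach ($items as $name => $value) {
        // Open a new cluster on the first item, or when the gap to the
        // previous (next-smaller) value exceeds the allowed distance.
        if ($previous === null || $value - $previous > $maxGap) {
            $clusters[] = [];
        }
        $clusters[count($clusters) - 1][] = $name;
        $previous = $value;
    }

    return $clusters;
}

$input = [
    'item-a' => 1,
    'item-b' => 2,
    'item-c' => 3,
    'item-d' => 5,
];

print_r(clusterByGap($input));
```

With the question's input this groups item-a, item-b, and item-c together and leaves item-d on its own. Because the values are sorted first, each item only needs to be compared with its immediate predecessor, which is what lets item-a and item-c end up in the same cluster through item-b.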
User contributions licensed under: CC BY-SA