
Match Country in String, split based on result

I have a CSV file in which one field holds state/country info, formatted like “Florida United States”, “Alberta Canada” or “Wellington New Zealand”. The two parts are not separated by a comma or a tab, only by a space.

I have an array of all the potential countries as well.

What I am looking for is a solution that, in a loop, splits the state and the country into separate variables by matching the country against my $countryarray. Something like:

$countryarray = array("United States", "Canada", "New Zealand");
$userfield = "Wellington New Zealand";
// Goal: match "New Zealand", put it in $country, and put the rest ("Wellington") in $state

A plain split won’t work, because many of the countries AND states contain spaces, and the original data set joined the state and country with just a single space.
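To make the problem concrete, here is a minimal PHP sketch using explode() on one of the sample values. Splitting on the space has no idea where the state ends and the country begins, so a multi-word country such as “New Zealand” comes apart.

```php
<?php
// Splitting on spaces alone loses the state/country boundary.
$userfield = 'Wellington New Zealand';
$parts = explode(' ', $userfield);

echo implode(' | ', $parts), PHP_EOL; // Wellington | New | Zealand
```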

TIA!


Answer

I’m a fan of the RegEx method that @Mike Morton mentioned. You can take an array of countries, implode them with |, which is the RegEx OR (alternation) operator, and use the result as an “ends with one of these” pattern.
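As a quick illustration of that idea, the sketch below builds the pattern from the question’s three countries and prints it. It skips escaping, which is safe here only because none of these names contain RegEx special characters; the group names region and country follow the answer’s code.

```php
<?php
// Join the names with "|" (alternation) and anchor the group at the end
// of the string with "$", so the pattern reads "ends with one of these".
$countries = ['United States', 'Canada', 'New Zealand'];
$pattern = '/^(?<region>.*?) (?<country>' . implode('|', $countries) . ')$/';

echo $pattern, PHP_EOL;
// /^(?<region>.*?) (?<country>United States|Canada|New Zealand)$/

if (preg_match($pattern, 'Wellington New Zealand', $m)) {
    echo $m['region'], ' / ', $m['country'], PHP_EOL; // Wellington / New Zealand
}
```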

Below I’ve come up with two ways to do this: a simple one, and an arguably overly complicated one that does some extra escaping. To illustrate what that escaping does, I’ve added a fake country called Country XYZ (formally ABC).

Here’s the sample data that works with both methods, as well as a helper function that does the matching and echoing. The RegEx also uses named capture groups, which makes the results easy to work with.

// Sample data
$data = [
    'Wellington New Zealand',
    'Florida United States of America',
    'Quebec Canada',
    'Something Country XYZ (formally ABC)',
];

// Array of all possible countries
$countries = [
    'United States of America',
    'Canada',
    'New Zealand',
    'Country XYZ (formally ABC)',
];

// The beginning and ending pattern delimiter for the RegEx
$delim = '/';

function matchAndShowData(array $data, array $countries, string $delim, string $countryParts): void
{
    $pattern = "^(?<region>.*?) (?<country>$countryParts)$";
    
    foreach($data as $d) {
        if(preg_match($delim . $pattern . $delim, $d, $matches)){
            echo sprintf('%1$s, %2$s', $matches['region'], $matches['country']), PHP_EOL;
        } else {
            echo 'NO MATCH: ' . $d, PHP_EOL;
        }
    }
}
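The helper above echoes its results, while the question asks for $state and $country variables inside a loop. A minimal sketch of that variant follows; buildCountryPattern() and splitStateCountry() are hypothetical names, not part of the answer, and the country “Papua New Guinea” and the sample string “Western Papua New Guinea” are illustrative additions. Because the region group is lazy (.*?), the engine tries the shortest region first, so when one country name is a suffix of another (“Guinea” vs “Papua New Guinea”) the longer country should win regardless of its position in the list.

```php
<?php
// Hypothetical helpers, not part of the answer above.

// Build the "ends with one of these countries" pattern once, escaping each name.
function buildCountryPattern(array $countries): string
{
    $alternation = implode('|', array_map(
        fn(string $c): string => preg_quote($c, '/'),
        $countries
    ));
    return '/^(?<state>.*?) (?<country>' . $alternation . ')$/';
}

// Return ['state' => ..., 'country' => ...], or null if no country matches.
function splitStateCountry(string $field, string $pattern): ?array
{
    if (!preg_match($pattern, $field, $m)) {
        return null;
    }
    return ['state' => $m['state'], 'country' => $m['country']];
}

$countries = ['United States', 'Canada', 'New Zealand', 'Guinea', 'Papua New Guinea'];
$pattern = buildCountryPattern($countries); // built once, reused for every row

foreach (['Florida United States', 'Western Papua New Guinea', 'Nowhere'] as $field) {
    $parts = splitStateCountry($field, $pattern);
    if ($parts === null) {
        echo "NO MATCH: $field", PHP_EOL;
        continue;
    }
    // Destructure into the two variables the question asks for.
    ['state' => $state, 'country' => $country] = $parts;
    echo "$state | $country", PHP_EOL;
}
```

Building the pattern outside the loop keeps the per-row work down to a single preg_match() call.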

Option 1

The first option is a naïve implode. This method, however, will not find the country that includes parentheses, because unescaped parentheses are treated as a RegEx group rather than as literal characters.

matchAndShowData($data, $countries, $delim, implode('|', $countries));

Output

Wellington, New Zealand
Florida, United States of America
Quebec, Canada
NO MATCH: Something Country XYZ (formally ABC)
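To see why that row fails, here is a small check using a trimmed-down version of the naive pattern (only two countries, for brevity). The unescaped “(formally ABC)” acts as a capture group, so the pattern matches the name with the parentheses removed, not the name as written.

```php
<?php
// Unescaped, "(formally ABC)" is a group, not literal parentheses.
$naive = '/^(?<region>.*?) (?<country>Canada|Country XYZ (formally ABC))$/';

var_dump(preg_match($naive, 'Something Country XYZ (formally ABC)')); // int(0)
var_dump(preg_match($naive, 'Something Country XYZ formally ABC'));   // int(1)
```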

Option 2

The second option applies proper RegEx quoting to each country, in case any of them contain special characters. If you are 100% certain none do, this is overkill, but after way too many hours of debugging I’ve learned to always quote anyway.

$patternParts = array_map(fn(string $country) => preg_quote($country, $delim), $countries);

// Implode the cleaned countries using the RegEx pipe operator which means "OR"
matchAndShowData($data, $countries, $delim, implode('|', $patternParts));

Output

Wellington, New Zealand
Florida, United States of America
Quebec, Canada
Something, Country XYZ (formally ABC)
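For reference, this is what preg_quote() produces for the fake country: metacharacters such as parentheses get a backslash, and so does the delimiter passed as the second argument. “Trinidad/Tobago” is a made-up value used only to show the delimiter case.

```php
<?php
// Parentheses become literal \( and \).
$quoted = preg_quote('Country XYZ (formally ABC)', '/');
echo $quoted, PHP_EOL; // Country XYZ \(formally ABC\)

// The delimiter is escaped too; an unescaped "/" would end the pattern early.
echo preg_quote('Trinidad/Tobago', '/'), PHP_EOL; // Trinidad\/Tobago
```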

Note

If you don’t expect your list of countries to change often, you can echo the generated pattern once and hard-code it, which skips the array_map/implode work on every run. The saving is small, so it is only likely to matter in a tight loop.
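A sketch of that workflow, assuming the same four sample countries and the / delimiter: generate the pattern once, print it, then paste it into a constant. COUNTRY_PATTERN is a hypothetical name. PHP also caches compiled regular expressions internally, so the benefit is likely limited to skipping the string building.

```php
<?php
// Step 1 (run once): generate and print the escaped pattern.
$countries = ['United States of America', 'Canada', 'New Zealand', 'Country XYZ (formally ABC)'];
$parts = array_map(fn(string $c): string => preg_quote($c, '/'), $countries);
$generated = '/^(?<region>.*?) (?<country>' . implode('|', $parts) . ')$/';
echo $generated, PHP_EOL;

// Step 2: paste the printed pattern into a constant and use it directly.
const COUNTRY_PATTERN = '/^(?<region>.*?) (?<country>United States of America|Canada|New Zealand|Country XYZ \(formally ABC\))$/';

if (preg_match(COUNTRY_PATTERN, 'Quebec Canada', $m)) {
    echo $m['region'], ', ', $m['country'], PHP_EOL; // Quebec, Canada
}
```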

Demo

You can see a demo of this here: https://3v4l.org/CaNRZ

User contributions licensed under: CC BY-SA