
regexp to find locale in URL

I'm using a regular expression to parse the URL and find the locale on my website. Here is my code:

<?php

$app_conf = require_once __DIR__ . '/../config/app.php';

function extract_lang($avail)
{
    $uri_lang = [];
    if (preg_match('/^(\/)+([a-z]{2})(\/+.*)?/', $_SERVER['REQUEST_URI'], $uri_lang)) {
        if (in_array($uri_lang[2], $avail)) {
            $_SERVER['REQUEST_URI'] = isset($uri_lang[3]) ? $uri_lang[3] : "/";
            $_SERVER['HTTP_LANG'] = $uri_lang[2];
        }
    }
}

if ($app_conf['extract_from_uri']) {
    extract_lang($app_conf['locales']);
}

It works most of the time, but it has a bug: if the URL path starts with 'en', the regex treats it as a locale, which breaks my application's logic. An example route that causes the bug:

https://m2.test/environmental_projects
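To see why the original pattern misfires, here is a minimal sketch that runs the question's regex (with its forward slashes escaped, since / is also the delimiter) against the failing path. The variable names are illustrative only.

```php
<?php
// The question's original pattern, with the forward slashes escaped
// because '/' is also used as the delimiter.
$pattern = '/^(\/)+([a-z]{2})(\/+.*)?/';

preg_match($pattern, '/environmental_projects', $m);

// Group 2 grabs the first two letters of the path ("en"). Group 3 is
// optional and there is no end anchor, so the match still succeeds even
// though "v" follows the two letters.
var_dump($m[2]);        // string(2) "en"
var_dump(isset($m[3])); // bool(false)
```

Because group 3 does not participate, the question's code then rewrites REQUEST_URI to "/" and sets HTTP_LANG to "en".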

I need to update my regular expression, but I'm struggling with it. Please help. In the locales config I have this array:

'locales' => ['en', 'ru']

A valid localized route should look like this:

https://m2.test/en/environmental_projects


Answer

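You could match a single forward slash, capture two characters a-z in the first group, and then make group 2 optional, matching a forward slash followed by any characters except a newline. End the pattern with the anchor $ so that nothing other than a slash (or the end of the string) can follow the two letters.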

Note that there are now 2 capturing groups instead of 3: the locale is in $uri_lang[1] and the optional rest of the path is in $uri_lang[2]. Also, if you change the delimiter to a character other than /, for example ~, you don't have to escape the forward slashes.

^/([a-z]{2})(/.*)?$
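A quick way to check the pattern is to run it against a few sample paths. The sample list below is an assumption chosen for illustration, not taken from the original site.

```php
<?php
// The answer's pattern: one leading slash, a two-letter locale, then either
// the end of the string or a slash followed by the rest of the path.
$pattern = '~^/([a-z]{2})(/.*)?$~';

$samples = [
    '/environmental_projects',
    '/en',
    '/en/environmental_projects',
    '/ru/about',
];

foreach ($samples as $uri) {
    if (preg_match($pattern, $uri, $m)) {
        // $m[2] is only present when something follows the locale.
        $rest = $m[2] ?? '(none)';
        echo "$uri => locale: {$m[1]}, rest: $rest\n";
    } else {
        echo "$uri => no locale\n";
    }
}
```

/environmental_projects is reported as having no locale, while /en, /en/environmental_projects and /ru/about yield the locales en, en and ru respectively.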


For example:

if (preg_match('~^/([a-z]{2})(/.*)?$~', $_SERVER['REQUEST_URI'], $uri_lang)) {
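Putting it together, a minimal sketch of the question's extract_lang with the new pattern and group indices might look like this. It assumes PHP 7.1 or later (for the ?array and void return types), and split_locale is a hypothetical helper introduced here so the matching logic can be tested without touching $_SERVER.

```php
<?php
// Hypothetical helper (not part of the original answer): returns
// [locale, remaining path] when the URI starts with an available locale,
// or null otherwise.
function split_locale(string $uri, array $avail): ?array
{
    if (!preg_match('~^/([a-z]{2})(/.*)?$~', $uri, $m)) {
        return null;
    }
    // Strict comparison: only accept exact entries from the locales config.
    if (!in_array($m[1], $avail, true)) {
        return null;
    }
    // Group 2 is absent for a bare "/en", so fall back to the root path.
    return [$m[1], $m[2] ?? '/'];
}

// The question's function, updated to use the new group numbers.
function extract_lang(array $avail): void
{
    $result = split_locale($_SERVER['REQUEST_URI'], $avail);
    if ($result !== null) {
        $_SERVER['HTTP_LANG'] = $result[0];
        $_SERVER['REQUEST_URI'] = $result[1];
    }
}
```

Because the pattern now requires either the end of the string or a / right after the two letters, a path such as /environmental_projects passes through untouched.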
User contributions licensed under: CC BY-SA