
Without Regex: String Between Quotes?

I’m creating a word-replacement script. I’ve run into a roadblock with ignoring strings between quotes and haven’t been able to find a decent solution here that didn’t involve Regex.

I have a working snippet that walks through every character in the string, works out whether the most recent quote (single or double) was an opening or a closing quote, and ignores escaped quotes. The problem is that, to stay 100% accurate, it has to run every time the string changes (because of how the replacement works, that can happen well over 60K times in a single function call), and since the strings can be long, the code takes too long even on a fairly short script.

Is there a fast way to determine whether a given position in a string is between an opening and a closing quote (single or double), ignoring escaped quotes (\" and \')? Or do you have suggestions on how to optimize the snippet so it runs significantly faster? With this function removed, the process runs at almost the preferred speed (instant).

As a test, try pasting the snippet below into the input script along with a variable containing text, for example: $thisIsAQuote = "This is a quote."; Everything should then be replaced correctly, except that $thisIsAQuote should keep its exact text.

But here's the issue: other solutions I've found treat everything between "This is a quote." and … $this->formatted[$i - 1] != "\\" … as if it were still inside quotes, because as far as those solutions are concerned, the last quote in "This is a quote." and the first quote in the if-check form an opening/closing pair. Another obvious issue is that some strings contain words with apostrophes. Apostrophes shouldn't be treated as single quotes, but every solution I've found treats them that way.

In other words, they’re “unaware” solutions.
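A common workaround for the apostrophe problem is a context heuristic. The sketch below assumes, as a simplification that will misjudge some inputs, that a ' with an ASCII letter or digit immediately on both sides (as in it's or don't) is an apostrophe, and any other ' is a quote delimiter. isApostrophe is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper. Heuristic (an assumption): a ' is an apostrophe when the
// bytes directly before and after it are ASCII letters or digits.
function isApostrophe(string $s, int $i): bool
{
    // A ' at either end of the string cannot be surrounded by letters.
    if ($s[$i] !== "'" || $i === 0 || $i === strlen($s) - 1) {
        return false;
    }
    return ctype_alnum($s[$i - 1]) && ctype_alnum($s[$i + 1]);
}

$text = "It's a test: 'quoted'";
var_dump(isApostrophe($text, 2));  // the ' in "It's": bool(true)
var_dump(isApostrophe($text, 13)); // the opening ' of 'quoted': bool(false)
```

A quote scanner would call this before letting a ' open or close a string. It is only a starting point: the right rule depends on what kind of text is being scanned.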

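    // Body of initializeQuoted(): marks each byte of $this->formatted as
    // inside quotes (1) or outside quotes (0).
    $quoteClosed = true;        // not currently inside "..."
    $singleQuoteClosed = true;  // not currently inside '...'

    // $this->formatted[$i] indexes bytes, so use strlen() rather than mb_strlen().
    $codeLength = strlen($this->formatted);
    if ($codeLength === 0)
        return;

    for ($i = 0; $i < $codeLength; $i++)
    {
        if ((!$quoteClosed || !$singleQuoteClosed) && ($this->formatted[$i] == '"' || $this->formatted[$i] == "'"))
        {
            // Inside a string: only an unescaped quote of the same kind closes it.
            if (!$quoteClosed && $this->formatted[$i] == '"' && $this->formatted[$i - 1] != "\\")
                $quoteClosed = true;
            else if (!$singleQuoteClosed && $this->formatted[$i] == "'" && $this->formatted[$i - 1] != "\\")
                $singleQuoteClosed = true;
        }
        else if ($this->formatted[$i] == '"' && ($i <= 0 || $this->formatted[$i - 1] != "\\"))
        {
            if ($quoteClosed && $singleQuoteClosed)
                $quoteClosed = false;       // opening double quote
        }
        else if ($this->formatted[$i] == "'" && ($i <= 0 || $this->formatted[$i - 1] != "\\"))
        {
            if ($singleQuoteClosed && $quoteClosed)
                $singleQuoteClosed = false; // opening single quote
        }

        if ($quoteClosed && $singleQuoteClosed)
            $this->quoted[$i] = 0;
        else
            $this->quoted[$i] = 1;
    }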

If there isn't a way to make the above more efficient, is there a non-Regex way to quickly replace every substring from one array with the corresponding substring from a second array, across an entire string, without missing any occurrences?

substr_replace and str_replace only seem to replace “some” of the occurrences in the string, which is why the iteration loop is there. It runs a while loop until either strpos reports that the key no longer exists (which it never seems to do; I may be using it wrong) or it has run 10K times, whichever comes first.

Running the quote-detection snippet only once per round would solve the speed issue, but that still leaves the full-replacement problem and, of course, the need to avoid replacing anything within quotes.

    for ($a = 0; $a < count($this->keys); $a++)
    {
        $escape = 0;
        if ($a > count($this->keys) - 5)
            $this->formatted = $this->decodeHTML($this->formatted);

        while (strpos($this->formatted, $this->keys[$a]) !== false)
        {
            $valid = strpos($this->formatted, $this->keys[$a]);
            if ($valid === false || $this->quoted[$valid] === 1)
                break;

            // strpos() returns a byte offset, so measure the key in bytes as well.
            $this->formatted = substr_replace($this->formatted, $this->answers[$a], $valid, strlen($this->keys[$a]));
            $this->initializeQuoted();
            $escape++;

            if ($escape >= 10000)
                break;
        }

        if ($a > count($this->keys) - 5)
            $this->formatted = html_entity_decode($this->formatted);
    }
    $this->quoted = array();
    $this->initializeQuoted();
    return $this->formatted;

‘keys’ and ‘answers’ are arrays containing words of various lengths. ‘formatted’ is the new string with the changed information. ‘initializeQuoted’ is the above snippet. I use htmlentities and html_entity_decode to help get rid of whitespaces with key/answer replacements.

Ignore the magic numbers (5s and 10K).
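A minimal sketch of running the quote scan only once per round, under these assumptions: keys and answers are parallel arrays, a backslash escapes the next byte, and a match counts as quoted if its first byte is. It collects every unquoted match with strpos() and its offset argument (so a quoted match is skipped instead of ending the search, as the break in the loop above does), then replaces from right to left so earlier offsets stay valid. quoteMask and replaceOutsideQuotes are hypothetical names.

```php
<?php
// Hypothetical helper: for each byte of $s, true if it is inside a quoted
// string (the quote characters themselves included), false otherwise.
// A backslash escapes the byte that follows it.
function quoteMask(string $s): array
{
    $mask = [];
    $open = null;                 // quote character currently open, or null
    $n = strlen($s);
    for ($i = 0; $i < $n; $i++) {
        $ch = $s[$i];
        $mask[$i] = $open !== null;
        if ($ch === '\\' && $i + 1 < $n) {
            $mask[++$i] = $open !== null;   // escaped byte: no state change
            continue;
        }
        if ($open === null && ($ch === '"' || $ch === "'")) {
            $open = $ch;
            $mask[$i] = true;     // the opening quote counts as quoted
        } elseif ($ch === $open) {
            $open = null;         // the closing quote was already marked true
        }
    }
    return $mask;
}

// Hypothetical helper: replaces every unquoted occurrence of $keys[$k] with
// $answers[$k], scanning for quotes once per key instead of once per replacement.
function replaceOutsideQuotes(string $s, array $keys, array $answers): string
{
    foreach ($keys as $k => $key) {
        if ($key === '') {
            continue;             // an empty needle would never advance
        }
        $mask = quoteMask($s);    // one scan per round
        $hits = [];
        $pos = 0;
        while (($pos = strpos($s, $key, $pos)) !== false) {
            if ($mask[$pos]) {
                $pos++;           // match starts inside quotes: skip it
            } else {
                $hits[] = $pos;
                $pos += strlen($key);
            }
        }
        // Replace right to left so the earlier offsets in $hits stay valid.
        foreach (array_reverse($hits) as $p) {
            $s = substr_replace($s, $answers[$k], $p, strlen($key));
        }
    }
    return $s;
}

echo replaceOutsideQuotes('test "test" test', ['test'], ['banana']), PHP_EOL;
```

Each round still does one full scan plus one substr_replace per hit, but the scan no longer repeats after every single replacement.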

Answer

If I understand you correctly, you can do this:

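    // Example input (the answer assumes $string is already defined).
    $string = 'A test, "a test in quotes", and \"not a quote\" test.';

    $replacements = [
        "test" => "banana",
        "Test" => "Banana"
    ];

    // Each entry is a [start, end] pair of inclusive byte offsets describing one
    // segment; quoted segments include their opening and closing quotes.
    $brackets = [[0]];
    $lastOpenedQuote = null;

    for ($i = 0; $i < strlen($string); $i++) {

        if ($string[$i] == "\\") { $i++; continue; } // Skip escaped chars

        if ($string[$i] == $lastOpenedQuote) {
            // Closing quote: end the quoted segment here, start a new one after it.
            $lastOpenedQuote = null;
            $brackets[count($brackets)-1][] = $i;
            $brackets[] = [ $i+1 ];
        } elseif ($lastOpenedQuote === null && ($string[$i] == '"' || $string[$i] == "'")) {
            // Opening quote: end the unquoted segment just before it.
            $lastOpenedQuote = $string[$i];
            $brackets[count($brackets)-1][] = $i-1;
            $brackets[] = [ $i ];
        }
    }
    $brackets[count($brackets)-1][] = strlen($string)-1;

    $bits = [];
    foreach ($brackets as $index => $pair) {
        $bits[$index] = substr($string, $pair[0], $pair[1]-$pair[0]+1);
        // Replace only in non-empty segments that do not start with a quote.
        if ($bits[$index] !== '' && $bits[$index][0] != '"' && $bits[$index][0] != "'") {
            $bits[$index] = str_replace(array_keys($replacements), array_values($replacements), $bits[$index]);
        }
    }
    $result = implode('', $bits); // reassemble the string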

Check it out at: http://sandbox.onlinephpfunctions.com/code/0453cb7941f1dcad636043fceff30dc0965541ee
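The answer's code uses str_replace() with arrays, which applies each search/replace pair in turn to the whole string, so an earlier replacement can change what a later key matches. PHP's built-in strtr(), called with a single array of from => to pairs, is a non-Regex alternative that does all replacements in one pass: at each position the longest matching key wins, and replaced text is not searched again. A small comparison using made-up sample arrays:

```php
<?php
// Parallel arrays like the question's keys/answers (sample values only).
$keys    = ['cat', 'category', 'dog'];
$answers = ['feline', 'group', 'cat'];
$text    = 'The dog and the cat share a category.';

// str_replace() applies the pairs in order over the whole string, so 'cat'
// is replaced inside 'category' before the 'category' key is ever tried.
echo str_replace($keys, $answers, $text), PHP_EOL; // The cat and the feline share a felinegory.

// strtr() replaces in a single pass, longest key first at each position.
echo strtr($text, array_combine($keys, $answers)), PHP_EOL; // The cat and the feline share a group.
```

Neither function knows about quotes, so either one would still be applied only to the unquoted segments, as the answer does with $bits.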

If performance is still an issue, keep in mind that this makes a single pass over the string and does the minimum number of checks per character, so it will be hard to reduce the work much further. If you need something faster, consider revising your approach from the bottom up, e.g. doing some of the splitting progressively on the client side instead of processing the whole string on the server side.
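One PHP-specific way to cut the per-character loop is to let strcspn() (with its offset argument) jump over runs of characters that cannot change the quote state, so the PHP-level loop only stops at quotes and backslashes. This sketch is not from the answer: replaceUnquoted is a hypothetical name, it assumes the answer's rules (a backslash escapes the next character, " and ' delimit strings) and uses strtr() for the replacements. Whether it is faster on a given input would need to be measured.

```php
<?php
// Hypothetical helper: applies $replacements only outside quoted strings,
// using strcspn() to skip runs of bytes that are neither quotes nor backslashes.
function replaceUnquoted(string $s, array $replacements): string
{
    $out = '';
    $n = strlen($s);
    $start = 0;       // start of the segment not yet copied to $out
    $open = null;     // quote character currently open, or null
    $i = 0;
    while ($i < $n) {
        // Outside quotes any quote matters; inside, only the matching one does.
        $stops = $open === null ? "\\\"'" : "\\" . $open;
        $i += strcspn($s, $stops, $i);
        if ($i >= $n) {
            break;
        }
        if ($s[$i] === '\\') {          // skip the backslash and the escaped byte
            $i += 2;
            continue;
        }
        if ($open === null) {           // opening quote: flush the unquoted run
            $out .= strtr(substr($s, $start, $i - $start), $replacements);
            $start = $i;
            $open = $s[$i];
        } else {                        // closing quote: copy the quoted run as-is
            $out .= substr($s, $start, $i + 1 - $start);
            $start = $i + 1;
            $open = null;
        }
        $i++;
    }
    // The tail is replaced only if it is not an unterminated quoted string.
    $tail = substr($s, $start);
    return $out . ($open === null ? strtr($tail, $replacements) : $tail);
}

echo replaceUnquoted('test "test" test', ['test' => 'banana']), PHP_EOL;
```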

User contributions licensed under: CC BY-SA