PHP preg_replace pattern NOT ending in ‘.zip’ or ‘.pdf’ or ‘.txt’

I want to preg_replace the following:

$string='<a blah href="http://example.com/readme.zip" blah><img ><a blah href="http://example.com/readme.zqp" blah>';

I want to add target="_blank" to every link whose href does NOT end in .zip, .pdf or .txt.

I tried a pattern like this: $pattern='href="http.*(?!zip)"'; but it does not work.

What’s the best way of doing this?

Answer

You should really use PHP’s built-in DOMDocument to parse and process HTML. Then you can simply fetch all <a> tags and check whether the href ends in .zip or .pdf or .txt, and if not, add a target attribute with value _blank:

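// Parse the fragment; it is wrapped in <html> because it has no top-level element.
$doc = new DOMDocument();
$doc->loadHTML("<html>$string</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Require the literal dot so that e.g. ".../unzip" is not treated as a .zip link; /i also matches ".ZIP".
    if (!preg_match('/\.(zip|pdf|txt)$/i', $href)) {
        $a->setAttribute('target', '_blank');
    }
}
// Strip the leading "<html>" (6 characters) and the trailing "</html>\n" (8 characters).
echo substr($doc->saveHTML(), 6, -8);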

Output:

<a blah href="http://example.com/readme.zip" blah2></a><img>
<a blah href="http://example.com/readme.zqp" blah2 target="_blank"></a>

Note that because you don’t have a top-level element in the sample HTML, one (<html>) has to be added on read and then removed on output (using substr). If your actual HTML has a top-level element, you don’t need to bother with that.
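
If the hard-coded substr offsets feel fragile, one alternative is to serialize only the children of the temporary wrapper. The following is a minimal sketch, not part of the original answer: the function name addBlankTarget and the choice of a <div> wrapper are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): same DOMDocument approach,
// but the temporary wrapper is dropped by serializing its children instead of using substr().
function addBlankTarget(string $html): string
{
    $doc = new DOMDocument();
    // Assumption: a <div> wrapper stands in for the missing top-level element.
    $doc->loadHTML("<div>$html</div>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Add the target only when the href does not end in .zip, .pdf or .txt.
        if (!preg_match('/\.(zip|pdf|txt)$/i', $a->getAttribute('href'))) {
            $a->setAttribute('target', '_blank');
        }
    }

    // Serialize each child of the wrapper, so the wrapper itself is never emitted.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo addBlankTarget('<a href="http://example.com/a.zip">zip</a> <a href="http://example.com/page.html">page</a>'), "\n";
```

Only the second link in the usage line should gain the target attribute, since its href does not end in one of the three extensions.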

Demo on 3v4l.org

If you insist on using regex, there’s a regex in the demo too…
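
The demo's regex is not reproduced here. As a rough sketch of what a regex-only version could look like: the pattern in the question fails because it has no delimiters and because the lookahead (?!zip) sits right before the closing quote, so it only checks that the quote itself is not "zip", which is always true. Negative lookbehinds placed just before the closing quote check what the URL ended with instead. The function name addBlankTargetRegex is hypothetical, and the sketch assumes double-quoted href values; it will also touch href attributes on tags other than <a>.

```php
<?php
// Hypothetical helper (not the regex from the demo): regex-only variant using lookbehinds.
function addBlankTargetRegex(string $html): string
{
    // Match href="..." only when the text just before the closing quote
    // is not ".zip", ".pdf" or ".txt" (case-insensitive because of /i).
    $pattern = '/href="[^"]*(?<!\.zip)(?<!\.pdf)(?<!\.txt)"/i';

    // $0 is the whole matched attribute; the new attribute is appended after it.
    return preg_replace($pattern, '$0 target="_blank"', $html);
}

$string = '<a blah href="http://example.com/readme.zip" blah><img ><a blah href="http://example.com/readme.zqp" blah>';
echo addBlankTargetRegex($string), "\n";
```

On the sample string this leaves the .zip link alone and adds the attribute after the .zqp href. Unlike the DOMDocument version, it does not check whether a target attribute is already present.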

User contributions licensed under: CC BY-SA