I want to preg_replace the following:
$string='<a blah href="http://example.com/readme.zip" blah><img ><a blah href="http://example.com/readme.zqp" blah>';
I want to add target="_blank" to every link whose href does NOT end in .zip, .pdf, or .txt.
I tried a pattern like this, but it does not work: $pattern='href="http.*(?!zip)"';
What’s the best way of doing this?
Answer
You should really use PHP’s built-in DOMDocument to parse and process HTML. Then you can simply fetch all <a> tags and check whether the href ends in .zip, .pdf, or .txt, and if not, add a target attribute with value _blank:
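$doc = new DOMDocument();
// Wrap the fragment in <html> so the parser sees a single top-level element.
$doc->loadHTML("<html>$string</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Only add the attribute when the href does not end in .zip, .pdf, or .txt.
    if (!preg_match('/\.(zip|pdf|txt)$/', $href)) {
        $a->setAttribute('target', '_blank');
    }
}
// Strip the leading "<html>" (6 characters) and the trailing "</html>\n" (8 characters).
echo substr($doc->saveHTML(), 6, -8);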
Output:
<a blah href="http://example.com/readme.zip" blah2></a><img> <a blah href="http://example.com/readme.zqp" blah2 target="_blank"></a>
Note that because you don’t have a top-level element in the sample HTML, one (<html>) has to be added on read and then removed on output (using substr). If your actual HTML has a top-level element, you don’t need to bother with that.
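For the case where the HTML already has a single top-level element, a minimal sketch might look like the following. The <div> wrapper, the URLs, and the link texts are hypothetical sample data, not from the question; the point is only that the extra <html> wrapper and the substr call can be dropped.

```php
<?php
// Hypothetical input that already has a single top-level element (<div>).
$html = '<div><a href="http://example.com/manual.pdf">Manual</a> '
      . '<a href="http://example.com/about">About</a></div>';

$doc = new DOMDocument();
// No wrapper is needed here because <div> already acts as the root.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$links = $doc->getElementsByTagName('a');
foreach ($links as $a) {
    // Add target="_blank" unless the href ends in .zip, .pdf, or .txt.
    if (!preg_match('/\.(zip|pdf|txt)$/', $a->getAttribute('href'))) {
        $a->setAttribute('target', '_blank');
    }
}

// The serialized markup can be printed directly, with no substr trimming.
echo $doc->saveHTML();
```

Here only the second link should receive the target attribute, since the first one points at a .pdf file.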
If you insist on using regex, there’s a regex in the demo too…
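The demo's regex is not reproduced here. As a rough sketch of what a regex-only approach could look like (the pattern below is an assumption for illustration, not necessarily the one from the demo), two lookaheads can require an href and then reject the three extensions before the tag is rewritten. It assumes double-quoted href values and no > characters inside attribute values, and it does not check for an existing target attribute.

```php
<?php
// Sample string from the question.
$string = '<a blah href="http://example.com/readme.zip" blah><img >'
        . '<a blah href="http://example.com/readme.zqp" blah>';

// Hypothetical pattern (not taken from the demo):
//   (?=...)  the tag must contain a double-quoted href
//   (?!...)  that href must not end in .zip, .pdf, or .txt
//   [^>]*    the rest of the opening tag, captured together with "<a" as $1
$pattern = '/(<a\b(?=[^>]*\bhref="[^"]*")'
         . '(?![^>]*\bhref="[^"]*\.(?:zip|pdf|txt)")[^>]*)>/i';

// Re-emit the captured tag and append the attribute before the closing ">".
$result = preg_replace($pattern, '$1 target="_blank">', $string);

echo $result, "\n";
```

With the sample string, only the second <a> tag (the .zqp link) gains target="_blank". The DOMDocument approach remains more robust for real-world HTML, since a regex like this breaks on unusual quoting or attribute values.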