
Excluding anchor tags starting with fragment URLs using regex

One of the WordPress plugins we’re using is relying on regex to detect anchor tags in HTML. The code is as follows:

$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

preg_match_all("/$regexp/siU", $string, $matchArray);

This populates $matchArray with all anchor tags, including those whose href attribute contains only a fragment URL (for example, href="#this-is-an-id" or href="#"), which shouldn’t match.

We’re trying to update the regex to ignore anchor tags with fragment URLs. I tried the following regex, but it doesn’t seem to work. Regex is not my strong suit, so I’m looking for any helpful guidance in the right direction.

$regexp = "<a\s[^>]*href=(\"[^#.*]??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

P.S.: The goal is to fix this and submit a PR to the original plugin author, so it gets rectified.


Answer

If you’re just trying to ignore URLs that start with #, you can use this:

$regexp = "<a\s[^>]*href=(\"??)([^#\"][^\" >]*?)\\1[^>]*>(.*)<\/a>";
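
To check the difference, here is a minimal sketch that runs the plugin's pattern and the pattern above against the same markup. The sample HTML and the variable names ($original, $filtered, $allMatches, $filteredMatches) are made up for illustration and are not taken from the plugin.

```php
<?php
// Hypothetical sample markup; the link targets and text are assumptions.
$string = '<p><a href="https://example.com/page">Page</a> '
        . '<a href="#this-is-an-id">Jump</a> '
        . '<a class="top" href="#">Top</a> '
        . '<a href="/about#team">Team</a></p>';

// Pattern from the question: matches every anchor tag.
$original = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";

// Pattern from this answer: the first character of the URL
// must be something other than # or a double quote.
$filtered = "<a\s[^>]*href=(\"??)([^#\"][^\" >]*?)\\1[^>]*>(.*)<\/a>";

preg_match_all("/$original/siU", $string, $allMatches);
preg_match_all("/$filtered/siU", $string, $filteredMatches);

// Index 2 holds the captured href values.
print_r($allMatches[2]);      // all four URLs, including "#this-is-an-id" and "#"
print_r($filteredMatches[2]); // only "https://example.com/page" and "/about#team"
```

Note that a URL such as /about#team still matches, because only the first character is checked. Two side effects to keep in mind before sending the PR: an empty href="" no longer matches, since the new character class requires at least one character, and neither pattern treats single quotes as delimiters, so href='#top' would still be captured (with the quotes as part of the value).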


User contributions licensed under: CC BY-SA