One of the WordPress plugins we’re using is relying on regex to detect anchor tags in HTML. The code is as follows:
$regexp = "<as[^>]*href=("??)([^" >]*?)\1[^>]*>(.*)</a>";
preg_match_all("/$regexp/siU", $string, $matchArray);
This results in $matchArray being populated with all anchor tags including the ones with just fragment URLs in href attribute (ex: href="#this-is-an-id" or href="#" shouldn’t match).
We’re trying to update the regex to ignore anchor tags with fragment URLs. I tried the following regex, but it seem to work. Regex is not my strong suit, looking for any helpful guidance in the right direction.
$regexp = "<as[^>]*href=("[^#.*]??)([^" >]*?)\1[^>]*>(.*)</a>";
P.S.: The goal is to fix this and submit a PR to the original plugin author, so it gets rectified.
Advertisement
Answer
If you’re just trying to ignore URLs that start with #, you can use this:
$regexp = "<as[^>]*href=("??)([^#"][^" >]*?)\1[^>]*>(.*)</a>"