One of the WordPress plugins we’re using is relying on regex to detect anchor tags in HTML. The code is as follows:
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
preg_match_all("/$regexp/siU", $string, $matchArray);
This results in $matchArray being populated with all anchor tags, including the ones with just fragment URLs in the href attribute (ex: href="#this-is-an-id" or href="#" shouldn’t match).
We’re trying to update the regex to ignore anchor tags with fragment URLs. I tried the following regex, but it doesn’t seem to work. Regex is not my strong suit, so I’m looking for any helpful guidance in the right direction.
$regexp = "<a\s[^>]*href=(\"[^#.*]??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
P.S.: The goal is to fix this and submit a PR to the original plugin author, so it gets rectified.
Answer
If you’re just trying to ignore URLs that start with #, you can use this:
$regexp = "<a\s[^>]*href=(\"??)([^#\"][^\" >]*?)\\1[^>]*>(.*)<\/a>";
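The change is the extra [^#\"] at the start of the second group, which forces the first character of the URL to be something other than # or a double quote. In the attempt from the question, the added character class sits inside the optional quote group and is itself optional, so it never rules anything out. Here is a quick sketch to check the behaviour; the sample markup in $string is made up for illustration and is not taken from the plugin:

```php
<?php
// Pattern from the answer: the first URL character may not be # or ".
$regexp = "<a\s[^>]*href=(\"??)([^#\"][^\" >]*?)\\1[^>]*>(.*)<\/a>";

// Hypothetical sample markup (an assumption, not the plugin's real input).
$string = '<a href="#top">Top</a> '
        . '<a href="https://example.com/page">Example</a> '
        . '<a href="#">Empty</a> '
        . '<a href="/about#team">About</a>';

preg_match_all("/$regexp/siU", $string, $matchArray);

// With the default PREG_PATTERN_ORDER, $matchArray[2] holds the href values.
print_r($matchArray[2]);
```

On this sample, only https://example.com/page and /about#team end up in $matchArray[2]; a fragment that follows a real path is kept because only the first character is checked. One caveat: the class only excludes # and ", so a single-quoted value such as href='#id' would still match. If the plugin needs to handle that too, the pattern would need further work.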