I have this HTML code:
HTML
<p style="padding:0px;">
<strong style="padding:0;margin:0;">hello</strong>
</p>
How can I remove attributes from all tags? I’d like it to look like this:
HTML
<p>
<strong>hello</strong>
</p>
Answer
Adapted from my answer on a similar question
PHP
$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';
echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si", '<$1$2>', $text);
// <p><strong>hello</strong></p>
The RegExp broken down:
Regex
/ # Start Pattern
< # Match '<' at beginning of tags
( # Start Capture Group $1 - Tag Name
[a-z] # Match 'a' through 'z'
[a-z0-9]* # Match 'a' through 'z' or '0' through '9' zero or more times
) # End Capture Group
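[^>]*? # Match anything other than '>', zero or more times, non-greedy (won't eat the '/')
(\/?) # Capture Group $2 - '/' if it is there (escaped, because '/' is also the delimiter)
> # Match '>'
/is # End Pattern - 'i': case-insensitive; 's': '.' also matches newlines (no effect here, since the pattern has no '.')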
Add some quoting, and use the replacement text <$1$2>. This strips any text after the tag name up to the end of the tag, whether that is /> or just >.
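As a minimal sketch of how that replacement behaves on a few kinds of tags, the pattern can be wrapped in a small helper. The function name `stripAttributes` is hypothetical (it is not part of the original answer), and the `?? $html` fallback is an assumption about how you might want to handle a regex engine error.

```php
<?php
// Hypothetical helper; the pattern and replacement are the ones from the answer.
// The slash in (\/?) is escaped because "/" is also the pattern delimiter.
function stripAttributes(string $html): string
{
    // preg_replace() returns null on an engine error; fall back to the input then.
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si', '<$1$2>', $html) ?? $html;
}

echo stripAttributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>'), "\n";
// <p><strong>hello</strong></p>

echo stripAttributes('<br class="clear" />'), "\n";
// <br/>   (the trailing slash is kept via $2)

echo stripAttributes('<a href="#top">up</a>'), "\n";
// <a>up</a>   (closing tags start with "</", so the pattern leaves them alone)
```

Closing tags are untouched because the tag name must start with a letter directly after `<`.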
Please note: this isn’t necessarily going to work on ALL input, as the Anti-HTML + RegExp crowd will tell you. There are a few pitfalls, most notably that <p style=">"> would end up as <p>">, plus a few other broken cases. I would recommend looking at Zend_Filter_StripTags as a more foolproof tags/attributes filter in PHP.
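To make the pitfall concrete, here is a rough sketch that runs the regex on the tricky input and compares it with a parser-based approach. Note that this swaps in PHP's built-in DOM extension rather than Zend_Filter_StripTags, so it is an alternative illustration, not that library's API. Both function names are hypothetical, and the `<div>` wrapper is just one common way to load a fragment.

```php
<?php
// The regex from the answer, wrapped in a hypothetical helper.
function stripAttributesRegex(string $html): string
{
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si', '<$1$2>', $html) ?? $html;
}

// Hypothetical DOM-based alternative (assumes PHP's dom extension is enabled).
function stripAttributesDom(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings on sloppy markup instead of emitting them
    // Wrap the fragment in a <div> so it has a single root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove every attribute from every element.
    foreach ($doc->getElementsByTagName('*') as $element) {
        while ($element->attributes->length > 0) {
            $element->removeAttribute($element->attributes->item(0)->nodeName);
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$tricky = '<p style=">">hello</p>';
echo stripAttributesRegex($tricky), "\n"; // <p>">hello</p>  (the pitfall: a stray "> is left behind)
echo stripAttributesDom($tricky), "\n";   // the parser reads the quoted ">" as part of the attribute value
```

The DOM version has its own caveats: the parser may normalize the markup it outputs, and `loadHTML()` needs an encoding hint for non-ASCII input, so treat it as a starting point only.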