
Is there any way that my HTML securer could be exploited?

I’ve finally managed to write a function that does the following:

  1. Takes a string as input. This can be either an entire HTML document or an HTML “snippet” (even a broken one).
  2. Creates a DOMDocument from this and loops through all nodes.
  3. Whenever it encounters an element that is not on a whitelist of basic structural elements, it “marks it for deletion”. For example, <script> is not whitelisted.
  4. Whenever a node has ANY attribute whose name starts with “on”, that attribute is immediately removed with removeAttribute. The same goes for any “style” attribute and for any “href” attribute whose value starts with “javascript:”.
  5. Once all nodes have been visited, the ones marked for deletion are looped over and deleted ($node->parentNode->removeChild($node)). This isn’t done in the first loop because removing nodes while still iterating over them confuses the traversal.
  6. The document is then serialized with saveHTML and returned as a string, which now represents a cleaned/secured HTML document or snippet.
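The steps above can be sketched in Python with the standard library’s html.parser module. This is a hypothetical analogue, not the PHP DOMDocument code from the question: instead of marking nodes and calling removeChild afterwards, it builds a small tree and then re-serializes only whitelisted elements. The effect is the same, since a disallowed element is dropped together with everything inside it. The whitelist contents and the names ALLOWED_TAGS, VOID_TAGS, TreeBuilder and sanitize are assumptions for illustration. Note that html.parser does not build its tree the way a browser does, and that kind of parser mismatch is exactly what attackers look for.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; the question does not list its exact tags.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br"}
# Elements that never have children or an end tag.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class Node:
    """A minimal element node: tag name, attribute list, and children."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = []  # mix of Node objects and text strings


class TreeBuilder(HTMLParser):
    """Builds a small tree from possibly broken HTML (steps 1-2)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node(None, [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: add the node without opening it.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text (including the body of <script>/<style>) goes to the open element.
        self.stack[-1].children.append(data)


def is_safe_attr(name, value):
    """Step 4's rules, with the href check made case-insensitive and whitespace-tolerant."""
    if name.startswith("on") or name == "style":
        return False
    if name == "href" and (value or "").strip().lower().startswith("javascript:"):
        return False
    return True


def serialize(node):
    """Re-emit whitelisted elements; drop the rest with their subtree (steps 3, 5, 6)."""
    out = []
    for child in node.children:
        if isinstance(child, str):
            out.append(escape(child, quote=False))
        elif child.tag in ALLOWED_TAGS:
            attrs = "".join(
                ' %s="%s"' % (name, escape(value or "", quote=True))
                for name, value in child.attrs
                if is_safe_attr(name, value)
            )
            out.append("<%s%s>" % (child.tag, attrs))
            if child.tag not in VOID_TAGS:
                out.append(serialize(child))
                out.append("</%s>" % child.tag)
    return "".join(out)


def sanitize(html_text):
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return serialize(builder.root)
```

For example, sanitize('<p onclick="x()">Hi<script>alert(1)</script></p>') yields '<p>Hi</p>'. Like the original function, this still only knows about the attributes and schemes it was told about.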

As far as I can tell, there is no way to abuse this, unless there is some bug in the DOM parser, which is out of my hands (and off my conscience).

But maybe there is another “onsomething” attribute or something else I haven’t thought of?

I feel pretty confident about outputting any HTML from an untrusted external/user-provided source after it’s been run through this function of mine, but perhaps I’m being cocky?

(I truly wish that strip_tags would do this on its own so that I didn’t have to code my own thing.)


Answer

If you want to prevent XSS, all of the on* attributes are candidates for removal. The style attribute can also carry JavaScript in various ways in some browsers, as can href (javascript: URLs). SVG can, I think, include scripts as well, and so on.
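To make the href point concrete: a literal prefix test like step 4 in the question misses variants that browsers still treat as javascript: URLs. Per the WHATWG URL Standard, URL parsing strips leading and trailing C0 control characters and spaces, removes tabs and newlines anywhere in the string, and matches the scheme case-insensitively. The sketch below compares the two checks. naive_is_js_url and normalized_is_js_url are hypothetical helpers, and the normalization only approximates a browser’s URL parser.

```python
import re


def naive_is_js_url(value: str) -> bool:
    # A literal, case-sensitive prefix test, as described in the question.
    return value.startswith("javascript:")


def normalized_is_js_url(value: str) -> bool:
    # Approximates browser URL preprocessing (WHATWG URL Standard):
    # 1. strip leading/trailing C0 controls and spaces (code points 0x00-0x20),
    # 2. remove ASCII tabs and newlines anywhere in the string,
    # 3. compare the scheme case-insensitively.
    stripped = value.strip("".join(chr(c) for c in range(0x21)))
    cleaned = re.sub(r"[\t\n\r]", "", stripped)
    return cleaned.lower().startswith("javascript:")


for payload in ["JavaScript:alert(1)", " javascript:alert(1)",
                "java\tscript:alert(1)", "https://example.com/"]:
    print(repr(payload), naive_is_js_url(payload), normalized_is_js_url(payload))
```

Even the normalized version is still a blocklist, so it says nothing about other schemes such as data:. Checking against an allowlist of schemes (for example http, https and mailto) is generally the safer design.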

Look here for a non-comprehensive list of ways that sanitizers like this can be bypassed, and for why it’s very hard to build one yourself.

Why not just use a known-good sanitizer such as Google Caja instead of reinventing one? It’s a lot harder than you seem to think.

User contributions licensed under: CC BY-SA