Jacob Kaplan-Moss

Tag: Content

FAQ: Untrusted users and HTML

An input form that takes raw HTML. It’s a pretty common thing to see in web apps these days: many comment forms allow HTML, or some subset thereof; many social-network-style applications allow end-users to enter HTML in their profiles; etc. Unfortunately, allowing untrusted users to enter raw HTML is incredibly dangerous; read up on XSS if you don’t know why.

So a common question that comes up in web developer circles deals with how best to “escape” user-entered HTML so that is safe for presentation. Though this seems easy, it’s actually incredibly difficult – see Whitelist, Don’t Blacklist for an introduction. I’ve literally seen hundreds of recipes for stripping unsafe HTML that are about as effective as a screen door on a submarine.

February 24th, 2009 • content html security