Monday, November 19, 2012

How to correctly escape webapp output

1, Escaping must be performed in the last possible place = when writing HTML output


Only when writing a java variable content to the output stream we know if we write the content into a javascript variable, html attribute or as a plain text (html body).

2, Every variable of a String type deserves escaping. Don't escape output only in case you NEED to print HTML.


You may never know how a possibly malicious content got into your variable. It may be persistent
XSS from DB, session attribute, request parameter.

3, Use the right escaping for the right situation. 


See OWASP's XSS_Prevention_Rules_Summary


Side note: Don't mix URL encoding with URL escaping. 

See HTML appendix B.2.1 and B.2.2


URL encoding
Though there can be UTF-8 characters in the URL, URLs should be transmitted in US-ASCII encoding.

Valid URL follows percent encoding of characters, described in Percent encoding of URI characters.

When URL is not encoded, UTF-8 characters are translated into US-ASCII OOTB by browsers. But if you want to be sure (you can't lose here) you can use new java.net.URI().toASCIIString(). In javascript there is a similar function window.encodeURI().

What MUST BE encoded is a URL parameter name and value. It's confusing but to encode the parameter use java.net.URLEncoder.encode(). Javascript has better name: window.encodeURIComponent().

Don't use URL encoding to perform HTML escaping! (Yes, I've seen that :) )


URL escaping
Let's say we already have a valid (correctly encoded) URL which we want to write into HTML.

Then there are 2 important steps:
1, Ampersand character (&), which may be a part of URL, is used to prefix html entity references. When writing a valid HTML we should escape it the same way as any other ampersand you want to write there. Use ampersand entity: &

2, To write safe HTML = to be sure that URL won't escape from HTML attributes or JavaScript variable (it is allowed to contain apostrophe character) it's good to escape the URL as described in  XSS_Prevention_Rules_Summary, not only the query string part.

When we don't have the URL properly encoded we must escape the whole URL - perform the step #2.

No comments:

Post a Comment