Correct URLs
All URIs need to be encoded using percent-encoding as specified in RFC 3986.
URLs may consist of:
* reserved characters - unencoded only when used with their special meaning (as delimiters): !*'();:@&=+$,/?#[]
* unreserved characters - never need to be encoded: a-z A-Z 0-9 -._~
* any other characters - must be encoded as %XX hex escape sequences of their UTF-8 bytes
This means:
http://example.com/příliš/žluťoučký;kůň=úpěl?ďábelské=ódy looks nice but is an invalid URL.
The correct form is:
http://example.com/p%C5%99%C3%ADli%C5%A1/%C5%BElu%C5%A5ou%C4%8Dk%C3%BD;k%C5%AF%C5%88=%C3%BAp%C4%9Bl?%C4%8F%C3%A1belsk%C3%A9=%C3%B3dy
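As a rough sketch of how the correct form can be produced, the snippet below percent-encodes each part separately and only then joins the parts with the delimiters. It assumes Python's standard urllib.parse.quote; the helper name encode_part is made up for illustration.

```python
from urllib.parse import quote

def encode_part(text):
    """Percent-encode one URL part as UTF-8 bytes (hypothetical helper)."""
    # safe="" leaves only unreserved characters (a-z A-Z 0-9 -._~) unescaped.
    return quote(text, safe="", encoding="utf-8")

# The delimiters (/ ; = ?) are added by the program, never taken from the data.
url = (
    "http://example.com/"
    + encode_part("příliš") + "/"
    + encode_part("žluťoučký") + ";"
    + encode_part("kůň") + "=" + encode_part("úpěl")
    + "?" + encode_part("ďábelské") + "=" + encode_part("ódy")
)
print(url)
```

Encoding the parts one by one keeps the structural delimiters distinct from any reserved characters that happen to occur in the data.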
Question: Are URLs in my web app broken?
Answer: Unless you always go with ASCII ... probably yes :)
Better answer: It doesn't matter; browsers are smart enough to correctly parse the simple ones.
Semantics of URL parts
Important to know ⁉
- URL = decode(encode(URL))
- URL != encode(decode(URL)) in general (decoding an already encoded URL loses which delimiters were escaped)
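The round-trip behaviour can be tried out with a small sketch. It assumes that Python's urllib.parse.quote (with safe="") and unquote stand in for encode and decode; the sample values are illustrative only.

```python
from urllib.parse import quote, unquote

def encode(text):
    # Assumption: "encode" means escaping everything except unreserved characters.
    return quote(text, safe="")

def decode(text):
    return unquote(text)

raw = "3+2/5"
round_trip = decode(encode(raw))              # encoding then decoding restores the text

already_encoded = "3%2B2/5"                   # here the slash is a real delimiter
re_encoded = encode(decode(already_encoded))  # the delimiter gets escaped as data

print(round_trip == raw)
print(re_encoded == already_encoded)
```

The second comparison fails because, once decoded, nothing distinguishes the slash that was a delimiter from a slash that was data.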
Decoding depends on correct knowledge of the parts being encoded!
* http://example.com/evaluate/3%2B2%2F5 ⇝ http://example.com/evaluate/3+2/5
* http://example.com/evaluate/3%2B2/5 ⇝ http://example.com/evaluate/3+2/5
=> Split the URL on its delimiters first and decode each part afterwards; only then is DECODING able to correctly identify the parts
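The difference between decoding the whole URL and decoding its parts can be sketched as follows. This assumes Python's urllib.parse (urlsplit, unquote); the function name path_segments is hypothetical.

```python
from urllib.parse import unquote, urlsplit

def path_segments(url):
    """Split the still-encoded path on "/" first, then decode each segment."""
    raw_path = urlsplit(url).path          # urlsplit does not decode anything
    return [unquote(segment) for segment in raw_path.split("/") if segment]

a = "http://example.com/evaluate/3%2B2%2F5"
b = "http://example.com/evaluate/3%2B2/5"

# Decoding the whole URL at once makes both look identical.
print(unquote(a) == unquote(b))

# Splitting before decoding keeps the two URLs apart.
print(path_segments(a))
print(path_segments(b))
```

In the first URL the expression is a single path segment; in the second the slash is a delimiter and the path has one more segment.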
Exercise from [2]:
http://example.com/:@-._~!$&'()*+,=;:@-._~!$&'()*+,=:@-._~!$&'()*+,==?/?:@-._~!$'()*+,;=/?:@-._~!$'()*+,;==#/?:@-._~!$&'()*+,;=
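What matters are the delimiters (reserved characters used with their special meaning): here only the first ? and the # separate path, query and fragment.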
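One way to check the exercise is to let a parser split the URL. The sketch below assumes Python's urllib.parse.urlsplit, which cuts the fragment at the first # and the query at the first ? without decoding anything.

```python
from urllib.parse import urlsplit

# The exercise URL, written in three pieces only to keep the lines short.
URL = (
    "http://example.com/"
    ":@-._~!$&'()*+,=;:@-._~!$&'()*+,=:@-._~!$&'()*+,=="
    "?/?:@-._~!$'()*+,;=/?:@-._~!$'()*+,;=="
    "#/?:@-._~!$&'()*+,;="
)

parts = urlsplit(URL)
print("scheme:  ", parts.scheme)
print("host:    ", parts.netloc)
print("path:    ", parts.path)
print("query:   ", parts.query)
print("fragment:", parts.fragment)
```

Every other reserved character in the URL ends up as plain data inside the path, query or fragment.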
For the patient ones: see the URL specification, RFC 1738
URLs & HTML = XSS ?
A: Inside an unencoded URL ... YES!
Example URL: http://localhost/'"><img src=x onerror=alert(1)>
⇝ <a href="http://localhost/'"><img src=x onerror=alert(1)>">link</a>
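A minimal sketch of the problem, assuming a template that simply concatenates strings (naive_link is a made-up, deliberately unsafe helper) and Python's urllib.parse.quote for the encoding:

```python
from urllib.parse import quote

BASE = "http://localhost/"
payload = "'\"><img src=x onerror=alert(1)>"

def naive_link(url):
    # Deliberately unsafe: the URL is pasted into the markup without any escaping.
    return '<a href="' + url + '">link</a>'

# Unencoded: the quote and angle brackets break out of the href attribute.
print(naive_link(BASE + payload))

# Percent-encoded: the payload contains no character that can end the attribute.
print(naive_link(BASE + quote(payload, safe="")))
```

Percent-encoding helps in this particular case only because the payload relies on characters that the encoding removes; it is not a substitute for escaping according to the output context.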
Q: Can there be XSS in a correctly encoded URL?
A: Yes!
Example URL: http://localhost/'+alert(1)+'
Correctly encoded: http://localhost/'+alert(1)+'
⇝ <script>location.href='http://localhost/'+alert(1)+'';</script>
Summary
Q: Does it matter when I don't encode URLs using percent-encoding syntax?
A: Only when parts contain reserved characters.
=> ENCODE user-supplied parts of the URL
http://example.com/evaluate/ + encode("3+2/5") ⇝ http://example.com/evaluate/3%2B2%2F5
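In Python, for instance, this could look like the sketch below. It assumes urllib.parse.quote as the encode function; note that quote keeps "/" unescaped by default, so safe="" has to be passed explicitly for a single path segment.

```python
from urllib.parse import quote

BASE = "http://example.com/evaluate/"
user_input = "3+2/5"

# Default safe="/" keeps the slash, so the input turns into two path segments.
leaky = BASE + quote(user_input)

# safe="" escapes the slash as well, so the input stays one path segment.
correct = BASE + quote(user_input, safe="")

print(leaky)
print(correct)
```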
Q: Can there be XSS in a URL?
A: Yes.
=> ESCAPE user-supplied URLs based on the surrounding context (see the OWASP cheat sheet)!
⇝ <a href="http://localhost/&#x27;&quot;&gt;&lt;img src=x onerror=alert(1)&gt;">link</a>
⇝ <script>location.href='http\x3a\x2f\x2flocalhost\x2f\x27\x22\x3e\x3cimg\x20src\x3dx\x20onerror\x3dalert\x281\x29\x3e';</script>
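Context-dependent escaping might be sketched like this. The HTML attribute case assumes Python's standard html.escape; escape_js_string is a hypothetical hand-written helper shown only to illustrate the \xHH style, and a vetted escaping library is the better choice in real code.

```python
import html

def escape_html_attr(value):
    """Escape text for use inside a quoted HTML attribute value."""
    # quote=True also turns " and ' into entities, not just & < >.
    return html.escape(value, quote=True)

def escape_js_string(value):
    """Escape text for use inside a quoted JavaScript string literal (hypothetical helper)."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)                     # ASCII letters and digits stay readable
        elif ord(ch) < 0x100:
            out.append("\\x%02x" % ord(ch))    # \xHH for every other character below 256
        else:
            out.append("\\u{%x}" % ord(ch))    # ES2015 code point escape for the rest
    return "".join(out)

url = "http://localhost/'\"><img src=x onerror=alert(1)>"
print('<a href="' + escape_html_attr(url) + '">link</a>')
print("<script>location.href='" + escape_js_string(url) + "';</script>")
```

The same URL needs a different escaping in each context: entities inside the HTML attribute, backslash escapes inside the JavaScript string.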
Where to continue...
(for serious readers, ordered by simplicity)
- http://www.w3.org/TR/html40/appendix/notes.html#h-B.2
- https://en.wikipedia.org/wiki/Percent-encoding
- http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding
- https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet
- https://tools.ietf.org/html/rfc3986 - URI
- http://tools.ietf.org/html/rfc1738 - URL