Hunting Security Bugs
Extensible Markup Language (XML) is a text format designed to represent data so that it can easily be shared between different computer systems. Although XML has existed for almost 10 years, the format has become extremely popular over the last few years. Many Web browsers, word processors, databases, and Web servers use XML today. The XML format is used to send data across the network or to store data as a local file. In this chapter, you'll learn how to security test applications that interact with XML. The first part of the chapter describes how to test for non-XML vulnerabilities such as HTML scripting, spoofing, and buffer overflows when the data input is through XML. The second part of the chapter describes security issues specific to XML and how to test for them.
Note | XML includes security features such as signatures and Web Services Security; however, these topics are beyond the scope of this book. |
Testing Non-XML Security Issues in XML Input Files
Applications that take XML as input typically send the data through an XML parser first. The application then accesses the parsed version of the data. If the XML data can't be parsed, the application usually doesn't access the input. For this reason, it is important to craft test input that parses successfully yet still exercises security issues in the application consuming the XML. For example, because XML is a tag-based format similar to HTML, sending the <img> tag in the XML input seems logical. Because XML expects a corresponding </img> tag, however, simply sending <img> causes the XML parser to fail. For XML data to be parsed successfully, the data should be both well formed and valid.
Applications that use an XML parser that supports data streams might obtain parts of an XML document before the document is deemed well formed. For example, the Microsoft .NET Framework XmlReader class can parse XML streams. An application that requests the value of the name element (innerXML) for the following XML would receive User1. If the application continues to read the XML stream, the XmlReader class returns an error because the XML isn't well formed (the closing tag </p> should be </phone>).
<customer>
  <name>User1</name>
  <phone>425-882-8080</p>
</customer>
The fact that some XML parsers allow access to the data even when it is not well formed creates situations in which an attacker's data can enter the application through the parser even when constraints prevent the attacker from forming the XML input correctly. Other classes that do not support data streams (for example, the XmlDocument class) do not have this issue.
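The same behavior can be observed with other incremental parsers. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree.XMLPullParser` as a stand-in for a stream-based parser such as XmlReader (the function name `read_until_error` is hypothetical). It feeds the malformed customer document to the parser and records what the caller sees before the error is raised.

```python
import xml.etree.ElementTree as ET

# The malformed document from the text: </p> should be </phone>.
MALFORMED = "<customer><name>User1</name><phone>425-882-8080</p></customer>"

def read_until_error(xml_text):
    """Return (element texts seen before any error, error message or None)."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(xml_text)
    seen = {}
    error = None
    try:
        # Events queued before the parse error are delivered first.
        for _event, elem in parser.read_events():
            seen[elem.tag] = elem.text
    except ET.ParseError as exc:
        error = str(exc)
    return seen, error

seen, error = read_until_error(MALFORMED)
print(seen)    # data the application received
print(error)   # the well-formedness error that followed
```

In this sketch the application has already received the name value by the time the mismatched end tag is reported, which is the situation the paragraph above describes.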
Important | The XML parser can be tested by creating malformed XML and sending it through the parser. This chapter focuses on testing scenarios where the XML is well formed and valid because most readers are probably more interested in testing their applications than in testing the XML parser. |
Well-Formed XML
XML is well formed if it is syntactically correct. This means that the following points hold true:
- The document has exactly one root element (also known as the document element).
- Elements must have a start and an end. Whereas some tags in HTML have only a begin tag (such as <img>), an XML element must have both a begin tag and an end tag. For example, <tester>Tom</tester> is correct. An empty element can also combine the start and end in a single tag, for example, <br />. Note that XML tag names are case-sensitive.
- Elements must be nested properly. Unlike HTML, XML isn't forgiving. <center><b>Test</center></b> would be rendered correctly as HTML, but would be rejected by an XML parser.
- Attributes must be quoted. Attribute values must be enclosed in quotation marks. For example, <tester name="Tom" /> is correct, but <tester name=Tom /> is incorrect.
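The rules above can be checked quickly against a real parser. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` (which does not validate against a schema, so it tests well-formedness only); the helper name `is_well_formed` and the sample list are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

SAMPLES = [
    "<tester>Tom</tester>",          # begin tag and end tag
    "<br />",                        # start and end in a single tag
    '<tester name="Tom" />',         # quoted attribute value
    "<img>",                         # no end tag
    "<Tester>Tom</tester>",          # tag names are case-sensitive
    "<center><b>Test</center></b>",  # improper nesting
    "<tester name=Tom />",           # unquoted attribute value
    "<a></a><b></b>",                # more than one root element
]

for sample in SAMPLES:
    print(is_well_formed(sample), sample)
```

The first three samples are accepted and the remaining five are rejected, one for each rule in the list.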
Valid XML
XML authors can apply a set of constraints, known as a schema, that are enforced when the XML data is parsed. There are several different ways to specify a schema, including Document Type Definition (DTD), XML Schema Definition (XSD), and RELAX NG.
The following XSD specifies that the validated XML contains an element named TESTCASE with an attribute named id that is exactly 10 digits long (specified in line 5). This element should contain child elements named CATEGORY and DESCRIPTION whose values are strings (specified in lines 11 and 12).
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="testCaseIDType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="TESTCASE">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="CATEGORY" type="xs:string"/>
        <xs:element name="DESCRIPTION" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="testCaseIDType" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
Important | Programmers can perform high-level validation of data by using an XML schema. If you use a schema, the possibility that malicious or malformed input will make it through the parser and into the application is greatly reduced, making it easier to secure the application. |
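To make the effect of these constraints concrete, the following sketch hand-codes an approximation of the TESTCASE schema in Python. It is not an XSD validator (Python's standard library does not include one), and the function name `matches_testcase_schema` and the sample documents are hypothetical. It is meant only to show which inputs such a schema would stop and which it would let through.

```python
import re
import xml.etree.ElementTree as ET

ID_PATTERN = re.compile(r"[0-9]{10}")  # same expression as the XSD's xs:pattern

def matches_testcase_schema(xml_text):
    """Hand-coded approximation of the TESTCASE schema; not an XSD validator."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False                       # not even well formed
    if root.tag != "TESTCASE":
        return False
    if set(root.attrib) != {"id"}:         # id is required; nothing else is declared
        return False
    if not ID_PATTERN.fullmatch(root.attrib["id"]):
        return False
    # xs:sequence: exactly CATEGORY followed by DESCRIPTION
    if [child.tag for child in root] != ["CATEGORY", "DESCRIPTION"]:
        return False
    # xs:string content: no nested elements or attributes on the children
    return all(len(child) == 0 and not child.attrib for child in root)

GOOD = ('<TESTCASE id="0123456789"><CATEGORY>XML</CATEGORY>'
        '<DESCRIPTION>Parser test</DESCRIPTION></TESTCASE>')
SHORT_ID = GOOD.replace("0123456789", "12345")
SCRIPT = ('<TESTCASE id="0123456789"><CATEGORY>XML</CATEGORY>'
          '<DESCRIPTION><![CDATA[<SCRIPT>alert(1)</SCRIPT>]]></DESCRIPTION></TESTCASE>')

for doc in (GOOD, SHORT_ID, SCRIPT):
    print(matches_testcase_schema(doc))
```

Note that the last document passes: a field typed as a plain string still accepts script, so a schema narrows the input but does not replace testing the application that consumes it.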
Including Nonalphanumeric Data in XML Input
You often need to include nonalphanumeric data when testing an application that accepts XML input. For example, to test an application for script injection, you frequently must include HTML tags and quotation marks. (Script injection attacks are discussed in depth in Chapter 10, "HTML Scripting Attacks.") However, HTML tags included as part of XML data often cause the parser to fail because the XML is no longer well formed. The following sections discuss how to include arbitrary data in XML data.
CDATA
A CDATA section begins with <![CDATA[ followed by free-form unescaped character data. The section ends with ]]>. The parser does not interpret the data within the CDATA section as markup. Consider the case in which the attacker specifies the name of a car in XML data such as <CAR color="purple">Car Name</CAR>, and the car's name (the text between the <CAR> and </CAR> tags) is stored and later displayed as HTML. In this case, a script injection attack should be attempted. The following XML causes the parser to fail:
<?xml version="1.0" encoding="UTF-8"?>
<CAR color="purple"><IMG SRC="javascript:alert(document.domain)">Car Name</CAR>
The problem is <IMG SRC="javascript:alert(document.domain)">: it is not well-formed XML because it has no ending tag. The Microsoft XML parser (MSXML) displays the error "End tag 'CAR' does not match the start tag 'IMG'", as shown in Figure 11-1.
A CDATA section can be used to include the image tag as shown here:
<?xml version="1.0" encoding="UTF-8"?>
<CAR color="purple">Car Type<![CDATA[<IMG SRC="javascript:alert(document.domain)">]]></CAR>
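The difference between the two CAR documents can be seen by parsing both. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` in place of MSXML (so the error text differs from the one in the figure); the helper name `car_name` is hypothetical.

```python
import xml.etree.ElementTree as ET

RAW = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b'<CAR color="purple"><IMG SRC="javascript:alert(document.domain)">Car Name</CAR>')

WITH_CDATA = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
              b'<CAR color="purple">Car Type'
              b'<![CDATA[<IMG SRC="javascript:alert(document.domain)">]]></CAR>')

def car_name(xml_bytes):
    """Return the text of the CAR element, or None if the parser rejects the XML."""
    try:
        return ET.fromstring(xml_bytes).text
    except ET.ParseError:
        return None

print(car_name(RAW))         # rejected: the IMG tag is never closed
print(car_name(WITH_CDATA))  # accepted: the IMG tag arrives as plain text
```

With the CDATA section, the application receives the IMG tag verbatim as part of the car's name, which is exactly the data needed for the script injection test.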
Character References
Another way to include character data in XML is by using a character entity reference or a numeric character reference (NCR). Just as characters such as angle brackets can be encoded in HTML, they can be encoded in XML. Table 11-1 shows characters and their predefined character entity references.
Character | Predefined entity representation |
---|---|
< | &lt; |
> | &gt; |
& | &amp; |
' | &apos; |
" | &quot; |
Characters can be represented as numeric character references by using &#x followed by the character's hex value and a semicolon. For example, a null character (hex 00) would be written as &#x00;. Note, however, that XML 1.0 does not permit the null character, so a conforming parser rejects this particular reference.
For including blocks of printable characters, it is easier to use a CDATA section. Character references are good for representing a few characters at a time and nonprintable characters. Character references can also be used in attribute values (where CDATA sections aren't permitted).
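As an illustration of numeric character references, the following sketch builds the hex NCR for a character and checks what a parser returns. It assumes Python's standard-library `xml.etree.ElementTree`; the helper names `ncr` and `parses` and the `item` element are hypothetical.

```python
import xml.etree.ElementTree as ET

def ncr(char):
    """Return the hexadecimal numeric character reference for one character."""
    return "&#x{:02X};".format(ord(char))

def parses(xml_text):
    """Return True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

tab = ncr("\t")
# The same reference works in an attribute value and in element content.
doc = '<item title="a{0}b">x{0}y</item>'.format(tab)
elem = ET.fromstring(doc)
print(tab, repr(elem.get("title")), repr(elem.text))

# A reference to the null character is refused by this parser.
print(parses("<item>" + ncr("\x00") + "</item>"))
```

The tab survives in both the attribute and the element text, whereas the null character reference causes the whole document to be rejected, so that test case never reaches the application when a conforming parser is in front of it.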
Tip | XML parsers understand CDATA and character references in XML data and return the decoded equivalents to the caller of the parser. For example, the value of the text attribute would be returned as "1<3" by the parser for the following XML:
<EXAMPLE text="1&lt;3" /> Programs that perform additional decoding after parsing likely contain a double decoding bug. |
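A double decoding bug can be sketched as follows. The example assumes Python's standard-library `xml.etree.ElementTree` as the parser and uses `html.unescape` as a hypothetical stand-in for whatever extra decoding step the application performs after parsing.

```python
import html
import xml.etree.ElementTree as ET

# The attacker encodes the ampersand itself, so the value survives
# the parser's single decoding pass as harmless-looking text.
XML_INPUT = '<EXAMPLE text="&amp;lt;script&amp;gt;" />'

parsed = ET.fromstring(XML_INPUT).get("text")  # decoded once by the parser
decoded_again = html.unescape(parsed)          # the bug: a second decode

print(parsed)
print(decoded_again)
```

After the parser's pass the value is still the escaped text, which a filter looking for angle brackets would accept. The second decode turns it into a live tag, which is why decoding after parsing is worth looking for during testing.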
Testing Really Simple Syndication
Really Simple Syndication (RSS) is an XML-based format for publishing a document, known as a feed, on a Web site. RSS readers interpret and display the feed to the user. RSS feeds are commonly used to publish news, mailing lists, and podcasts. Hidetake Jo and I (Gallagher) recently tested parts of an RSS reader written in C++. The data controlled by the attacker was the RSS feed. In addition to attacking the parser itself, we also tried quite a few other test cases. Here is a partial list (the full list is too long for this text):
- HTML scripting attacks: Many RSS readers render items in HTML. Often these HTML rendering engines support script. Sometimes this script runs in an elevated security context (for example, the My Computer zone). We tried the following:
  <description>Test <![CDATA[ "><SCRIPT>alert(document.location);</SCRIPT>]]></description>
  This test case uses a CDATA section to attempt to close off another tag and inject the <SCRIPT> tag. We also tried some similar cases using javascript protocol URLs, as discussed in Chapter 10.
- Directory traversal: One of the features of RSS is called enclosures. Enclosures are file attachments associated with an RSS item. An RSS item containing an enclosure has a URL of an item to download and store to a local directory. We tried cases in which the enclosure name attempted to escape from the enclosure directory using traversal tricks discussed in Chapter 12, "Canonicalization Issues."
- User interface spoofing: We tried various cases to spoof the look and feel of the RSS reader. As discussed in Chapter 6, "Spoofing," user interface (UI) spoofing cases often involve using control characters. To include these characters in the RSS feed, we tried both the character itself and the NCR version of the character. For example, a tab character can be inserted by using &#x09;.
- Buffer overflow: We looked at the RSS reader's code to understand what the application did with each part of the RSS feed once it was returned by the XML parser. We created RSS feeds with specific fields that contained data larger than the code expected. We understood the size limitations by inspecting the code first.
- Format strings: We attempted to put format strings in the various fields of the RSS feed. For more information about format strings, see Chapter 9, "Format String Attacks."
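Test feeds like the ones described in this list can be generated rather than written by hand. The following is a minimal sketch in Python, not the tooling we used: the names `cdata`, `build_feed`, and `PAYLOADS`, the example.com URL, and every payload except the script string are hypothetical. It wraps each payload so that the feed stays well formed, and then confirms that the parser hands the payload back unchanged.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def cdata(text):
    """Wrap text in a CDATA section, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical payloads, one per category of test case.
PAYLOADS = {
    "script": '"><SCRIPT>alert(document.location);</SCRIPT>',
    "traversal": "..\\..\\..\\test.txt",
    "format": "%n%n%n%n%s%s%s%s",
    "long": "A" * 5000,
}

def build_feed(payload):
    """Return a small RSS document carrying payload in several fields."""
    # quoteattr escapes the value and adds the surrounding quotation marks.
    url = quoteattr("http://example.com/" + payload)
    return (
        '<rss version="2.0"><channel><title>Test feed</title>'
        "<item><title>" + cdata(payload) + "</title>"
        "<description>" + cdata(payload) + "</description>"
        "<enclosure url=" + url + ' length="1" type="audio/mpeg"/>'
        "</item></channel></rss>"
    )

for name, payload in PAYLOADS.items():
    item = ET.fromstring(build_feed(payload)).find("channel/item")
    survived = (item.findtext("description") == payload
                and item.find("enclosure").get("url").endswith(payload))
    print(name, survived)
```

Checking the round trip before pointing the reader under test at the feed separates two failure modes: a test case that was rejected by the parser, and one that reached the application and was handled safely.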
These test cases help stress the importance of testing for non-XML vulnerabilities in applications that interact with XML. Although RSS feeds are XML files, all of the bugs we discovered were non-XML bugs. Our test cases had to take into account the fact that we were dealing with XML because we knew the RSS reader used an XML parser to access the RSS feed.