Accessing and Analyzing Data with Microsoft Excel (Bpg-Other)

Most data analysis software applications are designed to work best with a specific data format. As data proliferates inside organizations, and organizations that use different software applications need to share data with each other, the need to present and share data without regard to proprietary data formats has become more urgent.

Extensible Markup Language (XML) is a data format that’s independent of any particular software application or software manufacturer. Because no single company owns the XML data format, and because the XML data format is so flexible, more software manufacturers are supporting XML in their applications and extending existing application features to work with XML data.

The growing support for XML and its flexibility does not necessarily mean that you should switch all your business data to the XML format. Before you make the move to XML, you should consider whether the data you have already needs to be restructured. You should also determine whether your data analysis software applications can handle data in XML format. The applications covered in this book can handle XML formatted data in some way.

A full treatment of XML is beyond the scope of this book. However, you need to know a few simple XML concepts before you can analyze XML data:

Elements and Attributes

Elements are the building blocks of XML data. Elements have names, may contain data values, and may be related to other elements. Elements can also carry descriptive information called attributes. Elements and attributes are described by using a special series of symbols. These symbols specify where the information within an element or attribute starts and ends. For example, the firstname element in the example below has a data value of Paul. The address element contains no data value, but it does have an attribute, city.

<firstname>Paul</firstname> <address city=”Redmond”/>

Attributes are the properties of an element. Just as the properties of an automobile include the type of transmission (automatic or manual), its body paint (red, for example), and the number of wheels (4), an element can have attributes such as a unique identifier. For instance, the following receipt element has an attribute named id, and the id attribute’s value is F0123:

<receipt />

Namespaces

Namespaces in XML differentiate elements with the same names. For example, the following two elements are different (even though each is named month) because they are prefaced with a different namespace. The first element is my:month, and the second element is your:month.

<my:month xmlns:my=”http://www.microsoft.com/myNamespace/”>January</month> <your:month xmlns:your=”http://www.microsoft.com/yourNamespace/”>January </month>

Note

Although these attribute values look like Internet addresses, these addresses do not refer to actual Web sites. Although namespaces can be just about any series of characters, as a rule of thumb an organization should use their main Web site address followed by some unique series of characters that the organization decides on or assigns among its members.

Your Turn

In this exercise, you will practice interpreting XML data in a file.

  1. Start Microsoft Internet Explorer.

  2. On the File menu, click Open.

  3. In the Open dialog box, click the Browse button.

  4. In the Files Of Type list, select All Files.

  5. Locate and click the Sale2001.xml file in the Chap02 folder, and then click Open. When the Open dialog box reappears, click Open.

  6. Locate the month with the highest sales total. The correct answer is November, as you can see in Figure 2-17.

    Figure 2-17: An XML data file.

Although data in XML files can look more dense than data in electronic spreadsheets, once you know how to interpret XML files, they are not too difficult to use. You will learn more about how to work with XML data in Chapter 10.

Категории