XML, Web Services, and the Data Revolution


By Frank P. Coyle

Chapter 3. XML in Practice

XML is used to meet the needs of specific industries.

As Figure 3.2 shows, the first wave of XML vocabularies centered on defining structures for specific, or vertical, industries: applications of XML or SGML, such as schemas, document type definitions (DTDs), namespaces, and style sheets, were used to leverage the Web for business purposes. Many of the early applications of XML were extensions of prior development based on SGML, so many companies working with SGML were able to get a head start in XML.

Figure 3.2. The first wave of XML applications: vertical industry data descriptions.

Companies seeking to position themselves in the global economy find XML attractive for several reasons:

  • XML provides an opportunity to establish a data standard for an organization that is tied to global standards organizations.

  • There is wide industry support for XML integration. Conversion utilities are being provided with Web browsers, databases, and operating systems, making it easier and less expensive for small- to medium-size businesses to import and export data in an XML format.

  • Data is immediately accessible via browsers through the use of style sheet technology.

In this section we'll look at just a few of the hundreds of vertical industry initiatives centered around XML. These include the Open Financial Exchange (OFX), Mortgage Industry Standards Maintenance Organization (MISMO), and the HR-XML Consortium, an initiative to standardize data for human resources with a focus on recruitment. Each brings to the table its own challenges and solutions, many still in the formative stage, but since with XML we're not constrained by lock-in to binary data representations, data descriptions can evolve with requirements.

Industry issues: XML for data exchange versus storage.

In looking at different vertical industry approaches to defining XML vocabularies, we notice that two themes recur. One is the struggle over whether to use elements or attributes to represent data. The second theme, which is more subtle, is whether to focus on defining XML for data storage across an industry or whether to concentrate on data representations for exchanging data between partners within an industry. As we'll see in the following examples, OFX and MISMO tackle the problem of XML for data exchange, while HR-XML takes on the challenge of defining XML formats for persistent data storage.

Finance: OFX

OFX uses XML to bridge the gap between brokerage databases and personal software.

One of the great ironies of the computer revolution is that it has pushed many clerical responsibilities up the corporate ladder so that many of us now find ourselves typing and editing our own documents and using financial software to track our personal finances. However, one of the real challenges of any data management system is to keep data synchronized. The OFX specification is an XML-based language that enables brokerage clients to download account information directly into their accounting or tax-preparation software, such as Quicken or TurboTax. OFX also supports the exchange of financial information among financial services companies, their technology outsourcers, and consumers using Web- and PC-based software.

As in any effort to define an industry standard, consensus is required. OFX is an open consortium created by CheckFree, Intuit, and Microsoft in early 1997, and it now has the support of over 1,000 financial institutions, technology solution providers, and payroll companies. Major financial players in the OFX initiative include Prudential, TD Waterhouse Group, Inc., and T. Rowe Price. OFX supports a range of financial activities, including consumer and small-business banking, consumer and small-business bill payment, bill presentment, and investment download and tracking, including stocks, bonds, and mutual funds.

As Figure 3.3 shows, OFX enables the downloading of brokerage information to a user's PC. Downloads can go directly into Web and PC tax software and may include information from 401(k), 1099, and W-2 tax forms. OFX also allows consumers to pay bills directly over the Web.

Figure 3.3. OFX enables brokerage clients to download account information directly into tax preparation software.

As is often the case in broad-based initiatives of this sort, the umbrella consortium allows its members (financial services companies) to enhance application capability by adding new XML content, giving them an opportunity to support value-added features and help position themselves in a competitive marketplace.

The focus of the OFX XML vocabulary has been on data exchange, not data storage. OFX makes no recommendation about how data should be represented in the permanent data stores of participants. The important objective for OFX is to define the data formats for moving data from one platform to another.

OFX has taken a firm stance in the elements-versus-attributes controversy, coming down in favor of elements. The DTD for the financial data exchange defines over 450 elements and no attributes. The DTD does, however, make extensive use of entities, XML shortcut abbreviations that make the DTD and the XML documents themselves more readable.
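To make the elements-versus-attributes choice concrete, the following minimal Python sketch serializes one record both ways. The record and its tag names (Transaction, Date, Amount, Payee) are hypothetical and are not taken from the OFX DTD; the sketch only contrasts the two styles.

```python
import xml.etree.ElementTree as ET

# Hypothetical transaction record; these tag names are illustrative only
# and are not taken from the OFX DTD.
fields = {"Date": "2001-03-15", "Amount": "-42.50", "Payee": "Electric Co."}

def as_elements(tag, data):
    """Element-only style: every field becomes a child element."""
    root = ET.Element(tag)
    for name, value in data.items():
        ET.SubElement(root, name).text = value
    return root

def as_attributes(tag, data):
    """Attribute style: every field becomes an attribute on one element."""
    return ET.Element(tag, dict(data))

element_form = ET.tostring(as_elements("Transaction", fields), encoding="unicode")
attribute_form = ET.tostring(as_attributes("Transaction", fields), encoding="unicode")
print(element_form)
print(attribute_form)
```

Both forms carry the same information. The element-only form is more verbose, but each field can later grow child elements of its own, which attributes cannot do.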

Human Resources and HR-XML

HR-XML defines a common vocabulary for storing human resources data.

The hiring and employee management done by human resources departments are data intensive. HR-XML is a nonprofit consortium dedicated to enabling an XML-based e-commerce and human resources data interchange format. The objective is to spare employers and vendors the risk and expense of having to agree upon and implement an ad hoc data exchange mechanism. By developing and publishing an XML representation for HR data, it will be easier for any company to do business with other companies without having to implement a one-of-a-kind interchange mechanism. HR-XML's current work focuses on standards for staffing and recruiting, benefits enrollment, payroll, competencies, and workforce management.

For any organization attempting to define an XML data representation, it's important to create consensus among stakeholders. HR-XML includes a group called the Cross-Process Objects (CPO) Workgroup with three related roles within the HR-XML Consortium:

  • Developing a common HR vocabulary and model for the consortium

  • Developing schemas for common HR objects used across the consortium's domain-specific workgroups (Recruiting and Staffing, Benefits Enrollment, Payroll, and so on)

  • Reviewing the specifications produced by other HR-XML work groups for appropriate use of common HR objects

The CPO oversees teams that work on models and schemas for common HR objects. Driving the CPO effort is the fact that HR-XML specifically targets XML for data storage, not B2B transactional data. The usage scenario is one in which résumés will be written to a server as XML files. A program will load information from these files to a system of distributed databases where an intranet-based query program will allow precise skills matches against the databases.
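The usage scenario can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the résumé element names (Resume, Name, Skill) are invented for the example and do not come from an HR-XML schema, the documents are inline strings rather than files on a server, and a single in-memory SQLite database stands in for the distributed databases.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical résumé documents; the element names are invented for this
# sketch and do not come from an HR-XML schema.
RESUMES = [
    "<Resume><Name>Jane Doe</Name><Skill>XML</Skill><Skill>Java</Skill></Resume>",
    "<Resume><Name>John Smith</Name><Skill>COBOL</Skill></Resume>",
]

def load_resumes(conn, documents):
    """Parse each XML résumé and store one (name, skill) row per skill."""
    conn.execute("CREATE TABLE IF NOT EXISTS skills (name TEXT, skill TEXT)")
    for doc in documents:
        root = ET.fromstring(doc)
        name = root.findtext("Name")
        rows = [(name, s.text) for s in root.findall("Skill")]
        conn.executemany("INSERT INTO skills VALUES (?, ?)", rows)
    conn.commit()

def find_by_skill(conn, skill):
    """Return the names of candidates who list the given skill."""
    cur = conn.execute(
        "SELECT name FROM skills WHERE skill = ? ORDER BY name", (skill,)
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
load_resumes(conn, RESUMES)
print(find_by_skill(conn, "XML"))
```

Because every résumé uses the same vocabulary, the loader needs no per-source parsing logic, which is the practical payoff of agreeing on a common storage format.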

What's a Person?

XML data definitions need to accommodate the requirements of different users.

One of the challenges confronted by the HR-XML Consortium has been to arrive at a consensus on what it means to be a Person (at least as far as hiring managers are concerned). While Person is often used in introductory XML examples, it's not trivial to define, given the requirement that any definition should be capable of global use in a consistent manner. The definition of a Person schema for HR-XML includes a number of requirements:

  • Must be able to handle various name formats without significant overhead

  • Syntax must be self-documenting

  • Must take cultural context into account. Cultural context drives the sort order for a name. It also determines how the various parts of the name are put together to form the whole name.

  • Should be able to handle effective dating

  • Must be able to handle multiple purposes or contexts of the name (Employee, Supervisor, Dependent, Beneficiary, and so forth)

HR-XML uses elements for data and attributes for metadata.

The result of the effort was the DTD shown in Listing 3.1. It's included here because it illustrates a short, readable design that mixes elements and attributes according to the well-respected schema design principle, "Use elements to represent domain data; use attributes for metadata." For example, the element FormattedName, which is used to describe the full name as it will appear on some document, includes an attribute called type, which is intended to describe the kind of presentation it indicates, for example, a legal form of the name, a form suitable for sorting, or just a default presentation that might be used to address an envelope. Similarly, the element Affix, intended to allow a title of some kind to be included with a name, is supported by an attribute that adds information about the affix, such as whether it represents an academic rank (Professor), an aristocratic title (Lord), or a military title (Colonel).

Listing 3.1 A DTD for the Person Element in the HR-XML Definition for Human Resources Applications [1]

<!ELEMENT PersonName (FormattedName*, LegalName?, GivenName*,
                      PreferredGivenName?, MiddleName?, FamilyName*, Affix*)>
<!ELEMENT FormattedName (#PCDATA)>
<!ATTLIST FormattedName
    type (presentation | legal | sortOrder) 'presentation'>
<!ELEMENT LegalName (#PCDATA)>
<!ELEMENT GivenName (#PCDATA)>
<!ELEMENT PreferredGivenName (#PCDATA)>
<!ELEMENT MiddleName (#PCDATA)>
<!ELEMENT FamilyName (#PCDATA)>
<!ATTLIST FamilyName
    primary (true | false | undefined) 'undefined'
    prefix CDATA #IMPLIED>
<!ELEMENT Affix (#PCDATA)>
<!ATTLIST Affix
    type (academicGrade | aristocraticPrefix | aristocraticTitle |
          familyNamePrefix | familyNameSuffix | formOfAddress |
          generation | qualification) #REQUIRED>

[1] Copyright, The HR-XML Consortium. All Rights Reserved. http://www.hr-xml.org.

Examples of XML that satisfy the DTD in Listing 3.1 include an XML PersonName for Major John Smith:

<PersonName>
  <FormattedName>John Smith</FormattedName>
  <GivenName>John</GivenName>
  <FamilyName>Smith</FamilyName>
  <Affix type="formOfAddress">Major</Affix>
</PersonName>

and for Mrs. Jane H. Doe:

<PersonName>
  <GivenName>Jane</GivenName>
  <MiddleName>H.</MiddleName>
  <FamilyName>Doe</FamilyName>
  <Affix type="formOfAddress">Mrs.</Affix>
</PersonName>
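A DTD content model such as the one for PersonName fixes both the order of the child elements and how often each may occur. The Python sketch below illustrates that idea by rewriting the content model as a regular expression over child tag names. It is an approximation for illustration, not a DTD validator: it checks only the order and count of children and ignores attributes and text content. In practice a validating XML parser would perform the complete check. The function name and the out-of-order sample are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# The PersonName content model from Listing 3.1, rewritten as a regular
# expression over child tag names (each name followed by a space).
CONTENT_MODEL = re.compile(
    r"(FormattedName )*(LegalName )?(GivenName )*(PreferredGivenName )?"
    r"(MiddleName )?(FamilyName )*(Affix )*"
)

def children_match_model(xml_text):
    """Return True if the children of PersonName appear in DTD order."""
    root = ET.fromstring(xml_text)
    if root.tag != "PersonName":
        return False
    sequence = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None

jane = (
    "<PersonName><GivenName>Jane</GivenName><MiddleName>H.</MiddleName>"
    "<FamilyName>Doe</FamilyName>"
    '<Affix type="formOfAddress">Mrs.</Affix></PersonName>'
)
# Hypothetical counterexample: FamilyName may not precede GivenName.
out_of_order = (
    "<PersonName><FamilyName>Doe</FamilyName>"
    "<GivenName>Jane</GivenName></PersonName>"
)
print(children_match_model(jane))
print(children_match_model(out_of_order))
```

The first call reports a match and the second does not, because the comma-separated content model is an ordered sequence: a document that supplies the right elements in the wrong order does not conform.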

Mortgage Banking: MISMO

MISMO's XML definitions focus on data transfer.

Just about everyone who purchases a home acquires a mortgage loan. Mortgage loans are available through lending institutions such as banks and mortgage companies that supply the cash to buy the home. In order for lending institutions to continue to have money to deliver to borrowers, the loans are sold to companies such as Fannie Mae and Freddie Mac and packaged as mortgage-backed securities. This is big business. Over $378 billion in mortgage-backed securities were issued by Fannie Mae and Freddie Mac in 2000, a statistic that indicates the importance of the transfer of data between lending institutions and Fannie and Freddie.

In 1999 a group of industry representatives formed MISMO and in 2000 began to address electronic commerce issues in the mortgage industry. Their objective was to define an XML schema in the form of DTDs that could be used as the basis for data exchange within the industry.

In formulating an XML schema, MISMO has been very explicit about what they are working to standardize. Unlike HR-XML, they are not trying to come up with formats for storing long-term data but are only attempting to standardize loan data as it moves between two organizations at some point in time. Of course, companies are free to archive data as it moves between servers, but the intent is only to describe the data that is needed to carry out the B2B transactions between lenders and Fannie Mae and Freddie Mac.

MISMO's industry acceptance is based on collaboration and consensus.

In developing a schema for B2B data interchange it's important to establish consensus, which depends on communication. Along the path to standardization, MISMO released a draft version of its dictionary of common data items for review, focusing on information associated with mortgage loan applications. MISMO uses a centralized Web-based repository to provide a single location for managing data elements and generating XML document definitions. The DTD that has evolved is extensible, so other underwriting organizations can use it and extend it with any additional data they need for their own transactions. As Listing 3.2 shows, the DTD is designed around a top-level definition so that parts can be reused in other loan-related transactions.

Listing 3.2 A Portion of the DTD for MISMO

<!ELEMENT LOAN_APPLICATION (_DATA_INFORMATION?, ADDITIONAL_CASE_DATA?,
    AFFORDABLE_LENDING?, ASSET*, DOWN_PAYMENT*, GOVERNMENT_LOAN?,
    INTERVIEWER_INFORMATION?, LIABILITY*, LOAN_PRODUCT_DATA?, LOAN_PURPOSE?,
    LOAN_QUALIFICATION?, MORTGAGE_TERMS?, PROPERTY?,
    PROPOSED_HOUSING_EXPENSE*, REO_PROPERTY*, TITLE_HOLDER*,
    TRANSACTION_DETAIL?, BORROWER+)>
<!ATTLIST DOWN_PAYMENT
    _Type (BridgeLoan | CashOnHand | CheckingSavings |
           DepositOnSalesContract | EquityOnPendingSale |
           EquityOnSoldProperty | EquityOnSubjectProperty | GiftFunds |
           LifeInsuranceCashValue | LotEquity | OtherTypeOfDownPayment |
           RentWithOptionToPurchase | RetirementFunds | SaleOfChattel |
           SecuredBorrowedFunds | StocksAndBonds | SweatEquity |
           TradeEquity | TrustFunds | UnsecuredBorrowedFunds) #IMPLIED>
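An enumerated attribute such as _Type restricts trading partners to an agreed list of values, which is what makes the exchanged data machine-checkable. The Python sketch below shows how a receiver might flag values outside that list. The allowed values are copied from Listing 3.2; the function name and the sample document (including the value LotteryWinnings and the empty BORROWER element) are hypothetical simplifications, and a validating parser working from the DTD would normally do this job.

```python
import xml.etree.ElementTree as ET

# Allowed values for the _Type attribute of DOWN_PAYMENT, from Listing 3.2.
DOWN_PAYMENT_TYPES = {
    "BridgeLoan", "CashOnHand", "CheckingSavings", "DepositOnSalesContract",
    "EquityOnPendingSale", "EquityOnSoldProperty", "EquityOnSubjectProperty",
    "GiftFunds", "LifeInsuranceCashValue", "LotEquity",
    "OtherTypeOfDownPayment", "RentWithOptionToPurchase", "RetirementFunds",
    "SaleOfChattel", "SecuredBorrowedFunds", "StocksAndBonds", "SweatEquity",
    "TradeEquity", "TrustFunds", "UnsecuredBorrowedFunds",
}

def invalid_down_payments(xml_text):
    """Return the _Type values in a document that are not in the enumeration.

    _Type is declared #IMPLIED, so a DOWN_PAYMENT without it is accepted.
    """
    root = ET.fromstring(xml_text)
    return [
        dp.get("_Type")
        for dp in root.iter("DOWN_PAYMENT")
        if dp.get("_Type") is not None
        and dp.get("_Type") not in DOWN_PAYMENT_TYPES
    ]

# Hypothetical, heavily simplified loan application for illustration only.
SAMPLE = """
<LOAN_APPLICATION>
  <DOWN_PAYMENT _Type="GiftFunds"/>
  <DOWN_PAYMENT _Type="LotteryWinnings"/>
  <BORROWER/>
</LOAN_APPLICATION>
"""
print(invalid_down_payments(SAMPLE))
```

The catch-all value OtherTypeOfDownPayment in the enumeration gives senders a valid way to report a source of funds that the list does not name.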

Tracking XML Standards

OASIS is an organization that tracks and promotes XML standards.

The explosion of XML vocabularies has led to a need for a central repository to track the various XML initiatives. The Organization for the Advancement of Structured Information Standards (OASIS) is a nonprofit international consortium that creates interoperable industry specifications based on public XML and SGML standards. One aspect of the OASIS mission is to develop vertical industry applications, conformance tests, and interoperability specifications that make vertical standards usable. Table 3.1 lists various areas for which there are XML initiatives.

OASIS does not compete with but rather builds upon and supplements the work done by standards bodies such as W3C (for XML) or ISO (for SGML). OASIS's technical work generally falls into one of the following categories:

  • Vertical industry applications: Development of applications of XML or SGML, such as schemas, DTDs, namespaces, style sheets, and so forth that may be used in specific vertical industries

  • Horizontal and e-business framework: Development of specifications defining how to build systems for the electronic exchange of business information

  • Interoperability: Development of specifications and standards that define how other standards will work together, or how earlier, non-XML standards can work in an XML world

  • Conformance testing: Development of test scenarios and cases that can determine what it means to conform to specific standards; for example, what does it really mean to "be XML"?

OASIS maintains directories of industry-specific vocabularies.

In keeping with the spirit of the Web and open standards, OASIS has adopted a Technical Committee Process that governs its technical work and provides a vendor-neutral home for standards, giving all interested parties, regardless of their standing in a specific industry, an equal voice in the creation of technical work.

Table 3.1. Some Vertical Industry XML Dialects Registered at OASIS

Accounting            Education             Professional Service
Advertising           Energy/Utilities      Publishing/Print
Aerospace             Environmental         Real Estate
Agriculture           ERP                   Retail
Arts/Entertainment    Financial Services    Robotics/AI
Automotive            Food Services         Science
Banking               Healthcare            Software
Business Services     Human Resources       Supply Chain
Chemistry             Insurance             Transportation
Construction          Legal                 Travel
Customer Relation     Manufacturing         Waste Management
E-Commerce            Marketing/PR          Weather
Economics             Mining
EDI                   Multimedia
