Deriving Meaning from Data

Reporting is an important component in almost every database project. Indeed, the need to create reports that summarize or synthesize data is often the reason many databases exist in the first place. No matter what your database does, it's a fair bet that you have many reporting needs.

Reports come in many shapes and sizes: There are simple list reports, summarized reports, workflow reports, cross-tabulated reports, variance reports, and graphic reports (to name but a few). There are standard reports that need to be generated periodically; there are ad hoc reports for which the report criteria need to be defined on the fly. Some reports need to be printed and distributed, whereas others are meant to be viewed onscreen.

Despite the wide range of things that can be classified as reports, most reports tend to have a few characteristics in common:

To generate meaningful reports, you should learn several standard reporting techniques. From there, it's just a matter of coming up with variations that suit your particular needs. This chapter covers working with lists of data and reporting with grouped data (also known as subsummary reports).

Begin with the End in Mind

In our experience, one of the keys to creating successful reports is beginning with the end in mind. By this, we mean that you should begin thinking about the reports that a system will need to generate right at the beginning of a project. A system's intended outputs can have a profound impact on its design and implementation.

A few simple illustrations will help clarify this point. Say that you've been asked to design a contact management system, and the client mentions that he wants the system to be able to track a history of conversations and interactions with each of his clients. From a data-entry standpoint, you could create this sort of functionality either by having a single long Notes field in the Contact table, or by setting up a related Contact History table and using a portal to capture information about each interaction separately.

All other considerations aside (for example, time/cost to implement, rules of normalization), the reporting needs of the client will likely influence your decision about how to implement this feature. If you don't ask the right questions up front, you may find out in two months that the client expects to be able to generate a report of call activity summarized by account representative, or sorted by date. This report would be relatively simple to generate if you had chosen the route of the related Contact History table; it would be virtually impossible to generate from a single undifferentiated Notes field.

Another typical example of reporting needs driving feature implementation is the decision to use check boxes to capture data. Check boxes are fantastic from a data-entry standpoint, but they may limit your reporting capabilities because they store multiple pieces of data in the same field. For instance, imagine that in the contact management system you're building, the client asks you to put a check box field on the layout so that users can select one or more sources for the contact (such as Referral, Conference Attendee, Website).

If you know that one of the reports the client wants is a Contact Source Summary that lists the various sources and the total number of contacts each has generated, one would hope that you wouldn't implement the feature as a check box field. With potentially multiple sources selected for a given contact, it's not straightforward to split the selections apart. Instead, if you set up a Source table and a ContactSource table (as a join table between Contact and Source), you could display and maintain the contact's sources via a portal from Contact into ContactSource. The desired report could easily be generated from the ContactSource table. The point here is simply that the reporting requirement informs both the table structure and the user interface.

In both of the preceding examples, choosing a more fully normalized data structure happened to provide the more robust reporting capabilities, but there are certainly just as many occasions when you'll find that opting for a denormalized data structure makes for better, faster reporting. For example, imagine you're a teacher creating a system that will track the scores on eight quizzes you plan to give to your class. You could set up a Student table and a related Quiz table (which would eventually contain eight related records for each student record), or you could just create a Student table with eight QuizScore fields. If one of your goals was to get a spreadsheet-like report that listed students down the side and quizzes going across, you'd have a significantly easier time using a flat file than you would if quiz scores were broken out into their own table. Of course, the flat file approach is not without its own problems and limitations. By hard-coding the number of quizzes, you restrict possible future expansion. Similarly, a seemingly simple ad hoc question like "Which students scored a perfect 100 on at least one of the quizzes?" would be difficult to answer with the scores spread across eight fields.

The point of both of these examples is simply to demonstrate what we mean by "begin with the end in mind." Over the course of this chapter, you'll learn more about these types of reports. For now, what's important is that as you define the requirements for any database system, you think carefully about the reporting needs. If you don't, you may end up having to rewrite sections of your system and/or create more complex reporting routines later.

Determine Report Requirements

Just as a system's reporting requirements influence its design, an organization's business needs influence the design of the reports themselves. When thinking about how you'll go about generating any given report, ask yourself (or your client/users) the following types of questions:

After you've collected answers to questions like these, we strongly recommend writing out a sample report (using whatever tools you choosepencil and paper and whiteboards are our favorites) and showing it to its appropriate consumers for feedback.

Generic Versus Specific Report Structures

Another part of report planning is determining whether the report is to meet a specific or a generic need. That is, should users be able to select a data set to feed into a report shell, or should the search criteria for the report be hard-coded?

For example, say you have a List view layout that displays customer data. If you feed it a found set of customers obtained since a certain date, it becomes a New Customers report. If you feed it a set of inactive customers, the same shell is transformed into an Inactive Customers report.

In instances like this, it's often helpful to think of a report as consisting of two distinct components: its format and its content. If you can create a generic multipurpose format, then simply by sending in different content, you create different reports. The point is that in planning reports, you should have the distinction between format and content in mind. You can sometimes save yourself a lot of work if you recognize when a report can be created by simply feeding new data into an existing format.

As a classic example of this separation between format and content, we had a client who wanted a 10 a.m. activity report and a 2 p.m. activity report. The reports showed the same columns or fields of information; they just contained different sets of datadifferent found sets of records. We simply created two scripted report routines that used the same layout to present their information; however, during discussion of these reports, it was clear that to the client these were two very different reports. Anytime we made a change to the 10 a.m. report, he would always remind us to be sure to change the 2 p.m. report also. Happily, as you can imagine, we never once failed to do so....

Working with Lists of Data

Категории