Working with Entities and Attributes

When you set out to design a database system, there are two concepts you simply must be familiar with before you can say you have a solid planning foundation. You need to know the types of things your system will track, and you need to know the characteristics of each of those things. In a recipe list, for example, you track one kind of thing: recipes. A recipe's characteristics are, for example, recipe name, recipe type, calories, ingredients, and cooking time. You could draw it out like this:

Recipe

Here is one thing followed by a collection of its characteristics. A bigger database system may store information about several kinds of things, each with its own set of characteristics. For example, if I want to write a database system for a motorcycle company, I might want to track information about motorcycles, customers, and sales. Now I have three kinds of things, each with its own set of characteristics.

In database design terminology, the things in your database system are called entities. Each entity is a specific, distinct kind of thing, about which you need to track information. This system tracks data about three distinct kinds of things. And each kind of thing has certain characteristics, which in the technical jargon are called attributes. The motorcycle example includes three entities, and each has some specific number of attributes (see Table 5.1).

Table 5.1. Simple Analysis of a Database Structure

Motorcycle

Customer

Sale

Model Number

First Name

Customer Name

Model Year

Last Name

Date

Vehicle ID Number

Birth Date

Amount

Factory Serial Number

Street Address

 

Accessories

City

 

Manufacturer

State

 

Model Name

ZIP

 

The first indispensable step in solid database design is to determine what entities (things) your proposed system needs to track, and what the attributes (characteristics) of each entity are. It's not just the first step, thoughit's also the third, fifth, seventh, and so forth. Your list of things and their characteristics will inevitably change during your analysis, sometimes quite frequently. This is not a bad thing. It's a natural part of database design. You'll inevitably revisit and refine your list of entities and their attributes several times in the course of designing the system.

Roughly speaking, an entity is a class of things that all look more or less alike. In other words, from a database standpoint, you track many instances of an entity, and you track the same kind of information about each one. In a banking system, you'd probably have an entity called Customer because a banking database wants to keep track of many different customers, and wants to record roughly the same kinds of data about each one. (You'll always want to know a customer's birth date, Social Security number, home address, and the like.)

Attributes, on the other hand, refer to the kinds of information you track about each entity. If Customer is an entity in our banking database, birth date, home address, and Social Security number are among the attributes of a customer.

It won't surprise you to learn that entities often correspond to actual database tables, and attributes often correspond to database fields. More likely than not, a banking database will have a Customer table with fields for date of birth, address, and Social Security number.

Note

The entities in these diagrams are purely abstract things. They may or may not translate directly into database tables. Your FileMaker solution may (and almost certainly will) end up with tables that aren't represented on your design diagram.

It's fairly easy to represent entities and attributes in the graphical notation of an ERD. Sometimes it's more convenient to draw an entity without showing any of its attributes, in which case you can draw it in a simple box, as shown in Figure 5.1.

Figure 5.1. A simple preliminary ERD showing entities for customers and accounts, with no attributes shown.

Sometimes it's appropriate to show entities with some or all of their attributes, in which case you can add the attributes as shown in Figure 5.2.

Figure 5.2. An ERD showing entities for customers and accounts, with attributes shown.

 

Entities Versus Attributes: A Case Study

The focus of this chapter is in taking descriptions of real-world problems and turning them into usable ERDs. As was noted earlier, your first step in trying to model a problem into an ERD is sorting out the entities from the attributes. To see how to tackle this, let's begin with an example of a simple process description:

Maurizio's Fish Shack is ready to go digital. Maurizio sells fish out of his storefront but he's not worried about electronically recording his sales to consumers just yet. He just wants to keep track of all the fish he buys wholesale. Every time he buys a load of fish, he wants to know the kind, the quantity, the cost of the purchase, and the vendor he bought it from. This will give him a better handle on how much he's buying and from whom, and may help him negotiate some volume discounts.

Now you know the basics of Maurizio's business. Next you need to develop a list of potential entities. Here are some possibilities:

Fish

Load of fish

Purchase

Storefront

Variety

Vendor

Sale

Quantity

Volume discount

Consumer

Cost

 

Tip

Usually the rule of thumb to apply when coming up with a list of possible entities is to pull out every word that's a noun; in other words, every word that represents a specific thing.

These are typically referred to as candidate entities, in that they all represent possible entities in the system. But are they all entities? You can immediately cross "storefront," "sale," and "consumer" off the list, for the simple reason that the process description already says that these are parts of his business that Maurizio doesn't want to automate at this time. That leaves us with the following potential entities:

Fish

Cost

Load of fish

Purchase

Variety

Vendor

Quantity

Volume discount

Well, "fish" and "load of fish" look like they refer to the same thing. According to the process description, a load of fish is actually a quantity of fish that Maurizio bought to resell. Put in those terms, it's clearly the same thing as a purchase. Now the list looks like this:

Purchase (of fish)

Quantity

Vendor

Variety

Cost

Volume discount

These all seem like reasonable things to track in a database system. But are they all entities? Remember that an entity is a kind of thing. The thing will probably appear many times in a database, and the system will always track a coherent set of information about the thing. Put that way, a purchase of fish sounds like an entity. You'll record information about many fish purchases in Maurizio's database.

What about something like "cost"? The "cost" in the process description refers to the price Maurizio paid for a load of fish, so cost isn't really an entity. It's the price paid for one load of fish. It's actually a piece of information about a fish purchase because each fish purchase has an associated cost. The same is also true for "variety" and "quantity." These are all attributes of the "purchase" entity.

Then you get to "vendor." A vendor is clearly a category of thing; you'll probably want to store information about many vendors in this database, so you can consider a vendor to be an entity. This leaves "volume discount." Well, that one's a bit tricky. It probably applies to a vendor, and might reasonably be called an attribute of a vendor. If you assume that each vendor may offer a discount of some kind, it makes sense for it to be an attribute of a vendor.

Figure 5.3 shows what the fledgling ERD for this system might look like, with the two entities from the process description and their various attributes.

Figure 5.3. An ERD showing entities for fish purchases and vendors, with attributes shown.

A few things are noteworthy about this diagram. Notice that the entities are called Purchase and Vendor, instead of Purchases and Vendors. When naming entities, it's preferable to name them in the singular, rather than the plural. (You're trying to answer the question "each instance of this thing is a....") In FileMaker, we usually extend that convention to the database table that ends up being built for each entity.

Design as an Iterative Process

Your general task when designing a database (or indeed any piece of software) is to take a set of things in a real-world problem domain and translate them into corresponding things in the software domain. In your software, you create a simplified model of reality. Concepts like "fish purchase" and "fish vendor" in the problem domain turn into concepts like "purchase entity" and "vendor entity" in a design, and may ultimately turn into things like "purchase table" and "vendor table" in the finished database.

But this translation (from problem domain to software) is not a one-way street. It's rare that there's a single, unambiguous software model that corresponds perfectly to a real-world problem. Usually, your software constructs are approximations of the real world, and how you arrive at those approximations depends a lot on the goal toward which you're working.

In general, software design follows an iterative path, meaning you perform a similar set of steps over and over again until you end up with something that's "close enough." For example, in your initial reading of the design problem, you might miss an entity or two. Or you might create entities you don't really need on later examination. Later, as you do more work on the project and learn more about the problem domain, you may revise your understanding of the model. Some entities might disappear and become attributes of other entities. Or some attributes might turn out to be entities in their own right. You might find it's possible to combine two similar entities into one. Or you might find out that one entity really needs to be split in two. We're not trying to make you feel uncertain or hesitant about your design decisions. Just recognize that it's not imperative, or necessarily even possible, to get the design exactly right the first time. You'll revisit your design assumptions frequently over the course of the design process, and this is a natural part of the process.

Категории