Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design

   

Refining the Fields

Now that you've assigned fields to each table, you'll refine the fields by improving the field names and resolving any structural problems that may exist. Then you'll refine the tables further by establishing that you've assigned the appropriate fields to each table and that the table structures are sound.

Improving the Field Names

As you know, a field represents a characteristic of the subject of the table to which it belongs. You can easily identify the characteristic a field is supposed to represent when that field has an appropriate name. A field name that is ambiguous, vague, or unclear is a sure sign of trouble and suggests that you have not thoroughly identified the purpose of the field.

Earlier in this chapter, you learned a set of guidelines for naming a table. Now you'll learn another set of guidelines that you'll apply to field names. Fortunately, many of them are similar to the guidelines governing table names, so you're already familiar with most of the concepts.

Guidelines for Creating Field Names

With these guidelines in mind, review each table and determine whether you can make improvements to any of the field names. When you're finished, you're ready to identify and resolve any problems with the fields. Figure 7.12 shows revisions to the field names of the table structures in Figure 7.11.

Figure 7.12. Revised field names.

In Figure 7.12, "Classes" is shortened to "Cls," "Subjects" is shortened to "Subj," "Instructors" is shortened to "Inst," "Student" is shortened to "Std," and "Social Security Number" replaces "SSN." Remember that abbreviations can be very useful so long as they are meaningful and understood by everyone in the organization. Using proper and appropriate abbreviations will not detract from the meaning of the field name.

Note

Throughout the remainder of the chapter and the rest of the book, table names within the text appear in all capital letters (such as VENDORS) and field names within the text appear in small capital letters (such as V ENDOR ID N UMBER ).

Using an Ideal Field to Resolve Anomalies

Although you've carefully identified the fields on your preliminary field list, you may have created a few fields that could prove problematic to the table structure. Poorly defined fields can cause duplicate data and redundant data, and they can be difficult to use. You might find it difficult to determine whether any of the fields in a table is going to cause problems unless you know the warning signs. The best way to identify potentially troublesome fields is to determine whether they comply with the Elements of the Ideal Field. These elements constitute a set of guidelines you can use to create sound field structures and to spot poorly designed fields easily.

Elements of the Ideal Field

Although you now know the specific elements of an ideal field, you'll still find it difficult in many instances to identify problematic fields just by looking at their names. Figure 7.13 shows a table structure that helps to illustrate this point. Take a moment and try to determine whether each field complies with the Elements of the Ideal Field or needs to be modified.

Figure 7.13. A table containing fields with questionable structures.

Each field on the list seems to conform to the Elements of the Ideal Field. Examine the list carefully, however, and you'll see that some fields don't really comply with the second and third elements. Three fields have anomalies that will cause problems unless you resolve them: I NST N AME , I NST A DDRESS , and C ATEGORIES T AUGHT . If you doubt this assertion, you can test it by "loading" the table with sample data. This will quickly reveal anomalies, if any exist, and is the best way to confirm whether a field complies with all of the Elements of the Ideal Field.

You don't have to create a table physically to perform this test. Take a sheet of legal paper and lay it in front of you lengthwise from left to right. Write the name of each field across the top of the paper, starting from the left-hand side; leave enough space between the field names to allow room for the values you're going to place underneath them. Then enter records into the table by filling in each field with some sample data; be sure the sample data represents the data you're actually going to enter into the database. You need only a few records for the test to work properly. Your sheet of paper should look similar to the one in Figure 7.14.

Figure 7.14. Testing a table with sample data.

Note

As I mentioned in Chapter 3, I show only those fields that are most relevant to the discussion at hand and use <<other fields>> to represent fields that are inessential to the example.

Now you can easily identify which fields are going to be troublesome unless they are resolved. As you can see, I NST N AME and I NST A DDRESS are both multipart fields, and C ATEGORIES T AUGHT is a multivalued field. You must resolve these fields before you can refine the table structure.

Resolving Multipart Fields

Working with a multipart field is difficult because its value contains two or more distinct items. It's hard to retrieve information from a multipart field, and it's hard to sort or group the records in the table by the field's value. The I NST A DDRESS field in Figure 7.14 illustrates these difficulties; you'd certainly have a problem retrieving information for the city of Seattle or sorting information by zip code.

You resolve a multipart field by identifying the distinct items within the field's value and treating each item as an individual field. Accomplish this task by asking yourself a simple question: "What specific items does this field's value represent?" Once you've answered the question and identified the items (as best you can), transform each item into a new field.

In Figure 7.14, the value of the field I NST N AME represents two items: the first name and the last name of an instructor. You resolve this field by creating a new I NST F IRST N AME field and a new I NST L AST N AME field. The value of I NST A DDRESS represents four items: the street address, city, state, and zip code of an instructor. You transform these items into fields as well; they will appear in the table as I NST S TREET A DDRESS , I NST C ITY , I NST S TATE , and I NST Z IPCODE . Figure 7.15 shows the newly revised INSTRUCTORS table.

Figure 7.15. Resolving the multipart fields in the INSTRUCTORS table.

Some multipart fields are hard to recognize. Take a look at the INSTRUMENTS table in Figure 7.16. At first glance, the table doesn't seem to contain multipart fields. When you examine the data in the table more closely, however, you'll see that I NSTRUMENT ID is actually a multipart field. This field's value represents two distinct items: the category to which the instrument belongsAMP ( amplifier ), GUIT (guitar), MFX (multieffects unit), SFX (single-effect unit)and the instrument's identification number. Clearly, you should deconstruct I NSTRUMENT ID into two smaller fields in accordance with the third element of an ideal field. Imagine how difficult it would be for you to update the field's value if the MFX category changed to MFU if you don't do this. You would have to write programming code to parse the value, test for the existence of MFX, and then replace it with MFU if it existed within the parsed value. It's not so much that you can't do this, but you would definitely be working harder than necessary, and you shouldn't have to go through this at all if you have a properly designed database.

Figure 7.16. An example of a "hidden" multipart field.

Resolving Multivalued Fields

As you know, a multivalued field can potentially store two or more occurrences of the same value. Fortunately, you'll recognize a multivalued field when you see one. The field's name is often plural and its value almost invariably contains a number of commas, which serve to separate the various occurrences that exist within the value itself.

Resolving multipart fields is not very hard at all, but resolving multivalued fields can be a little more difficult and will take some work. A multivalued field has the same fundamental set of problems as a multipart field, as the C ATEGORIES T AUGHT field in Figure 7.17 clearly illustrates. For example, you'll have difficultly retrieving information for everyone who teaches a specific category (such as WP), you can't sort the data in any meaningful fashion, and, most important, you don't have room to enter more than four categories. What happens when one or more instructors teach five categories? The only option you'll have is to make the field larger every time you need to enter more values than it will currently allow.

Figure 7.17. Identifying a multivalued field.

So how would you resolve this multivalued field? Your first thought may be to create a new field for each value, thus "flattening" the multivalued field into several single-valued fields. Figure 7.18 shows what will happen if you follow through with this idea.

Figure 7.18. The result of "flattening" the C ATEGORIES T AUGHT field.

Unfortunately, this is not much of an improvement at all. There are three specific problems that arise from this type of structure:

  1. Retrieving category information will be tedious at best. A user attempting to find all instructors who teach the WP category must be sure to search for this value within each of the category fieldsthere is no guarantee that WP is consistently stored in the same field. Failure to do so means that the user runs the risk of overlooking a qualified instructor.

  2. There is no way for the RDBMS program to sort the category data in a meaningful fashion .

  3. This structure is inherently volatile. In its current state, the table unnecessarily restricts the number of categories an instructor can teach; you must create additional category fields when you have instructors who teach more than three categories. Adding more category fields just compounds the first two problems.

Realizing that flattening the C ATEGORIES T AUGHT field won't solve your problem, your next thought is to bring the field into compliance with the second element of an ideal field and declare that it will contain only a single value. Although this is a good impulse and a step in the right direction, it will not resolve the matter completely because it will introduce yet another problem: data redundancy. Figure 7.19 illustrates what happens when you follow through with this particular idea. Note that there is now a single value in the C ATEGORIES T AUGHT field for each record in the table.

Figure 7.19. The result of bringing C ATEGORIES T AUGHT into compliance with the second element of an ideal field.

The values in C ATEGORIES T AUGHT cause redundant data because you must duplicate a given instructor record for each category that the instructor teaches. This redundancy is obviously unacceptable, so you'll have to resolve this problem in some manner.

You can avoid this situation entirely by using these steps to resolve a multivalued field:

  1. Remove the field from the table and use it as the basis for a new table. If necessary, rename the field in accordance with the field name guidelines that you learned earlier in this chapter.

  2. Use a field (or set of fields) from the original table to relate the original table to the new table; try to select fields that represent the subject of the table as closely as possible. The field(s) you choose will appear in both tables. (You'll learn more about relating tables in Chapter 10.)

  3. Assign an appropriate name, type, and description to the new table and add it to the final table list.

These steps form a generic procedure that you can use to resolve any multivalued field you encounter in a table. Now, apply these steps to the C ATEGORIES T AUGHT field.

  1. Remove the field from the INSTRUCTORS table and use it as the basis of a new table. Because this will now be a single -valued field, rename the field C ATEGORY T AUGHT .

  2. Use I NST F IRST N AME and I NST L AST N AME as the connecting fields that will relate the INSTRUCTORS table to the new table, and add them to the structure of the new table.

  3. Give the new table a proper name, compose a suitable description, and add the table to the final table list. ( Indicate the table's type as "Data.") Here's one possible name and description you might use for the new table.

    Instructor Categories the categories of software programs that an instructor is qualified to teach. The information this table provides allows us to make certain that there is an adequate number of instructors for each software category.

Figure 7.20 shows the revised INSTRUCTORS table and the new INSTRUCTOR CATEGORIES table.

Figure 7.20. Resolving the multivalued field in the INSTRUCTORS table.

Note that the new INSTRUCTOR CATEGORIES table is free from the problems typically associated with multivalued fields because C ATEGORY T AUGHT is a single-value field. You can easily retrieve information for a particular instructor or category, and you can sort the records in a meaningful manner. Also note that the I NST F IRST N AME and I NST L AST N AME fields retain their names in the new table, making them compliant with the fifth element of an ideal field.

Although the new table contains redundant data, the redundancy is acceptable because it is minimal . It's a fact of life that a relational data base will always contain some amount of redundant data. Your goal as the database architect is to make certain that it has only an absolute minimum amount of redundant data.

Figure 7.21 shows a version of the INSTRUCTORS table that contains three multivalued fields:

C ATEGORIES T AUGHT This indicates the categories of classes that an instructor can teach.

M AXIMUM L EVEL T AUGHT This indicates the maximum skill level that the instructor can teach for a given category.

L ANGUAGES S POKEN This indicates the foreign languages that an instructor can speak.

Figure 7.21. A version of the INSTRUCTORS table containing three multivalued fields.

Your task here seems relatively clearyou're going to use the procedure you've just learned to resolve these multivalued fields. You then notice one small, relatively obscure problem: There is a distinct one-to-one association between values in C ATEGORIES T AUGHT and the values in M AXIMUM L EVEL T AUGHT for any given record. You probably wouldn't have noticed this anomaly had you not carefully examined the sample data within these fields. Don't worry; you'll still use the same procedure, but with one minor modification.

You'll occasionally encounter a situation such as this, where some given field (whether single- or multivalued) depends on a particular multivalued field. You can easily fix this problem by including the dependent field in the structure of the new table you build to resolve the multivalued field. Figure 7.22 shows the results of consolidating this technique with the previous one to resolve C ATEGORIES T AUGHT . (It shows the resolution of L ANGUAGES S POKEN as well.)

Figure 7.22. Resolving the multipart fields in the INSTRUCTORS table.

The redundancy in the new tables is acceptable because, once again, it is minimal. In Chapter 10, you'll learn how to reduce this type of redundancy even further by relating the tables with primary keys and foreign keys.


   
Top

Категории