As we mentioned in Chapter 1, "Introduction to Visual Basic .NET," the first step in creating any program is talking to the users to determine the program requirements, or functional specifications. In the case of a database application, you will also need to determine what data the program needs to maintain to organize it into a database. For example, knowing that you have to produce directories and mailing lists tells you that the database needs to contain addresses and phone numbers. For more Design Tips, p.11 In this section, you examine many of the considerations involved in designing databases. You can apply these concepts to any type of relational database, not just those that you may be designing to use in your Visual Basic .NET applications. Like most tasks, designing a database starts with a plan. After all, you wouldn't try to build a house without a blueprint, and most people wouldn't attempt to prepare a new dish without a recipe. Designing a database starts with identifying the information you need to store in it. You then create a model of the database on paper to organize it using tables, fields, and relationships. Finally, you implement your design by actually creating the database in a database management system. The most important thing to keep in mind when designing your database is that it provides an accurate representation of the data, regardless of how you will access it with a program. A good database design does the following: Eliminates redundant data Provides minimum search times when locating specific records Stores data in the most efficient manner possible to keep the database from growing too large Makes data updates and maintenance as easy as possible Is flexible enough to allow inclusion of new functions and enhancements required of the program In the case of a relational database, you will perform the following key activities when creating the database: Organizing the data into tables Setting index and validation requirements for the data Creating and storing any necessary queries for the application Reviewing the design Organizing Data Into Tables One of the key aspects of database design is determining how the data will be organized in the database. In a relational database, data is stored in tables, so database organization involves determining how many tables your database will have, and what fields will be stored in the tables. To have a good design, you should organize the data in a way that accurately represents the relationships between the different pieces of information. You should try to be as efficient as possible, eliminating wasted space and repeated information. In addition, use keys to enforce data integrity. In this section we will take a brief look at some important issues that will come up when designing tables. Tables As Topics A table is a collection of information related to a particular topic. By thinking of a key topic for the table, you can determine whether a particular piece of data fits into the table. For example, if a country club wants to track information about members and employees, you might be tempted to put both in the same table (because both groups refer to people). However, consider the different data required for each group. The employee group requires information about payroll and tax status, whereas the membership group may require a membership expiration date. If these groups of information were combined, each record would contain a lot of wasted space, as shown in Figure 20.2. Figure 20.2. If you have a lot of wasted space in your table, you may need to consider breaking it up into multiple tables. Looking at the data more closely, you can see that the two groups of people (employees and members) have some fields in common because they are both people. Figure 20.3 shows an improved design, in which separate tables are used to store member-specific and employee-specific information. Figure 20.3. A more efficient design eliminates wasted space. With the design in Figure 20.3, the shared information has been moved into a new Person table, which can be joined to the Member or Employee table using the SSN field. This new design provides several advantages: Wasted space is eliminated. Because database records are physically smaller, queries will execute more rapidly. A person can be both an employee and a member. A mailing list can be created for all persons. By thinking of a topic for each table, you can determine more easily whether a particular piece of information belongs in the table. If the information results in wasted space for many records, the data belongs in a different table. Data Normalization Data normalization is the process of eliminating redundant data within a database. Taking data normalization to its fullest extent results in each piece of information in a database appearing only once, although that's not always practical. Consider the example of order processing. For your program to process an order it will need to know the item being ordered, price, order number, and order date, as well as the customer's name, address, and phone number. If you place all this information in one table, the result looks like the table shown in Figure 20.4. Figure 20.4. Non-normalized data produces a large data table with repeated information, which is inefficient. As you can see, much of the data in the table is repeated. Although the item information is unique, each record stores the customer's name and phone number. This repetition introduces two problems. The first problem is the same one we discussed in the last section: wasted space. The second problem is one of data accuracy or currency.If, for example, a customer changes his or her phone number, you have to change it for all the records that apply to that customer with the possibility that you will miss one of the entries. In the table in Figure 20.4, notice that Martha Smith's address and phone number were changed in the latest entry but not in the earlier entries. If an employee looks up Martha Smith and uses an earlier entry, that employee would not find Martha's updated contact information. A better solution for handling the data is to put the customer information in one table and the sales order information in another table. You can assign each customer a unique ID and include that ID in the sales order table to identify the customer. This arrangement yields two tables with the data structure shown in Figure 20.5. Figure 20.5. Normalized customer and order tables eliminate data redundancy. With this type of arrangement, the customer information appears in only one place. Now, if a customer changes his or her phone number, you have to change only one record. Note It is important to realize that even with normalized tables, your program can still retrieve the customer and order information in a single record (similar to Figure 20.4) by using a query with appropriate joins. However, the advantage of the normalized design is that the information is not actually stored this way in the database. Carrying normalization a step further, you can redesign the items sold and order information. This leads to the development of four tables, but the organization of the tables is much more efficient. You can be sure that when information must be changed, it will change in only one place. This arrangement is shown in Figure 20.6. With the four-table arrangement, the Orders table and the Items Ordered table provide the links between the customers and the retail items they purchased. The Items Ordered table contains one record for each item of a given order. The Orders table relates the items to the date of purchase and the customer making the purchase. Figure 20.6. Complete normalization of the tables provides the greatest efficiency. Figure 20.6 shows a diagram of the database's relationships. The fields in each table are listed, and the lines between the tables represent relationships. Creating Primary Keys When you work with a table, you usually need some way to uniquely identify a record in the table. Primary keys provide a way for you to tell a database management system what field or fields in a record must be unique within the table. For example, look at the Person table in Figure 20.3. Because the SSN field is defined as the primary key for the Person table, no two records in the Person table can contain the same value in the SSN field. For example, if a program or user attempted to insert a record with an SSN value of 111-22-3333, the database would generate an error message and reject the new record because a record with this SSN already exists. When you design a new table, it is important to set the primary key. Look at Figure 20.3 again. Notice the Employee table. You might also think that SSN would be a good key for this table, and it might. However, what happens if your database needs to accommodate an employee who works in two different departments? If the primary key was just SSN, the database design could not handle this. By making the SSN and Dept fields both part of the primary key, you can accommodate the requirement of an employee working in two different departments. Figure 20.7 shows the SQL Server table design screen for the Employee table. A key icon to the left of the field indicates the primary key. Figure 20.7. In SQL Server and Access, you can create a primary key by highlighting the desired fields and clicking the Key button on the toolbar. Using Lookup Tables A lookup table is another way to store information to prevent data redundancy and to increase the accuracy of data entry functions. Typically, a lookup table is used to store valid data entries (for example, a state abbreviations table). When a person enters the state code in an application, the program looks in the abbreviations table to make sure that the code exists. You also can use a lookup table in data normalization. If you have a large mailing list, many of the entries use the same city and state information. In this case, you can use a ZIP Code table as a related table to store the city and state by ZIP Code (remember that each ZIP Code corresponds to a single city and state combination). Using the ZIP Code table requires that the mailing list use only the ZIP Code of the address, and not the city and state. During data entry, you can have the program check an entered ZIP Code against the valid entries. Rules for Organizing Tables Although no absolute rules exist for defining what data goes into which tables, here are some general guidelines to follow for efficient database design: Determine a topic for each table, and make sure that all data in the table relates to the topic. If several of the records in a table have fields intentionally left blank, split the table into two similar tables. (Remember the example of the employee and member tables.) If information is repeated in a number of records, move that information to another table and set up a relationship between the tables. Repeated fields indicate your design might not be flexible enough. For example, if you have Item1, Item2, Item3, and so on in a table, you might want to move the items to another table that relates back to the parent table. This would make adding a fourth item easier. Use lookup tables to reduce data volume and to increase the accuracy of data entry. Do not store information in a table if it can be calculated from data in other tables. As stated previously, these guidelines for defining tables are not hard-and-fast rules. Sometimes, deviating from the guidelines makes sense. One of the most frequent reasons for deviating from the guidelines just given is to improve performance. Believe it or not, there is a process called denormalization in which tables are combined for speed purposes! If obtaining a total sales figure for a given salesperson requires summing several thousand records, for example, you might find it worthwhile to include a Total Sales field in the salesperson table that is updated each time a sale is made. This way, when reports are generated, the application doesn't have to do large numbers of calculations, and the report process is dramatically faster. However, your program must ensure that the Total Sales field is consistently and accurately updated. Deviating from the guidelines results in two major consequences. The first is increasing the size of the database because of redundant data. The second is the possibility of having incorrect data in some of the records because a piece of data was changed and not all the affected records were updated. There are trade-offs between application performance and data storage efficiency. For each design, you must look at the trade-offs and decide on the optimum design. Using Indexes When you create a table, you can tell the database management system to maintain one or more indexes on the table. Indexes are used to improve query performance. For example, suppose you want to query the Person table pictured in Figure 20.3 for all persons having last names between A and F. In effect, you are asking the database to return records where the LastName field is within a certain range. Without an index,however, the only way the database can determine the records within this range is to perform a table scan. A table scan means the database management system will look at every record in the table to determine whether it should be included in the results. The reason each record needs to be examined is that records in a table are usually stored in the order in which they are added. This order is the physical order or natural order of the data. An index is a special structure that contains a key value (derived from the values of one or more table fields) for each record in the table. The index itself is stored in a specific logical order, such as alphabetically. For example, an index on the LastName field would contain all the last names in the table and a pointer to the actual records with that last name, as pictured in Figure 20.8. Figure 20.8. The database can search the ordered index and then retrieve the associated records quickly. Database indexes work similarly to the index in the back of this book. By using the book's index, you easily can look up key words or topics, because the index is alphabetical and contains pointers (page numbers) to tell you where to find the information. To get an idea of the value of such an index, imagine a phone book that lists the customer names in the order in which they signed up for phone service. If you live in a large city, finding a person's number could take forever, because you have to look at each line until you find the one you want. Different Types of Indexes When you create an index, you specify the type of index and the fields to be indexed, and the database management system does the rest. We have already discussed one type of index, the primary key. Primary keys are known as unique indexes because the keys in the index must be not be duplicated. The index pictured in Figure 20.8 is not unique, because the index may contain multiple keys (last names) with the same value. Some database systems also provide clustered and non-clustered indexes. Clustered means the data in the table is physically stored in the order of the index. As you might expect, each table can only have one clustered index, while any other indexes are non-clustered. An index can also be classified as a single-key index or a multiple-key index. A single-key index is based on the value of a single field in a table. Examples of this type of index are Social Security number, ZIP Code, employee ID, and last name. Although single-key expressions are valuable in presenting data in a specific order, imposing an even more detailed order on the table is often necessary. You can do so by using multiple-key indexes. As you can infer from the name, a multiple-key index is based on the values of two or more fields in a table. As an example, suppose your program is trying to look up the name John Smith in a table with millions of records. If you created an index only on the LastName field, the database could quickly find the Smiths, but it would it still have to perform a table scan on all the Smith records (which could be a lot) to find John Smith. In this case, a multiple-key index on LastName and FirstName would be useful. Determining When to Use Indexes When you design a database, you create indexes based on the types of queries that will be asked of the database. Often, additional indexes are added to the database during testing of the application. If you know in advance that you are going to be running a lot of queries on a certain field or fields, you may want to consider adding an index. However, it is not a good idea to create an index on every field, because the database management system has to maintain the index every time a recorded is added, deleted, or updated. A lot of indexes may help if your table is used for mostly reading and very little updating. However, a bunch of unnecessary indexes on a table that is frequently updated or joined only on one or two key fields may adversely affect query performance. Once again, you must consider the trade-offs in the database design. |