Mapping ORM Models to Logical Database Models
Forward Engineering ORM Source Models
Forward engineering is the process of transforming higher level models to lower level models. Figure 7-1 displays an overview of this process, starting at the conceptual level (ORM), mapping to the logical level (ER or relational), and then generating the physical database model. The first stage, mapping ORM models to logical database models, is called the build process and is the main focus of this chapter.
Figure 7-1: Forward engineering models: conceptual ’ logical ’ physical.
The second stage, transforming a logical database model to a physical database schema, is called the generate process. It is usually best to first generate the data definition language (DDL) script containing the SQL statements for creating the relational schema, so you can check it over. Once you are happy with the DDL script, you can execute it in your chosen DBMS to create the actual database schema. With some DBMSs, you may also generate the database schema directly if connected to the DBMS.
If you later make changes to the logical database model, you can propagate those changes to the existing database by using the update process. The generate and update processes are covered in detail in later chapters and are not discussed further here.
Chapter 3 included a simple example to illustrate the basic process of building a single logical model from a single ORM model. As illustrated in Figure 7-1, it is also possible to build a logical model from two or more ORM source models. In this case, the tool will automatically merge the source models into a single, global model before building the logical model. If it finds any name clashes , it will prompt you to resolve these before completing the build. As Figure 7-1 also shows, the same source model may be used for building more than one logical model. As discussed in section 6.6, this provides one way to reuse existing source models.
When a logical database model is built from one or more ORM source models, the tool creates a project to house the relevant documents. In addition to the logical database model, the project may contain non-source documents, such as a text file you might add to the project for further documentation. In this case, when you build the project a harmless warning is issued that that file is not recognized as a source model. When you open a project file, each of its component files (source or non-source) is displayed, and you can open each file by double-clicking its icon.
Building a Logical Database Model from ORM
To transform a set of one or more ORM models to a logical database model, first add the ORM model(s) to a database model project, and then build the project. The following steps summarize the procedure.
- For each saved ORM model that you wish to include as a source, perform an error check by choosing Database > Model Error Check , and make any corrections needed.
- From the File menu, open the logical modeling solution by choosing File > New > Database > Database Model Diagram .
- To create a database model project, choose Database > Project > Add existing document from the Database menu.
- The Add Document to Project dialog box appears. For each ORM model you wish to add, use the Look in : field to navigate to the model, then press the Open button. The ORM model(s) should now be listed in the project window. To add nonmodel files to the project, change the dialog setting for the Files of Type : field to All Files (*.*), before pressing the Open button.
- Save the project file by pressing the Save icon on the main menu and giving it a filename. The project file automatically has the extension ".vsd." The name and page of your current model is always listed in the title bar at the top of the screen.
- Now build the logical model by choosing Database > Project > Build from the Database menu.
The relational database schema is now automatically built. The Output window records the progress of the build process. If the build fails, error messages are generated. If the build succeeds, the resulting table schemes appear in the Tables and Views window. By default, if the ORM models are correct, the build process generates a relational schema in fifth normal form.
Although the table structures have been created, they do not yet appear on a diagram. To see the logical database model in graphical form, drag the table schemes you wish to see onto one or more drawing pages of the project window. To select one table scheme, simply click it before dragging. To select a group of adjacent table schemes, hold the Shift key down as you click the first and last member. To select a group of nonadjacent table schemes, hold the Ctrl key down as you click each member. To add a new drawing page, select Insert > New Page from the main menu.
For example, suppose the conceptual schema in Figure 7-2, which is a variation of our Patient model, is saved as the file P_CS. Now suppose that this is used as the only source model to build a logical database schema named P_LS.
Figure 7-2: A sample ORM model.
When the project is saved, an icon for the ORM source model appears in the Project window , as shown in Figure 7-3. When the project is built, the fact types are grouped into three table schemes, as shown in the Tables and Views window of Figure 7-3. This window shows only the names of the base tables and views. In this example, there are just three base tables and no views. A more detailed depiction of the logical model results from dragging the table schemes onto the drawing window, as shown in Figure 7-3. This displays the names of each table and column, as well as the foreign key connections between them.
Figure 7-3: Default mapping of the sample ORM source model.
Each table has its name in the shaded header, with its columns listed below. Primary keys are underlined , marked "PK" and appear in the top compartment for the columns. Mandatory (not null) columns are displayed in bold. Foreign key columns are marked FK n where n is the number of the foreign key within the table. Each foreign key connection is depicted as an arrow from the foreign key table to the target table. Uniqueness constraints on columns other than primary keys are marked U n where n is the number of the uniqueness constraint within the table.
ORM constraints that map to check clauses, triggers, or stored procedures are not displayed on the logical diagram, but may be viewed separately in code windows , as discussed later in the chapter. In this example, the constraints missing from the logical diagram are the value constraint for SexCode and the subset constraint involving given names. If desired, the column data types may be displayed on the diagram by choosing the menu option Database > Options > Document , opening the Table Pane , and then picking the Data Types radio button Show physical . It's best to set the data types on the ORM model, where object types correspond to conceptual domains. The correct data types then automatically propagate to all the attributes based on these domains.
It is good practice to keep the ORM source model open when you build a project. You can then toggle between the conceptual and logical models by choosing the relevant model from the Window option of the main menu. If the ORM model is closed, you can open it by clicking its icon in the Project widow.
If the tool's original default settings are used, the table and column names shown in Figure 7-3 are generated automatically. The order in which the columns are listed may vary from that shown, depending on the order in which the fact types were entered into the original ORM source model. In the example, some of the names generated, as well as the order of some of the columns, are less than ideal. We now discuss how to overcome these deficiencies.
The next section reviews how to manually refine a logical model by reordering columns and renaming relational model elements. We then discuss how to migrate such changes back to the ORM source model, and why it is generally best to avoid renaming at the logical level. The following section discusses how to avoid the need to rename relational model elements, by specifying options on the ORM source model to automatically generate the desired logical names when mapping. Later sections review options for mapping subtypes , and discuss how to view constraint code.
Refining the Logical Model
Regardless of whether a logical database model has been mapped from ORM or entered directly, the tool allows you to freely edit it. Any relational model element, such as a table, column, or constraint, may be modified, deleted, or added, and you can reorder columns within tables as desired. In practice however, direct refinements to a logical model that was mapped from ORM should normally be restricted to reordering columns and renaming tables . As discussed later, other changes should be made by configuring the ORM model(s) to automatically map to a logical model with the desired refinements.
Any given table may be refined by selecting it in the drawing window to bring up its Database Properties dialog, and then making the desired changes. To move a column to a new position, choose the Columns category, select the column to be moved, and then press the Move Up or Move Down button to move it up or down respectively. Figure 7-4 shows the result of using the Move Up button to move the Second GivenName column up one place from its former position shown in Figure 7-3. To rename a column, you may simply edit the name displayed in the Physical Name field of the dialog, but as explained later it's better to control column names from the ORM level.
Figure 7-4: Using the Database Properties dialog to move a column.
In addition to columns, you may rename other relational model elements such as tables, primary key constraints, indexes and foreign key constraints. To rename a table, choose the Definition category in its Database Properties dialog and edit the Physical Name field. For example, the "Patient allergic Drug" table has an awkward name, and would be better named "Allergy," Figure 7-5 illustrates how to perform this renaming. The Allergy table is mapped from the ORM fact type Patient is allergic to Drug .
Figure 7-5: Using the Database Properties dialog to rename a table.
In cases like this, where the table corresponds to an ORM fact type with a composite key, the ORM source model solution provides no flexible way to control the table name generation, so the table renaming is performed at the logical level. The Country and Patient tables however house fact types with simple keys, based on the object types Country and Patient respectively. As discussed later, you can control these table names at the ORM level by setting options on the Country and Patient object types.
Names changes for other kinds of model element may also be controlled at the ORM level, so we discuss them only briefly here. To rename a primary key constraint (e.g., Patient_PK), select the Primary ID category and edit the Physical Name field. To rename indexes, choose the Indexes category, and edit the index name. To rename a foreign key constraint (e.g., Patient_Allergy_FK1), select the foreign key arrow on the diagram, choose the Name category, and edit the Physical Name field of its Database Properties dialog. As discussed in a later chapter, this dialog may also be used to change the referential actions (e.g., on update of Parent, no action) for foreign key constraints.
Migrating Changes Back to ORM Source Models
On saving changes to a logical model built from ORM, you are prompted whether to migrate these changes to the ORM source model(s) used to generate the logical model. The migration prompt is displayed in Figure 7-6.
Figure 7-6: Choosing whether to migrate logical changes to the source model(s).
If the only changes you made to the logical model are non-structural (renaming columns , or reordering columns other than primary key columns) and you do not intend to reuse the ORM source model(s) to build a separate logical model, you may answer No to migrate, since the non-structural modifications will be stored with the current project. So if you rebuild the project at a later time, these non-structural changes will be applied during mapping. Since migration can take a long time with large models, migrating changes only when needed can save you some time.
If the logical model was built from a single ORM model, and the only changes you made to the logical model are non-structural, and you intend to reuse the same ORM model to build a different logical model, answer Yes to migrate. Otherwise these changes will be lost the next time you build or rebuild a separate logical model from the ORM source model.
If the logical model was built from a single ORM model, and you made some structural changes to the logical model (e.g., adding or deleting tables, columns or constraints, or changing the primary key), answer Yes to migrate. Otherwise these changes will be lost the next time you build or rebuild this or any other logical model from the ORM source model.
If the logical model was built from more than one source model, migration has no effect, so answer No to the prompt. So with the current version of the tool, migration is only useful if the logical model is built from a single ORM source model. In this case, you may build and migrate as often as you like. This looping possibility is depicted in Figure 7-7. When you migrate, all structural and non-structural changes are communicated to the ORM model.
Figure 7-7: Migration of logical model changes to an ORM source model.
A structural change to the logical model migrates to a structural change in the ORM model. For example, if you add a FaxNr column to the Patient table, and then migrate this change, the fact type Patient has FaxNr will be added to the ORM model. Additional fact types arising from migration appear in the Fact Types pane of the Business Rules window. To add a migrated fact type to the ORM diagram, drag it onto the drawing window.
Although you can make all kinds of logical model changes and migrate them to a single ORM source model, we strongly encourage you to avoid making any structural changes at the logical level . As far as possible, you should also avoid making any nonstructural changes at the logical level, with the exception of column reordering and, table renaming as discussed earlier. Instead, configure the ORM model to automatically generate the required names in the logical model.
There are good reasons for controlling all naming decisions from the conceptual model. The conceptual model is easier to validate with the domain expert, so changes made at this level are typically best understood . Annotating the conceptual model with required naming options for target implementations facilitates the desirable approach of driving the execution directly from the conceptual model.
Another pragmatic reason is that the Visio tool currently exhibits the following undesirable behavior: if name changes made at the logical level are migrated to ORM, any changes made later to names of the corresponding ORM model elements are ignored in subsequent builds of the logical model . Hence renaming at the logical level can make it much harder to synchronize conceptual and logical models. To illustrate this problem, consider the following scenario. Suppose we create the ORM model in Figure 7-8(a), map it to the logical model in Figure 7-8(b), and then rename the columns as shown in Figure 7-8(c).
Figure 7-8: Mapping an ORM schema to a logical schema, then renaming.
Suppose we now save the changes to the logical model, and answer Yes when prompted to migrate those changes to the ORM source model. This is fine if we never want to make changes to the relevant aspects of the ORM model. But suppose we later decide that social security numbers are not a good way to identify all persons in our universe of discourse , so we change the reference scheme to Person(Id). Also we decide that Sex is better renamed as Gender. The ORM model now appears as shown in Figure 7-9(a). On rebuilding the logical model, we would expect those changes to be reflected in the column names generated. Instead these changes are ignored, and the logical model still appears as shown in Figure 7-9(b).
Figure 7-9: Later ORM name changes are overridden by the migrated renaming.
To avoid this undesirable behavior, either press the No button when prompted to migrate logical name changes of this kind, or better still, configure the mapping options on the ORM model to automatically obtain the desired logical names so you can avoid this kind of renaming at the logical level. The next section explores the second alternative. If you ever find yourself in a situation like that of Figure 7-9, you can fix the problem by renaming the logical columns and then migrating these changes back to the ORM model. For example, rename "SSN" to "Person ID" and "Sex" to "Gender" and then migrate. However, it's better to avoid getting such a situation in the first place.
Controlling Logical Name Generation
Visio for Enterprise Architects includes several options for controlling how the names of logical model elements are generated from an ORM source model. In the pure ORM model itself, the names of model elements have a large bearing on the logical names generated. Most important in this regard are the names of object types, predicates, reference modes and roles. In addition, various mapping settings may be stored with the ORM model to control name generation.
Settings made at the document level apply to the whole model unless overridden by local settings on individual object types and predicates. Most of the document level options are specified in the ORM Document Options dialog, while preferences for keys, indexes, and foreign key constraint names may be set in the Database Modeling Preferences dialog. We discuss these two dialogs first before moving onto the use of local settings and roles names to control name generation.
Document Level Options for Name Generation
To set mapping options that apply by default to the whole ORM model, open the ORM Document Options dialog by choosing Database > Options > Document from the main menu of the ORM model. This dialog has six panes: General, Abbreviation, Prefix, Suffix, Capitalization, and Miscellaneous. The General pane has a check box for showing physical constraints: you should normally leave this checked (the default). The Prefix and Suffix panes allow you to automatically insert prefixes or append suffixes to column and/or table names in the logical model. You will normally want to accept the default settings (No prefix, No suffix), as shown in Figure 7-10.
Figure 7-10: Setting prefix and suffix options at the ORM document level.
Some companies enforce naming standards that require tables or columns to be prefixed or suffixed. For example, to prefix each table name by "DEMO_", choose the Custom Prefixes option for Table Prefix and enter "DEMO_" in the field. Custom column prefixes and suffixes are sometimes used help delineate context or sub-areas, but in the version used at the time of writing, there appear to be bugs with the generation of custom column prefixes/suffixes.
The Capitalization pane allows you to control upper and lower case settings for column and table names. Different companies adopt different standards in this regard. The settings shown in Figure 7-11 are recommended if you prefer camel casing for column names (e.g., countryCode) and Pascal casing for table names (e.g., LineItem). Here the case used in the conceptual names is preserved, except for the first letter, which is forced lower case for columns, and forced upper case for tables. To generate table names in all upper case (e.g., LINEITEM) choose the "Force upper" setting for Table Names.
Figure 7-11: Starting column names in lowercase and table names in uppercase.
The Defaults button has a drop-down list with three options: choose Set As to make the new settings the default; choose Restore to return to the settings in place when the dialog was opened; and choose Restore Original to return to the original default setting when the tool was first installed. Press OK to apply the new settings.
The Miscellaneous pane is even more useful. Although conceptual names may contain spaces, embedded spaces are often undesirable or even illegal at the logical level, unless delimited identifiers are used. In Figure 7-12, setting the Spacing Character to None removes all spaces from generated names for relational elements. For example, "First GivenName" becomes "FirstGivenName". To replace a space by an underscore (e.g., First_GivenName), choose the Underscore setting. The Other setting allows you to specify a different replacement character.
Figure 7-12: Setting miscellaneous naming options.
The default setting for Reference Mode is Add to object type name . For example, the reference scheme Country(Code) generates the name "CountryCode." This is usually best as a document wide setting. You can override this for individual object types, as discussed earlier. The Do not use for naming setting ignores the reference mode, instead using just the object type name, e.g., "Country." The Use as column name setting ignores the object type name and uses just the reference mode name, e.g., "Code."
The maximum name length setting specifies the maximum number of characters allowed in generated names. This defaults to 128, which is the maximum length for an identifier in the SQL-92 and SQL:1999 standards. Although a number of DBMSs, including Microsoft SQL Server, allow 128 character names, some DBMSs do not (e.g., Oracle identifiers are restricted to 30 characters). It is not uncommon to generate names that are longer than 30 characters , especially for foreign key constraints or tables that correspond to an n-ary predicate.
One way to avoid names that are too long for the target DBMS is to enter a smaller value (e.g., 30) for the maximum name length in this dialog. The tool then does its best to produce a shorter name that is still meaningful, mainly by eliminating vowels starting at the right-hand end. For example, if you specify an independent object type with the conceptual name "Person_or_Organization_or_Company" and set a thirty character limit on logical names, then this object type maps to a relational table named "Person_or_Organization_r_Cmpny".
The Use predicate text for mapping whenever possible check box should normally be checked, as this helps to provide more meaningful names. The Pluralize table names check box should be unchecked, unless the naming standards in your company require plural table names. Singular table names are preferable, especially for referencing columns (e.g., "Patient.familyName" is more natural than "Patients.familyName"). If you do specific plural names, the tool does a reasonable job of rendering these naturally. For example, "Country" is pluralized as "Countries," and "Mouse" is pluralized as "Mice."
The Abbreviation pane enables finer control over name generation, by enabling you to specify standard abbreviations for various words. This is especially helpful if your company naming standards require such standard abbreviations. The tool comes with a list of predefined abbreviations that are displayed when you open the Abbreviations pane (see Figure 7-13).
Figure 7-13: Predefined abbreviations supplemented by user -defined abbreviations.
Each predefined entry in the Name column ("a," "an," , "would") is abbreviated to the null string, as shown by the absence of an entry in the Abbreviation column. This causes those words to be removed from any generated names. You can add your own entries to the abbreviation list, as shown in Figure 7-13(b). Here the words "country," and "code" are to be abbreviated as "CNTRY", and "CD," respectively. Although such abbreviations like these may lose clarity, they may be imposed on you by company naming standards.
For example, Figure 7-14 shows the result of mapping the ORM schema in Figure 7-2, using the column reordering , table renaming, and ORM document settings discussed earlier (including the abbreviations "CNTRY" and "CD" for "Country" and "Code"). The Country table is now named "CNTRY," and its primary key is named "cNTRYCD" instead of countryCode ”the setting to start column names with lower case works poorly with words that are all upper case. The column "countryName" is not named "CTRYName" because abbreviations work only on whole words, not parts of words. However, you can automatically generate the name "CNTRYName" in this case simply by including a space to separate the words in the conceptual name (i.e., name the ORM object type "Country Name" instead of "CountryName").
Figure 7-14: A second mapping of the sample ORM model in Figure 7-2.
The foreign key of the Patient table is now named "bornCNTRYCD." This uses the user-defined abbreviations for Country and Code, as well as the predefined abbreviation of "was" to the empty string. Although the effect is not illustrated here, it is best to use "Code" instead of "code" for the reference mode of Country because the tool does not provide a capitalization option to start each successive word in a name with a capital letter.
The other document level dialog relevant to name generation is the Database Modeling Preferences dialog . To invoke this dialog from either an ORM source model or a logical database model use the main menu option: Database > Options > Modeling . Now select the miscellaneous pane (Logical Misc) . For naming purposes, the relevant aspects of this pane are the Default names suffixes for keys and indexes, and the FK name generation option . Unless your company's naming standards differ , we suggest you use the predefined default suffixes (e.g., primary key constraint names are appended with "_PK").
The predefined default for foreign key constraint names appends the base suffix (e.g., "_FK") and a distinguishing numeral (e.g., "1" or "2" to cater for multiple foreign keys) to the concatenation of the names of the parent (referenced) table and child (referencing) tables. For example, for the model in Figure 7-14, the foreign key connection from Patient to Country is named "CNTRY_Patient_FK1." If your target DBMS allows only short identifiers, you may wish to choose a shorter option from the drop-down list, as shown in Figure 7-15. You may also choose to start the constraint name with the name of the child table instead of the parent table.
Figure 7-15: Setting preferences for key and index suffixes, and FK constraint names.
As you may have noticed, some of the names in Figure 7-14 are still not ideal. The abbreviations for Country and Code are not really advisable, so we will remove those from now on. The other naming problems can be overcome by using local options and role names for name generation, as we now discuss.
Local Options for Name Generation
Some naming options set at the document level can be overridden on an individual basis for ORM model elements by choosing a different setting for them in their Database Properties dialog . For example, refer back to the simple ORM schema mapping in Figure 7-8(b). By default, the column names generated are "Person SSN" and "Sex Code" because the default document level setting for reference modes is to add them to the name of the object type (see Figure 7-12).
To generate the column name "SSN" instead of "Person SSN," select the Person object type, and in the Ref Mode pane of its Database Properties dialog, change the radio button option from " Use document's setting " to "Use as column name," as shown in Figure 7-16. This overrides the reference mode document level setting for the Person object type only. When you save this change and rebuild the logical model, the primary key of Person is named "SSN." If you use the "Do not use for naming" setting for the Sex object type, then the "Sex Code" column would be renamed "Sex," so you can achieve the result shown in Figure 7-8(c) without needing to do any renaming at the logical level.
Figure 7-16: Setting the reference mode of one object type for use in column naming.
If you choose the option Do not use for naming , the reference mode is removed from the column name. If you did this for both the Person and Sex object types, the mapped columns would then also be named "Person" and "Sex."
The other document level naming option you can override on an individual basis is the setting to Use predicate text for mapping wherever possible (see Figure 7-12). For example, by default the column name generated from the Patient was born in Country fact type in Figure 7-2 is "Born Country Code." The "was" and "in" from the predicate are omitted from the column name because their abbreviations are predefined to the null string.
If for some reason you wanted to remove the whole predicate from the column name, you could open the Database Properties dialog for Patient was born in Country and choose the Do not use option for Text use for naming, as shown in Figure 7-17. If you then rebuild the logical model, the column name appears as "Country Code." This option is rarely used because it typically makes it harder to determine the precise meaning of the column names. If you apply this setting to more than one Country predicate that mapped to the same table, the tool appends numbers to the column names to distinguish them.
Figure 7-17: Setting the text of an individual predicate for use in name generation.
Note that naming option changes for an ORM object type or predicate are ignored if you previously migrated a logical name change back to that model element . For example, suppose you changed "Person SSN" to "personSSN" in the logical model, migrated that change back to the ORM model, then chose the "Use as column name" setting for Person's reference mode. If you now rebuild the logical model, you will still get "personSSN," not "SSN" as the column name. This is yet another reason to avoid renaming at the logical level wherever possible .
Using Role Names to Generate Column Names
In an ORM model, each predicate must have at least one reading, but it is optional whether any role in a predicate is named. Although not displayed automatically on the diagram, role names are useful for controlling how column names are generated in logical models (and for specifying attribute-style derivation rules). For example, suppose we add the name "birthCountry" for the second role of Patient was born in Country by entering it in the Readings pane of the predicate's Database Properties dialog, as shown in Figure 7-18.
Figure 7-18: Using a role name to generate a column name.
This role name "birthCountry" will now be used in place of the predicate name "was born in" when generating the relevant column name. For example, instead of "BornCountryCode" the column will be named "birthCountryCode." If you also set the reference mode naming option for Country to Do not use , the column will be named "birthCountry". However the countryCode column for the Country table would then also be renamed "country," which is not advisable.
The ORM schema in Figure 7-19 reproduces our sample ORM model, to which we have now added two role names using the Database Properties dialog. Here, bracketed role names are displayed in text boxes on the diagram for visualization purposes. The role named "isSmoker" is added to the single role in the Person smokes fact type, and the role name "birthCountry" is added to the second role of Patient was born in Country . The ORM document options have been set to start column names with a lower case letter and to ignore space characters (as shown in Figure 7-11 and Figure 7-12).
Figure 7-19: Using role names to control name generation.
Figure 7-20 shows the resulting logical schema. Except for renaming the Allergy table, and reordering columns in the Patient table, all the logical names were determined by choosing appropriate names and settings in the ORM source model.
Figure 7-20: The final mapping for the ORM sample model in Figure 7-19.
The tool uses role names to generate column names but not table names. If a predicate is either many: many or n-ary, it maps to a table all by itself, and role names are ignored in name generation. This is why it was necessary to rename the Allergy table at the logical level. However, the options discussed give you almost total control over how names are generated in the logical model. Try to avoid as much as possible making any name changes directly to the logical model, because this makes it very awkward to make later changes to the corresponding elements in the conceptual model.
In addition to naming options, the tool provides several options for the presentation style in which the logical model is displayed. For example, to display the logical model in IDEFIX notation instead of pure relational notation, choose the main menu option Database > Options > Document to open the Database Document Options dialog for the logical solution, open its General pane, and then choose IDEFIX for Symbol set. This setting causes the model shown in Figure 7-20 to be redisplayed in IDEFIX notation, as shown in Figure 7-21. Many other display preference settings are available through this dialog, as discussed later in the book.
Figure 7-21: Displaying a logical model in IDEFIX notation.
Constraint Code
A logical database model diagram can display primary key, foreign key, uniqueness, and mandatory column constraints, as well as indexes. Any other constraints in the logical model can be viewed only textually in a code window . If an ORM constraint maps to a check clause , trigger , or stored procedure , you can open a code window to view the constraint code generated. In those few cases where the tool does not generate the code for an ORM constraint, you can use a code window to manually add the relevant code to the logical model.
To view or edit a check clause on a single column , proceed as follows .
- Select the relevant table on the diagram, and open its Database Properties dialog.
- Open the Columns pane, and select the relevant column. For example, Figure 7-22 highlights the sexCode column of the Patient table in Figure 7-20.
Figure 7-22: Columns pane of the Database Properties dialog for Patient.
- Press the Edit button to open the Column Properties dialog for that column and choose the Check pane. If the column has a value constraint, this is displayed by default using the ORM value constraint dialog (see Figure 7-23).
Figure 7-23: Check pane of the Column Properties dialog for sexCode.
- Select the radio button: Show check clause code . To edit or delete the code, press the Customize button, which then becomes a Delete button (Figure 7-24).
Figure 7-24: Viewing the body of the check clause code for sexCode.
If you do not press the Customize button, the name and body of the check clause are displayed in read-only format (grayed-out). The Physical name field displays the name that is automatically generated for the check clause (e.g., PatientsexCode_Chk). The Check clause field shows just the body of the clause. For our sexCode example, the actual check clause that will appear when the DDL is generated is:
constraint "PatientsexCode_Chk" check ("sexCode" in ('M','F'))
To view or edit a multi-column check clause , proceed as follows.
- Select the relevant table on the diagram, and open its Database Properties dialog.
- Open the Check pane. The ORM verbalization of the constraint appears in the Check clauses field, and the code body for the check clause appears in the Preview field. Figure 7-25 shows this for the Patient table in Figure 7-20. This code enforces the ORM subset constraint that each patient with a second given name must also have a first given name.
Figure 7-25: Table-level Database Properties Check pane for the Patient table.
- To edit the check clause, either press the Edit button or double-click in the Check clauses field. The body of the check clause is now displayed in the Code Editor , ready for editing (see Figure 7-26).
Figure 7-26: The body of the check clause for the subset constraint in Patient.
For our example, the actual check clause that will appear when the DDL is generated is:
alter table "Patient" add constraint Patient_subset check (("firstGivenName" is not null) or ("secondGivenName" is null))
As Figure 7-25 shows, the Check pane of the Database Properties dialog also includes an Add button for adding a check clause in the Code Editor and a Remove button for deleting a check clause. The Triggers pane of the Database Properties dialog may be used to view or edit triggers on the table.
If a logical model includes stored procedures, these can only be viewed via the Code window . This window also provides access to check clause and trigger code associated with the model. To open the Code window, choose Database > View > Code from the main menu. To open the Code Editor for any listed entry, simply double-click it. You can also use the code window to add new code and delete existing code.
The code window for our Patient example is shown in Figure 7-27. This contains one single-column check clause and one multi-column check clause. Single-column check clauses are depicted with a CC icon. Other rules are depicted with the Ru icon. If a check clause does not appear in the code window, you may need to first generate the physical DDL from the model to make it appear.
Figure 7-27: The Code window allows all code for the model to be viewed and edited.
A more comprehensive example is shown in Figure 7-28, which displays the code window for the sample Employee Database model that ships with the product. To access this model, choose the following options from Visio's main menu: File > New > Browse Sample Drawings > Database > Employee Database.vsd . If the DBMS is SQL Server, the Global Code section includes a stored procedure to enforce an exclusion constraint that spans two tables. The Local Code section includes several single-column and multi-column check clauses. You may double-click any item to view or edit its code in the Code Editor.
Figure 7-28: The code window for the Employee sample database model.
If an ORM constraint cannot be enforced in the target DBMS as a simple declarative constraint (e.g., primary key, foreign key, not null, unique, or simple check clause), the tool usually attempts to enforce the constraint by generating triggers or stored procedures, depending on the DBMS. Although generated code for a stored procedure contains the logic to enforce the constraint, you still need to manually write code to execute the procedure or transform it into appropriate triggers.
For those DBMSs that do not support stored procedures (e.g., Microsoft Access), stored procedures are documented in the DDL as comments, which you can use to help develop an alternative way to enforce the constraints.
ORM constraints that generate trigger or stored procedure code include constraints that span tables (e.g., external uniqueness and inclusive-or constraints spanning roles that map to separate tables), frequency constraints, some ring constraints, and some setcomparison (subset, equality, exclusion) constraints.
Ring constraints were discussed in section 5.8. An irreflexive ring constraint whose roles map to columns in the same table is enforced as a simple check constraint. For example, the mapping in Figure 7-29 shows the ORM constraint no Academic monitors itself mapping to a check clause with the c ondition academicEmpNr <> monitorEmpr . For visualization purposes, the relevant role names and check condition are displayed in text boxes, using superscripts to cross-reference the rule to its referents .
Figure 7-29: The irreflexive ring constraint maps to a simple check clause.
But now consider the mapping of the parenthood fact type shown in Figure 7-30, which is an extension of an example discussed in Chapter 5. Again, roles names are added in text to aid visualization. The frequency constraint indicates that a child may have at most two parents, and the ring constraints declare that parenthood is both asymmetric and intransitive . None of these constraints are mapped to check clauses.
Figure 7-30: The ring and frequency constraints do not map to check clauses.
If SQL Server is the target DBMS, each of these constraints is mapped to a stored procedure, which you can view in the Code Editor. The constraint code generated (minus comments) is as follows.
Create Procedure sp_Parenthood_freq1 as if (not exists (select * from "Parenthood" group by "Parenthood"."childld" having count( * ) > 2)) return 1 else return 2 Create Procedure sp_Parenthood_ring2 as if (not exists (select * from "Parenthood" X, "Parenthood" Y where X."parentld" = Y."childld" and X."childld" = Y."parentld") and not exists (select * from "Parenthood" X, "Parenthood" Y, "Parenthood" Z where X."childld" = Y."parentld" and Y."childld" = Z."childld" and X."parentld" = Z."parentld")) return 1 else return 2
The irreflexive constraint discussed earlier was enforced as a simple check clause, since its condition for any given row could be checked by looking at that row alone. This is not true for the frequency constraint and the asymmetric and intransitive ring constraints. Instead of a check clause, the tool generates stored procedures for these more complex constraints. If a procedure returns 1, then the constraint is satisfied; if it returns 2, the constraint is violated. These procedures could be expensive, so are not run automatically. It is your responsibility to decide how to use them. For example, you might write code to run the procedures, or you might replace the procedures by equivalent triggers.
Notice that the composite (as, it) ring constraint maps to a single stored procedure, with the first existential subquery checking asymmetry, and the second existential subquery checking intransitivity.
In reality, the parenthood relation is not just asymmetric but acyclic . Acyclicity is different from the other ring constraints because it involves recursion, so might be quite expensive to enforce. Some modern DBMSs, such as DB2 and the Yukon release of SQL Server, include the capability of performing recursive queries but not all DBMSs do so. Moreover, the SQL syntax used to support recursion may differ among DBMSs. If you wish to declare an acyclic ring constraint, the tool currently does no more than generate a comment for it, so you have to look after coding this yourself. Moreover, if you choose a composite ring constraint that includes acyclicity, the whole code for the composite constraint is simply a comment. You can view the comment in the Code Editor and edit it like any other code.
Subtype Mapping
Section 5.6 discussed how to specify subtyping in an ORM source model. We now consider subtype mapping and coding options to enforce subtype constraints within a logical database model. To illustrate these ideas, an extension of the ORM Patient model discussed in chapter 5 is used as the source to the mapping (see Figure 7-31). Here, the Patient object type directly plays three functional roles. A functional role is a role with a simple uniqueness constraint (and hence it functionally determines the other role). Indirectly, Patient also plays the roles of its subtypes MalePatient and FemalePatient.
Figure 7-31: MalePatient and FemalePatient are subtypes of Patient.
MalePatient plays only one role, and this is functional and optional: each MalePatient has at most one ProstateStatus. For example, one male patient might have his prostate status recorded as ˜BE ( benign enlargement ), while another male patient has never had his prostate checked.
FemalePatient plays two roles. One of these is functional ( each FemalePatient has at most one PregnancyCount ) and mandatory ( each FemalePatient has some PregnancyCount ). If a female patient has never been pregnant, this is recorded as a pregnancy count of 0. The other role played by FemalePatient is non-functional (it does not have a simple uniqueness constraint on it): it is possible that the same FemalePatient had more than one PapSmear . The other role in this fact type is functional ( each PapSmear is from at most one FemalePatient ) and mandatory ( each PapSmear is from some FemalePatient ).
As discussed in Chapter 5, subtyping details are entered in the Subtype pane of the subtype's Database Properties dialog. In addition to the subtype definition field, there are check boxes for table mapping and inheritance and a list box for selecting the primary supertype (see Figure 7-32). The check box titled " Map to separate table " does not relate to the conceptual level at all. Instead it is used to control how subtype specific details are mapped to a logical database schema. If this box is unchecked (the default), any functional roles attached to the subtype will be absorbed back into the supertype when the model is mapped to a relational database schema.
Figure 7-32: The Subtype pane of the Database Properties dialog for MalePatient.
By default, the ORM schema in Figure 7-31 maps to the relational schema shown in Figure 7-33 (with some column reordering , and name control, and ignoring the value constraint on sexCode). This is the structure you get by default if you leave the Map-to-separate-table options unchecked for the two subtypes. The prostate and pregnancy fact types are functionally dependent on their subtype, so are absorbed into the supertype table Patient. Hence the Patient table includes prostateStatus and pregnancyCount as optional columns. Recall that mandatory (not null) columns are displayed in bold, unlike optional (nullable) columns , and that "PK" denotes "primary key" and "FK" denotes "foreign key."
Figure 7-33: By default, functional subtype roles are absorbed into the supertable.
The three pap smear fact types in Figure 7-32 are functionally dependent on the object type PapSmear, so they maps to a table for that object type. The arrow between the tables is a foreign key reference or subset constraint (each patient number in the PapSmear table must also occur within the primary key of the Patient table).
If you compare the ORM model in Figure 7-31 with the relational model in Figure 7-33, it is obvious that much of the subtyping semantics have been lost in the translation. The Patient table has three optional columns: phoneNr, prostateStatus, and pregnancyCount. The phoneNr column is simply optional (there is no formal way of deciding which patients have their phone number recorded). But the prostateStatus and preganancyCount columns are not simply optional. Nor is the foreign key constraint a simple subset constraint.
To preserve the additional semantics in the ORM source model, we need to add qualifications to any optional columns or subset constraints that result from subtyping. We could denote these qualifications by annotating the relational diagram with superscripts and text as shown in Figure 7-34.
Figure 7-34: Subtyping yields qualifications on optional columns or subset constraints.
Here the annotations were added in simple text boxes, using the relational constraint syntax discussed in Halpin (2001). Qualification 1 means that prostateStatus is recorded only if the patient is male (sexCode = ˜M ). Qualification 2 means that pregnancyCount is recorded iff (if and only if) the patient is female (sexCode = ˜F ). Qualification 3 means that each patient number in the PapSmear table must equal the patient number of a female patient recorded in the Patient table (sexCode = ˜F ). Note that simply naming the foreign key column as "femalePatientNr" does not enforce the constraint that the patient number must be that of a female patient. Formally, this column may just as well have been named "patientNr." The informal semantics in the name is for the benefit of the human reader only and carries no formal weight.
Because the ORM tool does not yet support formal subtype definitions, it cannot generate the code to enforce these qualifications. So for now, you need to write this code for yourself. You can do this by editing the table properties of the logical model before generation (or less preferably, by editing the DDL that is generated from the logical model). For example, qualification 1 may be implemented by the following check clause on the Patient table: check ( prostateStatus is null or sexCode = ˜M ).
This involves more than one column, so requires a table-check clause rather than a column-check clause. To add the check clause, proceed as follows . Click the Patient table to bring up its Database Properties dialog, select the Check pane, and press the Add button to bring up the code editor. Select its Properties pane and enter a meaningful name for the check constraint, e.g., "ProstateStatusOnlyIfMale" (see Figure 7-35).
Figure 7-35: Naming the check constraint in the code editor.
Now select the Body pane and enter the body of the check-clause, as in Figure 7-36. Do not enter the "check ( )" wrapper for this code, since the tool does this automatically when generating the DDL. If you do include this wrapper, it will be treated as part of the code body, and hence generate an error.
Figure 7-36: Adding the body of a check-clause.
Press OK to enter the check-clause. This returns you to the properties dialog, with the check-clause listed as shown in Figure 7-37.
Figure 7-37: The check clause has been added.
You may now remove or edit the check-clause, or add more check clauses in a similar way. To implement qualification 2, you should add the following two check clauses. Try this for yourself. Suggested names for these constraints are appended as comments.
check (pregnancyCount is null or sexCode = 'F') --PregnancyCountOnlyIfFemale check (sexCode <> 'F' or pregnancyCount is not null ) --PregnancyCountIfFemale
If you add the three check-clauses as discussed, the following code is included in the generated DDL after the create-table clause for the Patient table:
alter table "Patient" add constraint ProstateStatusOnlyIfMale check (prostateStatus is null or sexCode = 'M') alter table "Patient" add constraint PregnancyCountOnlyIfFemale check (pregnancyCount is null or sexCode = 'F') alter table "Patient" add constraint PregnancyCountIfFemale check (sexCode <> 'F' or pregnancyCount is not null)
Qualification 3 (restricting the foreign key reference to female patients) can be implemented by using the Database Properties dialog to add appropriate triggers to the Patient and PapSmear tables.
Mapping Subtypes to Separate Tables
To summarize our earlier discussion, the ORM schema in Figure 7-31 maps to the logical schema shown in Figure 7-34, if we use the default mapping procedure, where subtypes roles that are functional (with a simple uniqueness constraint) are effectively absorbed back to the supertype before mapping. The prostate and pregnancy fact types are functionally dependent on their subtype, so are absorbed into the supertype table Patient. Hence the Patient table includes prostateStatus and pregnancyCount as optional columns. The actual subtype constraints (indicating the conditions under which subtype facts may be recorded) are now captured by qualifications on the optional prostateStatus and pregnancyCount columns, and on the subset constraint depicted as a foreign key relationship from PapSmear.patientNr to Patient.patientNr. These qualifications need to be coded as check clauses or triggers.
As a non-conceptual issue, the tool also allows you to specify options on the ORM model to provide alternative logical mappings for subtypes. If you double-click a subtype to bring up its Database Properties dialog, select the Subtype pane, and then check the "Map to separate table" option, this causes fact types that functionally depend on the subtype to map to a separate table, with the primary identifier of the subtye as the primary key. For example, to specify a separate table for the MalePatient subtype, mark the check box as shown in Figure 7-38.
Figure 7-38: Choosing to map functional details of MalePatient to a separate table.
Choosing the separate table mapping option for both the MalePatient and FemalePatient subtypes results in the logical schema shown in Figure 7-39. Prostate status is now stored in the MalePatient subtable, and pregnancy count is stored in the FemalePatient subtable. Pap smear facts are still stored in a separate PapSmear table, because these are a function of PapSmear rather than FemalePatient (a patient may have many pap smears). The numbered qualifications on the foreign key references have been manually added, and are explained later.
Figure 7-39: Functional subtype details are now mapped to separate tables.
When you choose the separate subtable mapping option, the tool uses a different notation for displaying foreign key relationships from subtables to supertable. Instead of arrows, a circle-bar notation is used, as shown. The circle is connected by a line to the supertable and has one or two bars underneath, connected by lines to the subtables.
If desired, you may specify a supertable attribute as a discriminator for the subtable hierarchy, by clicking the circle-bar icon to bring up its Database Properties dialog and selecting the discriminator from the attribute list displayed, as shown in Figure 7-40. The tool displays the discriminator (in this case "sexCode") besides the circle-bar icon. If desired, you can drag the control handle for this shape to reposition the discriminator on the diagram.
Figure 7-40: Specifying a subtyping discriminator and completeness status.
A single bar beneath the circle indicates that the categorization of a supertable into subtables is incomplete . In other words, it is possible that the union of the subtable primary key populations is a proper subset of the supertable primary key population. This is true for our example because recording of prostate status is optional for males, so there may be male patients recorded in the Patient table that are absent from both the subtables. Constraints are on populations, rather than types.
If we had made it mandatory for males to have their prostate status recorded, the categorization would be complete, and this could be declared by checking the "Category is complete" check-box in the properties dialog. The tool displays completeness of categorization as a double bar instead of a single bar.
The use of discriminators and completeness indicators covers only a fragment of ORM's subtyping semantics, which allows subtype definitions of arbitrary complexity (e.g., involving multi-branched paths through ORM space). However, since the tool does not yet support formal ORM subtype definitions, it will not generate a discriminator, or guarantee the correct completeness setting when you map to the logical level. So if you want these aspects displayed properly you need to look after them manually yourself.
Even for the simple example in Figure 7-39, the discriminator and incompleteness settings do not convey the full ORM subtyping semantics captured in Figure 7-31. To preserve the additional ORM semantics, we need to qualify the subtable foreign key references as shown in Figure 7-39. Here the annotations have been manually added in text boxes, using the relational constraint syntax discussed in Halpin (2001). Qualification 1 means that the set of patient numbers in the MalePatient table must be a subset of the set of patient numbers of male patients in the Patient table. So prostateStatus is recorded only where the patient is male (sexCode = ˜M ). Qualification 2 means the set of patient numbers in the FemalePatient table must equal the set of patient numbers of female patients recorded in the Patient table. So pregnancyCount is recorded exactly where the patient is female (sexCode = ˜F ).
Since pap smear tests are optional for female patients, there is no need to qualify the foreign key reference from the PapSmear table to the FemalePatient table. This foreign key connection is just an unqualified subset constraint, so will be enforced as a simple foreign key declaration generated in the DDL.
Because the ORM tool does not yet support formal subtype definitions, it cannot generate the code to enforce the two qualifications on the subtable foreign key references. So for now, you need to write this code for yourself. You can do this by editing the table properties of the logical database model before generation (or less preferably, by editing the DDL that is generated from the relational model). As neither of these qualifications can be implemented as a check-clause, you will need to enforce them by means of triggers or stored procedures.