Professional SQL Server Analysis Services 2005 with MDX (Programmer to Programmer)
Data warehouse designs consist of several fact tables and all the associated dimension tables. Small data warehouses are usually comprised of 10 to 20 tables, whereas the larger data warehouses can have more than a hundred tables. Even though you have a large number of tables in your data warehouse, you will likely work with a small subset of those tables; each of which has relationships between them. For example, assume you have sales, inventory, and human resources (HR) data to analyze and the HR data is not strongly related to the sales and inventory data but there is a desired linkage. Then you might create two cubes, one for Sales and Inventory information and another one for HR. It is quite possible the Sales, Inventory, and HR information could be stored in a single data source — in the ODS or OLTP system.
Employee information (HR) could be related to the sales and inventory information within the company so far as there is a link between a given sales event and the employee who made the sale. You might want to slice the sales data by a specific employee, but to do so you must access information that is a part of a separate cube only accessible to the HR department (for security reasons). You can get around this problem by making a single DSV containing all the tables that store sales, inventory, and HR information of a company. From that DSV, both cubes can be formulated and permissions set such that only members of the HR group can drill down on employee personal data.
Having a lot of tables in the DSV definitely makes the navigation and usability a bit complex. When you are working on the data of HR you will only want to see the tables related to this alone. For easy manageability you will need customizable views within your DSV that show only certain tables. Analysis Services 2005 provides you with the capability of having several views within the DSV that contain a subset of the tables in the DSV. These views are actually called diagrams. By default you get a diagram called <All Tables> when you complete the DSV wizard. You can create additional diagrams and select the tables that you want to include within this diagram. This feature was actually added to the product based on requests from the customers. Next, you learn how to create a new diagram and include only the tables you need within this diagram.
To create a new diagram, you need to do the following:
-
Right-click the Diagram Organizer pane and select New Diagram, as shown in Figure 4-14. Name this diagram view "Internet Sales."
Figure 4-14 -
You now have an empty diagram view. Right-click the diagram view and select Show Tables (see Figure 4-15). You are presented with the dialog where you can choose the table(s) you want to include in this diagram view.
Figure 4-15 -
Select all the tables that are part of the InternetSales fact table in this diagram view and click OK.
This gives you a diagram view of the InternetSales that contains the InternetSales fact table and the related dimension tables as shown in Figure 4-16. This InternetSales diagram has six of the eleven tables in the DSV and it makes it much easier to understand the relationship between these tables only.
If you do not want a specific table in your diagram view you can right-click the table and select Hide. Instead of steps 2 and 3 you can add tables to the diagram view by dragging and dropping tables from the Table pane to the Diagram pane. Create another diagram for ResellerSales and add the FactResellerSales table and related tables.
Data Source View Properties
Each object created within the BIDS has certain properties. Within the DSV you can view the properties of the object's DSV, tables, columns, and relationships. Properties of these objects are shown in the Properties window within the BIDS, as shown in Figure 4-17.
Figure 4-17 shows the properties of a column in a table, calculated column, a table, and a relationship. For the regular columns in a table you have the properties AllowNull, Data Type, Description, Friendly Name, Length, and Name. The properties of a column are populated by retrieving the corresponding property from the data source. The data type of this column is retrieved from the data source. Based on the properties defined in the data source, the properties AllowNull, Data Type, Length, Name, and Friendly Name are populated. The length is applicable only for the data type string. For all data types other than string the Length has a value of -1. You cannot change certain properties. Those properties are not editable in the Properties window and are grayed out. You can change the Friendly Name and provide a description to each column. Often columns of a table in the relational database might not have user-friendly names. User-friendly means the name of the column should indicate clearly what the data held by the column is. The Friendly Name is a property that can be changed by you so that this friendly name is shown in the DSV for an easier understanding of the model. You can provide an optional Description to each column if needed. The DSV provides you with the option of switching between the original column names and the friendly names. You can right-click in the DSV diagram view and toggle between the friendly name and the original column name by selecting the Show Friendly Name option.
Named columns created in the DSV do not have a Friendly Name property because you will be defining the name to this column which will be easily understandable. Instead, named columns have the Expression property because each named column is just a SQL expression. You can only change this expression in the Named Column dialog and not in the Properties window.
Tables have the properties Data Source, Description, FriendlyName, Name, Schema, and Table Type. The Data Source indicates the name of the data source of the Table. The Table Type shows whether the object in the underlying data source is a table or a view. Similar to the columns, tables also have the option to specify a friendly name that can be used in the DSV.
Relationships between tables are provided with a name that includes the tables that participate in the relationship. Similar to named columns, named queries do not have a Friendly Name property. They have a property called Query Definition that shows the query used to specify the named query object.
Different Layouts in DSVs
The DSV designer provides you with two layouts to view the tables in the DSV. When you create a DSV the default layout selected is called the rectangular layout. In the default layout the lines representing the relationships between tables are composed of horizontal and vertical lines, and these emerge from any of the sides of the table. The second layout offered by the DSV designer is called the diagonal layout. In the diagonal layout the tables are arranged in the way such that the lines showing the relationships between tables are originating at the end points of the table so that these lines appear to be along the diagonal of the tables — hence the name "diagonal layout." You can switch between the rectangular layout and the diagonal layout in the DSV by right-clicking in the DSV designer and selecting the layout of your choice. Figures 4-18 and 4-19 show the rectangular and diagonal layout, respectively, of the Internet Sales diagram.
Validating Your DSV and Initial Data Analysis
The relationships specified in the DSV will be used in creating your dimensions and cubes. Therefore, validating your DSV is crucial to your data warehouse design. The DSV designer provides a first level of validation when you specify relationships. If the data types of column(s) involved in the relationship do not match, the DSV will not allow you to establish the relationship. This forces you to make sure you cast the data types of the column(s) involved in the relationships appropriately. You might need another level of validation by looking at the data within each table. You can do this by issuing queries to the tables in a relational data source. The DSV provides a way of looking at sample data for validation. A few validations you can do within the DSV by looking at sample data are as follows:
-
Looking at the fact table data helps you in making sure this table contains fact data, the primary key has been specified correctly, and appropriate relationships needed for dimensions are established.
-
Analyzing the dimension table's sample data ensures that you have all the relationships established between the fact and dimension and any relationships within the table are established correctly. For example, if you have an Employee table that contains an employee and his manager, you might want to establish a relationship so that your model can take advantage of this.
In addition, a sample of data from the tables in the DSV helps you in identifying the measures of the cube as well as the hierarchies of each dimension. Analyzing sample data in DSV also helps you to identify dimensions that can be created from the fact table data. The analysis of sample data within the DSV is even more important in creating your Data Mining models. You learn more about analyzing the data with respect to Data Mining in Chapter 13.
To see a sample of the data you need, right-click a table in the DSV and select Explore Data. You can now see rows from the underlying table presented within the Explore <tablename> Table window as shown in Figure 4-20. The data presented is only a subset of the underlying table. By default the first 5,000 rows are retrieved and shown within this window. You can change the number of rows retrieved by clicking the Sampling Options button. Clicking the Sampling Options button launches the Data Exploration options where you can change the sampling method, sample count, and number of states per chart which is used for displaying data in the chart format. Once you have changed the sample count value you can click the Resample Data button to retrieve data based on the new settings. The Explore Table window has four tabs: Table, Pivot Table, Chart, and Pivot Chart. The Table tab shows the raw sampled data from the data source as rows and columns with column headings.
When you click the Pivot Table tab you get an additional window called PivotTable Field List that shows all the columns of the table, as shown in Figure 4-21. You can drag and drop these columns inside the pivot table in the row, column, details, or filter areas. The values in the row and column provide you with an intersection point for which the detailed data is shown. For example, you can drag and drop ProductKey, CustomerKey, and Sales Amount to the row, column, and detail data areas. The pivot table now shows you sales amount of each product by each customer. The pivot table actually allows you to view multidimensional data from a single table. You learn more about pivot tables in Chapter 15.
Analysis Services analyzes the sample data, identifies the most important columns within the table, and provides you distributions in the Chart tab. The Pivot Chart tab provides you functionality similar to the Pivot Table tab but in a chart view. The Chart and Pivot Chart tabs are typically used to do an initial analysis of the data so that appropriate columns can be used to create good Data Mining models. You learn more about these in Chapter 14.