Microsoft SQL Server 2005 Integration Services

The following category of transforms provides metadata about the data flow. They're useful for answering how fast or how many or from where type questions. They help you understand what kind of data is flowing through the data flow and how much of it. For example, you can spread these transforms throughout your data flow and log the information they create for later scrutiny or troubleshooting. They are very lightweight transforms and minimally impact the performance of the data flow.

Audit

The Audit transform allows you to create a new column on a given path that will contain system variable information. This is helpful for understanding where data came from and how it was added to a table. Table 20.18 provides a profile for this component.

Table 20.18. The Audit Transform Profile

Property

Value

Description

Component Type

Transform

 

Has Custom Editor

Yes

 

Internal File I/O

No

 

Output Types

Synchronous

 

Threading

Single

 

Managed

No

 

Number Outputs

1

Number Inputs

1

 

Requires Connection Manager

No

 

Supports Error Routing

No

 

Setting Up the Audit Transform

To set up the Audit transform, simply select the audit type and then input the name of the new column that will contain the audit information. Figure 20.29 shows the Audit Transformation Editor with two new columns created.

Figure 20.29. The Audit Transformation Editor

Table 20.19 shows the system variables that the Audit transform might use to populate a new column and their description.

Table 20.19. System Variables Available to the Audit Transform

System Variable

Index

Description

ExecutionInstanceGUID

0

The GUID that identifies the execution instance of the package

PackageID

1

The unique identifier of the package

PackageName

2

The package name

VersionID

3

The version of the package

ExecutionStartTime

4

The time the package started to run

MachineName

5

The computer name

UserName

6

The login name of the person who started the package

TaskName

7

The name of the Data Flow Task with which the Audit transform is associated

TaskId

8

The unique identifier of the Data Flow Task that contains the Audit transform

Percentage and Row Sampling

The Percentage and Row Sampling transforms provide a way to select subsets of source data. This is useful in a number of ways. For example, when profiling data, it isn't necessary to process the entire set of data, but rather a statistically significant subset. Using the Percentage Sampling transform, you can reduce the total number of rows that flow through your data profiling process, reducing the total processing time while still ensuring that the row selection is random and relevant. Table 20.20 provides a profile for these components.

Table 20.20. The Percentage and Row Sampling Transform Profiles

Property

Value

Description

Component Type

Transform

 

Has Custom Editor

Yes

 

Internal File I/O

No

 

Output Types

Synchronous/Asynchronous

Precentage Sampling/Row Sampling

Threading

Single

 

Managed

No

 

Number Outputs

2

 

Number Inputs

1

 

Requires Connection Manager

No

 

Supports Error Routing

No

 

Setting Up the Row and Percentage Sampling Transforms

Both these transforms are very similar and have very similar editors. The only difference is that the Row Sampling transform allows you to specify the number of rows you want to randomly select and the Percentage Sampling transform allows you to select a specified percentage of rows. Because both are so similar, this section only explains how to set up the Percentage Sampling transform.

The Percentage Sampling transform has two outputs. One contains the dataset sample and the other contains the entire dataset sans the dataset sample. Set the Percentage of Rows settings to what you want to sample. You can name the outputs if you want. Only specify a random seed if you want to return the same rows in subsequent runs, which might be useful for testing purposes. If you leave the Random Seed setting unchecked, the transform picks a different random seed on each subsequent run and returns a different set of data. Figure 20.30 shows the Percentage Sampling Transformation Editor.

Figure 20.30. The Percentage Sampling Transformation Editor

RowCount

The RowCount transform provides a way to track how many rows are flowing through a particular path. You can use this information in logs as diagnostic information or for sending in an email as a notification of the work a package has completed. Table 20.21 provides a profile for this component.

Table 20.21. The RowCount Transform Profile

Property

Value

Description

Component Type

Transform

 

Has Custom Editor

No

 

Internal File I/O

No

 

Output Types

Synchronous

 

Threading

Single

 

Managed

No

 

Number Outputs

1

 

Number Inputs

1

 

Requires Connection Manager

No

 

Supports Error Routing

No

 

Setting Up the RowCount Transform

The RowCount transform is very simple to set up. Drop it on the design surface, drag an output from another transform, and specify a read/write variable to contain the rowcount. Figure 20.31 shows the Advanced Editor with the rowcount property correctly set.

Figure 20.31. Setting the rowcount variable name with the Advanced Editor

Категории