Microsoft SQL Server 2005 Integration Services
"IT'S ALL JUST BOXES AND LINES."
BRIAN CHRISTIAN

On its face, the Data Flow Task looks much like workflow in a package. It's all just boxes and lines. You drop components (boxes) onto the surface and then create paths (lines) between them. But that's where the similarity ends. Whereas workflow is a coordinated sequence of atomic units of work possibly executing in parallel, data flow is a set of sequential operations on streaming data necessarily executing in parallel. Workflow executes a task once and moves on (loops excluded), whereas data flow operations process data in parallel until the data stream terminates.

The Data Flow Task is a high-performance data transformation and integration engine. It is the data-processing heart of Integration Services, capable of consuming data from multiple diverse sources, performing multiple transformations in parallel, and outputting to multiple diverse destinations. It supports a pluggable architecture for stock components that ship with Integration Services as well as custom components written by third parties.

With the emergence of the Web and clickstream data, technologies such as Radio Frequency Identification (RFID) and automated scanners, marketplace globalization, increasing regulatory pressures, sophisticated data gathering and analysis techniques, diversity of data formats and sources, and the ever-increasing competitive pressure to make the right decisions at the right time, integrating enterprise data is more important than ever. The Data Flow Task addresses these challenges by providing a rich selection of stock transformations in the box, the ability to process huge volumes of data in very short time frames, and support for structured, semistructured, and nonstructured data sources.
Because of these fundamental capabilities, the Data Flow Task can meet the demanding requirements of traditional Extract, Transform, and Load (ETL) data warehouse scenarios while still providing the flexibility to support emerging data integration needs.

This chapter is a gentle introduction to the Data Flow Task and lays the groundwork for the rest of the chapters in this part. If you're already familiar with the Data Flow Task, much of this chapter will be review. Even so, you should at least skim it, because it presents some important basics that you should understand before taking the data flow plunge.
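If the distinction between workflow and data flow still feels abstract, a small sketch may help. The following Python is only an illustration of the idea and has nothing to do with how Integration Services is actually implemented: the function names, the row shape, and the `trace` list are all hypothetical. It uses generators, which interleave stages on a single thread rather than running them truly in parallel, so treat it as a rough analogy for "every operation stays active until the stream ends."

```python
from typing import Callable, Iterable, Iterator

Row = dict  # hypothetical row shape: {"name": str, "age": int}


def run_workflow(tasks: list[tuple[str, Callable[[], None]]], trace: list[str]) -> None:
    """Workflow style: each task runs once, to completion, then the next one starts."""
    for name, task in tasks:
        task()
        trace.append(f"done:{name}")


def uppercase_name(rows: Iterable[Row], trace: list[str]) -> Iterator[Row]:
    """Transformation stage: handles each row as it streams past."""
    for row in rows:
        trace.append(f"upper:{row['name']}")
        yield {**row, "name": row["name"].upper()}


def keep_adults(rows: Iterable[Row], trace: list[str]) -> Iterator[Row]:
    """Filter stage: passes along only rows with age >= 18."""
    for row in rows:
        trace.append(f"filter:{row['name']}")
        if row["age"] >= 18:
            yield row


def run_data_flow(rows: Iterable[Row], trace: list[str]) -> list[Row]:
    """Data flow style: stages are chained, and rows move through them one at a time."""
    return list(keep_adults(uppercase_name(rows, trace), trace))


if __name__ == "__main__":
    flow_trace: list[str] = []
    people = [{"name": "ann", "age": 30}, {"name": "bob", "age": 12}]
    print(run_data_flow(people, flow_trace))
    # The trace alternates between stages instead of finishing one stage first:
    # ['upper:ann', 'filter:ANN', 'upper:bob', 'filter:BOB']
    print(flow_trace)
```

Notice that the workflow runner finishes one task before starting the next, whereas the data flow trace alternates between the two stages: neither stage "completes" until the last row has passed through.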