Change Capture Stage

The Change Capture stage compares two input datasets on one or more key columns and records the differences between them. The two inputs are connected to the stage through links with the default names 'Before' and 'After'. The stage produces a change dataset whose table definition is taken from the after dataset's table definition, with one additional column: a change code whose value identifies one of four actions (insert, delete, copy, or edit).

Options

Change Keys/Key: Name of the column to be used as a key.
Change Values/Value: Name of an input column to be used as a value column. When a before row and an after row are determined to be copies based on the difference keys, the value columns are then compared to determine whether the after row is an edited version of the before row.
Change Mode: Determines whether keys and values are defined explicitly or implicitly. Choose 'All keys, Explicit values' to specify that value columns must be defined, and that all other columns are key columns unless they are excluded. Choose 'Explicit Keys, All Values' to specify that key columns must be defined, and that all other columns are value columns unless they are excluded.
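
The two change modes described above can be illustrated with a short Python sketch. This is not DataStage code: the function resolve_columns and its parameters are hypothetical names chosen for illustration, and it only shows how the key and value column lists would be derived from the explicitly defined columns and the excluded ones.

```python
def resolve_columns(all_columns, mode, keys=(), values=(), excluded=()):
    """Return (key_columns, value_columns) for a given change mode.

    Hypothetical helper: it mimics the rule described in the text,
    not the internals of the DataStage stage.
    """
    excluded = set(excluded)
    # Excluded columns take part in neither the key nor the value comparison.
    candidates = [c for c in all_columns if c not in excluded]

    if mode == "All keys, Explicit values":
        if not values:
            raise ValueError("value columns must be defined in this mode")
        value_cols = [c for c in candidates if c in values]
        key_cols = [c for c in candidates if c not in values]
    elif mode == "Explicit Keys, All Values":
        if not keys:
            raise ValueError("key columns must be defined in this mode")
        key_cols = [c for c in candidates if c in keys]
        value_cols = [c for c in candidates if c not in keys]
    else:
        raise ValueError(f"unsupported change mode: {mode!r}")
    return key_cols, value_cols


columns = ["ID", "NAME", "CITY", "LOAD_TS"]  # made-up column names
print(resolve_columns(columns, "Explicit Keys, All Values",
                      keys=["ID"], excluded=["LOAD_TS"]))
# (['ID'], ['NAME', 'CITY'])
```

With 'Explicit Keys, All Values', only ID has to be named. NAME and CITY become value columns automatically, and LOAD_TS is ignored because it is excluded.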

Example:

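Consider the following two datasets:

Before Dataset
COL_1
A
B

After Dataset
COL_1
B
C

If we pass these datasets through the Change Capture stage, followed by a Sequential File stage, and add the COL_1 and CHANGE_CODE columns to the output, the result will be:

COL_1  CHANGE_CODE
A      2
B      0
C      1

Here A appears only in the before dataset (delete, code 2), B appears in both (copy, code 0), and C appears only in the after dataset (insert, code 1).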
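
The comparison behind this example can be sketched in a few lines of Python. This is only an illustration of the logic, not how DataStage implements the stage. It assumes that the before dataset holds A and B and the after dataset holds B and C (inferred from the output above), that rows are plain dictionaries, and that the edit code is 3 (the example itself only shows codes 0, 1, and 2). The function name change_capture is hypothetical.

```python
# Change codes: 0, 1 and 2 match the example output; 3 for edit is assumed.
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3


def change_capture(before, after, key_cols, value_cols=()):
    """Compare two lists of row dicts and return rows tagged with CHANGE_CODE."""
    def key(row):
        return tuple(row[c] for c in key_cols)

    before_by_key = {key(r): r for r in before}
    after_by_key = {key(r): r for r in after}

    changes = []
    for k in sorted(before_by_key.keys() | after_by_key.keys()):
        b, a = before_by_key.get(k), after_by_key.get(k)
        if a is None:
            # Key exists only in the before dataset.
            changes.append({**b, "CHANGE_CODE": DELETE})
        elif b is None:
            # Key exists only in the after dataset.
            changes.append({**a, "CHANGE_CODE": INSERT})
        elif any(b[c] != a[c] for c in value_cols):
            # Same key, but at least one value column differs.
            changes.append({**a, "CHANGE_CODE": EDIT})
        else:
            changes.append({**a, "CHANGE_CODE": COPY})
    return changes


before = [{"COL_1": "A"}, {"COL_1": "B"}]
after = [{"COL_1": "B"}, {"COL_1": "C"}]
for row in change_capture(before, after, key_cols=["COL_1"]):
    print(row["COL_1"], row["CHANGE_CODE"])
# A 2
# B 0
# C 1
```

Because COL_1 is the only column and serves as the key, no row can be reported as an edit here. An edit would only appear if a value column were defined and differed between two rows with the same key.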

Change Apply Stage

The Change Apply stage is a processing stage. It takes the change dataset produced by the Change Capture stage, which describes the differences between the before and after datasets, and applies the encoded change operations to a before dataset to compute an after dataset.
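
As a rough sketch of this idea in Python (again an illustration, not the stage's implementation), each change row is replayed against the before dataset according to its change code. The function name change_apply, the dictionary row format, and the edit code of 3 are assumptions made for this example.

```python
# Change codes: copy, insert and delete follow the earlier example; edit = 3 is assumed.
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3


def change_apply(before, changes, key_cols):
    """Apply change rows (dicts with a CHANGE_CODE field) to a before dataset."""
    def key(row):
        return tuple(row[c] for c in key_cols)

    result = {key(r): dict(r) for r in before}
    for change in changes:
        code = change["CHANGE_CODE"]
        # Strip the change code so the output has the after dataset's columns.
        row = {c: v for c, v in change.items() if c != "CHANGE_CODE"}
        if code == DELETE:
            result.pop(key(row), None)
        elif code in (INSERT, EDIT):
            result[key(row)] = row
        # COPY: the before row is kept unchanged.
    return [result[k] for k in sorted(result)]


before = [{"COL_1": "A"}, {"COL_1": "B"}]
changes = [
    {"COL_1": "A", "CHANGE_CODE": DELETE},
    {"COL_1": "B", "CHANGE_CODE": COPY},
    {"COL_1": "C", "CHANGE_CODE": INSERT},
]
print(change_apply(before, changes, key_cols=["COL_1"]))
# [{'COL_1': 'B'}, {'COL_1': 'C'}]
```

Applying the change dataset from the Change Capture example to the same before dataset yields B and C, which is the after dataset that was used to compute the changes.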

Understanding the Change Capture Stage in DataStage

The Change Capture Stage is a processing stage in IBM InfoSphere DataStage, a data integration tool. It lets you identify what has changed between two versions of a dataset, which makes it a common part of ETL (Extract, Transform, Load) processes.

What is the Change Capture Stage?

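The Change Capture Stage compares a before dataset with an after dataset, for example the previous and the current extract of a database table, and identifies the rows that have been inserted, deleted, edited, or left unchanged. It passes these changes downstream as a change dataset, where they are available for further processing.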

When to Use the Change Capture Stage

How Does it Work?

The Change Capture Stage operates based on database timestamps or keys, and it can employ three methods to capture changes:

Example: Setting Up a Change Capture Stage

Conclusion

The Change Capture Stage in DataStage is a valuable asset for data integration tasks, offering efficient incremental extraction and ensuring data consistency. By understanding its operation and application, you can streamline your ETL processes and manage large datasets more effectively.
