InfoSphere DataStage offers flexibility when it comes to handling metadata. It can manage situations where the metadata is not fully defined.
When data moves from source to target, a job often needs to operate on only some of the columns. You can define part of your schema and specify that, if the job encounters additional columns at run time that are not defined in the metadata, it adopts those extra columns and propagates them through the rest of the job. This feature is called Runtime Column Propagation (RCP).
RCP can be enabled for a project in the Administrator client, and set for individual links on the Columns tab of the Output page for most stages, or on the General tab of the Output page for Transformer stages.
If runtime column propagation is enabled in the DataStage Administrator, you can select the Runtime column propagation option to specify that columns encountered by a stage in a parallel job can be used even if they are not explicitly defined in the metadata. Runtime column propagation must be turned on if you want to use schema files to define column metadata.
Runtime column propagation is useful when working with a partial schema: we define only the columns that need processing and want all other columns propagated to the target unchanged.
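The partial-schema behaviour can be modelled outside DataStage in a few lines. The following is a minimal sketch in plain Python, not DataStage code: `run_stage`, `double_amount`, and the sample column names are all hypothetical, and the `rcp_enabled` flag only imitates the effect of the option described above.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def run_stage(
    rows: Iterable[Row],
    defined_columns: List[str],
    transform: Callable[[Row], Row],
    rcp_enabled: bool,
) -> Iterator[Row]:
    """Hypothetical model of one stage: it sees only the defined columns."""
    for row in rows:
        # The stage logic is given only the columns named in its metadata.
        known = {name: row[name] for name in defined_columns}
        out = dict(transform(known))
        if rcp_enabled:
            # Columns missing from the metadata are carried through unchanged.
            for name, value in row.items():
                if name not in defined_columns:
                    out.setdefault(name, value)
        yield out


def double_amount(row: Row) -> Row:
    """Example stage logic that only knows about 'id' and 'amount'."""
    return {"id": row["id"], "amount": row["amount"] * 2}


source = [{"id": 1, "amount": 10.0, "region": "EU", "note": "first"}]

with_rcp = list(run_stage(source, ["id", "amount"], double_amount, True))
without_rcp = list(run_stage(source, ["id", "amount"], double_amount, False))
```

With the flag on, `region` and `note` reach the output even though the stage never mentions them; with it off, only `id` and `amount` survive.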
Runtime column propagation (RCP) provides DataStage with flexibility regarding the columns defined in a job.
If RCP is enabled for a project, you can just define the columns you are interested in using in a job, but ask DataStage to propagate the other columns through the various stages.
Such columns can then be extracted from the data source and arrive at the data target without being explicitly operated on in between. Note, however, that sequential files have no inherent column definitions, so DataStage cannot always tell where there are extra columns that need propagating.
To use RCP with sequential files, you must use the Schema File property to specify a schema that describes all the columns in the sequential file. You should specify the same schema file for any similar stages in the job where you want to propagate columns.
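The reason a full schema is needed can be illustrated with a small sketch, again in plain Python rather than DataStage, and not using the DataStage schema file syntax. The names `FULL_SCHEMA`, `PARTIAL_SCHEMA`, and `read_with_schema` are hypothetical. A delimited line carries only values, so the reader can name the extra fields only if the schema lists every column.

```python
import csv
import io
from typing import Dict, List

# Hypothetical schemas: the full one names every field in the file,
# the partial one names only the columns the job operates on.
FULL_SCHEMA = ["id", "amount", "region", "note"]
PARTIAL_SCHEMA = ["id", "amount"]


def read_with_schema(text: str, schema: List[str]) -> List[Dict[str, str]]:
    """Attach column names from the schema to each delimited line."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(schema):
            # Without names for the extra fields there is nothing to propagate.
            raise ValueError(
                f"schema has {len(schema)} columns, line has {len(fields)}"
            )
        rows.append(dict(zip(schema, fields)))
    return rows


data = "1,10.0,EU,first\n2,5.5,US,second\n"

rows = read_with_schema(data, FULL_SCHEMA)

try:
    read_with_schema(data, PARTIAL_SCHEMA)
    partial_ok = True
except ValueError:
    partial_ok = False
```

Here the full schema yields named columns that a later step could pass along, while the partial schema fails because the trailing fields cannot be identified.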
In the realm of ETL (Extract, Transform, Load) tools, DataStage by IBM stands out as a powerful solution for data integration. One of its key features is the ability to handle and propagate columns dynamically at runtime, a functionality we refer to as 'Runtime Column Propagation'. This article aims to shed light on this important aspect of DataStage.
Runtime Column Propagation is the ability for columns created or selected in one stage to be made available to subsequent stages within a job. This gives greater flexibility in designing and running data pipelines, because a job can pass along columns that were not known when it was designed.
Runtime Column Propagation can be particularly useful in scenarios where the structure of the input data is not fixed or predictable. For example:
To enable Runtime Column Propagation in DataStage, follow these steps: