Runtime Column Propagation in DataStage


WHAT IS RCP IN DATASTAGE?

InfoSphere DataStage is flexible about metadata. It can handle cases where the metadata is not fully defined.

When moving data from a source to a target, you sometimes want to define only the columns you actually need to work with. You can define part of your schema and specify that, if your job encounters additional columns that are not defined in the metadata when it runs, it adopts these extra columns and propagates them through the rest of the job. This feature is called Runtime Column Propagation (RCP).

RCP can be enabled for a project via the Administrator client, and set for individual links via the Output page Columns tab for most stages, or in the Output page General tab for Transformer stages.

RCP can be enabled or disabled at three levels:

Project level: Administrator client, project properties

Job level: Job properties, General tab

Stage level: Columns tab of the link's Output page

If runtime column propagation is enabled in the DataStage Administrator, you can select the Runtime column propagation option to specify that columns encountered by a stage in a parallel job can be used even if they are not explicitly defined in the metadata. Always make sure runtime column propagation is turned on if you want to use schema files to define column metadata.

Runtime column propagation is used with partial schemas: you define only the columns you need to process, and all other columns are propagated to the target as they are.
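The effect of RCP on a single stage can be illustrated with a small conceptual model. This is only a sketch in Python, not DataStage code or any DataStage API: the function `run_stage`, the `rcp_enabled` flag, and the sample columns are all hypothetical names chosen for illustration. It assumes a stage that knows about two columns and receives a row containing four.

```python
def run_stage(row, defined_columns, transform, rcp_enabled):
    """Conceptual model of one stage (hypothetical, not a DataStage API).

    `defined_columns` are the columns the stage's metadata declares.
    `transform` operates only on those columns.
    With RCP on, columns missing from the metadata pass through untouched;
    with RCP off, they are dropped from the output.
    """
    # The stage only "sees" the columns declared in its metadata.
    known = {name: row[name] for name in defined_columns}
    output = transform(known)

    if rcp_enabled:
        # Propagate every undeclared column unchanged.
        for name, value in row.items():
            if name not in defined_columns:
                output.setdefault(name, value)
    return output


# Sample row: the job metadata declares only "id" and "name".
row = {"id": 7, "name": "ada", "city": "Paris", "zip": "75001"}


def upper_name(cols):
    # The only explicit operation in the job: uppercase the name.
    return {"id": cols["id"], "name": cols["name"].upper()}


with_rcp = run_stage(row, ["id", "name"], upper_name, rcp_enabled=True)
without_rcp = run_stage(row, ["id", "name"], upper_name, rcp_enabled=False)

print(with_rcp)
print(without_rcp)
```

In this model, `city` and `zip` reach the output only when the flag is on, which mirrors the partial-schema behaviour described above.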

USING RCP WITH SEQUENTIAL STAGES

Runtime column propagation (RCP) allows DataStage to be flexible about the columns you define in a job.

If RCP is enabled for a project, you can just define the columns you are interested in using in a job, but ask DataStage to propagate the other columns through the various stages.

Such columns can be extracted from the data source and arrive at the data target without being explicitly operated on in between.

Sequential files, unlike most other data sources, do not have inherent column definitions, and so DataStage cannot always tell where there are extra columns that need propagating.

You can only use RCP on sequential files if you have used the Schema File property to specify a schema which describes all the columns in the sequential file.

You need to specify the same schema file for any similar stages in the job where you want to propagate columns. Stages that will require a schema file are:

Sequential File

File Set

External Source

External Target

Column Import

Column Export
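Why a sequential file needs a full schema can be seen in a rough Python sketch. This is a conceptual illustration only, not DataStage's schema file format or API: the helper `read_with_schema`, the sample data, and the column names are assumptions made for the example. A delimited line carries values but no column names, so the reader can only name (and therefore propagate) every field if something external describes all of them.

```python
import csv
import io


def read_with_schema(text, schema_columns):
    """Attach names to the fields of a delimited file (hypothetical helper).

    `schema_columns` plays the role of the schema file: it must describe
    every column in the file, not just the ones the job operates on.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(schema_columns):
            # Without a complete description, extra fields cannot be named.
            raise ValueError(
                f"expected {len(schema_columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(schema_columns, fields)))
    return rows


# Raw file content: no header, so the columns have no inherent definitions.
data = "7,ada,Paris\n8,bob,Rome\n"

# The full schema names all three columns, even if the job only uses "id".
schema = ["id", "name", "city"]

rows = read_with_schema(data, schema)
print(rows)
```

If the schema listed only `id`, the sketch would raise an error instead of guessing names for the remaining fields, which is the situation the Schema File property avoids.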