Join Stage in DataStage

Reading Time: 2 minutes

The Join stage is a processing stage. It performs join
operations on two or more data sets input to the stage and then outputs the
resulting data set.

The
Join stage is one of three stages that join tables based on the values of key
columns.

The
three stages differ mainly in the memory they use, the treatment of rows with
unmatched keys, and their requirements for data being input.

In
the Join stage, the input data sets are notionally identified as the
“right” set and the “left” set.

It
has any number of input links and a single output link.

The
data sets input to the Join stage must be key partitioned and sorted in
ascending order.

This
ensures that rows with the same key column values are located in the same
partition and will be processed by the same node.
The
stage can perform one of four join operations:

Inner It joins two or more
tables and returns only those records which satisfy join condition. Records
whose key columns do not contain equal values are dropped.

Left
outer
 transfers
all values from the left data set but transfers values from the right data set
and intermediate data sets only where key columns match. The stage drops the
key column from the right and intermediate data sets. puts NULL wherever
unmatched record from 2nd (Right) table.

Right
outer
 transfers
all values from the right data set and transfers values from the left data set
and intermediate data sets only where key columns match. The stage drops the
key column from the left and intermediate data sets. puts NULL wherever
unmatched record from 2nd (Left) table.

Full
outer 
 It
joins two or more tables and returns both matched and unmatched records from
all tables. (Full outer joins do not support more than two input links.)

Options:

Join
Keys/Key – Type: Input Column

Name
of input column you want to join on. Columns with the same name must appear in
both input data sets and have compatible data types.

Join
Type – Type of join operation to perform