Lookup Stage in DataStage

Reading Time: 4 minutes

The Lookup stage is a processing stage. It is used to perform lookup operations on a data set read into memory from any other Parallel job stage that can output data.
The Lookup stage is most appropriate when the reference data for all lookup stages in a job is small enough to fit into available physical memory. Each lookup reference requires a contiguous block of shared memory. If the Data Sets are larger than available memory resources, the JOIN or MERGE stage should be used.
Lookup stages do not require data on the input link or reference links to be sorted. Be aware, though, that large in-memory lookup tables will degrade performance because of their paging requirements.
Normal lookup: All the data from the database is read into memory, and then lookup is performed.
Sparse  lookup:  For  each  incoming  row  from  the  primary  link,  the  SQL  is  fired  on database at run time.
Range Lookup: Range Lookup is going to perform the range checking on selected columns.
For Example: — If we want to check the range of salary, in order to find the grades of the employee than we can use the range lookup.
Lets look at the example shown below.
Emp ID                  EmpName         Dept
1001                       John                     IT
1002                       Russ                      IT
1003                       Marie                    BS
Emp ID                  Salary                    Dept      Quarter
1001                       2000                       IT            Q1
1001                       3000                       IT            Q2
1001                       4000                       IT            Q3
Now if you use the lookup stage the with Emp iD as the key then the output  would be as below
EMp ID                 Salary                    Dept                      EmpName           Quarter
1001                       2000                       IT                            John                     Q1
But if you have a closer look at the data we can see that the reference table actually has three records for that ID. However your lookup stage actually only retrieved the one record. Now if you need to retrieve all 3 records for that ID then you will have to Go to -> constraints page of the lookup stage ->  ‘Multiple rows returned from link’ Select the reference link
This will modify your output as below
EMP ID                 Salary                    Dept                      EmpName           Quarter
1001                       2000                       IT                            John                     Q1
1001                       3000                       IT                            John                     Q2
1001                       4000                       IT                            John                     Q3

Only one reference link in the lookup stage can return multiple rows. This can’t be done for more than one reference link and can only be done for   in-memory lookups