DataStage OSH Script

The IBM InfoSphere DataStage and QualityStage Designer client creates IBM InfoSphere DataStage jobs that are compiled into parallel job flows, and reusable components that execute on the parallel Information Server engine. It allows you to use familiar graphical point-and-click techniques to develop job flows for extracting, cleansing, transforming, integrating, and loading data into target files, target systems, or packaged applications.

The Designer generates all the code: the OSH (Orchestrate Shell) script for the job and the C++ code for any Transformer stages used.
Briefly, the Designer performs the following tasks:
* Validates link requirements, mandatory stage options, Transformer logic, and so on.
* Generates the OSH representation of data flows and stages (stages are represented as framework “operators”).
* Generates transform code for each Transformer stage; this code is translated to C++ and then compiled into the corresponding native operators.
* Compiles reusable BuildOp stages (this can also be done from the command line).

The osh command is the main program of the InfoSphere Parallel Engine. DataStage uses it for several tasks, including parallel job execution and data set management. There is normally no need to run osh directly, but doing so can be useful for troubleshooting.

Three environment variables must be set before running osh:

  1. APT_ORCHHOME should point to the Parallel Engine location.
  2. APT_CONFIG_FILE should point to a configuration file.
  3. LD_LIBRARY_PATH should include the path to the Parallel Engine libraries.

Note: The library path variable (LD_LIBRARY_PATH) has a different name on some operating systems, such as LIBPATH on AIX or SHLIB_PATH on HP-UX. It does not need to be set in Windows environments.
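
As a minimal sketch of the setup above for a Bourne-style shell: the install root, the `lib` and `bin` subdirectories, and the `default.apt` file name are assumptions based on a typical installation, so substitute the values from your own system. The final one-line job (a `generator` operator piped into `peek`) is only attempted if an osh binary is actually found.

```shell
#!/bin/sh
# Assumed install root (hypothetical default); override DS_ROOT for your site.
DS_ROOT="${DS_ROOT:-/opt/IBM/InformationServer/Server}"

# 1. Parallel Engine location
export APT_ORCHHOME="$DS_ROOT/PXEngine"
# 2. Parallel configuration file (file name is an assumption)
export APT_CONFIG_FILE="$DS_ROOT/Configurations/default.apt"

# 3. Library search path: the variable name depends on the operating system
case "$(uname -s)" in
  AIX)   LIBVAR=LIBPATH ;;
  HP-UX) LIBVAR=SHLIB_PATH ;;
  *)     LIBVAR=LD_LIBRARY_PATH ;;
esac

# Prepend the engine library directory, keeping any existing value
eval "current=\${$LIBVAR:-}"
export "$LIBVAR=$APT_ORCHHOME/lib${current:+:$current}"

# Smoke test: only runs when osh exists at the assumed location
if [ -x "$APT_ORCHHOME/bin/osh" ]; then
  "$APT_ORCHHOME/bin/osh" "generator -schema record(a:int32) | peek"
fi
```

If the variables are correct, the smoke test should generate a few rows and write them to the job log; if osh cannot load its libraries, the library path variable is the first thing to check.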

A short introduction to OSH:

  • OSH (Orchestrate Shell) is the scripting interface of the parallel (PX) engine. It can be used for various operations, including debugging DataStage jobs to see what is happening without using the clients.
  • OSH uses the familiar syntax of the UNIX shell: operator name, schema, operator options (in “-name value” format), inputs (written n<, where n is the input number), and outputs (written n>, where n is the output number).
  • In generated OSH, a comment block introduces each operator. Operators appear in the order in which their stages were added to the canvas.
  • Virtual data sets (in-memory native representations of data links) are generated to connect operators.
  • For every operator, input and/or output data sets are numbered sequentially starting from zero.
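
To make the syntax notes above concrete, the following sketch writes a small hand-written OSH script to a file and runs it only if an osh binary is found under APT_ORCHHOME. The `generator` and `peek` operators are standard engine operators, but the file name `demo.osh`, the data set name `gen_out.v`, and the exact layout are illustrative assumptions, not output copied from the Designer.

```shell
#!/bin/sh
# Write an illustrative OSH script (file name and layout are assumptions).
OSH_FILE="${TMPDIR:-/tmp}/demo.osh"

cat > "$OSH_FILE" <<'EOF'
## Operator: generate 5 rows with a single integer column
generator
  -schema record(id:int32;)
  -records 5
  0> 'gen_out.v'
;
## Operator: print the rows read from the virtual data set
peek
  0< 'gen_out.v'
;
EOF

# Run the script only when the Parallel Engine is available in this shell.
if [ -n "${APT_ORCHHOME:-}" ] && [ -x "$APT_ORCHHOME/bin/osh" ]; then
  "$APT_ORCHHOME/bin/osh" -f "$OSH_FILE"
else
  echo "osh not found; script written to $OSH_FILE"
fi
```

Here `0>` is output 0 of `generator` and `0<` is input 0 of `peek`; the shared name `gen_out.v` is the virtual data set that connects the two operators.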

Framework (Information Server engine) terms have DataStage equivalents, and the GUI frequently uses terms from both paradigms. Runtime messages use framework terminology because the framework engine is where execution occurs. The following list shows how framework terms map to DataStage terms:
* Schema corresponds to table definition
* Property corresponds to format
* Type corresponds to SQL type and length
* Virtual data set corresponds to link
* Record/field corresponds to row/column
* Operator corresponds to stage

Note: The actual execution order of operators is dictated by the input/output designators, not by their placement on the diagram. Data sets connect the OSH operators; these are “virtual data sets”, that is, in-memory data flows. Because link names are used in data set names, it is good practice to give links meaningful names.
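
Because link names end up in data set names, one quick way to see how operators are wired together is to pull the data set names out of a generated script. The sketch below assumes the `Stage:link.v` naming pattern and the `n> [] '...'` / `n< [] '...'` layout commonly seen in Designer-generated OSH; the sample text, the stage and link names, and the helper `list_links` are made up for illustration.

```shell
#!/bin/sh
# Sample text shaped like generated OSH (stage and link names are hypothetical).
SAMPLE="${TMPDIR:-/tmp}/sample_generated.osh"

cat > "$SAMPLE" <<'EOF'
#### STAGE: Gen_Customers
generator
-schema record(id:int32;)
0> [] 'Gen_Customers:lnk_customers.v'
;
#### STAGE: Peek_Customers
peek
0< [] 'Gen_Customers:lnk_customers.v'
;
EOF

# Print "direction port stage link" for every input/output designator
# in the OSH file given as the first argument.
list_links() {
  sed -n \
    "s/^ *\([0-9][0-9]*\)\([<>]\) *\[[^]]*\] *'\([^:']*\):\([^']*\)\.v' *\$/\2 \1 \3 \4/p" \
    "$1"
}

list_links "$SAMPLE"
```

Each printed line shows the direction (`>` for output, `<` for input), the port number, the stage named in the data set, and the link name. With a meaningful link name such as `lnk_customers`, matching an output to the input that consumes it is straightforward.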