When transforming or cleansing data, you must define the data that you are working with.
You define the data by importing or defining table definitions. You can save the table definitions for use in your job designs.
Table definitions are the key to your DataStage project and specify the data to be used at each stage of a job. Table definitions are stored in the repository and are shared by all the jobs in a project. You need, as a minimum, table definitions for each data source and one for each data target in the data warehouse.
When you develop a job you will typically load your stages with column definitions from table definitions held in the repository. You do this on the relevant Columns tab of the stage editor. If you select the options in the Grid Properties dialog box, the Columns tab will also display two extra fields: Table Definition Reference and Column Definition Reference. These show the table definition and individual columns that the columns on the tab were derived from.
You can import, create, or edit a table definition using the Designer.
IMPORTING A TABLE DEFINITION
The easiest way to specify a table definition is to import it directly from the source or target database.
A new table definition is created and the properties are automatically filled in with the details of your data source or data target.
You can import table definitions from the following data sources:
- Assembler files
- COBOL files
- DCLGen files
- ODBC tables
- Orchestrate schema definitions
- Data sources accessed using certain connectivity stages.
- Sequential files
- Stored procedures
- UniData files
- UniData 6 tables
- UniVerse files
- UniVerse tables
- Web services WSDL definitions
- XML table definitions
- IBM InfoSphere DataStage connects to the specified data source and extracts the required table definition metadata. You can use the Data Browser to view the actual data in data sources from which you are importing table definitions.
To import table definitions in this way:
- Choose Import > Table Definitions > Data Source Type from the main menu.or most data source types, a dialog box appears enabling you to connect to the data source (for some sources, a wizard appears and guides you through the process).
- Fill in the required connection details and click OK. Once a connection to the data source has been made successfully, the updated dialog box gives details of the table definitions available for import.
- Select the required table definitions and click OK. The table definition metadata is imported into the repository.
In Datastage, Schemas are an alternative way for you to specify column definitions for the data used by parallel jobs.
The following schema format is used to read a fixed width file:
//Schema File is used to read Input data with out specifyinh metadata in the Sequential File stage
Now we use sequential file stage to read data from the datafile using schemafile.
In order to use schema file concept we need to enable runtime propagation in the job properties.
In the sequential file stage add Schema File option and give schema file name including path as shown like below.