Architecture of DataStage

DataStage integrates data on demand with a high-performance parallel framework, extended metadata management, and enterprise connectivity.

* Supports the collection, integration and transformation of large volumes of data, with data structures ranging from simple to highly complex.
* Offers a scalable platform that enables companies to solve large-scale business problems through high-performance processing of massive data volumes.
* Supports real-time data integration.
* Enables developers to maximize speed, flexibility and effectiveness in building, deploying, updating and managing their data integration infrastructure.
* Provides complete connectivity between any data source and any application.

What is IBM WebSphere DataStage?

• Design jobs for Extraction, Transformation, and Loading (ETL)
• Ideal tool for data integration projects such as data warehouses, data marts, and system migrations
• Import, export, create, and manage metadata for use within jobs
• Schedule, run, and monitor jobs, all within DataStage
• Administer your DataStage development and execution environments
• Create batch (controlling) jobs

DataStage is a comprehensive tool for the fast, easy creation and maintenance of data marts and data warehouses. It provides the tools you need to build, manage, and expand them. With DataStage, you can build solutions faster and give users access to the data and reports they need.

With DataStage you can:
• Design the jobs that extract, integrate, aggregate, load, and transform the data for your data warehouse or data mart.
• Create and reuse metadata and job components.
• Run, monitor, and schedule these jobs.
• Administer your development and execution environments.

DataStage follows a client-server architecture, although the exact client-server layout differs between versions of DataStage.

Client Components: DataStage has the following client components:

Administrator: This component provides a user interface for administering projects. It manages global settings and interactions with other systems. Administration tasks range from setting up users and project properties to adding, moving, and deleting projects. It also specifies general server defaults and purging criteria, and it provides a command interface to the DataStage repository. It is used to manage job scheduling options and user privileges, set parallel job defaults, and specify job monitoring limits.

Manager: DataStage Manager is the main interface for viewing and editing the contents of the DataStage repository. It lets you browse the repository and store and manage reusable metadata. It displays the table and file layouts, jobs, and transform routines defined in the project, and it handles the tasks related to managing the repository.

Designer: The Designer provides a user-friendly graphical design interface for creating DataStage jobs (applications). Each job explicitly specifies the source of the data, the required transforms, and the destination of the data. Jobs are compiled to form executable programs, which DataStage Director schedules and the server runs. Developers use this module to extract, cleanse, transform, integrate, and load data through a visual data flow method.
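To make the idea of a job concrete, here is a minimal sketch in plain Python of a design that names a source, an ordered list of transforms, and a destination, and is then "compiled" into one callable. This is only an analogy: the `Job` class, its `compile` method, and the sample functions are hypothetical names invented for illustration, not DataStage APIs, and real DataStage jobs are built graphically in the Designer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


@dataclass
class Job:
    """Hypothetical stand-in for a job design: source, transforms, target."""
    source: Callable[[], Iterable[Row]]
    transforms: List[Callable[[Row], Row]]
    target: Callable[[Iterable[Row]], int]

    def compile(self) -> Callable[[], int]:
        """Bundle the design into a single callable (loosely, the 'executable')."""
        def run() -> int:
            def stream() -> Iterator[Row]:
                for row in self.source():
                    # Apply each transform in the order the design lists them.
                    for step in self.transforms:
                        row = step(row)
                    yield row
            return self.target(stream())
        return run


def read_customers() -> List[Row]:
    # Illustrative in-memory source; a real job would read a file or table.
    return [{"name": " ada ", "country": "uk"}, {"name": "Grace", "country": "us"}]


def trim_name(row: Row) -> Row:
    return {**row, "name": row["name"].strip()}


def upper_country(row: Row) -> Row:
    return {**row, "country": row["country"].upper()}


warehouse: List[Row] = []


def load(rows: Iterable[Row]) -> int:
    # Illustrative target: append to a list and report how many rows arrived.
    before = len(warehouse)
    warehouse.extend(rows)
    return len(warehouse) - before


job = Job(source=read_customers, transforms=[trim_name, upper_country], target=load)
executable = job.compile()
rows_loaded = executable()
```

Keeping the design (`Job`) separate from the thing that runs (`executable`) mirrors the split described above: the Designer produces the definition, and scheduling and running happen elsewhere.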

Director: DataStage Director provides the interface for scheduling the executable programs formed by compiling jobs. It runs, validates, schedules, and monitors both server jobs and parallel jobs, which makes it central to operating parallel processing. The main users of this interface are testers and operators. Note: In later versions of DataStage, the Manager component is combined into DataStage Designer.
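The run and validate operations available in the Director can also be driven from the command line with the `dsjob` utility that ships with the engine. The sketch below only builds the argument list for such a call and does not execute anything. The helper name `build_dsjob_run`, the project name `dev_project`, the job name `load_customers`, and the `RUN_DATE` parameter are all hypothetical, and the flags shown (`-run`, `-mode`, `-param`, `-jobstatus`) should be checked against the `dsjob` documentation for your installed version.

```python
from typing import Dict, List, Optional

# Run modes passed to "dsjob -run -mode"; verify against your version's docs.
VALID_MODES = {"NORMAL", "RESET", "VALIDATE"}


def build_dsjob_run(
    project: str,
    job: str,
    mode: str = "NORMAL",
    params: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build (but do not execute) a dsjob command line for one job."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    cmd = ["dsjob", "-run", "-mode", mode]
    for name, value in (params or {}).items():
        # Each job parameter is passed as its own -param name=value pair.
        cmd += ["-param", f"{name}={value}"]
    # -jobstatus asks dsjob to wait and derive its exit code from the job status.
    cmd += ["-jobstatus", project, job]
    return cmd


# Hypothetical project, job, and parameter names, for illustration only.
cmd = build_dsjob_run(
    "dev_project",
    "load_customers",
    mode="VALIDATE",
    params={"RUN_DATE": "2024-01-31"},
)
```

On a machine where the engine is installed, the resulting list could be passed to `subprocess.run(cmd)`. Building the list separately keeps the command easy to log and review before anything runs.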

Server Components: DataStage has the following server components:

Engine tier: The engine tier includes the logical group of components (the InfoSphere Information Server engine components, service agents, and so on) and the computer where those components are installed. The engine runs jobs and other tasks for product modules.

Services tier: The services tier includes the application server, common services, and product services for the suite and product modules, and the computer where those components are installed. The services tier provides common services (such as metadata and logging) and services that are specific to certain product modules. The services tier also hosts InfoSphere Information Server applications that are web-based.

Metadata repository tier: The metadata repository tier includes the metadata repository. The metadata repository contains the shared metadata, data, and configuration information for InfoSphere Information Server product modules.