Data Modelling with Erwin

Reading Time: 3 minutes

Definition:

Data modeling is a process used to define and analyze data requirements needed to support the business processes. The process of data modeling involves professional data modelers working with business stakeholders, as well as potential users of the information system.

Types:

Conceptual data model:

It is a set of technology independent specifications about the data and is used to discuss initial requirements with the business stakeholders.

Logical data model:

The conceptual data model is translated into a logical data model, which documents structures of the data the can be implemented in databases. Implementation of one conceptual model may require multiple logical data models.

Physical data model:

This is the last step in data model where the logical data model is transformed into a physical data model that organizes the data into tables, and accounts for access, performance and storage details.

Data modeling not only defines data elements, but also their structures and relationships between them.

Usage:

  • To assist business analysts, programmers, testers, manual writers, IT package selectors, engineers, managers, related organizations, clients to understand and use an agreed semi-formal model concepts of the organization and how they related to each other.
  • To manage data as a resource
  • For the integration of information systems
  • For designing databases/data warehouses (aka data repositories)

Erwin Data Modeler

CA Erwin Data modeler is a software tool for data modeling (data requirements analysis, database design etc.,) of custom developed information systems, including databases of transactional systems and data marts.

Erwin’s data modeling engine is based upon the IDEF1X method, although it now supports diagrams displayed with information engineering notation as well.

Features:

Conceptual data modeling

Logical data modeling

Physical data modeling

Logical to physical transformation

Includes an abbreviation/naming dictionary called “Naming standards editor” and a logical to RDBMS data type mapping facility called “Data type standards editor”, both of which are customizable with entries and basic rule enforcement

Forward engineering

Once the database designer is satisfied with the Physical data model, the tool can automatically generate a SQL Data definition language (DDL) script that can be directly executed in a RDBMS environment or saved to a file.

Reverse engineering

If an analyst needs to examine and understand an existing data structure, ERwin will depict the physical data base objects in an Erwin model file.

Model to model comparison

The ‘complete/compare” facility allows an analyst or a designer to view the differences between two model files (including real-time reverse engineered files), for instance to understand the changes between two versions of a model.

An UNDO feature is available from version 7.

Bulk editor: CA Erwin data modeler has a unique bulk editor feature that enable users to edit a collection of objects in a single editing session.

Active Model Templates: Derive new models from template models. Change in template model can be automatically synchronized into all derived models.

API (Application programmable interface) support: Erwin provides a programming interface to automate tasks that involves reading/writing of data models.

Domains (Column names): Erwin has a concept of domains that defines the characteristics of attributes/columns in a model. It acts as a library of attributes/columns so it can be reused across entities/tables and centrally managed through a flexible property inheritance/over-ride mechanism.

Model repository and version management: Erwin has a very powerful integrated model repository for change management (Workgroup edition) that support metadata governance, versioning, multi-user modeling etc.,

Trial version of the Erwin software can be downloaded at erwin.com

SMP vs MPP Architecture

Reading Time: 2 minutes

Databases such as Oracle, DB2, Sybase

Symmetrical Multi-Processing Architecture (SMP) was once the champion of the Data Warehouse. It was rivalled by MPP as SMP had the below disadvantages:

  • Failures of components did not result in graceful decline in performance. Rather, the whole system failed and data was unrecoverable until the failure was resolved
  • Upon recover, the failed components were unable to rejoin the system with ease
  • Failures should not result in data loss
  • The system should be scalable and hence support increased load capacity and performance agility

MPP Architecture = Share Nothing = Divide and Conquer

Databases such as Teradata

MPP systems consist of very large numbers of processors each processor has its own memory, backplane and storage. The no shared-resources approach of pure MPP systems allows nearly linear scalability

High availability is another advantage – when one node fails, another can take over.

In Teradata’s MPP Architecture , processor-RAM-storage disk pairs (“nodes”) operating in parallel divide the workload to execute queries over large sets of data. Each processor communicates with its associated disk drive to get raw data and perform calculations. One SMP Host collects intermediate results and assemble the query response for delivery back to the requesting application. With no contention for resources between MPP nodes, this architecture does allow for scalability to petascale database sizes. A major weakness of this architecture, however, is that it requires significant movement of data from disks to processors for BI queries. MPP Architecture

To Continue to dive dep into Teradata

Get full Teradata Course free

Learn About Netezza datbase

Netezza

Reading Time: 3 minutes

  • Creating User and User Management
  • Managing Workloads

our OTHER COURSES

CLOUD COMPUTING

Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user

BIG DATA

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

DATA WAREHOUSING

Data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. 

WEB DEVELOPMENT

Web development is the work involved in developing a Web site for the Internet or an intranet. It can range from a simple single static page to complex Internet applications.

APP DEVELOPMENT

Mobile app development is the act or process by which a mobile app is developed for mobile devices, such as personal digital assistants, enterprise digital assistants or mobile phones.

SOFTWARE TESTING

Software testing is an investigation conducted to provide stakeholders with information about the quality of the software product or service under test.

NPS AMPP Architecture & Various Netezza 7.2 appliance models

  • Netezza High Availability Architecture (Clustering, Mirroring, failover)
  • Installing the Netezza system and client software
  • Installing Netezza Emulator for day-to-day practice
  • Nz Admin: GUI Admin Tool (Installation & Set-up)


Netezza Command Line Interface (CLI)

  • Manage NPS with CLI commands
  • Manage User access to Netezza Databases
  • Monitoring Netezza and Linux logs
  • Netezza Events (Set-up & Monitoring)


Databases & Tables

  • Data Distribution (Hash, Random), Cluster Base Tables, Table Skew
  • Generate Statistics, Zone Maps, Materialized Views, Groom Table
  • Backup & Restore (Host Level, Database Level, Table Level)
  • Database Refreshes & Migrations

Netezza Appliance migration (For Example: 6.x to 7.x migration)

  • Data Loading/Unloading using External Tables, NZLOAD, NZ_MIGRATE
  • Data Loading/Unloading using GUI Tools
  • Optimizer and query plans
  • Query history collection & Reporting


Netezza Replication/DR Architecture

  • Netezza performance improvement techniques
  • Common DBA activities such as SPU replacements, etc
  • ODBC/JDBC/OLEDB Client Connectivity
  • Working with IBM Netezza Support to resolve issues
Complete Netezza Course
Take your journey towards learning Netezza in just 30 days, just 1 hour a day. Joins 1000s  of others who have benefited by upskilling and boosting their careers
Free Netezza Course
Netezza TwinFin Arch.
Learn about Netezza TwinFin Architecture and how it is unique
TwinFin Architecture
Sequences
What are Sequences and how are they used to generate unique IDs
Netezza Sequences
Netezza Overview
Quick introduction to what is Netezza and how is it used in Data Warehousing
Netezza Overview
nzstats command
using nzstats to gather statistics and why is it useful
nzstats

Netezza System Tables/ Views

Reading Time: 3 minutes

Netezza System Tables and Views

Below are the list of some commonly used system tables and views:
View Table Description
_v_sys_columns Return a list of all columns of table available in database. This is very important system view that can be used to search columns.
_v_aggregate _t_aggregate Returns a list of all defined aggregates
_v_synonym _t_synonym Returns a list of all synonyms in database
_v_objects Lists the all objects like tables, view, functions
_v_qrystat Returns a query status
_v_qryhist Returns query history
_v_database _t_database Returns a list of all databases
_v_datatype Returns a list of all system data types
_v_function Returns a list of all defined functions
_v_group _t_group Returns a list of all groups
_v_groupusers Returns a list of all users of a group
_v_index _t_index Returns a list of all user indexes
_v_operator _t_operator Returns a list of all defined operators
_v_procedure Returns a list of all the stored procedures and their attributes
_v_relation_column Returns a list of all attributes of a relation, Constraints and other informations
_v_relation_column_def Returns a list of all attributes of a relation that have defined defaults
_v_sequence Returns a list of all defined sequences
_v_session Returns a list of all active sessions
_v_table Returns a list of all user tables
_v_table_dist_map Returns a list of all fields that are used to determine the table’s data distribution
_v_table_index Returns a list of all user table indexes
_v_user Returns a list of all users
_v_usergroups Returns a list of all groups of which the user is a member
_v_view Returns a list of all user views
_v_load_status Display the information about the progress of loads that are running on the system

System table or view in Netezza that shows the number of records for each table

SELECT TABLENAME,
       OBJTYPE,
       OWNER,
       CREATEDATE,
       USED_BYTES,
       USED_BYTES/1073741824 as USED_GB, 
       RELTUPLES as "ROWS"
 FROM _V_TABLE_ONLY_STORAGE_STAT
 WHERE OBJCLASS = 4905 OR OBJCLASS = 4911
 ORDER BY TABLENAME;
Netezza System Tables and Views Examples
Example queries SYSTEM.ADMIN(ADMIN)=> select SEQNAME,SEQNAME,OWNER from _v_sequence limit 2; SEQNAME | SEQNAME | OWNER -------------+-------------+------- _S_REPL_CSN | _S_REPL_CSN | ADMIN _S_HWID | _S_HWID | ADMIN (2 rows) Views SELECT * FROM _V_SYS_VIEW; will give you the list of system views. SELECT * FROM _V_VIEW; will also list some of the system views plus any user-defined views. V_OBJECTS SYSTEM.ADMIN(ADMIN)=> select * from _v_objects; OBJID | OBJNAME | OWNER | CREATEDATE | OBJTYPE | OBJCLASS | DESCRIPTION | SCHEMA | SCHEMAID ——–+———+——-+———————+———-+———-+————-+——–+———- 213243 | SEQ1 | ADMIN | 2016-09-04 07:25:19 | SEQUENCE | 4909 | | ADMIN | 6 (1 row)