Netezza CREATEXID DELETEXID

Reading Time: 2 minutes

Note: TRUNCATING TABLE cannot be UNDONE

Note: Netezza “soft” deletes until the groom process runs.  i.e. deleted data is really still there.  

Note: Netezza Updates are just soft deletes and inserts. So, they can be recovered too

  • Every insert, update, or delete transaction is assigned a sequential transaction id or xid.
  • To indicate which transaction last updated the row, createxid indicates the latest transaction for each row
  • To indicate deleted rows deletexid, assigned to each row is set to 0 if this is a readable (not a deleted row).  When the row gets deleted, this column gets populated with the transaction id assigned to the delete or update statement
  • Updates are essentially deletes that set the transaction id as deletexid to old row and createxid in the new row
  • Groom would make the soft deletes final

Typically, when you run a select you will not see rows that have a deletexid not equal to zero.  To change that, set the below option

set show_deleted_records = true

How to restore Netezza records

Here is a simple select statement:

select createxid,deletexid, * from table
Netezza Createxid and Deletexid

If you want to see just the deleted rows , just select where deletexid is not zero

select createxid,deletexid, * from table where deletexid !=0

Once you can see your deleted data, and figure out which transaction you are trying to undo, you can simply re-insert the data

insert into table
select * from table where deletexid=142233; –transaction id from delete.

To undo an update, just re-insert the deleted rows and delete the inserted rows.

insert into table 
select * from table where deletexid=146443 ;–transaction id from update
delete from table where createxid=146443; –transaction id from update

To undo an insert, use the below SQL

delete from table where createxid=1434443; –transaction id from insert

Is Netezza near end of life?

Reading Time: 3 minutes

Appliances that end of support

IBM has announced the end-of-support (EOS or End-of-life) dates for three generations of Netezza / IBM PureData analytics appliances. Additionally, Netezza Skimmer, Twinfin, Striper models have already passed their EOS date before the mid of 2020. Mako models will reach end of support by October 2023.

Model NameModel NumberGA DateEoM DateEoS Date
SkimmerN100-104/11/201120/06/201430/06/2019
Twinfin-3N1001-00204/11/201120/06/201430/06/2019
Twinfin-6N1001-00504/11/201120/06/201430/06/2019
Twinfin-12N1001-01004/11/201120/06/201430/06/2019
Twinfin-24N1001-02004/11/201120/06/201430/06/2019
Twinfin-36N1001-03004/11/201120/06/201430/06/2019
Twinfin-48N1001-04004/11/201120/06/201430/06/2019
Twinfin-72N1001-06004/11/201120/06/201430/06/2019
Twinfin-96N1001-08004/11/201120/06/201430/06/2019
Striper-3N2001-00501/02/201316/05/201430/06/2019
Striper-6N2001-01001/02/201316/05/201430/06/2019
Striper-12N2001-02001/02/201316/05/201430/06/2019
Striper-24N2001-04001/02/201316/05/201430/06/2019
Striper Gen 2N2002-00205/05/201530/06/201530/06/2020
Striper Gen 2N2002-00505/05/201530/06/201530/06/2020
Striper Gen 2N2002-01005/05/201530/06/201530/06/2020
Striper Gen 2N2002-02005/05/201530/06/201530/06/2020
Striper Gen 2N2002-04005/05/201530/06/201530/06/2020
Mako 1/4 RackN3001-00217/10/201410/04/201810/04/2023
Mako-3N3001-00517/10/201410/04/201810/04/2023
Mako-6N3001-01017/10/201410/04/201810/04/2023
Mako-12N3001-02017/10/201410/04/201810/04/2023
Mako-24N3001-04017/10/201410/04/201810/04/2023

IBM is not extending support

IBM is NOT offering an extended support service beyond their announced end-of-support date. When Netezza devices are reaching end of life, some companies are choosing to temporarily support it with their existing team without help from IBM until the final strategic solution is in place. Also, there are 3rd parties who are supporting service for these devices. examples are

  1. https://www.natrinsic.com/netezza/?hsLang=en
  2. Data Warehouse Support Services with Smart Associates (smart-associates.biz)

Most are finding a replacement

Most companies I know are finding a replacement solution and migrating their data warehouses. Snowflake or Redshift are the two most common choices I have heard. What are your companies migrating toward?

Staying with IBM Netezza

What happened to Netezza? - Journey to AI Blog (ibm.com)

Read the blog entry from IBM. They are not strategically moving away from Netezza technology itself. But the popular Netezza appliance that enabled a lot of companies achieve their Data warehousing strategy is changing. They are suggesting these 2 major alternatives:

  • IBM® Netezza® Performance Server for the cloud - support current workloads but is adapted for cloud performance
  • For on premises, in the IBM Cloud Pak® for Data System hyperconverged architecture, Netezza Performance Server clients connect to the Netezza Performance Server host, which runs as a Docker container named ipshost1 on the system Control Plane node.
Netezza migration options

Move to the cloud

In conclusion, most solutions are moving toward cloud workloads and if your industry is still hesitant to move towards cloud due to regulatory, privacy or lesser control, you could find on premises alternatives. Most client I have heard of are moving to AWS (Redshift) or Snowflake on (AWS or Azure) as IBM stops support and trying to get you into their IBM cloud solution.

Complete Netezza Course
Take your journey towards learning Netezza in just 30 days, just 1 hour a day. Joins 1000s  of others who have benefited by upskilling and boosting their careers
Free Netezza Course
Netezza TwinFin Arch.
Learn about Netezza TwinFin Architecture and how it is unique
TwinFin Architecture
Sequences
What are Sequences and how are they used to generate unique IDs
Netezza Sequences
Netezza Overview
Quick introduction to what is Netezza and how is it used in Data Warehousing
Netezza Overview
nzstats command
using nzstats to gather statistics and why is it useful
nzstats

Netezza Temporary Table Space

Reading Time: 2 minutes

In IBM PureData System for Analytics, each disk is roughly equally divided into 3 partitions:
1. The first partition holds the primary copy of one dataslice
2. The second partition holds a mirror copy of a dataslice from a different disk.
3. The temp partition is used to store data that is generated during query execution. Temporary tables, intermediary results, etc.

• When we use temporary tables, it is this third partition, the temp space, that we risk filling up.
• It is important to only create the minimum number of temporary tables needed at a time. It is also important to only make the temporary tables as large as needed.
• Temporary tables are local to a session rather than a specific query. This means that the table may be used repeatedly within a user session. That is the major difference between Temporary tables (multiple use) and derived tables (single use). A Temporary table is materialized in swap space. However, it is not discarded until the session ends or when the user manually drops it.

Storage layers


When explicitly creating large temporary tables used in JOINs it is recommended to GENERATE STATISTICS on the temporary table.

Netezza needs less Administration

Reading Time: < 1 minute
Netezza vs Tradistional RDBMS
Logical Model is enough to create a Table
No Indexes, tuning, rebuilding of indexes or space for the indexes
No Physical tuning
Distribute data by columns or by RANDOM
No Storage Adiminstration (No tablespace or dbspace allocation)
No page/block sizing configuration
No Software installation needed
No need for a large team of Database administrators

Netezza AMPP Architecture

Reading Time: 2 minutes

Primary Tier

  • Netezza’s AMPP architecture is a two-tiered system designed to handle very large queries from multiple users. The first tier is a high-performance Linux SMP host. A second host is available for fully redundant, dual-host configurations.
  • The host compiles queries received from applications, and generates query execution plans.
  • It then divides a query into a sequence of sub-tasks, or snippets, that can be executed in parallel, and distributes the snippets to the second tier for execution. The host returns the final results to the requesting application.

Secondary Tier

  • The second tier consists of dozens to hundreds or thousands of Snippet Processing Units (SPUs) operating in parallel.
  • Each SPU is an intelligent query processing and storage node, and consists of a powerful commodity processor, dedicated memory, a disk drive and a field-programmable disk controller with hard-wired logic to manage data flows and process queries at the disk level.
  • The massively parallel, shared-nothing SPU blades provide the performance advantage of MPP.

Overall

  • Nearly all query processing is done at the SPU level, with each SPU operating on its portion of the database.
  • All operations that lend themselves easily to parallel processing including: record operations, parsing, filtering, projecting, interlocking and logging, are performed by the SPU nodes, significantly reducing the amount of data required to be moved within the system.
  • Operations on sets of intermediate results, such as sorts, joins and aggregates, are executed primarily on the SPUs, but can also be done on the host, depending on the processing cost and complexity of that operation. 

Netezza Query Plan Analysis

Reading Time: 5 minutes

Netezza Explain Plan

As with most database management systems, Netezza generates a query plan for all queries executed in the system. The question plan determines ideal execution path by Netezza to fulfill each query. The Netezza component which generates and determines the optimal query path from the available alternatives is called the query optimizer and it relies on the number of data available about the database objects involved in the query executed. The Netezza query optimizer calculates the cost for the different execution path and chooses the path with the least cost.


Optimizer

The optimizer depends on the statistics about the database objects and can only give the best results when it has the most up to date results. It relies on:

  • Column data dispersion such as distinct values, unique value, and null values
  • # of rows in the tables
  • Minimum and maximum values stored in columns involved in the where clause or having clause of the query
  • Number of extents in each tables and the total number of extends on the data slice with the largest skew


Given that the optimizer depends on the statistics to decide the best execution plan, it is imperative to update the database statistics using the “GENERATE STATISTICS” commands. In addition, the optimizer considers the FPGA capabilities of Netezza to determine the ideal plan.


When coming up with the plan for execution, the optimizer looks for

  • optimal path for data scan operations i.e. to read data from the tables
  • optimal path to join the tables in the query like hash join, merge join, nested loop join
  • optimal path to distribute data in between SPUs like redistribute or broadcast
  • The order in which tables can be joined in a query join involving multiple tables
  • Opportunities to rewrite queries to improve performance like
    • Pull up of sub-queries
    • Push down of tables to sub-queries
    • De-correlation of sub-queries
    • Expression rewrite

Query Plan

Netezza breaks query execution into units of work called snippets which can be executed on the host or on the snippet processing units (SPU) in parallel. Queries could have either one or multiple snippets which will be executed in sequence on the host or the SPUs depending on what is done by the snippet code. The snippet code generated for SQL query execution helps with scanning and retrieving data from a table, sorting data, performing data aggregation, joining data retrieved from tables, grouping of data, dynamic distribution of data to help query performance. When the SQL query is executed, Netezza dynamically prepares the execution plan and the C code to execute each snippet of the plan. 

Netezza Explain Verbose

Plan Generation

The recent execution plans are stored in the data.<ver>/plan directory under the /nz/data directory. The snippet C code is stored under the /nz/data/cache directory and this code is used to compare against the code for new plans so that the compilation of the code can be eliminated if the snippet code is already available.

Apart from Netezza generating the execution plan dynamically during query execution, users can also generate the execution plan (without the C snippet code) using the “EXPLAIN” command. This process will help users identify any potential performance issues by reviewing the plan and making sure that the path chosen by optimizer is inline or better than expected. During the plan generation process, the optimizer may perform statistics generation dynamically to prevent issues due to out of statistics data particularly when the tables involved store large volume of data. The dynamic statistics generation process uses sampling which is not as perfect as generating statistics using the “GENERATE STATISTICS" command which scans the tables.

Joins 

• Use Same Data Type and Length for columns that are often used for joining tables, so that the query execution can be efficient which in turn helps queries execute faster 

• The largest table should be joined last in the query. 

• Use joins over correlated sub queries.   

Constraints 

Define all constraints and relationships between objects. Even though Netezza doesn’t enforce them other than the not null constraint and defaults, the query optimizer will still use these details to come-up with an efficient query execution plan.

Complete Netezza Course
Take your journey towards learning Netezza in just 30 days, just 1 hour a day. Joins 1000s  of others who have benefited by upskilling and boosting their careers
Free Netezza Course
Netezza TwinFin Arch.
Learn about Netezza TwinFin Architecture and how it is unique
TwinFin Architecture
Sequences
What are Sequences and how are they used to generate unique IDs
Netezza Sequences
Netezza Overview
Quick introduction to what is Netezza and how is it used in Data Warehousing
Netezza Overview
nzstats command
using nzstats to gather statistics and why is it useful
nzstats

Netezza storage

Reading Time: 3 minutes
Netezza storage disks

The primary partition in each disk is used to store user data like database tables, the mirror stores a copy of the primary partition of another disk so that it can be used in the event of disk failures and the temp/swap partition is used to store the data temporarily like when the appliance does data redistribution while processing queries. The logical representation of the data saved in the primary partition of each disk is called the data slice. When users create database tables and loads data into it, they get distributed across the available data slices. Logical representation of data slices is called the data partition.

For TwinFin systems each S-Blade or SPU is connected to 8 data partitions and some only to 6 disk partitions (since some disks are reserved for failovers). There are situations like SPU failures when a SPU can have more than 8 partitions attached to it since it got assigned some of the data partitions from the failed SPU.

The SPU is connected to 8 data partitions numbered 0 to 7. Each data partition is connected to one data slice stored on different disks.

Netezza SPU S-Blade Disks

What happens when a disk fails - Immediately after the disk stops responding, the disk with the mirror will be used by the system to satisfy queries for data from primary and mirror data. This will also create a bottleneck which in-turn impacts query performances. In the meantime, the contents in the failed disk  is regenerated on one of the spare disks in the disk array. Once the regen is complete the SPU data partition is updated to point to the data slice on the new disk. The regen process removes the bottleneck of disk to perform optimally. In the situation where a SPU fails, the appliance assigns all the data partitions to other SPUs in the system. Pairs of disks which contains the mirror copy of each other’s data slice will be assigned to other SPUs which will result in additional two data partitioned to be managed by the target SPU.

Netezza Security

Reading Time: 3 minutes

Two levels of security:

  • OS Level security
  • Database Level security

OS Level security

Netezza host uses Linux operating system customized for performance and functionality required for the appliance. User access restrictions are setup using user ids, user groups and passwords. When installed, the appliance is configured with a “root” user id which is the Linux super user and the user id “nz” which is the Netezza system administrator id which is used to run Netezza on the host. The “root” user id can be used to create other user ids for users who need to access the appliance natively through the host command shell. Since the host access is required to perform very restricted tasks primarily administration tasks, the number of user ids created to access the appliance should be fairly small. Restrictions on what users can perform can be set by creating Linux user groups with different access restrictions and attaching the relevant users to the groups. Setting password selection rules like mix of alphabets, numbers, special characters, minimum password length etc. along with password expiry for users is a good practice.

Database Level Security

Access to databases is controlled using user ids and passwords which are separate from the OS level user id and password. If an user need to be able to access to a Netezza database natively through a “nzsql” session on the host, the user need to use a OS level user id and password to log in to the host and then need to invoke the “nzsql” command using the database level user id and password which has access to the particular database of interest. Access to databases, objects with in a database and the type of activities which can be performed on them are all controlled by the privileges granted to the user id to perform the task. Netezza also supports user groups as with the Linux operating system where privileges can be assigned to groups and similar users can be attached to the group so that it is easier to manage access to databases. When a user id is attached to more than one group the user id gets combination of all the privileges assigned to the groups to which the user id is attached to. The following is a sample “create user” statement

create user user122 with password ‘#rws23$sbWeXO@112’; 

The following is a sample “create group” statement

create group bdmodeller with user user1, user2, user3;

What is Netezza

Reading Time: 5 minutes
IBM Netezza

Netezza Twinfin is the advanced analytics and warehousing solution provided by IBM. It currently has been rebranded as IBM Puredata for analytics (PDA).

  • It was an offering from a company known as Netezza launched in 1999 and then got acquired by IBM in the year 2010. Ever since it has been developed as a subsidiary of IBM.
  • It is based on the AMPP (asymmetric massively parallel processing) architecture which has an SMP frontend to get the queries from the client and communicate with the MPP backend to do the processing
  • IBM Netezza Analytics’ advanced technology supports data warehousing and in-database analytics into a scalable, high-performance, massively parallel advanced analytic platform that is designed to work with petascale data volumes.

Netezza utilizes a restrictive design called Asymmetric Massively Parallel Processing (AMPP) which joins the enormous information processing proficiency of Massively Parallel Processing (MPP) where nothing (CPU, memory, stockpiling) is shared and symmetric multiprocessing to arrange the equal processing. The MPP is accomplished through an array of S-Blades which are workers on its own running its own working frameworks associated with plates. While there might be different items which follow comparable design, one extraordinary equipment part utilized by Netezza called the Database Accelerator card which is joined to the S-Blades. These quickening agent cards can play out a portion of the question processing stages while information is being perused from the circle rather than the processing being done in the CPU. Moving huge measure of information from the circle to the CPU and playing out all the phases of question processing in the CPU is one of the significant bottlenecks in the huge numbers of the data set administration frameworks utilized for information warehousing and investigation use cases.

The fundamental equipment segments of the Netezza machine are a host which is a Linux worker, which can convey to an array of S-Blades every one of which has 8 processor centers and 16 GB of RAM running Linux working framework. Every processor in the S-Blade is associated with plates in a circle array through a Database Accelerator card which utilizes FPGA innovation. Host is additionally liable for all the customer collaborations to the apparatus like dealing with information base questions, meetings and so on alongside dealing with the meta-information about the items like data set, tables and so on put away in the apparatus. The S-Blades among themselves and to the host can convey through an exclusively fabricated IP based superior organization.

Netezza S-Blades

The S-Blades are likewise alluded as Snippet Processing Array or SPA in short and every CPU in the S-Blades joined with the Database Accelerator card appended to the CPU is alluded as a Snippet Processor.

Netezza Hardware
Netezza S-Blades

Let us use the example of a Data Warehouse for a huge retail firm and one of the tables store the insights concerning the entirety of its 10 million clients. Likewise expect that there are 25 columns in the tables and the absolute length of each table column is 250 bytes. In Netezza the 10 million client records will be stored fairly equally across all the disks available in the disk arrays connected to the snippet processors in the S-Blades in a compressed form. At the point when an user queries for state Customer Id, Name and State who joined the retail firm in a specific period arranged by state and name, the below is how the processing will occur:

  • The host gets the query, parses and confirms the question, makes the code to be executed to by the snippet processors in the S-Blades and passes the code for the S-Blades
  • The snippet processors execute the code and as a feature of the execution, the information block which stores the information needed to fulfill the inquiry in a compacted structure from the circle connected to the snippet processor will be added something extra to memory. The Database Accelerator card in the snippet processor will un-pack the information which will remember all the sections for the table, at that point it will eliminate the undesirable segments from the information which in the event that will be 22 segments for example 220 bytes out of the 250 bytes, applies the where proviso which will eliminate the undesirable columns from the information and passes the modest quantity of the information to the CPU in the snippet processor. In customary information bases every one of these means are acted in the CPU.
  • The CPU in the snippet processor performs assignments like accumulation, whole, sort and so forth on the information from the information base quickening agent card and parses the outcome to the host through the organization.
  • The host combines the outcomes from all the S-Blades and plays out extra advances like arranging or accumulation on the information prior to imparting back the end-product to the customer.

Conclusion

  • The Netezza can deal with huge volume of information in equal and the key is to ensure that the information is disseminated properly to use the enormous equal processing.
  • Execute plans such that the vast majority of the processing occurs in the snippet processors; limit correspondence between snippet processors and insignificant information correspondence to the host.
FPGA Accelerator

Netezza

Reading Time: 3 minutes

  • Creating User and User Management
  • Managing Workloads

our OTHER COURSES

CLOUD COMPUTING

Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user

BIG DATA

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

DATA WAREHOUSING

Data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. 

WEB DEVELOPMENT

Web development is the work involved in developing a Web site for the Internet or an intranet. It can range from a simple single static page to complex Internet applications.

APP DEVELOPMENT

Mobile app development is the act or process by which a mobile app is developed for mobile devices, such as personal digital assistants, enterprise digital assistants or mobile phones.

SOFTWARE TESTING

Software testing is an investigation conducted to provide stakeholders with information about the quality of the software product or service under test.

NPS AMPP Architecture & Various Netezza 7.2 appliance models

  • Netezza High Availability Architecture (Clustering, Mirroring, failover)
  • Installing the Netezza system and client software
  • Installing Netezza Emulator for day-to-day practice
  • Nz Admin: GUI Admin Tool (Installation & Set-up)


Netezza Command Line Interface (CLI)

  • Manage NPS with CLI commands
  • Manage User access to Netezza Databases
  • Monitoring Netezza and Linux logs
  • Netezza Events (Set-up & Monitoring)


Databases & Tables

  • Data Distribution (Hash, Random), Cluster Base Tables, Table Skew
  • Generate Statistics, Zone Maps, Materialized Views, Groom Table
  • Backup & Restore (Host Level, Database Level, Table Level)
  • Database Refreshes & Migrations

Netezza Appliance migration (For Example: 6.x to 7.x migration)

  • Data Loading/Unloading using External Tables, NZLOAD, NZ_MIGRATE
  • Data Loading/Unloading using GUI Tools
  • Optimizer and query plans
  • Query history collection & Reporting


Netezza Replication/DR Architecture

  • Netezza performance improvement techniques
  • Common DBA activities such as SPU replacements, etc
  • ODBC/JDBC/OLEDB Client Connectivity
  • Working with IBM Netezza Support to resolve issues
Complete Netezza Course
Take your journey towards learning Netezza in just 30 days, just 1 hour a day. Joins 1000s  of others who have benefited by upskilling and boosting their careers
Free Netezza Course
Netezza TwinFin Arch.
Learn about Netezza TwinFin Architecture and how it is unique
TwinFin Architecture
Sequences
What are Sequences and how are they used to generate unique IDs
Netezza Sequences
Netezza Overview
Quick introduction to what is Netezza and how is it used in Data Warehousing
Netezza Overview
nzstats command
using nzstats to gather statistics and why is it useful
nzstats