Netezza uses the cost based optimizer to determine best methods determine redistribution, scan, join, join orders. Netezza query optimizer relies on the statistics from catalog to come up with an optimal query execution plan. So it is imperative that the statics need to be kept current without which the execution plan generated by the optimizer may be sub-optimal resulting in poor query performance. System uses the Netezza GENERATE STATISTICS command to generate information about each column proportion of the duplicate values, unique values, NULL values and the maximum and minimum values. The optimizer uses statistics information to determine the most efficient way to run a query.
In this article, we will cover hide
How Statistics are collected?Netezza system collects statistics in various ways:
- Statistics are automatically generated for certain database operations
- Collect full statistics by running Netezza GENERATE STATISTICS command on database or on individual tables. This operation is required when table changes significantly, say at least 10%.It usually collects all level of statistics, including dispersion values.
- You can generate statistics You should maintain the table by running Netezza GENERATE STATISTICS SQL command on regular basis.
- Just In Time Statistics (JIT): Runs at every execution time using sampling and zone maps
How Statistics are used?Some examples of how the statistics can be used by the optimizer
- If a column is null able additional code may be generated to check whether the value is null or not
- Knowing the number of records in the table, min and max values along with the number of distinct values can help estimate the number of relevant records which can be returned for the query assuming there is uniform distribution
- Based on the min and max values the optimizer can determine the type of math required to be performed like 64 or 128 bit computation.
Automatic Netezza Database StatisticsThe Netezza system automatically generates two ‘Base Statistics’ (table row count and min/max values for non-character columns) when performing certain database operations:
- nzload operations
- INSERT operations
- UPDATE operations
- GROOM TABLE operations
- TRUNCATE TABLE operations
Complete Netezza Course
Take your journey towards learning Netezza in just 30 days, just 1 hour a day. Joins 1000s of others who have benefited by upskilling and boosting their careers
Netezza GENERATE STATISTICS SyntaxSyntax of the command:
Usage: The following provides sample usage. Generate statistics for all column of the table TEST:
GENERATE STATISTICS ON <tablename> [ (<col>[,<col>… ]) ];
TRAINING.ADMIN(ADMIN)=>GENERATE STATISTICS ON TEST;Generate statistics for all the tables in database TRAINING:
Netezza GENERATE STATISTICS Best PracticesFollowing are some of best practices when you are working on generate statistics:
- It is very important to collect full statistics by running Netezza GENERATE STATISTICS command when table changes significantly, say at least 10%
- You should always collect statistics when perform nzload, insert, update, delete or truncate operations
- Generate statistics for columns which are often used. For examples, columns used in JOIN conditions, WHERE clause, GROUP BY clause and ORDER BY clause
- You should GENERATE STATISTICS on all TEMP tables those are explicitly used in JOIN
- You should not abort the GENERATE STATISTICS command as this will disable the zone maps and rebuilt it. Any interruption leaves the zone maps disabled and this of course impact the performance.
JUST IN TIME (JIT) Statistics and UsageNetezza automatically gathers statistics on the table with actual restrictions using sampler scan functionality, Just-In-Time (JIT) for planning. JIT Stats is automatically run on all qualified tables, such as:
- The table contains more than 5 Million rows
- Query contains at least one column restriction
- Restrictions do not contain sub-query or sub-plan expressions
- Must participate in a JOIN or must have a MATERIALIZED VIEW associated with them
- Must be a database table on disk No system tables, EXTERNAL TABLES or virtual tables
- JIT Dispersion values calculated on queried columns of tables > 500 Million rows
- JIT stats does not eliminate the need to run GENERATE STATISTICS