As with most database management systems, Netezza generates a query plan for all queries executed in the system. The question plan determines ideal execution path by Netezza to fulfill each query. The Netezza component which generates and determines the optimal query path from the available alternatives is called the query optimizer and it relies on the number of data available about the database objects involved in the query executed. The Netezza query optimizer calculates the cost for the different execution path and chooses the path with the least cost.
The optimizer depends on the statistics about the database objects and can only give the best results when it has the most up to date results. It relies on:
- Column data dispersion such as distinct values, unique value, and null values
- # of rows in the tables
- Minimum and maximum values stored in columns involved in the where clause or having clause of the query
- Number of extents in each tables and the total number of extends on the data slice with the largest skew
Given that the optimizer depends on the statistics to decide the best execution plan, it is imperative to update the database statistics using the “GENERATE STATISTICS” commands. In addition, the optimizer considers the FPGA capabilities of Netezza to determine the ideal plan.
When coming up with the plan for execution, the optimizer looks for
- optimal path for data scan operations i.e. to read data from the tables
- optimal path to join the tables in the query like hash join, merge join, nested loop join
- optimal path to distribute data in between SPUs like redistribute or broadcast
- The order in which tables can be joined in a query join involving multiple tables
- Opportunities to rewrite queries to improve performance like
- Pull up of sub-queries
- Push down of tables to sub-queries
- De-correlation of sub-queries
- Expression rewrite
Netezza breaks query execution into units of work called snippets which can be executed on the host or on the snippet processing units (SPU) in parallel. Queries could have either one or multiple snippets which will be executed in sequence on the host or the SPUs depending on what is done by the snippet code. The snippet code generated for SQL query execution helps with scanning and retrieving data from a table, sorting data, performing data aggregation, joining data retrieved from tables, grouping of data, dynamic distribution of data to help query performance. When the SQL query is executed, Netezza dynamically prepares the execution plan and the C code to execute each snippet of the plan.
The recent execution plans are stored in the data.<ver>/plan directory under the /nz/data directory. The snippet C code is stored under the /nz/data/cache directory and this code is used to compare against the code for new plans so that the compilation of the code can be eliminated if the snippet code is already available.
Apart from Netezza generating the execution plan dynamically during query execution, users can also generate the execution plan (without the C snippet code) using the “EXPLAIN” command. This process will help users identify any potential performance issues by reviewing the plan and making sure that the path chosen by optimizer is inline or better than expected. During the plan generation process, the optimizer may perform statistics generation dynamically to prevent issues due to out of statistics data particularly when the tables involved store large volume of data. The dynamic statistics generation process uses sampling which is not as perfect as generating statistics using the “GENERATE STATISTICS" command which scans the tables.
• Use Same Data Type and Length for columns that are often used for joining tables, so that the query execution can be efficient which in turn helps queries execute faster
• The largest table should be joined last in the query.
• Use joins over correlated sub queries.
Define all constraints and relationships between objects. Even though Netezza doesn’t enforce them other than the not null constraint and defaults, the query optimizer will still use these details to come-up with an efficient query execution plan.