Netezza Zone Maps Best Practices

1) Join Keys Best Practice

Use integers as join keys for the best performance, for example a surrogate key. Integer and integer-based columns (such as timestamp and date) compress well with Netezza's compression. According to the Netezza documentation, floating-point numerics hash poorly, and joining on numeric columns that are not integers can force slower sort-merge joins. Standardizing on integer data types for keys avoids problems such as failing to achieve good join colocation. If two tables are not distributed on the join column, matching rows from the two tables end up in different data slices, which means the snippet processor has to do additional work, such as redistributing or broadcasting rows, to satisfy the join.

2) Distribution Keys

When two tables are often joined together, such as a customer table and an order table, the choice of distribution key for each table plays an important role in query performance. If both tables are distributed on the join column, for example the customer id column in both the customer and order tables, records with the same customer id value end up in the same data slice for both tables. When a query joining the tables is processed, the matching data from both tables is already in the same data slice, so the snippet processor can perform the join locally and send the result without additional work, which in turn improves the performance of the query.
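The effect of colocation can be sketched with a toy model. This is an illustration only, not Netezza's implementation: the slice count, the modulo "hash", and the table contents are all assumptions made up for the example.

```python
from collections import defaultdict

NUM_SLICES = 4  # hypothetical number of data slices


def distribute(rows, key):
    """Assign each row to a slice using a toy hash (modulo) of its key."""
    slices = defaultdict(list)
    for row in rows:
        slices[row[key] % NUM_SLICES].append(row)
    return slices


def rows_needing_movement(left_slices, right_slices, join_key):
    """Count right-side rows whose join partner is stored on another slice."""
    location = {}
    for slice_id, rows in left_slices.items():
        for row in rows:
            location[row[join_key]] = slice_id
    moved = 0
    for slice_id, rows in right_slices.items():
        for row in rows:
            if location.get(row[join_key]) != slice_id:
                moved += 1
    return moved


# Hypothetical sample data: 8 customers and 20 orders.
customers = [{"customer_id": i} for i in range(1, 9)]
orders = [{"order_id": 100 + i, "customer_id": (i % 8) + 1} for i in range(20)]

customer_slices = distribute(customers, "customer_id")

# Orders distributed on the join column: every match is already local.
colocated = rows_needing_movement(
    customer_slices, distribute(orders, "customer_id"), "customer_id")

# Orders distributed on order_id: matches sit on other slices.
not_colocated = rows_needing_movement(
    customer_slices, distribute(orders, "order_id"), "customer_id")

print(f"rows to move when colocated: {colocated}, otherwise: {not_colocated}")
```

In this sketch no order row has to move when both tables share the distribution key, while distributing orders on order_id leaves matching rows on different slices.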

3) Clustered Base Table

A clustered base table (CBT) is a table created with an ORGANIZE ON clause that names up to four organizing columns. To reorganize the records that already exist in the table, "GROOM TABLE" needs to be executed; only then can queries take advantage of the data organization. It is a good practice to define fact tables as clustered base tables, organized on the columns that are most often used in joins and filters, to improve multi-dimensional lookups. At the same time, choose the organizing columns carefully: base the choice on the most frequently executed queries, and keep the number of organizing columns as small as possible.
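Why organizing the data helps can be shown with a small simulation. It is a simplified sketch under stated assumptions: extents hold a fixed number of rows (real extents are sized in bytes), and a plain sort on one column stands in for what GROOM TABLE does to a clustered base table.

```python
EXTENT_ROWS = 100  # hypothetical rows per extent


def build_zone_map(values):
    """Return one (min, max) pair per extent of EXTENT_ROWS values."""
    extents = [values[i:i + EXTENT_ROWS]
               for i in range(0, len(values), EXTENT_ROWS)]
    return [(min(e), max(e)) for e in extents]


def extents_to_scan(zone_map, low, high):
    """Count extents whose [min, max] range overlaps the filter low..high."""
    return sum(1 for lo, hi in zone_map if lo <= high and hi >= low)


# 1000 distinct values stored in a scattered order (unorganized rows).
scattered = [(i * 37) % 1000 for i in range(1000)]

# Sorting is a stand-in for reorganizing on a single organizing column.
organized = sorted(scattered)

# Filter: value BETWEEN 200 AND 249
before = extents_to_scan(build_zone_map(scattered), 200, 249)
after = extents_to_scan(build_zone_map(organized), 200, 249)

print(f"extents scanned before: {before}, after: {after}")
```

With scattered rows every extent spans almost the whole value range, so none can be skipped; once the rows are organized, the same filter touches a single extent.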

Best Practices for Using Netezza Zone Maps

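Netezza zone maps (Z-Maps) are a powerful way to improve query performance by reducing I/O. A zone map records the minimum and maximum value of a column within each extent of a table, so a query that filters on that column can skip every extent whose range cannot contain a matching row. In this article, we'll discuss best practices for using Z-Maps effectively.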
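The I/O saving comes from keeping the smallest and largest value of a column for each extent and skipping extents whose range cannot match the filter. The following is a minimal sketch of that idea, not Netezza's internal code: the extent size, the sample rows, and the query function are hypothetical.

```python
from datetime import date, timedelta

ROWS_PER_EXTENT = 50  # hypothetical; chosen only to keep the example small

# Rows arrive in date order: 10 rows per day for 30 days (300 rows).
start = date(2024, 1, 1)
rows = [(start + timedelta(days=d), d * 10 + n)
        for d in range(30) for n in range(10)]
extents = [rows[i:i + ROWS_PER_EXTENT]
           for i in range(0, len(rows), ROWS_PER_EXTENT)]

# The zone map keeps only the smallest and largest date in each extent.
zone_map = [(min(r[0] for r in e), max(r[0] for r in e)) for e in extents]


def query(low, high):
    """Return the matching rows and the number of extents actually read."""
    matches, extents_read = [], 0
    for (lo, hi), extent in zip(zone_map, extents):
        if hi < low or lo > high:
            continue  # no row in this extent can match, so skip the read
        extents_read += 1
        matches.extend(r for r in extent if low <= r[0] <= high)
    return matches, extents_read


matches, extents_read = query(date(2024, 1, 12), date(2024, 1, 14))

# A full scan returns the same rows but has to read every extent.
full_scan = [r for r in rows
             if date(2024, 1, 12) <= r[0] <= date(2024, 1, 14)]

print(f"{len(matches)} rows from {extents_read} of {len(extents)} extents")
```

Because the rows were stored in date order, the three-day filter reads one extent out of six and still returns exactly what a full scan would.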

Understand Your Data

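Before relying on zone maps, it is essential to understand how values are physically laid out in your tables. Zone maps help most when rows with similar values are stored close together, as with data loaded in date order, and when queries filter on those same columns. Identify which columns your most frequent queries restrict on and how well ordered those columns are on disk.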

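Set Up Tables So Z-Maps Work Well

Netezza creates and maintains zone maps automatically. There is no statement for creating one and no sample rate to tune, because the minimum and maximum values are tracked from the stored data itself. What you control is how well a table suits them:

1. Zone maps pay off most on large tables that span many extents. Netezza tables are not partitioned, so the extent is the unit that gets skipped.
2. Filter on columns that get zone maps by default (integer, date, and timestamp types). Depending on the release, columns of some other types can also get zone maps when they are named in an ORGANIZE ON clause.
3. Expect less benefit on tables with rapidly changing data, because frequent updates and deletes tend to scatter values across extents and widen each extent's range.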

Optimize Z-Map Maintenance

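1. Schedule regular GROOM TABLE runs on heavily updated tables to reclaim deleted rows and restore the physical ordering that zone maps depend on.
2. Use GROOM TABLE to reorganize a table when its ordering has degraded, and run GENERATE STATISTICS after large loads so the optimizer's statistics stay current.
3. Zone maps themselves are kept up to date automatically as data is written, so they need no separate refresh job.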

Monitor Z-Map Performance

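Regularly check how much work your zone maps are saving, for example by reviewing query plans and scan activity in tools such as NzAdmin or the Netezza Performance Portal. If queries that filter on an organizing column still read most of a table, revisit the choice of organizing columns or groom the table.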

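Example: Organizing a Table So Z-Maps Are Effective

Netezza has no statement for creating a zone map directly. What you can set is the table's organizing columns, after which GROOM TABLE reorders the existing rows:

    ALTER TABLE table_name ORGANIZE ON (columns_to_zonemap);
    GROOM TABLE table_name RECORDS ALL;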

Conclusion

By following these best practices, you can use Netezza zone maps to improve query performance and reduce I/O. Keep your tables well organized and groomed, check how effective the zone maps are, and adjust the organizing columns as query patterns change.