Netezza's AMPP architecture is a two-tiered system designed to handle very large queries from multiple users. The first tier is a high-performance Linux SMP host, with a secondary host available for fully redundant, dual-host configurations.
The host compiles queries received from applications, and generates query execution plans.
It then divides a query into a sequence of sub-tasks, or snippets, that can be executed in parallel, and distributes the snippets to the second tier for execution. The host returns the final results to the requesting application.
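The host's role described above can be sketched as a scatter-gather loop. This is a minimal illustration, not Netezza code: the names `SPU_SLICES`, `run_snippet`, and `host_execute` are hypothetical, the data is made up, and threads stand in for physically separate SPUs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data slices, one per SPU (illustrative values only).
SPU_SLICES = [
    [3, 8, 1],
    [7, 2],
    [9, 4, 6],
]

def run_snippet(data_slice, threshold):
    """Snippet executed on one SPU: keep values above the threshold."""
    return [v for v in data_slice if v > threshold]

def host_execute(threshold):
    """Host: send the same snippet to every SPU, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(SPU_SLICES)) as pool:
        partials = list(pool.map(lambda s: run_snippet(s, threshold), SPU_SLICES))
    # Final assembly happens on the host before returning to the application.
    return sorted(v for part in partials for v in part)

result = host_execute(threshold=5)
print(result)
```

Each SPU only ever sees its own slice; the host sees only the already-filtered partial results.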
2) Secondary Tier
The second tier consists of dozens to hundreds or thousands of Snippet Processing Units (SPUs) operating in parallel.
Each SPU is an intelligent query processing and storage node, consisting of a powerful commodity processor, dedicated memory, a disk drive, and a field-programmable disk controller with hard-wired logic to manage data flows and process queries at the disk level.
The massively parallel, shared-nothing SPU blades provide the performance advantage of MPP (Massively Parallel Processing).
3) Overall
Nearly all query processing is done at the SPU level, with each SPU operating on its portion of the database.
All operations that lend themselves easily to parallel processing, including record operations, parsing, filtering, projecting, interlocking, and logging, are performed by the SPU nodes. This significantly reduces the amount of data required to be moved within the system.
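To see why filtering and projecting at the SPU reduces data movement, consider the rough sketch below. The table, the `spu_scan` function, and the "values shipped" metric are all assumptions made for illustration; the real SPU does this work in its disk controller and processor, not in Python.

```python
# Hypothetical rows stored on one SPU.
ROWS = [
    {"id": 1, "region": "east", "amount": 120, "note": "a"},
    {"id": 2, "region": "west", "amount": 40, "note": "b"},
    {"id": 3, "region": "east", "amount": 75, "note": "c"},
    {"id": 4, "region": "west", "amount": 310, "note": "d"},
]

def spu_scan(rows, predicate, columns):
    """Filter rows and project columns at the storage node."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

# Equivalent of: SELECT id, amount FROM t WHERE amount > 100
shipped = spu_scan(ROWS, lambda r: r["amount"] > 100, ["id", "amount"])

# Count the field values that would leave the node in each case.
values_without_pushdown = sum(len(r) for r in ROWS)
values_with_pushdown = sum(len(r) for r in shipped)
print(values_without_pushdown, values_with_pushdown)
```

In this toy case the node ships 4 field values instead of 16, because unneeded rows and columns are discarded before anything leaves the SPU.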
Operations on sets of intermediate results, such as sorts, joins, and aggregates, are executed primarily on the SPUs, but can also be done on the host, depending on the processing cost and complexity of that operation.
Understanding Netezza AMPP Architecture
Netezza AMPP (Asymmetric Massively Parallel Processing) is the architecture behind the Netezza data warehouse appliance, which is now an IBM product. It pairs an SMP host with a massively parallel, shared-nothing back end. This article provides an overview of the Netezza AMPP architecture.
Key Components
Netezza Appliance: The core component, which includes massively parallel processing (MPP) hardware and proprietary database software.
AMPP Server: The host tier, referred to in this article as the AMPP server, sits between the client applications and the processing nodes of the Netezza appliance. It manages the data flow and handles query optimization for better performance.
Client Applications: These include SQL clients such as SQL Navigator and third-party BI tools that connect to the AMPP server to access the data in the Netezza appliance.
AMPP Architecture
The Netezza AMPP architecture is designed to distribute data and workload across multiple nodes for efficient processing:
Data Distribution
In the Netezza appliance, data is stored in tables that are horizontally partitioned across multiple nodes. Each node processes a portion of each table. This allows for parallel processing and improved query performance.
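A minimal sketch of horizontal partitioning follows. It assumes rows are assigned to nodes by hashing a distribution key, which the text above does not specify; `node_for`, `distribute`, the CRC32 hash, and the sample table are illustrative choices, not the appliance's actual hashing scheme.

```python
import zlib

def node_for(key, num_nodes):
    """Map a distribution-key value to a node with a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(rows, key_column, num_nodes):
    """Horizontally partition rows: each row lands on exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes

# Hypothetical table of eight rows spread over four nodes.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
slices = distribute(rows, "customer_id", num_nodes=4)

for n, node_rows in enumerate(slices):
    print(n, [r["customer_id"] for r in node_rows])
```

Because the hash is deterministic, rows with the same key value always land on the same node. How evenly the rows spread depends on the key chosen, so a key with few distinct values would leave some nodes with most of the work.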
Query Processing
When a query is submitted to the AMPP server, it is first optimized based on the distributed data layout. The query is then broken down into smaller pieces (sub-queries) that can be executed in parallel across the Netezza appliance nodes.
Data Exchange
If a sub-query requires data from multiple nodes, data exchange occurs between nodes using a high-speed interconnect. The exchanged data is compressed for faster transfer.
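One common reason for data exchange is a join on a column that the tables are not distributed on. The sketch below shows the general redistribute-then-join-locally idea under that assumption. All names and data are hypothetical, the interconnect is simulated by moving rows between Python lists, and compression is omitted.

```python
import zlib

NUM_NODES = 3

def node_for(key):
    """Deterministic owner node for a join-key value."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES

def redistribute(node_slices, key_column):
    """Each node sends every row to the node that owns its join-key hash."""
    out = [[] for _ in range(NUM_NODES)]
    for rows in node_slices:
        for row in rows:
            out[node_for(row[key_column])].append(row)
    return out

def local_join(left, right, key_column):
    """Hash join executed independently on one node."""
    index = {}
    for r in right:
        index.setdefault(r[key_column], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key_column], [])]

# Two tables whose matching rows start out on different nodes.
orders = [
    [{"order_id": 1, "cust": "a"}, {"order_id": 2, "cust": "b"}],
    [{"order_id": 3, "cust": "a"}],
    [],
]
customers = [
    [{"cust": "b", "city": "Oslo"}],
    [],
    [{"cust": "a", "city": "Lima"}],
]

orders_r = redistribute(orders, "cust")
customers_r = redistribute(customers, "cust")
joined = [
    row
    for n in range(NUM_NODES)
    for row in local_join(orders_r[n], customers_r[n], "cust")
]
print(sorted(joined, key=lambda r: r["order_id"]))
```

After redistribution, every pair of matching rows sits on the same node, so each node can complete its part of the join without further communication.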
Results Aggregation
Once the sub-queries are processed and results are gathered, they are aggregated to produce the final query result. This aggregation can occur on individual nodes or in a global aggregate node if necessary.
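The two-stage aggregation described above can be illustrated with an average, which cannot be merged directly but can be rebuilt from per-node sums and counts. This is a sketch under that assumption; `partial_aggregate`, `merge_partials`, and the sample rows are hypothetical.

```python
def partial_aggregate(rows):
    """Per-node step: per-group (sum, count) over local rows only."""
    acc = {}
    for group, value in rows:
        s, c = acc.get(group, (0, 0))
        acc[group] = (s + value, c + 1)
    return acc

def merge_partials(partials):
    """Final step: combine per-node partials, then compute the averages."""
    total = {}
    for part in partials:
        for group, (s, c) in part.items():
            ts, tc = total.get(group, (0, 0))
            total[group] = (ts + s, tc + c)
    return {g: s / c for g, (s, c) in total.items()}

# Hypothetical (region, amount) rows held by three nodes.
node_rows = [
    [("east", 10), ("west", 20)],
    [("east", 30)],
    [("west", 40), ("west", 60)],
]

# Equivalent of: SELECT region, AVG(amount) FROM t GROUP BY region
averages = merge_partials(partial_aggregate(r) for r in node_rows)
print(averages)
```

Only one small (sum, count) pair per group leaves each node, rather than every row, which is why the final merge step stays cheap even for large tables.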
Benefits of Netezza AMPP
Improved query performance through parallel processing and data distribution
Scalability to handle large amounts of data with minimal impact on query performance
Flexibility to use various SQL clients or BI tools for accessing the data in the Netezza appliance