Netezza is a data warehouse and big data analytics appliance. It uses the Asymmetric Massively Parallel Processing (AMPP) architecture, which combines an SMP front end with a shared-nothing MPP back end for query processing. Netezza integrates the database, processing engine and storage in a single system. The Netezza architecture resembles Hadoop cluster design in many ways, e.g. data distribution, active-passive nodes, data storage methods and replication.
As a commodity-based appliance, Netezza TwinFin is a low-cost analytic option, but its value goes beyond the low initial purchase price. The Netezza appliance requires minimal ongoing administration, in terms of both internal resources and implementation costs, for a low overall total cost of ownership. There are no hidden costs. Netezza TwinFin also provides built-in compliance controls, reports and workflow that significantly reduce the cost of satisfying mandatory compliance regulations (e.g. SOX, PCI, HIPAA, FISMA, Basel II).
Netezza TwinFin comes in three models: TwinFin 3, TwinFin 6 and TwinFin 12. The number indicates the number of S-blades (processors). All models come with:
- Storage: 1 TB hard drives
- Processor: two quad-core CPUs per S-blade (that is, TwinFin 3 has 3 x 2 quad-core CPUs).
- Disk storage: one 1 TB disk per processor core. For instance, TwinFin 3 has 3 S-blades x 2 CPUs x 4 cores = 24 cores, and therefore 24 TB of disk (in other words, each core gets its own disk).
- Netezza version: 5.0 onwards; the 6.0 beta is already out as of the beginning of 2010 (Q1).
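The sizing rule in the list above is simple arithmetic, and a short Python sketch can make it explicit. This is only an illustration: the function name and constants are invented for this example, use the figures quoted in the list, and are not part of any Netezza tool.

```python
# Illustrative only: these names and constants are assumptions for this sketch,
# based on the figures quoted in the list above.
CPUS_PER_SBLADE = 2   # two CPUs per S-blade
CORES_PER_CPU = 4     # each CPU is quad core
DISK_SIZE_TB = 1      # one 1 TB drive per core


def raw_disk_tb(sblades: int) -> int:
    """Return the raw disk capacity in TB for a model with `sblades` S-blades."""
    cores = sblades * CPUS_PER_SBLADE * CORES_PER_CPU
    return cores * DISK_SIZE_TB


for model in (3, 6, 12):
    print(f"TwinFin {model}: {raw_disk_tb(model)} TB raw disk")
```

Under these assumptions TwinFin 3 works out to 24 TB and TwinFin 12 to 96 TB of raw disk.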
Netezza TwinFin is the latest-generation data warehouse appliance. It integrates database, processing and storage in a compact system and has four major components: hosts, FPGAs, S-Blades and disk enclosures.
Netezza Architecture – Hosts
The Netezza hosts are high-performance Linux servers that are set up in an active-passive mode for high availability. If the active host fails, the passive host takes over the processing tasks, and it needs only a very short time to do so. The active host is the interface to external tools and client applications such as BI and ETL tools. Clients submit SQL requests via ODBC or JDBC. A number of tools, such as Aginity, SQuirreL and the nzsql utility, are used to submit SQL queries to the Netezza host. The host compiles each query into executable code segments called snippets (usually C/C++ code), creates an optimised query plan and distributes the snippets to all the nodes for execution. The FPGA fetches the required data and snippet execution takes place.
Field Programmable Gate Arrays – FPGA
The FPGA is a Netezza proprietary hardware component developed to filter out unwanted data as early as possible once a SQL query is submitted to the hosts. Data is eliminated as early as the point where it is read from disk. This removes I/O bottlenecks and frees downstream components such as the CPU, memory and network from processing extra data, which notably improves performance. The FPGA relies on zone maps to eliminate the unwanted data. Zone maps are created for the columns of a table during certain Netezza operations.
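To make zone-map filtering concrete, here is a minimal Python sketch. It assumes that a zone map records the minimum and maximum value of a column for each block of rows, so that blocks which cannot contain matching rows are skipped without being read. The block size, data and function names are hypothetical and do not reflect Netezza's internal implementation.

```python
from typing import List, Tuple


def build_zone_map(values: List[int], block_size: int) -> List[Tuple[int, int]]:
    """Record the (min, max) of the column for each block of rows."""
    zone_map = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zone_map.append((min(block), max(block)))
    return zone_map


def blocks_to_scan(zone_map: List[Tuple[int, int]], low: int, high: int) -> List[int]:
    """Return the indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [i for i, (lo, hi) in enumerate(zone_map) if hi >= low and lo <= high]


# Hypothetical column, stored in three blocks of four rows each.
order_days = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
zone_map = build_zone_map(order_days, block_size=4)

print(zone_map)                          # [(1, 4), (10, 13), (20, 23)]
print(blocks_to_scan(zone_map, 11, 12))  # [1]
```

For a predicate such as "order_day BETWEEN 11 AND 12", only the middle block needs to be read. The sketch also shows why zone maps work best when similar values are stored close together: with randomly ordered values, every block's range would overlap the predicate.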
Snippet Blades (S-Blades)
S-Blades are intelligent processing nodes that make up the MPP engine of the Netezza data warehouse appliance. Each S-Blade is an independent server that contains powerful multi-core CPUs, multi-engine FPGAs and gigabytes of RAM, all working in parallel to deliver high performance. The FPGA in each S-Blade is a key piece of Netezza hardware that improves performance.
TwinFin 12 is a full rack containing all components. The top portion is occupied by the storage array, followed by two hosts (Linux-based hosts, configured for high availability so that if one host fails the other takes over), a KVM and an array of S-blades. Data is replicated between the two hosts and always kept in sync, so that if one host fails, the other can assume its role without any problem.
Total disk space for TwinFin 12 is 96 TB. Only one third of it is used for storing data; another third is used for the disk mirror, and the remaining third is spool or temporary space for processing queries. Roughly 32 TB is therefore available on TwinFin 12 for data storage.
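The three-way split described above can be written out as a small calculation. The Python sketch below is illustrative only; the function and key names are invented for this example, and the equal thirds are taken directly from the figures quoted above.

```python
def split_capacity(raw_tb: float) -> dict:
    """Split raw disk capacity into equal thirds: user data, mirror, temp space."""
    share = raw_tb / 3
    return {"user_data_tb": share, "mirror_tb": share, "temp_tb": share}


twinfin_12 = split_capacity(96)
print(twinfin_12["user_data_tb"])  # 32.0
```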
Finally, the other important pieces of Netezza architecture hardware are the high-performance disks. The disk enclosures contain high-density, high-performance disks that are RAID protected. Each disk contains a slice of the data in a database table. The host uses either a hash or a random algorithm to distribute the data evenly across all the disks. If mirroring is enabled, a mirror copy of each slice of data is maintained on a different disk drive. The disk enclosures are connected to the S-Blades via high-speed interconnects that allow all the disks to stream data to the S-Blades simultaneously at the maximum rate possible. Data distribution and storage are based on the distribution key specified when the table is created.
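The difference between the two distribution methods can be sketched in a few lines of Python. This is a simplified illustration, not Netezza's actual algorithm: the number of slices, the use of CRC32 as the hash function and the function names are all assumptions, and "random" distribution is approximated here with round-robin assignment.

```python
import zlib
from collections import Counter
from itertools import count

NUM_SLICES = 8  # hypothetical number of data slices (disks)


def hash_slice(key: str, num_slices: int = NUM_SLICES) -> int:
    """Map a distribution-key value to a slice; equal keys always land together."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


_row_counter = count()


def round_robin_slice(num_slices: int = NUM_SLICES) -> int:
    """Assign rows to slices in turn, ignoring the row content."""
    return next(_row_counter) % num_slices


# Hypothetical distribution-key values for 1000 rows.
customer_ids = [f"cust-{i}" for i in range(1000)]
by_hash = Counter(hash_slice(c) for c in customer_ids)
by_turn = Counter(round_robin_slice() for _ in customer_ids)

print("hash:       ", sorted(by_hash.items()))
print("round robin:", sorted(by_turn.items()))
```

Hash distribution keeps all rows with the same key on the same slice, which helps joins on that key, but a key with few distinct values can leave some slices much fuller than others. Round-robin assignment always spreads rows evenly but gives no co-location.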
Netezza key features
The list below covers some of TwinFin's most significant features:
- supports both Business Intelligence and advanced analytics
- scalable (10-100x) performance at petascale
- efficient even when used by thousands of users at the same time
- i-Class technology for developing analytics
- streaming architecture based on blades
- ubiquitous simplicity of deployment and management
- built-in data compliance controls
- compatible with the most popular Business Intelligence and analytic tools
- standard SQL, ODBC, JDBC, and OLE DB interfaces
- reliability and availability at the 99.99% uptime level
- green orientation thanks to low cooling and power requirements
- high load rate – over 2 TB of data per hour
- high backup rate – over 4 TB of data per hour
TwinFin is radically different from earlier models. On previous models, Netezza implemented the CPU, memory and storage together with an FPGA. TwinFin uses S-blades that include the CPU, memory and FPGA (Netezza coined a new term for this: database accelerator card = FPGA + memory + IO interface); storage is separate and is located in a storage array.