What is ETL?
ETL stands for Extract, Transform and Load, while ELT stands for Extract, Load and Transform. In ETL, the data is transformed into a common format before it is loaded into the destination.
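The ETL order of operations can be sketched in a few lines of Python. This is only a minimal illustration, assuming an in-memory SQLite database as the target; the `source_rows` data, the `sales` table, and the function names are hypothetical and not taken from any particular ETL tool.

```python
import sqlite3

# Hypothetical source rows with inconsistent formatting (all values are strings).
source_rows = [
    {"id": "1", "name": " Alice ", "amount": "10.50"},
    {"id": "2", "name": "BOB", "amount": "7"},
]

def extract():
    """Return raw rows from the (hypothetical) source system."""
    return list(source_rows)

def transform(rows):
    """Convert raw rows into the common format before they reach the target."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]

def load(conn, rows):
    """Write the already-transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Extract, then transform outside the target, then load."""
    load(conn, transform(extract()))

etl_conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
run_etl(etl_conn)
etl_result = etl_conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Note that the target only ever sees cleaned rows; the raw strings never reach the `sales` table.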
What is ELT?
In ELT, the data is loaded into the destination first, and the transformation is then applied based on the destination format. The data is first copied to the target and then transformed in place. In some ELT architectures, the staging area is part of the target system. In most architectures, ELT systems transform the data directly in the target database.
ELT is usually preferred with NoSQL databases, Hadoop clusters, data appliances, or cloud installations.
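For comparison, the load-first pattern might look like the sketch below. It again assumes an in-memory SQLite database standing in for the target; the `raw_sales` and `sales` tables and the `run_elt` function are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical raw source rows; every value is still an unformatted string.
raw_rows = [("1", " Alice ", "10.50"), ("2", "BOB", "7")]

def run_elt(conn, rows):
    """Load raw rows first, then transform them inside the target database."""
    # Load: copy the source data unchanged into a raw table in the target.
    conn.execute("CREATE TABLE raw_sales (id TEXT, name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
    # Transform: the target's own SQL engine reshapes the data in place.
    conn.execute(
        """
        CREATE TABLE sales AS
        SELECT CAST(id AS INTEGER)  AS id,
               TRIM(name)           AS name,
               CAST(amount AS REAL) AS amount
        FROM raw_sales
        """
    )
    conn.commit()

elt_conn = sqlite3.connect(":memory:")  # stand-in for the target system
run_elt(elt_conn, raw_rows)
elt_result = elt_conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
raw_count = elt_conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
```

Here the untransformed rows stay available in `raw_sales`, so a different transformation can be run later without extracting the data again.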
Difference between ETL vs. ELT
The ETL and ELT processes differ in the following parameters:
Parameters | ETL | ELT |
---|---|---|
Process | Data is transformed on a staging server and then transferred to the data warehouse database. | Data remains in the data warehouse database. |
Code Usage | Used for | Used for high amounts of data |
Transformation | Transformations are done on the ETL server or in the staging area. | Transformations are performed in the target system. |
Time-Load | Data is first loaded into staging and later loaded into the target system. Time intensive. | Data is loaded directly into the target system, and all transformations are carried out there. |
Time-Transformation | Data is loaded into the ETL system first and then into the target system, so the process takes comparatively longer. | In the ELT process, speed is largely independent of the size of the data. |
Time-Maintenance | Needs high maintenance, as you must select which data to load and transform. | Low maintenance, as data is always available. |
Implementation Complexity | At an early stage, easier to implement. | To implement the ELT process, an organization needs deep knowledge of the tools and expert skills. |
Support for Data warehouse | The ETL model is used for on-premises, relational, and structured data. | Used in scalable cloud infrastructure, which supports both structured and unstructured data sources. |
Data Lake Support | Does not support. | Unstructured data can be processed with data lakes here. |
Complexity | The ETL process loads only the important data, as identified at design time. | This process involves development from the output-backward and loading only relevant data. |
Cost | High costs for small and medium businesses. | Low entry costs using online Software as a Service Platforms. |
Lookups | In the ETL process, both facts and dimensions need to be available in the staging area. | All data is available, because extract and load occur in a single action. |
Aggregations | Complexity increases as more data is added to the dataset. | The power of the target platform can process a significant amount of data quickly. |
Calculations | Overwrites the existing column, or the dataset must be appended and pushed to the target platform. | A calculated column can easily be added to the existing table. |
Maturity | The process has been used for over two decades. It is well documented, and best practices are easily available. | Relatively new concept and complex to implement. |
Hardware | Most tools have unique hardware requirements that are expensive. | Being SaaS, hardware cost is not an issue. |
Support for Unstructured Data | Mostly supports relational data | Support for unstructured data readily available. |
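The Calculations row can be made concrete with a small sketch. Assuming the data already sits in the target (an in-memory SQLite database here, with a hypothetical `orders` table), a calculated column is added with two SQL statements and no data leaves the target.

```python
import sqlite3

calc_conn = sqlite3.connect(":memory:")  # stand-in for the target system
calc_conn.execute("CREATE TABLE orders (id INTEGER, quantity INTEGER, unit_price REAL)")
calc_conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 2, 5.0), (2, 3, 1.5)],  # hypothetical, already-loaded rows
)

def add_total_column(conn):
    """Add a calculated column to the existing table, inside the target."""
    conn.execute("ALTER TABLE orders ADD COLUMN total REAL")
    conn.execute("UPDATE orders SET total = quantity * unit_price")
    conn.commit()

add_total_column(calc_conn)
totals = calc_conn.execute("SELECT id, total FROM orders ORDER BY id").fetchall()
```

In an ETL setup the same change would typically mean recomputing the column in the staging area and pushing the dataset to the target again.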
Summary:
Criteria | ETL | ELT |
---|---|---|
Flexibility | High | Low |
Working methodology | Data from the source system to the data warehouse | Leverages the target system to transform data |
Performance | Average | Good |
- ETL stands for Extract, Transform and Load, while ELT stands for Extract, Load and Transform.
- In the ETL process, data flows from the source to staging and then to the target. ELT lets the target system do the transformation, so no separate staging system is involved.
- ETL is an older concept that has been on the market for more than two decades. ELT is a relatively new concept and is comparatively complex to implement.
- In ETL, many tools have unique hardware requirements that are expensive. In ELT, since it falls under SaaS, hardware cost is not a concern.
- To carry out a lookup, ETL works row by row to map a fact value to its dimension key from a different table. In ELT, fact values can be mapped directly to dimension keys.
- ETL prioritizes relational data, whereas ELT readily supports unstructured data.
- ELT addresses many of the challenges of ETL but is expensive and requires niche skills to implement and maintain.
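The lookup difference described in the summary can be sketched as follows. This is an illustration under assumptions, not how any specific tool works: the `dimension` and `facts` data, the table names, and both functions are hypothetical, and an in-memory SQLite database stands in for the target.

```python
import sqlite3

# Hypothetical data: (category_key, category_name) and (sale_id, category_name, amount).
dimension = [(10, "Books"), (20, "Games")]
facts = [(1, "Books", 12.0), (2, "Games", 30.0), (3, "Books", 8.0)]

def etl_lookup(fact_rows, dimension_rows):
    """ETL style: resolve each fact's dimension key row by row before loading."""
    key_by_name = {name: key for key, name in dimension_rows}
    return [(sale_id, key_by_name[name], amount) for sale_id, name, amount in fact_rows]

def elt_lookup(conn):
    """ELT style: facts and dimensions are already in the target, so one join maps them."""
    return conn.execute(
        """
        SELECT f.sale_id, d.category_key, f.amount
        FROM raw_facts AS f
        JOIN dim_category AS d ON d.category_name = f.category_name
        ORDER BY f.sale_id
        """
    ).fetchall()

lookup_conn = sqlite3.connect(":memory:")
lookup_conn.execute("CREATE TABLE dim_category (category_key INTEGER, category_name TEXT)")
lookup_conn.execute("CREATE TABLE raw_facts (sale_id INTEGER, category_name TEXT, amount REAL)")
lookup_conn.executemany("INSERT INTO dim_category VALUES (?, ?)", dimension)
lookup_conn.executemany("INSERT INTO raw_facts VALUES (?, ?, ?)", facts)

etl_mapped = etl_lookup(facts, dimension)
elt_mapped = elt_lookup(lookup_conn)
```

Both approaches produce the same mapping; the difference is where the work happens, in the pipeline code for ETL and in the target's query engine for ELT.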