What is Data Modeling?

Reading Time: 6 minutes

What is Data Modeling? The process of creating a model of the data to be stored in a database is called data modeling. Data modeling is the process of identifying the entities in our domain, the relationships between these entities, and how they will be stored in the database. This introduces theoretical data objects and connections between different data…
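To make the idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The customer/order domain and all table and column names are hypothetical, chosen only for illustration: each entity becomes a table, and the one-to-many relationship between them becomes a foreign key that the database then enforces.

```python
import sqlite3

# Hypothetical domain: customers place orders (a one-to-many relationship).
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- entity identifier
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- relationship
    amount      REAL NOT NULL
);
"""


def build_model():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_model()
    conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
    conn.execute("INSERT INTO customer_order VALUES (10, 1, 250.0)")
    try:
        # Customer 99 does not exist, so the relationship rejects this order.
        conn.execute("INSERT INTO customer_order VALUES (11, 99, 80.0)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```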


Reading Time: 4 minutes

What is MOLAP?

Multidimensional OLAP (MOLAP) is a classical form of OLAP that facilitates data analysis by using a multidimensional data cube. Data is pre-computed, pre-summarized, and stored in the MOLAP system (a major difference from ROLAP). With MOLAP, a user can view multidimensional data from different facets. Multidimensional data analysis is also possible with a relational database, but that requires querying data from multiple tables. MOLAP, by contrast, already stores all possible combinations of the data in a multidimensional array and can access this data directly. Hence, MOLAP is faster than Relational Online Analytical Processing (ROLAP).

In this tutorial, you will learn: What is MOLAP?, MOLAP Architecture, Implementation considerations in MOLAP, MOLAP Advantages, MOLAP Disadvantages, and MOLAP Tools.

Key Points

- In MOLAP, operations are called processing.
- MOLAP tools process information with the same response time irrespective of the level of summarization.
- MOLAP tools remove the complexity of designing a relational database to store data for analysis.
- A MOLAP server implements two levels of storage representation to manage dense and sparse data sets. Storage utilization can be low if the data set is sparse.
- Facts are stored in a multidimensional array, and dimensions are used to query them.

MOLAP Architecture

MOLAP architecture includes the following components: a database server, a MOLAP server, and a front-end tool.

[Figure: MOLAP architecture]

1. The user requests reports through the interface.
2. The application logic layer of the MDDB (multidimensional database) retrieves the stored data from the database.
3. The application logic layer forwards the result to the client/user.

MOLAP architecture mainly reads precompiled data. It has limited capabilities to create aggregations dynamically or to calculate results that have not been pre-calculated and stored.
For example, an accounting head can run a report showing the corporate P/L account or the P/L account for a specific subsidiary. The MDDB would retrieve the precompiled Profit & Loss figures and display the result to the user.

Implementation considerations in MOLAP

- In MOLAP, it is essential to consider both the maintenance and the storage implications when creating a strategy for building cubes.
- MOLAP is queried with proprietary languages, for example MDX by Microsoft, although tools typically also offer extensive click-and-drag support.
- MOLAP is difficult to scale, because the number and size of the cubes required grow as dimensions increase.
- APIs should be provided for probing the cubes.
- The data structure must support multiple subject areas of data analysis in which data can be navigated and analyzed. When the navigation changes, the data structure needs to be physically reorganized.
- Database administrators need a different skill set and tools to build and maintain the database.

MOLAP Advantages

- MOLAP can manage, analyze, and store considerable amounts of multidimensional data.
- Fast query performance due to optimized storage, indexing, and caching.
- Smaller data sizes compared to a relational database.
- Automated computation of higher-level aggregates.
- Helps users analyze larger, less-defined data.
- MOLAP is easy to use, which makes it a suitable model for inexperienced users.
- MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
- All calculations are pre-generated when the cube is created.

MOLAP Disadvantages

- One major weakness of MOLAP is that it is less scalable than ROLAP, as it handles only a limited amount of data.
- MOLAP introduces data redundancy and is resource intensive.
- MOLAP solutions may take a long time to process, particularly on large data volumes.
- MOLAP products may face issues when updating and querying models with more than ten dimensions.
- MOLAP is not capable of containing detailed data.
- Storage utilization can be low if the data set is highly scattered.
- It can handle only a limited amount of data; therefore, it is impossible to include a large amount of data in the cube itself.

MOLAP Tools

- Essbase: a tool from Oracle that has a multidimensional database.
- Express Server: a web-based environment that runs on the Oracle database.
- Yellowfin: business analytics tools for creating reports and dashboards.
- Clear Analytics: an Excel-based business solution.
- SAP Business Intelligence: business analytics solutions from SAP.

Summary:

- Multidimensional OLAP (MOLAP) is a classical form of OLAP that facilitates data analysis by using a multidimensional data cube.
- MOLAP tools process information with the same response time irrespective of the level of summarization.
- A MOLAP server implements two levels of storage to manage dense and sparse data sets.
- MOLAP can manage, analyze, and store considerable amounts of multidimensional data.
- It helps automate the computation of higher-level aggregates.
- It is less scalable than ROLAP, as it handles only a limited amount of data.
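The idea of storing every combination of pre-summarized data can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration, not how Essbase or any other MOLAP product actually stores cubes: a dictionary stands in for the multidimensional array, and ALL marks a dimension that has been rolled up. Building the cube does all the aggregation up front, so answering a query is a single lookup regardless of the level of summarization.

```python
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away (rolled up)


def build_cube(rows, dims, measure):
    """Precompute the summed measure for every combination of dimension
    values, including every roll-up where some dimensions are set to ALL."""
    cube = {}
    n = len(dims)
    for row in rows:
        values = tuple(row[d] for d in dims)
        # Each subset of kept dimension positions yields one aggregate cell.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(values[i] if i in kept else ALL for i in range(n))
                cube[key] = cube.get(key, 0) + row[measure]
    return cube


def query(cube, dims, **fixed):
    """Answer a query by direct lookup; unspecified dimensions mean ALL."""
    key = tuple(fixed.get(d, ALL) for d in dims)
    return cube.get(key, 0)  # no stored cell means no matching data


if __name__ == "__main__":
    sales = [
        {"region": "East", "product": "Phone", "month": "Sep", "amount": 100},
        {"region": "East", "product": "Phone", "month": "Oct", "amount": 120},
        {"region": "West", "product": "Laptop", "month": "Sep", "amount": 300},
    ]
    dims = ("region", "product", "month")
    cube = build_cube(sales, dims, "amount")
    print(query(cube, dims, region="East"))  # 220
    print(query(cube, dims, month="Sep"))    # 400
```

Each source row contributes to 2^n cells for n dimensions, which illustrates why cube size, and therefore scalability, becomes a problem as dimensions are added.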

Difference between Database and Data Warehouse

Reading Time: 8 minutes

What is Database?

A database is a collection of related data which represents some aspect of the real world. It is designed to be built and populated with data for a specific task. It is also a building block of your data solution.

In this tutorial, you will learn: What is Database?, What is a Data Warehouse?, Why use a Database?, Why use a Data Warehouse?, Characteristics of Database, Characteristics of Data Warehouse, Difference between Database and Data Warehouse, Applications of Database, Applications of Data Warehousing, Disadvantages of Database, and Disadvantages of Data Warehouse.

What is a Data Warehouse?

A data warehouse is an information system which stores historical and cumulative data from single or multiple sources. It is designed to analyze, report on, and integrate transaction data from different sources. A data warehouse eases the analysis and reporting process of an organization. It is also a single version of truth for the organization's decision-making and forecasting processes.

Why use a Database?

Here are the prime reasons for using a database system:
- It offers security for data and its access.
- A database offers a variety of techniques to store and retrieve data.
- A database acts as an efficient handler to balance the requirements of multiple applications using the same data.
- A DBMS enforces integrity constraints and access controls, which protect data from invalid changes and from unauthorized access.
- A database allows concurrent access to data, using concurrency control so that conflicting changes to the same data by different users do not corrupt it.

Why Use Data Warehouse?

Here are important reasons for using a data warehouse:
- A data warehouse helps business users access critical data from multiple sources all in one place.
- It provides consistent information on various cross-functional activities.
- It helps you integrate many sources of data to reduce stress on the production system.
- A data warehouse helps you reduce TAT (total turnaround time) for analysis and reporting.
- Because critical data from different sources is available in a single place, a data warehouse saves users the time of retrieving data from multiple sources. You can also access data from the cloud easily.
- A data warehouse allows you to store a large amount of historical data so you can analyze different periods and trends to make future predictions.
- It enhances the value of operational business applications and customer relationship management systems.
- It separates analytics processing from transactional databases, improving the performance of both systems.
- Stakeholders and users may be overestimating the quality of data in the source systems; a data warehouse provides more accurate reports.

Characteristics of Database

- Offers security and reduces redundancy.
- Allows multiple views of the data.
- A database system follows ACID compliance (Atomicity, Consistency, Isolation, and Durability).
- Allows insulation between programs and data.
- Supports sharing of data and multiuser transaction processing.
- A relational database supports a multi-user environment.

Characteristics of Data Warehouse

- A data warehouse is subject-oriented, as it offers information related to a theme instead of a company's ongoing operations.
- The data needs to be stored in the data warehouse in a common and universally accepted manner.
- The time horizon for the data warehouse is relatively extensive compared with other operational systems.
- A data warehouse is non-volatile, which means that previous data is not erased when new information is entered in it.

Difference between Database and Data Warehouse

Parameter | Database | Data Warehouse
Purpose | Is designed to record | Is designed to analyze
Processing method | Uses Online Transactional Processing (OLTP) | Uses Online Analytical Processing (OLAP)
Usage | Helps to perform fundamental operations for your business | Allows you to analyze your business
Tables and joins | Complex, as tables are normalized | Simple, as tables are denormalized
Orientation | An application-oriented collection of data | A subject-oriented collection of data
Storage limit | Generally limited to a single application | Stores data from any number of applications
Availability | Data is available in real time | Data is refreshed from source systems as and when needed
Modeling | ER modeling techniques are used for designing | Dimensional modeling techniques are used for designing
Technique | Capture data | Analyze data
Data type | Data stored in the database is up to date | Current and historical data is stored; it may not be up to date
Storage of data | A flat relational approach is used for data storage | A dimensional approach is used for the data structure, for example star schemas (denormalized dimensions) and snowflake schemas (normalized dimensions)
Query type | Simple transaction queries are used | Complex queries are used for analysis purposes
Data summary | Detailed data is stored | Highly summarized data is stored

Applications of Database

Sector | Usage
Banking | Customer information, account-related activities, payments, deposits, loans, credit cards, etc.
Airlines | Reservations and schedule information
Universities | Student information, course registrations, colleges, and results
Telecommunication | Call records, monthly bills, balance maintenance, etc.
Finance | Information related to stocks, sales, and purchases of stocks and bonds
Sales & Production | Customer, product, and sales details
Manufacturing | Data management of the supply chain, tracking production of items, and inventory status
HR Management | Details about employees' salaries, deductions, generation of paychecks, etc.
Applications of Data Warehousing

Sector | Usage
Airline | Airline system management operations such as crew assignment, route analysis, and frequent-flyer program discount schemes for passengers
Banking | Managing the available resources effectively
Healthcare | Strategizing and predicting outcomes, creating patients' treatment reports, etc. Data warehouse systems enabled with advanced machine learning and big data can predict ailments.
Insurance | Analyzing data patterns and customer trends, and tracking market movements quickly
Retail chain | Tracking items, identifying customers' buying patterns and promotions, and determining pricing policy
Telecommunication | Product promotions, sales decisions, and distribution decisions

Disadvantages of Database

- The cost of the hardware and software for implementing a database system is high, which can increase your organization's budget.
- Many DBMS systems are complex, so users need training to use them.
- A DBMS can't perform sophisticated calculations.
- There may be compatibility issues with systems that are already in place.
- Data owners may lose control over their data, raising security, ownership, and privacy issues.

Disadvantages of Data Warehouse

- Adding new data sources takes time and is associated with high cost.
- Sometimes problems associated with the data warehouse may go undetected for many years.
- Data warehouses are high-maintenance systems.
- Extracting, loading, and cleaning data can be time-consuming.
- The data warehouse may look simple, but it is often too complicated for average users.
- End users need training, and some may still end up not using the data mining and warehouse tools.
- Despite best efforts at project management, the scope of data warehousing tends to keep increasing.
What Works Best for You?

To sum up, a database helps you perform the fundamental operations of your business, while a data warehouse helps you analyze your business. Choose either one based on your business goals.
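The "Tables and joins" row of the comparison can be illustrated with a small Python sketch using the standard-library sqlite3 module. All table names and data here are hypothetical: the normalized product and sale tables stand in for an operational database, and sales_report is a denormalized, warehouse-style copy in which descriptive attributes are repeated on every row, so the analytical query needs no join. A real warehouse would be a separate system loaded by an ETL or ELT process; one in-memory database is used here only to keep the example self-contained.

```python
import sqlite3

SETUP = """
-- Operational (database) design: normalized, product details stored once.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sale (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES product(product_id),
    sale_date  TEXT,
    amount     REAL
);
INSERT INTO product VALUES (1, 'Phone', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO sale VALUES (1, 1, '2023-09-01', 100.0),
                        (2, 1, '2023-10-01', 120.0),
                        (3, 2, '2023-10-02', 80.0);
-- Warehouse-style design: a denormalized copy with attributes repeated per row.
CREATE TABLE sales_report AS
    SELECT s.sale_date, p.name AS product, p.category, s.amount
    FROM sale AS s JOIN product AS p ON p.product_id = s.product_id;
"""


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def revenue_by_category_normalized(conn):
    # Normalized tables: the report needs a join.
    return conn.execute(
        "SELECT p.category, SUM(s.amount) FROM sale AS s "
        "JOIN product AS p ON p.product_id = s.product_id "
        "GROUP BY p.category ORDER BY p.category"
    ).fetchall()


def revenue_by_category_denormalized(conn):
    # Denormalized table: the same report reads a single table.
    return conn.execute(
        "SELECT category, SUM(amount) FROM sales_report "
        "GROUP BY category ORDER BY category"
    ).fetchall()


if __name__ == "__main__":
    conn = setup()
    print(revenue_by_category_normalized(conn))
    print(revenue_by_category_denormalized(conn))
```

Both queries return the same totals; the trade-off is that the denormalized table repeats product attributes on every row in exchange for simpler, join-free reporting queries.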

Overview of AWS Glue

Reading Time: 4 minutes

Migrating from an on-premises solution to AWS Glue

When you run AWS Glue, there are no servers or other infrastructure to manage. You pay only for the resources used while running the jobs and for the metadata that is stored. If your organization is already invested in Informatica, Datastage, Talend, etc., it may be easy for…

AWS Glue (Serverless ETL)

Reading Time: < 1 minute

Contents: Introduction; Traditional ETL vs AWS Glue; Overview of AWS Glue; Demo: Creating an ETL solution using AWS; Some use cases for using AWS Glue; Summary

ETL vs ELT: Must Know Differences

Reading Time: 5 minutes

What is ETL?

ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts data from different RDBMS source systems, transforms the data (for example, by applying calculations and concatenations), and then loads the data into the data warehouse system. In ETL, data flows from the source to the target, and the transformation engine takes care of any data changes.

What is ELT?

ELT is a different way of looking at the tool approach to data movement. Instead of transforming the data before it is written, ELT lets the target system do the transformation. The data is first copied to the target and then transformed in place. ELT is usually used with NoSQL databases, Hadoop clusters, data appliances, or cloud installations.

Difference between ETL vs. ELT

ETL and ELT processes differ in the following parameters:

Parameter | ETL | ELT
Process | Data is transformed at a staging server and then transferred to the data warehouse DB | Data remains in the DB of the data warehouse
Code usage | Used for compute-intensive transformations on small amounts of data | Used for high amounts of data
Transformation | Done in the ETL server/staging area | Performed in the target system
Time: load | Data is first loaded into staging and later into the target system; time-intensive | Data is loaded into the target system only once; faster
Time: transformation | The ETL process needs to wait for transformation to complete; as data size grows, transformation time increases | Loading does not wait for transformation, and transformation speed depends mainly on the processing power of the target system
Time: maintenance | High maintenance, as you need to select the data to load and transform | Low maintenance, as data is always available
Implementation complexity | Easier to implement at an early stage | Requires deep knowledge of tools and expert skills
Support for data warehouse | Used for on-premises, relational, and structured data | Used in scalable cloud infrastructure that supports structured and unstructured data sources
Data lake support | Not supported | Allows use of a data lake with unstructured data
Complexity | Loads only the important data, as identified at design time | Involves development from the output backward, loading only relevant data
Cost | High costs for small and medium businesses | Low entry costs using online Software as a Service platforms
Lookups | Both facts and dimensions need to be available in the staging area | All data is available, because extract and load occur in one single action
Aggregations | Complexity increases with the amount of data in the dataset | The power of the target platform can process significant amounts of data quickly
Calculations | Overwrites an existing column, or the dataset must be appended and pushed to the target platform | A calculated column can easily be added to the existing table
Maturity | Used for over two decades; well documented, with best practices easily available | A relatively new concept that is complex to implement
Hardware | Most tools have unique hardware requirements that are expensive | Being SaaS, hardware cost is not an issue
Support for unstructured data | Mostly supports relational data | Support for unstructured data is readily available
Summary:

- ETL stands for Extract, Transform and Load, while ELT stands for Extract, Load, Transform.
- In the ETL process, data flows from the source to staging to the target.
- ELT lets the target system do the transformation; no staging system is involved.
- ELT addresses many of the challenges of ETL but can be expensive and requires niche skills to implement and maintain.
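The difference in where the transformation happens can be sketched in Python, with the standard-library sqlite3 module standing in for the target system. The source rows, table names, and cleanup rule (trim and capitalize names, convert spend to an integer) are hypothetical. In etl(), the transformation runs in the pipeline before loading; in elt(), the raw rows are loaded first and then transformed with SQL inside the target.

```python
import sqlite3

# Hypothetical extract from a source system: (name, spend as text).
RAW = [("  alice ", "100"), ("BOB", "250"), ("carol", "75")]


def etl(conn):
    """ETL: transform in the pipeline (the staging step), then load clean rows."""
    conn.execute("CREATE TABLE customer_etl (name TEXT, spend INTEGER)")
    cleaned = [(name.strip().title(), int(spend)) for name, spend in RAW]
    conn.executemany("INSERT INTO customer_etl VALUES (?, ?)", cleaned)


def elt(conn):
    """ELT: load the raw rows as-is, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE customer_raw (name TEXT, spend TEXT)")
    conn.executemany("INSERT INTO customer_raw VALUES (?, ?)", RAW)
    conn.execute(
        """
        CREATE TABLE customer_elt AS
        SELECT upper(substr(trim(name), 1, 1)) || lower(substr(trim(name), 2)) AS name,
               CAST(spend AS INTEGER) AS spend
        FROM customer_raw
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print(conn.execute("SELECT * FROM customer_etl ORDER BY name").fetchall())
    print(conn.execute("SELECT * FROM customer_elt ORDER BY name").fetchall())
```

Both functions produce the same cleaned rows. The design difference is that elt() keeps customer_raw in the target, so the raw data stays available for later, different transformations.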

OLAP Terms and Definitions

Reading Time: < 1 minute

Dimensions are lists of related terms used to organize your data. For example, a natural dimension name for the members January, February, and March might be Months. Dimensions, in turn, are used to construct cubes, the multidimensional structures in which you store and model data.
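The relationship between dimensions, members, and cubes can be sketched in Python. The dimension names and members below are hypothetical (Months follows the example above): each cell of a cube is addressed by choosing one member from every dimension, so the number of cells is the product of the dimensions' sizes.

```python
from itertools import product

# Hypothetical dimensions, each a named list of related members.
dimensions = {
    "Months": ["January", "February", "March"],
    "Products": ["Phone", "Laptop"],
}


def cube_cells(dims):
    """Return every cell address: one member from each dimension, in order."""
    return list(product(*dims.values()))


cells = cube_cells(dimensions)
cube = dict.fromkeys(cells, 0)      # one stored value per cell
cube[("February", "Phone")] = 42    # record a measure in a single cell
print(len(cells))  # 6 (3 months x 2 products)
```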

OLTP vs OLAP: What’s the Difference?

Reading Time: 7 minutes

What is OLAP?

Online Analytical Processing (OLAP) is a category of software tools which provide analysis of data for business decisions. OLAP systems allow users to analyze database information from multiple database systems at one time. The primary objective is data analysis, not data processing.

What is OLTP?

Online transaction processing, known as OLTP, supports transaction-oriented applications in a 3-tier architecture. OLTP administers the day-to-day transactions of an organization. The primary objective is data processing, not data analysis.

Example of OLAP

Data warehouse systems are typical OLAP systems. Uses of OLAP include:
- A company might compare its mobile phone sales in September with sales in October, then compare those results with another location, which may be stored in a separate database.
- Amazon analyzes purchases by its customers to build a personalized homepage with products that are likely to interest each customer.

Example of OLTP system

An example of an OLTP system is an ATM center. Assume that a couple has a joint account with a bank. One day, both reach different ATM centers at precisely the same time and want to withdraw the total amount present in their bank account. Only the person whose transaction is processed first will be able to get the money. In this case, the OLTP system makes sure that the withdrawn amount is never more than the amount present in the bank account. The key point is that OLTP systems are optimized for transaction processing rather than for data analysis.

Other examples of OLTP systems are:
- Online banking
- Online airline ticket booking
- Sending a text message
- Order entry
- Adding a book to a shopping cart

Benefits of using OLAP services

- OLAP creates a single platform for all types of business analytical needs, which include planning, budgeting, forecasting, and analysis.
- The main benefit of OLAP is the consistency of information and calculations.
- You can easily apply security restrictions on users and objects to comply with regulations and protect sensitive data.

Benefits of OLTP method

- It administers the daily transactions of an organization.
- OLTP widens the customer base of an organization by simplifying individual processes.

Drawbacks of OLAP service

- Implementation and maintenance depend on IT professionals, because traditional OLAP tools require a complicated modeling procedure.
- OLAP tools need cooperation between people of various departments to be effective, which might not always be possible.

Drawbacks of OLTP method

- If an OLTP system faces hardware failures, online transactions are severely affected.
- OLTP systems allow multiple users to access and change the same data at the same time, which can lead to conflicting updates unless concurrency is handled carefully.

Difference between OLTP and OLAP

Parameter | OLTP | OLAP
Process | An online transactional system that manages database modification | An online analysis and data-retrieving process
Characteristic | Large numbers of short online transactions | A large volume of data
Functionality | An online database-modifying system | An online database query management system
Method | Uses traditional DBMS | Uses the data warehouse
Query | Insert, update, and delete information in the database | Mostly select operations
Table | Tables are normalized | Tables are not normalized
Source | OLTP and its transactions are the sources of data | Different OLTP databases become the source of data for OLAP
Data integrity | The database must maintain data integrity constraints | The database is not frequently modified, so data integrity is less of an issue
Response time | Milliseconds | Seconds to minutes
Data quality | The data is always detailed and organized | The data might not be organized
Usefulness | Helps to control and run fundamental business tasks | Helps with planning, problem-solving, and decision support
Operation | Allows read/write operations | Mostly read, rarely write
Audience | A customer-oriented process | A market-oriented process
Query type | Standardized, simple queries | Complex queries involving aggregations
Back-up | Complete backups of the data combined with incremental backups | Needs a backup only from time to time; backup is less important than in OLTP
Design | Application-oriented; for example, database design changes with the industry, such as retail, airline, or banking | Subject-oriented; for example, database design changes with subjects such as sales, marketing, or purchasing
User type | Used by data-critical users such as clerks, DBAs, and database professionals | Used by knowledge workers such as managers and CEOs
Purpose | Designed for real-time business operations | Designed for analysis of business measures by category and attributes
Performance metric | Transaction throughput | Query throughput
Number of users | Typically serves thousands of users | Typically serves hundreds of users
Productivity | Helps to increase users' self-service and productivity | Helps to increase the productivity of business analysts
Challenge | Data warehouses historically have been a development project which may prove costly to build | An OLAP cube is not an open SQL server data warehouse, so technical knowledge and experience are essential to manage the OLAP server
Process | Provides fast results for daily used data | Ensures that query responses are consistently quick
Characteristic | Easy to create and maintain | Lets the user create a view with the help of a spreadsheet
Style | Designed to have fast response time and low data redundancy, and is normalized | A data warehouse is created uniquely so that it can integrate different data sources for building a consolidated database

Summary:

- Online Analytical Processing is a category of software tools that analyze data stored in a database.
- Online transaction processing, known as OLTP, supports transaction-oriented applications in a 3-tier architecture.
- OLAP creates a single platform for all types of business analysis needs, which include planning, budgeting, forecasting, and analysis.
- OLTP is useful to administer the day-to-day transactions of an organization.
- OLAP is characterized by a large volume of data; OLTP is characterized by large numbers of short online transactions.
- A data warehouse is created uniquely so that it can integrate different data sources for building a consolidated database.
- An OLAP cube extends the two-dimensional spreadsheet view of analysis into three or more dimensions.
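The joint-account ATM scenario from the OLTP example above can be sketched with Python's standard-library sqlite3 module. The table layout and function names are hypothetical. Each withdrawal is a single conditional UPDATE inside an explicit transaction, so it succeeds only if the balance covers the amount, and a CHECK constraint backs this up. For brevity, the two withdrawals run one after the other on a single in-memory connection; in a real system they would come from separate sessions, and the database would serialize the competing writes.

```python
import sqlite3


def open_bank(balance):
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account VALUES (1, ?)", (balance,))
    return conn


def withdraw(conn, account_id, amount):
    """Withdraw atomically; return True only if the funds were available."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before changing anything
    try:
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    return cur.rowcount == 1  # 0 rows updated means insufficient funds


if __name__ == "__main__":
    conn = open_bank(500)
    print(withdraw(conn, 1, 500))  # True: the first withdrawal succeeds
    print(withdraw(conn, 1, 500))  # False: the balance is now 0
```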

Difference Between Fact Table and Dimension Table

Reading Time: 9 minutes

Fact Table: A fact table is the primary table in a dimensional model. A fact table contains:
- Measurements/facts
- Foreign keys to dimension tables

Dimension table:
- A dimension table contains the dimensions of a fact. Dimension tables are joined to the fact table via a foreign key.
- Dimension tables are denormalized tables.
- The dimension attributes are the various columns in a dimension table.
- Dimensions offer descriptive characteristics of the facts with the help of their attributes.
- There is no set limit on the number of dimensions.
- A dimension can also contain one or more hierarchical relationships.

Difference between Dimension table vs. Fact table

Parameter | Fact Table | Dimension Table
Definition | Measurements, metrics, or facts about a business process | A companion table to the fact table that contains descriptive attributes to be used as query constraints
Characteristic | Located at the center of a star or snowflake schema and surrounded by dimensions | Connected to the fact table and located at the edges of the star or snowflake schema
Design | Defined by its grain, or most atomic level | Should be wordy, descriptive, complete, and quality assured
Task | A measurable event for which dimension table data is collected; used for analysis and reporting | A collection of reference information about a business
Type of data | Contains information such as sales against a set of dimensions like Product and Date | Every dimension table contains attributes which describe the details of the dimension; for example, a Product dimension can contain Product ID, Product Category, etc.
Key | Contains foreign keys that reference the primary keys of the dimension tables | Its primary key is referenced by a foreign key in the fact table
Storage | Detailed atomic data is loaded into dimensional structures | Stores report labels and filter domain values
Hierarchy | Does not contain hierarchies | Contains hierarchies; for example, Location could contain country, pin code, state, city, etc.
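A minimal star schema makes the fact/dimension split concrete. The sketch below uses Python's standard-library sqlite3 module; all table names, keys, and values are hypothetical. The fact table holds only foreign keys and numeric measures, the dimension tables hold descriptive attributes, and a report joins them and groups by dimension attributes.

```python
import sqlite3

STAR = """
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_id TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
-- Fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_product VALUES (1, 'P-100', 'Phones'), (2, 'P-200', 'Laptops');
INSERT INTO dim_date VALUES (20230901, '2023-09-01', '2023-09'),
                            (20231001, '2023-10-01', '2023-10');
INSERT INTO fact_sales VALUES (1, 20230901, 2, 400.0),
                              (1, 20231001, 1, 200.0),
                              (2, 20231001, 1, 900.0);
"""


def build_star():
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR)
    return conn


def amount_by_category_and_month(conn):
    # Join the central fact table to its dimensions, then group by their attributes.
    return conn.execute(
        """
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
        """
    ).fetchall()


if __name__ == "__main__":
    for row in amount_by_category_and_month(build_star()):
        print(row)
```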
Type of facts

Facts record measurements of a business process; some real-world examples include sales, phone calls, and orders.

Type of facts | Explanation
Additive | Measures can be added across all dimensions
Semi-additive | Measures can be added across some dimensions but not others
Non-additive | Measures cannot be added across any dimension, for example ratios and percentages

Types of Dimensions

Type of dimension | Definition
Conformed dimensions | A conformed dimension means the same thing with every fact table to which it relates. It is used in more than one star schema or data mart.
Outrigger dimensions | A dimension may have a reference to another dimension table. These secondary dimensions are called outrigger dimensions, and they should be used carefully.
Shrunken rollup dimensions | A subset of the rows and/or columns of a base dimension. These kinds of dimensions are useful for developing aggregated fact tables.
Dimension-to-dimension table joins | Dimensions may have references to other dimensions; these relationships can be modeled with outrigger dimensions.
Role-playing dimensions | A single physical dimension can be referenced multiple times in a fact table, with each reference linking to a logically distinct role for the dimension.
Junk dimensions | A collection of miscellaneous transactional codes, flags, or text attributes that may not logically belong to any specific dimension.
Degenerate dimensions | A dimension key without a corresponding dimension table. It is used in transaction and accumulating snapshot fact tables, and it has no dimension table of its own because it is derived from the fact table.
Swappable dimensions | Used when the same fact table is paired with different versions of the same dimension.
Step dimensions | Sequential processes, like web page events, mostly have a separate row in a fact table for every step in a process. A step dimension tells where a specific step fits in the overall session.
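Semi-additive facts are easiest to see with account balances, a common textbook example. In the hypothetical daily snapshots below, summing balances across accounts for one date gives a meaningful total, but summing one account's balances across dates does not, so the sketch takes the latest snapshot instead.

```python
# Hypothetical daily balance snapshots: (date, account, balance).
snapshots = [
    ("2023-10-01", "A", 100),
    ("2023-10-01", "B", 50),
    ("2023-10-02", "A", 120),
    ("2023-10-02", "B", 40),
]


def total_balance_on(date):
    """Additive across the account dimension: sum all accounts for one date."""
    return sum(balance for d, _, balance in snapshots if d == date)


def closing_balance(account):
    """Not additive across time: use the most recent snapshot instead of a sum.
    ISO date strings sort chronologically, so max() finds the latest date."""
    _, balance = max((d, b) for d, acct, b in snapshots if acct == account)
    return balance


print(total_balance_on("2023-10-02"))  # 160
print(closing_balance("A"))            # 120, whereas naively summing A's rows gives 220
```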

Data Modelling with Erwin

Reading Time: 3 minutes

Definition: Data modeling is a process used to define and analyze data requirements needed to support the business processes. The process of data modeling involves professional data modelers working with business stakeholders, as well as potential users of the information system. Types: Conceptual data model: It is a set of technology independent specifications about the…