Data Warehouse Design
Q1. A Comprehensive Understanding of the Concepts, Purposes, Architectures, Evolution Benefits of Data Warehouse
Data Warehousing is a collection of business information and data derived from functioning systems as well as external data sources. A data warehouse is created to prompt business decisions by permitting consolidation of data, analysis, and reporting at diverse aggregate levels (Ross et al. 2014). Data is transferred into the DW either by extraction, transformation, and or loading. The concepts of data warehousing involves integration of data stores and logical, physical, and conceptual models in efforts to support business objectives and information needs of the end-user (Ross et al. 2014). Generating a DW entails mapping data amid targets and sources, then capturing information details in a metadata repository. The data warehouse offers a single and comprehensive source of historical and current information.
There are three different types of DW, such as enterprise Data warehouse, Data Mart, and operational data store. Enterprise DW is an integrated warehouse, which offers decision support package across an enterprise (Patel & Patel 2012). It provides a unified method of consolidating and then representing data. Enterprise DW also offers data classification capacity according to the business and provides access according to the divisions. Operational Data Store (ODS) concept is a data store necessary when neither Oral Pulse Levodopa Therapy (OPLT) systems nor Data warehouse supports the establishment’s reporting requirements (Patel & Patel 2012). In ODS, real-time DW refreshment is possible, making it mostly preferred for predictable activities such as storing employees’ records. The concept of data Mart is a data warehouse subset. It is mainly designed for specific segments such as finance or sales. In a data mart that is independent, data can be collected directly from the sources (Patel & Patel 2012). The primary purpose of a data warehouse is to store large volumes of data for analysis and queries. Don't use plagiarised sources.Get your custom essay just from $11/page
Further, a data warehouse facilitates analytical and reporting processes, therefore, assisting the users make data-backed routine and strategic business decisions. The Data Warehouses architecture was developed in the 1980s to assist in transforming data from operational systems to decision-making support systems (Vaisman & Zimányi, 2014). Typically, a Data Warehouse is part of a corporation’s mainframe server or in the Cloud.
Generally, data warehouse architecture is mostly based on a Relational Database Management System Server that operates as a centralized repository for critical informational data (Vaisman & Zimányi, 2014). In DW architecture, processing and operational data are separate from the data warehouse processing. This centralized information repository is enclosed by several fundamental components designed to ensure the entire environment is functional, accessible, and manageable by the operational systems that source the data into the data warehouse and or the end-user analysis and query tools.
Typically, a Data Warehouse embraces a three-tier architecture such as the bottom tier, middle tier, and top tier (Patel & Patel 2012). The bottom tier architecture signifies the DW database server, which is also referred to as the relational database system. The back-end utilities and tools are used in feeding the data into a bottom tier. Additionally, back-end utilities and tools perform data extraction, cleaning, loading, and refreshing functions. On the other hand, the middle tier of a DW represents the Online Analytical Processing (OLAP) server that is an extension of the Relational Database Management System (Patel & Patel 2012). The Relational Online Analytical Processing (ROLAP) maps the processes on multidimensional data to a standard relational OLAP model, that directly processes the operations and multifaceted data. Further, Top-tier characterizes the front-end consumer layer (Patel & Patel, 2012). The over said layer holds the reporting tools and query tools, data mining tools, and analysis tools.
Data Warehouse has several benefits to the corporations. When a data warehouse is implemented into a business intelligence framework, it results in numerous benefits. For instance, DW delivers improved business intelligence, whereby information access from different sources is made available from a single platform, and decision-makers do not have data limits (Vaisman & Zimányi, 2014). Data warehouses can be applied in various businesses’ processes such as sales, market segmentation, inventory, financial, and risk management (Vaisman & Zimányi, 2014). DW also saves time it preserves, standardizes, and stores data from diverse sources hence consolidating and integrating all sorts of data (Vaisman & Zimányi, 2014).
Data Warehouse enhances data consistency and quality by converting data from various sources into a consistent format, resulting in more precise data, which is a source of reliable decisions. Moreover, DW produces a high Return on Investment (ROI) and delivers a competitive advantage since it helps in grasping the complete view of the current standing as well as evaluating risks and opportunities, therefore giving companies a competitive advantage (Vaisman & Zimányi, 2014). DW also improves the corporates decision-making process by enabling data transformation into purposeful information, hence performing more reliable, functional, and precise analysis. The organization can even forecast with confidence since business data is analyzed to predict current and future markets, estimate predicted results, and planning accordingly (Vaisman & Zimányi, 2014).
Q2. A systematic knowledge of how to apply ERD and Star Schema to design DW
Procedure for applying ERD in designing Data Warehouse
Larson & Chang (2016) suggest the following steps in using the Entity-Relationship Diagram:
- First, make the purpose of drawing the ERD clear. For instance, whether representing overall system architecture with diverse business objects or developing an ERD ready for database creation.
- Secondly, make the scope of the model clear to prevent the inclusion of redundant objects and relationships in the design.
- Thirdly, draw the significant objects included in the model scope.
- Fourthly define the entities’ properties by adding several columns.
- Review the ER model cautiously and evaluate whether the columns and objects are sufficient to store the system data and consider adding more units and columns.
- Reflect on all entities’ relationships and then relate them with appropriate cardinality such as a one-to-several between entity Order and Customer.
- Use the technique of DW normalization in re-structuring the units to increase data integrity and reduce data redundancy.
Figure 1: An ER model of a DW representing the relationships between customer and supplier of specific product
Procedure for applying Star Schema in designing DW
The fact table of Star Schema consists of the quantitative and measurable data, while the dimension table involves the descriptive attributes which are correlated to fact data (Larson & Chang (2016).
The three types of fact tables to consider:
- The transaction fact tables that are used to record facts about a precise event such as a holiday sales event.
- The snapshot fact tables used to record points for a specified period like account details at the end of the month quarter.
- Accumulate snapshot tables showing rapid accumulations at a specified time, such as total sales in a given month. A unique key (Surrogate key) is also allocated to a fact table to recognize each row uniquely.
Dimension Table in Star Schema: The dimensions in a DW can describe a diversity of characteristics. Some of the often used dimension tables to consider are the geography dimension table, time dimension table, product dimension table, employee dimension table, among others (Larson & Chang, 2016). Likewise, the tables are also allocated a single-column integer data type (surrogate primary key), denoting to the blend of dimension attributes creating the original key. The key things to consider in creating a dimensional table include:
- Ensure every dimension in a star schema is represented with the only one-dimension table.
- Ensure the dimension table contains the attributes sets.
- Join the dimension table to the fact table with foreign key
- Ensure the dimension tables are not linked to each other
Figure 2: A Star Schema DW sample showing the relationships between product, revenue, branch dim. And dealer
Q3. The ability to design appropriate data extraction, transformation, and loading strategy and create reasonable queries.
Extract-Transform-Load (ETL) covers a procedure of how the data is loaded from the system source into the DW. ETL includes the separate step of cleaning data (Vaisman & Zimányi 2014). The arrangement is, therefore, Extract-Clean-Transform-Load. Data Extraction: This step covers data extraction from the system source and makes it available for additional processing. The critical objective of the extraction step is to retrieve the necessary data from the system source with possibly limited resources (Vaisman & Zimányi, (2014). The extraction process can be done in several ways, including updating notification, incremental extract, and full extract. Data Cleaning: This step ensures the data is high quality in a data warehouse. Data cleaning have to carry out simple data unification guidelines, including:
- Creation of unique identifiers unique (sex classifications such as Male/Female/Unknown/null
- Conversion of null values into a standardized format such as Not Provided value/Not Available
- Conversion of phone numbers and ZIP codes into a standardized format
- Validation of address fields like converting them into appropriate naming such as Street/St/ Str
- Validation of the address fields alongside each other such as City/State, State/Country, City/ZIP code
Data Transformation: This step puts on a set of rules in transforming data from the system source to the target destiny (Torey & Song, 2017). The process involves converting measured data into a similar dimension, such as conformed dimension, by use of the same entities to ensure they join later (Torey & Song, 2017). This step also entails entering data from numerous sources, producing aggregates, sorting, creating surrogate keys, deriving of new calculated standards, and using innovative validation rules.
Data Loading: In this step, it is essential to ensure that data loading is done correctly with possibly limited resources (Torey & Song, 2017). In making the loading process efficient, it is necessary to disable all indexes and constraints before loading and later enable them back after the loading process is completed. Referential integrity should be preserved by the ETL tool to guarantee consistency.
References
Larson, D., & Chang, V. (2016). A review and future direction of agile, business intelligence, analytics, and data science. International Journal of Information Management, 36(5), pp. 700-710.
Patel, A., & Patel, J. (2012). Data modeling techniques for data warehouse. International Journal of Multidisciplinary Research, 2(2), pp. 240-246.
Ross, T. R., Ng, D., Brown, J. S., Pardee, R., Hornbrook, M. C., Hart, G., & Steiner, J. F. (2014). The HMO Research Network Virtual Data Warehouse: a public data model to support collaboration. Items, 2(1).
Storey, V. C., & Song, I. Y. (2017). Big data technologies and management: What conceptual modeling can do. Data & Knowledge Engineering, 108, pp. 50-67.
Vaisman, A., & Zimányi, E. (2014). Data warehouse systems. Data-Centric Systems and Applications.