Dimensional Modelling is a design technique used to organize data in a way that is easy to understand and analyze. It is mainly used in data warehousing and business intelligence projects. Dimensional modelling is based on two main concepts: dimensions and facts.
Dimensions are the characteristics of the data, such as time, geography, product, customer, etc. They are used to filter and aggregate the data in various ways. Facts are the measurable quantities that are being analyzed, such as sales, profit, cost, quantity, etc.
In this article, we will discuss the fundamental concepts of dimensional modelling and their significance in data warehousing.
Fact Tables
A fact table is the central table in a dimensional model. It contains the measures (facts) and foreign keys to the related dimensions. The fact table can be categorized into several types:
Conformed facts: Facts that are shared across multiple fact tables, such as the total sales for a product.
Transaction fact tables: Fact tables that record individual transactions, such as sales or orders.
Periodic snapshot fact tables: Fact tables that record periodic measurements, such as the monthly sales.
Accumulating snapshot fact tables: Fact tables that record the state of an activity at a particular point in time, such as the current inventory level.
Factless fact tables: Fact tables that have no measures but only contain foreign keys, used to represent events that did not occur.
For example, a sales fact table would contain measures such as revenue, quantity sold, and cost of goods sold, and foreign keys to related dimensions such as date, product, and store.
Dimensions
Dimensions are used to provide context to the data in a fact table. There are several types of dimensions:
Type 0 – Keep Original: The dimension attribute does not change over time, and the value is stored in the fact table.
Type 1 – Overwrite: The dimension attribute changes over time, and the new value overwrites the old value in the dimension table.
Type 2 – Add new row: The dimension attribute changes over time, and a new row is added to the dimension table for each change.
Type 3 – Add new attribute: The dimension attribute changes over time, and a new column is added to the dimension table for each change.
Type 4 – Add new table: A new dimension table is added to represent a new aspect of the data.
For example, a time dimension would contain attributes such as year, month, and day. A product dimension would contain attributes such as product name, category, and manufacturer.
Primary Keys
A primary key is a unique identifier for each row in a table. In a dimensional model, the primary key of a dimension table is usually a natural key, such as a product code or customer ID. The primary key of a fact table is a composite key made up of the foreign keys to the related dimension tables.
Surrogate Keys
A surrogate key is a system-generated identifier that is used instead of a natural key as the primary key of a dimension table. Surrogate keys are useful when the natural key is too long or complex, or when the natural key changes over time.
Granularity
Granularity refers to the level of detail in a fact table. The granularity of a fact table is determined by the combination of the foreign keys to the related dimension tables. A high granularity means more detail, while a low granularity means less detail.
Star Schema
A star schema is a type of dimensional model in which the fact table is connected directly to the dimension tables, forming a star shape. The star schema is simple and easy to understand, but it can become complex when there are many dimensions.
Physical Model
The physical model is a detailed representation of the data model that includes information about data types, keys, indexes, constraints, and other database-specific details. The physical model is used by database administrators to create the actual database schema.
Staging Tables
Staging tables are temporary tables used to store data before it is loaded into the data warehouse. Staging tables are used to cleanse, transform, and validate data, and to ensure that it conforms to the dimensional model.
Conformed Dimensions
Conformed dimensions are dimensions that are shared across multiple fact tables. Conformed dimensions ensure consistency and accuracy of the data and allow for easy integration of data from multiple sources.
Role Playing Dimensions
Role playing dimensions are dimensions that are used in multiple ways within a single fact table. For example, a date dimension can be used to represent both the order date and the ship date in an order fact table.
Junk Dimensions
Junk dimensions are dimensions that contain low-cardinality attributes that do not fit into any other dimension. Junk dimensions are used to simplify the dimensional model and reduce the number of dimensions.
Snowflaking
Snowflaking is the process of normalizing a dimension table by splitting it into multiple related tables. Snowflaking is used to reduce the redundancy in the dimensional model and improve performance.
Bridges
Bridges are tables used to represent many-to-many relationships between dimension tables. Bridges are used to simplify the dimensional model and improve performance.
Outriggers
Outriggers are dimensions that are related to another dimension but do not fit into the main dimension table. Outriggers are used to simplify the dimensional model and improve performance.
Semantic Layers
Semantic layers are an abstraction layer that sits between the data warehouse and the end-users. The semantic layer allows users to access the data in a business-friendly way without needing to understand the underlying data model.
Data Lake
A data lake is a repository that stores data in its raw form, including structured, semi-structured, and unstructured data. Data lakes are used to store large volumes of data for later analysis.
Data Warehouse
A data warehouse is a centralized repository of data that is used for reporting and analysis. Data warehouses are optimized for query performance and typically use a dimensional data model.
Data Mart
A data mart is a subset of a data warehouse that is designed to serve a specific business unit or department. Data marts are optimized for specific business needs and are easier to manage than a full data warehouse.
OLAP Cubes
OLAP (Online Analytical Processing) cubes are multidimensional structures used for analyzing data. OLAP cubes are designed to handle large volumes of data and provide fast response times for queries. OLAP cubes use a dimensional model and provide users with multiple dimensions to slice and dice the data.