Foundations of Semantic Data Modelling with Star Schema, Tabular Model and Power BI
Why read this article?
Because these are the foundational things that I wish, that I had known from the outset of developing semantic data models, using star schema and dimensional modelling techniques. Being critically aware of these foundations makes everything in tabular modelling so much clearer and more straight forward.
At the time of writing, I’ve been working with tabular models in Power BI service for around 3 years. Knowing the things in this article would have made the journey so much more direct. These things are also foundational, basing semantic modelling on these things leads to simpler, faster and more usable models.
There are many excellent articles available on modelling, and we’ll refer to those, but, for me at least, I missed the important points and context provided by this article.
Remember the customer?
Semantic data models are models for business users, with business meaning.
Business meaning sits in the context of business process.
Thus, semantic data models must be user oriented, have user friendly names and be as easy as possible to use. So Expense, and not FACT_Expense or ExpenseFact etc. They are built for an end user and therefore must be built by working with said user. They must answer the questions that a business process needs to answer. End users will often ask for things like extra columns “just-in-case” or from cross purposes and different processes. It is the modeller’s job to gain understanding with the user and to provide explanations so that clear scoping is achieved. This is often the most important step before modelling work begins.
An example of working with a customer follows. Details of the table types described in this interaction will become clear by reading the article to the end.
Customer, referring to a semantic data model that is built in a star schema, around an accumulating snapshot fact table:
We love this Order Model, it makes tracking our orders process very straight forward. Can we add all of the payment transactions for each order?
Modeller, describing that the request can be best fulfilled with a transactional fact table:
Yes, we can add another fact table and link it using the common dimensions, like the Order Number.
Why Star Schema?
Star schema has been proven repeatedly to create simple, performant models that answer business questions.
Correctly selected star schema(s) will keep the calculations (e.g. DAX) much simpler. If the DAX starts to get very complex then there is a high chance that something is wrong with the model selection, fit for purpose, or design/implementation.
Tabular models in Power BI are no exception, this is why Microsoft recommends star schema for Tabular models in Understand star schema and the importance for Power BI
Why Kimball & Dimensional Modeling?
Because these are proven patterns that a person will naturally find after quite some experience, or we can simply take the shortcut and start using the patterns and gain immediate understanding and success.
Rule #1: Load detailed atomic data into dimensional structures.
Rule #2: Structure dimensional models around business processes.The 10 Essential Rules of Dimensional Modeling
Again, consolidated fact tables are a complement to the detailed single-process fact tables, not a substitute for them.The 10 Essential Rules of Dimensional Modeling
Selecting the Grain and Fact Table Types
The two concepts of grain and fact table type fit inextricably together. The first part of ensuring the same grain on a fact row becomes quickly obvious. What is less obvious, is that the grain is also the fact table type.
Rule #4: Ensure that all facts in a single fact table are at the same grain or level of detail.
There are three fundamental grains to categorize all fact tables: transactional, periodic snapshot, or accumulating snapshot.The 10 Essential Rules of Dimensional Modeling
Knowing what type of fact table we need, and must build, leads to a clear and simple understanding, and an easy to apply pattern. Instead of getting lost in discussions with one’s self, or others, that don’t make sense!
It is highly recommended to view The Three Types of Fact Tables for illustrated examples before continuing.
A transactional fact table may have many, many, many rows and lends itself well to incremental refresh (the fact rows don’t change after they have been correctly stated). This is the ‘classic’ fact table with numerical measures and foreign keys to dimension tables. An example would be an Expense or Sale fact table.
A periodic snapshot fact table has a clear grain at the period, this is an aggregation of a transactional fact table that is useful when only aggregated answers are required, and the questions can’t be answered by a transactional table type with acceptable performance. An example would be an Expense By Month fact table where expense transactions are grouped by month.
An accumulating snapshot fact table is readily recognised by the many date dimensions and status dimensions. Expect junk dimensions and lag measures. The fact rows change on events. Each row reflects a transaction moving through the, relatively long-lived, pipeline of a business process. The question of lag between events is answered by the difference between dates and times (note that timestamps must be split into date dimensions and time columns on the fact table). This table type can be optionally enhanced to have temporal (longitudinal) snapshots by adding rows on events with valid from/to dates to create a special table sub-type. An example would be a Work Order or Claim fact table.
Build an accumulating snapshot fact table when:
We need a fact table to measure the process.
…business users are most interested in seeing the current state of the process.
“Claim” feels like a dimension, but at the same time, it can behave like a fact table.https://www.kimballgroup.com/2011/11/design-tip-140-is-it-a-dimension-a-fact-or-both/
Keep in mind that:
Often the best design response is a combination of two, or even all three, fact table types.https://www.kimballgroup.com/2014/06/design-tip-167-complementary-fact-table-types/
Multiple Fact Tables
Recall again, that detailed single-process fact tables are a required building block. When the fact table type is selected correctly then the resulting semantic data model can, and will, answer the questions required by a specific process.
A different business process however, even one that uses the same basic data, will likely require another star schema if the answers don’t easily ‘fall out’ of an existing star schema.
Data models and dimensional data warehouses are collections of one or more star schema, ideally with common (conformed) dimensions.
There are certainly valid cases when multiple fact tables must exist and be related in a semantic data model in order to compare them. For example, budget and actual expense facts in a financial model. These will typically be at different grains (and table types), typically these two, or so, fact tables can be related by their common (conformed) dimensions. In the financial model example, expense will typically need to be aggregated along the same dimension and to the same grain for comparison with the budget. For this, we should take note that aggregation is along dimensions. Note also that keys are dimensions too. This is what factless fact (simply records an event or coming together of dimensions) and bridging (fact-bridge-dimension, when M:M i.e. multi-valued dimensions per fact) tables do, they relate tables along common dimensions.
Without any other information you must assume all fact tables have a M:M relationship between them, so if you know it is 1:M how bad can it be?https://kimballgroup.forumotion.net/t2487-relation-between-two-fact-tables-having-a-one-to-many-relation
Another case appears to arise with head/line, header/detail or parent/child normalised tables. These perform exceptionally well in operational systems, but under-perform in analytical models. The reality here is that if the detail is required by the business process (and therefore the model) then the grain must be lowered to the atomic level of the line or detail. The tables must be denormalised.
Note that a limited length text description is a dimension, not a fact, and may be treated as a degenerate dimension i.e. as a column on the fact table.
If/when absolutely required, extended/long text comments are a good candidate for direct query and compute enabled dataflows in Power BI service. Indeed this is an exception to the header/detail rule, as the relationship will be from the fact to the comment (i.e. detail) table. The comment table is not able to be used as a dimension in this case.
“One premise of dimensional design is a fact table should stand on its own. The ‘child’ fact should contain all appropriate dimensions that are found in the ‘header’ fact. You should not need to join facts if all you want to report is balances. The need to join facts should always be an exception rather than the rule.” – join in this quote actually refers to relate in dimensional modelling; it is entirely valid to join in the SQL sense in order to create the required fact table(s).
“You do not have FK relationships between facts, that is relational modeling, not dimensional modeling.”https://kimballgroup.forumotion.net/t2487-relation-between-two-fact-tables-having-a-one-to-many-relation
In conclusion, we can see how simple the conversation with the customer became. The customer interaction described at the beginning of this article could have caused a lot of confusion (and a model that became a complicated mess). But the modeller was acutely aware of the need to select the correct fact table type(s). The modeller immediately identified that the scope of the current star schema was to answer questions about the relatively long-lived order process, and that another star schema with a different fact table type was required to easily answer the order amount by month questions. In this case there is no valid reason to even attempt to create a relationship between the two fact tables or the two star schema. The two star schema could optionally be bundled into the same model or dataset to enable creating visualisations from both star schema in the same report, with appropriate documentation and diagrams showing the two independent star schema.
Great article and summary of the foundational data modelling steps.
An additional benefit of adhering to best practise models is that the DAX code required to create measures & calculated columns becomes simpler.
Indeed! Expanded article to specifically include the benefit of simpler DAX.