On this blog, I showcase a lot of different techniques for manipulating and reshaping data. If you follow the blog, you already know this, and you know it's a pretty important topic to me. But the thing we shouldn't lose sight of is WHY we do this. It's to drive analytics. I'm fairly convinced that the majority of the loyal readers here already know this. Thus, I wanted to ask your opinion on something...
How do you design your data model?
What I'm specifically interested in is how you approach designing the Fact and Dimension tables you use for your Power Pivot model. And I'm not specifically talking about Power Query here. We all know you should be using what you learned from our recently relaunched Power Query Academy to do the technical parts. 😉
What I'm more interested in is the thought process you go through before you get to the technical bit of doing the data reshaping.
If you read books on setting up a data model, you'll probably be told that you need to do the following four steps:
- Identify the business process
- Determine the grain of the model
- Design your Dimension tables
- Design the Fact tables
So if you're asked "how do you design your data model", do these steps resonate with you, and why?
Do you consciously sit down, and work through each of these steps in order? I suspect that many self-service BI analysts skip the first step entirely as they are implicitly familiar with their business process. (As a consultant, I ask a lot of questions in this area to try and understand this before building anything.)
Do you design the reports on paper, then work backwards to the data you'll need, go find it and reshape it? Or do you go the other way, trying to collect and reshape the data, then build reports once you think you have what you need?
Do you explicitly define the model grain? And if you do, what does that mean to you? Is it restricted to "I want transactions on a monthly/daily/hourly basis"? Or do you go deeper, like "I want transactions on a daily basis and want to break them down by customer, region and product"?
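To make the grain question a bit more concrete, here's a rough sketch of those two choices. It's in Python rather than Power Query or DAX, the transactions are made up, and `build_fact` is a hypothetical helper I'm using purely for illustration. It isn't how you'd build the table for Power Pivot, just a way to see what changes when the grain does.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: every name and value here is invented for illustration.
transactions = [
    {"date": date(2024, 1, 5), "customer": "Acme", "region": "West", "product": "Widget", "amount": 100.0},
    {"date": date(2024, 1, 5), "customer": "Acme", "region": "West", "product": "Widget", "amount": 50.0},
    {"date": date(2024, 1, 5), "customer": "Birch", "region": "East", "product": "Gadget", "amount": 75.0},
    {"date": date(2024, 1, 6), "customer": "Acme", "region": "West", "product": "Gadget", "amount": 20.0},
]


def build_fact(rows, grain):
    """Sum 'amount' into one fact row per unique combination of the grain columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[col] for col in grain)
        totals[key] += row["amount"]
    return dict(totals)


# Grain 1: one row per day.
daily = build_fact(transactions, ["date"])

# Grain 2: one row per day, customer, region and product.
detailed = build_fact(transactions, ["date", "customer", "region", "product"])

print(len(daily), "rows at the daily grain")
print(len(detailed), "rows at the daily/customer/region/product grain")
```

Both versions add up to the same grand total. The difference is that the daily-only table can never be broken down by customer, region or product later, because that detail was summarized away when the grain was chosen.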
Why the question?
There are actually two reasons why I'm asking this question:
Reason 1 is that I think healthy discussion makes all of us better. I'd like to hear your thoughts on this as I'm probably going to learn something that I haven't discovered in my own learning journey.
Reason 2 is that my whole business is around teaching people how to do these things, and I'm always looking to make things clearer. The more opinions I hear (even if they contrast with each other), the more I can help people understand this topic.
So sound off, please! We'd all love to hear how you approach the task of building a data model.