Have you ever set up a nice query to UnPivot other columns, only to find that the query data types change when you add new columns? This post will cover why values become text after unpivoting other columns.
We’ve got a nice little table called “Data” showing here. Nothing special, it just summarizes sales by region by month, and our goal is to unpivot this so that we can use it in future Pivot Tables. (You can download the source file here.)
Now, you will notice that April’s sales are outside the table. This is by design, and we’ll pull it in to the table later when we want to break things.
UnPivoting Other Columns – The Hopeful Start
If you’ve been following my blog for any period of time, you’ve seen this, but let’s quickly go over how to unpivot this:
- Select a cell in the table
- Go to Power Query (or Data in Excel 2016) –> From Table
We’re now looking at the Power Query preview of the table:
Great, now to unpivot…
- Hold down the Shift key and select the Country and Prov/State column
- Right click the header of either of the selected columns and choose Unpivot Other Columns
- Right click the headers of the two new columns and rename them as follows:
- Attribute –> Month
- Value –> Sales
Re-Pivoting from the Data Model
With the table complete, I’m going to load this to the data model and create a Pivot Table:
- Go to Home –> Close & Load –> Close & Load To…
- Choose to Load to the Data Model
The steps to create the Pivot depend on your version of Excel:
- Excel 2013: Go in to Power Pivot –> Home –> PivotTable and choose a location to create it
- Excel 2016: Click any blank cell and go to Insert –> PivotTable. As you have no data source selected, it will default to using the data model as your source:
With the PivotTable created, I’ve configured it as follows:
- Rows: Country, Prov/State
- Columns: Month
- Values: Sales
And that gives me a nice Pivot like this:
Let’s Break This…
Okay, so all is good so far, what’s the issue? Now we’re going to break things. To do that, we’re going to go back to our original data table and expand the range:
In the picture above, I’ve left clicked and dragged the tiny little widget in the bottom right corner of the table to the right. The table frame is expanding, and when I let go the Apr column turns blue, indicating that it is now in the boundaries of the table.
With that done, I’m going to right click and refresh my Pivot Table, leaving me with this:
Huh? Why was the sales measure removed? And if I drag it back to the table, I get a COUNT, not a SUM of the values? And even worse, when I try and flip it back to SUM, I’m told that you can’t? What the heck is going on here?
Importance of Power Query Step Order
To cut to the chase, the issue here is that when we first created the table in the data model, the Sales column was passed as values. But when we updated the data to include the new column, then Sales column was then passed entirely as text, not values. Naturally, Power Pivot freaks out when you ask for the SUM of textual columns.
The big question though, is why. So let’s look back at our query.
Our original data set
If we edit our query, we see that the steps look like this:
To review this quickly, here’s what happened originally
- Source is the connection that streams in the source data with the following columns:
- Changed Type set the data type for all the columns. In this case the Country and Prov/State fields were set to text, and the Jan, Feb & Mar columns were set to whole number. We can see this by looking at the icons in the header:
Note that if you don’t have these icons, you should download a newer version of Power Query, as this feature is available to you and is SUPER handy
- We then selected the Country and Prov/State columns and chose to Unpivot Other Columns. Doing so returned a table with the following headers
Notice that the first three columns are all textual, but Sales is showing a numeric format? Interestingly, it’s showing a decimal format now, but it shows the numeric format because all unpivoted columns had explicitly defined numeric formats already.
The final steps we did was to rename our columns and load to the data model, but the data types have been defined, so they were sent to the data model with Sales being a numeric type.
Why Values Become Text After UnPivoting Other Columns
Okay, so now that we know what happened, let’s look at what we get when we step through the updated data set.
- First we pulled in all the columns. We can plainly see that we have the new Apr column:
- The Changed Type step is then applied:
Hmm… do you see that last data type? Something is off here…
So when we originally created this query, Power Query helpfully pulled in the data and applied data types to all the existing columns. The problem here is two-fold: First, the Apr column didn’t exist at the time. The second problem is that Power Query’s M language uses hard coded names when it sets the data types. The end effect is that upon refresh, only the original columns have data types defined, leaving the new columns with a data type of “any” (or undefined if you prefer).
- We then unpivoted the data, but now we see a difference in the output
Check out that Value column. Previously this was a decimal number, now it’s an “any” data type. Why? Because there were multiple data types across the columns to be unpvioted, so Power Query doesn’t know which was the correct one. If one was legitimately text and Power Query forced a numeric format on it you’d get errors, so they err on the side of caution here. The problem is that this has a serious effect on the end load to Power Pivot…
- Finally, we renamed the last two columns… which works nicely, but it doesn’t change the data type:
Okay, so who cares, right? There is still a number in the “any” format, so what gives?
What you get here depends on where you load your data. If you load it to the Excel worksheet, these will all be interpreted as values. But Power Pivot is a totally different case. Power Pivot defaults any column defined as “any” to a Text data type, resulting in the problems we’ve already seen.
Fixing the Issue
For as long as we’ve been teaching our Power Query Workshop, we’ve advocated defining data types as the last step you should do in your query, and this is exactly the reason why. In fact, you don’t even need to define your data types in the mid point of this one, that’s just Power Query trying to be helpful. To fix this query, here’s what I would recommend doing:
- Delete the existing Changed Type step
- Select the final step in the query (Renamed Columns)
- Set the data type for each column to Text except the Sales column, which should be Decimal Number (or currency if you prefer)
When this is re-loaded to the Data Model, you’ll again be able to get the values showing on the Pivot Table as Sum of Sales.
Avoiding the Issue
Now, if you don’t want Power Query automatically choosing data types for you, there is a setting to toggle this. The only problem is that it is controlled at a Workbook level, not at a global Excel level. So if you don’t mind setting it for every new workbook, you can do so under the Power Query settings:
Is Changed Type Designed in the Correct Way?
It’s a tough call to figure out the best way to handle this. Should the data types be automatically hard coded each time you add a new column? If the UnPivot command had injected a Changed Type step automatically, we wouldn’t have seen this issue happen. On the other hand, if a textual value did creep in there, we’d get an error, which would show up as a blank value when loaded to Power Pivot. Maybe that’s fine in this case, but I can certainly see where that might not be desirable.
Personally, I’d prefer to get a prompt when leaving a query if my final step wasn’t defining data types. Something along the lines of “We noticed your final step doesn’t declare data types. Would you like me to do this for you now (recommended)” or something similar. I do see this as an alternate to the up-front data type declaration, but to be honest, I think it would be a more logical place.