More and more I’m seeing examples where people are trying to extract data from a mixed column. In other words, they have two data types in a single column, but need to find a way to extract one from the other.
Examining the issue
The sample data I’m using can be downloaded from this link.
I’m going to use Power BI Desktop for this, but the results will look identical in Excel using Power Query (except for the colour, of course.)
So let’s get started:
- Get Data (new Query in Excel) –> From CSV –> MixedDataInColumn1.csv
- Promote First Row as Headers
The issue can be seen in the red circles below… the report author injected the name of each vendor for the parts above their first part in the list.
So the issue here is how to extract the vendor name from Part No column. The problem is that there isn’t any obvious way to do this. We have different textual values in all columns, which could change over time. There’s really nothing that we can test for reliably in this case.
How to Extract Data from a Mixed Column
There are actually a few different ways to extract data from a mixed column… a few of which we demonstrate in our Power Query workshop. I’m going to show just one here.
Step 1 – Identify a column with a pattern you can exploit
The key we are really looking for is a column which has values or dates for all rows other that the one with our vendors. In this case we actually have two: Part No and Cost. Both have text on the Vendor lines, but what looks like values on the rest. The challenge we have here is that we can’t always guarantee that Part No won’t have text in it. It’s completely possible we could see a part number like TH-6715 or something. So this leaves us with the Cost column.
Step 2 – Duplicate the identified column
This next set of steps is actually the trick that lets us work this out.
- Right click the column in question and choose Duplicate Column
- Right click the Cost – Copy column –> Change Type –> Whole Number
- Right click the Cost – Copy column –> Replace Errors –> null
You should now have null values where the textual values were located:
Step 3 – Use a little conditional logic
We now have something that we can use in order to extract the Vendor name. So let’s build a little bit of conditional logic:
- Add Column –> Conditional Column
- Configure the Conditional Column as follows:
The only trick here is to make sure you change the Output to a column so that you can select from the list of columns.
- Click OK
- Right click the Vendor column –> Fill Down
The result is shown below:
Step 4 – Clean up
We’re now at the point of clean up which entails:
- Filter the Cost – Copy column to remove null values
- Delete the Cost – Copy column
- Set the data types on all columns
The results now look as follows:
At this point we can commit the query and we are good to go.
This is not a new trick by any means; I’ve been using it for a long time. The biggest key is really about identifying patterns and thinking outside the box.
It’s unfortunately very easy to get focused on the primary column we want to solve, and lose site of the others. (Trust me, I’ve been there too.) Sometimes though, when a column is particularly tough to deal with, we need to just step back and take a look at the bigger picture to see if there is a pattern in another column that we can exploit. In fact, I’d say that this is probably one of the most important things to master when working with Power Query.