A couple of weeks ago I was teaching a course on Power Query. I distributed a bunch of sample files, started to walk the attendees through the steps, and a hand went up.
“I get an error in my date column…”
For a second it stumped me… I tested my examples 15 times before pushing them out to the class. Why now?
As it turns out it’s that pesky issue that drives many non-North Americans crazy. I keep my regional settings set to use the MM/DD/YYYY format. I just find it makes life a lot easier for me. What we ran up against here was a user who was using the Canadian standard of DD/MM/YYYY. Yuck.
I promised a way to show how to fix it, and am finally getting around to posting it…
Replicating the Issue
The file I’m working with here is a csv file (saved in a txt format) that uses the DD/MM/YYYY format, as that replicates the issue for me. So I import it and end up with this:
So what’s the problem? Well, if you look at the areas I circled in red, the ChangedType step changed the data type for the TranDate column to Text. That’s no good. So I highlight the TranDate column and change it to Date:
Great. An error.
So this is what I’m talking about. The dates above have been converted to dates, but not the 1st of February, 2nd of February and so on. Instead it’s January 2nd, February 2nd, etc. As soon as it gets to the 13 it gags as there is no 13th month.
This is not an uncommon issue, by any means. If you database exports in one format and your system settings are in another, you’ll run into this. Multinationals that exchange files across borders see this all the time (just look in any forum for the amount of help posts on the topic.)
I REALLY wish that Power Query were smarter about this. I also wish there was an easy way to just tell Power Query which date format your data (not your system) was using, or an easy way to convert it. Anything to let me do something in one click versus what we need to do to fix this. (As an added insult, the old text import wizard DID let you declare this. Go figure!)
All right, so how do we fix it? I’m sure there are a variety of ways, and hopefully a new ribbon will come along to make this obsolete, but here is ONE way.
Step 1: Remove the garbage
First I deleted the “Changed Type” step of the query.
Step 2: Split by the / character
Next I selected the TranDate column, went to the Transform Tab (if you don’t have it download the latest Power Query update), chose Split Column, then By Delimiter, then Custom. I put in a / and clicked OK.
As an aside here, I wish Power Query suggested what it thought you wanted to use as the delimiter, without actually clicking OK for you. In this case I think it’s pretty obvious, even though it’s not in the list of defaults. Granted, I might want to override it with something crazy like a 0, which is why I wouldn’t want it to just click OK for me, but it might save me time if it made an intelligent suggestion.
At any rate, I end up with this:
Step 3: Rename the columns
This part is important for the next piece. Rename your columns to reflect which part of the date they belong to. You can do this by right clicking the column header and choosing Rename… (Alternately you can edit the M code in the formula bar manually, but you need to edit the “SplitColumnDelimiter” step first, then edit the ChangedType step to reflect the new column names. Probably easier to just right click and rename them:
Step 4: Putting the date together… correctly
OK, now the tricky part. We need to insert a column and write a Power Query formula. To do this, go to the Insert tab, and click Insert Custom Column. You’ll be presented with a dialog to write your formula. And here’s the formula you need:
Sadly, there really isn’t a lot of documentation yet that explains this stuff. I believe that the # sign tells PowerQuery that this is a data type, the “date” portion determines what type it is, and then we feed it the components that make it up (for each row). I also changed the name at the top of the dialog to “Date” and clicked OK:
Step 5: Force the data type
I’m not sure why this is, but even after declaring this as a date, it formats the column visually as a date, but the data type is left blank. For whatever reason, I had to select the column and force the data type to a date in order to make it recognize it properly as such.
Step 6: Cleanup
So this part is optional, but you can now delete the Day, Month and Year columns if you don’t need them any more. Unlike standard Excel, this move won’t leave you with a bunch of #REF! errors down the Date column. 🙂
The implications of this are somewhat interesting to me. Let’s assume that your data goes the other way. I send you a file in MM/DD/YYYY format, and you use DD/MM/YYYY format. You should be able to follow all of the above steps, with the only difference being which column you identify as which when you rename them. (In this case it would go Month Day Year after splitting them.)
I’d love to get some feedback from others on this, but I believe the formula in the custom column should work for you without modification as long as the columns are named correctly.
I’ve attached two files to this post for testing purposes. Right click the links below and SaveAs to get copies to play with:
Try them both, you should be able to generate a correct date for your format from either using these steps.