Data From Different TimeZones

A friend of mine emailed yesterday asking how to compare data from different timezones.  With how good the UI is in Power Query, you’d think this would be easy.  Unfortunately it’s a bit less than that, so I thought it would make a good example for today’s post.

Background

Let’s assume that we’ve got two columns of data; an Order Date and a Shipping Date.  We’d like to work out the number of days it took to ship our order.  Easy enough, we just need to subtract one from the other… except… the system that holds the Order Date reports it in UTC +0:00, and the shipping date is done from my home time zone (UTC –7:00).

The data table we’re starting with looks like this:

image

And you can download a copy of the workbook from my OneDrive here if you’d like to follow along.

Avoiding Temptation

So the first thing to do is pull the data in to Power Query.  So I clicked in the table, went to the Power Query tab, and chose From Table.  At this point we’re greeted with a nice table, and our first temptation is to go directly to the Transform tab and set the Data Type to Date/Time/Timezone:

image

And herein lies a problem.  The system has forced my local TimeZone on the data.  As specified in the initial problem, I need this to carry a UTC +0:00 distinction.

It’s a shame that there is no intermediate step here (how often do I ask for MORE clicks?) which allowed you to specify WHICH TimeZone.  If you’re into working with data from different regions (I.e. this feature), I’d don’t think I’m venturing out on a limb to say that this is pretty important.

To further complicate things, that is the extent of the TimeZone functionality in the UI.  And that’s not going to help us.  So let’s knock off the “Changed Type” step and look at this another way.

Using M to Deal with Data From Different TimeZones

The secret to making this work is to take explicit control of the time zone settings using some Power Query M functions.  It’s not as hard as it sounds.  In fact, we’re only going to use two in this example:

  • DateTime.AddZone to add a time zone to a DateTime data type
  • DateTimeZone.SwitchZone to convert from one time zone to another

I discovered both of these functions by searching the Power Query formula categories article on Microsoft’s site.

Forcing a DateTime to a Specific Time Zone

So we’re currently looking at this data in Power Query:

image

Let’s create a new column to convert the OrderDate:

  • Add Column –> Add Custom Column
    • Name:  Order Date (UTC +0:00)
    • Formula:  =DateTime.AddZone([OrderDate],0)

The secret here is in the last parameter, as we get to specify the time zone.  Since we know these dates/times come out of our system in UTC +0:00, we’re good to not add anything to it.  The result is shown below:

image

Converting a DateTime to a Different Time Zone

Now, in order to be able to compare our DateTimes easily, we want them both to be based in our own time zone.  Since my business works in UTC –7:00, I really want my Order Date represented in that time zone as well.  So let’s convert it.

  • Add Column –> Add Custom Column
    • Name:  Order Date (UTC -7:00)
    • Formula:  =DateTimeZone.SwitchZone([#"OrderDate (UTC +0:00)"],-7)

SNAGHTML1f066319

Beautiful.

Just a note here… It may have been tempting to force this data to UTC –7:00 when we added the time zone above, but that would have assigned the date based in the wrong time zone.  I.e. our first record would have returned 7/4/1996 1:12:00 PM –07:00, which is not the same as what we ended up with.

Forcing another DateTime to a Different Time Zone

Now we need to deal with the ShippedDate column, forcing that to my local time.  I could just select the column and turn it into a Date/Time/Timezone data type, but I won’t.  Why?  What if I send this workbook to another user?  It will return THEIR time zone, not mine.  And that could be different.  Much better to explicitly set it.

  • Add Column –> Add Custom Column
    • Name:  ShippedDate (UTC –7:00)
    • Formula:  DateTime.AddZone([ShippedDate],-7)

Notice that this time we do force it to be in the –7 time zone, as these DateTimes originated from that time zone. The result:

SNAGHTML1f0e9e46

Fantastic.  We’ve added time zone data, without changing the original times.

Let’s just go do a little bit of cleanup now:

  • Select the OrderDate and ShippedDate columns
  • Transform –> Data Type –> Date/Time
  • Select OrderDate (UTC +0:00) through ShippedDate (UTC –7:00)
  • Transform –> Date Type –> Date/Time/Timezone

Excellent.  Now they should show up correctly when we load them to an Excel table instead of losing their formatting.

Making Comparisons

We’re at the final step now: Working out the time to ship.  This is relatively straight forward:

  • Add Column –> Add Custom Column
    • Name:  Days to Ship
    • Formula:  [#"ShippedDate (UTC -7:00)"]-[#"OrderDate (UTC -7:00)"]
  • Select the Days to Ship column
  • Transform –> Data Type –> Duration

Note:  You can just double click the column names in the formula wizard and it will put the # characters in there for you.

And the final look in Power Query:

SNAGHTML1f1602fa

With that all complete, the final step is to give the query a name (I chose ShippingTimes) and load it to a worksheet:

image

Final Thoughts

Personally, I like to take explicit control over my data types.  Call me a control freak if you like (I’ve been called much worse) but relying on implicit conversions that set to “local time” scare me a bit, particularly if I’m going to be sending my workbook off to someone who lives in a different zone than I do.  Once you know how to do this it’s not super difficult, and I now know EXACTLY how it will represent on their side.

I’ll admit also that I’m a bit disappointed in the UI for datetime conversions.  To me, anyone playing in this field needs very granular control over every column.  An extra step in the Transform to Date/Time/Timzone step would go a long way to solving this, as you’d be able to skip writing custom formulas.  Hopefully that’s on the Power Query team’s radar for the future, as well as a full datetime menu that would allow us to easily choose from/add/convert to the majority of the formulas found in the article referenced above.

Power Query Training

Also don't forget.  If you love Power Query or are intrigued by the things you can do with it, we have an online training course coming up soon.  Check it out and register at www.powerquery.training/course

What Power Query Functions Exist?

I know that this topic has been covered before by others, but I think it’s still pretty valuable for a user to be able to figure out what Power Query functions exist, especially since they are often different than what we’re used to in Excel.

NOTE:  This article was updated 2015-05-20 at the request of a reader to include more coverage on implementing the discovered function into the solution.

 

Power Query Functions Documentation on the web

There’s a pretty good resource site available on the Microsoft Support site.  Personally I have that one bookmarked and head over there often when I’m looking for a new function to do something.  I find that with a quick CTRL + F on the page, I can quickly search and narrow in on the function I think I need in order to learn it’s syntax.

To be fair, I’m not always in love with the actual examples (many lack a power query UI view), but overall the site is fairly useful.

Power Query Functions Documentation in the client

Now that’s all good, but what if you’re working on a plane with no WIFI, and you need to figure out the syntax for a new function?

As luck has it, there is a way to pull up the list for most functions right in the client.  To do this, I:

  • Clicked Add Column –> Add Custom Column
  • Typed a 1 and clicked OK
  • Went to the Power Query formula bar and typed the formula below.  (Notice that this is case sensitive)

 =#shared

(Why the custom column? Because typing in the formula bar replaces the previous step, and I want to be able to revert to that since it’s part of my logic:

image

Now, you’ll see you get a list of (almost) all the functions that you can access:

SNAGHTML894ffc53

Now, let’s assume I’m trying to find a formula to remove certain characters from a text string.  I really need to search for “Text.”, but there isn’t a search option.  No big deal, let’s convert this list into a table:

SNAGHTML89522a10

Once we’ve done that, we get a nice table of all of the functions, and we can filter them to our heart’s content.  Here’s my table filtered down to just rows that begin with “Text”:

SNAGHTML897b95c3

And a page or so down, I found something that looks like it might work:  Text.Remove.

Investigating the Function Syntax

I clicked on the green Function beside the Text.Remove entry.  It pops up an Invoke Function box, and behind that is the syntax for how it’s supposed to work.  So that’s pretty cool.  I tried it out with some text, as shown below:

txtremove1

Clicking OK returned the following:

txtremove2

Now this is a bit… weird… and frustrating. Value? Why Value? (I actually don’t know why, you’d think it would have been the function name, wouldn’t you?)

I stepped back to the Value step of the query, as I wanted to look at the syntax page that popped up behind the Invoke Function dialog:

image_thumb.png

 

My only complaint here is that once you land in this window, the only indicator of the actual function name is in the smallest font on the page, buried in the middle. You’d think that the name would should up a little more prominently. Regardless, I copied the name of the function, then stepped back to the Invoked FunctionValue step and replace Value in the formula bar with the function name:

txtremove3

Perfect, it works.

Implementing the Function in the Solution

Now let’s see if I can get it into my original query. To do that I:

  • Copied that entire line of M from the formula bar,
  • Selected the Source step (I wouldn’t be able to do this if I had typed #shared while I had the Source step selected originally),
  • Choose to Add New Column –> Add Custom Column –> Accept the inserted step,
  • Pasted the copied M into the formula area, and
  • Replaced the original text (“My –Dog –Has –Fleas”) with the name of the appropriate column from my data set.

Visually, it looks like this:

txtremove4

 

And then I checked the query to see that it worked:

txtremove5

 

Cleanup

Now that I’ve been able to explore the functions and found and implemented the one I’d like to use, I can just knock off the extra steps shown below in yellow, returning me back to my next step:

txtremove6

 

 

 

Learning more about Power Query functions

For reference, this is one of the many things that Miguel and I will be covering in our upcoming Power Query training workshops.  Learn more about the workshop and register here:  http://powerquery.training/course/

Announcing Online Power Query Training!

I’m really pleased to announce that a new project I’ve been working on is live: powerquery.training, a website that offers online Power Query training!

online Power Query training

PowerQuery.Training

About PowerQuery.Training

PowerQuery.Training is a joint effort between Miguel Escobar (of PoweredSolutions.co) and myself: the two guys who are writing that Power Query book – M is for Data Monkey book.

M is for Data Monkey

M is for Data Monkey

What you can find at PowerQuery.Training

We believe that Power Query is a super important tool in the toolbox of every Excel user out there.  It's so easy to use, and so powerful, that we think everyone should know about it.  To that end, we've decided to include the following two areas on the site:

Power Query patterns

One of the areas of focus is to showcase Power Query patterns (techniques).  These patterns are well illustrated articles with supporting workbooks, which will help you build solutions for common scenarios.  There will be more coming as we have time to add them, but today you'll already find:

Online Power Query Training

We know that not everyone wants to read pages of documentation, trying to figure out how to use a new technology.  Some people want to carve out a block of time, link up with an instructor, and be taught the basics, and how to avoid the inevitable pitfalls.  That's our aim with our online Power Query training workshops.

That’s right, there will be live online offerings in order to help get you skilled up on Power Query without ever leaving the comfort of your own office!  All you need to do is dedicate 3 days of your time – well okay… and some space on your credit card – and you’ll be off to the races with this software, taming and automating the cleanup and refresh of your data.

Interested in seeing what the course covers?  Download the Course Agenda here.

Load Power Query directly to Power Pivot in Excel 2010

One of the cool features in Excel 2013’s Power Query is being able to load to the Data Model (PowerPivot) directly.  But Excel 2010 doesn’t appear to have this feature.  Interestingly, you can still load Power Query directly to Power Pivot in Excel 2010, it just takes a bit of a careful workaround.

Let’s look at the required steps

Step 1: Create Your Connection

First, I’m going to load in the content of a text file.  So I:

  • Go to Power Query –> From File –> From Text
  • I browsed to the file I needed, and imported it into Power Query
  • I do whatever cleanup is needed and name the query Sales
  • Next, we go to the Home tab –> Close and Load –> Close and Load To…

And here’s the important part:

  • Choose “Only Create Connection” –> Load

And I’ve now got a basic connection to my sales table without landing it in a worksheet:

image

Step 2: Grab the Connection String

Now, here comes the secret.  We need to get the connection string that Excel uses to connect to the Power Query.  Here’s how:

  • Go to the Data tab –> Connections

In there, you’ll see the name of your new connection:

image

  • Select your Query and click Properties
  • Click the Definition tab

Now you’ll be looking at something like this:

image

Notice that this query is actually an OLE DB Query that is simply “SELECT * FROM [Sales]”  That seems easy to work with.  But the key for us is the connection string shown (#2 in the image above).

  • Select the ENTIRE connection string
  • Press CTRL + C to copy it
  • Click Cancel

Note:  Make sure you start at “Provider=” and highlight all the way to the end.  (It’s much longer than what you see in that little box.)

Load Power Query directly to Power Pivot

Finally, we’re going to pull this into Power Pivot.  To do this:

  • Go to the Power Pivot tab –> PowerPivot Window
  • From Other Sources –> Others (OLEDB/ODBC) –> Next

image

  • Name your table
  • Paste your Connection String in the box

image

  • Click Next –> Next –> Finish –> Close

And voila!  We have our Power Query linked directly into Power Pivot in Excel 2010!

image

Just remember… if you do this, NEVER modify this table in Power Pivot.  Always go back to modify the table in the Power Query stage.  Failure to do so could set the table into a non-refreshable state.

Native Database Query Security Prompts

In a comment on my Parameter Tables post, Talha asks why we get prompted with a Native Database Query message each time the value of the parameter used in a SQL query changes:

I had a question. The value I grab from my parameter table, I store it in a variable and use it in my SQL query. However, every time I change the parameter, I get the following prompt:

------------------------------------------------

NATIVE DATABASE QUERY

You are about to run a native database query. Native queries can make changes to the database. Would you like to run the following query?

------------------------------------------------

Is there any way I can make this go away?

To answer this, it helps to understand why this message comes up.

Replicating The Issue

When you connect to a SQL server database in Power Query, there are three parts to the initial dialog:

image

Server and Database are pretty straight forward, but the SQL Statement is marked as optional, and is actually hidden under the arrow at the left.  Now, for anyone who has written SQL in the past, you open up that little window, pop in your SQL and click OK.  And you’ll get prompted with a message about trusting a Native Database Query.

On the other hand… if you DON’T fill in a SQL statement, you’ll be taken to a list of all tables and views in the database that you have access to, and will be able to bring those in to start working with them.  Do that, and you’ll NEVER SEE THIS PROMPT.

Why Prompt About a Native Database Query?

A native database query is basically SQL code that you provide to pass to the database.  Power Query treats this as potentially dangerous, so gives you a warning when you do this.  And every time you change the underlying SQL query (pass one parameter to it to change it in some way), the Query is seen as “new” and again, potentially dangerous.

I tend to agree that this is a bit overkill.  I’m not a SQL expert, by any means, but if the query is just a SELECT query, then I’m not sure why we need to be warned about it.  You’d think that would be easy enough for the PQ team to parse out, looking for keywords like ALTER, DROP, UPDATE, DELETE and such, but regardless, that’s the way it is.

Avoiding the Error

Now that we know what is doing it, how do we avoid it?  The answer is actually more simple, but harder to swallow than you might like. Don’t use the Native Database Query functionality!

I really struggled with this at first.  I saw Power Query as a “SQL Sever Management Studio Light” for Excel pros.  It’s not.  It’s totally different. What the PQ team has built is a way for non-power users to be able to query, clean, restrict and reshape their data.  And if you do it their way, then you won’t get this irritating prompt.

In the case of grabbing data from SQL Server, you’d simply connect to the server, grab the table/view you want, then start setting your filters in Power Query to restrict down the data you want.  The UI makes this easy, and it’s a lot easier to tweak the data to get it right than forcing an Excel person to go back and tweak their SQL.

And here’s the cool part… once you’ve done this, Power Query takes advantage of a technology called “Query Folding”, which folds up all the reshaping commands you’ve issued.  It passes those to the SQL server as a single command to bring back your data.  In layman’s terms (although not perfectly technically correct), you can look at this as if Power Query is allowing you a graphical way to write your SQL for you behind the scenes, then pass it to the server.  And because Power Query knows it won’t do anything dangerous, it doesn’t prompt you about potential problems.

What About Performance?

A few months ago I asked someone at Microsoft this question… the thrust of my question was if Power Query would be more or less efficient than passing an optimal SQL query.  His reply was that no, if someone was a SQL Ninja, they could certainly come up with something that performed better in some situations.  Having said that, for the majority of Excel pros, Power Query will do a better job, as we don’t tend to be SQL optimization experts.

Further to this, the very worst thing you can do for performance is provide a base query then try to do further filtering.  Look at these three cases:

Case 1 Use Power Query to:
Connect to table Transactions
Filter to Dept <> 150
Case 2 Native Database Query:
Select * From Transactions Where Dept <> 150
Case 3 Native Database Query:
Select * From Transactions
Use Power Query to filter:
Dept <> 150

Case 1 is the preferred method when dealing with Power Query.  It will roll both steps up into a single statement, and retrieve the requested set from the server.

Case 2 returns the same data as Case 1, but prompts you with a Native Database Query on each computer you run the solution on.  (You can’t even save the connection with the workbook, it’s a user specific trust flag.)

Case 3 is the worst performing.  The reason is that the initial table has to be brought to Power Query in it’s entirety, and THEN you can start filtering.  Why?  Because Power Query cannot query fold it’s commands into the SQL you provided.

Recommendation

Even though connecting to a table, then re-creating all the filtering steps seems less graceful at first, this is the method I’d recommend you take.  In fact, I’ve actually made it a habit to remove any Native Database Queries from project I wrote in the past, replacing them with the recommended method.

To me this offers two benefits:

  1. No security prompting.  May seem minor, but when you put a Power Query error in front of non-tech users, they tend to freak out.  Why put them through that?
  2. It makes the query easier to follow/tweak when you need to debug it later.  And let’s face it, sooner or later you’ll need to do this.

Looking for Power Query samples

How would you like to make your data famous?  Or at least – how would you like a solution built for your own data using Power Query… for free?

I have a couple of Power Query related things I’m working on right now, like these:

Now, if you’ve ever been to one of my courses, you’ll know that one of the most important things to me is to use data that is modelled on real world examples.  That’s right, no canned AdventureWorks samples.  I like data that real users face in the real world.

To that end, I’m looking for Power Query samples.  Specifically, for sample files of data to use as examples for my Power Query blog posts, courses and other educational products.  And while I’ve got lots of great data sets of my own, I’m particularly interested in stuff that I haven’t seen or demonstrated on the blog before.

So if you’ve got a nasty file that’s painful to import, or you have a few files that you’re having trouble combining, or you just can’t figure out a starting point with Power Query, I’d like to have a crack at your scenario.  It’s super easy… you email me the file, I’ll take a look at it, and cook up a solution.  And any new or interesting techniques that are needed become the subject of a future blog post, article, or even sample in the courses I teach.

I hope it goes without saying, but for legal reasons I’m going to say it anyway… any files you send me can and will be used in public.  In order to use them in my materials, by sending them to my you are, by default, assigning me copyright to those specific source files.  Because of that, it is VERY important that the data you include is not sensitive.  To be fair, I’m not interested in your data specifics, as much as the form and shape of it, how it starts out, and what the end product should become.

“Okay, that’s great for you, Ken, but what do we get out of sending you our files?”

How about a working solution?  Does that sound reasonable?  You send me your data, and I send you back something that uses Power Query to complete the goal?

If you’re interested (and who wouldn’t be), here’s what you need to include:

  • A sanitized copy of your source data.  (i.e. Do a mass find/replace on employee names/numbers, phone numbers, addresses and such.  If you have particularly troubling data that is interesting, I may be willing to even help here.)
  • A copy of what you want the output to look like.  (I’ll build to that end goal.)
  • A summary of any “rules” you use along the way to determine if something needs to change one way or another.  (I’d rather not spend my time guessing about your end goals.)
  • Your direction as to whether you’d like your/your company name included or withheld should I decide to publish a post using your data.

In return for that, I will send you a copy of the solution once I’ve crafted it.  This may happen days/weeks before any solution is posted publicly, or it may come in the form of an email to let you know that the post is live, at my discretion.

Please be aware that I also reserve the right to NOT use your submission at my discretion.  This could happen if my inbox gets overloaded with requests, if I’ve already dealt with your scenario on the blog, if your data would be better handled in another fashion, if I get too busy with paying projects, if the data is just too difficult to work with, or any other reason that I haven’t listed. (I’ll do my best, but this is free work, so it’s totally at my schedule and discretion.)

If that sounds interesting to you, my email is ken at excelguru dot ca.  I’m going to limit this to the first 10 data sets that land in my inbox, so if you want to try and take advantage of it, act quick!

Prompt for a Folder at Refresh

I got a comment on my blog the other day from David, asking if we could make Power Query prompt for a folder at refresh, allowing us to choose which folder to should be used to consolidate files.  That sounds like a pretty cool idea, and my first thought was “I think we can totally do that!”

I headed over to Chris Webb’s blog, as he’s done some cool things with Power Query prompts. Ultimately though, I got stuck on one thing… we have to go into the Query to invoke it.  My (albeit limited) tests left me unable to have that prompt when we refresh the query.

Not to be outdone by a lacking feature though, I cooked up a solution using some Power Query and VBA techniques that I’ve shared before.  This post wraps them up into a neat little package.

Background

The idea I’m going for here is to be able to click a button which:

  • Prompts the user to select a folder,
  • Feeds that folder into Power Query, and
  • Refreshes the Power Query output table.

To set it up, I used 3 techniques that I’ve shared before:

Initial Setup

To begin with, I created a blank workbook, and immediately added a Power Query parameter table as outline in Building a Parameter Table for Power Query.  I followed the steps there exactly, so I’m not going to detail that here.  The only things of note were:

  • I named the worksheet “Parameters”, and
  • I used the following for my “File Path” value:  D:\Consolidate\Begin

image

Next, I created a query to consolidate all the files in a folder.  Basically I ran through the following steps:

  • Power Query –> From File –> From Folder
  • Chose my folder:  D:\Consolidate\Begin
  • Did a bit of cleanup on the file
  • Implemented the “fnGetParameter” function call in place of the folder (as described in the aforementioned blog post)

The end result was some M code that looked like this:

let
Source = Folder.Files(fnGetParameter("File Path")),
#"Combined Binaries" = Binary.Combine(Source[Content]),
#"Imported CSV" = Csv.Document(#"Combined Binaries",null,",",null,1252),
#"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
#"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"TranDate", type date}, {"Account", Int64.Type}, {"Dept", Int64.Type}, {"Sum of Amount", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Sum of Amount", "Amount"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"Amount", type number}}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type1", {"TranDate"})
in
#"Removed Errors"

And returned results like this:

SNAGHTML250112a0

I then loaded the Power Query to the worksheet, and called the worksheet “Data”.

So far so good?  Nothing really unusual yet.

Laying the Folder Change groundwork

With the basic functionality in place, I wanted to now give the user the ability to change the folder.

Rather than write the Browse For Folder routine from scratch, I headed over the VBAExpress.com and grabbed the one I submitted to the KB… um… a long time ago.  :)

Once I had that code copied I:

  • Opened the VBE
  • Browsed into my workbook
  • Right clicked the project and added a new Module
  • Pasted in all the code (shown below):

Function BrowseForFolder(Optional OpenAt As Variant) As Variant
'Function purpose:  To Browser for a user selected folder.
'If the "OpenAt" path is provided, open the browser at that directory
'NOTE:  If invalid, it will open at the Desktop level
Dim ShellApp As Object
'Create a file browser window at the default folder
Set ShellApp = CreateObject("Shell.Application"). _
BrowseForFolder(0, "Please choose a folder", 0, OpenAt)
'Set the folder to that selected.  (On error in case cancelled)
On Error Resume Next
BrowseForFolder = ShellApp.self.Path
On Error GoTo 0
'Destroy the Shell Application
Set ShellApp = Nothing
'Check for invalid or non-entries and send to the Invalid error
'handler if found
'Valid selections can begin L: (where L is a letter) or
'\\ (as in \\servername\sharename.  All others are invalid
Select Case Mid(BrowseForFolder, 2, 1)
Case Is = ":"
If Left(BrowseForFolder, 1) = ":" Then GoTo Invalid
Case Is = "\"
If Not Left(BrowseForFolder, 1) = "\" Then GoTo Invalid
Case Else
GoTo Invalid
End Select
Exit Function
Invalid:
'If it was determined that the selection was invalid, set to False
BrowseForFolder = False
End Function

With that in place, I just needed to link that function into a routine that can use it…

Rolling the Final Routine

I write a lot of VBA, so this was pretty quick to knock up for me.  The code itself looks like this:

Sub UpdatePQ()
Dim vFolder As Variant
Dim wsParameters As Worksheet
Dim wsData As Worksheet
'Set the worksheets
Set wsParameters = Worksheets("Parameters")
Set wsData = Worksheets("Data")
'Update the folder to import
vFolder = BrowseForFolder
If CStr(vFolder) = "False" Then
'No folder chosen, exit the routine
Exit Sub
End If
'Update parameter table with folder
wsParameters.Range("B2").Value = vFolder
wsData.Range("A1").ListObject.QueryTable.Refresh BackgroundQuery:=False
End Sub

In short, this routine basically:

  • Sets references to the two worksheets
  • Prompts the user for the folder.  The output here could be a folder path or “False” which means the user cancelled.  If it’s “False” we bail out
  • If the path was provided, it gets placed in B2 of the Parameter worksheet (which is the “Value” field for the “File Path” variable.  (So if you didn’t create your table starting in A1, then you’d need to update this.)
  • Finally, we refresh the Power Query output table which I landed in cell A1 of the Data worksheet.

Final Setup Step

The very last step is to link the UpdatePQ macro to a button.  This is fairly straight forward as well, and the steps are illustrated well in the Refresh Power Query with VBA post.

Make Power Query Prompt for a Folder at Refresh

And now, the only thing we need to do is click a button…

image

Choose a folder…

SNAGHTML2525f706

And the query will refresh the data in the landing page.

Try it out

You can download the completed workbook, as well as two folders with the source data, from my Skydrive. Click here to do so.

Caveats

I hope it goes without saying that the data structure of the files in the selected folders must be the same.  If they’re not, the query load will fail, of course.  But so long as you’re just trying to deal with different months/years, this should work nicely.

Consolidating Worksheet Print Areas With Power Query

I’ve posted on consolidating worksheets before, but in a task I got asked to do last week, I took a slightly different approach.  This approach allows consolidating worksheet print areas to grab the data that I need.

Background

In the previous post, I used some custom M code to reach outside the current workbook, and attach to the cached copy of the same workbook to read the data.  It works, but it’s a bit ridiculous that you need to.  One of the problems with that approach is that new sheets don’t show up in your solution until you save the workbook and refresh the query.  Not a big issue if you safeguard against it, but it’s a gotcha you want to be aware of.

In the situation I was working through, I wanted to consolidate some of the worksheets in the workbook, and also wanted to “future proof” it for new sheets that would be created later.  But I also wanted to make sure that all sheets would be pulled into the solution, even if they’d been newly created and the workbook hadn’t been saved.

So my data (which you can download here) looks like this:

image

Yes, that’s right.  It’s pivoted, even though it’s not in a Pivot Table.  And I’ve got one sheet per month.  Naturally we’d like to turn this into a proper data source.

My Approach

I decided to go seriously hardcore, and just went right into a blank query:

  • Power Query –> From Other Sources –> Blank Query

Not much to see yet (it is a blank query, after all) but I typed the following into the formula bar (which you can activate from the “View” tab if it’s not there.)

  • =Excel.CurrentWorkbook()
  • Enter

SNAGHTML1be24864

And check that out… I set some print areas!

What is in Excel.CurrentWorkbook()?

The Excel.CurrentWorkbook() command (and yes it is case sensitive) will give you a list of all tables, named ranges and power queries that are stored in your workbook.  And, as it happens, I can take advantage of this.  Because the print range is a named range, I can employ that to pick up my data.

Now, to be fair, I TOTALLY wish that Excel.CurrentWorkbook() also included all the raw worksheets.  That would be awesome.  You can get them by going to Excel.Workbook(File.Contents(full_file_path)) but that means it grabs them from the last saved version of the file in the workbook path.  Not ideal if your data is “live”.

At any rate, I’m good now.  So long as my users copy the previously set up sheet each month, and rename it to the current month, it will inherit the print area.  I now have something I can work with.

Consolidating Worksheet Print Areas

So let’s make use of this…

Something we need to recognize is that – as soon as we load this query to the worksheet – we are going to get a query and a table showing up in this list.  So let’s protect against that.  Click the Filter area on the “Name” column, and filter to records that contain “Print_Area”.

image

Now I don’t actually need the worksheet name for anything here, so I can

  • Select the Name column –> Right Click –> Remove
  • Click the Expand button (at the top of the the Content column)
  • Un-check the option to use original column name as prefix and click OK

And I’m left with this:

SNAGHTML1c49ca60

So now it’s just time for cleanup.  Let’s do that:

  • Go to Add Column –> Add Custom Column
  • Call the column Date and enter the following formula:
    • if Text.Start([Column1],5) = "Sales" then [Column3] else null
  • Select the Date column –> Transform –> Fill Down
  • Go to Home –> Remove Rows –> Remove Top Rows –> 2
  • Go to Transform –> Use First Row as Headers
  • Rename:
    • “Column1” to “Major Group”
    • “Column2” to “SubClass”
    • “Column7” to “Date”
  • Select the “Date” column –> Transform –> Data Type –> Date
  • Select “Major Group” column
    • Go to Transform –> Fill Down
    • Filter to only rows that “does not contain” “Sales”
  • Select “SubClass”
    • Filter to remove “null” records
  • Remove the “Total” column
  • Select the “Major Group”, “SubClass” and “Date” columns
  • Go to Transform –> Unpivot Columns –> Unpivot Other Columns
  • Rename
    • “Attribute” to “Category”
    • “Value” to “Units Sold”
  • Rename the query to “Sales” and load it to the worksheet

Create a Simple Pivot

Now that we have our data, we can create a simple Pivot Table from it:

image

Which is great, but what about next month?  Let’s see what happens…

  • Right click the “Feb Sales” worksheet –> Move or Copy
  • Check the “Create a copy” checkbox and click OK
  • Change C1 to 3/31/2015

Yuck… we’re simulating that we can’t trust our users to even rename the worksheet properly, but the do get the data right.  Now let’s see how our query reacts…

Go back to the Power Query worksheet, refresh the query, and the PivotTable if necessary…

image

Awesome!

What’s Safe?

This worked because we only have the print area defined for our input sheets.  If someone added a print area on other worksheets we’d have to add additional/adjust our logic to deal with that, as we’d get data in our query that possibly shouldn’t be there.

If someone adds something outside the print area, then you have an issue.  In those cases a table may be a better option to start with.  And naturally tables also show up in the Excel.CurrentWorkbook() call.  Having said that, it’s harder to control the default names there, so that could be an issue for you.

If someone modifies the shape of a range, that could also be a problem.  Say a new column gets added or something.

At the end of the day you need to think through an anticipate the scope of what your users are likely to do, and what shape the data will likely stay in (or not) so that you can plan accordingly.  In the scope of the solution that I built, I’m fairly certain it will work without a challenge.

Combine Multiple Excel Workbooks in Power Query

I got a comment on a previous post today, which made me realize I’d promised this but never posted it.  So let’s look at how to combine multiple workbooks together in Power Query, providing they have the same format.  Yes, it’s been covered before, (even linked to in the comments of the previous posts,) but I’m going to put my own flavour on it.  By the time we’re done, you’ll see how similar it is to working with non-Excel files.

Background

For our example we’re going to assume that we have four (or more) Excel files which you can download here.  I’ve stored in a folder called “Data”, which is a subfolder of the “Combine Workbooks” folder (more on why I store them in a subfolder a little later.)

Each file has a similar structure, which looks like this:

SNAGHTML1984743a

Notice that no one has bothered to set up a table style or anything, they are just raw worksheets of data.  Having said that, they are consistent in the fact that

  • The data starts in row 5
  • Each files is set up across same number of columns
  • The column headers and data types are consistent across files

(Just as a quick note, if they DID have tables set up, that would be okay too.  I’m just demoing that it isn’t necessary.)

The End Goal

The end goal we’re after is fairly common.  We basically want to grab the data from each file, strip out the first 4 rows, and append the tables together (without repeating header info.)  This will essentially create a nice data source that we can use in PivotTables, charts and other tools.

The Process

My preferred tool to combine multiple workbooks into one data source – where it used to be VBA – is most definitely Power Query today.  And here’s the steps we need to put together to make it work.

  1. Import a single workbook
  2. Convert it to a function
  3. Import all file contents (using our function)
  4. Combine all the data

Step 2 does involve a VERY minor manipulation of M code today, but as you’ll see it’s quite easy.

Let’s Combine Multiple Excel Workbooks

Step 1: Import a single workbook

To begin we’ll go to the Power Query menu and choose:

  • From File –> From Excel –> Sales-July2014.xlsx
  • Select Sheet1 and choose to Edit it

You’ll now see your query showing in the Query Editor:

SNAGHTML199ca279

We’ll need to do a bit of cleanup here to get the data just the way we need it:

  • Home –> Remove Rows –> Remove Top Rows –> 4 –> OK
  • Transform –> Use First Row As Headers
  • Select InventoryID and SalesPersonID –> Transform –> Data Type –> Whole Number
  • Select Cost, Price and Commission –> Transform –> Data Type –> Decimal Number
  • Select Date –> Transform –> Data Type –> Date
  • Select Date –> Home –> Remove Errors

That last one might be a bit odd, but I like to do that to my date columns.  The reason is that this protects me when I stack another table and it has headers.  I know that converting text to a date format will throw an error, so I’m guaranteed that any subsequent header rows will be nuked out.

At this point we’ve got a nice tidy import that would look pretty good if we decided to land it in a workbook.

Next up…

Step 2: Convert it to a function

Converting our nice query to a function is actually SUPER easy.  To begin, while still in the Power Query editor, we need to go to the View tab and click Advanced Editor.

When we do, we’ll see code similar to this, with the highlighted portion being the most important part:

SNAGHTML19b495d2

Okay, now follow carefully:  Right before the let statement at the very beginning, type:

(filepath)=>

The ()=> indicate to Power Query that this is a function, not a regular query.  And the “filepath” is the name of a parameter that we want to pass to the function.

The second part is that we want to replace the entire hard coded file path highlighted in yellow in the image above – including the quotes – with the name of our variable.  When we do, the lead three lines should look like this:

image

That’s all the code editing you need to do.  Let’s finalize this.  Click Done.  At which point you’ll see this:

image

No too inspiring or exciting really, but it IS what you want.  Final thing to do here is give the function a better name than Sheet1.  I’m going to use “fnGetContents”.  Choose your name, replace Sheet1 in the name box, then click File –> Close and Load.

If the query shows up in your Queries Pane saying “Load is Disabled”, don’t freak out.  That’s just a really poorly worded message that is indicating it is only a connection.  It will still work.  :)

You’ll also notice that, even though we imported this file, NOTHING has landed in any worksheet.  Huh?

Oddly enough, that’s as designed…

Step 3: Import all file contents (using our function)

All right, now it’s show time.  Let’s make some magic happen.  Let’s go get all of the files we need:

  • Go to the Power Query tab –> From File –> From Folder
  • Browse and select the folder that holds the data files
  • Click OK

You’ll now end up in the Query Editor and be staring at this:

SNAGHTML19d697a1

Now, normally we’d be tempted to click the double arrows beside the Content header to combine them all… but that doesn’t work here.  (It just returns the first file when we’re working with Excel files.)  So this is where we need to resort to our function.

  • Go to Add Column –> Add Custom Column
  • Enter the following:  fnGetContents([Folder Path]&[Name])
  • Click Enter
  • Right click the new “Custom” column –> Remove Other Columns

Almost done!

Step 4: Combine all the data

You should now be seeing:

image

Click the little double headed arrow, turn off the “Use original column name as prefix”, and click OK.

You should now have a nice table of data.  Give it a name, click Close and Load and you’re done!

Final Thoughts

Working with Excel files is pretty easy once you realize how easy creating that function really is.

One caveat though… If you store your “consolidating” workbook in the same folder as your data files, your fnGetContents() function will throw an error.  The reason for this is that your consolidation workbook will also get pulled in to the query, as well as the temp file version (prefixed with a ~).  You’ll need to filter out both files.

Slicers For Value Fields

Earlier this week I received an email asking for help with a Power Pivot model.  The issue was that the individual had built a model, and wanted to add slicers for value fields.  In other words, they’d built the DAX required to generate their output, and wanted to use those values in their slicers.  Which you can’t do.  Except maybe you can…  :)

My approach to solve this issue is to use Power Query to load my tables.  This gives me the ability to re-shape my data and load it into the data model the way I need it.  I’m not saying this is the only way, by any means, but it’s an approach that I find works for me.  Here’s how I looked at it in Excel 2013.  (For Excel 2010 users, you have to run your queries through the worksheet and into Power Pivot as a linked table.)

Background

The scenario we’re looking at is a door manufacturer.  They have a few different models of doors, each of which uses different materials in their production.  The question that we want to solve is “how many unique materials are used in the construction of each door?”  And secondarily, we then want to be able to filter the list by the number of materials used.

The first question is a classic Power Pivot question.  And the setup is basically as follows:

image

  • Create a PivotTable with models on rows and material on columns
  • Create a DAX measure to return the distinct count of materials:
    • DistinctMaterials:=  DISTINCTCOUNT(MaterialsList[material])
  • Add a little conditional formatting to the PivotTable if you want it to look like this:

image

The secret to the formatting is to select the values and set up an icon set.  Modify it to ensure that it is set up as follows:

image

Great stuff, we’ve got a nice looking Pivot, and you can see that our grand total on the right side is showing the correct count of materials used in fabricating each door.

Creating Slicers For Value Fields

Now, click in the middle of your Pivot, and choose to insert a slicer.  We want to slice by the DistinctMaterials measure that we created… except.. it's not available.  Grr…

image

Okay, it’s not surprising, but it is frustrating.  I’ve wanted this ability a lot, but it’s just not there.  Let’s see if we can use Power Query to help us with this issue.

Creating Queries via the Editor

We already have a great query that has all of our data, so it would be great if we could just build a query off of that.  We obviously need the original still, as the model needs that data to feed our pivot, but can we base a query off a query?  Sure we can!

  • In the Workbook Queries pane, right click the existing “MaterialsList” query and choose Edit.
  • You’ll be taken into the Power Query editor, and on the right side you’ll see this little collapsed “Queries” window trying to hide from you:

image

  • When you expand that arrow, you’ll see your existing query there!
  • Right click your MaterialsList query and choose “Reference”.

You’ve now got a new query that is referring to your original.  Awesome.  This will let us preserve our existing table in the Power Pivot data model, but reshape this table into the format that we need.

Building the Query we need

Let’s modify this puppy and get it into the format that will serve us.  First thing, we need to make sure it’s got a decent name…

  • On the right side, rename it to MaterialsCount

Now we need to narrow this down to a list of unique material/model combinations, then count them:

  • Go to Add Column –> Add Custom Column
  • Leave the default name of “Custom” and use the following formula:  [model]&[material]
  • Sort the model column in ascending order
  • Sort the material column in ascending oder

We’ve not got a nicely ordered list, but there’s a few duplicates in it.

SNAGHTML4800f6be[4]

Those won’t help, so let’s get rid of them:

  • Select the “Custom” column
  • Go to Home –> Remove Duplicates

Now, let’s get that Distinct Count we’re looking for:

  • Select the “model” column
  • Go to Transform –> Group By
  • Set up the Group By window to count distinct rows as follows:

image

Very cool!  We’ve now got a nice count of the number of distinct materials that are used in the production of each door.

The final step we need to do in Power Query is load this to the data model, so let’s do that now:

  • File –> Close & Load To…
  • Only create the connection and load it to the Data Model

Linking Things in Power Pivot

We now need to head into Power Pivot to link this new table into the Data Model structure.  Jump into the Manage window, and set up the relationships between the model fields of both tables:

image

And that’s really all we need to do here.  Let’s jump back out of Power Pivot.

Add Slicers for Value Fields

Let’s try this again now. Click in the middle of your Pivot and choose to insert a slicer.  We’ve certainly got more options than last time!  Choose both fields from the “MaterialsCount” table:

image

And look at that… we can now slice by the total number materials in each product!

image