Data From Different TimeZones

A friend of mine emailed yesterday asking how to compare data from different timezones.  With how good the UI is in Power Query, you’d think this would be easy.  Unfortunately it’s a bit less than that, so I thought it would make a good example for today’s post.

Background

Let’s assume that we’ve got two columns of data; an Order Date and a Shipping Date.  We’d like to work out the number of days it took to ship our order.  Easy enough, we just need to subtract one from the other… except… the system that holds the Order Date reports it in UTC +0:00, and the shipping date is done from my home time zone (UTC –7:00).

The data table we’re starting with looks like this:

image

And you can download a copy of the workbook from my OneDrive here if you’d like to follow along.

Avoiding Temptation

So the first thing to do is pull the data in to Power Query.  So I clicked in the table, went to the Power Query tab, and chose From Table.  At this point we’re greeted with a nice table, and our first temptation is to go directly to the Transform tab and set the Data Type to Date/Time/Timezone:

image

And herein lies a problem.  The system has forced my local TimeZone on the data.  As specified in the initial problem, I need this to carry a UTC +0:00 distinction.

It’s a shame that there is no intermediate step here (how often do I ask for MORE clicks?) which allowed you to specify WHICH TimeZone.  If you’re into working with data from different regions (I.e. this feature), I’d don’t think I’m venturing out on a limb to say that this is pretty important.

To further complicate things, that is the extent of the TimeZone functionality in the UI.  And that’s not going to help us.  So let’s knock off the “Changed Type” step and look at this another way.

Using M to Deal with Data From Different TimeZones

The secret to making this work is to take explicit control of the time zone settings using some Power Query M functions.  It’s not as hard as it sounds.  In fact, we’re only going to use two in this example:

  • DateTime.AddZone to add a time zone to a DateTime data type
  • DateTimeZone.SwitchZone to convert from one time zone to another

I discovered both of these functions by searching the Power Query formula categories article on Microsoft’s site.

Forcing a DateTime to a Specific Time Zone

So we’re currently looking at this data in Power Query:

image

Let’s create a new column to convert the OrderDate:

  • Add Column –> Add Custom Column
    • Name:  Order Date (UTC +0:00)
    • Formula:  =DateTime.AddZone([OrderDate],0)

The secret here is in the last parameter, as we get to specify the time zone.  Since we know these dates/times come out of our system in UTC +0:00, we’re good to not add anything to it.  The result is shown below:

image

Converting a DateTime to a Different Time Zone

Now, in order to be able to compare our DateTimes easily, we want them both to be based in our own time zone.  Since my business works in UTC –7:00, I really want my Order Date represented in that time zone as well.  So let’s convert it.

  • Add Column –> Add Custom Column
    • Name:  Order Date (UTC -7:00)
    • Formula:  =DateTimeZone.SwitchZone([#"OrderDate (UTC +0:00)"],-7)

SNAGHTML1f066319

Beautiful.

Just a note here… It may have been tempting to force this data to UTC –7:00 when we added the time zone above, but that would have assigned the date based in the wrong time zone.  I.e. our first record would have returned 7/4/1996 1:12:00 PM –07:00, which is not the same as what we ended up with.

Forcing another DateTime to a Different Time Zone

Now we need to deal with the ShippedDate column, forcing that to my local time.  I could just select the column and turn it into a Date/Time/Timezone data type, but I won’t.  Why?  What if I send this workbook to another user?  It will return THEIR time zone, not mine.  And that could be different.  Much better to explicitly set it.

  • Add Column –> Add Custom Column
    • Name:  ShippedDate (UTC –7:00)
    • Formula:  DateTime.AddZone([ShippedDate],-7)

Notice that this time we do force it to be in the –7 time zone, as these DateTimes originated from that time zone. The result:

SNAGHTML1f0e9e46

Fantastic.  We’ve added time zone data, without changing the original times.

Let’s just go do a little bit of cleanup now:

  • Select the OrderDate and ShippedDate columns
  • Transform –> Data Type –> Date/Time
  • Select OrderDate (UTC +0:00) through ShippedDate (UTC –7:00)
  • Transform –> Date Type –> Date/Time/Timezone

Excellent.  Now they should show up correctly when we load them to an Excel table instead of losing their formatting.

Making Comparisons

We’re at the final step now: Working out the time to ship.  This is relatively straight forward:

  • Add Column –> Add Custom Column
    • Name:  Days to Ship
    • Formula:  [#"ShippedDate (UTC -7:00)"]-[#"OrderDate (UTC -7:00)"]
  • Select the Days to Ship column
  • Transform –> Data Type –> Duration

Note:  You can just double click the column names in the formula wizard and it will put the # characters in there for you.

And the final look in Power Query:

SNAGHTML1f1602fa

With that all complete, the final step is to give the query a name (I chose ShippingTimes) and load it to a worksheet:

image

Final Thoughts

Personally, I like to take explicit control over my data types.  Call me a control freak if you like (I’ve been called much worse) but relying on implicit conversions that set to “local time” scare me a bit, particularly if I’m going to be sending my workbook off to someone who lives in a different zone than I do.  Once you know how to do this it’s not super difficult, and I now know EXACTLY how it will represent on their side.

I’ll admit also that I’m a bit disappointed in the UI for datetime conversions.  To me, anyone playing in this field needs very granular control over every column.  An extra step in the Transform to Date/Time/Timzone step would go a long way to solving this, as you’d be able to skip writing custom formulas.  Hopefully that’s on the Power Query team’s radar for the future, as well as a full datetime menu that would allow us to easily choose from/add/convert to the majority of the formulas found in the article referenced above.

Power Query Training

Also don't forget.  If you love Power Query or are intrigued by the things you can do with it, we have an online training course coming up soon.  Check it out and register at www.powerquery.training/course

What Power Query Functions Exist?

I know that this topic has been covered before by others, but I think it’s still pretty valuable for a user to be able to figure out what Power Query functions exist, especially since they are often different than what we’re used to in Excel.

NOTE:  This article was updated 2015-05-20 at the request of a reader to include more coverage on implementing the discovered function into the solution.

 

Power Query Functions Documentation on the web

There’s a pretty good resource site available on the Microsoft Support site.  Personally I have that one bookmarked and head over there often when I’m looking for a new function to do something.  I find that with a quick CTRL + F on the page, I can quickly search and narrow in on the function I think I need in order to learn it’s syntax.

To be fair, I’m not always in love with the actual examples (many lack a power query UI view), but overall the site is fairly useful.

Power Query Functions Documentation in the client

Now that’s all good, but what if you’re working on a plane with no WIFI, and you need to figure out the syntax for a new function?

As luck has it, there is a way to pull up the list for most functions right in the client.  To do this, I:

  • Clicked Add Column –> Add Custom Column
  • Typed a 1 and clicked OK
  • Went to the Power Query formula bar and typed the formula below.  (Notice that this is case sensitive)

 =#shared

(Why the custom column? Because typing in the formula bar replaces the previous step, and I want to be able to revert to that since it’s part of my logic:

image

Now, you’ll see you get a list of (almost) all the functions that you can access:

SNAGHTML894ffc53

Now, let’s assume I’m trying to find a formula to remove certain characters from a text string.  I really need to search for “Text.”, but there isn’t a search option.  No big deal, let’s convert this list into a table:

SNAGHTML89522a10

Once we’ve done that, we get a nice table of all of the functions, and we can filter them to our heart’s content.  Here’s my table filtered down to just rows that begin with “Text”:

SNAGHTML897b95c3

And a page or so down, I found something that looks like it might work:  Text.Remove.

Investigating the Function Syntax

I clicked on the green Function beside the Text.Remove entry.  It pops up an Invoke Function box, and behind that is the syntax for how it’s supposed to work.  So that’s pretty cool.  I tried it out with some text, as shown below:

txtremove1

Clicking OK returned the following:

txtremove2

Now this is a bit… weird… and frustrating. Value? Why Value? (I actually don’t know why, you’d think it would have been the function name, wouldn’t you?)

I stepped back to the Value step of the query, as I wanted to look at the syntax page that popped up behind the Invoke Function dialog:

image_thumb.png

 

My only complaint here is that once you land in this window, the only indicator of the actual function name is in the smallest font on the page, buried in the middle. You’d think that the name would should up a little more prominently. Regardless, I copied the name of the function, then stepped back to the Invoked FunctionValue step and replace Value in the formula bar with the function name:

txtremove3

Perfect, it works.

Implementing the Function in the Solution

Now let’s see if I can get it into my original query. To do that I:

  • Copied that entire line of M from the formula bar,
  • Selected the Source step (I wouldn’t be able to do this if I had typed #shared while I had the Source step selected originally),
  • Choose to Add New Column –> Add Custom Column –> Accept the inserted step,
  • Pasted the copied M into the formula area, and
  • Replaced the original text (“My –Dog –Has –Fleas”) with the name of the appropriate column from my data set.

Visually, it looks like this:

txtremove4

 

And then I checked the query to see that it worked:

txtremove5

 

Cleanup

Now that I’ve been able to explore the functions and found and implemented the one I’d like to use, I can just knock off the extra steps shown below in yellow, returning me back to my next step:

txtremove6

 

 

 

Learning more about Power Query functions

For reference, this is one of the many things that Miguel and I will be covering in our upcoming Power Query training workshops.  Learn more about the workshop and register here:  http://powerquery.training/course/

Announcing Online Power Query Training!

I’m really pleased to announce that a new project I’ve been working on is live: powerquery.training, a website that offers online Power Query training!

online Power Query training

PowerQuery.Training

About PowerQuery.Training

PowerQuery.Training is a joint effort between Miguel Escobar (of PoweredSolutions.co) and myself: the two guys who are writing that Power Query book – M is for Data Monkey book.

M is for Data Monkey

M is for Data Monkey

What you can find at PowerQuery.Training

We believe that Power Query is a super important tool in the toolbox of every Excel user out there.  It's so easy to use, and so powerful, that we think everyone should know about it.  To that end, we've decided to include the following two areas on the site:

Power Query patterns

One of the areas of focus is to showcase Power Query patterns (techniques).  These patterns are well illustrated articles with supporting workbooks, which will help you build solutions for common scenarios.  There will be more coming as we have time to add them, but today you'll already find:

Online Power Query Training

We know that not everyone wants to read pages of documentation, trying to figure out how to use a new technology.  Some people want to carve out a block of time, link up with an instructor, and be taught the basics, and how to avoid the inevitable pitfalls.  That's our aim with our online Power Query training workshops.

That’s right, there will be live online offerings in order to help get you skilled up on Power Query without ever leaving the comfort of your own office!  All you need to do is dedicate 3 days of your time – well okay… and some space on your credit card – and you’ll be off to the races with this software, taming and automating the cleanup and refresh of your data.

Interested in seeing what the course covers?  Download the Course Agenda here.

Load Power Query directly to Power Pivot in Excel 2010

One of the cool features in Excel 2013’s Power Query is being able to load to the Data Model (PowerPivot) directly.  But Excel 2010 doesn’t appear to have this feature.  Interestingly, you can still load Power Query directly to Power Pivot in Excel 2010, it just takes a bit of a careful workaround.

Let’s look at the required steps

Step 1: Create Your Connection

First, I’m going to load in the content of a text file.  So I:

  • Go to Power Query –> From File –> From Text
  • I browsed to the file I needed, and imported it into Power Query
  • I do whatever cleanup is needed and name the query Sales
  • Next, we go to the Home tab –> Close and Load –> Close and Load To…

And here’s the important part:

  • Choose “Only Create Connection” –> Load

And I’ve now got a basic connection to my sales table without landing it in a worksheet:

image

Step 2: Grab the Connection String

Now, here comes the secret.  We need to get the connection string that Excel uses to connect to the Power Query.  Here’s how:

  • Go to the Data tab –> Connections

In there, you’ll see the name of your new connection:

image

  • Select your Query and click Properties
  • Click the Definition tab

Now you’ll be looking at something like this:

image

Notice that this query is actually an OLE DB Query that is simply “SELECT * FROM [Sales]”  That seems easy to work with.  But the key for us is the connection string shown (#2 in the image above).

  • Select the ENTIRE connection string
  • Press CTRL + C to copy it
  • Click Cancel

Note:  Make sure you start at “Provider=” and highlight all the way to the end.  (It’s much longer than what you see in that little box.)

Load Power Query directly to Power Pivot

Finally, we’re going to pull this into Power Pivot.  To do this:

  • Go to the Power Pivot tab –> PowerPivot Window
  • From Other Sources –> Others (OLEDB/ODBC) –> Next

image

  • Name your table
  • Paste your Connection String in the box

image

  • Click Next –> Next –> Finish –> Close

And voila!  We have our Power Query linked directly into Power Pivot in Excel 2010!

image

Just remember… if you do this, NEVER modify this table in Power Pivot.  Always go back to modify the table in the Power Query stage.  Failure to do so could set the table into a non-refreshable state.

Native Database Query Security Prompts

In a comment on my Parameter Tables post, Talha asks why we get prompted with a Native Database Query message each time the value of the parameter used in a SQL query changes:

I had a question. The value I grab from my parameter table, I store it in a variable and use it in my SQL query. However, every time I change the parameter, I get the following prompt:

------------------------------------------------

NATIVE DATABASE QUERY

You are about to run a native database query. Native queries can make changes to the database. Would you like to run the following query?

------------------------------------------------

Is there any way I can make this go away?

To answer this, it helps to understand why this message comes up.

Replicating The Issue

When you connect to a SQL server database in Power Query, there are three parts to the initial dialog:

image

Server and Database are pretty straight forward, but the SQL Statement is marked as optional, and is actually hidden under the arrow at the left.  Now, for anyone who has written SQL in the past, you open up that little window, pop in your SQL and click OK.  And you’ll get prompted with a message about trusting a Native Database Query.

On the other hand… if you DON’T fill in a SQL statement, you’ll be taken to a list of all tables and views in the database that you have access to, and will be able to bring those in to start working with them.  Do that, and you’ll NEVER SEE THIS PROMPT.

Why Prompt About a Native Database Query?

A native database query is basically SQL code that you provide to pass to the database.  Power Query treats this as potentially dangerous, so gives you a warning when you do this.  And every time you change the underlying SQL query (pass one parameter to it to change it in some way), the Query is seen as “new” and again, potentially dangerous.

I tend to agree that this is a bit overkill.  I’m not a SQL expert, by any means, but if the query is just a SELECT query, then I’m not sure why we need to be warned about it.  You’d think that would be easy enough for the PQ team to parse out, looking for keywords like ALTER, DROP, UPDATE, DELETE and such, but regardless, that’s the way it is.

Avoiding the Error

Now that we know what is doing it, how do we avoid it?  The answer is actually more simple, but harder to swallow than you might like. Don’t use the Native Database Query functionality!

I really struggled with this at first.  I saw Power Query as a “SQL Sever Management Studio Light” for Excel pros.  It’s not.  It’s totally different. What the PQ team has built is a way for non-power users to be able to query, clean, restrict and reshape their data.  And if you do it their way, then you won’t get this irritating prompt.

In the case of grabbing data from SQL Server, you’d simply connect to the server, grab the table/view you want, then start setting your filters in Power Query to restrict down the data you want.  The UI makes this easy, and it’s a lot easier to tweak the data to get it right than forcing an Excel person to go back and tweak their SQL.

And here’s the cool part… once you’ve done this, Power Query takes advantage of a technology called “Query Folding”, which folds up all the reshaping commands you’ve issued.  It passes those to the SQL server as a single command to bring back your data.  In layman’s terms (although not perfectly technically correct), you can look at this as if Power Query is allowing you a graphical way to write your SQL for you behind the scenes, then pass it to the server.  And because Power Query knows it won’t do anything dangerous, it doesn’t prompt you about potential problems.

What About Performance?

A few months ago I asked someone at Microsoft this question… the thrust of my question was if Power Query would be more or less efficient than passing an optimal SQL query.  His reply was that no, if someone was a SQL Ninja, they could certainly come up with something that performed better in some situations.  Having said that, for the majority of Excel pros, Power Query will do a better job, as we don’t tend to be SQL optimization experts.

Further to this, the very worst thing you can do for performance is provide a base query then try to do further filtering.  Look at these three cases:

Case 1 Use Power Query to:
Connect to table Transactions
Filter to Dept <> 150
Case 2 Native Database Query:
Select * From Transactions Where Dept <> 150
Case 3 Native Database Query:
Select * From Transactions
Use Power Query to filter:
Dept <> 150

Case 1 is the preferred method when dealing with Power Query.  It will roll both steps up into a single statement, and retrieve the requested set from the server.

Case 2 returns the same data as Case 1, but prompts you with a Native Database Query on each computer you run the solution on.  (You can’t even save the connection with the workbook, it’s a user specific trust flag.)

Case 3 is the worst performing.  The reason is that the initial table has to be brought to Power Query in it’s entirety, and THEN you can start filtering.  Why?  Because Power Query cannot query fold it’s commands into the SQL you provided.

Recommendation

Even though connecting to a table, then re-creating all the filtering steps seems less graceful at first, this is the method I’d recommend you take.  In fact, I’ve actually made it a habit to remove any Native Database Queries from project I wrote in the past, replacing them with the recommended method.

To me this offers two benefits:

  1. No security prompting.  May seem minor, but when you put a Power Query error in front of non-tech users, they tend to freak out.  Why put them through that?
  2. It makes the query easier to follow/tweak when you need to debug it later.  And let’s face it, sooner or later you’ll need to do this.

Use Dynamic Ranges in Power Query

This was a great question that came up in my M is for Data Monkey session at the Amsterdam Excel summit:  Can we use dynamic ranges in Power Query?

If I recall correctly, this was asked by Gašper Kamenšek, after I demonstrated how to use the =Excel.CurrentWorkbook() method to consolidate worksheet tables.  So we gave it a shot to find out.

Setup:

Nick Hodge jumped up and built us a quick little dynamic named range solution that looked like this:

SNAGHTML21f653dd

And defined a named range call “Animals”, as follows:

=OFFSET(Sheet1!$A$4,0,0,COUNTA(Sheet1!$A$4:$A$12),2)

The next step was to pull it into Power Query, but I didn’t have a Table or normal Named Range to pull from, so I went to Power Query –> From Other Sources –> Blank Query.

The Normal Piece

As I’d shown earlier, I typed the following in the formula bar in the Power Query editor:

=Excel.CurrentWorkbook()

And at this point I was pretty chuffed as it looked pretty straight forward:

image

This is exactly what I was hoping I would see.  Fantastic!  So I did a little normal cleanup:

  • Expanded the content
  • Moved the first row to a header
  • Deleted the “Animals” column that was carried down the table

I then proudly announced that “Yes!  It works!”, clicked Close and Load to land it in a table… and it failed:

image

The Data Fun House of Mirrors

That was a bit of a shock.  It worked in the Query window, but when loading it failed.  There’s something not adding up here.  So I jumped back in to edit the query:

I was immediately presented with this:

image

Doesn’t make sense… so I started to step through the process. I clicked on the Source step:

image

Aha!

Use Dynamic Ranges in Power Query

What I forgot was that I’d used =Excel.CurrentWorkbook().  That  function returns a table which contains all tables, connections and named ranges in the workbook.  And when we create a new Power Query, it adds an output table and a connection.  So we basically got a circular reference.

No problem.  We can easily deal with this.  All we need to do is filter down to the Animals table that we need.  (That way we don’t have to worry about any new tables being created.)

So I added that filter immediately after the Source step.  And boom, everything works again because it is restricted to just the table I originally had.  And now, when I commit it the the worksheet, it loads:

SNAGHTML229c0b05

Proof Positive

Perfect.  It loaded, now lets just go add some data and make sure it’s working properly at update.  I added a new record to the table:

SNAGHTML22839637

And voila!

SNAGHTML22a09947

So it looks like we can use dynamic ranges in Power Query without any issues.

Alternate Method

When I was writing this up, I couldn’t replicate the error at first.  The method I used was just slightly different.  Here’s what I did instead:

  • Pulled in the table
  • Clicked the green Table to drill into it
  • Promoted headers
  • Landed it

It worked perfectly at update.  Why?  Because I drilled into a specific table, avoiding the issue at update.  And in all honesty, it’s probably a better method. 😉

Looking for Power Query samples

How would you like to make your data famous?  Or at least – how would you like a solution built for your own data using Power Query… for free?

I have a couple of Power Query related things I’m working on right now, like these:

Now, if you’ve ever been to one of my courses, you’ll know that one of the most important things to me is to use data that is modelled on real world examples.  That’s right, no canned AdventureWorks samples.  I like data that real users face in the real world.

To that end, I’m looking for Power Query samples.  Specifically, for sample files of data to use as examples for my Power Query blog posts, courses and other educational products.  And while I’ve got lots of great data sets of my own, I’m particularly interested in stuff that I haven’t seen or demonstrated on the blog before.

So if you’ve got a nasty file that’s painful to import, or you have a few files that you’re having trouble combining, or you just can’t figure out a starting point with Power Query, I’d like to have a crack at your scenario.  It’s super easy… you email me the file, I’ll take a look at it, and cook up a solution.  And any new or interesting techniques that are needed become the subject of a future blog post, article, or even sample in the courses I teach.

I hope it goes without saying, but for legal reasons I’m going to say it anyway… any files you send me can and will be used in public.  In order to use them in my materials, by sending them to my you are, by default, assigning me copyright to those specific source files.  Because of that, it is VERY important that the data you include is not sensitive.  To be fair, I’m not interested in your data specifics, as much as the form and shape of it, how it starts out, and what the end product should become.

“Okay, that’s great for you, Ken, but what do we get out of sending you our files?”

How about a working solution?  Does that sound reasonable?  You send me your data, and I send you back something that uses Power Query to complete the goal?

If you’re interested (and who wouldn’t be), here’s what you need to include:

  • A sanitized copy of your source data.  (i.e. Do a mass find/replace on employee names/numbers, phone numbers, addresses and such.  If you have particularly troubling data that is interesting, I may be willing to even help here.)
  • A copy of what you want the output to look like.  (I’ll build to that end goal.)
  • A summary of any “rules” you use along the way to determine if something needs to change one way or another.  (I’d rather not spend my time guessing about your end goals.)
  • Your direction as to whether you’d like your/your company name included or withheld should I decide to publish a post using your data.

In return for that, I will send you a copy of the solution once I’ve crafted it.  This may happen days/weeks before any solution is posted publicly, or it may come in the form of an email to let you know that the post is live, at my discretion.

Please be aware that I also reserve the right to NOT use your submission at my discretion.  This could happen if my inbox gets overloaded with requests, if I’ve already dealt with your scenario on the blog, if your data would be better handled in another fashion, if I get too busy with paying projects, if the data is just too difficult to work with, or any other reason that I haven’t listed. (I’ll do my best, but this is free work, so it’s totally at my schedule and discretion.)

If that sounds interesting to you, my email is ken at excelguru dot ca.  I’m going to limit this to the first 10 data sets that land in my inbox, so if you want to try and take advantage of it, act quick!

Create Running Totals in Power Query

I was presented with an interesting problem last week; how to create running totals in Power Query.  There doesn’t appear to be a built in method to do this, so I thought it would make an interesting use case study.

Background

Let’s take a look at the following table, which can be found in the sample file found here (if you’d like to follow along.)

SNAGHTML1ec4a255

Now, we could certainly use a Pivot Table to create running totals, but consider the scenario where we want to preserve the original list.  Pivot Tables aggregate data and apply running totals to the aggregation, so that might not suit our needs.  (In the case of the table above, it would aggregate both records for 1/2/2015 into a single line.)

We could also set up a formula on our table that summed the row about plus the row to the left, for example.  Of course that means you MUST land it in the worksheet, which may not be ideal either.

This solution is entirely possible to solve with Power Query, but with no one click feature in the user interface, how do we pull it off?

Creating the Staging Query

The first thing I did is create a query to pull in my data source and land it in a “Staging” table inside Power Query, as per this post.  To do that I:

  • Selected a cell in the table
  • Went to Power Query –> From Table
  • Selected the Date column –> Transform –> Data Type –> Date
  • Went to Add Column –> Add Index Column –> From 0

At this point the query looks like this:

image

Next I finalized the staging query.  I:

  • Changed the table name to “Staging-Sales”
  • Went to File –> Close and Load to… –> Only Create Connection
  • Right clicked the query in the Workbook Queries pane –> Move to Group –> Staging

Building a Running Total function - Groundwork

With the staging query in place, it was time to tackle the real problem; the method to create running totals in Power Query.  That work requires a bit of down and dirty M work, and here’s how I approached this to build up my end solution:

  • Right click the “Sales” staging query and choose Edit

This will land you back in the Power Query editor.  On the left side of the window, expand the vertical “Queries” window by click on the little arrow until you see this:

image

With this window expanded, we can begin creating our function:

  • Right click “Sales” –> Reference
  • Right click “Sales (2)” –> Move to Group –> Create Group –> “Functions”
  • Change the name from “Sales (2)” to “fnRunningTotal”

Things should now look as follows:

image

Perfect.  We now have the query set up the way we need to take our next steps.  Right now it’s just pointing to our Sales query and pulling the data in, but that will change.

From a logic point of view, here’s what I want to happen:

  1. I want to take a row from the Sales table
  2. I want to send a copy of the table to my function
  3. I want my function to remove all rows where the index is higher than the submitted row
  4. I want to sum the remaining values
  5. I want to return that total back to the calling row

So let’s start working through that process… we’ll come back to steps 1 and 2 in a bit… let’s pretend those have already happened, and focus on the rest.

  • Select the Sale and Index columns –> Home –> Remove Columns –> Remove Other Columns
  • Click the Index column filter –> Number Filters –> Less than or equal to
  • We’ll pick an arbitrary starting point like 2 for right nowimage
  • Rename the “Sale” column to “Temp”

Now we’re going to take the final step with this data before we convert it into a function.  We are going to group it, so go to:

  • Transform –> Group By

Now, we need to tweak the default a bit to get it right.  The very first thing we need to do is remove the “Group By” level that groups by Temp, as we want to group all remaining rows together.  To do that, you click the – button beside Temp.

Next we need to set our data correctly.  We are going to set it to:

  • New column name:  RunningTotal
  • Operation:  Sum
  • Column:  Temp

And at this point, our output should look like this:

image

Building a Running Total function – Finalization

With this in place, we now need to turn this into a real function.  And to do that, we need to step into the M code itself.  So go to View –> Advanced Editor to do that.

Now, at the very top of the M code (before the let line), we are going to insert the following code:

(tbl as table, sumcolumn as text, rowindex as number) =>

This code will allow us to pass in the table, which column we want to sum, and which row we are working with when we call the function.

Next we need to plumb these variable into our code.  To do that we will:

  • Remove the “Source = Sales,” line of the M query
  • Replace every instance of Source in query (there is only one) with the name of our table: tbl

image

  • Replace every instance of “Sale” in the query with the name of the variable which holds our column name; sumcolumn.

image

Note that we do not need to wrap sumcolumn in quotes, as Power Query knows that it is text based on the parameter definition at the head of the function.

  • In the #”Filtered Rows” step, replace the hard coded 2 with the name of the variable which holds our index number; rowindex.

image

  • Place a comma at the end of the #”Grouped Rows” line (to make way for the next step)
  • Replace the last two rows with the following to return the single value back to the caller:

   RunningTotal = Record.Field(#"Grouped Rows"{0},"RunningTotal")
in
RunningTotal

When complete the function looks as follows:

image

And now click Done.  At this point, your query should flip to appear as a function.  This is evident in three ways:

  1. The icon in the left pane changes to an fx logo
  2. The table disappears and is replaced with the function parameters and an invoke box
  3. The Applied Steps collapse to a single line (I wish this didn’t happen)

Commit it by choosing File –> Close and Load

Create Running Totals in Power Query

So now… did it work?  Let’s find out.

  • Right click the Sales query in the Workbook Queries pane and choose Edit
  • Expand the queries pane on the left
  • Right click Sales –> Reference
  • Right click Sales (2) –> Move to Group –> Other Queries
  • Rename Sales (2) to “Landing”
  • Go to Add Column –> Add Custom Column
  • Call the column “Running Total”

And now we need to enter our formula.  The actual formula itself will be:

=fnRunningTotal(Source,"Sale",[Index])

Here’s how it breaks down:

  • fnRunningTotal is the name of the function
  • Source is the name of the previous step in the Applied Steps section.  This allows us to pass the entire table that is generated up to that point
  • Sale” is the name of the column we want to summarize.  The quotes here indicated that we are going after the column’s name, not the data in that column’s row.
  • [Index] is surrounded in square brackets as we want to pass the value for that row of the index column.  That is the row that is then used to filter down the data.

And the result, if you followed along exactly, is this:

image

So at this point, we can remove the Index column and call it done.  You’ll find a complete version here if you’d like to review it.

Final Thoughts

This looks like a very long and cumbersome route, but overall it’s actually not that complicated once you understand it.  The key points that I want to highlight here are:

  • We started with a simple Staging query
  • We used that as a reference and stepped through the process of filtering down the data for one record
  • We converted the new query to a function by:
    • Defining a line at the beginning to receive 3 variables
    • Integrated the variables into the code in place of hard values
    • Added an ending to feed back a single value
  • We used our function

The first line of the query is essentially just () => and any variables we need go between the brackets.  And while I declared there types, you don’t actually even have to do that.

The last line of code before the in is the trickiest part.  And here’s a secret… I just come back to a blog post and copy this part:

RunningTotal = Record.Field(#"Grouped Rows"{0},"RunningTotal")

The secret is understanding that the #"Grouped Rows" portion is just the name of the previous step in the M code, and the final “RunningTotal” is the name of the column you want to return.  That lets me retrofit it pretty easily into any other function.

So… would it be faster to just flip it into a pivot and add running totals there?  Absolutely.  Would it be faster to just land the data in a table and create a running total formula down the table?  Absolutely.  But if those aren’t options, then Power Query may be a viable solution for you.  :)

Prompt for a Folder at Refresh

I got a comment on my blog the other day from David, asking if we could make Power Query prompt for a folder at refresh, allowing us to choose which folder to should be used to consolidate files.  That sounds like a pretty cool idea, and my first thought was “I think we can totally do that!”

I headed over to Chris Webb’s blog, as he’s done some cool things with Power Query prompts. Ultimately though, I got stuck on one thing… we have to go into the Query to invoke it.  My (albeit limited) tests left me unable to have that prompt when we refresh the query.

Not to be outdone by a lacking feature though, I cooked up a solution using some Power Query and VBA techniques that I’ve shared before.  This post wraps them up into a neat little package.

Background

The idea I’m going for here is to be able to click a button which:

  • Prompts the user to select a folder,
  • Feeds that folder into Power Query, and
  • Refreshes the Power Query output table.

To set it up, I used 3 techniques that I’ve shared before:

Initial Setup

To begin with, I created a blank workbook, and immediately added a Power Query parameter table as outline in Building a Parameter Table for Power Query.  I followed the steps there exactly, so I’m not going to detail that here.  The only things of note were:

  • I named the worksheet “Parameters”, and
  • I used the following for my “File Path” value:  D:\Consolidate\Begin

image

Next, I created a query to consolidate all the files in a folder.  Basically I ran through the following steps:

  • Power Query –> From File –> From Folder
  • Chose my folder:  D:\Consolidate\Begin
  • Did a bit of cleanup on the file
  • Implemented the “fnGetParameter” function call in place of the folder (as described in the aforementioned blog post)

The end result was some M code that looked like this:

let
Source = Folder.Files(fnGetParameter("File Path")),
#"Combined Binaries" = Binary.Combine(Source[Content]),
#"Imported CSV" = Csv.Document(#"Combined Binaries",null,",",null,1252),
#"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
#"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"TranDate", type date}, {"Account", Int64.Type}, {"Dept", Int64.Type}, {"Sum of Amount", type number}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Sum of Amount", "Amount"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"Amount", type number}}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type1", {"TranDate"})
in
#"Removed Errors"

And returned results like this:

SNAGHTML250112a0

I then loaded the Power Query to the worksheet, and called the worksheet “Data”.

So far so good?  Nothing really unusual yet.

Laying the Folder Change groundwork

With the basic functionality in place, I wanted to now give the user the ability to change the folder.

Rather than write the Browse For Folder routine from scratch, I headed over the VBAExpress.com and grabbed the one I submitted to the KB… um… a long time ago.  :)

Once I had that code copied I:

  • Opened the VBE
  • Browsed into my workbook
  • Right clicked the project and added a new Module
  • Pasted in all the code (shown below):

Function BrowseForFolder(Optional OpenAt As Variant) As Variant
'Function purpose:  To Browser for a user selected folder.
'If the "OpenAt" path is provided, open the browser at that directory
'NOTE:  If invalid, it will open at the Desktop level
Dim ShellApp As Object
'Create a file browser window at the default folder
Set ShellApp = CreateObject("Shell.Application"). _
BrowseForFolder(0, "Please choose a folder", 0, OpenAt)
'Set the folder to that selected.  (On error in case cancelled)
On Error Resume Next
BrowseForFolder = ShellApp.self.Path
On Error GoTo 0
'Destroy the Shell Application
Set ShellApp = Nothing
'Check for invalid or non-entries and send to the Invalid error
'handler if found
'Valid selections can begin L: (where L is a letter) or
'\\ (as in \\servername\sharename.  All others are invalid
Select Case Mid(BrowseForFolder, 2, 1)
Case Is = ":"
If Left(BrowseForFolder, 1) = ":" Then GoTo Invalid
Case Is = "\"
If Not Left(BrowseForFolder, 1) = "\" Then GoTo Invalid
Case Else
GoTo Invalid
End Select
Exit Function
Invalid:
'If it was determined that the selection was invalid, set to False
BrowseForFolder = False
End Function

With that in place, I just needed to link that function into a routine that can use it…

Rolling the Final Routine

I write a lot of VBA, so this was pretty quick to knock up for me.  The code itself looks like this:

Sub UpdatePQ()
Dim vFolder As Variant
Dim wsParameters As Worksheet
Dim wsData As Worksheet
'Set the worksheets
Set wsParameters = Worksheets("Parameters")
Set wsData = Worksheets("Data")
'Update the folder to import
vFolder = BrowseForFolder
If CStr(vFolder) = "False" Then
'No folder chosen, exit the routine
Exit Sub
End If
'Update parameter table with folder
wsParameters.Range("B2").Value = vFolder
wsData.Range("A1").ListObject.QueryTable.Refresh BackgroundQuery:=False
End Sub

In short, this routine basically:

  • Sets references to the two worksheets
  • Prompts the user for the folder.  The output here could be a folder path or “False” which means the user cancelled.  If it’s “False” we bail out
  • If the path was provided, it gets placed in B2 of the Parameter worksheet (which is the “Value” field for the “File Path” variable.  (So if you didn’t create your table starting in A1, then you’d need to update this.)
  • Finally, we refresh the Power Query output table which I landed in cell A1 of the Data worksheet.

Final Setup Step

The very last step is to link the UpdatePQ macro to a button.  This is fairly straight forward as well, and the steps are illustrated well in the Refresh Power Query with VBA post.

Make Power Query Prompt for a Folder at Refresh

And now, the only thing we need to do is click a button…

image

Choose a folder…

SNAGHTML2525f706

And the query will refresh the data in the landing page.

Try it out

You can download the completed workbook, as well as two folders with the source data, from my Skydrive. Click here to do so.

Caveats

I hope it goes without saying that the data structure of the files in the selected folders must be the same.  If they’re not, the query load will fail, of course.  But so long as you’re just trying to deal with different months/years, this should work nicely.

Master Your Excel Data: Live Courses

I’m pleased to announce that I’m just opened registration for 3 days of Excelguru live training courses in May 2015.

Master Your Excel Data

This course will focus on using tables, PivotTables and Power Query to master your Excel data.  I’m super stoked to be teaching this, as I think Power Query is just one of the hottest tools out there right now.  Based on it’s incredible ability to set data into a useful format it only makes sense to blend it with a bit of PivotTables to show how we can take nasty data and turn it into something beautiful.

It will be a full day, hands-on experience, and I’ll be running it at the following dates/times:

  • Victoria, BC – May 11, 2015 (9:00-4:30)
  • Kelowna, BC – May 14, 2015 (9:00-4:30)

To learn more and register, click here:  http://www.excelguru.ca/content.php?291-Live-Course-Master-Your-Excel-Data

Financial Model Design

This course will focus on the steps required to build a solid financial model.  From planning to architecture and implementation, we’ll look at both theory and the tools to help it become a reality.  More than that, we’ll build a solution that is not only easy to audit, but also focus on making it stand the test of time.

It will also be a full day, hands-on experience, and I’ll be running it at the following dates/times:

  • Victoria, BC – May 12, 2015 (9:00-4:30)

To learn more and register, click here:  http://www.excelguru.ca/content.php?292-Live-Course-Financial-Model-Design

Not Coming to Your Area?

For reference, if either of these (or any other course) is something you’d like to see in your area, I’d love to hear about it. Drop us a note and let us know, as we’ll be trying to reach out further and offer more courses as the year progresses.