Power Query – The Round Function

The other day I asked one of my co-workers how many ways he knew of to round a number.  His answer was one… if it ends in .4 it rounds down and if it ends in .5 it rounds up.  My guess is that most people would answer along similar lines.

Interestingly though, there are a bunch of different ways to round, depending on your needs, and Excel has a bunch of functions to support them: ROUND, ROUNDUP, ROUNDDOWN, FLOOR, CEILING, EVEN, ODD, TRUNC, INT and MROUND.

Power Query also has a bunch of rounding formulas as well but, given that the function names can be somewhat different in Power Query (as we first saw here), I thought it might be interesting to see how to convert each of those functions from Excel to Power Query’s structure.

Background Setup

To start with, I created a very simple structure: a two column table with some random values in the “Value” column and then rounded them to 2 decimals using the formula =ROUND([@Value],2)  The output, after feeding it through Power Query, looks like this:

SNAGHTML20a5291b

The blue table on the left is the Excel table, and the green table on the right is the Power Query output.  (There is a completed example file available hereexample file available here.)

Creating the Round function

I love the ROUND function in Excel.  I use it constantly – rounding everything that uses multiplication or division – and pretty much have it burned into muscle memory.  So to me this was a logical place to start with Power Query.  Naturally, the syntax is just a bit different from Excel though:

Syntax
Excel =ROUND(number,digits)
Power Query =Number.Round(value, digits, roundingMode)

Hmm… we know that the Power Query function will be case sensitive.  In addition, it has an extra parameter.  The valid options are:

  • RoundingMode.Up
  • RoundingMode.Down
  • RoundingMode.AwayFromZero
  • RoundingMode.TowardZero
  • RoundingMode.ToEven

Let’s see what we can do with this.

I open up my query, select Add Column, and put in the formula as shown below:

image

Pretty easy, just Number.Round([Value],2).  In fact, it’s so similar to Excel it’s awesome!

So I click OK, save the query, and have a look at my results.  And that’s when I notice something isn’t quite right.  I’ve added some conditional formatting to the table below so you can see it easily:

image

This is the default?

Notice all the numbers that don’t match?  Can you spot the pattern?  It’s the oddest damn thing I’ve ever seen, to be honest, and I’ve never heard of anyone rounding in this way.

The default “RoundingMode” for Power Query is “Round to Even”.  What that means is that if there is a tie in the numbers it will round up or down to the closest even number.  So in the case of 1.645, it will round down to 1.64 as that is closer than 1.66.  (1.64 is .05 away from 1.645, where 1.66 is 0.15 away from 1.645).

I find this deeply disturbing.  I personally think that every user would expect Excel and Power Query’s default rounding methods to line up exactly, and this doesn’t.  How serious is this?  I’m not sure.  I think I’ll let someone from the scientific community ponder that.

Using RoundingMode.Up

Since the default plainly doesn’t work for us, it looks like it’s time to figure out which of the additional parameters we need.  Let’s try adding RoundingMode.Up to see if that will fix it.

I open Power Query again, and added a new custom column with the following formula:

=Number.Round([Value],2,RoundingMode.Up)

And the results are as follows:

image

Um… uh oh.  It seems to work above 0, but below is another matter.  That –5.245 is rounding down, not up! (Yes, from a technical perspective I am aware you can argue the words I used, but you get the idea.)

Using RoundingMode.Down

Now I’d be surprised if this came up with numbers consistent with the Excel formula, but let’s just check it for good measure.  The formula is:

=Number.Round([Value],2,RoundingMode.Down)

And the results:

image

So now numbers greater than 0 get rounded down, where numbers less than 0 are being rounded up (away from zero).

Let’s try another:

Using RoundingMode.AwayFromZero

Here’s our next option:

=Number.Round([Value],2,RoundingMode.AwayFromZero)

And these results are pleasing!

image

Look at that… we finally found the one that works!

Using RoundingMode.TowardZero

We’ve only got one other option we haven’t explored, so we might as well use it too, just for the sake of completeness:

=Number.Round([Value],2,RoundingMode.TowardZero)

For some reason, I’m incapable of typing TowardZero the first time I type this.  I always type TowardsZero and end up with an error!  At any rate, the results:

image

Thoughts

As a tool that is built for the Excel audience, I am having some real difficultly accepting the default parameter for this function in Power Query.  I HOPE that this is a bug, and not a design choice, although the documentation would suggest it is the latter.  If that’s the case, I think it’s a HUGE mistake.

Excel’s ROUND formula defaults to round away from zero.  Power Pivot’s DAX ROUND formula defaults to round away from zero.  VBA’s Application.Round function defaults to round away from zero. (As pointed out by Rory Archibald on Twitter, VBA’s Round function – without the application. prefix – does use banker’s rounding though.)

In my impression, if the Power Query formula holds the same name (at least after the Number. portion) it should return the same results as the Excel function.  In fact, I would venture to say that virtually every Excel pro would expect this.

My bigger concern would be that, with one of Power Query’s big selling features being it’s ability to re-shape and process large volumes of data, how quickly will a user realize that the Rounding function they thought they had is NOT working the way they expected?  Not good news at all.

I’m curious to hear your impressions.  Please leave a comment!

Want to see if for yourself?

Download the example file with all the formulas already in place.

Merging Columns with Power Query

The August update for Power Query was finally made available on Sept 1, and it has some pretty cool stuff in it.  In this week’s segment I thought we’d cover off one of the features that I’m most excited about as an Excel Pro: merging columns.

The old way

It’s been possible to merge two or more columns together in the past, but you had to write a formula to do it.  Honestly, it wasn’t a huge deal, but it still took a bit of know-how and work.  Assume, for example, we had this:

SNAGHTML702fae4

And our goal is to concatenate the Account and Dept columns together with a hyphen between them.  Here’s what you had to do:

  • Insert a New Column (the steps for this varied depending on the version of Power Query you are running.  Currently it is Add Column –> Add Custom Column)
  • When the prompt pops up you had to provide a formula like shown below:

image

Okay, so not a huge deal.  Just =[Column1] & “-“ & [Column2]

But you still had to write it.  I’ve lost count of how many people to whom I’ve taught the simple & shortcut for Excel formulas, but it’s enough to say that it’s probably not intuitive.

So it worked, but could it become easier?  We now know the answer is Hell Yeah!

The new way

This time we’ll do it differently…

  • Select the Account column
  • Hold down CTRL (or SHFT) and select the Dept column
  • On the Add Column tab, click Merge Columns

image

  • Choose your separator.  The default is –None- (meaning it will just mash them together), but other pre-defined options include Comma, Colon, Equals Sign, Semicolon, Space, Tab
  • What I want (a minus sign) isn’t there, so I’m going to choose –Custom–

image

  • Now I’ll enter a – (minus) sign and click OK

image

And that’s it!  My output comes together nicely:

SNAGHTML71194af

Now, to be fair, I still have to rename the column.  I do wish this interface had a way to name the column in advance (like exists when you create a custom column.)  Hopefully the PQ team will retrofit us with that ability at some point in the future.

In the mean time, we can either right click the column header and rename it there, or we can edit the column directly in the formula bar.  Just change the highlighted part shown below:

image

Like this:

image

So honestly, it’s not that much more efficient, why do I think this is cool?  Well, it’s not that much more efficient with 2 columns.  But try 4.  Or when you just need to put 4 columns back together with no spaces in between.  Then it starts to make life much easier.

Power Query – Multi Condition Logic

In my last post, we looked at creating an IF statement using Power Query.  This time we’re going to go a bit deeper and look at a scenario where we need to choose between several options.

The base scenario

In the last post I talked about my billing code setup.  As a reminder, it’s a 10 digit code that is constructed as follows:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, S = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric – Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Outlined below)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)

(Note that the sample data only includes records for Marina data)

Sample file

If you’d like to follow along, download the sample file here.

Multi condition logic – Using IF

So, building on my previous two posts (using text functions and creating IF statements), we could easily break the first character into pieces by nesting 2 IF tests together:

=if Text.Start([BillingCode],1)=”G” then “Golf” else “two options left”

=if Text.Start([BillingCode],1)=”F” then “Fitness” else “Marina”

into

if Text.Start([BillingCode],1)=”G” then “Golf” else if Text.Start([BillingCode],1)=”F” then “Fitness” else “Marina”

Not too hard really.  In fact, we can even build each IF statement separately, then just copy the second to replace the “two options left” part without making any other changes at all.  No parentheses or anything needed.

More than 3 options

But what if you have a whole bunch of options that you need to work with?  Let’s look at the 9th character in our billing code.  I haven’t given the details yet for that one, but here are the options:

E = Employee, S = Yacht Club, N = Non-Taxable, R = Restricted, I = Inactive, L = Social, M = Medical, U = Regular

Wow.  That’s a whole lot of possibilities, and would make for one monster nested IF statement.  That wouldn’t be a lot of fun to write, nor maintain.  So how would we deal with it?

In Excel proper, we would probably separate these options into a separate table, then use VLOOKUP to pull the appropriate value into the table.  So we just need a Power Query VLOOKUP function… except there isn’t one.

We actually have a couple of different methods to deal with this.  We could either:

  1. Split the first character into one column, create an Excel table with the first letter in column 1 and the appropriate match in column 2, then merge the two using Power Query’s merge function. (Maybe I’ll write a future post on it.)
  2. Build our own CHOOSE function inside Power Query (or SWITCH if you prefer Power Pivot’s DAX version.)  This is way more fun, so let’s do that.  :)

Building a CHOOSE function

This actually isn’t too hard once you know the basic structure.  It basically goes like this:

function_name = (input) => let
   values = {
         {result_1, return_value_1},
         {input, “Undefined”}
     },
   Result = List.First(List.Select(values, each _{0}=input)){1}
in
   Result,

The key parts to recognize here are:

  • We can change the “function_name” part to be whatever we want/need,
  • result_1 is the first of the possibilities that we may pass TO the function
  • return_value_1 is the value that we’d like to return if the first value is result_1
  • if we need more values, we just insert another comma after the value_1 section and put in a value_2 section
  • we can keep adding as many values as we need.
  • the “Undefined” value will return the text “Undefined” if the value you pass isn’t in your provided list of options (it’s the Else portion of the CHOOSE statement)

Using this structure, we could write a CHOOSE function for our scenario as follows:

fnChoose_CustCode = (input) => let
   values = {
         {“E”, “Employee”},
         {“S”, “SCYC”},
         {“N”, “Non-Taxable”},
         {“R”, “Restricted”},
         {“I”, “Inactive”},
         {“L”, “Social”},
         {“M”, “Medical”},blah
         {“U”, “Regular”},
         {input, “Undefined”}
     },
   Result = List.First(List.Select(values, each _{0}=input)){1}
in
   Result,

Notice that I changed a couple of things:

  1. I gave the function a name so that I can recognize it, and also so that I can create more than one function with different names.  This one is fnChoose_CustCode.
  2. I created a list of all the options I needed.

Implementing the CHOOSE function

Okay, so now we have this, how do we use it?  Again, we’ve got two options.  I’ll focus on the other option at some other time, but for this scenario I want to build this right into an existing query.  So here’s how I do it.

First I created a brand new query that just pulls my table into Power Query, resulting in the following:

SNAGHTML76c03ee

Let’s go and inspect the code that pulls this in.  We need to click View –> Advanced Editor.  That will bring up the following code:

let
    Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
    Source

Not too complicated (yet).  Let’s paste in our code just before the Source = line:

let

fnChoose_CustCode = (input) => let
   values = {
         {“E”, “Employee”},
         {“S”, “SCYC”},
         {“N”, “Non-Taxable”},
         {“R”, “Restricted”},
         {“I”, “Inactive”},
         {“L”, “Social”},
         {“M”, “Medical”},blah
         {“U”, “Regular”},
         {input, “Undefined”}
     },
   Result = List.First(List.Select(values, each _{0}=input)){1}
in
   Result,

   Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
    Source

Perfect.  And yet it doesn’t exactly look like much.  In fact, beyond adding a new line in the Steps section of the Editor, we don’t see any changes:

image

So what good did that do then?

As it turns out, we’ve only pasted in our function to make it available to the Power Query engine.  We haven’t actually told Power Query to do anything with it.  So why don’t we do that now?

Using our new CHOOSE function

You’re going to be amazed how easy this is…

First we’re going to add a new column (Add Column –> Add Custom Column).  When the dialog pops up, we’ll create a formula to return the letter we want to pass to the function:

=Text.Range([BillingCode],8,1)

And that gives us the following result (assuming we provided the column name of Status):

SNAGHTML29d7316

Cool stuff.  Now, let’s amp this up and use our function.  We’ll click the gear next to the Status step and wrap the existing formula with our function call.  (Don’t forget the extra  parenthesis needed at the end):

=fnChoose_CustCode(Text.Range([BillingCode],8,1))

Which gives us the following:

SNAGHTML2a06cbf

The end effect

You’ll find that all the sample codes in the data work just fine, and that nothing comes back as undefined.  If you’d like to see how the query reacts to different items, go back to the Customers table and try changing the second to last letter to something else.  When you refresh the table, you’ll find that it will evaluate the new character and return the appropriate result.

Caveat

It should be noted that the function as written above is case sensitive, meaning that a code of MP010450uP would return “Undefined”.  This is expected in my case, as valid codes are made up of upper case letters.

If I wanted to accept either case I would need to modify my Text.Range function to force it to upper case.  This would result in a function call that reads as follows:

=fnChoose_CustCode(Text.Upper(Text.Range([BillingCode],8,1)))

Which would work, as you can see here:

SNAGHTML2be7f6a

Power Query – The IF function

In my last post I talked about useful text functions, and how they differed between Excel and Power Query.  Today we’re going to look at another compare/contrast scenario, but this time it’s going to be the IF function.

Critical background

The only important thing we need to remember here is that all functions in Power Query, whether text, logic or anything else, are case sensitive.  That may strike you as weird in this one, but we need to remember that “if” is not the same as “IF”, and that Power Query will gag on the latter.

The base scenario

For this example I’m going to work with a table of data that holds a customer number, a boat type and a billing code schema.  While the data has been scrambled, this represents a real structure that we use in my day job.

There’s no real mystery to the first two items, but the billing code schema we designed holds a ton of info.  It’s always 10 characters long, and breaks down like this:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, P = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric – Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Will factor in to a future post.)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)

So a table of customer data could look like this:

SNAGHTML153bc957

Turning data into more useful data

So great, we’ve got this awesome billing code schema, but it doesn’t really tell me anything when I look at it, as it’s too complicated to read.  I really need to break this into separate pieces, and make useful and readable columns out of it.  So that’s what I’m going to start doing now.

The first step is, of course, to click in the table and go to Power Query –> From Table.

My goal here is to make a column that says “Annual” if the second character is an “A”, or “Periodic” if the second character is a “P”.  To start, I’m just going to reach back to last week’s article and make sure I can identify which character I’m looking at.  So first I’ll click “Add Custom Column”.

I’ll call my new column “Seasonality”, and use a formula to extract just the 2nd character:

SNAGHTML16328557

And with that in place we can now focus in on the important data here:

SNAGHTML163368c2

Writing IF functions in Power Query

Assuming the data was in a table that started in row 2 of the worksheet, either of the following formulas would work to convert “A” to “Annual” or “P” to “Periodic”:

=IF([@Seasonality]=”A”,”Annual”,”Periodic”)
=IF(D2=”A”,”Annual”,”Periodic”)

Easy enough, right?  But look at how the signatures differ from Excel to Power Query:

Excel =IF(test, value_if_true, value_if_false)
Power Query =if test then value_if_true else value_if_false

Notice that there are no parenthesis or commas in the Power Query version, but you need to actually type out the “then” and “else” portions.  So to create the same thing in Power Query, we’d need a new column that uses the formula:

=if [Seasonality]=”A” then “Annual” else “Periodic”

Or, as is my preference, we modify the Seasonality column we already built, wrapping the text extraction with the IF function as follows:

=if Text.Range([BillingCode],1,1)=”A” then “Annual” else “Periodic”

Once we modify the original formula, our table now correctly shows the different values all the way down:

SNAGHTML1642027d[5]

Observations

Once again, I find this a bit of a departure from regular Excel formulas.  Although it’s not hard to make the transition once you understand it, it would still be nice if the language could leverage the skill set we’ve worked so hard to master.  You could argue that the verboseness of the Power Query IF function is easier to read, but it’s still inconsistent with the formulas we know and love.

I still feel it would be nice if we could have an alternate pointer into the same function so that I could type this in Power Query too:

=IF([Seasonality]=”A”,”Annual”,”Periodic”)

I think that would just make it so much easier to get off the ground running for Excel pros.

I’ll also point out that the error message Power Query gives you when you create an IF function or formula is not exactly helpful:

image

Most Excel pros aren’t going to understand what “Token Eof expected.” means, and I really have to question how it is telling me anything that I need to do to fix the formula.  Hopefully, in future versions of Power Query we get a more helpful message that says something like “It looks like you typed an upper case formula name.  Can I fix that for you?” (Maybe that will come with Intellisense and auto-complete…)

Taking this further

Next blog post we’ll look at how to take this a bit further… extending our conditional logic to look up a corresponding value in a list, avoiding having to nest several IF functions within each other.

5 Very Useful Text Formulas – Power Query Edition

Years ago I published an article on my site called Five Very Useful Functions For Working With Text.  The article is geared to explaining five functions specific to working with Text in Excel, and are a set of the most under-utilized functions in Excel (in my opinion).  It even lets you try them out live in the web page without even having to open Excel at all.

Now, over the past few months I’ve been working with Power Query, and one of the things that’s been driving me a little crazy is that the formula names in Power Query and not the same as they are in Excel.  I personally think this is a bit of a mistake, and that the formula names in Power Query should have been a bit more congruent with standard Excel formulas (Power Pivot’s DAX functions are similarly named to Excel, so why not Power Query’s M language?)

Some critical background

Before you bang your head against the wall, there are two things that are really different between Excel formulas and Power Query formulas:

Case sensitivity:

While Excel formulas are not case sensitive, Power Query formulas are.  If the Power Query formula signature says “Text.Start” then “TEXT.START” or “text.start” will NOT work for you.

Base 1 vs Base 0

Excel formulas are what we refer to as “Base 1”.  This means that you count starting at 1.  Power Query, on the other hand starts counting at 0, not 1.  The implications of this are that it is very easy to write your formula referring to a number that is out by 1.  To see the effects of this, check the section below on the alternate for the MID function equivalent.

Excel –> Power Query Formula Equivalents

Let’s take a look at how the five functions I illustrated in that original example differ from Excel to Power Query…

LEFT

To get the left x characters, we basically replace LEFT with Text.Start:

  Syntax Example Result
Excel =LEFT(text,num_chars) =LEFT(“Excel”,2) Ex
Power Query =Text.Start(text,num_chars) =Text.Start(“Excel”,2) Ex

Easy enough once you recognize it, although I would have preferred that the formula name was consistent.

RIGHT

To get the right x characters we have a similar situation.  The function name needs to change from RIGHT to Text.End:

  Syntax Example Result
Excel =RIGHT(text,num_chars) =RIGHT(“Excel”,2) el
Power Query =Text.End(text, num_chars) =Text.Start(“Excel”,2) el

Okay, so we’re getting the hang of this now…  Just change the function name and the rest work the same, right?  Um, no.

MID

This one gets a bit weird.  First we replace MID with Text.Range.  Okay, no problem there.  But look at the results when we pass the same parameters:

  Syntax Example Result
Excel =MID(text,start,num_chars) =LEFT(“Excel”,2,2) xc
Power Query =Text.Range(text, start,num_chars) =Text.Range(“Excel”,2,2) ce

They differ a little, don’t they?  The issue comes down to that base 0 vs base 1 thing I mentioned above.  Where Excel’s formula language counts the word with E being character 1, Power Query considers that character 0.  So in this case, when we tell Power Query to start returning text at character 2, it pulls back c (E is 0, x is 1, c is 2).  Interestingly though, the last parameter needs to be 2 to pull back 2 characters.

LEN

Getting the length of a text string in Power Query is actually a bit more intuitive than Excel’s native function, only because the function name isn’t trimmed off.  Text.Length is what we need instead of LEN.

  Syntax Example Result
Excel =LEN(text) =LEN(“Excel”) 5
Power Query =Text.Length(text) =Text.Length(“Excel”) 5

Notice that the result for this does return five characters, as you’d expect.  So this plainly works as a 1 based result in both Excel and Power Query.

FIND

And finally we come to the FIND function.  This one is again a bit confusing.  We’ve got 3 things to consider here:

  1. The function name changes from FIND to Text.PositionOf
  2. The parameters for the text we want to find and the text we want to search in get flipped around!
  3. The result is 0 based, not 1 based
  Syntax Example Result
Excel =FIND(find_text,within_text) =FIND(“xc”,“Excel”) 2
Power Query =Text.PositionOf(text, find_text) =Text.PositionOf(“Excel”,”xc”) 1

So in the case of FIND, we put the “xc” first, and “Excel” second.  But in the Power Query version, it’s completely opposite.  And look at that result… in Excel the x is treated as the 2nd character.  In Power Query it is too, but because it starts counting at 0 we get a 1.

Some thoughts

I find that even after using Power Query for a while now, I still have to look up the formula names from the Power Query formula categories page, both to find the formula name, and also the syntax.  Part of this is due to the fact that there is no auto-complete/syntax help in the Power Query engine (I’m sure it will come eventually), but part is also that my instinct is to type the Excel function name first.  And then, when I do get it right I’m constantly getting tripped up by the base 0 base 1 conversion.

As this is a tool aimed at Excel users, I am a bit disappointed in the formula naming convention.  I could get used to pre-pending Text. to all of my functions, but I really wish the rest was similar to Excel.  Maybe one day the PQ team can give us duplicate handles into the same back end function so that we can write stuff like this IN ADDITION to what already exists (don’t deprecate, just give us alternate routes):

  • =Text.Right(text,characters)

Or how about:

  • =Text.Mid(text,start,characters)

That would be good too, especially if the signature could be tweaked to require a one based parameter for the start character.  That would be consistent with what the Excel pro would expect due to their experience with the MID function.

And how about this:

  • =Text.Find(find_text,within_text)

How much easier would that be to learn if the function not only accepted parameters in the same order as the Excel function, but returned a 1 based result rather than a zero based result (again, similar to the way the current FIND function returns.)

Despite what I’m suggesting here, it does need to be recognized that there is a way to do the job, which is critical.  I just think it could be better designed in order to make it easier for the seasoned Excel pro to learn because they’d be able to port their existing knowledge without having to learn a totally new syntax.

Power Query “Trusted” Locations

My last two blog posts detailed some frustration that I ran into when working with Power Query.  First the issue that I couldn’t trust folders, but got stuck trusting files, and then the issue where I overloaded my credentials area.  I’ve actually got a solution, in a way, for both.

One thing to be aware of here… there are actually two distinct entities that show up in the Data Source Settings dialog: Security Credentials and Data Privacy Settings.  You can see this in the following picture:

image

Notice how Exchange and the Database don’t show a privacy level on the right?  Those are security credentials.  Interestingly here, you can see two entries for the same thing (like a website), where one is the security credential, the other the privacy setting.

Ok, with that out of the way, let’s get on to the solutions…

Clearing out Security Credentials

In my last post I had over 700 security credential entries for ip-api.com that needed to be cleared.  To clear them manually took clicking the item, clicking delete and clicking ok before moving on to the next item.  Even if I could do one click per second (I find this UI slow and sometimes I missed the target), it would still take me 35 minutes.  I really wanted a “remove all” feature.  Sure I’ll lose Exchange, sure I’ll lose my database, but so what.  Small pain vs spending 30-60 minutes clicking Delete.

So I coded something to do it.  The download link is at the end of the post, but one of the features of the tool is to let you clear out your entire list of security credentials with on click:

image

And voila!

image

Now, are you going to use this every day?  Hell no!  But if you screw up like I did, you just may need a weapon like this in your arsenal.  :)

“Trusted” Folders/Files and URLs

As I mentioned in my last post, Ehren, a developer on the Power Query team messaged me on Twitter to tell me of one solution to the “Trusted Folder” problem.

If you set the privacy level for the folder, it applies to everything within that folder, including files in subfolders

Now that sounded cool.  So I set out to test it out.  First thing I did was went to add a new Folder to my Data Source Settings dialog and… there’s no option to do that. Hmm… okay… so how?

Well, you could go and build a solution that references a folder, just so that you can get a folder in there, I suppose.  But that’s awkward and contrived really.  You have to waste a bunch of time concocting something you hope will work.  And the worst part to me is that I don’t just have to do it on my system, I somehow have to deploy it to other people as well.  Ugh.

Screw it, I’m building a tool.  Here’s the features of what we’ve got:

Clear out all privacy levels

To be fair, this was probably more useful for me while testing, but I included it in case you run into the same issues.  Basically you click this button:

image

And now all the Data Privacy Settings are gone too:

image

Add Folder

So this one is way more useful to solving my issues.  I’m going to click “Add Folder” on the left.  It will let me browse for a folder:

image

And prompt me for the Privacy Level:

image

And once I click OK, it sets it up as a folder in the Data Source Privacy settings.  And yes, it really does trust all files in all subfolders:

image

Add Files

Same thing really, it just allows you to select a specific file.  This is a bit redundant to just declaring them as you go along but hey, once I already wrote the code for the folder this was a snap to add:

image

Add URLs

I love this one.  Easy to use, just provide the URL and the privacy level:

image

And boom!  There you are:

image

And again, this trusts all sub-sites of the main domain.  This was particularly key for me as I’m querying data from this site and have to provide a different URL for each year.

Now, again, on this one, the first time you query the site you’ll get another entry for each URL you touch:

image

This is because you have declared the privacy level, but not the security (unfortunately I can’t get in to modify the security files, for obvious reasons.)  So here’s what I’m going to do to fix this:

  • Select the first URL in the list
  • Click Edit Credential
  • Change the setting to apply to the root domain:

image

  • Click Save

You’ll notice that the first one disappears now.  What actually happened was that it merged the security credentials with the privacy level in one entry.  Cool.  So now I’ll just go back and delete the other two. And it never asks me again on refresh.  :)

The “Big Red Button”

So when it all goes really wrong, and you want to reset Power Query to a default state, what then?  That’s what this button is for.  It will wipe out all of your security credentials, data privacy settings, reset your Power Query formula bar and more.  It’s like a total factory reset of Power Query.

image

Some More Technical Stuff

I’ve now had this installed on Office 2010, Office 2013 Professional Plus and Office 365 Pro Plus versions of Excel, both 32 and 64 bit.  In addition, it’s been installed on Windows 7 Pro and Windows 8.1 systems as well.

Interesting enough, despite being an .MSI installer file, in my tests it has NOT required admin privileges to install.  (On one machine I installed this first without prompting, then got prompted for admin credentials when I went to update Power Query.)

Some Final Observations

I found this project pretty interesting, and it’s given me some ideas for some more useful tools to work with Power Query and other Power BI add-ins.  While I can’t promise a timeline on delivery, I do plan on adding a bunch of new useful stuff to this add-in and releasing a Pro Tools version at some point.

Before you download and install this, I also want to make something very clear.  THIS IS BETA SOFTWARE.  I’m pretty sure it’s stable, and shouldn’t affect anything else, BUT YOU DOWNLOAD AND INSTALL AT YOUR OWN RISK.

The Installer

You can download the installer from this link.

Pain Points – Power Query Credential Management

After my last post on Power Query Security Woes, Ehren – a developer on the Power Query team – sent me a message on Twitter about it.  I will blog about that later, after I’ve had a chance to test it out, but before I did, I thought I’d go and clean up some PQ stuff that I messed up.

A little history…

A while back I got the bright idea to pull down our web leads (at my day job).  I then though, “Hey, wouldn’t it be cool if I could plot all of them on Power Map?”  Sure it would.  But all I had was their IP address.  “No problem!”, I figured, “I’ll just use Power Query to feed out the IP to a web service and return a function to turn it into a real address!”

Success!

I cooked up a little script fed the IP to http://ip-api.com, which did exactly that. Here’s a sample from a random address:

image

I actually tried several services before I settled on this one, but ultimately  it seemed to generate the most accurate results.  Perfect, so I let the script fly, and it was awesome!

Success turns to failure…

Except… somewhere during my batch of 3,500 addresses it stopped working.  And when I went back to the site, I found out that I’d been blocked.  Ooops.  Apparently if you send over a certain threshold of queries in a certain amount of time you’re not a good person. (That’s why I’m not sharing the script.)  Sorry about that ip-api.com, I promise I’ll be good in future!

… and creates a mess in the process…

At any rate, the issue I then ran into is that I ended up with an entry for each URL in my Data Source Settings.  I was a while back, so I don’t remember if it happened by default, or if it happened because I was learning and just clicked the wrong thing.  End result is that I have an entry in my list for each IP I queried.

image

Since I’m going to be a good boy and not steamroll their site any more, I kind of want to clean them up.  No problem, right?  Click on the first address on screen, hold down SHIFT and click on the last address on screen and… huh?  I’ve only got the last address?  Okay, let’s try with CTRL… click, click.. ARGH!!

… and the mess turns into major frustration!

This interface only allows you to select one item at a time?  Really?  So to delete my 200 entries I have to click the line item, click Delete (at the top), click “Yes I’m sure” and then move on to the next one.  Ouch.  I see a lot of wasted time in my future if I really want to clean up now.  :(

The missing feature(s)

Power Query team, (if you’re still listening,) please give us the ability to multi-select in this dialog, and multi-delete items.  We need to be able to clean up, as we’re all human and make mistakes as we go along.  This feels like a severe punishment right now!

In addition, while I’m here, why isn’t there an “Add” button in this interface?  When I go to edit one of these addresses, I see the following:

image

I’m guessing that if I’d just set that to the root at the beginning, I wouldn’t have an entry for every IP in that query.  Man!  If only I’d been able to declare this up front and realized that this was an option!  Yet there is no way to do this from a simple UI.  Instead, I (believe) I have to:

  • Create a query to one page at the domain
  • Set it’s security level as I pull data
  • Save the query
  • Go back to this interface and edit the anonymous access

Life would be SO much easier if I just had the ability to go into this UI and create my favourite and most used sites.  (And bonus points if I could push them out to users through group policy like I can with Office Trusted Locations!)

Okay, enough for now.  I’ll be back later once I’ve tested Ehren’s suggestion.  :)

Power Query Security Woes

Recently I’ve been working on automating a business process for a client.  It’s become a really interesting project that uses a mix of Excel tables, Power Query, Power Pivot and VBA, as well as a WinAutomation script.  Before I talk about the issue that is facing me, it would be helpful to provide a bit of background on what we’re doing, and why so many of the Power BI pieces are needed.

Solution Background

My client has a business in which they outsource employees to other companies.  Each of those companies maintains the records of hours, and lets my client download a spreadsheet version of the hours logged by the subcontracted employee.  This is fairly helpful, as my client is the one that pays all of the employees, so getting the hours lets him know who gets paid for how many hours.  Easy enough, right?  But each spreadsheet is in a different format, and that causes a lot of manual entry to try and standardize it into a file that can be uploaded to the payroll processor.

So here we enter Power Query.  With Power Query we can import each of the files individually, reformat the data into a consistent set of columns, and output it into a table.

Each pay period we start a new Excel file, and import the data files for that pay period by running a WinAutomation script.  The script logs in to the remote systems, passes the correct parameters to the system, and downloads the Excel and CSV files required.  It then saves them in a subdirectory of our application under the pay period end date.  My Power Query scripts then use the pay period end date, dynamically build a file path to the file and import the data.

It’s beautiful… except…

The Issue

Here’s where the pain begins…  Every time you touch a new data file in Power Query that you haven’t used before you get a security message asking you what type of data (Public, Private or Organizational) you are using.  (You can learn more about those here:  http://office.microsoft.com/en-us/excel-help/privacy-levels-HA104009800.aspx) The issue I have is that each payroll I create new files in a subdirectory, so Power Query sees them as unique.  To that end I can’t just trust the data files once and be done with it.

Now, there is a workaround… just enable the Fast Combine feature.  That kills off the warning and lets them go, but it also has an issue.  This setting is workbook specific, which means that when my VBA saves the master payroll control file under a new name (we preserve history this way) the setting doesn’t stay selected.  Grrr.  Given that there is no way to touch this setting via code, my user has to remember to click the Fast Combine button every time they run the update.  Is it minor?  Sure.  But it’s ridiculous, it gets forgotten, and concerns them when they get the permissions messages.

The Solution We Need

Now don’t get me wrong, I’m all for security, but where it makes sense.  In Office 2007 we got a new macro paradigm that allowed us to trust folders on our computers/networks.  This setting is set on an application level basis, and persists between Excel sessions.  Beautiful, as I can set it to a specific folder and forget it.  I throw trusted files in there, I don’t throw in the ones I don’t know.  It actually allows me to practice safer computing overall.

In my opinion, this setting is drastically lacking in Power Query.  I really need the ability to set my Power Query add-in to have trusted root folders with trusted subfolders.  This would allow me to trust my application’s data directory and not have to remember to click Fast Combine each time I create new files.  It also means that I might pay attention to the message when it does come up in future, as it would be unusual.  Currently I see the error so much I just cancel it and go straight to Fast Combine.  Hmm… kind of like how we set macro security to “Low” in Excel 2003 and earlier to bypass the annoying message, which essentially left us unprotected.

The Solution (Some of us) Can Implement

To be fair, I believe I do have a workaround for this now, but I don’t think it should be necessary.

Basically what I could do is use VBA to drive the refresh of my tables, and therefore the Power Query scripts.  Given that, I could use VBA to copy and replace the files in a central repository where the hierarchy does not change.  I.e. I could set up a folder like AppData\Timesheets\Current and use VBA to copy the required data files from my dated subfolders, replacing the ones in the “current” folder.  I would then target my Power Query scripts against the files in the “Current” folder, and build my solution off that.

Because Power Query holds a list of trusted files at an application level (rather than workbook level), these files should stay trusted even if I do create a new file, removing the need to constantly click the “Fast Combine” button.  So I think this should work.

So what’s the problem?  It only involves creating a VBA macro to do that.  Fine for me, as I know how to code with VBA and make this happen.  But for most of the Power Query target market I would suggest that this is probably outside of their normal skill set.

Final Thought

Even though it is technically possible to work around this issue, I still argue should not be necessary.  We need a proper option to trust the files in a local folder so that dynamically referring to a file in Power Query can be scripted without invoking a painful security paradigm.

Importing Dates in Power Query

A couple of weeks ago I was teaching a course on Power Query.  I distributed a bunch of sample files, started to walk the attendees through the steps, and a hand went up.

“I get an error in my date column…”

For a second it stumped me… I tested my examples 15 times before pushing them out to the class.  Why now?

As it turns out it’s that pesky issue that drives many non-North Americans crazy.  I keep my regional settings set to use the MM/DD/YYYY format.  I just find it makes life a lot easier for me.  What we ran up against here was a user who was using the Canadian standard of DD/MM/YYYY.  Yuck.

I promised a way to show how to fix it, and am finally getting around to posting it…

Replicating the Issue

The file I’m working with here is a csv file (saved in a txt format) that uses the DD/MM/YYYY format, as that replicates the issue for me.  So I import it and end up with this:

image

So what’s the problem?  Well, if you look at the areas I circled in red, the ChangedType step changed the data type for the TranDate column to Text.  That’s no good.  So I highlight the TranDate column and change it to Date:

image

Great.  An error.

So this is what I’m talking about.  The dates above have been converted to dates, but not the 1st of February, 2nd of February and so on.  Instead it’s January 2nd, February 2nd, etc.  As soon as it gets to the 13 it gags as there is no 13th month.

This is not an uncommon issue, by any means.  If you database exports in one format and your system settings are in another, you’ll run into this.  Multinationals that exchange files across borders see this all the time (just look in any forum for the amount of help posts on the topic.)

I REALLY wish that Power Query were smarter about this.  I also wish there was an easy way to just tell Power Query which date format your data (not your system) was using, or an easy way to convert it.  Anything to let me do something in one click versus what we need to do to fix this.  (As an added insult, the old text import wizard DID let you declare this.  Go figure!)

The Solution

All right, so how do we fix it?  I’m sure there are a variety of ways, and hopefully a new ribbon will come along to make this obsolete, but here is ONE way.

Step 1: Remove the garbage

First I deleted the “Changed Type” step of the query.

Step 2: Split by the / character

Next I selected the TranDate column, went to the Transform Tab (if you don’t have it download the latest Power Query update), chose Split Column, then By Delimiter, then Custom.  I put in a / and clicked OK.

As an aside here, I wish Power Query suggested what it thought you wanted to use as the delimiter, without actually clicking OK for you.  In this case I think it’s pretty obvious, even though it’s not in the list of defaults.  Granted, I might want to override it with something crazy like a 0, which is why I wouldn’t want it to just click OK for me, but it might save me time if it made an intelligent suggestion.

At any rate, I end up with this:

image

Step 3: Rename the columns

This part is important for the next piece.  Rename your columns to reflect which part of the date they belong to.  You can do this by right clicking the column header and choosing Rename…  (Alternately you can edit the M code in the formula bar manually, but you need to edit the “SplitColumnDelimiter” step first, then edit the ChangedType step to reflect the new column names.  Probably easier to just right click and rename them:

image

Step 4: Putting the date together… correctly

OK, now the tricky part.  We need to insert a column and write a Power Query formula.  To do this, go to the Insert tab, and click Insert Custom Column.  You’ll be presented with a dialog to write your formula.  And here’s the formula you need:

=#date([Year],[Month],[Day])

Sadly, there really isn’t a lot of documentation yet that explains this stuff.  I believe that the # sign tells PowerQuery that this is a data type, the “date” portion determines what type it is, and then we feed it the components that make it up (for each row).  I also changed the name at the top of the dialog to “Date” and clicked OK:

image

Step 5: Force the data type

I’m not sure why this is, but even after declaring this as a date, it formats the column visually as a date, but the data type is left blank.  For whatever reason, I had to select the column and force the data type to a date in order to make it recognize it properly as such.

Step 6: Cleanup

So this part is optional, but you can now delete the Day, Month and Year columns if you don’t need them any more.  Unlike standard Excel, this move won’t leave you with a bunch of #REF! errors down the Date column.  :)

Implications/Modification

The implications of this are somewhat interesting to me.  Let’s assume that your data goes the other way.  I send you a file in MM/DD/YYYY format, and you use DD/MM/YYYY format.  You should be able to follow all of the above steps, with the only difference being which column you identify as which when you rename them.  (In this case it would go Month Day Year after splitting them.)

I’d love to get some feedback from others on this, but I believe the formula in the custom column should work for you without modification as long as the columns are named correctly.

I’ve attached two files to this post for testing purposes.  Right click the links below and SaveAs to get copies to play with:

Try them both, you should be able to generate a correct date for your format from either using these steps.

PowerXL Course Live in Victoria, BC

We are very excited to announce that we will be hosting an “Introduction to Power Excel” session in Victoria, BC on June 6, 2014.

PowerPivot is revolutionizing the way that we look at data inside Microsoft Excel. Allowing us to link multiple tables together without a single VLOOKUP statement, it enables us to pull data together from different tables and databases where we never could before. But linking data from multiple sources, while powerful, only scratches the surface of the impact that it is making in the business intelligence landscape. Not only do we look at PowerPivot in this session, but we’ll also explore the incredible companion product Power Query; a tool that will surely blow your mind. Come join Ken as he walks you through the process of building a Business Intelligence system out of text files, databases and so much more.

Full details, including an early bird signup offer, can be found at http://www.excelguru.ca/forums/calendar.php?do=getinfo&e=42&day=2014-6-6