June 2016 Power Query Update

Yesterday, Microsoft released the June 2016 Power Query update.  Even though there are only four items on the list of new features, some of them are quite impactful.

What’s new in the June 2016 Power Query Update

The four new features are:

  • Conditional Columns
  • Column type indicator in Query Editor preview column headers
  • Reorder Queries and Query Groups inside Query Editor via drag and drop gestures
  • Query Management menu in Query Editor

Microsoft has a blog on this here, but let me hit these quickly in reverse order to give my comments as well:

Query Management Menu in Query Editor

Honestly, to me this is kind of a throwaway, “button for the sake of a button” kind of feature.

image

Does it make things more discoverable?  Maybe.  But we can get to all these features by right clicking the query in the Queries pane on the left of the editor.  Personally, I would have rather seen them give me a feature to “pin” the Queries pane open and set that as a default, as I find the navigation from that area much more useful:

image


Reorder Queries via Drag and Drop

This is great… so great, in fact, that the only real question is why it hasn’t worked in the past.  Time & resources is the answer, but it’s now working the way you’d expect it to work.

image

PS, if you don’t know how to group your queries, right click on one, say “Move to Group” and select New Group.  Pretty handy for keeping things organized.

Column Type Indicator

This is BY FAR the most important of the upgrades.  The reason is that this has been a deadly area of weakness since day one.  If you’ve ever been burned by an “any” data type, you know why.  And if you haven’t… hopefully this will help ensure you don’t.

We can now plainly see which columns have been defined with each data type:

image

Notice how easy it is to tell that the Client, Task and Notes fields are text (as shown by the ABC icon in the column header).  Hours is a decimal number, Rate is a whole number, and Date… is undefined.  That one needs attention, as indicated by the question mark.  Very visual, and very badly needed for a long time.  This one feature is, in my opinion, worth the upgrade.

Conditional Columns

This is also a pretty cool feature, as it lets a non-coder build an if then else (if) statement.  Full caveat here: this is the image from the official Microsoft blog, not one of mine, but it shows you the general idea:

image

As cool as this is, there are some issues here:

  1. You can only feed out full columns as outputs, not formulas/equations.  So if I wanted to check a column and return [Hours]*[Rate] in one case and [Hours]*1.5*[Rate] in others, it won’t work.  (Instead I’ll get text.)  To do that you’ll still need to write your formulas manually (see the sketch after this list).
  2. You can’t provide IFERROR style logic to check if something errors and react accordingly.  To do that you’ll still need to create your own custom column formula using the “try otherwise” formula.
  3. Assume you created a custom column using the “Add Custom Column” button, and manually wrote your “if then else” formula.  You then committed it and want to change the logic, so you click the gear icon in the Applied Steps window… and you’ll be taken to the Conditional Column interface shown above, not the original window where you can create more complex logic.  So if you want to modify that formula to be more complex than this new interface allows, you’re now going to have to go to the Advanced Editor window.  I have suggested to Microsoft that they need a button to return us to the previous interface for this scenario.
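To make the first two points concrete, here’s the kind of thing you’d still write by hand via Add Column –> Add Custom Column.  This is just a sketch using the column names from my sample data; the [Overtime] flag column is made up for illustration:

// Point 1: returning a calculation rather than a fixed column or value
if [Overtime] = "Yes" then [Hours] * 1.5 * [Rate] else [Hours] * [Rate]

// Point 2: IFERROR-style logic via the try ... otherwise construct
try [Hours] * [Rate] otherwise null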

Despite the shortcomings, we should recognize that this is a great new feature.  You can test whether one column compares to another column or a specific value (matches, doesn’t match, is greater than, etc.) without having to manually write any M code formulas.  You also aren’t obligated to feed out a column’s value; you can feed out text or other values too.  So as long as your logic needs are fairly simple, you can use this feature.

Download the June 2016 Power Query update

You can pick it up from Microsoft’s site here:  https://www.microsoft.com/en-ca/download/details.aspx?id=39379

Also, I’ve started holding on to the previously released installers should you ever need to regress to a prior version.  You can find the installers I have in my forum here:  http://www.excelguru.ca/forums/showthread.php?5745-Installing-Power-Query

Display Last Refreshed Date in Power BI

One of my favourite tricks to use on a dashboard is to show the Last Refreshed Date.  This is important as it lets you know how stale your data is.  I think it’s even more important with Power BI, as it tells you if your automatic refresh is actually working!

Last Refreshed Date for Power Pivot

I’ve actually covered this topic in the past for Power Pivot solutions. You can find that article here if you’re interested.

Display the Last Refreshed Date in Power BI

Generate Last Refreshed Date with Power Query

Unlike the previous article, to work with Power BI, we need to generate the Last Refresh date ourselves using Power Query.  No big deal, it is as simple as this:

  • Open PowerBI Desktop
  • Get Data –-> Blank Query
  • Go to Home –> Advanced Editor and replace the code in the window with this:

let
    Source = #table(type table[LastRefresh=datetime], {{DateTime.LocalNow()}})
in
    Source

  • Click Done and rename the query “LastRefresh_Local”
  • Click Close & Apply

This code makes a nice little table that returns the current date and time each time we refresh the solution.

Create the Last Refreshed Measure

To finish it off, we need to create a simple measure and add it to our dashboard.

  • Go to your Report window –> New Measure –> Define it as follows:

Last Refreshed (Local) = FORMAT(LASTDATE(LastRefresh_Local[LastRefresh]),"mmm dd, yyyy hh:mm:ss AM/PM")

  • Now let’s create a Card visual to hold it:

image

So far everything looks really good here.  The card is formatted nicely, and it updates to the current date and time every time I hit the refresh button.  It’s a thing of beauty!  What could possibly go wrong with this?

Publishing to the Power BI Service

Encouraged with our visual, we’ll now publish it to Power BI:

  • File –> Publish –> Publish to Power BI
  • Login to the Power BI service and scroll down to the Datasets area

At this point you should click the little … to the right of the dataset you uploaded (whatever you called it) and click Refresh.  It will prepare for a bit, then should refresh.  If your experience is anything like mine, your report shows this:

image

At first, you think “Hey cool!” until you realize that this is the date and time – to the second – of your original upload.  So even after you refresh the data set, you need to refresh the report:

image

And when you do, you get this:

image

Um… WHAT?  That’s 7 hours different!  How is this possible?

So I’m in the Pacific time zone, which is currently UTC-7.  (We are UTC-8 when we are not on daylight savings time.)  So apparently the date that is showing here is giving me the UTC date.

This really sucks.  I was hoping that the dates and times would show based on my browser settings or something, but obviously not.

I tried a bunch of things to fix this, including forcing the column to a DateTimeZone data type, but that made the DAX formula choke, as it only accepts DateTime data types.  I tried various combinations using DateTimeZone.UtcNow(), DateTimeZone.LocalNow(), adding periods, subtracting them and more.  The killer though was that as soon as the time zone got removed (which is necessary for the DAX measure to work), Power BI displayed the time assuming it was UTC. So the issue I’m seeing here is that – no matter the DateTime you feed Power BI – it thinks it has the same time zone as the server.  Not so good.

How to Display the (Correct) Last Refreshed Date with Power BI

Just to be clear here, the title above is actually a heinous misrepresentation.  The timestamp was correct, it’s just being interpreted incorrectly by Power BI (but not Power BI desktop).  So I had to fake it out.

I reached out to Chris Webb for his thoughts, and he suggested hitting an API.  I’d already tried looking for a page, but picked up on this API which turned out to be quite useful:  http://www.timeapi.org  Even more interesting is that both www.timeapi.org/pst and www.timeapi.org/pdt return the same thing.  I’m hoping that this means it is going to allow me to survive daylight savings time with no adverse impacts…

So armed with that, I decided that the only way I was going to get this to work was to pull back the UTC offset, and add it to my date.  That effectively returns a bogus date that Power BI will represent in UTC to match my time zone.  Complicated, no?

Collecting the Current PST Date/Time

I don’t think I’m going to clearly document all the steps for this, as that would make this article really long.  Instead, here’s how you can implement this:

  • Create a new Blank query and paste this code in the Advanced Editor

let
    Source = Web.Page(Web.Contents("http://www.timeapi.org/pst")),
    Data0 = Source{0}[Data],
    Children = Data0{0}[Children],
    Children1 = Children{1}[Children],
    #"Removed Other Columns" = Table.SelectColumns(Children1,{"Text"}),
    #"Inserted Last Characters" = Table.AddColumn(#"Removed Other Columns", "Last Characters", each Text.End(Text.From([Text], "en-US"), 6), type text),
    #"Changed Type2" = Table.TransformColumnTypes(#"Inserted Last Characters",{{"Text", type datetimezone}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type2",":00",".0",Replacer.ReplaceText,{"Last Characters"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",":30",".5",Replacer.ReplaceText,{"Last Characters"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Replaced Value1",{{"Last Characters", type number}, {"Text", type datetime}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "LastRefresh", each [Text]+#duration(0,Number.From([Last Characters]),0,0)),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom",{"LastRefresh"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns1",{{"LastRefresh", type datetime}})
in
    #"Changed Type"

  • Click Done and rename the query as “LastRefresh_API”
  • Click Close & Apply

The basic gist is that it does the following:

  • Pulls the web page and drills into the single cell we need
  • Splits off the time zone information, replacing the minutes with fractional hours and turns it into a decimal number
  • Extracts the original date/time (without the time zone) and adds the time zone offset to that value.  This essentially subtracts 7 hours from the original date for me right now
  • Performs a bit of cleanup to drill into the faked date/time we need

(If you want to use this for your time zone, I believe all you should need to do is change the “pst” portion of the URL to your time zone name.)

Create a new Last Refreshed Measure

Creating this measure is pretty much the same as the last one.

  • Go to your Report window –> New Measure –> Define it as follows:

Last Refreshed (API) = FORMAT(LASTDATE(LastRefresh_API[LastRefresh]),"mmm dd, yyyy hh:mm:ss AM/PM")

  • And create a Card visual to hold it

You should now have this odd looking contrast:

image

Great, except that… it is 3:22 PM for me, not 8:22 AM.  But fine, whatever, let’s publish this and see what happens.

Publishing to the Power BI Service

To publish the file, we take the following steps again:

  • File –> Publish –> Publish to Power BI
  • Confirm that we want to overwrite the data set
  • Login to the Power BI service and scroll down to the Datasets area
  • Click Refresh Now

At this point you might be told you can’t as you haven’t provided credentials to the data set.  Just Sign In as Anonymous and you’ll be fine:

image

And of course, now we can refresh the Report as well.  And when we do…

SNAGHTMLf308f8e

As you can see, the API version is showing the correct time here, not the Local version.

Takeaways

The biggest takeaway here is that you have to care about how your audience is using your files.  If you are going to deploy your reports by emailing *.pbix files around the company, then setting up a table using DateTime.LocalNow() will work great.  But if you are going to publish your files to the web… that obviously throws in some wrenches.  Here’s a side by side view of those two alternatives:

SNAGHTMLf368679

Probably the most frustrating thing to me is that I was trying to find a solution that will show it correctly in BOTH Power BI Desktop and Power BI Online.  There’s nothing worse than working with a report you know is broken depending on where you look at it.  🙁

PBIX File

If you want the file so that you can try it out in your own PowerBI service… here you go.  🙂

New Vancouver Power BI User Group

I’ve been thinking about this for a while and, after discussing it with a couple of others who are passionate about Power BI…  I’m pleased to announce that we have created a new Power BI User Group in Vancouver, BC!

What is the Power BI User Group about?

The goals of this user group are fairly simple:

We plan to meet monthly, and have a presentation on using Power BI technologies.  (This could be Power BI Desktop, Excel, Power Query or Power Pivot.)  Whatever the presentation, and no matter how focussed it is on a specific area, it will ultimately be relevant to the over-arching Power BI path of taking your data from raw form to a published dashboard.  This user group is basically dedicated to bringing you content to inspire you and make you an expert in the Power BI technologies in your company.

Our secondary goal is to be a networking group for Power BI professionals.  If you’ve ever felt like the only one in your company that actually understands what you do… well that’s why we are here.  To give you someone to swap stories with, get ideas and maybe even change your career goals.  🙂

Oh… and did I mention that another goal we have is to keep these events free for attendees?

How can you get involved?

There are actually a few ways you can get involved with us…

If you’re looking to attend…

Then it’s simple.  Sign up at our Meetup site.  Then attend a meeting. That’s it.  No cost, no fuss.  All we ask is that you register in advance and attend if you say you’re coming.  (We have limited space in our venue right now, only able to seat about 25 people.)

If you’d like to sponsor the event…

We are looking for a sponsor to cover the cost of pizza and soft drinks for our user group attendees.  It shouldn’t be much, and we’d be happy to tell everyone how awesome your company is.  If you’d like to come on board as a sponsor, please get in touch with me via my contact form.

If you’d like to speak…

Got something cool that you’ve built using the Power BI technology stack?  Would you like to talk about how to actually get Power BI traction in a corporate environment?  Got some other relevant topic that you’re passionate about?  Come to an event and chat with us.  One of our key goals is to make sure we have good variety in our speakers!

When is the first meeting?

Great question!  We’re going to be meeting Thursday, July 14 at 5:30pm in downtown Vancouver.  I’ll be presenting on how to build this self updating Power BI dashboard which is originally sourced from PDF files.

Keep The Most Recent Entry

This week’s post was inspired by a question in my Power Query help forum.  The poster has a set of data, and needs to keep the most recent entry for each person from that list.

Background

I never saw the user’s real data, but instead mocked up this sample, which you can download here:

image

Obviously it’s fairly simple, and the key thing to recognize here is what we’re after.  What the user needed was this:

image

As you can see, we want to keep the most recent entry only for each person.  In the case of Fred and Mark that is Mar 31, but Jim didn’t have any activity for March, with his last entry being Feb 29.  So how do we do it?

1st attempt to keep the most recent entry

I figured this was pretty easy, so advised the poster to do the following:

  1. Pull the data into Power Query
  2. Sort the Date column in Descending order
  3. Use the Remove Duplicates command on the Student column
  4. Give the query a name (I called mine “Unbuffered” for reasons that will become clear)
  5. Load it to the worksheet

Easy enough, right?  Except that we actually got this:

image


Huh?  What the heck is going on?  I tried changing the dates to text in an attempt to steal away Power Query’s ability to sort based on dates.  (Okay, it was a shot in the dark, and it didn’t work.)

As it turns out, the “Table.Distinct” command that is used to remove duplicates IGNORES previous sorts, going back to the original data sort order.  I’ll admit that this completely shocks me, and is not at all what I’d expect.

So how do you keep the most recent entry?

There are a few potential ways to deal with this:

  • Sort the data before it comes into your query.  This could potentially be done in a staging query, via a SQL sort command or some other method.  The challenge is that this isn’t always practical (using that custom SQL query breaks query folding, right?)
  • Issue some kind of command (like a group by) that creates a new table which is already sorted in the correct order.  Again, this would work, but really seems unnecessary unless you have some other need to do so.
  • Sort the table, then buffer it before removing duplicates.

Huh?  “Buffer” it?

Using Table.Buffer() to help keep the most recent entry

I’m not a master of explaining Table.Buffer() (yet), but basically you can look at it like this:  It pulls a copy of the table into memory, preventing Power Query from recalculating it.  This can be super useful if you’re passing tables to functions, but in this case can help us lock down the previous query steps before applying the duplicate removal.  When the query state is buffered, that is the “most recent” copy that Power Query will revert to.

Rather than adjust the previous query, here’s what I did in order to create the working solution:

  • Duplicated the “Unbuffered” query
  • Renamed the new query, calling it “Buffered”
  • Selected the “Sorted Rows” step we generated (just before the “Removed Duplicates” step)
  • Clicked the fx icon in the formula bar

image

As I’ve mentioned a few times on this blog, this creates a new step that simply refers to the previous query step.  I then just wrapped the text for the new Custom1 step in the formula bar with Table.Buffer():

image
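If you’d rather see the end result in code, the finished query looks something like this.  (A sketch only; I’m assuming the source table is named “Data” and has Student and Date columns, per the sample above.)

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Date", Order.Descending}}),
    // Buffer the sorted state so the next step can't reach back past it
    Custom1 = Table.Buffer(#"Sorted Rows"),
    #"Removed Duplicates" = Table.Distinct(Custom1, {"Student"})
in
    #"Removed Duplicates"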


And when you hit Close & Load, you get a different result than our previous query… you get the result we actually wanted:

image

So what’s happening here?

First, just to be clear (before Bill or Imke call me out on this), inserting the new step wasn’t entirely necessary.  I only did this to demonstrate the key difference in a distinct step of its own.  I could have easily just wrapped the Sorted Rows step in Table.Buffer() and it would have worked fine.  🙂

The key difference here is that the Table.Distinct() command we use in the Removed Duplicates step will now only go back as far as the buffered table.  From the Excel user’s perspective, it’s kind of like we’ve copied all the steps before this, locked them in with a Paste Special command, and pointed Power Query at that version of the data instead of letting it look back at the original.

Cool!  I’m going to use Table.Buffer everywhere!

Um… don’t.  That’s actually a really bad idea.

Table.Buffer() needs to be used with a bit more care than that.  Every time you buffer a table, it needs to be processed and written into memory.  That takes resources.  You only want to use this command when it makes sense.  Some places where it does:

  • When you need to lock down previous steps to prevent things being ignored, like in the case above
  • When you want to pass a table to a function.  If you don’t buffer it first, the table may get re-calculated/refreshed before being passed into the function.  If you’re doing this for every row, that can be a lot of re-calculations.  In this case it may make sense to buffer the table, then send the buffered table into the function.  Even though the buffering takes overhead, it only happens once, which can speed things up.  (See the sketch after this list.)
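Here’s a rough sketch of that second scenario.  (All table and column names here are hypothetical.)

let
    // Buffer the lookup table once...
    Lookups = Table.Buffer(Excel.CurrentWorkbook(){[Name="Lookups"]}[Content]),
    Data = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    // ...so each row scans the in-memory copy instead of re-evaluating the query
    Added = Table.AddColumn(Data, "Matches",
        each Table.RowCount(Table.SelectRows(Lookups, (r) => r[Key] = [Key])))
in
    Added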

Just also remember that the instant you buffer your table, you break any query folding ability, as the data is now in Excel’s memory space.  That is worth considering if you’re doing large operations against a database.

I Need Beta Testers

Today’s post isn’t of a technical nature, although it does have a pretty long technical background… I’m now at the point where I need beta testers for a new add-in I’m planning to release.

What Does the Add-in Do?

In short, this add-in is intended to help you do two things:

  1. Inject useful Power Query scripts into your workbook.  I’m working on adding more later, but basically my vision is to provide modellers with Rapid Application Development toolsets in order to take the grunt work out of model setup, allowing you to focus on real work.
  2. Audit Power Query scripts.  If you work with Power Query today and build complex models, you’ll know that there is no easy way to trace your queries through complex models.  I also aim to fix that for you.

And one of the MAJOR design points I follow here is this:  Your end users will NOT need my add-in in order to refresh any solutions built using my tool.  All the scripts are embedded in the workbook and are completely independent of my tool.

Current Features

Here’s what the user interface looks like today:

image

Parameter Table

I use parameter tables in virtually every solution I build (as we demonstrate in Chapter 23 of M is for Data Monkey.)  For that reason, the Add Parameter Table/Query feature is one of the first I hit.  It inserts a new worksheet, adds the required table (including the formula to return the current file path), and also injects the correct queries (set up to avoid the formula firewall.)  From there, your only job is to call the fnGetParameter() where needed in your queries.
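If you haven’t seen that pattern before, a call looks something like this.  (A sketch; the parameter name and file name are made up.)

let
    // fnGetParameter reads a named value from the injected parameter table
    FilePath = fnGetParameter("File Path"),
    Source = Csv.Document(File.Contents(FilePath & "sales.csv"))
in
    Source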

Look at all those Calendars!

There are two methods for each of the calendars, but both are predicated on a core concept.  We inject a table into your workbook where you can control the key properties using the Excel formulas you know and love:

image

We also automatically inject the required Power Query scripts, and even link it to a worksheet or into Power Pivot’s data model, if you choose that option.  The “Pre-Defined Columns” version just loads a calendar with the columns we choose; the “Dynamic Columns” version also inserts an Excel table which allows you to toggle which columns you want to include (and allows you to change your mind at any time.)

I’ve put a LOT of time into building these up to try and cover the most frequently used columns.  I know there are still some holes (like ISO week numbers and InCurrentPeriod for the 4-4-5 variants), but hopefully you’ll find them useful anyway.  Believe me when I tell you that this will be WAY more efficient than trying to write these yourself.

Copy Queries

This adds a quick interface to copy queries from one workbook to another:

image


Show Load Destinations

Ever wonder where your queries end up?  This view helps identify where the loading end points are for your queries:

image

Trace Precedents/Dependents

This is something that I have found particularly useful so far.  There is still a LOT more I want to do with it, but as it stands today this tool will help you walk through the various queries to see what feeds into a query and where it goes; something that can only be done via reading your M code manually today.  Here’s a view of the dependency tracker, which shows all dependent queries, right through to the data model/worksheet table that it lands in:

image


Supported Excel Versions

This tool has been designed to run in both 32 and 64 bit versions of Excel 2016.  I know, you want it for 2010/2013, and so do I.  The challenge is that there aren’t any hooks into Excel that I can use to do this in 2010/2013.  It’s a bummer, but I don’t ever see that changing.  🙁

Anyway… I need beta testers!

I’m looking for a small crew of beta testers for this product.  But there are some requirements:

  1. You MUST have Excel 2016 (I’m actually interested in both subscription and non-subscription users here.)
  2. I need to know your Excel “bitness” (found under File –> Account –> About Excel)

image

In addition, by participating, you are acknowledging/agreeing that:

  1. You understand that this is BETA software and USE IT AT YOUR OWN RISK.  (I.e. run it on a backup copy of your workbook, not your main one!)
  2. You’ll submit any bugs you find to me so I can fix them. 🙂
  3. I have the right to say yes/no to your inclusion based on my reasons/whims/whatever, and I don’t have to disclose what they are.

Understand that my basic goal here is to get a good cross section of systems to test on.

What do you get?

Access to the add-in and any updates I release for at least a year.  I am looking at making a free/pro version, so you’d be entitled to the pro version.

How do you sign up?

UPDATE:  I'm closing this offer at this point, as I believe I've got a good cross section in the comments section.  Watch this blog to see when I release a public version.

Leave a comment below with the following info:

  • Your Excel SKU, found in File –> Account (this could be Office Pro Plus, Microsoft Office 365 ProPlus, or something else; it’s right under the big Office logo on that page)
  • Your Office “bitness”
  • Your operating system

Do NOT post your email.  I can access those through the blog control panel, and don’t want you getting spammed.

How do you know if I said yes?

I’ll email you and let you know.  🙂

Cartesian Product Joins (for the Excel person)

While I was at the PASS BA conference in San Jose, CA last week, I got an email from a reader asking if Power Query could create a Cartesian Product join.

Now I’m an Excel guy, and I’d never heard this term before.  Fortunately, I got the email while I was sitting around the table with a few of my geeky friends, many of whom came from the database world.  This was cool as their answer wasn’t “What is that?”, it was “Why would you want to?”  (As it turns out, there are some VERY good uses for this technique.)

Regardless, mine is not to wonder why, but rather to see if things can be done.  And, as you might expect, we can absolutely create a Cartesian Product (or Cartesian Square) using Power Query.  And actually, it’s REALLY easy when you know how.

Cartesian Product for the Excel person

So what the heck is a Cartesian Product anyway?  (Besides being really hard to spell!)

Picture you have two lists:

  • Automobile make
  • Paint colours

Plainly the two are not related in any classic sense.  (How do you match red to Dodge Ram?)  But assume that we can paint any vehicle we have any colour of paint we have.  Of course, that’s done at the factory, so we need to make a product list that shows all of our possible combinations:  Dodge Ram – Red, Dodge Ram – Blue, Dodge Ram – Black, etc…

So basically, for each row on the Vehicle Make table, we need to assign every colour that exists in the Paint Colours table.

If you’d like to read more about this join/math, there is a good article on Wikipedia explaining it.

Creating a Cartesian Product in Power Query

To illustrate this, I’m actually going to use a deck of cards, as shown in the Wikipedia article I referenced above.  So we have two tables, as shown below:

image

(That’s table “Cards” on the left and table “Suits” on the right)

And now, what we want to do is create a join so that we get each suit assigned to each card.  (We could do this the other way around too, or we could just sort it after.  Either way gets us to the same place in the end.)

Setting up the Tables

So the first step is to set up the tables.  To do this we simply pulled each table into Power Query and set up the query as a connection only query.

  • Click in the appropriate table
  • Create a New Query –> From Table
  • Right click the only column –> Change Type –> Text
  • Home –> Close & Load –> Close & Load To… –> Only Create Connection

We should now have two queries in our workbook that are pointers to the underlying data:

image

Now, let’s set up a new query that references the cards query:

  • Workbook Queries Pane –> right click Cards –> Reference
  • Rename the query to “52 Card Deck”

Awesome, we’ve now got a simple query all ready to go:

image

Creating the Cartesian Product

The trick now is to create the Cartesian Square.  You’d think this would take some weird Voodoo magic, but it’s actually SUPER simple… just different than normal.  We can’t fall back on the whole “merge tables” experience, as we’d need to pick matching values between the columns… and there aren’t any.  For this reason, none of the join types I’ve discussed in previous articles will work.

So how do we do it?  Like this:

  • Add Column –> Add Custom Column
  • In the formula area, enter “Suits” (with no quotes)

Did you see what we did there?  We asked Power Query to provide the Suits table for each row of our cards table.  The result is a table of tables which – when you click in the whitespace beside the Table keyword – you can see contains our suits:

image

The final step is to click that little expand icon on the top right of the Custom column (clear that default prefix checkbox as you do) to expand those records.  And the result is a completed table where each card has all four suits.
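If you peek in the Advanced Editor afterwards, the whole query boils down to something like this.  (A sketch; I’m assuming the single column in the Suits table is named “Suit”.)

let
    // Reference the connection only Cards query
    Source = Cards,
    // Ask for the entire Suits table on every row: that's the whole trick
    #"Added Custom" = Table.AddColumn(Source, "Custom", each Suits),
    // Expand the nested tables to get one row per card/suit combination
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Suit"})
in
    #"Expanded Custom"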

image


Not bad… no need to write any funky formulas, fill up or anything.  🙂

Sample Workbook

If you’d like to download the sample workbook, you can find it here.

(Now I just hope that when I want to find this article that I can remember how to spell Cartesian correctly!)

32 Bit Excel Memory Limit Increase!

So this is just huge, especially if you work with Power Pivot models and are stuck in 32 bit Excel… Microsoft has just released a 32 bit Excel Memory Limit increase for users of Excel 2016, effective build 16.0.6868.2060 (which is the current build for the Insiders program.)

image

UPDATE:  Effective June 7, 2016 (and build 15.0.4833.1000), there is now a patch available for Excel 2013 (both MSI and subscription versions).  More info here:  https://support.microsoft.com/en-us/kb/3115162

Why a 32 Bit Excel Memory Limit increase?

Users stuck on 32 bit Excel were limited to only using 2GB of RAM for their Excel/Power Pivot models, no matter how much memory was available on the PC.  The answer to this in the past was to install the 64 bit version of Excel, as that could address up to 8 TB of memory (if you had it, of course.)

There has been a hack/patch available for a while (see below), and I spoke to a user at the PASS BA summit who told me that without it he simply couldn’t use Power Query at all.

How big an increase is it?

Before you start thinking that you’ll now get the same memory access as with 64 bit Excel, let’s disabuse you of that notion.  It’s better, but not parity.  How much you get actually depends on the bitness of your operating system.

  • 32 bit Windows:  up to 3 GB
  • 64 bit Windows:  up to 4 GB

I suspect the first is an operating system limit and that the second is probably more of an internal architecture decision.  The world needs to move to 64 bit, but this will help give companies (even more) time to make that move.

What about non-Power applications?

This change doesn’t just benefit Power Pivot and Power Query; it benefits anyone who has been running into memory constraints.  So if you’ve been running out of memory because you’ve been pushing huge data sets via VBA/SQL, you’ll love this too.

How about Excel 2010/2013?

Yeah, no.  Sorry.  This is part of the benefit of being current… Microsoft is building for the current version of Office.  Excel’s biggest competitor is previous versions of Excel, so by providing a fix like this to prior versions they’d actually be giving you reasons NOT to upgrade.  You’re in business, and I’m sure you understand that – as much as this sucks for you right now – you’d probably make the same call.

Having said that, if you want to install “the patch” to get your access in previous versions, Rob Collie has a link to it in point 3 of this article.

The “Official Word” from Microsoft…

You can find the details by reading KB3160741.

Map Columns Between Data Sets

Image

Ken is at the PASS BA conference this week, so it seemed like a perfect time for me to publish my first Power Query post here.  In this installment, I'm going to show how to map columns between data sets.

Exploring how to map columns between data sets

I’m on a great new project where Power Query is the bread and butter of the solution. We’re pulling design information from Engineering Design Systems and building transforms to load into another engineering application. This application has a very strict requirement for data layout. Needless to say, the data structures and field names are seldom consistent between the two applications, so a key part of the transform is to map columns from one data set to the other.

Generally, Power Query’s ability to insert, rename, and move columns is useful in a case like this.  However, we are doing this for a large number of different data transfers, developing the maps in an ongoing process, and I REALLY don’t want to rewrite the Power Query steps for every change in every transform.  (Also, being able to document the transform is important for design and debugging.)

Here’s a simplified example:

Source Data: The Flintstones

Sample data in a small table provides the source of the data to be mapped:

1

Target Data Structure: M*A*S*H

What does M*A*S*H have to do with the Flintstones? Not much really, that’s the point. But I want to convert data to this layout.

2

The "Map Table": The key piece needed to map columns between data sets

I set a goal to write a Power Query transform that was agnostic of specific column names and field counts, and so would not use Table.AddColumn, Table.RenameColumns, Table.RemoveColumns, or Table.ReorderColumns operations.

Mapping Strategy/Assumptions:

  • Data will not necessarily end up in the same column order between Source and Target.
  • Not all the columns map from Source to Target.
  • Not all the Target columns will be filled from the Source.

The solution: A "Map" table on an Excel sheet; A simple list of Source field names and Target Field names (I like using a column format for readability).

3

The Transforms

The "TblMap" Transform

The query reads the "Map" table data and flips it around so that the Source names are the table’s column names:

4

The complete M code used for this solution is shown here:

5
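In case that screenshot is hard to make out, the core of the idea is only a few lines.  (A sketch, assuming the worksheet table is named “Map” with Source and Target columns.)

let
    Source = Excel.CurrentWorkbook(){[Name="Map"]}[Content],
    // Flip the table so the Source names become row 1 and the Target names row 2...
    Transposed = Table.Transpose(Source),
    // ...then promote the Source names to column headers, leaving the
    // Target names as the single data row
    TblMap = Table.PromoteHeaders(Transposed)
in
    TblMap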

The "Output" Transform

This query references the tblMap transform and appends the original source data, giving something like this:

6

Now just promote the first row to Headers, overwriting the existing column names, and the new Target data structure is in place:

7

Dealing with un-mapped columns

But what about those pesky un-mapped columns (Column7 and Column8)? Normally I would use Table.RemoveColumns. I don't want to do that here, though, as this would hard code column names into the M code that might not exist next time, resulting in errors.

Instead, we just transpose the table and filter out any columns that begin with “Column”, and transpose it back.  The complete M code for the query is shown here:

8
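Again, as a sketch of what is in that screenshot (the query names are assumptions; “SourceData” stands in for whatever your raw data query is called):

let
    // Append the raw data below the single row of Target names
    Appended = Table.Combine({TblMap, SourceData}),
    // Promote row 1 so the Target names become the column headers;
    // un-mapped columns get default names like "Column7"
    Promoted = Table.PromoteHeaders(Appended),
    // Demote and transpose so the column names become filterable values...
    Flipped = Table.Transpose(Table.DemoteHeaders(Promoted)),
    // ...filter out the un-mapped columns, then flip back and re-promote
    Filtered = Table.SelectRows(Flipped, each not Text.StartsWith(Text.From([Column1]), "Column")),
    Output = Table.PromoteHeaders(Table.Transpose(Filtered))
in
    Output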

And here is the output in Excel once we load it to a table:

9

Closing Thoughts

So there you go. One of the best things I like about this approach is how flexibly it can be modified. Spell “Klinger” wrong? Just modify the spelling in the Map table. Forget to add Rizzo or Nurse Able to the Target? Just add them to the table on the Target side and they are in the result. Forgot to include Dino in the Source data? Just add him to the list.

The sample file is attached. Give it a try. Hope it can be useful.

Column Name Translate

A thought on Data Types

I have not done a lot of testing with data types on this approach.  My work will not do any math on the contents until after the re-mapping (I hope), so data typing can be done at the end.  If there is any math to be done in the middle of the process, you would need to be careful not to have Power Query treat numbers as integers (this has bitten me before).

Performance

The last step where extra column names are removed uses a transpose, which could be really slow for long data sets.  Another solution that could fix this would be to create a list from the Map table to automate a Remove Other Columns step, something like the sketch below.
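A sketch of that alternative (“Promoted” is assumed to be the step where the Target names became the column headers):

let
    // Read the Map table and keep only the rows that actually have a Target name
    Map = Excel.CurrentWorkbook(){[Name="Map"]}[Content],
    KeepList = List.RemoveNulls(Table.Column(Map, "Target")),
    Output = Table.SelectColumns(Promoted, KeepList)
in
    Output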


Pass Parameters to SQL Queries

One of the questions I get quite frequently is how we can pass parameters to SQL queries, allowing us to make them dynamic. Someone asked a question in the forum on the topic, so that finally inspired me to write it up.

Before you do this…

There are a lot of reasons that you should NOT go this route.

Power Query is designed to take advantage of query folding when targeted against a database.  While this isn’t technically correct, you can look at query folding like this… As you connect to the database and start filtering and grouping via the user interface, Power Query passes these commands to the database to execute.  This means that the majority of the heavy lifting is done at the database, not your workstation.  This continues to occur until you hit a command that the database doesn’t know, at which point it returns the data to Power Query, and the rest is done in your instance of Excel.

Why is this an issue?  Dynamic parameters use custom lines of code, which can’t be folded.  In the case of the poster’s question, this means that he just asked to bring 40 million records into Excel, rather than filtering them down on the server.  Take a guess as to which one is faster?

But what if I want to Pass Parameters to SQL Queries dynamically?

Listen, I totally get it.  I love dynamic queries too.  And anyone who knows me is aware that I’m super obsessed with making my data tables as short and narrow as possible.  This almost always involves filtering based on parameters that the user needs to pass at run time.  So are we dead in the water?  No, not by a long shot.

How to Pass Parameters to SQL Queries – Method 1

I’m a big fan of query folding, and would encourage you to use it wherever possible.  In order to do that, you should use the user interface to connect to the database and drive as many filter, sort and group by operations as you possibly can.  The goal is to push as much work to the server as possible, resulting in the narrowest and shortest data table that you can accomplish.  Once you have that, land it in a Connection Only query.  And from there use your dynamic parameters to filter it down further to get just what you need.

I have no idea what was in the 40 million row data set that the user was working with, but let’s assume it was 40 years of data related to 4 divisions and 30 departments.  Assume that our user (let’s call him Mark) wants to bring in the last 2 years of data, is focussing only on the Pacific division, and wants to give the user choice over which department they need to work with.  For ease of assumption, we’ll assume that each year holds 1/40 of the total record load and each division provides 1/4 of the total records.  (Yes, I’m aware that nothing is that easy… this is an illustration of concept only!)

The recommended steps would be to do this:

  • Create the Staging query
    • Connect to the raw database table
    • Filter to only the Pacific Division – Server reduces 40m to 10m records
    • Filter to only the 2 years of data – Server reduces 10m to 500k records (10/40*2)
    • Land this output into a staging query – 500k records total
  • Create the parameter table and the fnGetParameter query
  • Create a query that references the Staging query and filters the department to the one pulled via the fnGetParameter query

That takes as much advantage of query folding as possible, and means that Power Query only needs to process 500k records against our dynamic criteria.
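The final query in that list might look something like this.  (A sketch; “Staging” is the staging query from above, fnGetParameter follows the pattern from the linked article, and the “Department” column name is made up.)

let
    Source = Staging,
    // fnGetParameter pulls the user's choice from the worksheet parameter table
    SelectedDept = fnGetParameter("Department"),
    Filtered = Table.SelectRows(Source, each [Department] = SelectedDept)
in
    Filtered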

Where Method 1 breaks down

But what if the data set is still too big?  What if you need to parameterize the Division, the Date Range and the Department?  In order to avoid issues from the formula firewall, you would have to do this:

  • Create the Staging query
    • Connect to the raw database table
  • Create the parameter table and the fnGetParameter query
  • Create a query that references the Staging query and…
    • Collects the Division, Date and Department variables
    • Filters to only the Pacific Division
    • Filters to only the 2 years of data
    • Filters to only the selected department

Seems about the same, doesn’t it?  Except that this time we can’t take advantage of query folding.  To avoid the formula firewall, we have to keep the database connection separate from the parameters.  This means that we break query folding.  And this means that Power Query needs to pull in all 40 million records, and process them.  Not the server; your Excel instance.

I don’t know how much RAM you have (and it doesn’t matter unless you’re on 64 bit), or how many processor cores you have (as Power Query is single threaded); either way, you’re in for a LOOONNNGGG wait… if it doesn’t just tip over on you.

So how do we fully parameterize this stuff?

How to Pass Parameters to SQL Queries – Method 2

The good news is that it can be done; the bad news is that you need:

  1. SQL Skills, and
  2. An adjustment to the default Power Query load behaviour

Let’s talk SQL Skills

The reason you need SQL skills is that you need to be able to write the most efficient query you possibly can, and pass this into the query when you connect to the database.  (Those italics, as you’ll see, are key.)  So, let’s assume that I want to connect to the old AdventureWorks database and pull records from the Sales.SalesOrderHeader table where CustomerID = 29825.  You need to be able to write this:

SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID='29825'

Why?  Because you need to include that query when you’re building/testing your code.  It goes in the advanced options, below:

image

(You may need to trust the Native Database Query to get to the next step.)

So that created the following code for me:

let
    Source = Sql.Database("azuredb.powerqueryworkshop.com", "AdventureWorks2012", [Query="SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID='29825'"])
in
    Source

Modifying the hard coded query to use dynamic parameters

I then set about fully parameterizing my query using the fnGetParameter query detailed here, and the table shown below:

image

Here’s the code I cobbled together to do this:

let
    //Pull in values from the parameter table
    dbURL = fnGetParameter("Database URL"),
    dbName = fnGetParameter("Database Name"),
    dbTable = fnGetParameter("Database Table"),
    sFilterField = fnGetParameter("Filter Field"),
    sFieldValue = Text.From(fnGetParameter("Field Value")),

    //Create the query
    dbQuery = "Select * FROM " & dbTable & " WHERE " & sFilterField & "='" & sFieldValue & "'",

    //Get the data
    Source = Sql.Database(dbURL, dbName, [Query=dbQuery])
in
    Source


I won’t go through the fnGetParameter function, as it’s quite well detailed in the blog post, but the key to understand here is that we are pulling a variety of items from the worksheet table, and putting them all together to feed the line starting with dbQuery.  This line dynamically sources the database path, database name and the SQL query.  Wouldn’t this be heaven if it worked?

Here comes the Formula Firewall again…

But no:

image

Now if that’s not enough to make you scream… here’s the issue…

What I really wanted to do was create a power query that connects to the database without declaring the SQL query up front.  Something like this:

Source = Sql.Database("azuredb.powerqueryworkshop.com", "AdventureWorks2012")

My thought is that I could load this as a connection only query, then reference it and add the dynamic query afterwards.  Unfortunately, I got nowhere.  I’m not saying it can’t be done, but I couldn’t figure this out.  It seems that the only way to pass a query to the database is to pass it during the initial connection.  But doing so, by its very nature, violates the formula firewall.

So how can we get past it?

Bypassing the Formula Firewall

Yes, we can make this work.  But doing so involves turning off Privacy settings.  I’m not generally a big fan of turning off firewalls, but this one just aggravates me to no end. Normally we turn this off to avoid prompting messages.  In this case it just flat out prevents us from running.  So let’s kill it:

  • Go to Query Settings –> Privacy
  • Select Ignore the Privacy Levels and potentially improve performance

And at this point you’ll find out that it works beautifully… for you.  You’ll most likely need to have anyone else who uses the file set the above option as well.

Security Settings

The security setting we changed above is workbook specific.  Now I’m not recommending you do this, but if you get tired of the privacy settings overall, you can turn them off for all workbooks in the Query Options –> Security tab.  The second arrow is pointing at the relevant option:

image

And how about that?  Check out the first option… that one lets you avoid prompting about Native Database Queries.  (I seriously have to question why a SELECT query should trip a warning.)

Sample Workbook

Yes, I’ve got one, which you can download here, but it comes with a catch…

I’ve set it up to talk to a SQL Server in order to demo this, as it’s the big databases with query folding that cause us this issue.  You might notice that not too many people provide free access to a SQL server since they cost money to run.  For this reason, you need a login and password to refresh the solution and really see it work.

As it happens, if you’ve bought a copy of M is for Data Monkey, or attended our PowerQuery.Training workshop, then you’ve already got the credentials.  (They are on page 66 of the book.)  And if you haven’t… what are you waiting for?  🙂

Excel 2016: A fix and a new bug

Well, there’s good news and bad this time.  I just updated my Excel 2016 to 16.0.6868.2048 (First release version) and there is a fix and a new bug evident.

First, the fix

As I reported a couple of weeks ago, there was a new bug introduced in the April 2016 Excel update, as related to Power Query.

… if you are sourcing from a named range that doesn’t have an equal offset of rows/columns.  I.e. if your source range doesn’t start in A1, B2, C3, D4, etc… then it pulls the wrong range.  Tables are fine, named ranges are the issue…

The issue was reported, fixed, and build 16.0.6868.2048 (which I finally got today) has fixed the issue.

I have to say that this is pretty cool.  Even though I was frustrated having to wait 2 weeks for a fix, the fact that it was only 2 weeks is pretty darned amazing.  In past cycles, it would have been several years until a new build of Excel came out.  So even though we see new bugs, we also need to recognize that the team is working very hard to be responsive to them and get the fixes pushed VERY quickly.

Update: As of version 16.0.6868.2067 I can no longer reproduce this bug.  So that's more good news!

Unfortunately, a new bug

I’m not actually sure if this is new in a more recent update, or if it was there in the previous build and I just didn’t notice.  (While I do check updates almost daily, I don’t actually use every feature of Power Query every day.)  While this bug doesn’t prevent you from using your models, it is pretty irritating… Since I know this was a pretty major contention point for many in the past, I figured I should talk about it.

Once again, this is Excel 2016 specific, and doesn’t affect Excel 2010/2013.

So let’s assume you have some data as shown below:

SNAGHTML158303b7

Here’s what’s going on with these tables:

  • The blue table is our raw data.
  • The green table is a simple query that imports the original data, and sets the first column to a Date data type, then loads it to the table.  (Nothing fancy, it’s just that simple.)
  • The orange table was created by right clicking the green table’s query and choosing “Duplicate”, then loading it to the worksheet.  It is an EXACT copy of the query that leads to the green table.

Make sense so far?  Now, let’s add a couple of rows to the blue table, then hit refresh all.  What you should expect to get is this:

SNAGHTML1591b06d

What you actually get is this:

SNAGHTML1590e741[4]

Check out the green table… those dates are pretty impressively formatted as serial numbers, not dates.  But yet, the orange table – an exact duplicate – is fine.  Huh?  Doesn’t this feel like a throw-back to early Power Query days, where tables didn’t hold the formatting properly?

Here’s the best we have from a temporary workaround point of view (courtesy of the Excel/Power Query team):

  • Select the green table –> Table Tools –> Design –> Properties
  • Un-check the Preserve Cell Formatting box

image

When we refresh now, we get…


SNAGHTML1591b06d[5]

And let’s just add some more data, then refresh again to make sure it sticks…

SNAGHTML15953acd

Stellar!  The number formats have remained, but the table style formatting has changed to a different one.  Ugh.

Now, it IS still a table.  But unfortunately the style and the number formats all seem to be controlled by that one selection.  So until they fix this, it appears that you can either have your tables pretty, or you can have your number formatting correct.

Or maybe you can create your query, immediately duplicate it for your reports, then delete the original, as the second one seems to behave properly.  (I have no idea why this is.)

Final thoughts on “a fix and a new bug”

The subscription model is a new thing for us, and personally, I’m pretty high on it, despite these kinds of issues.  My hope is that – with the connections I have at Microsoft – I’m in the first ring of testers, and can get this stuff fixed before it hits you.  I’d highly suggest you also have one person in your company in the “First Release” program for this reason as well.

My understanding of this method is that the fixes we get into the First Release band are implemented before that version is shipped to the General Release band of users.  That’s a good thing, as the last thing we want to see is our end users having to experience two months of the first issue listed here!

With regards to the new bug, I’ve again reported this to the Power Query team.  They’re aware, and we are having some active dialog about it.  I know they are going to fix it, but I’m not sure how soon.  (I really hope it’s as quickly as the last one, as this is pretty visible!)