Sharing Power Query Solutions

One of the attendees at today’s Power Query Workshop asked about a specific scenario.  His company doesn’t have a solid IT infrastructure for sharing data, and he’s trying to figure out the best way to share Power Query solutions with a colleague so that he isn’t the only one who can update them.  Let’s take a look at this and explore a solution.

Background

The scenario is based around high-level soccer (football) players.  The National Team coach wants a weekly report on the status of the current National Team players.  For instance, he wants to know whether they have been injured, how many training sessions they’ve completed, their training data (GPS, heart rate, etc.), and other wellness information.

Of course, the players play for different clubs, each of which has an analyst who records this information in their team database on a daily basis.  Our client then uses Power Query to aggregate the data from these different team databases with the national database and push out the dashboard.

Current Solution Architecture

Because each team is independent, they manage their own player databases.  Our client set up a system where those teams upload their databases to a Dropbox folder on a weekly basis so that he can get access to them.

He then installed Dropbox on his local PC, allowing him to sync those databases inside the corporate firewall.  This is important because it gives him “local” access to their databases, as well as access to his own corporate database.  Visually, the system looks like this:

[Image: diagram of the solution architecture, with the club databases syncing via Dropbox to the client’s PC alongside the corporate database]

The Issue

Now this is great, and it all works well except for one issue… with the exception of the corporate database, all of the databases needed to update the solution are stored in our client’s personal Dropbox folder, and he uses a hard coded path to grab the data.  With Dropbox, that path looks something like this:

C:\Users\<username>\Dropbox\Solution\filename

Why is that an issue?  Because he wants to go on vacation, and the dashboard needs to be updated while he’s gone.  Short of sharing his password, how can he fix this?

Goal

We need to figure out a way to have the file path update dynamically to the correct username.  The problem is that there isn’t an Excel function that returns the Windows username used in the Dropbox file path, and Power Query can’t query the Windows username either.

But guess what?  Several years ago I just happened to publish a VBA routine to work out the Windows username.  You can find that article right here.


Solution Adjustment Overview

Here are the overall steps we’ll need to take in order to adjust his solution:

  • Invite his colleague to the Dropbox folder
  • Make sure his colleague installs the Dropbox app on his work PC so that the data syncs
  • Add the VBA function listed above to the dashboard
  • Implement a Parameter table to dynamically change the path to the Dropbox folder

We’ll take it for granted that he’s taken care of the first two steps.

Adding the VBA Function

The first step is to add the VBA code.  That’s fairly straightforward; we just need to follow these steps:

  • Press Alt+F11 to open the Visual Basic Editor
  • Locate the solution workbook in the Project Explorer
  • Right click the solution name –> Insert –> Module
  • Paste in the code from the article as listed here.  (If the module you created already has the line Option Explicit at the top, avoid copying that line from the article.)
  • Close the Visual Basic Editor

We can test that the function worked by entering the following in an Excel cell:

=GetName("Windows")

Add the Parameter table

Based on the steps listed in this blog post, we need to:

  • Create a new parameter table as follows:

[Image: the parameter table]

The final column here isn’t required for the end solution, but shows how I set things up on my side to test this out:

  • The VBA function returns my username
  • The Dropbox Root shows the root folder of my Dropbox location, dynamically created with my username
  • The Dropbox Folder is a test folder I set up for this purpose (but would be replaced with the name/path of the shared folder)
  • The Filename is a simple file I set up for my test

Of course, we shouldn’t forget to copy in the fnGetParameter function from the article either, as the solution won’t work without it.
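
In case it helps to see what that function does, here’s a minimal sketch of an fnGetParameter-style query.  It assumes the parameter table is an Excel table named "Parameters" with columns called "Parameter" and "Value"; the version in the article may differ slightly in its details:

(getValueFor as text) =>
let
    // Read the Parameters table from the current workbook
    ParamTable = Excel.CurrentWorkbook(){[Name="Parameters"]}[Content],
    // Keep only the row whose Parameter column matches the requested name
    ParamRow = Table.SelectRows(ParamTable, each [Parameter] = getValueFor),
    // Return the matching Value, or null if no match was found
    Result = if Table.IsEmpty(ParamRow) then null else ParamRow{0}[Value]
in
    Result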

Adjust the queries to use dynamic paths

Now, I didn’t have the full set of logic from our client, so I cooked up a simple example to emulate this.  Basically I just connected to the text file in my test folder, so my M code looks like this:

let
    Source = Csv.Document(File.Contents("C:\Users\Ken\Dropbox\Test\Test.txt"),null,"",null,1252)
in
    Source

At this point, I simply started substituting variables into the code using the fnGetParameter function:

let
    dropboxroot = fnGetParameter("Dropbox Root"),
    dropboxpath = fnGetParameter("Dropbox Folder"),
    filename = fnGetParameter("Filename"),
    fullfilepath = dropboxroot & dropboxpath & filename,

    Source = Csv.Document(File.Contents(fullfilepath),null,"",null,1252)
in
    Source

And at this point, the query is dynamic.  When I save my workbook, it will refresh on his side, provided he’s saved the Test.txt file in his Dropbox\Test folder.

Adjusting this for Databases

Our client isn’t actually retrieving data from a text file, but that doesn’t really matter.  To make this work, he needs to do three things for each of his queries:

First, he needs to copy the first four lines after the let statement above into his query.

Second, he needs to update the filename to refer to the name of the database he’s connecting to:

filename = "FootballClubOne.Accdb",

Third, he needs to locate the hard coded file path in his query and replace it with the fullfilepath variable.
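
Put together, one of those adjusted queries might end up looking roughly like the sketch below.  The database name, the "Players" table, and the navigation step are hypothetical placeholders for whatever his real queries contain:

let
    dropboxroot = fnGetParameter("Dropbox Root"),
    dropboxpath = fnGetParameter("Dropbox Folder"),
    filename = "FootballClubOne.Accdb",
    fullfilepath = dropboxroot & dropboxpath & filename,

    // The hard coded path is replaced with the dynamically built one
    Source = Access.Database(File.Contents(fullfilepath)),
    // ...the rest of the original query continues from here, e.g. pulling a table
    Players = Source{[Schema="",Item="Players"]}[Data]
in
    Players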

And that’s it.  At that point, the file path is dynamically set.  Provided his co-worker has set up Dropbox, the solution should refresh for him as well.

Split Power Queries

There was a really cool new feature added in the latest Power Query update: the ability to split Power Queries.  This has always been possible by editing the M code manually, but the new feature makes it super simple.

Where this is useful

Where this can be super helpful is when you’ve built a nice query to reshape your data.  It runs fine for a while, then you realize that you need to build a larger data model out of the data.  For example, assume we have this data:

[Image: the source data table]

And we run it through these steps:

[Image: the query’s applied steps]

To come up with this:

[Image: the reshaped output]

All of this is fairly easy as far as Power Query goes, but now the model needs to grow.  In order to expand it, we also want to create a table of unique Inventory Items and a table of unique Sales people.  Basically, we want to de-aggregate data that should have come in from separate tables in the first place.

Methods to Split Power Queries

Like always, there are a variety of ways to do this.  You could create new queries to load the data from the original table, then cut each one down to the columns it needs.  But that creates an extra load requirement.

You could manually cut the code off at the required step, create a new blank query, then reference the new query from the previous one.  But that takes some know-how and tinkering that many people won’t be comfortable with.

Starting in Power Query version 2.26 (released today), we have a MUCH easier way.  Let’s assume that we want to split this query right after the Changed Type step, so that we can create an Items table and a Salespeople table in addition to the Transactions query that we already have.

How to Split Power Queries – the easy way

To start, we need to understand the steps and what they give us.  We can step through the query and find the step that gives us the jumping-off point we need.  In my case, this is the Changed Type step.  We then right click the step AFTER Changed Type, and choose Extract Previous:

[Image: right clicking the step and choosing Extract Previous]

You’ll be prompted to enter a name (I’ll use “Base Load”), and upon clicking OK, you’ll see a couple of things happen:

  • A Base Load query is created
  • The Queries Navigator opens on the left, showing you’ve now got multiple queries
  • The Transactions query (the one I’ve been working on) gets shorter
  • The Transactions query’s Source step gets updated to #"Base Load"


You can see these changes here:

[Image: the Queries Navigator and the shortened Transactions query after the split]

So the Transactions query still contains all the subsequent steps, but the Source step changed, and the Changed Type step is now in the Base Load query:

[Image: the applied steps for the Base Load and Transactions queries]

The biggest piece of this whole thing, however, is that the Base Load query is still pointing to the raw source table, but the Transactions query now loads from Base Load, NOT the original data source.  So it’s following the staging/loading approach I talk about in this blog post.
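
If you check the M code behind the two queries, the effect is roughly as sketched below.  The source table name and the column types are placeholders based on my sample data, not our client’s real query:

// Base Load: still points at the raw source table
let
    Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Inventory Item", type text}, {"Sold By", type text}})
in
    #"Changed Type"

// Transactions: its Source step now references Base Load instead of the raw data
let
    Source = #"Base Load"
    // ...all of the remaining reshaping steps follow here, unchanged
in
    Source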

Now, how can we use this…?

Making Use of the Split Power Queries

So far, very little has changed with respect to the overall goal, except that we now load in two stages.  Let’s change that by creating a new query that references the Base Load query:

  • Right click the Base Load query in the Query Navigator (at left)
  • Choose Reference
  • Change the query name to Salespeople
  • Right click the Sold By column –> Remove Other Columns
  • Select the Sold By column –> Home –> Remove Duplicates
  • Right click the Sold By column –> Rename –> SalesPerson

And we’ve now got a nice query that shows our unique list of sales people:

[Image: the Salespeople query showing the unique list of sales people]
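
Behind the scenes, the generated M for the Salespeople query should look something like the sketch below (the exact step names may differ), and the SalesItems query we build next follows the same pattern:

let
    Source = #"Base Load",
    // Keep only the Sold By column
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Sold By"}),
    // Remove duplicate sales people
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"),
    // Rename the column to SalesPerson
    #"Renamed Columns" = Table.RenameColumns(#"Removed Duplicates",{{"Sold By", "SalesPerson"}})
in
    #"Renamed Columns"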

Now let’s build the SalesItems table:

  • Right click the Base Load query in the Query Navigator
  • Choose Reference
  • Change the query name to SalesItems
  • Right click the Inventory Item column –> Remove Other Columns
  • Select the Inventory Item column –> Home –> Remove Duplicates
  • Right click the Inventory Item column –> Rename –> SalesItem

And this table is done now as well:

[Image: the SalesItems query showing the unique list of inventory items]

Loading the Split Power Queries to the Data Model

The final step is to load these to the Data Model.  We’ve actually created three new queries in this session, but we don’t get the liberty of choosing a separate destination for each of them.  Instead, we get to choose a single loading style that will be applied to ALL of them.  (If in doubt, I’d suggest that you load queries as Connection Only first, then change them afterwards if you need to pick different destinations.  This will save you waiting while Power Query litters your workbook with extra worksheets and loads the data to them.)

For our purposes here, I’ll load them directly to the Data Model:

  • Home –> Close & Load To…
    • Select Only Create Connection
    • Select Add to the Data Model
    • Click Load

The only side effect here is that the Base Load query was also loaded to the Data Model, and I don’t need that.  So I’ll now quickly change that:

  • Go to the Workbook Queries pane –> right click Base Load –> Load To…
  • Uncheck “Add this data to the Data Model” –> Load

And I’ve now got my tables where I need them so that I can relate them and build my solution.

[Image: the resulting tables in the Data Model]

Final Thoughts

This is a fantastic feature, and I was actually going to blog on how to do this the old way, until they surprised me with this update.  I’m a huge fan of the staging/loading approach, and this will certainly make it easier to retrofit a query after it’s already been built.

Performing OR logic tests

When working with Power Query, it’s actually pretty simple to restrict records in a table using AND criteria.  Since filters on different columns are applied in a cumulative fashion, we just keep applying more of them to filter down the data.  But what about performing OR logic tests?

The issue

Assume we’ve got a list that looks like this:

[Image: the source list of inventory items and sales people]

And we want to preserve any record where the inventory item is Talkative Parrot OR the item was sold by Fred.  So how would you do that?

We’d be tempted to start by filtering the Inventory Item column to just the Talkative Parrot item.  But that would remove our very first row, which shows an item sold by Fred.  And if we started by filtering the Sold By column to Fred, we’d lose the Talkative Parrot in row 2.

Performing OR logic tests – via merging columns

We could merge the Inventory Item column together with the Sold By column, then filter for any item that contains Talkative Parrot or Fred.  While this might work in this situation, it could also lead to cross-contamination if there were names like Manfred or Wilfred, or if the name we were searching for appeared in one of our inventory items (like “Fred the dog”).

Performing OR logic tests – via list functions

Using list functions is a far better way to make this kind of comparison.  In fact, there is a function built specifically for this purpose, which works just like Excel’s OR() function: List.AnyTrue().

To use this, we’ll set up a new custom column with the following formula:

=List.AnyTrue(
    {[Inventory Item]="Talkative Parrot",
     [Sold By]="Fred"}
)

The formula breaks down as follows:

  • The function name is List.AnyTrue()
  • The function requires a list, which means that you need the {} inside the parentheses.
  • Each list item (comparison) is nested inside the curly braces and separated by commas

Knowing this, we then just need to include the comparisons we want to make, with the field name surrounded in square brackets and the comparison surrounded in quotes (as we are making a text comparison here).

And the result, as with Excel’s OR() function, is a list of TRUE/FALSE values:

[Image: the custom column returning TRUE/FALSE values]

Of course, we can do better than this by wrapping the original function in an if statement:

=if List.AnyTrue(
{[Inventory Item]="Talkative Parrot",[Sold By]="Fred"}
)
then "Meets Criteria!"
else "No Match"

Which returns useful messages instead of just TRUE/FALSE values:

[Image: the custom column returning the Meets Criteria!/No Match messages]

Naturally, you could filter these records or keep them, as your need dictates.
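
If you’d rather skip the helper column and filter the rows directly, the same test can be fed straight into Table.SelectRows.  In the sketch below, the "Sales" table name and the Changed Type step are assumptions standing in for your own query’s earlier steps:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Inventory Item", type text}, {"Sold By", type text}}),
    // Keep any row where either condition is true
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each List.AnyTrue(
        {[Inventory Item]="Talkative Parrot", [Sold By]="Fred"}
    ))
in
    #"Filtered Rows"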

Performing AND logic tests

If you’re simply looking to filter records based on AND criteria, then applying successive filters will work.  But what if you only want to tag the records, so that you can use this field in a slicer?  In that case you need to preserve both the matching and non-matching items.

Fortunately, Power Query has a list function for this as well:  =List.AllTrue().

This function is used in the same manner as List.AnyTrue().  The following would only show a “Meets Criteria!” message where the Inventory Item is Talkative Parrot AND the Sold By field is “Fred”:

=if List.AllTrue(
{[Inventory Item]="Talkative Parrot",[Sold By]="Fred"}
)
then "Meets Criteria!"
else "No Match"

A Quick Note

This topic is contained in the forthcoming M is for Data Monkey book.  And it’s not too late to get YOUR name inside!

Get Your Name in M is for Data Monkey

I’m pleased to say that our new book “M is for Data Monkey” is now complete, through the technical edit, and into the copy edit stage.  While there is still more work to go, the book is on schedule for release in November (with Amazon shipping in December, I believe).

At this point we’re just finalizing the “Thank You” section, and we want to give everyone who has supported and pre-ordered the book the chance to get their name in M is for Data Monkey:

How to get your name in M is for Data Monkey

If you’d like to get your name inside, we need you to submit the following information via email to info@powerquery.training:

  • A copy of your receipt for the pre-order of the book - if you haven’t pre-ordered it then you can do so here
  • The name that you want to be displayed in the book – for example, it can be Miguel ‘Mike’ Escobar, Ken ‘Power Query’ Puls, your company name, or whatever you want to see there.

We only have space for a limited number of names (around 80), so please submit your information as soon as possible!

The deadline to submit this information is September 16th, 2015 at 23h59m US Eastern time, so don’t miss out!