Building a SelectQuery Function

For a while now, I have wanted the capability in Power Query to select a query by name PROGRAMMATICALLY.  Why?  Building a SelectQuery Function would allow me to execute one of multiple “Transform” queries depending on a user selection on the Excel sheet. This will help me process log files from multiple vendors, each of which has different contents and field names.

Here is a fairly simple example with only three input queries (although my true setup actually has seven potential queries to select from):

Illustration of the query chain with three queries that pull from a single data source, and another query that feeds the SelectQuery function to choose which to execute

How the setup is intended to work:

I have a named range called “User_Select” on my sheet that has a Data Validation dropdown with the names of my source queries:

And what I want to do is read the value from the User_Select named range into my query named Selection_Query. This provides a scalar value that matches the query I’d like to execute.

The Issues

  • I do not want to land the multiple input queries on sheets (too much data involved).
  • I am not a fan of Power Query Parameters for this approach, as they must be changed from within the Power Query user interface (I don’t really want my users going in there.)
  • I do not want to use “brute force” – I want to do this programmatically so that it is easy to maintain in future.

What is the “brute force” method?

The brute force method is essentially coding a great big IF/THEN statement that contains each possible query.  Looking at the M code, you’d end up with something like:

let
    Source = if Selection_Query = "Query 1" then #"Query 1" else
             if Selection_Query = "Query 2" then #"Query 2" else
             if Selection_Query = "Query 3" then #"Query 3"
             // (and so on, and so on, and shoobie-doobie-doo)
             else null
in
    Source

The problem, however, is that each time I create a new “Transform” for a new vendor, or retire one from production, I would also need to come back and update my brute force query to reflect the change.  It would be MUCH simpler if I only had to add/remove the query name from my drop-down list, and not worry about messing with the M code of my Selector query.

Some hope for building a SelectQuery function

On 19 Feb, Gasper Kamensek presented a session at VANPUG’s Power BI track that got me excited. In his presentation, he showed how to programmatically select from some LANDED queries using the Excel.CurrentWorkbook() statement in Power Query:

let
    Source = Excel.CurrentWorkbook(){[Name=Selection_Query]}[Content]
in
    Source

Enter Expression.Evaluate

Now, that worked great for items that had been landed to a worksheet table, and it got me thinking about this some more.  The challenge I’ve been facing is that I need to select from queries which were NOT landed to a worksheet table, and which therefore don’t show up via the Excel.CurrentWorkbook() function.  Wondering if this was even possible, I asked my friend Ken Puls. And guess what he Puls-ed out of his bag of tricks?

Source = Expression.Evaluate("some text string", #shared)

Now, I had encountered Expression.Evaluate() in the Power Query M function reference, but it was not clear to me what it was intended to do. But after Ken and I bashed this back and forth a bit… WOW!  Does this ever have potential!

Ken explained that Expression.Evaluate() works very similarly to Excel’s INDIRECT() function - it takes an input and tries to evaluate it at run-time.  Unlike Excel, which seems to just evaluate the provided term against any and all Excel items, Expression.Evaluate() requires you to specify the library you want to use to interpret the code.  And that’s where the #shared parameter comes in, as this parameter provides a list of not only all Power Queries in the solution, but also all of the available Power Query functions.
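
For example, handing Expression.Evaluate() a bit of M code as a text string and evaluating it against #shared returns the computed result:

Source = Expression.Evaluate("List.Sum({1, 2, 3})", #shared)    // returns 6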

So Ken’s suggestion was to pass the name of the query I wanted into the Expression.Evaluate() function, and evaluate it against the shared library. At that point – he told me – it should give me the results of that query.

Armed with this theory, I was eager to plug it in to my SELECTOR query, which gave me this:

let
    Source = Expression.Evaluate(Selection_Query, #shared)
in
    Source

AND IT DIDN’T WORK.

Expression.Identifier to the Rescue!

Turns out, it’s not Ken’s fault – I like to name my queries with spaces and leading numbers.  After a little digging, it became apparent that Expression.Evaluate() needed me to refer to “Query 1” with a pound sign and quotes.

In other words, passing the raw name as the expression text DOESN’T work:

=Expression.Evaluate("Query 1", #shared)

But passing the escaped form as the expression text DOES:

=Expression.Evaluate("#""Query 1""", #shared)

So now I just needed to figure out how to automatically “escape” the query name with the #" " syntax where necessary.  I suppose I could have put the escaped names into my Excel drop-down, but that would make the list values look kind of ugly, so I went hunting for something a bit more elegant.

After poking around in the M manual, I found Expression.Identifier("some text"), and guess what it does? It converts the name we see in the Queries & Connections pane into the correct “# and quote” syntax.
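
For example, feeding it a troublesome name returns the escaped form as text, while a well-behaved name passes through untouched:

Expression.Identifier("Query 1")    // returns the text:  #"Query 1"
Expression.Identifier("Sales")      // returns the text:  Sales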

So that gives me:

let
    qName = Expression.Identifier(Selection_Query),
    Source = Expression.Evaluate(qName, #shared)
in
    Source

AND IT WORKS!

Completing the Solution

To make this as flexible as possible (and allow me to use it in other projects), I decided that building a SelectQuery function was the way to go.  So here’s what I ended up with:

The fxSelectQuery function:

(qName) =>
let
    // Escape the query name if needed, then evaluate it against #shared
    Source = Expression.Evaluate(Expression.Identifier(qName), #shared)
in
    Source

And at that point, I can invoke whatever query I need by passing the results of the Selection_Query to the fxSelectQuery function like this:

The Output Query:

let
    Source = fxSelectQuery(#"Selection_Query")
in
    Source

And the end result is that I select the query I want to run from an Excel data validation list, click update, and I’m done.  How cool is that?

One caveat that I should probably mention here is that you must disable the formula firewall in order to use this setup.  You can do this by going to Get Data -> Query Options -> Current Workbook -> Privacy -> Ignore the Privacy Levels.

You can download the example file here if you’d like to see the results of building a SelectQuery function in action.

Nuthin’ ain’t nuthin’ in Power Query

There are two kinds of nuthin' in Power Query: null, and blank. I tripped on this issue the other day, and Ken thought it would be a good idea for a blog post.

Let's just call out the two types of nuthin' in Power Query:

  • null is literally "no value" for any data type from text to table.  In other words, the cell is completely empty.
  • A blank also looks like "no value", but it is equivalent to a cell formula of ="" in Excel.  In other words, the cell holds a value that renders as blank.

Why is this important?  It's because, inside Power Query (and indeed many programming languages) null and blank are not equal!  And it turns out that nuthin' matters more than getting the right nuthin' in Power Query!

Burned by nuthin' in Power Query

Some time ago I built a set of Excel Power Query transforms which report on data extracted from a client system. For some time the client data had been extracted into Excel files, but there were some problems. My solution was to extract the client data as CSVs instead.

What I found, unexpectedly, was that blank data values are treated differently by the PQ import functions depending on whether the file being imported is an Excel file or a CSV file!

Here are the rules:

  • For an Excel import, blanks are converted to null - always.
  • For CSVs, blanks are imported as blanks (not nulls). But when a column's type is changed in a Power Query step, blanks in numeric and date columns are converted to nulls, while blanks in text columns remain blanks (not nulls) - see the sketch below.
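
Here's a minimal sketch of the CSV case (the file path and column names are made up for illustration):

let
    // Import a CSV that contains blank cells; at this point every blank is "" (not null)
    Source = Csv.Document(File.Contents("C:\Data\Sample.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // After the type change, blanks in the number and date columns become null,
    // but blanks in the text column remain ""
    Typed = Table.TransformColumnTypes(Promoted, {
        {"Comments", type text},
        {"Amount", type number},
        {"DueDate", type date}})
in
    Typed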

Seeing nuthin' in Power Query

What does it matter?  Have a look at the following cases...

1. Excel data with blanks loaded into PQ: blank cells are imported as null.

2. CSV data with blanks imported to Power Query: blanks are read as blanks.

3. CSV data with blanks after a type change: dates and numbers change to null, while text blanks remain blank.

The impacts of nuthin' in Power Query

Nuthin' in Power Query could have caused me even more issues here... my transforms used conditional columns to check for nulls in text columns, a test that fails when the value is blank.
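
A tiny illustration of the trap (the column name and values are made up):

let
    // One null and one blank in the same column
    Source = #table({"Comments"}, {{null}, {""}}),
    // A null-only test misses the blank: the "" row still reports "has data"
    NullOnly = Table.AddColumn(Source, "NullTest", each
        if [Comments] = null then "empty" else "has data"),
    // Testing for both kinds of nuthin' flags both rows correctly
    BothTests = Table.AddColumn(NullOnly, "FullTest", each
        if [Comments] = null or [Comments] = "" then "empty" else "has data")
in
    BothTests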

As Ken pointed out, another key issue is that the Fill Down and Fill Up functions in Power Query are used to fill null values. But blanks are not nulls, so the functions do not work as intended in my case.

How to deal? Caveat Emptor! It appears Power Query treats blank data differently in the CSV import and Excel import functions.

So, if you are building transforms based on a consistent source of imported data, there is little impact. But if you need to change the type of data source like I did, then beware. I had to do some detailed testing, and I was looking at re-writing some longstanding and well-tested transforms.

Instead, for my specific situation, I added a value-replacement step to the CSV import in PQ to replace nuthin' (blank) with the keyword null for the whole data set. That allowed me to maintain my downstream logic, but cost me some processing speed.
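
In M, that clean-up can be a single step (the step name is an assumption; it would sit right after the CSV import):

    // Replace "" with null in every column so null tests and Fill Down/Up behave
    ReplacedBlanks = Table.ReplaceValue(Source, "", null,
        Replacer.ReplaceValue, Table.ColumnNames(Source))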

I hope this alerts PQ users to a potential issue in their transforms. If there are any alternate solutions to the problem let's see them in the comments.

Solving Business Problems – Power Query Does That

This week's blog post on solving business problems with Power Query is a guest post by Alex Jankowski

Does your business or work group have manual tasks that are repeated regularly, such as data re-entry, copy/paste functions, sending the same old email, cross-referencing lists, etc.? We can automate these kinds of activities with a wide variety of tools, such as Excel (including VBA, Power Query and Power Pivot), Access, SharePoint and MS Power BI, as well as with other, more sophisticated software packages.

Solving Business Problems You Didn't Even Know Existed

I have found the following to be common themes that can inhibit identifying and solving business problems:

  1. Organizations are illiterate about their data.
  2. Organizations are allergic to cost.
  3. Many managers lack technical imagination. They can’t even conceive of the problems that the organization’s data could solve, let alone conceptualize potential solutions.
  4. Organizations think that only financial data is worthwhile.
  5. Organizations roll out massive ‘suites’ without a concept about how to get their money’s worth from the licenses. My guess is that on average less than 25% of the capabilities of MS Excel are utilized due to lack of training/imagination, which is not a good ROI.
  6. Organizations think that the IT group has a handle on this, whereas IT thinks that the data belongs to the business, or else prevents anyone from accessing important data (IT acting as a “department of productivity prevention”).

And I’m not even talking about the mission critical data that resides in big ERP systems (including HR, Corporate Finance, Time Records, and Project Financials) or “Big Data” that everyone is looking over their shoulders at.

What about:

  • Detailing a list of technical drawings on a project?
  • Tracking hours spent on a project activity, or the employee vacation schedule against entitlements?
  • Creating a dataset of employee skills and performance reviews?
  • Linking that scheduling software package to a list of real deliverables?

The Basic Win - Owning Your Data

Every business has what can be termed ‘islands of data’ which are often generated and managed manually.

There are 'islands of data' that exist in every organization

Let’s say that your business splurges to automate and standardize some elements of the work process to be more effective. What are the benefits? Well, there will be a reduction in repetitive costs that offsets the application implementation costs, and people can graduate from data makers to data owners, spending more time analyzing report data rather than creating the report itself. They can then begin solving business problems by making smarter, data-driven decisions. Is that it? Was it worth it? Where’s the big payback?

The BIG payback is in linking the ‘islands of data’ with each other to gain unexpected insights. Don’t believe me? Let's take a look at a simple real-life example.

The Unexpected Win

A few years back, I was working in an engineering firm on a very large project. The Operations Manager (thankfully, a guy with great data imagination) wanted a way to track piping isometric drawings as they were reviewed by various project leads. The drawings would get lost in the review process, like when the head of electrical left them on his chair when he went away for a month in the summer. So we came up with a solution to identify every isometric in the review process, who had approved it, when it was approved, and where it was going next. We met the application goal and stopped losing the isometrics in the review process.

Sample Isometric Pipe Drawing (from http://www.svlele.com/drawings.htm)

Was that the BIG win? Nope. The big win was when we started matching the list of drawings that had completed the review cycle against the list of drawings issued for manufacturing in the document control system (which would have been a great opportunity for Power Query – I wish we had it then). Guess what? We found a small number of approved drawings that had never been issued to manufacturing.

The Real Cost of Not Doing the Data Work

You're thinking, "So what?" Well, each isometric drawing represents a section of pipe that is fabricated by the manufacturer with flanges, valves, fittings, etc. The fabricated pipe is then sent to the construction site holding yard to wait its turn to be assembled in the pipe rack by the construction team. The impact of the drawing not being issued to the manufacturer is that the pipe section is not on site when it’s time to construct, and nobody recognizes it is missing because the drawing was never issued.

What’s the cost of that?

  • extra cost: Rush to find and issue the missing isometric drawing
  • Extra cost: Construction team has to re-sequence its work to allow for the missing pipe section
  • Extra Cost: Rush fabrication by the pipe manufacturer, possibly resequencing their production schedule and possibly charging overtime rates
  • EXTRA Cost: Rush shipping charges to get the late pipe section to the construction site
  • EXTRA COST: Possible overall delay in the construction schedule
  • BIG EXTRA COST + LAWSUIT: Possible overall delay in the refinery startup schedule

That extra cost could be thousands of times more than the cost of doing the data work, and possibly end up eating all of the project’s margin and more. (If you’re going to be allergic to cost, this is the one to be allergic to). Creating the application for isometric tracking was a win, but matching it to the document control system drawing list was the BIG win and nobody expected it!

Delivering the BIG wins

Let's take a look at some ways of solving business problems with Power Query using the data you already have. What if you:

  • Need to find files, audit folder contents, or document a folder structure on the server? Power Query does that (see the sketch after this list).
  • Need to develop an email filing system for your project? Power Query can list the emails, their contents, and attachments.
  • Need to open server files from an Excel list without navigating through the File Explorer? Power Query can help with that. (Eliminating the practice of people navigating from "My Computer" into the bowels of the server structure for each individual file is one of my personal lifetime goals.)
  • Need to create a templated report about projects, employees, or hours from an ERP system extract? Power Query does that.
  • Need to pull data from SQL for templated operations reports? Power Query does that.
  • Need to read multiple CSV or Excel files into a single data set? Power Query does that (check out some of Ken's other blog posts, like this one on Combining Excel Files).
  • Need to report on the status of drawings in your document control system? Power Query does that.
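
As promised above, here's a minimal sketch of that first folder audit (the server path is made up):

let
    // List every file beneath the folder, including subfolders
    Source = Folder.Files("\\Server\Projects"),
    // Keep just the columns useful for an audit
    Audit = Table.SelectColumns(Source,
        {"Name", "Extension", "Date modified", "Folder Path"})
in
    Audit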

Once the 'islands of data' are being processed, Power Query does an awesome job of combining the data sets together, especially when column names differ and data formats need to be aligned.

What data problems do you need to solve? How are you currently solving business problems, and are there opportunities for BIG wins in your business? Give us some feedback in the comments!

Map Columns Between Data Sets

Ken is at the PASS BA conference this week, so it seemed like a perfect time for me to publish my first Power Query post here.  In this installment, I'm going to show how to map columns between data sets.

Exploring how to map columns between data sets

I’m on a great new project where Power Query is the bread and butter of the solution. We’re pulling design information from Engineering Design Systems and building transforms to load into another engineering application. This application has a very strict requirement for data layout. Needless to say, the data structures and field names are seldom consistent between the two applications, so a key part of the transform is to map columns from one data set to the other.

Generally, Power Query’s ability to insert, rename, and move columns is useful in a case like this; however, we are doing this for a large number of different data transfers, developing the maps in an ongoing process, and I REALLY don’t want to rewrite the Power Query steps for every change in every transform. (Being able to document the transform is also important for design and debugging.)

Here’s a simplified example:

Source Data: The Flintstones

The Flintstones sample data in a small table provides the source of the data to be mapped

Target Data Structure: M*A*S*H

What does M*A*S*H have to do with the Flintstones? Not much really, that’s the point. But I want to convert data to this layout.


The "Map Table": The key piece needed to map columns between data sets

I set a goal to write a Power Query transform that was agnostic of specific column names and field counts, and so would not use Table.AddColumn, Table.RenameColumns, Table.RemoveColumns, or Table.ReorderColumns operations.

Mapping Strategy/Assumptions:

  • Data will not necessarily end up in the same column order between Source and Target.
  • Not all the columns map from Source to Target.
  • Not all the Target columns will be filled from the Source.

The solution: A "Map" table on an Excel sheet; A simple list of Source field names and Target Field names (I like using a column format for readability).


The Transforms

The "TblMap" Transform

The query reads the "Map" table data and flips it around so that the Source names are the table’s column names:


The complete M code used for this solution takes only a few steps.

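A sketch of what those steps look like, assuming the worksheet table is named "Map" with columns titled "Source" and "Target" (those names are my assumptions, not necessarily the sample file's):

let
    // Read the Map table from the worksheet
    Source = Excel.CurrentWorkbook(){[Name = "Map"]}[Content],
    // Flip the table so that row 1 holds the Source names and row 2 the Target names
    Transposed = Table.Transpose(Source),
    // Promote the Source names to column headers, leaving a single row of Target names
    Promoted = Table.PromoteHeaders(Transposed, [PromoteAllScalars = true])
in
    Promoted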

The "Output" Transform

This query references the tblMap transform and appends the original source data, giving something like this:


Now just promote the first row to Headers, overwriting the existing column names, and the new Target data structure is in place:


Dealing with un-mapped columns

But what about those pesky un-mapped columns (Column7 and Column8)? Normally I would use Table.RemoveColumns. I don't want to do that here, though, as this would hard code column names into the M code that might not exist next time, resulting in errors.

Instead, we just transpose the table, filter out anything that begins with “Column”, and transpose it back.  The complete M code for the query adds just a few more steps.

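A sketch of the full Output query under the same assumptions (the source table name "Flintstones" is made up):

let
    // Stack the single row of Target names (tblMap) on top of the source data;
    // the columns line up because tblMap's column names are the Source names
    Combined = Table.Combine({tblMap,
        Excel.CurrentWorkbook(){[Name = "Flintstones"]}[Content]}),
    // Promote the Target names to headers; un-mapped source columns
    // get auto-generated names like "Column7"
    Promoted = Table.PromoteHeaders(Combined, [PromoteAllScalars = true]),
    // Flip the table, drop the former columns whose name begins with "Column",
    // then flip back and re-promote the headers
    Demoted = Table.DemoteHeaders(Promoted),
    Transposed = Table.Transpose(Demoted),
    Kept = Table.SelectRows(Transposed,
        each not Text.StartsWith([Column1], "Column")),
    Result = Table.PromoteHeaders(Table.Transpose(Kept), [PromoteAllScalars = true])
in
    Result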

And here is the output in Excel once we load it to a table:


Closing Thoughts

So there you go. One of the things I like best about this approach is how easily it can be modified. Spell “Klinger” wrong? Just fix the spelling in the Map table. Forget to add Rizzo or Nurse Able to the Target? Just add them to the table on the Target side and they are in the result. Forgot to include Dino in the Source data? Just add him to the list.

The sample file is attached. Give it a try; I hope you find it useful.

Column Name Translate

A thought on Data Types

I have not done a lot of testing with data types in this approach. My work will not do any math on the contents until after the re-mapping (I hope), so data typing can be done at the end. If there is any math to be done in the middle of the process, you would need to be careful not to have Power Query treat numbers as integers (this has bitten me before).
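
For instance, explicitly typing such a column as a decimal number rather than a whole number avoids the truncation (the column and step names here are hypothetical):

    // type number keeps decimals; Int64.Type would force whole numbers
    Typed = Table.TransformColumnTypes(Output, {{"Hours", type number}})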

Performance

The last step, where the extra column names are removed, uses a transpose, which could be really slow for long data sets. Another solution that could fix this would be to create a list from the Map table to drive a RemoveOtherColumns-style operation.
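
A sketch of that alternative, dropped into the Output query in place of the transpose steps (same assumed names as above):

    // Pull the Target names straight from the Map table
    TargetNames = List.RemoveNulls(Table.Column(
        Excel.CurrentWorkbook(){[Name = "Map"]}[Content], "Target")),
    // One selection replaces the transpose/filter/transpose trio;
    // MissingField.Ignore skips any Target name missing from the promoted table
    Result = Table.SelectColumns(Promoted, TargetNames, MissingField.Ignore)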