Making Transformations Re-Usable

We're back with post two of Miguel's week.  This time, he is going to take his previous solution and show us how versatile Power Query can be, leveraging code he's already written to make transformations re-usable.

Have at it, Miguel!

Making Transformations Re-Usable

In my previous post, I was able to create a query that would transform the source data into the rearranged data with Power Query:

(Image taken from Chandoo.org)

...but what if I had multiple sheets or workbooks with this structure and needed to combine all of them?

Well, you could transform that PQ query into a function and apply that function to several other tables. Let’s dive in and see how to accomplish that!

You can download the workbook and follow along from here.

Step 1: Transform that query into a function

The first step is transforming that query into a function, and that’s quite simple. You see, the query needs an input – and in order to understand what that input is, you need to understand how your main query works.

So our query starts by grabbing something from within our current workbook – a table.

This means that the first input our query needs in order to work is a table, so we’re going to replace the line that goes like this:

Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

To look like this:

Source = MyTable,

What is "MyTable"?  It is a variable, which will return the contents that we've fed it.  Except we haven't actually declared that variable yet...

The next step would be to declare that variable at the top of the actual query, making the query start like this:

(MyTable as table) =>
let
Source = MyTable, [….]

And we now have the MyTable variable set up to receive a table whenever we call it (That is the purpose of specifying the as table part.)

After you do that, this is how the PQ window will look:

image

Now you can just save this function with the name of your choice.

Step 2: Get the files/sheets <From Folder>

From here, you’ll go to the "From Folder" option of Power Query and grab all the workbooks that contain the tables that you want to transform.

Then you’re going to use the Excel.Workbook function against the binary files of the workbooks that you need like this:

image

Next, expand the new Custom column we just created, which gives us the sheets, tables, defined ranges and such that are within each of those workbooks.  The results should look like this:

image

You have to choose the data that you want to use which, in this case, is stored in the Data column as a table.  So let's create a new column based on that column.
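If you peek in the Advanced Editor at this point, the M generated so far looks something like this (a minimal sketch – the folder path and the expanded column names here are illustrative, and yours will reflect your own folder and choices):

```
let
    // List every file in the folder that holds our workbooks
    Source = Folder.Files("D:\Test\Workbooks"),
    // Excel.Workbook cracks each binary open and lists its contents
    AddedCustom = Table.AddColumn(Source, "Custom", each Excel.Workbook([Content])),
    // Expand to see the Name, Data, Item and Kind of every object found
    ExpandedCustom = Table.ExpandTableColumn(AddedCustom, "Custom",
        {"Name", "Data", "Item", "Kind"}, {"Name.1", "Data", "Item", "Kind"})
in
    ExpandedCustom
```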

Step 3: Apply the function

In order to create such a column, we are going to apply the function to the Data column (which contains a table in each row) like this:

image

... and we'll get a new Custom column that contains a bunch of Table objects.  We can now compare the table from the Data column against the newly created Custom column:

image

(table preview from the data column)

image

(table preview after we apply the function to the Data column)
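In M terms, that new column is a single Table.AddColumn step along these lines (fnTransform stands in for whatever name you saved the function under, and PreviousStep for the prior step's name – both are placeholders):

```
AppliedFunction = Table.AddColumn(PreviousStep, "Custom", each fnTransform([Data]))
```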

Final Steps:

Expand the new column (by clicking that little double-headed arrow at the top), reorder the columns and change the data types, until we have a final table that looks like this:

image

We've now got a table that is the combination of multiple sheets from multiple workbooks, which all went through our transformation and were then combined into a single table.  All thanks to Power Query.

If you want to see the step by step process then you can check out the video below:

Transforming Data with Power Query

This week we've got a guest post from Miguel Escobar, who is going to tell us why transforming data with Power Query is his first preference versus other tools and methods.  In fact, this is part 1 of 2, and we're going to turn this week over to Miguel in full.  (I'll be back next week with a new post of my own.)

Miguel, the floor is yours!

Transforming data with Power Query

Looking for some cool scenarios to test the transformation power of Power Query, I stumbled upon a really cool post that Chandoo made a few weeks ago, where he basically needed this:

(Image taken from Chandoo.org)

He used a really cool formula as option #1, and a VBA approach as option #2. Now it’s time to make an option #3 that I think will become your #1 option after you finish reading this!

Step #1: Make this a table

In order for this to work, you’ll need to make the range a table with no headers, which should give you a table like this one:

Column1 Column2 Column3 Column4 Column5 Column6 Column7
29-Sep 30-Sep 1-Oct 2-Oct 3-Oct 4-Oct 5-Oct
58,312 62,441 60,467 59,783 51,276 23,450 23,557
6-Oct 7-Oct 8-Oct 9-Oct 10-Oct 11-Oct 12-Oct
53,194 60,236 62,079 66,489 59,258 27,140 25,181
13-Oct 14-Oct 15-Oct 16-Oct 17-Oct 18-Oct 19-Oct
56,231 62,397 60,978 60,928 52,430 24,772 25,630
20-Oct 21-Oct 22-Oct 23-Oct 24-Oct 25-Oct 26-Oct
59,968 61,044 57,305 54,357 48,704 22,318 23,605
27-Oct 28-Oct 29-Oct 30-Oct 31-Oct 1-Nov 2-Nov
56,655 62,788 62,957 64,221 52,697 25,940 27,283
3-Nov 4-Nov          
53,490 5,302          

Also, if you want to follow along, you can download the workbook from here.

Now that we have a table, we can use it as our data source to work with Power Query:

image

(Choose From Table in the Power Query Ribbon)

Step 2: The actual transformation process

Note: Before we start, I don’t want you to be scared of some of the M code that we’ll be showing you later. Instead, I’m going to show you how NOT to be scared of this crazy-looking M code and tell you how it originated. And no, I didn’t manually go into the Advanced Editor to do anything – I did all of my steps within the actual UI of Power Query.

So we start with 3 simple steps:

Add an Index column that starts from 0 and has an increment of 1:

image

image

Then we move forward with defining the odd and even rows with a custom column using the Index as a reference and the Number.IsOdd function:

image

After this, we will create another custom column, based on the one we just created, with a simple if function:

image

We can see that our table so far has 3 custom columns:

  • Index
    • that starts from 0 and has increments of 1
  • A TRUE/FALSE column that tells us if the row is odd or not
    • we used the Number.IsOdd function here
  • A column that basically gives the same number to the records in every 2 rows. So essentially we are creating a pair – a couple.
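A minimal sketch of those three steps in M (step and column names follow the UI defaults; the if formula shown is one way to give each pair of rows the same number – the original post's formula may differ slightly):

```
    // Index column that starts at 0 and increments by 1
    AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    // TRUE when the index is odd, FALSE when it is even
    AddedCustom = Table.AddColumn(AddedIndex, "Custom", each Number.IsOdd([Index])),
    // Give every 2 rows the same number: rows 0 and 1 get 0, rows 2 and 3 get 2, etc.
    AddedCustom1 = Table.AddColumn(AddedCustom, "Custom.1",
        each if [Custom] = true then [Index] - 1 else [Index])
```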

The next step would be to remove the Index column as we no longer need it.

image

Now select the Custom and Custom.1 columns and hit the Unpivot Other Columns option:

image

Our query should now look like this:

image

It's all coming along great so far!

Now we create another custom column that will help us along the way. This will work as our column ID for the components of a row.

image

Now here’s where the interesting part comes in. You need to hit the Add Step button on the formula bar:

image

That should give us a query with the results from the last step, which in our case is the “Added Custom 2” step. In this step we will filter the Custom column to only give us the rows with the value FALSE.

image

You’ll go back to the “Added Custom 2” step and basically repeat the “Add a Custom Step” process, but this time you’ll filter it to be TRUE instead of FALSE. I’ve renamed these 2 steps in my query to be DataSet1 and DataSet2.  (Right-click the step in the "Applied Steps" box and choose Rename to do this.)

image
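Behind the UI, those two renamed steps are just filters against the “Added Custom 2” step, roughly like this (step and column names are assumptions based on the walkthrough above):

```
    // Even rows (Custom = FALSE) hold the dates
    DataSet1 = Table.SelectRows(#"Added Custom 2", each [Custom] = false),
    // Odd rows (Custom = TRUE) hold the values
    DataSet2 = Table.SelectRows(#"Added Custom 2", each [Custom] = true)
```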

Tip: You can always check what’s going on in a step by hitting the gear icon right next to the step

Now we want to combine those 2 – DataSet1 and DataSet2... but how?

image

So we hit the Merge Queries button, but you’ll notice that it will only show you the tables and queries that have been loaded to our workbook or that are stored as connections. So for now we’re going to trick the system and just choose a merge between the tables that PQ tells us to choose from, like this:

image

and that should give you this:

image

Now we need to change one of the DataSet2 references in the formula bar to be DataSet1, so that we merge the data from DataSet1 against DataSet2, which should give us this:

image
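Under the hood the merge is a Table.NestedJoin; after editing the formula bar, it would read something like this (the join columns – the pair number and the column ID we created earlier – and the column names are assumptions for illustration):

```
    Merge = Table.NestedJoin(DataSet1, {"Custom.1", "ColumnID"},
        DataSet2, {"Custom.1", "ColumnID"}, "NewColumn")
```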

And, as you can see, we have a column that needs our help. Let’s expand that new column, but bring in only the column with the name Value, like this:

image

and the result of that should be:

image

We are almost there!

So we have the data how we want it, but now we want to clean it up, as it has some columns that we don’t really need. Just remove those columns using the Remove Columns button from the Power Query UI (User Interface):

image

This looks more like it! But wait – we need to set some data types, and perhaps rename those columns so they're readable for an end user:

So first we change the data type for both columns as desired:

image

Next, we rename the columns just by double-clicking each column name:

image

In Chandoo’s solution he adds an index column to the end result, so we’ll do the same:

image

And this is our final result, after we reorder our columns just by drag-and-drop:

image

Once it's loaded into our worksheet, it'll look like this:

image

Things to take into consideration

  1. This is a dynamic solution. You can add new columns and/or more rows and it will still work as desired.
  2. This WILL work in the cloud with Power BI, so you can rest assured that what we just described is something that we can refresh in the cloud. (VBA doesn't work in the cloud.)
  3. We could potentially transform this into a function and execute that function against multiple sheets, tables and even defined names within one or multiple workbooks. (Read more about that here.)
  4. If the result of this transformation has an extensive amount of rows, we could load it directly into our Data Model instead of experiencing the performance drawback of loading it into a worksheet. (Things get seriously slow when a workbook gets way too many rows.)

Power Query is a tool that was created specifically for scenarios that needed transformation like this one.  Microsoft is on a roll by delivering monthly updates of Power Query and you can bet that it will continue to get better and better each month, as it already has for the past year and a half to date.

Try Power Query if you haven’t done so yet. Download the latest version here.

Force Power Query to Import as a Text File

I’ve run into this issue in the past, and also got an email about this issue this past week as well, so I figured it’s worth taking a look.  Power Query takes certain liberties when importing a file, assuming it knows what type of file it is.  The problem is that sometimes this doesn’t work as expected, and you need to be able to force Power Query to import as a text file, not the file format that Power Query assumes you have.

IT standards are generally a beautiful thing, especially in programming, as you can rely on them, knowing that certain rules will always be followed.  CSV files are a prime example of this, and we should be able to assume that any CSV file will contain a list of Comma Separated Values, one record per line, followed by a new line character.  Awesome… until some bright spark decides to inject a line or two of information above the CSV contents which doesn’t contain any commas.  (If you do that, please stop.  It is NOT A GOOD IDEA.)

The Issue in the Real World

If you’d like to follow along, you can click here to download MalformedCSV.csv (the sample file).

If you open the sample file in Notepad, you’ll see that it contains the following rows:

SNAGHTML35a6fc

Notice the first row… that’s where our issue is.  There are no commas.  Yet when you look at the data in the rows below, they are plainly separated by commas.  Well yay, so what, right?  Who cares about the guts of a CSV file?  The answer is “you” if you ever get one that is built like this…

Let’s try importing the sample file into Power Query:

  • Power Query –> From File –> From CSV
  • Browse to MalformedCSV.csv

And the result is as follows:

SNAGHTML390f9a

One header, and lots of errors.  Not so good!

The Source Of The Error

If I click the white space beside one of those errors, I get this:

image

What the heck does that even mean?

Remember that CSV is a standard.  Every comma indicates a column break, every carriage return a new line.  And we also know that every CSV file has a consistent number of columns (and therefore commas) on every row.  (That’s why you’ll see records in some CSVs that read ,, – because there isn’t anything for that record, but we still need the same number of commas to denote the columns.)

And now some joker builds us a file masquerading as a CSV that really isn’t.  In this case:

  • Our first row has no commas before the line feed.  We therefore must have a one column table.  Power Query sets us up for a one column table.
  • But our second row has three commas, which means four columns… That’s not the same number of columns as our header row, so Power Query freaks out and throws an error for every subsequent record.

So Now What?

If we can’t rely on the file format being correct, why don’t we just import it as a text file?  That would allow us to bring all the rows in, remove the first one, then split by commas.  That sounds like a good plan.  Let’s do it.

  • Power Query –> From File –> From Text
  • Browse to the folder that holds MalformedCSV.csv

Uh oh… our file is not showing.  Ah… we’re filtered to only show *.txt and *.prn files…

  • Click the file filter list in the bottom right and change “Text File (*.txt;*.prn)” to “All Files (*.*)”
  • Open MalformedCSV.csv

And the result…

SNAGHTML390f9a

Damn.  See, Power Query is too smart.  It looks at the file and says “Hey!  That’s not a text file, it’s a CSV file!” and then imports it as a CSV file... which we already know has issues.  Grr…

Force Power Query to Import as a Text File

Let’s try this again, this time from scratch.  We need two things here…

  1. The full file path to the file you want to import (including the file extension).  On my system it is "D:\Test\MalformedCSV.csv"
  2. A little bit of a code template, which is conveniently included below.

What we’re going to do is this:

  • Go to Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor
  • Paste in the following code

let
    /* Get the raw line by line contents of the file, preventing PQ from interpreting it */
    fnRawFileContents = (fullpath as text) as table =>
    let
        Value = Table.FromList(Lines.FromBinary(File.Contents(fullpath)), Splitter.SplitByNothing())
    in
        Value,

    /* Use function to load file contents */
    Source = fnRawFileContents("D:\Test\MalformedCSV.csv")
in
    Source

  • Update the file path in the “Source” step to the path to your file.
  • Click Done

And the output is remarkably different… in fact, it’s the entire contents of the file!

SNAGHTML4af3ce

This is awesome, as we now have the ability to clean up our data and use it as we were hoping to do from the beginning!  So let’s do just that…starting with killing off that first line that’s been screwing us up:

  • Home –> Remove Rows –> Remove Top Rows –> 1
  • Transform –> Split Column By Delimiter –> Comma –> At each occurrence of the delimiter
  • Transform –> Use First Row As Headers
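Those three ribbon clicks translate to M roughly as follows (the split assumes a four-column file, and the splitter options are simplified to the common defaults):

```
    // Drop the malformed first line
    RemovedTopRows = Table.Skip(Source, 1),
    // Split the single column at every comma
    SplitByDelimiter = Table.SplitColumn(RemovedTopRows, "Column1",
        Splitter.SplitTextByDelimiter(","),
        {"Column1.1", "Column1.2", "Column1.3", "Column1.4"}),
    // Promote the first remaining row to column headers
    PromotedHeaders = Table.PromoteHeaders(SplitByDelimiter)
```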

And now we’ve got a pretty nice table of data that actually looks like our CSV file:

SNAGHTMLa3388b

The final steps to wrap this up would essentially be

  • set our data types,
  • name the query, and
  • load the query to the final destination.

Final Thoughts

This seems too hard.  I know that Power Query is designed for ease of use, and to that end it’s great that it can and does make assumptions about your file.  Most of the time it gets it exactly right and there is no issue.  But when things do go wrong it's really hard to get back to a raw import format so that we can take control.

I really think there should be an easy and discoverable way to do a raw import of data without making import/formatting assumptions.  A button for “Raw Text Import” would be a useful addition for those scenarios where stuff goes wrong and needs a careful hand.

I should also mention that this function will work on txt files or prn files as well.  In fact, it also works on Excel files to some degree, although the results aren’t exactly useful!

image

The key here is that, whether the issue is caused by one or more extra header lines in a CSV file, or by a tab-delimited, colon-delimited or any other kind of delimited file, the small function template above will help you get past that dreaded message that reads “DataFormat.Error:  There were more columns in the result than expected.”

Addendum To This Post

Miguel Escobar has recorded a nice video of the way to do this without using any code.  You can find that here:


Building a Parameter Table for Power Query

One of the things that I’ve complained about in the past is that there is no built in way for Power Query to be able to pull up the path for the current workbook.  Today I’m going to show you how I solved that problem by building a parameter table for Power Query in Excel, then link it into my queries.

Quick Note!  WordPress is determined to drive me crazy and is replacing all of my straight quotes (SHIFT + ') with curly quotes.  Since curly quotes are different characters, they will cause issues if you copy/paste the code directly.  I'll get this sorted out, but in the mean time, just make sure you replace all "curly quote" characters with straight quotes.

To do this successfully, we need two pieces: an Excel table, and a Power Query function.  So let’s get to it.

Building a Parameter Table

The table is quite simple, really.  It’s a proper Excel Table, and it has a header row with the following two columns:

  • Parameter
  • Value

Once created, I also make sure that I go to the Table Tools tab, and rename the table to “Parameters”.

SNAGHTML192eb95

Pretty bare bones, but as you’ll see, this becomes very useful.

Now we add something to the table that we might need.  Since I’ve mentioned it, let’s work out the file path:

  • A8:     File Path
  • B8:     =LEFT(CELL("filename",B6),FIND("[",CELL("filename",B6),1)-1)

Now, as you can see, column A essentially gives us a “friendly name” for our parameter, where the value ends up in the second column:

SNAGHTML195b12d

While we’re here, let’s add another parameter that we might have use for.  Maybe I want to base my Power Query reports off the data for the current day.  Let’s inject that as well:

  • A9:     Start Date
  • B9:     =TODAY()

SNAGHTML197a4ef

Good stuff.  We’ve now got a table with a couple of useful parameters that we might want when we’re building a query.

Adding the Parameter Function

Next, we’re going to add the function that we can reference later.  To do that:

  • Go to Power Query –> From Other Sources –> Blank Query
  • Go to View –> Advanced Editor
  • Replace all the code with the following:

(ParameterName as text) =>
let
    ParamSource = Excel.CurrentWorkbook(){[Name="Parameters"]}[Content],
    ParamRow = Table.SelectRows(ParamSource, each ([Parameter] = ParameterName)),
    Value =
        if Table.IsEmpty(ParamRow) = true
        then null
        else Record.Field(ParamRow{0}, "Value")
in
    Value

  • Click Done
  • Rename the function to “fnGetParameter”
  • Go to Home –> Close & Load To…
  • Choose to create the connection only, avoiding loading to a table or the Data Model

And just in case you think this means it didn’t work – seeing that it didn’t load to a table is exactly what we expect in the queries window:

image

Making Use of the Parameter Table

Okay, so now that we have this in place, how do we go about using it?

Let’s create a ridiculously simple table:

SNAGHTML1a313c0

Now, click in the table and go to Power Query –> From Table.

We’ll be taken into the Power Query window and will be looking at our very simple data.  Let’s pull the contents of my parameters into columns:

  • Go to Add Column –> Add Custom Column
  • Change the title to “File Path”
  • Enter the following formula: =fnGetParameter("File Path")

Check it out!

image

Do you see that?  This is the path to the folder that holds the workbook on my system.  The formula we used in the table retrieves that, and I can pass it in to Power Query, then reference it as needed!

How about we do the same for our date?

  • Go to Add Column –> Add Custom Column
  • Change the title to “First date”
  • Enter the following formula: =fnGetParameter("Start Date")

image

The key here is to make sure that the value passed to the parameter function is spelled (and cased) the same as the entry in the first column of the parameter table.  I.e. You could use “FilePath”, “File Path”, “Folder”, “My File Path” or whatever, so long as that’s the name you gave it in the first column of the Parameters Excel table.

And what happens if you pass an invalid value?  Say you ask for fnGetParameter("Moldy Cheese") and it’s not in the table?  Simple: you’ll get null returned instead.  :)

Implications of Building a Parameter Table for Power Query

The implications for this technique are huge.  Consider this scenario… you create your workbook, and store it in a folder.  But within that folder you have a subfolder called “Data”.  Your intention is to store all of your csv files in that folder.  And, for argument’s sake, let’s say that it’s a mapped drive, with the path to your solution being “H:\My Solution\”

No problem, you build it all up, and it’s working great.  You keep dropping your text files in the data folder, and you can consolidate them with some M code like this:

let
    Source = Folder.Files("H:\My Solution\Data"),
    #"Combined Binaries" = Binary.Combine(Source[Content]),
    #"Imported CSV" = Csv.Document(#"Combined Binaries",null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV")
in
    #"First Row as Header"

Things run along for ages, and that’s awesome, but then you need to go on vacation.  No worries, it’s Power Query and easy to use, you can just get your co-worker to update it… except… on your co-worker’s machine that same drive is mapped not to the H:\ drive, but the F:\ drive.  Doh!

We could recode the path, but what a pain.  So how about we use the parameter table to make this more robust so that we don’t have to?  All we need to do is modify the first couple of lines of the query.  We’ll pull in a variable to retrieve the file path from our parameter table, then stuff that into the file path, like this:

let
    SolutionPath = fnGetParameter("File Path"),
    Source = Folder.Files(SolutionPath & "Data"),
    #"Combined Binaries" = Binary.Combine(Source[Content]),
    #"Imported CSV" = Csv.Document(#"Combined Binaries",null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV")
in
    #"First Row as Header"

How awesome is that?  Even better, the SolutionPath shows as a step in the Applied Steps section.  That means you can select it and make sure the value is showing up as you’d expect!

Practical Use Case

Several months back I built a solution for a client where we stored his Power Query solution in a folder, and we had data folders that were created on a bi-weekly basis.  Each of those folders was named based on the pay period end date (in ISO’s yyyy-mm-dd format), and was stored in a path relative to the master solution.

Naturally, I needed a way to make the queries dynamic to the master folder path, as I did the development in my home office, then emailed the updated solution files to him in his New York office.  He had different drive mappings than I do, and his team had different drive mappings than he did.  With half a dozen Power Queries in the solution, manually updating the code each time a new user wanted to work with the file just wasn’t an option.

This technique became invaluable for making the solution portable.  In addition, by having a formula to generate the correct end date, we could also pull the data files from the right place as well.

I still want a Power Query CurrentWorkbook.FilePath method in M, but even if I get it this technique is still super useful, as there will always be some dynamic parameter I need to send up to Power Query.

I hope you find this as useful as I have.

Combine Multiple Worksheets Using Power Query

In last week’s post we looked at how to combine multiple files together using Power Query.  This week we’re going to stay within the same workbook, and combine multiple worksheets using Power Query.

Background Scenario

Let’s consider a case where the user has been creating a transactional history in an Excel file.  It is all structured as per the image below, but resides across multiple worksheets; one for each month:

image

As you can see, they’ve carefully named each sheet with the month and year.  But unfortunately, they haven’t formatted any of the data using Excel tables.

Now the file lands in our hands (you can download a copy here if you’d like to follow along), and we’d like to turn this into one consolidated table so that we can do some analysis on it.

Accessing Worksheets in Power Query

Naturally we’re going to reach to Power Query to do this, but how do we get started?  We could just go and format the data on each worksheet as a table, but what if there were hundreds?  That would take way too much work!

But so far we’ve only seen how to pull Tables, Named Ranges or files into Power Query.  How do we get at the worksheets?

Basically, we’re going to start with two lines of code:

  • Go to Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

You’ll now see the following blank query:

let
    Source = ""
in
    Source

What we need to do is replace the second line (Source = "") with the following two lines of code:

FullFilePath = "D:\Temp\Combine Worksheets.xlsx",
Source = Excel.Workbook(File.Contents(FullFilePath))

Of course, you’ll want to update the path to the full file path for where the file is saved on your system.

Once you click Done, you should see the following:

image

Cool!  We’ve got a list of all the worksheets in the file!

Consolidating the Worksheets

The next step is to prep the fields we want to preserve as we combine the worksheets.  Obviously the Name and Item columns are redundant, so let’s do a bit of cleanup here.

  • Remove the Kind column
  • Select the Name column –> Transform –> Data Type –> Date
  • Select the Name column –> Transform –> Date –> Month –> End of Month
  • Rename the Name column to “Date”
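In the Advanced Editor, those four steps come out along these lines (a sketch – the step names are the UI's defaults):

```
    RemovedColumns = Table.RemoveColumns(Source, {"Kind"}),
    ChangedType = Table.TransformColumnTypes(RemovedColumns, {{"Name", type date}}),
    CalculatedEndOfMonth = Table.TransformColumns(ChangedType, {{"Name", Date.EndOfMonth, type date}}),
    RenamedColumns = Table.RenameColumns(CalculatedEndOfMonth, {{"Name", "Date"}})
```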

At this point, the query should look like so:

image

Next we’ll click the little double headed arrow to the top right of the data column to expand our records, and commit to expanding all the columns offered:

SNAGHTML52dcfe5

Hmm… well that’s a bit irritating.  It looks like we’re going to need to promote the top row to headers, but that means we’re going to overwrite the Date column header in column 1.  Oh well, nothing to be done about it now, so:

  • Transform –> Use First Row As Headers –> Use First Row As Headers
  • Rename Column1 (the header won’t accept 1/31/2008 as a column name) to “Date” again
  • Rename the Jan 2008 column (far right) to “Original Worksheet”

Final Cleanup

We’re almost done, but let’s just do a bit of final cleanup here.  As we set the data types correctly, let’s also make sure that we remove any errors that might come up from invalid data types.

  • Select the Date column
  • Home –> Remove Errors
  • Set Account and Dept to Text
  • Set Amount to Decimal Number
  • Select the Amount column
  • Home –> Remove Errors
  • Set Original Worksheet to Text

Rename the query to “Consolidated”, and load it to a worksheet.

Something Odd

Before you do anything else, Save the File.

To be fair, our query has enough safeguards in it that we don’t actually have to do this, but I always like to play it safe.  Let’s review the completed query…

Edit the Consolidated query, and step into the Source line step.  Check out that preview pane:

image

Interesting… two more objects!  This makes sense, as we created a new table and worksheet when we retrieved this into a worksheet.  We need to filter those out.

Getting rid of the table is easy:

  • Select the drop down arrow on the Kind column
  • Uncheck “Table”, then confirm when asked if you’d like to insert a step

Select the next couple of steps as well, and take a look at the output as you do.

Aha!  When you hit the “ChangedType” step, something useful happens… we generate an error:

image

Let’s remove that error from the Name column.

  • Select the Name column –> Home –> Remove Errors

And we’re done.  We’ve managed to successfully combine all the data worksheets in our file into one big table!

Some Thoughts

This method creates a bit of a loop in that I’m essentially having to reach outside Excel to open a copy of the workbook to pull the sheet listing in.  And it causes issues for us, since Power Query only reads from the last save point of the external file we’re connecting to (in this case this very workbook.)  I’d way rather have an Excel.CurrentWorkbook() style method to read from inside the file, but unfortunately that method won’t let you read your worksheets.

It would also be super handy to have an Excel.CurrentWorkbookPath() method.  Hard coding the path here is a real challenge if you move the file.  I’ve asked Microsoft for this, but if you think it is a good idea as well, please leave a comment on the post.  (They’ll only count one vote from me, but they’ll count yours if you leave it here!)

Merge Multiple Files With Properties

This post illustrates a cool technique that I learned at the MVP summit last week, allowing us to use Power Query to merge multiple files with properties from the file in the output.  A specific example of where this is useful is where you have several files with transactional data, saved with the month as the file name, but no date records in the file itself.  What we’d want to do is merge all the contents, but also inject the filename into the records as well.

The funny thing about this technique is that it’s eluded me for a long time, mainly because I way over thought the methods needed to pull it off.  Once you know how, it’s actually ridiculously simple, and gives us some huge flexibility to do other things.  Let’s take a look at how it works.

If you’d like to download the files I’m using for this example, you can get them here.  You'll find that there are 3 files in total: Jan 2008.csv, Feb 2008.csv and Mar 2008.csv.

Step 1: Create your query for one file

The first thing we need to do is connect to the Jan 2008.csv file and pull in its contents.  So let’s do that:

  • Power Query –> From File –> From CSV
  • Browse to the Jan 2008.csv file and import it
  • Rename the “Sum of Amount” column to “Amount”

Perfect, we now have a basic query:

SNAGHTML63ab03

Notice here how the January file has no dates in it?  That’s not good, so that’s what we’re trying to fix.

Step 2: Turn the query into a function

At this point we need to do a little code modification.  Let’s go into the Advanced editor:

  • View –> Advanced Editor

We need to do two things to the M code here:

  • Insert a line at the top that reads:  (filepath) =>
  • Replace the file path in the Source step (including the quotes) with:  filepath

At that point our M will read as follows:

image
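For reference, the modified function reads roughly like this (a sketch – the Csv.Document options recorded by the original import are simplified away here):

```
(filepath) =>
let
    Source = Csv.Document(File.Contents(filepath)),
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    RenamedColumns = Table.RenameColumns(FirstRowAsHeader, {{"Sum of Amount", "Amount"}})
in
    RenamedColumns
```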

We can now:

  • Click Done
  • Rename our Query to something like fnGetFileContents
  • Save and close the query

Power Query doesn’t do a lot for us yet, just giving us this in the Workbook Queries pane:

image

Step 3: List all files in the folder

Now we’re ready to make something happen.  Let’s create a new query…

  • Power Query –> From File –> From Folder
  • Browse to the folder that holds the CSV files
  • Remove all columns except the Name and Folder Path

Wait, what?  I removed the column with the binary content?  The column that holds the details of the files?  You bet I did!  You should now have a nice, concise list of files like this:

image

Next, we need to add a column to pull in our content, via our function.  So go to:

  • Add Column –> Add Custom Column
  • Enter the following formula:  =fnGetFileContents([Folder Path]&[Name])

Remember it is case sensitive, but when you get it right, some magic happens.  We get a bunch of “Table” objects in our new column… and those Table objects hold the contents of the files!

I know you’re eager to expand them, but let’s finish prepping the rest of the data first.

Step 4: Prep the rest of the data you want on the table rows

Ultimately, what I want to do here is convert the file name into the last day of the month.  In addition, I don’t need the Folder Path any more. So let’s take care of business here:

  • Remove the “Folder Path” column
  • Select the “Name” column –> Transform –> Replace Values –> “.csv” with nothing
  • Select the “Name” column –> Transform –> Data Type –> Date
  • Select the “Name” column –> Transform –> Date –> Month –> End of Month

And we’ve now got a pretty table with our dates all ready to go:

image

Step 5: Expand the table

The cool thing here is that, when we expand the table, each row of the table will inherit the appropriate value in the first column.  (So all rows of the table in row 1 will inherit 2/29/2008 as their date.)

  • Click the little icon to the top right of the Custom column
  • Click OK (leaving the default of expanding all columns)
  • Rename each of the resulting columns to remove the Custom. prefix

And that’s it!  You can save it and close it, and it’s good to go.
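For reference, the finished query's M will look roughly like this sketch.  (The folder path is obviously yours to fill in, and the list of columns in the expand step depends on what's actually inside your CSV files — I'm only assuming "Amount" here:)

```
let
    // Point this at the folder that holds your CSV files
    Source = Folder.Files("C:\Demo\CSVs"),
    #"Removed Other Columns" = Table.SelectColumns(Source, {"Name", "Folder Path"}),
    // Feed each file to our function; each row gets a Table object of its contents
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom",
        each fnGetFileContents([Folder Path] & [Name])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom", {"Folder Path"}),
    // Turn "Jan 2008.csv" into a month-end date
    #"Replaced Value" = Table.ReplaceValue(#"Removed Columns", ".csv", "", Replacer.ReplaceText, {"Name"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Replaced Value", {{"Name", type date}}),
    #"End of Month" = Table.TransformColumns(#"Changed Type", {{"Name", Date.EndOfMonth}}),
    // Expand the Table objects; every row inherits its file's date
    #"Expanded Custom" = Table.ExpandTableColumn(#"End of Month", "Custom", {"Amount"}, {"Amount"})
in
    #"Expanded Custom"
```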

A little bit of thanks

I want to throw a shout out to Miguel Llopis and Faisal Mohamood from the Power Query team for demonstrating this technique for the MVPs at the summit.  I’ve been chasing this for months, and for some reason tried to make it way more complicated than it needed to be.

What’s the next step?

The next logical step is to extend this to working with Excel files, consolidating multiple Excel files together; something we can’t do through the UI right now.  Watch this space, as that technique is coming soon!

Name Columns During a Merge

The timing for the release of the October Power Query update couldn’t really have been much better for me, especially since it’s got a cool new feature in that you can now name columns during a merge.  Why is this timing so cool?

The reason is that I’m at the MVP Summit in Redmond this week, an event where the MVP’s and Microsoft product teams get together to discuss the things that are/aren’t working in the products, and discuss potential new features.  The tie in here is that I already blogged about the new method for Merging Columns With Power Query that came into effect with the August update and, in that blog post, I said:

I do wish this interface had a way to name the column in advance (like exists when you create a custom column.)  Hopefully the PQ team will retrofit us with that ability at some point in the future.

The October Power Query update (released Oct 27, 2014), includes the ability to name columns during a merge, rather than making you do this in two steps.  How cool is that?

image

While I’m under no illusions that this change was made based on my feedback alone, I do know that the Power Query team reads my blog, so it played a part.  This is one of the big reasons I go to the summit every year, to share ideas that make the product more intuitive/usable, so it’s cool to see one of those changes make it into the product a few days before I arrive.

By the time this post is published on Wednesday morning, we’ll already be into our final day of the 2014 Summit.  I’m going in pretty jazzed though, as I know that the product teams are listening and reacting to our feedback.  :)

Tame Power Query Workbook Privacy Settings

I recently built a cool Excel solution at work that uses Power Query to reach out and grab some weather data, then upload it into a database.  We use that data in our dashboards, as weather is pretty important to the golf industry in which I work.  But then I went to deploy the file, and needed to find a way to tame the Power Query Workbook Privacy settings.

The Issue

What happens is, every time a user runs a Power Query that was last saved by another user, they are prompted to set the Workbook’s Privacy level.  This is maddening, as we have two staff that use this workbook, where one covers for the other while they’re away.  Naturally, long lapses of time can occur in between… just long enough to forget what to do when you’re prompted by this frustrating message:

image

So while I can (and have) set the privacy level for the web data they are going to retrieve (see my tool here), I have no way to permanently set the Workbook’s Privacy level.  Worse, if the user clicks Cancel or chooses the wrong privacy level, the query fails (even trying to protect the output table structure using Chris Webb’s technique here doesn’t help).  The table generates an error, and all business logic in the workbook is blown apart.  The only recourse is to exit the file without saving and try again.

Naturally, this concerns them, and they call to make sure they do the right thing.  That’s awesome, and I wouldn’t change that at all.  But the deeper underlying problem is that Power Query’s Workbook security is engineered to require developer interaction.  And that is bad news!

How to Tame Power Query Workbook Privacy Settings

Unfortunately I can’t use VBA to set this in the workbook (or at least, I haven’t tried, anyway), but I can work a little trick to at least warn my users when they are about to be prompted, and remind them which privacy level they need to select.  Here’s what I need to do in order to make that work:

Step 1: Create a range to hold the last user

  • Create a named range somewhere in the workbook called “rngLastUser”  (I put mine on the “Control Panel” worksheet and hid it.)
  • Drop the following code in a Standard VBA Module:

Public Sub UpdateLastUser()
    'Record the current user as the last person to refresh the queries
    With ThisWorkbook.Worksheets("Control Panel")
        .Range("rngLastUser") = Application.UserName
    End With
End Sub

Step 2: Create a macro to update your Power Queries

Public Sub GetWeather()
    With ThisWorkbook
        'Check if privacy level will need to be set
        If .Worksheets("Control Panel").Range("rngLastUser") <> Application.UserName Then
            MsgBox "You are about to be prompted about Privacy Levels" & vbNewLine & _
                   "for the Current Workbook. When the message pops up, " & vbNewLine & _
                   "you'll see an option to 'Select' to the right side of the Current Workbook." & vbNewLine & _
                   vbNewLine & _
                   "Please ensure you choose PUBLIC from the list.", vbOKOnly + vbInformation, _
                   "Hold on a second..."
        End If

        'Refresh the Power Query table
        .Worksheets("Weather").ListObjects("WeatherHistory").QueryTable.Refresh BackgroundQuery:=True
    End With
    Call UpdateLastUser
End Sub

Step 3: Link the macro to a button for your users

  • Link my GetWeather() routine to a button
  • And we’re good!

What Did We Do?

So basically, what I did here was this:

  • Every time the user clicks the button…
  • Excel checks the contents of rngLastUser to see if the username is the same as the current user
    • If it is, it just goes on to refresh the table
    • If it’s not, it kicks up the following message:

image

    • After the user clicks OK (if necessary), then it prompts the user to set the security level. Yes, they can still get it wrong, but at least they have a chance now!
    • Once the security level is set, the macro goes on to refresh the table
  • After the table is refreshed, Excel updates the rngLastUser cell to the name of the current user.

And that’s it.  We now have a system that will prompt our users with the correct answer, so that they don’t have to come back and ask us every time.

Thoughts On The Security Model

Ideally it would be nice to not have to do this, and there is – in fact – a way.  Microsoft’s answer is “Yes, just enable Fast Combine.”  That’s great and all, but then it ignores all privacy levels and runs your query.  What if, as a developer, I need a way to ensure that what I’ve built stays Public, Private or Organizational?  What if it actually matters?

To me, Fast Combine is akin to setting Macro Security to Low in Excel 2003 and earlier.  Basically, we’re saying “I don’t like nagging messages, so let’s run around in a war zone with no bullet proof vest.”  Sure, you might come out alive, but why should you take that risk?

In my opinion, the Power Query security model needs work.  Even if we could assign a digital certificate to the query to prove it (and the privacy levels) had not been modified, that would be great.  I’m not sure exactly what it is we need, but we need something better than what we have today.

Refresh Power Query With VBA

When I’ve finished building a solution in Excel, I like to give it a little polish, and really make it easy for my users to update it.  The last thing I want to do is to send them right clicking and refreshing everywhere, so I program a button to refresh Power Query with VBA.

The interesting part about the above statement is that Power Query doesn’t have any VBA object model, so what kind of black magic trick do we need to leverage to pull that off?  As it turns out, it’s very simple… almost too simple in fact.

A Simple Query

Let’s just grab the sample data file from my post on pulling Excel named ranges into Power Query.  Once we’ve done that:

  • Click in the blue table
  • Go to Power Query –> From Table
  • Let’s sort Animal ascending (just so we know something happened)
  • Then save and exit the query

At this point, we should get a new “Sheet2” worksheet, with our table on it:

SNAGHTML34cf68

The Required VBA Code

Next, we need to build our VBA for refreshing the table.  Rather than record and tweak a macro, I’m just going to give you the code that will update all Query Tables in the entire workbook in one shot.  But to use it, you need to know the secret handshake:

  • Press Alt + F11

This will open the Visual Basic editor for you.  If you don’t see a folder tree at the left, then press CTRL+R to make it show up.

  • Find your project in the list (it should be called “VBAProject (Selecting Data.xlsx)”)
  • Right click that name and choose “Insert Module”
  • In the window that pops up, paste in the following code:

Public Sub UpdatePowerQueries()
    'Macro to update my Power Query script(s)
    Dim cn As WorkbookConnection

    'Refresh any connection whose name starts with "Power Query -"
    For Each cn In ThisWorkbook.Connections
        If Left(cn.Name, 13) = "Power Query -" Then cn.Refresh
    Next cn
End Sub

Now, I’ll admit that I find this a little looser than I generally like.  By default, all Power Query scripts create a new connection named “Power Query - ” followed by the name of your query.  I’d prefer to check the type of query, but this will work.

Speaking of working, let’s prove it…  But first, close the Visual Basic Editor.

Proving The Refresh Works

The easiest way to do this is to go back to the table on Sheet 1 and add a new row to the table.  I’m going to do that first, then I’m going to:

  • Press Alt + F8
  • Choose “UpdatePowerQueries”
  • Click Run
  • Go back to Sheet2 to verify it’s updated

If all goes well, you should now have another row of data in your table, as I do:

image

Adding Polish

Let’s face it, that’s probably harder than going to Data –> Refresh All.  The goal here was to make it easier for my users.  So let’s do that now.

  • Return to Sheet 1
  • Go to the Developer Tab (if you don’t see it, right click the ribbon, choose “Customize Ribbon” and check the box next to the Developer tab to expose it)
  • Click Insert and select the button in the top left

image

  • Left click and drag a button onto your worksheet

When you let go, you’ll be prompted to assign a macro.

  • Choose “UpdatePowerQueries” and click OK
  • While the dots are still on the corners, click in the text
  • Backspace it out and replace it with something helpful like “Update Queries” (if you click elsewhere, you’ll need to right click the button to get the selection handles back.)
  • Click in the worksheet to de-select the button

SNAGHTML57873a

That’s it.  Test it again by adding some more data to the table then clicking the button.

Ramifications

I write a lot of VBA to make my users’ lives easier, and generally use this kind of technique as part of a bigger goal.  But regardless, it can still be useful as a standalone routine if you want to avoid having to train users on how to do things through the ribbon.

Create Dynamic Table Headers With Power Query

In my last post, I looked at how to pull a named range into Power Query and turn it into a table of data.  Today we’re going to look at how to create dynamic table headers with Power Query, but using a slightly different, slightly more complicated way to do the same task.  Why?  Well, the reality is that sometimes the simple method won’t work for us.

Background

In this scenario, we’re going to use the same file I used in my last post, we’re just going to build the output differently.  The key that I’m after here is that I want to pull the data range into Power Query, but I want to use a table.  The issue, however, as I mentioned in my last post, is:

If I set up the table with headers in row 3, it would convert the dates to hard numbers, thereby blowing apart the ability to easily update the dynamic column headers next year.  That would defeat the purpose of making the input sheet dynamic in the first place.

I’ve really struggled with this feature in Tables, where it converts your headers to hard values.  So many of the tabular setups I create use dynamic headers, it’s actually more rare that they don’t.  So how do we work around this?  How do we create dynamic table headers in Excel?

Setting Up The Table For Success

It’s actually a LOT easier than you might think.  I use this very simple trick with both PivotTables and regular Tables, allowing me to take advantage of their power, but still control my headers and make them dynamic.  Here’s how:

Step 1: Add Static Headers Manually

  • Insert 2 rows below the current dynamic date headers in Row 3
  • Put some static headers in Row 5 that are generic but descriptive.  (In this case, CYM# means “Current Year, Month #”)

SNAGHTML1a520e81

The important part here is to make sure these static headers are each unique so that you can “unwind” them later with Power Query.

Step 2: Create The Table

Next, we need to create the table.  It’s going to cover A5:N24, and will therefore inherit the CYM column headers.  Since they are static values, it won’t make any changes to them, and my dynamic dates are still showing up top.

Step 3: Build a Translation Table

Huh?  A what?  Bear with me, as this will come clear a bit later.  Here’s how we do it:

  • Enter the following in B4:  =B3
  • Copy this across to cover B4:N4
  • Make sure B4:N4 is selected and go to Home –> Editing –> Find & Select –> Replace
  • Using the dialog, make the following two replacements:
    • Replace = with =$
    • Replace 3 with $3

This has the effect of making the formulas all absolute, which is important for our next step.
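As a concrete example, here’s what those two replacements do to the first formula in the row (and likewise across to column N):

```
B4 before:  =B3
After "=" becomes "=$":    =$B3
After "3" becomes "$3":    =$B$3
```

With every reference locked down as absolute, copying the formulas elsewhere (which we do next) won’t shift them off row 3.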

  • Select B4:N5 –> Right Click –> Copy
  • Select a cell down below your data range somewhere (I used A34)
  • Right click the cell and choose Paste Special
  • In the Paste Special options, choose the following:
    • Paste:  All
    • Check the Transpose box

Very cool, we’ve now got the beginnings of a table that is linked to the formulas in row 3.  We just need to finish it.

  • Add the column header of “Date” in A33
  • Add the column header of “Period” in B33
  • Format the data range from A33:B46 as a table
  • Rename the table to “DateTranslation”

SNAGHTML1a6575f9

Step 4: Final Header Cleanup

Row 4 has now served its purpose for us, so you can delete it, and then hide (the new) row 4.  The end result is that we have a header on our table that looks like it’s part of the table, but isn’t really.  The benefits here are that we can preserve the dynamic nature of it, and we have a wider variety of formatting options.

SNAGHTML1a67304b

We also have a completely separate date translation table too…

Setting Up The Power Query Scripts

So now we need to get the data into Power Query.  Let’s do that.

Step 1: Import and Reformat the Rounds Table

To do this we will:

  • Click somewhere in the main table
  • Power Query –> From Table
  • Remove the “TOTAL” column
  • Filter the first column to remove text that begins with “Total” and values that equal null
  • Filter the second column to remove null values
  • Right click the first column and Un-Pivot Other Columns
  • Rename the “Month” column to “Round Type”
  • Rename the query to “Budget”

At this point, you should have the following:

SNAGHTML1afa967c

This is great, but what about the Attribute column?  This is the whole big pain about using a table here: we don’t have dates.  Yes, we could have hard coded them, but then it would be very painful to update our query when our year changes.  So while we have something flexible (in a way) here, it isn’t really all that readable.  How can we change that?

Save and close the query, and let’s deal with this.
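The M behind the Budget query will look something like this sketch.  (The source table name and the first two column names — I'm assuming "Month" and "CYM1" — depend on your workbook, so adjust accordingly:)

```
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Removed Columns" = Table.RemoveColumns(Source, {"TOTAL"}),
    // Drop the "Total ..." subtotal rows and any blank rows
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns",
        each [Month] <> null and not Text.StartsWith(Text.From([Month]), "Total")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [CYM1] <> null),
    // Un-pivot the CYM# columns into Attribute/Value pairs
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows1", {"Month"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns", {{"Month", "Round Type"}})
in
    #"Renamed Columns"
```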

Step 2: Create Another Power Query

Let’s add the other table we built:

  • Click inside the DateTranslation table we created
  • Go to Power Query –> From Table
  • Click Close & Load –> Load To…
  • Click Only Create Connection –> Load

This will create a new Power Query that basically just reads your original table on demand.  It won’t refresh unless it’s called from another source.  Now we’re set to do something interesting.

Step 3: Re-Open The Budget Power Query

If the Workbook Queries pane isn’t open on the right, then go to Power Query –> Workbook Queries to show it.

  • Right click the Budget query and click Edit
  • On the Power Query Home tab, click Merge Queries (in the Combine group)
  • Select the Attribute column
  • From the drop down box, choose the DateTranslation query
  • Select the Period column
  • Make sure “Only Include Matching Rows” is checked and click OK

image

At this point you’ll get a new column of data in your Query.  Click the icon in the top right of the column to expand it:

image

Excellent… now we only need to pick the column we need, which is Date.  So uncheck the Period column and click OK.  Finally, we can remove the Attribute column and rename the “NewColumn.Date” column to Date, and we’ve got a pretty clean query:

SNAGHTML1b05acce

At this point we could call it a day, as we’ve pretty much accomplished the original goal.  I can update the Year cell in B1 and my Table’s “Headers” will update.  In addition, my Power Query will show the correct values for the dates as well.  Pretty cool, as I could now link this into Power Pivot and set a relationship against a calendar table without having to worry about how it would be updated.

Going One Level Deeper

One thing I don’t like about this setup is the need for the extra query.  That just seems messy, and I’d prefer to see just one query in my workbook.  The issue, though, is that I’m pulling data from two tables.  What’s cool is that, with a little editing of the M code, I can fix that.

Here’s the M for the query I’ve built, with a key line highlighted (twice):

image

As you can see, the “Merge” line is coloured yellow, but the name of the Query being merged is in orange.  Well, guess what: we don’t need to reach out to an external query here; we can reach to another named step in M.  Try this.  Immediately after the “let” line, enter the following:

TranslationTable = Excel.CurrentWorkbook(){[Name="DateTranslation"]}[Content],

Now, modify the “Merge” line to update the name of the table from “DateTranslation” to “TranslationTable”.  (The reason we’re doing this is that the original query still exists, so we can’t just name the first step “DateTranslation”, as it would conflict.)

Once we’ve made our modifications, the script will look as follows:

image

When you click “Done”, the query will reload and you’ll see an extra step in the “Applied Steps” box on the right.  What you won’t see, though, are any changes, as the data comes out the same.  Very cool, as we are now referencing both tables in a single step.  To prove this out, save this query, drop back to Excel and delete the “DateTranslation” query.  It will still work!

(The completed file can be downloaded here.)
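Pulling it all together, the consolidated single query ends up shaped roughly like this sketch.  (I’ve elided the clean-up steps shown earlier, and the step names are approximations of what the UI generates:)

```
let
    // New first step: read the translation table directly,
    // so the separate DateTranslation query is no longer needed
    TranslationTable = Excel.CurrentWorkbook(){[Name="DateTranslation"]}[Content],
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // ...the same filter / un-pivot / rename steps as before,
    // ending in a step called #"Renamed Columns"...
    // The Merge line now points at the TranslationTable step above
    Merge = Table.NestedJoin(#"Renamed Columns", {"Attribute"},
        TranslationTable, {"Period"}, "NewColumn", JoinKind.Inner),
    #"Expand NewColumn" = Table.ExpandTableColumn(Merge, "NewColumn", {"Date"}, {"NewColumn.Date"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expand NewColumn", {"Attribute"}),
    #"Renamed Date" = Table.RenameColumns(#"Removed Columns", {{"NewColumn.Date", "Date"}})
in
    #"Renamed Date"
```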

Ending Thoughts

I really like this technique.  It lets me dynamically change the column names, yet still use those to link them into my data model tables.  But even more, I like the fact that, with a minor edit to the M code, I can keep my workbook from being littered with extra queries.  :)