Dynamic Database Queries

Some time ago I posted a technique to pass parameters to Power Query via an Excel table.  In a comment on that post, Anand asked “Can I pass the parameter to a SQL Table as well?”  The question could mean one of two things, but the answer to both is “yes”.

Dynamic Database Queries – Native Queries

The first thing it could mean is where you connect to a SQL database and feed it a native query like this:

SELECT  [ClubActivityTypeCode], [ClubMemberCategoryCode]
FROM  [tblClubFeeBillingSchedules]
WHERE  [ClubActivityTypeCode]='Golf'

And where we’d like to make the “Golf” part in the WHERE clause dynamic.  Power Query gives us code for this step as follows:

Source = Sql.Database("servername", "databasename", [Query="SELECT  [ClubActivityTypeCode], [ClubMemberCategoryCode]#(lf)FROM  [tblClubFeeBillingSchedules]#(lf)WHERE  [ClubActivityTypeCode]='Golf'"])

Pretty ugly again, but we can absolutely make it dynamic if we use our parameter function.  Pulling the value in via fnGetParameter, it might look like this:

typecode = fnGetParameter("ActivityType"),

Source = Sql.Database("servername", "databasename", [Query="SELECT  [ClubActivityTypeCode], [ClubMemberCategoryCode]#(lf)FROM  [tblClubFeeBillingSchedules]#(lf)WHERE  [ClubActivityTypeCode]='" & typecode & "'"])

Notice that we’ve just inserted a variable to hold the value we pulled via the fnGetParameter function, then replaced the existing Golf term with that variable, joined to the surrounding query text by the & character on each side (and still wrapped in the single quotes the SQL needs).
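Pulled together, the whole query might look something like this minimal sketch (the server, database and parameter names are placeholders):

let
    // grab the activity type from our Excel parameter table
    typecode = fnGetParameter("ActivityType"),
    // splice the value into the native query between the single quotes
    Source = Sql.Database("servername", "databasename",
        [Query = "SELECT [ClubActivityTypeCode], [ClubMemberCategoryCode] " &
            "FROM [tblClubFeeBillingSchedules] " &
            "WHERE [ClubActivityTypeCode]='" & typecode & "'"])
in
    Source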

One caveat here: In the case of the query above, we’re passing a custom Select query, which gets flagged as a Native Database Query by Power Query.  This has security permission implications, and Power Query won’t even attempt to fold more steps into this query.

Dynamic Database Queries – Folded Queries

And here’s the other option… with a caveat up front: I’m probably going to butcher the explanation of Query Folding.  My intent isn’t to get this perfectly factually correct, but rather to give you a general sense of what’s going on.  (My impression is that we Excel folks don’t actually NEED to understand the inner workings of this piece to make use of it.)

So here goes… In contrast to the Native Database Query, we can avoid security prompts and let Power Query take advantage of query folding at the database level (pushing the process back to the database to complete.)  In layman’s terms this means we connect to the database, we pull in the data, then we do some operations.  Anything Power Query can get the database to do for us, it will “fold” into a single query.  And it will continue to fold more steps in until you try to do something that the database can’t do.  For that and all subsequent steps, Power Query’s engine takes over.

Dealing with these kinds of queries works in exactly the same fashion as what you saw in the original parameter table article.  We connect to the database, then perform subsequent steps in Power Query, like filtering out records.  And we use variables and the fnGetParameter function to feed values into those subsequent query steps.

An example of this (using a different database) is as follows, where the original code is:

Source = Access.Database(File.Contents("D:\Excelguru Courses\_Example Files\pos.accdb")),
_tblChits = Source{[Schema="",Item="tblChits"]}[Data],
FilteredRows = Table.SelectRows(_tblChits, each ([POSCategoryCode] ="Beer"))

And the modified example where we’ve made it dynamic:

category = fnGetParameter("POSCatCode"),
Source = Access.Database(File.Contents("D:\Excelguru Courses\_Example Files\pos.accdb")),
_tblChits = Source{[Schema="",Item="tblChits"]}[Data],
FilteredRows = Table.SelectRows(_tblChits, each ([POSCategoryCode] =category))

In this case we are still making dynamic database queries, we’re just feeding Power Query the info in a way that’s a bit easier for us to construct/review, and Power Query will encourage the database to fold it up and execute a query that is equivalent to SELECT * FROM tblChits WHERE POSCategoryCode = 'Beer'.

Which Route Should You Go?

Unless you’re a SQL Ninja, use the latter route.  We know that Power Query won’t fold into the most efficient query in all cases.  On the other hand, providing a native database query as your first step immediately shuts down query folding so that any subsequent steps can’t be folded.  So unless you can write wickedly efficient SQL, you could actually hurt your performance.

Connecting to the database, exploring your tables and using Power Query’s filtering is easier for most Excel pros anyway.  Easier to build, and easier to audit.  So why not go down the path of least resistance?

What the #? Literally…

I was chatting with my friend Scott of www.tinylizard.com today, discussing the structure of the M language, and how it’s kind of weird compared to VBA, C# and such. Well, “what the #?”, I figure.  I’m going to do a blog post on what the # sign actually signifies today, as it’s kind of confusing when you’re just starting with M.

A Little M Code:

I knocked up a bit of M code today to illustrate the challenge for someone just getting into this language.  Here’s a snap of the beginning of each line:

[Image: the beginning of each line of the M code]

Okay, so every line starts with something followed by the = character… but…

What the #?

The question that I’m looking at specifically is… why do some lines start with # and some don’t?  (The second line starts with Source = and the fourth starts with ReplacedValue =, but all the others start with the # character.)

Literally…

The secret is all in the step name.  Check out the recorded steps for this query:

[Image: the recorded steps for the query]

Notice anything about the two highlighted entries compared to the others?  No spaces!

When Power Query comes up against a space in the code it expects that there is another command or parameter coming next.  To deal with this, we need to tell Power Query that “Hey, I want you to literally use this…”  I’ve always heard this referred to as “escaping literals”, and M isn’t the only language that this happens in.

But it’s not sufficient to just use the # sign either.  It needs to be the # sign followed by the step name wrapped in a set of quotes… kind of like how we refer to text in formulas in Excel.  So basically what you get is:

#"Filtered Rows" =
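To see both forms side by side, here’s a minimal sketch (the table and column names are invented for illustration):

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // this step name contains a space, so it must be escaped as #"…"
    #"Filtered Rows" = Table.SelectRows(Source, each [Amount] > 0),
    // no space in this name, so no # or quotes are required
    SortedRows = Table.Sort(#"Filtered Rows", {{"Amount", Order.Ascending}})
in
    SortedRows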

I Don’t Like It!

Yeah, me either.  I think it makes the code look horribly ugly.  It’s easy to avoid if you want to, although it takes a little work.  Just right click the step, choose Rename, and remove the space:

[Image: renaming the step to remove the space]

At this point your M code will be updated to remove the literals, and you’ll be back to cleaner code.

You could also edit the M code manually to remove the # and the leading and trailing quotes.  If you do this, however, just make sure that you get all instances of them, as there will always be two in your recorded code, and possibly more if you’ve been manually tweaking it:

[Image: the M code with the # and surrounding quotes removed]

While I’m not a big fan of ugly code, I’ve also got to accept that I’m one of a very small percentage of users who will actually read it.  Most people will use the UI for this kind of stuff, so making it read easier there is probably the right design decision.

At any rate, now you know what the # character is all about.

Date Formats in Power Query

Date formats in Power Query are one of those little issues that drive me nuts… you have a query pulling together different information in Power Query, at least one column of which is a date.  But when you complete the query, it doesn’t show up as a date.  Why is this?

Demonstrating the Issue

Have a look at the following table from Excel, and how it loads in to Power Query:

[Image: the Excel table and how it previews in Power Query]

That looks good… plainly it’s a date/time type in Power Query, correct?  But now let’s try an experiment.  Load this to the worksheet:

[Image: the query loaded to the worksheet, showing date serial numbers]

Why, when we have something that plainly renders as a date/time FROM a date format, are we getting the date serial number?  Yes, I’m aware that this is the true value in the original cell, but it’s pretty misleading, I think.

It gets even better

I’m going to modify this query to load to BOTH the worksheet and the Excel data model.  As soon as I do, the format of the Excel table changes:

[Image: the worksheet table after also loading to the Data Model]

Huh?  So what’s in Power Pivot then?

[Image: the same column as it appears in Power Pivot]

Curious… they match, but Power Pivot is formatted as Text, not a date?

(I’ve missed this in the past and spent HOURS trying to figure out why my time intelligence functions in Power Pivot weren’t working.  They LOOK so much like datetimes it’s hard to notice at first!)

Setting Date Formats in Power Query

When we go back and look at our Power Query, we can discover the source of the issue by looking at the Data Type on the Transform tab:

[Image: the Data Type setting on the Transform tab]

By default the date column gets the “Any” data type.  What this means to you – as an Excel user – is that you could get anything out the other end.  No… that’s not quite true.  It means that it will come through as Text if Power Pivot is involved anywhere, or a Number if it isn’t.  I guess at least it’s consistent… sort of.

Fixing this issue is simple; it’s just annoying that we have to.  To take care of it, we simply select the column in Power Query, then change the data type to Date.

Unfortunately it’s not good enough to just say that you’ve set it somewhere in the query.  I have seen scenarios where – even though a column was declared as a date – a later step gets it set back to Any.

Recommendations

I’ve been irritated by this enough that I now advise people to make it a habit to set the data types for all of their columns in the very last step of the query.  This ensures that you always know EXACTLY what is coming out after all of your hard work and eliminates any surprises.
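In M terms, that habit amounts to ending every query with an explicit Table.TransformColumnTypes step.  Here’s a minimal sketch (the step and column names are placeholders to adapt to your own query):

// final step: declare every column's type so nothing leaves the query as Any
SetTypes = Table.TransformColumnTypes(PreviousStep,
    {{"Date", type date}, {"Account", type text}, {"Amount", type number}})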

Creating a VLOOKUP Function in Power Query

Tonight I decided to actually follow through on something I’d been musing about for a while:  building a full-fledged VLOOKUP function in Power Query.  Why?  Yeah… that’s probably a good question!

Replicating VLOOKUP’s exact match is REALLY easy in Power Query.  You simply take two tables and merge them together.  But the approximate match is a little harder to do, since you don’t have matching records on each side to merge together.

Now, to be fair, you could go the route of building a kind of case statement, as Chris Webb has done here.  In actual fact, you probably should do that if you want something that is lean and mean, and the logic won’t change.  But what if you wanted to maintain a table in Excel that holds your lookup values, making it easy to update? Shouldn’t we be able to take that and use it just like a VLOOKUP with an approximate match?  I don’t see why not.  So here’s my take on it.

Practical Use Cases

I see this as having some helpful use cases.  They’ll mostly come from Excel users who are experienced with VLOOKUP, maintain lookup tables, and reach back to that familiarity.  Such users would probably be tempted to do something like this:

[Image: data landed in the worksheet on its way into Power Pivot]

The concern, of course, is that landing data in the worksheet during this cycle contributes to file size, memory usage and ultimately re-calc speed, so if you can avoid this step on the way to getting it into Power Pivot, you plainly want to do that.

The cool thing is that by building this the way I’ve done it, you’re not restricted to landing your data in the worksheet to use VLOOKUP with it.  You can pull data into Power Query from any source (csv, text file, database, web page) and perform your VLOOKUP against your Excel table without that worksheet round trip.

Let’s Take a Look…

Now, I AM going to use Excel based data for this, only because I have a specific scenario to demonstrate.  You can download a sample file – containing just the data – from this link.  (The completed file is also available at the end of the post.)

So, we have a series of numbers, and want to look them up in this table:

[Image: the LookupTable]

I really used my imagination for this one and called it “LookupTable”.  Remember that, as we need that name later.  Note also that the first record is 1, not 0.  This was done to demonstrate that an approximate match can return a #N/A value, as you’ll see in a minute.

Now here’s what things would look like using standard Excel VLOOKUP formulas against that table:

[Image: standard Excel VLOOKUP formulas run against the table]

Hopefully this makes sense.  The formulas in columns 2, 3 and 4 are:

  • =VLOOKUP([@Values],LookupTable,2,TRUE)
  • =VLOOKUP([@Values],LookupTable,3)
  • =VLOOKUP([@Values],LookupTable,2,FALSE)

Just to recap the high points here… column 2 declares the final parameter as ,TRUE which will give us an approximate match.  Column 3 doesn’t declare the final parameter, which will default to ,TRUE and give an approximate match.  Column 4 declares the final parameter as ,FALSE which means we want an exact match.  The end result is that only one value matches, which is why we get all those #N/A results.

Standard VLOOKUP stuff so far, right?

Creating the VLOOKUP function in Power Query

Before we get to using the function, we need to create it.  To do that we’re going to go to:

  • Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

Highlight all the code in that window and replace it with this… (yes, it’s not short)

let
    pqVLOOKUP = (lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical) as any =>
    let
        /* Provide optional match if user didn't */
        matchtype =
            if approximate_match = null
            then true
            else approximate_match,

        /* Get name of return column */
        Cols = Table.ColumnNames(table_array),
        ColTable = Table.FromList(Cols, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        ColName_match = Record.Field(ColTable{0}, "Column1"),
        ColName_return = Record.Field(ColTable{col_index_number - 1}, "Column1"),

        /* Find closest match */
        SortData = Table.Sort(table_array, {{ColName_match, Order.Descending}}),
        RenameLookupCol = Table.RenameColumns(SortData, {{ColName_match, "Lookup"}}),
        RemoveExcess = Table.SelectRows(RenameLookupCol, each [Lookup] <= lookup_value),
        ClosestMatch =
            if Table.IsEmpty(RemoveExcess) = true
            then "#N/A"
            else Record.Field(RemoveExcess{0}, "Lookup"),

        /* What should be returned in case of approximate match? */
        ClosestReturn =
            if Table.IsEmpty(RemoveExcess) = true
            then "#N/A"
            else Record.Field(RemoveExcess{0}, ColName_return),

        /* Modify result if we need an exact match */
        Return =
            if matchtype = true
            then ClosestReturn
            else
                if lookup_value = ClosestMatch
                then ClosestReturn
                else "#N/A"
    in
        Return
in
    pqVLOOKUP

Now:

  • Click Done
  • Rename the function to “pqVLOOKUP”

All right… the function is there.  Now let’s go make use of it… (we’ll come back to how it works in a bit.)

Using the VLOOKUP function in Power Query

Now, before we go any further, I want to ask you a favour.  I need you to pretend for a second.  Pretend that the data we are connecting to next is a database, not an Excel table.  You’ll see how this can be useful if you’ll play along here.  (The only reason I’m using an Excel table for my source data is that it’s easier to share than a database.)

Let’s go click in the DataTable table.  (This one:)

[Image: the DataTable table]

Now, let’s upload this “database” into Power Query…

  • Go to Power Query –> From Table

You should have something like this now:

[Image: the DataTable loaded into Power Query]

Funny how Power Query reads the #N/A values as errors, but whatever.  Let’s get rid of those columns so that we’re left with just the Values column.

  • Right click Values –> Remove Other Columns

Now, we’re going to make a really small M code edit.

  • Go to View –> Advanced Editor
  • Copy the second line (starts with Source =…)
  • Paste it immediately above the line you just copied
  • Modify it to read as follows:
    • Source –> LookupSource
    • DataTable –> LookupTable

Your M code should now look as follows:

[Image: the modified M code]
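In other words, the M should now read roughly as follows (step names may vary slightly depending on how Power Query recorded them):

let
    LookupSource = Excel.CurrentWorkbook(){[Name="LookupTable"]}[Content],
    Source = Excel.CurrentWorkbook(){[Name="DataTable"]}[Content],
    #"Removed Other Columns" = Table.SelectColumns(Source, {"Values"})
in
    #"Removed Other Columns"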

  • Click Done

Nothing really appears to look different right now, but you’ll notice that you have an extra step called “LookupSource” on the right.  If you switch back and forth between that and Source, you’ll see we are looking at the original DataTable and the LookupTable.  The reason we do this is to make the next step really easy.

  • Go to Add Column –> Add Custom Column
  • Call the column 2 True
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,true)

Okay, so what’s what?

  • pqVLOOKUP is the name of our function we added above
  • [Values] is the value we want to look up
  • LookupSource is the table we want to look in to find our result
  • 2 is the column we want to return
  • true is defining that we want an approximate match

And, as you can see when you click OK, it works!

[Image: the approximate match results]

Let’s do the next two columns:

  • Go to Add Column –> Add Custom Column
  • Call the column 3 default
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,3)

So this time we asked for a return from the 3rd column, and we omitted the final parameter.  Notice that it defaulted to true for us:

[Image: the results using the default match type]

Last one…

  • Go to Add Column –> Add Custom Column
  • Call the column 2 false
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,false)

And how about that, all but one comes back with #N/A:

[Image: the exact match results, all but one #N/A]

And with that you can load this into a table in the worksheet:

[Image: the final results loaded to a worksheet table]

Notice that the results are identical to that of the original Excel table, with one exception… the #N/A I have provided is text, not an equivalent to the =NA() function.

The completed file is available here.

How Does the VLOOKUP Function in Power Query Actually Work?

This VLOOKUP actually has some advantages over the VLOOKUP we all know and love.  The most important is that we don’t need to worry if the list is sorted or not, as the function takes care of it for you.  It essentially works like this:

  • Pull in the data table
  • Sort it descending by the first column
  • Remove all records greater than the value being searched for
  • Return the value in the requested column for the first remaining record UNLESS we asked for an Exact match
  • If we asked for an Exact match then it tests to see if the return is a match and returns #N/A if it’s not

Some key design principles I used here:

  • The parameters are all in EXACTLY the same order as Excel’s VLOOKUP
  • The required, optional and default parameters match what you already know and use in Excel
  • The function is dynamic in that it will work no matter what your lookup table column names are, how many rows or columns it has
  • It returns results that are in parallel with Excel’s output
  • The function is pretty much a drag’n’drop for your project.  The only thing you need to remember is to define the lookup table in the first part of your query

So how cool is that?  You love VLOOKUP, and you can now use it in Power Query to perform VLOOKUPs from your Power Query sourced database queries against tables of Excel data, without hitting the worksheet first!  (In fact, if your database has an appropriate table, you could VLOOKUP from database table against database table!)

Slicers For Value Fields

Earlier this week I received an email asking for help with a Power Pivot model.  The issue was that the individual had built a model, and wanted to add slicers for value fields.  In other words, they’d built the DAX required to generate their output, and wanted to use those values in their slicers.  Which you can’t do.  Except maybe you can…  :)

My approach to solve this issue is to use Power Query to load my tables.  This gives me the ability to re-shape my data and load it into the data model the way I need it.  I’m not saying this is the only way, by any means, but it’s an approach that I find works for me.  Here’s how I looked at it in Excel 2013.  (For Excel 2010 users, you have to run your queries through the worksheet and into Power Pivot as a linked table.)

Background

The scenario we’re looking at is a door manufacturer.  They have a few different models of doors, each of which uses different materials in their production.  The question that we want to solve is “how many unique materials are used in the construction of each door?”  And secondarily, we then want to be able to filter the list by the number of materials used.

The first question is a classic Power Pivot question.  And the setup is basically as follows:

[Image: the source data setup]

  • Create a PivotTable with models on rows and material on columns
  • Create a DAX measure to return the distinct count of materials:
    • DistinctMaterials:=  DISTINCTCOUNT(MaterialsList[material])
  • Add a little conditional formatting to the PivotTable if you want it to look like this:

[Image: the PivotTable with conditional formatting applied]

The secret to the formatting is to select the values and set up an icon set.  Modify it to ensure that it is set up as follows:

[Image: the icon set configuration]

Great stuff, we’ve got a nice looking Pivot, and you can see that our grand total on the right side is showing the correct count of materials used in fabricating each door.

Creating Slicers For Value Fields

Now, click in the middle of your Pivot, and choose to insert a slicer.  We want to slice by the DistinctMaterials measure that we created… except… it's not available.  Grr…

[Image: the Insert Slicers dialog, with no measure available]

Okay, it’s not surprising, but it is frustrating.  I’ve wanted this ability a lot, but it’s just not there.  Let’s see if we can use Power Query to help us with this issue.

Creating Queries via the Editor

We already have a great query that has all of our data, so it would be great if we could just build a query off of that.  We obviously need the original still, as the model needs that data to feed our pivot, but can we base a query off a query?  Sure we can!

  • In the Workbook Queries pane, right click the existing “MaterialsList” query and choose Edit.
  • You’ll be taken into the Power Query editor, and on the right side you’ll see this little collapsed “Queries” window trying to hide from you:

[Image: the collapsed Queries pane in the Power Query editor]

  • When you expand that arrow, you’ll see your existing query there!
  • Right click your MaterialsList query and choose “Reference”.

You’ve now got a new query that is referring to your original.  Awesome.  This will let us preserve our existing table in the Power Pivot data model, but reshape this table into the format that we need.

Building the Query we need

Let’s modify this puppy and get it into the format that will serve us.  First thing, we need to make sure it’s got a decent name…

  • On the right side, rename it to MaterialsCount

Now we need to narrow this down to a list of unique material/model combinations, then count them:

  • Go to Add Column –> Add Custom Column
  • Leave the default name of “Custom” and use the following formula:  [model]&[material]
  • Sort the model column in ascending order
  • Sort the material column in ascending order

We’ve now got a nicely ordered list, but there are a few duplicates in it.

[Image: the sorted list, showing duplicates]

Those won’t help, so let’s get rid of them:

  • Select the “Custom” column
  • Go to Home –> Remove Duplicates

Now, let’s get that Distinct Count we’re looking for:

  • Select the “model” column
  • Go to Transform –> Group By
  • Set up the Group By window to count distinct rows as follows:

[Image: the Group By window]

Very cool!  We’ve now got a nice count of the number of distinct materials that are used in the production of each door.
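For the curious, the M behind that Group By step comes out something like this (step and column names assumed):

GroupedRows = Table.Group(#"Removed Duplicates", {"model"},
    {{"DistinctMaterials", each Table.RowCount(Table.Distinct(_)), type number}})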

The final step we need to do in Power Query is load this to the data model, so let’s do that now:

  • File –> Close & Load To…
  • Only create the connection and load it to the Data Model

Linking Things in Power Pivot

We now need to head into Power Pivot to link this new table into the Data Model structure.  Jump into the Manage window, and set up the relationships between the model fields of both tables:

[Image: the relationship between the tables in Power Pivot]

And that’s really all we need to do here.  Let’s jump back out of Power Pivot.

Add Slicers for Value Fields

Let’s try this again now. Click in the middle of your Pivot and choose to insert a slicer.  We’ve certainly got more options than last time!  Choose both fields from the “MaterialsCount” table:

[Image: the Insert Slicers dialog showing the MaterialsCount fields]

And look at that… we can now slice by the total number of materials in each product!

[Image: the Pivot sliced by the number of materials]

Happy Holidays

I’m quite proud of the fact that I’ve managed to publish a new blog post every Wednesday since September 2014… among others from earlier in the year.  But now it’s time to take a quick break, so this will be the final blog post of 2014.  The blog will go quiet over the holidays, and I’ll be back on January 7th, 2015 with a new Power Query post.

Until then we wish you the very best of the holiday season, and a prosperous 2015.

:)

Treat Consecutive Delimiters As One

A couple of weeks back, I received a comment on my “Force Power Query to Import as a Text File” blog post.  Rudi mentioned that he’d really like to see a method that would let us Treat Consecutive Delimiters as One; something that is actually pretty easy to find in Excel’s old Import as Text file feature.

…the option for “Treat consecutive delimiters as one” should be made available too. The text import eventually imported but the info was all out of sync due to the tabbed nature of the data – names longer and shorter cause other data to not line up.

So let’s take a look at this issue today.  You can access and download the sample file I’ll be working with from this link.

The issue at hand

The contents of the text file, when viewed in Notepad, are fairly simple:

[Image: the text file contents in Notepad]

But the challenge shows up when we go to import it into Power Query.  Let’s do that now:

  • Power Query –> From File –> From Text
  • Browse to where you stored the file and double click it to Open it up

The first indication that we have an issue is that we’ve got a single column, the second is that it’s a bit scattered:

[Image: the raw import, one scattered column]

Well no problem, it’s a Tab delimited file, so we’ll just split it by Tabs, and all should be good, right?

  • Home –> Split Column –> By Delimiter
  • Choose Tab and click OK

[Image: the mis-aligned results of splitting by Tab]

Uh oh…  What happened here?

The issue is that we’ve got multiple delimiters acting between fields here.  They look like one, but they’re not.  Between the Date and Vendor headers, for example, there are two Tabs.  But between the Date values and the Vendor values only one Tab is needed to line things up in the text document.  The result is this crazy mess.

Approaches to the Issue

I can see three potential routes to deal with this problem:

  1. We could replace all instances of 2 Tabs with a single Tab.  We’d need to do that a few times though, to ensure that we don’t turn 4 Tabs into 2 and leave it at that.
  2. We could write a function to try and handle this.  I’m sure that this can be done, although I decided not to go that route this time.
  3. We could split the columns using the method that I’m going to outline below.

Before we start, let’s kill off the last two steps in the “Applied Steps” section, and get back to the raw import that we had at the very beginning.  (Delete all steps except the “Source” step in the Applied Steps window.)

How to Treat Consecutive Delimiters As One

The method to do this is a three step process:

  1. Split by the left-most delimiter (only)
  2. Trim the new column
  3. Repeat until all columns are separated
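Before we walk through the UI, here’s roughly how steps 1 and 2 translate to M (assuming the raw import landed in Column1; step names are mine):

// split at the first (left-most) Tab only
SplitFirst = Table.SplitColumn(Source, "Column1",
    Splitter.SplitTextByEachDelimiter({"#(tab)"}, QuoteStyle.Csv, false),
    {"Column1.1", "Column1.2"}),
// trim the new column to kill the leftover delimiter
TrimNew = Table.TransformColumns(SplitFirst, {{"Column1.2", Text.Trim, type text}})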

So let’s do it:

  • Select Column1
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

[Image: the result of the first split]

Good, but see that blank space before Vendor?  That’s one of those delimiters that we don’t want.  But check what happens when you trim it…

  • Select Column1.2
  • Transform –> Format –> Trim

And look at that… it’s gone!

[Image: the trimmed column]

Let’s try this again:

  • Select Column1.2
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

[Image: the result of the second split]

And trim the resulting column again:

  • Select Column1.2.2
  • Transform –> Format –> Trim

And we’re good.  Now to just do a final bit of cleanup:

  • Transform –> Use First Row as Headers
  • Name the Query

And we’re finished:

[Image: the finished query]

Easy as Pie?

In this case, yes it was.  Easy, if a bit painful.  But what about spaces?  If this file was space delimited I couldn’t have done this, as my Vendors all have spaces in them too.  So then what?

I’d modify my procedure just a bit:

  1. Replace all instances of double spaces with the | character
  2. Split by the left most | character
  3. Replace all | characters with spaces
  4. Trim the new column
  5. Repeat until finished
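Sketched out in M, the first cycle of that variation might look like this (Column1 and the step names are assumptions, and you’d repeat the split/trim cycle for each column):

// step 1: swap each pair of spaces for a pipe
ReplacedSpaces = Table.TransformColumns(Source, {{"Column1", each Text.Replace(_, "  ", "|"), type text}}),
// step 2: split at the left-most pipe only
SplitFirst = Table.SplitColumn(ReplacedSpaces, "Column1",
    Splitter.SplitTextByEachDelimiter({"|"}, QuoteStyle.Csv, false),
    {"Column1.1", "Column1.2"}),
// steps 3 and 4: put the spaces back, then trim
RestoredSpaces = Table.TransformColumns(SplitFirst, {{"Column1.2", each Text.Replace(_, "|", " "), type text}}),
TrimNew = Table.TransformColumns(RestoredSpaces, {{"Column1.2", Text.Trim, type text}})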

Final Thought

I haven’t received too much in the way of data like this, as most of my systems dump data into properly delimited text or csv files, but I can certainly see where this is an issue for people.  So I totally agree that it would be nice if there were an easier way to treat consecutive delimiters as one in Power Query.

I do like the versatility of the choices we have currently, but adding this as another option that works in combination with the existing ones would be fantastic.

Power Query team: here’s my suggestion…

[Image: a suggested “Treat consecutive delimiters as one” option in the Split Column dialog]

:)

Force Power Query to Import as a Text File

I’ve run into this issue in the past, and also got an email about this issue this past week as well, so I figured it’s worth taking a look.  Power Query takes certain liberties when importing a file, assuming it knows what type of file it is.  The problem is that sometimes this doesn’t work as expected, and you need to be able to force Power Query to import as a text file, not the file format that Power Query assumes you have.

IT standards are generally a beautiful thing, especially in programming, as you can rely on them, knowing that certain rules will always be followed.  CSV files are a prime example of this, and we should be able to assume that any CSV file will contain a list of Comma Separated Values, one record per line, followed by a new line character.  Awesome… until some bright spark decides to inject a line or two of information above the CSV contents which doesn’t contain any commas.  (If you do that, please stop.  It is NOT A GOOD IDEA.)

The Issue in the Real World

If you’d like to follow along, you can click here to download MalformedCSV.csv (the sample file).

If you open the sample file in Notepad, you’ll see that it contains the following rows:

[Image: the sample file contents in Notepad]

Notice the first row… that’s where our issue is.  There are no commas.  Yet when you look at the data in the rows below, they are plainly separated by commas.  Well yay, so what, right?  Who cares about the guts of a CSV file?  The answer is “you” if you ever get one that is built like this…

Let’s try importing the sample file into Power Query:

  • Power Query –> From File –> From CSV
  • Browse to MalformedCSV.csv

And the result is as follows:

[Image: a single header column and a long list of errors]

One header, and lots of errors.  Not so good!

The Source Of The Error

If I click the white space beside one of those errors, I get this:

[Image: the DataFormat.Error message]

What the heck does that even mean?

Remember that CSV is a standard.  Every comma indicates a column break, every carriage return a new line.  And we also know that every CSV file has a consistent number of columns (and therefore commas) on every row.  (That’s why you’ll see records in some CSVs that read ,, – because there isn’t anything for that record, but we still need the same number of commas to denote the columns.)

And now some joker builds us a file masquerading as a CSV that really isn’t.  In this case:

  • Our first row has no commas before the line feed.  We therefore must have a one column table.  Power Query sets us up for a one column table.
  • But our second row has three commas, which means three columns… That’s not the same number of columns as our header row, so Power Query freaks out and throws an error for every subsequent record.

So Now What?

If we can’t rely on the file format being correct, why don’t we just import it as a text file?  That would allow us to bring all the rows in, remove the first one, then split by commas.  That sounds like a good plan.  Let’s do it.

  • Power Query –> From File –> From Text
  • Browse to the folder that holds MalformedCSV.csv

Uh oh… our file is not showing.  Ah… we’re filtered to only show *.txt and *.prn files…

  • Click the file filter list in the bottom right and change “Text File (*.txt;*.prn)” to “All Files (*.*)”
  • Open MalformedCSV.csv

And the result…

[Image: the same single header and errors again]

Damn.  See, Power Query is too smart.  It looks at the file and says “Hey!  That’s not a text file, it’s a CSV file!” and then imports it as a CSV file... which we already know has issues.  Grr…

Force Power Query to Import as a Text File

Let’s try this again, this time from scratch.  We need two things here…

  1. The full file path to the file you want to import (including the file extension).  On my system it is "D:\Test\MalformedCSV.csv"
  2. A little bit of a code template, which is conveniently included below.

What we’re going to do is this:

  • Go to Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor
  • Paste in the following code

let
    /* Get the raw line-by-line contents of the file, preventing PQ from interpreting it */
    fnRawFileContents = (fullpath as text) as table =>
    let
        Value = Table.FromList(Lines.FromBinary(File.Contents(fullpath)), Splitter.SplitByNothing())
    in
        Value,

    /* Use function to load file contents */
    Source = fnRawFileContents("D:\Test\MalformedCSV.csv")
in
    Source

  • Update the file path in the “Source” step to the path to your file.
  • Click Done

And the output is remarkably different… in fact, it’s the entire contents of the file!

[Image: the entire raw contents of the file, one row per line]

This is awesome, as we now have the ability to clean up our data and use it as we were hoping to do from the beginning!  So let’s do just that…starting with killing off that first line that’s been screwing us up:

  • Home –> Remove Rows –> Remove Top Rows –> 1
  • Transform –> Split Column By Delimiter –> Comma –> At each occurrence of the delimiter
  • Transform –> Use First Row As Headers

And now we’ve got a pretty nice table of data that actually looks like our CSV file:

[Image: the cleaned-up table]
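In M, those three cleanup steps come out roughly like this (the column count and names are assumptions to adjust for your file):

RemovedTopRows = Table.Skip(Source, 1),
SplitByComma = Table.SplitColumn(RemovedTopRows, "Column1",
    Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),
    {"Column1.1", "Column1.2", "Column1.3"}),
PromotedHeaders = Table.PromoteHeaders(SplitByComma)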

The final steps to wrap this up would essentially be

  • set our data types,
  • name the query, and
  • load the query to the final destination.

Final Thoughts

This seems too hard.  I know that Power Query is designed for ease of use, and to that end it’s great that it can and does make assumptions about your file.  Most of the time it gets it exactly right and there is no issue.  But when things do go wrong it's really hard to get back to a raw import format so that we can take control.

I really think there should be an easy and discoverable way to do a raw import of data without making import/formatting assumptions.  A button for “Raw Text Import” would be a useful addition for those scenarios where stuff goes wrong and needs a careful hand.

I should also mention that this function will work on txt files or prn files as well.  In fact, it also works on Excel files to some degree, although the results aren’t exactly useful!

[Image: the raw import of an Excel file]

The key here is that, whether your problem is caused by one or more extra header lines in a csv file, or by a tab delimited, colon delimited or any other kind of delimited file, the small function template above will help you get past that dreaded message that reads “DataFormat.Error:  There were more columns in the result than expected.”

Addendum To This Post

Miguel Escobar has recorded a nice video of the way to do this without using any code.  You can find that here:

Building a Parameter Table for Power Query

One of the things that I’ve complained about in the past is that there is no built in way for Power Query to be able to pull up the path for the current workbook.  Today I’m going to show you how I solved that problem by building a parameter table for Power Query in Excel, then link it into my queries.


To do this successfully, we need two pieces: an Excel table, and a Power Query function.  So let’s get to it.

Building a Parameter Table

The table is quite simple, really.  It’s a proper Excel Table, and it has a header row with the following two columns:

  • Parameter
  • Value

Once created, I also make sure that I go to the Table Tools tab, and rename the table to “Parameters”.

[Image: the empty Parameters table]

Pretty bare bones, but as you’ll see, this becomes very useful.

Now we add something to the table that we might need.  Since I’ve mentioned it, let’s work out the file path:

  • A8:     File Path
  • B8:     =LEFT(CELL("filename",B6),FIND("[",CELL("filename",B6),1)-1)

Now, as you can see, column A essentially gives us a “friendly name” for our parameter, where the value ends up in the second column:

[Image: the File Path parameter in the table]

While we’re here, let’s add another parameter that we might have use for.  Maybe I want to base my Power Query reports off the data for the current day.  Let’s inject that as well:

  • A9:     Start Date
  • B9:     =TODAY()

[Image: the Start Date parameter in the table]

Good stuff.  We’ve now got a table with a couple of useful parameters that we might want when we’re building a query.

Adding the Parameter Function

Next, we’re going to add the function that we can reference later.  To do that:

  • Go to Power Query –> From Other Sources –> Blank Query
  • Go to View –> Advanced Editor
  • Replace all the code with the following:

(ParameterName as text) =>
let
    ParamSource = Excel.CurrentWorkbook(){[Name="Parameters"]}[Content],
    ParamRow = Table.SelectRows(ParamSource, each ([Parameter] = ParameterName)),
    Value =
        if Table.IsEmpty(ParamRow) = true
        then null
        else Record.Field(ParamRow{0}, "Value")
in
    Value

  • Click Done
  • Rename the function to “fnGetParameter”
  • Go to Home –> Close & Load To…
  • Choose to create the connection only, avoiding loading to a table or the Data Model

And just in case you think this means it didn’t work: seeing the query listed as a connection only, without loading anywhere, is exactly what we expect in the queries window:

[Image: fnGetParameter showing as a connection-only query]

Making Use of the Parameter Table

Okay, so now that we have this in place, how do we go about using it?

Let’s create a ridiculously simple table:

[Image: a ridiculously simple table]

Now, click in the table and go to Power Query –> From Table.

We’ll be taken into the Power Query window and will be looking at our very simple data.  Let’s pull the contents of my parameters into columns:

  • Go to Add Column –> Add Custom Column
  • Change the title to “File Path”
  • Enter the following formula: =fnGetParameter("File Path")

Check it out!

[Image: the new File Path column]

Do you see that?  This is the path to the folder that holds the workbook on my system.  The formula we used in the table retrieves that, and I can pass it in to Power Query, then reference it as needed!

How about we do the same for our date?

  • Go to Add Column –> Add Custom Column
  • Change the title to “First date”
  • Enter the following formula: =fnGetParameter("Start Date")

[Image: the new First date column]

The key here is to make sure that the value passed to the parameter function is spelled (and cased) the same as the entry in the first column of the parameter table.  I.e. you could use “FilePath”, “File Path”, “Folder”, “My File Path” or whatever, so long as that’s the name you gave it in the first column of the Parameters Excel table.

And what happens if you pass an invalid value?  Say you ask for fnGetParameter("Moldy Cheese") and it’s not in the table?  Simple, you’ll get null returned instead.  :)

Implications of Building a Parameter Table for Power Query

The implications for this technique are huge.  Consider this scenario… you create your workbook, and store it in a folder.  But within that folder you have a subfolder called “Data”.  Your intention is to store all of your csv files in that folder.  And, for argument’s sake, let’s say that it’s a mapped drive, with the path to your solution being “H:\My Solution\”

No problem, you build it all up, and it’s working great.  You keep dropping your text files in the data folder, and you can consolidate them with some M code like this:

let
    Source = Folder.Files("H:\My Solution\Data"),
    #"Combined Binaries" = Binary.Combine(Source[Content]),
    #"Imported CSV" = Csv.Document(#"Combined Binaries", null, ",", null, 1252),
    #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV")
in
    #"First Row as Header"

Things run along for ages, and that’s awesome, but then you need to go on vacation.  No worries, it’s Power Query and easy to use, you can just get your co-worker to update it… except… on your co-worker’s machine that same drive is mapped not to the H:\ drive, but the F:\ drive.  Doh!

We could recode the path, but what a pain.  So how about we use the parameter table to make this more robust so that we don’t have to?  All we need to do is modify the first couple of lines of the query.  We’ll pull in a variable to retrieve the file path from our parameter table, then stuff that into the file path, like this:

let
    SolutionPath = fnGetParameter("File Path"),
    Source = Folder.Files(SolutionPath & "Data"),
    #"Combined Binaries" = Binary.Combine(Source[Content]),
    #"Imported CSV" = Csv.Document(#"Combined Binaries", null, ",", null, 1252),
    #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV")
in
    #"First Row as Header"

How awesome is that?  Even better, the SolutionPath shows as a step in the Applied Steps section.  That means you can select it and make sure the value is showing up as you’d expect!

Practical Use Case

Several months back I built a solution for a client where we stored his Power Query solution in a folder, and we had data folders that were created on a bi-weekly basis.  Each of those folders was named based on the pay period end date (in ISO’s yyyy-mm-dd format), and was stored in a path relative to the master solution.

Naturally, I needed a way to make the queries dynamic to the master folder path, as I did the development in my home office, then emailed the updated solution files to him in his New York office.  He had different drive mappings than I do, and his team had different drive mappings than he did.  With half a dozen Power Queries in the solution, having them manually update the code each time a new user wanted to work with the file just wasn’t an option.

This technique became invaluable for making the solution portable.  In addition, by having a formula to generate the correct end date, we could also pull the data files from the right place as well.

I still want a Power Query CurrentWorkbook.FilePath method in M, but even if I get it this technique is still super useful, as there will always be some dynamic parameter I need to send up to Power Query.

I hope you find this as useful as I have.

Combine Multiple Worksheets Using Power Query

In last week’s post we looked at how to combine multiple files together using Power Query.  This week we’re going to stay within the same workbook, and combine multiple worksheets using Power Query.

Background Scenario

Let’s consider a case where the user has been creating a transactional history in an Excel file.  It is all structured as per the image below, but resides across multiple worksheets; one for each month:

[Image: the transactional data layout]

As you can see, they’ve carefully named each sheet with the month and year.  But unfortunately, they haven’t formatted any of the data using Excel tables.

Now the file lands in our hands (you can download a copy here if you’d like to follow along), and we’d like to turn this into one consolidated table so that we can do some analysis on it.

Accessing Worksheets in Power Query

Naturally we’re going to reach to Power Query to do this, but how do we get started?  We could just go and format the data on each worksheet as a table, but what if there were hundreds?  That would take way too much work!

But so far we’ve only seen how to pull Tables, Named Ranges or files into Power Query.  How do we get at the worksheets?

Basically, we’re going to start with two lines of code:

  • Go to Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

You’ll now see the following blank query:

let
Source = ""
in
Source

What we need to do is replace the second line (Source = "") with the following two lines of code:

FullFilePath = "D:\Temp\Combine Worksheets.xlsx",
Source = Excel.Workbook(File.Contents(FullFilePath))

Of course, you’ll want to update the path to the full file path for where the file is saved on your system.

Once you click Done, you should see the following:

[Image: the list of worksheets in the file]

Cool!  We’ve got a list of all the worksheets in the file!

Consolidating the Worksheets

The next step is to prep the fields we want to preserve as we combine the worksheets.  Obviously the Name and Item columns are redundant, so let’s do a bit of cleanup here.

  • Remove the Kind column
  • Select the Name column –> Transform –> Data Type –> Date
  • Select the Name column –> Transform –> Date –> Month –> End of Month
  • Rename the Name column to “Date”

At this point, the query should look like so:

[Image: the query so far]
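If you peek at the Advanced Editor at this point, the M should read something like this (your file path will differ, and the step names are assumptions):

let
    FullFilePath = "D:\Temp\Combine Worksheets.xlsx",
    Source = Excel.Workbook(File.Contents(FullFilePath)),
    // drop the redundant column, turn the sheet names into dates, then rename
    RemovedColumns = Table.RemoveColumns(Source, {"Kind"}),
    ChangedType = Table.TransformColumnTypes(RemovedColumns, {{"Name", type date}}),
    EndOfMonth = Table.TransformColumns(ChangedType, {{"Name", Date.EndOfMonth, type date}}),
    RenamedColumns = Table.RenameColumns(EndOfMonth, {{"Name", "Date"}})
in
    RenamedColumns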

Next we’ll click the little double headed arrow to the top right of the data column to expand our records, and commit to expanding all the columns offered:

[Image: the expand columns dialog]

Hmm… well that’s a bit irritating.  It looks like we’re going to need to promote the top row to headers, but that means we’re going to overwrite the Date column header in column 1.  Oh well, nothing to be done about it now, so:

  • Transform –> Use First Row As Headers –> Use First Row As Headers
  • Rename Column1 (the header won’t accept 1/31/2008 as a column name) to “Date” again
  • Rename the Jan 2008 column (far right) to “Original Worksheet”

Final Cleanup

We’re almost done, but let’s just do a bit of final cleanup here.  As we set the data types correctly, let’s also make sure that we remove any errors that might come up from invalid data types.

  • Select the Date column
  • Home –> Remove Errors
  • Set Account and Dept to Text
  • Set Amount to Decimal Number
  • Select the Amount column
  • Home –> Remove Errors
  • Set Original Worksheet to Text

Rename the query to “Consolidated”, and load it to a worksheet.

Something Odd

Before you do anything else, Save the File.

To be fair, our query has enough safeguards in it that we don’t actually have to do this, but I always like to play it safe.  Let’s review the completed query…

Edit the Consolidated query, and step into the Source line step.  Check out that preview pane:

[Image: the preview pane showing two new objects]

Interesting… two more objects!  This makes sense, as we created a new table and a new worksheet when we loaded the query to the worksheet.  We need to filter those out.

Getting rid of the table is easy:

  • Select the drop down arrow on the Kind column
  • Uncheck “Table”, then confirm when asked if you’d like to insert a step
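That inserted step boils down to a one-line filter in M, along these lines (step name assumed):

FilteredRows = Table.SelectRows(Source, each ([Kind] <> "Table"))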

Select the next couple of steps as well, and take a look at the output as you do.

Aha!  When you hit the “ChangedType” step, something useful happens… we generate an error:

[Image: the error generated at the ChangedType step]

Let’s remove that error from the Name column.

  • Select the Name column –> Home –> Remove Errors

And we’re done.  We’ve managed to successfully combine all the data worksheets in our file into one big table!

Some Thoughts

This method creates a bit of a loop in that I’m essentially having to reach outside Excel to open a copy of the workbook to pull the sheet listing in.  And it causes issues for us, since Power Query only reads from the last save point of the external file we’re connecting to (in this case this very workbook.)  I’d way rather have an Excel.CurrentWorkbook() style method to read from inside the file, but unfortunately that method won’t let you read your worksheets.

It would also be super handy to have an Excel.CurrentWorkbookPath() method.  Hard coding the path here is a real challenge if you move the file.  I’ve asked Microsoft for this, but if you think it is a good idea as well, please leave a comment on the post.  (They’ll only count one vote from me, but they’ll count yours if you leave it here!)