What the #? Literally…

I was chatting with my friend Scott of www.tinylizard.com today, discussing the structure of the M language, and how it’s kind of weird compared to VBA, C# and such. Well, “what the #?”, I figure.  I’m going to do a blog post on what the # sign actually signifies today, as it’s kind of confusing when you’re just starting with M.

A Little M Code:

I knocked up a bit of M code today to illustrate the challenge for someone just getting into this language.  Here’s a snap of the beginning of each line:

SNAGHTML1cfb30

Okay, so every line starts with something followed the = character… but…

What the #?

The question that I’m looking at specifically is… why do some lines start with # and some don’t?  (The second lines starts Source = and the fourth start ReplacedValue = but all others start with the # character.)

Literally…

The secret is all in the step name.  Check out the recorded steps for this query:

image

Notice anything about the two highlighted entries compared to the others?  No spaces!

When Power Query comes up against a space in the code it expects that there is another command or parameter coming next.  To deal with this, we need to tell Power Query that “Hey, I want you to literally use this…”  I’ve always heard this referred to as “escaping literals”, and M isn’t the only language that this happens in.

But it’s not sufficient to just use the # sign either.  It needs to be the # sign followed by two sets of quotes… kind of like how we refer to text in formulas in Excel.  So basically what you get is:

#”Filtered Rows” =

I Don’t Like It!

Yeah, me either.  I think it makes the code look horribly ugly.  It’s easy to avoid if you want to though, although it takes a little work.  Just right click the step, choose Rename, and remove the space:

image

At this point your M code will be updated to remove the literals, and you’ll be back to cleaner code.

You could also edit the M code manually to remove the # and the leading and trailing quotes.  If you do this, however, just make sure that you get all instances of them, as there will always be two in your recorded code, and possibly more if you’ve been manually tweaking it:

SNAGHTML744002

While I’m not a big fun of ugly code, I’ve also got to accept that I’m one of a very small percentage of users who will actually read it.  Most people will use the UI for this kind of stuff, so making it read easier there is probably the right design decision.

At any rate, now you know what the # character is all about.

Date Formats in Power Query

Date formats in Power Query are one of those little issues that drives me nuts… you have a query of different information in Power Query, at least one of the columns of which is a date.  But when you complete the query, it doesn’t show up as a date.  Why is this?

Demonstrating the Issue

Have a look at the following table from Excel, and how it loads in to Power Query:

SNAGHTML157a9ddf

That looks good… plainly it’s a date/time type in Power Query, correct?  But now let’s try an experiment.  Load this to the worksheet:

image

Why, when we have something that plainly renders as a date/time FROM a date format, are we getting the date serial number?  Yes, I’m aware that this is the true value in the original cell, but it’s pretty misleading, I think.

It gets even better

I’m going to modify this query to load to BOTH the worksheet and the Excel data model.  As soon as I do, the format of the Excel table changes:

image

Huh?  So what’s in Power Pivot then?

image

Curious… they match, but Power Pivot is formatted as Text, not a date?

(I’ve missed this in the past and spent HOURS trying to figure out why my time intelligence functions in Power Pivot weren’t working.  They LOOK so much like datetimes it’s hard to notice at first!)

Setting Date Formats in Power Query

When we go back and look at our Power Query, we can discover the source of the issue by looking at the Data Type on the Transform tab:

image

By default the date gets formatted as an “Any”.  What this means to you – as an Excel user – is that you could get anything out the other end.  No… that’s not quite true.  It means that it will be formatted as Text if Power Pivot is involved anywhere, or a Number if it isn’t.  I guess at least it’s consistent… sort of.

Fixing this issue is simple, it’s just annoying that we have to.  In order to take care of it we simply select the column in Power Query, then change the data type to Date.

Unfortunately it’s not good enough to just say that you’ve set it somewhere in the query.  I have seen scenarios where – even though a column was declared as a date – a later step gets it set back to Any.

Recommendations

I’ve been irritated by this enough that I now advise people to make it a habit to set the data types for all of their columns in the very last step of the query.  This ensures that you always know EXACTLY what is coming out after all of your hard work and eliminates any surprises.

Creating a VLOOKUP Function in Power Query

Tonight I decided to actually follow through on something I’d been musing about for a while:  building a full fledged VLOOKUP function in Power Query.  Why?  Yeah… that’s probably a good question!

Replicating VLOOKUP’s exact match is REALLY easy in Power Query.  You simply take two tables and merge them together.  But the approximate match is a little harder to do, since you don’t have matching records on each side to merge together.

Now, to be fair, you could go the route of building a kind of case statement, as Chris Webb has done here.  In actual fact, you probably should do that if you want something that is lean and mean, and the logic won’t change.  But what if you wanted to maintain a table in Excel that holds your lookup values, making it easy to update? Shouldn’t we be able to take that and use it just like a VLOOKUP with an approximate match?  I don’t see why not.  So here’s my take on it.

Practical Use Cases

I see this as having some helpful use cases.  They’ll mostly come from Excel users who are experienced with VLOOKUP and maintain lookup tables, and reach back to that familiarity.  And they would probably be tempted to do something like this:

image

The concern, of course, is that landing data in the worksheet during this cycle contributes to file size, memory usage and ultimately re-calc speed, so if you can avoid this step on the way to getting it into Power Pivot, you plainly want to do that.

The cool thing is that by building this the way I’ve done it, you’re not restricted to landing your data in the worksheet to use VLOOKUP with it.  You can pull data into Power Query from any source (csv, text file, database, web page) and perform your VLOOKUP against your Excel table without that worksheet round trip.

Let’s Take a Look…

Now, I AM going to use Excel based data for this, only because I have a specific scenario to demonstrate.  You can download a sample file – containing just the data – from this link.  (The completed file is also available at the end of the post.)

So, we have a series of numbers, and want to look them up in this table:

image

I really used my imagination for this one and called it “LookupTable”.  Remember that, as we need that name later.  Note also that the first record is 1, not 0.  This was done to demonstrate that an approximate match can return a #N/A value, as you’ll see in a minute.

Now here’s what things would look like using standard Excel VLOOKUP formulas against that table:

image

Hopefully this makes sense.  The formulas in columns 2, 3 and 4 are:

  • =VLOOKUP([@Values],LookupTable,2,TRUE)
  • =VLOOKUP([@Values],LookupTable,3)
  • =VLOOKUP([@Values],LookupTable,2,FALSE)

Just to recap the high points here… column 2 declares the final parameter as ,TRUE which will give us an approximate match.  Column 3 doesn’t declare the final parameter, which will default to ,TRUE and give an an approximate match.  Column 4 declares the final parameter as ,FALSE which means we’ll want an exact match.  The end result is that only one value matches, which is why we get all those #N/A results.

Standard VLOOKUP stuff so far, right?

Creating the VLOOKUP function in Power Query

Before we get to using the function, we need to create it.  To do that we’re going to go to:

  • Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

Highlight all the code in that window and replace it with this… (yes, it’s not short)

let pqVLOOKUP = (lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical ) as any =>
let
/*Provide optional match if user didn't */
matchtype =
if approximate_match = null
then true
else approximate_match,

/*Get name of return column */
Cols = Table.ColumnNames(table_array),
ColTable = Table.FromList(Cols, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
ColName_match = Record.Field(ColTable{0},"Column1"),
ColName_return = Record.Field(ColTable{col_index_number - 1},"Column1"),

/*Find closest match */
SortData = Table.Sort(table_array,{{ColName_match, Order.Descending}}),
RenameLookupCol = Table.RenameColumns(SortData,{{ColName_match, "Lookup"}}),
RemoveExcess = Table.SelectRows(RenameLookupCol, each [Lookup] <= lookup_value),
ClosestMatch=
if Table.IsEmpty(RemoveExcess)=true
then "#N/A"
else Record.Field(RemoveExcess{0},"Lookup"),

/*What should be returned in case of approximate match? */
ClosestReturn=
if Table.IsEmpty(RemoveExcess)=true
then "#N/A"
else Record.Field(RemoveExcess{0},ColName_return),

/*Modify result if we need an exact match */
Return =
if matchtype=true
then ClosestReturn
else
if lookup_value = ClosestMatch
then ClosestReturn
else "#N/A"
in Return
in pqVLOOKUP

Now:

All right… the function is there.  Now let’s go make use of it… (we’ll come back to how it works in a bit.)

Using the VLOOKUP function in Power Query

Now, before we go any further, I want to ask you a favour.  I need you to pretend for a second.  Pretend that the data we are connecting to next is a database, not an Excel table.  You’ll see how this can be useful if you’ll play along here.  (The only reason I’m using an Excel table for my source data is that it’s easier to share than a database.)

Let’s go click in the DataTable table.  (This one:)

image

Now, let’s upload this “database” into Power Query…

  • Go to Power Query –> From Table

You should have something like this now:

image

Funny how Power Query reads the #N/A values as errors, but whatever.  Let’s get rid of those columns so that we’re left with just the Values column.

  • Right click Values –> Remove Other Columns

Now, we’re going to make a really small M code edit.

  • Go to View –> Advanced Editor
  • Copy the second line (starts with Source =…)
  • Paste it immediately above the line you just copied
  • Modify it to read as follows:
    • Source –> LookupSource
    • DataTable –> LookupTable

Your M code should now look as follows:

image

  • Click Done

Nothing really appears to look different right now, but you’ll notice that you have an extra step called “LookupSource” on the right.  If you switch back and forth between that and Source, you’ll see we are looking at the original DataTable and the LookupTable.  The reason we do this is to make the next step really easy.

  • Go to Add Column –> Add Custom Column
  • Call the column 2 True
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,true)

Okay, so what’s what?

  • pqVLOOKUP is the name of our function we added above
  • [Values] is the value we want to look up
  • LookupSource is the table we want to look in to find our result
  • 2 is the column we want to return
  • true is defining that we want an approximate match

And, as you can see when you click OK, it works!

image

Let’s do the next two columns:

  • Go to Add Column –> Add Custom Column
  • Call the column 3 default
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,3)

So this time we asked for a return from the 3rd column, and we omitted the final parameter.  Notice that it defaulted to true for us:

image

Last one…

  • Go to Add Column –> Add Custom Column
  • Call the column 2 false
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,false)

And how about that, all but one comes back with #N/A:

image

And with that you can load this into a table in the worksheet:

image

Notice that the results are identical to that of the original Excel table, with one exception… the #N/A I have provided is text, not an equivalent to the =NA() function.

The completed file is available here.

How Does the VLOOKUP Function in Power Query Actually Work?

This VLOOKUP actually has some advantages over the VLOOKUP we all know and love.  The most important is that we don’t need to worry if the list is sorted or not, as the function takes care of it for you.  It essentially works like this:

  • Pull in the data table
  • Sort it descending by the first column
  • Remove all records greater than the value being searched for
  • Return the value in the requested column for the first remaining record UNLESS we asked for an Exact match
  • If we asked for an Exact match then it tests to see if the return is a match and returns #N/A if it’s not

Some key design principles I used here:

  • The parameters are all in EXACTLY the same order as Excel’s VLOOKUP
  • The required, optional and default parameters match what you already know and use in Excel
  • The function is dynamic in that it will work no matter what your lookup table column names are, how many rows or columns it has
  • It returns results that are in parallel with Excel’s output
  • The function is pretty much a drag’n’drop for your project.  The only thing you need to remember is to define the lookup table in the first part of your query

So how cool is that?  You love VLOOKUP, and you can now use it in Power Query to perform VLOOKUP’s from your Power Query sourced database queries against tables of Excel data without hitting the worksheet first!  (In fact, if your database has an approximate table, you could VLOOKUP from database table against database table!)

How to Reference other Power Query queries

One of the things I really like to do with Power Query is shape data into optimized tables. In order to accomplish that goal, I’ve begun using Power Query to source data over Power Pivot’s built in methods. But in order to build things the way I want, I need an easy way to reference other power query queries.

Why would I go to the effort of feeding through Power Query first? I’m no SQL ninja, and I find Power Query allows me to easily re-shape data in ways that would be hard with my SQL knowledge. I can leverage this new tool to optimize my tables and build Power Pivot solutions that require less tricky and funky DAX measures to compensate for less than ideal data structure. (I’d rather have easy to understand relationships and simple DAX measures!)

Methodology

My methodology generally goes something like this:

  • Load a base table into a Power Query. I then set it to only create a connection. Let’s call this my Base Connection.
  • Next I’ll create as many queries as I need to re-shape the data in the Base Connection into the forms I need, then load those into the data model.

It’s that second part that is key. I need to be able to reference other Power Query queries (namely my Base Connection) so that I could prune/trim/re-shape the data.

Reference other Power Query queries - The Old Way

Until recently, I would create my Base Connection, then I’d do the following to create the new query to reference that one.

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and choose Reference

The problem was this… my intention was to reference and customize my query. Instead, it immediately loads it into a worksheet. I have to wait for that to finish before I can edit the new query and customize it the way I want.

Reference other Power Query queries - The New Way

I learned a new method last week from one of the Power Query team members which is much better (thanks Miguel!). I included it in my last post, but I thought this was worth calling out on its own.

Instead of following the method above, this time we will:

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and Edit

Now we’re taken into the Power Query window. On the left side we can see a collapsed “Queries” pane. When you expand that, you get a list of all Power Queries in the workbook.

image_thumb13[1]

  • Right click the Base Connection query and choose “Reference”

We now have a new query in the editor that we can edit, without loading it into a worksheet first. Faster, and more in line with my goals.

The other thing I like about this method is that it immediately gives me access to that queries pane. Why is that important? Because I can drill through the other queries and get at their M code without having to close the window and go back to Excel first. So if I have some funky M code I need to re-use, it makes it way easier to review it and copy it.

Slicers For Value Fields

Earlier this week I received an email asking for help with a Power Pivot model.  The issue was that the individual had built a model, and wanted to add slicers for value fields.  In other words, they’d built the DAX required to generate their output, and wanted to use those values in their slicers.  Which you can’t do.  Except maybe you can…  :)

My approach to solve this issue is to use Power Query to load my tables.  This gives me the ability to re-shape my data and load it into the data model the way I need it.  I’m not saying this is the only way, by any means, but it’s an approach that I find works for me.  Here’s how I looked at it in Excel 2013.  (For Excel 2010 users, you have to run your queries through the worksheet and into Power Pivot as a linked table.)

Background

The scenario we’re looking at is a door manufacturer.  They have a few different models of doors, each of which uses different materials in their production.  The question that we want to solve is “how many unique materials are used in the construction of each door?”  And secondarily, we then want to be able to filter the list by the number of materials used.

The first question is a classic Power Pivot question.  And the setup is basically as follows:

image

  • Create a PivotTable with models on rows and material on columns
  • Create a DAX measure to return the distinct count of materials:
    • DistinctMaterials:=  DISTINCTCOUNT(MaterialsList[material])
  • Add a little conditional formatting to the PivotTable if you want it to look like this:

image

The secret to the formatting is to select the values and set up an icon set.  Modify it to ensure that it is set up as follows:

image

Great stuff, we’ve got a nice looking Pivot, and you can see that our grand total on the right side is showing the correct count of materials used in fabricating each door.

Creating Slicers For Value Fields

Now, click in the middle of your Pivot, and choose to insert a slicer.  We want to slice by the DistinctMaterials measure that we created… except.. it's not available.  Grr…

image

Okay, it’s not surprising, but it is frustrating.  I’ve wanted this ability a lot, but it’s just not there.  Let’s see if we can use Power Query to help us with this issue.

Creating Queries via the Editor

We already have a great query that has all of our data, so it would be great if we could just build a query off of that.  We obviously need the original still, as the model needs that data to feed our pivot, but can we base a query off a query?  Sure we can!

  • In the Workbook Queries pane, right click the existing “MaterialsList” query and choose Edit.
  • You’ll be taken into the Power Query editor, and on the right side you’ll see this little collapsed “Queries” window trying to hide from you:

image

  • When you expand that arrow, you’ll see your existing query there!
  • Right click your MaterialsList query and choose “Reference”.

You’ve now got a new query that is referring to your original.  Awesome.  This will let us preserve our existing table in the Power Pivot data model, but reshape this table into the format that we need.

Building the Query we need

Let’s modify this puppy and get it into the format that will serve us.  First thing, we need to make sure it’s got a decent name…

  • On the right side, rename it to MaterialsCount

Now we need to narrow this down to a list of unique material/model combinations, then count them:

  • Go to Add Column –> Add Custom Column
  • Leave the default name of “Custom” and use the following formula:  [model]&[material]
  • Sort the model column in ascending order
  • Sort the material column in ascending oder

We’ve not got a nicely ordered list, but there’s a few duplicates in it.

SNAGHTML4800f6be[4]

Those won’t help, so let’s get rid of them:

  • Select the “Custom” column
  • Go to Home –> Remove Duplicates

Now, let’s get that Distinct Count we’re looking for:

  • Select the “model” column
  • Go to Transform –> Group By
  • Set up the Group By window to count distinct rows as follows:

image

Very cool!  We’ve now got a nice count of the number of distinct materials that are used in the production of each door.

The final step we need to do in Power Query is load this to the data model, so let’s do that now:

  • File –> Close & Load To…
  • Only create the connection and load it to the Data Model

Linking Things in Power Pivot

We now need to head into Power Pivot to link this new table into the Data Model structure.  Jump into the Manage window, and set up the relationships between the model fields of both tables:

image

And that’s really all we need to do here.  Let’s jump back out of Power Pivot.

Add Slicers for Value Fields

Let’s try this again now. Click in the middle of your Pivot and choose to insert a slicer.  We’ve certainly got more options than last time!  Choose both fields from the “MaterialsCount” table:

image

And look at that… we can now slice by the total number materials in each product!

image

Transpose Stacked Tables

For the first post of the new year, I thought I’d tackle an interesting problem; how to Transpose Stacked Tables in Power Query.  What’s do I mean by Stacked Tables?  It’s when your data looks like this:

image

Notice that we’ve got 3 tables stacked on top of each other with gaps.  The question is, how do we deal with this?

There’s actually a variety of different ways we could accomplish this, but I want to show a neat trick that allows us to refer to data on the next row(s) in Power Query this time.  We may revisit this in future with some other techniques as well, but for now… I think you’ll find this interesting.

Sample File

If you’d like to play along, click here to download the sample file, with a mock-up of a fictional Visa statement.

Getting Started

The first thing we need to do is pull the data into Power Query, so let’s go to Power Query –> From Table, and set the range to pull in all the data from A1:A17:

image

We end up with the table in the Power Query window, and we’re now going to add an Index column to it; something you’re going to see can be very useful!  To do this, go to Add Column –> Add Index Column (you can start from 0 or 1, your preference.  I’m just going to go with 0):

image

Now, for simplicity, I am going to make an unnecessary change to start with.  What I’m going to do is – in the “Applied Steps” section, I’m going to right click the “Added Index” line, and choose Rename, then rename this step to “AddedIndex” with no space:

image

Transpose Stacked Tables - The Tricky Part

Go to Add Column –> Add Custom Column.  In the window that pops up:

  • Name the column “Location”
  • In the formula area, enter:  AddedIndex{[Index]+1}[Transactions]

And the result:

image

Wow… how cool is that?  We’ve referred to the value on the next row!  But how?  The secret is in the syntax.  It basically works like this:

Name of previous step{[Index] + 1}[Name of Column to Return]

Watch all those brackets carefully too.  The curly ones go around the index row you want to return (the index number of the current row plus 1), and the square brackets around the name of the column you want.

Now, let’s do the next row.  Add a new column again:

  • Name the column “TransactionID”
  • In the formula area, enter:  #”Added Custom”{[Index]+2}[Transactions]

Okay, so what’s with that?  Why the # and quotes around the previous step this time?  The answer is that, in order to read the column name with the space, we need to wrap the column’s name in quotes and preface it with the # mark.  This tells Power Query to interpret everything between the quotes as a literal (or literally the same as what we wrote.)  As you can see, it works nicely:

image

Just to circle back on the unnecessary step I mentioned before, it was renaming the “Added Index” step.  Doing that saved me having the type #”Added Index”.  Personally I can’t stand all the #”” in my code, so I tend to modify my steps to drop the spaces.  It looks cleaner to me when I’m reading the M code.

At any rate, let’s do the last piece we need too.  Add another column:

  • Name the column “Value”
  • In the formula area, enter:  #”Added Custom1”{[Index]+3}[Transactions]

image

Beautiful… I’ve got each row of data transposed into the table the way I need it, but I’ve still got a bunch of garbage rows…

Taking Out The Trash

As it stands, we really only need the rows that start with the dates.  Again, we have multiple options for this, but I see a pattern I can exploit.  I need to:

  • Keep 1 row
  • Remove 5 rows
  • Repeat

How do we do that easily?  Go to the Home Tab –> Remove Rows and choose Remove Alternate Rows!

image

And finally we can get rid of the Index column, set our Data Types, and we’re all set:

image

And there you have it.  Just one of a few ways to Transpose Stacked Tables using Power Query.

Happy Holidays

I’m quite proud of the fact that I’ve managed to publish a new blog post every Wednesday since September 2014… among others from earlier in the year, but now it’s time to take a quick break.  So this will be the final blog post of 2014, basically to acknowledge that the blog will be closed, but I’ll be back on January 7th, 2015 with a new Power Query post.

Until then we wish you the very best of the holiday season, and a prosperous 2015.

:)

Treat Consecutive Delimiters As One

A couple of weeks back, I received a comment on my “Force Power Query to Import as a Text File” blog post.  Rudi mentioned that he’d really like to see a method that would let us Treat Consecutive Delimiters as One; something that is actually pretty easy to find in Excel’s old Import as Text file feature.

…the option for “Treat consecutive delimiters as one” should be made available too. The text import eventually imported but the info was all out of sync due to the tabbed nature of the data – names longer and shorter cause other data to not line up.

So let’s take a look at this issue today.  You can access and download the sample file I’ll be working with from this link.

The issue at hand

The contents of the text file, when viewed in Notepad, are fairly simple:

image

But the challenge shows up when we go to import it into Power Query.  Let’s do that now:

  • Power Query –> From File –> From Text
  • Browse to where you stored the file and double click it to Open it up

The first indication that we have an issue is that we’ve got a single column, the second is that it’s a bit scattered:

image

Well no problem, it’s a Tab delimited file, so we’ll just split it by Tabs, and all should be good, right?

  • Home –> Split Column –> By Delimiter
  • Choose Tab and click OK

image

Uh oh…  What happened here?

The issue is that we’ve got multiple delimiters acting between fields here.  They look like one, but they’re not.  Between Date and Vendor, for example, there are two quotes.  But between the Date values and the Vendor values only one Tab is needed to line it up in the text document.  The result is this crazy mess.

Approaches to the Issue

I can see three potential routes to deal with this problem:

  1. We could replace all instances of 2 Tab’s with a single Tab.  We’d need to do that a few times to ensure that we don’t turn 4 Tabs into 2 and leave it though.
  2. We could write a function to try and handle this.  I’m sure that this can be done, although I decided not to go that route this time.
  3. We could split the columns using the method that I’m going to outline below.

Before we start, let’s kill off the last two steps in the “Applied Steps” section, and get back to the raw import that we had at the very beginning.  (Delete all steps except the “Source” step in the Applied Steps window.)

How to Treat Consecutive Delimiters As One

The method to do this is a three step process:

  1. Split by the Left Most delimiter (only)
  2. Trim the new column
  3. Repeat until all columns are separated

So let’s do it:

  • Select Column1
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

image

Good, but see that blank space before Vendor?  That’s one of those delimiters that we don’t want.  But check what happens when you trim it…

  • Select Column1.2
  • Transform—>Format –> Trim

And look at that… it’s gone!

image

Let’s try this again:

  • Select Column1.2
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

image

And trim the resulting column again:

  • Select Column1.2.2
  • Transform—>Format –> Trim

And we’re good.  Now to just do a final bit of cleanup:

  • Transform –> Use First Row as Headers
  • Name the Query

And we’re finished:

image

Easy as Pie?

In this case, yes it was.  Easy, if a bit painful.  But what about spaces?  If this file was space delimited I couldn’t have done this, as my Vendors all have spaces in them too.  So then what?

I’d modify my procedure just a bit:

  1. Replace all instances of double spaces with the | character
  2. Split by the left most | character
  3. Replace all | characters with spaces
  4. Trim the new column
  5. Repeat until finished

Final Thought

I haven’t received too much in the way of data like this, as most of my systems dump data into properly delimited text or csv files, but I can certainly see where this is an issue for people.  So I totally agree that it would be nice if there were an easier way to treat consecutive delimiters as one in Power Query.

I do like the versatility of the choices we have currently, but adding this as another option that works in combination with the existing ones would be fantastic.

Power Query team: here’s my suggestion…

image

:)

Making Transformations Re-Usable

We're back with post two of Miguel's week.  This time, he is going to take his previous solution and show us how versatile Power Query can be, leveraging code he's already written and Making Transformations Re-Usable.

Have at it, Miguel!

Making Transformations Re-Usable

In my previous post, I was able to create a Query that would transform form Data to Rearranged Data with Power Query:

(Image taken from Chandoo.org)

...but what if I had multiple sheets or workbooks with this structure and I need to combine all of them?

Well, you could transform that PQ query into a function and apply that function into several other tables. Let’s dive in and see how to accomplish that!

You can dowload the workbook and follow along from here

Step 1: Transform that query into a function

The first step would be transforming that query into a function and that’s quite simple. You see, the query needs an input – and in order to undestand what that input is you need to understand how your main query works.

So our query starts by grabbing something from within our current workbook – a table Sonrisa

This means that our first input in order for our query to work is table so we’re going to replace that line that goes like this:

Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

To look like this:

Source = MyTable,

What is "MyTable"?  It is a variable, which will return the contents that we've fed it.  Except we haven't actually declared that variable yet...

The next step would be to declare that variable at the top of the actual query, making the query start like this:

(MyTable as table) =>
let
Source = MyTable, [….]

And we now have the MyTable variable set up to receive a table whenever we call it (That is the purpose of specifying the as table part.)

After you do that, this is how the PQ window will look:

image

Now you can just save this function with the name of your choice.

Step 2: Get the files/sheets <From Folder>

From here, you’ll go to the "From Folder" option of Power Query and grab all the workbooks that contain the tables that you want to transform.

Then you’re going to use the Excel.Workbook function against the binary files of the workbooks that you need like this:

image

Next, you’re going to expand that new Custom column that we just created, giving us the sheets, tables, defined ranges and such that are within each of those workbooks.  The results should look like this:

image

You have to choose the data that you want to use which, in this case,  is stored on the Data Column as a table.  So let's create new column against that column.

Step 3: Apply the function

In order to create such column, we are going to apply the function to the Data column (which is a table) like this:

image

... and we'll get a new Custom column that contains a bunch of Table objects.  We can now compare that table from data against the newly created custom column:

image

(table preview from the data column)

image

after we perform the function against the data column

Final Steps:

Expand the new column (by clicking that little double headed arrow at the top,) reorder the columns and change the data type, until we have a final table that looks like this:

image

We're now got a table that is the combination of multiple sheets from multiple workbooks which all went through our transformation, and then were all combined into a single table.  All thanks to Power Query.

If you want to see the step by step process then you can check out the video below:

Transforming Data with Power Query

This week we've got a guest post from Miguel Escobar, who is going to tell us why transforming data with Power Query is his first preference versus other tools and methods.  In fact, this is part 1 of 2, and we're going to turn this week over to Miguel in full (I'll be back next week with a new post of my own.

Miguel, the floor is yours!

Transforming data with Power Query

Looking for some cool scenarios to test the power of Power Query in terms of transformation I stumble upon a really cool post that Chandoo made a few weeks ago where he basically needed this:

(Image taken from Chandoo.org)

he used a really cool formula as option #1 and for option #2 a VBA approach. Now it’s time to make an option #3 that I think will become your #1 option after you finish reading this! Sonrisa

Step #1: Make this a table

In order for this to work, you’ll need to make the range a table with no headers and that should give you a table like this one:

Column1 Column2 Column3 Column4 Column5 Column6 Column7
29-Sep 30-Sep 1-Oct 2-Oct 3-Oct 4-Oct 5-Oct
58,312 62,441 60,467 59,783 51,276 23,450 23,557
6-Oct 7-Oct 8-Oct 9-Oct 10-Oct 11-Oct 12-Oct
53,194 60,236 62,079 66,489 59,258 27,140 25,181
13-Oct 14-Oct 15-Oct 16-Oct 17-Oct 18-Oct 19-Oct
56,231 62,397 60,978 60,928 52,430 24,772 25,630
20-Oct 21-Oct 22-Oct 23-Oct 24-Oct 25-Oct 26-Oct
59,968 61,044 57,305 54,357 48,704 22,318 23,605
27-Oct 28-Oct 29-Oct 30-Oct 31-Oct 1-Nov 2-Nov
56,655 62,788 62,957 64,221 52,697 25,940 27,283
3-Nov 4-Nov          
53,490 5,302          

Also, if you want to follow along, you can download the workbook from here.

Since now we have a table, we can use that as our datasource in order to work with Power Query:

image

(Choose From Table in the Power Query Ribbon)

Step 2: The actual transformation process

Note: Before we start, I don’t want you to be scared of some M code that we’ll be showing you later. Instead, I’m going to show you how NOT to be scared of this crazy looking M code and tell you how it was originated. And no, I didn’t manually go into the advanced editor to do anything – I did all of my steps within the actual UI of Power Query Sonrisa 

So we start with 3 simple steps:

Add an Index column that starts from 0 and has an increment of 1:

image

image

Then we move forward with defining the odd and even rows with a custom column using the Index as a reference and the Number.IsOdd function:

image

After this, we will create another custom column based on the one we just created with a simple if function;

image

we can see that our table so far has 3 custom columns:

  • Index
    • that starts from 0 and has increments of 1
  • A TRUE/FALSE column that tells us if the row is odd or not
    • we used the Number.IsOdd function here
  • A column that basically gives the same number for the records in every 2 rows. So esentially we are creating a pair – a couple Sonrisa

The next step would be to remove the Index column as we no longer need it.

image

and now select the Custom and Custom.1 columns and hit the unpivot other columns option:

image

and now our query should look like this:

image

its all coming along great so farSonrisa

Now we create another custom column that will help us along the way. This will work as our column ID for the components of a row.

image

and now here’s where the interesting part comes. You need to hit the Add Step button from the formula bar:

image

and that should give us a query with the results from the last step which in our case is the “Added Custom 2” step. In this section we will filter the custom column to only give us the rows with the value FALSE.

image

You’ll go back to the “Added Custom 2” step and basically repeat the “Add a custom Step” process but this time you’ll filter it to be TRUE instead of false. I’ve renamed this 2 steps in my query to be DataSet1 and DataSet2.  (Right click the step in the "Applied Steps" box and choose Rename to do this.)

image

Tip: You can always check what’s going on in a step by hitting the gear icon right next to the step

and now we want to combine those 2. DataSet1 and DataSet2....but how?

image

so we hit the Merge Queries button but you’ll notice that it will only show you the tables and queries that have been loaded to our workbook or that are stored as connections. So for now we’re going to trick the system and just choose a merge between the tables that PQ tell us to choose from like this:

image

and that should give you this:

image

and we need to change what we have in our formula bar in either one of the DataSet2  to be DataSet1 so we can merge the data from DataSet1 and DataSet2 which should give me this:

image

And, as you can see, we have a column that needs our help. Let’s expand that new column but only the column with the name Value like this:

image

and the result of that should be:

image

We are almost there

So we have the data how we want it, but now we want to clean it as it has some columns that we don’t really need, so just remove the columns using the Remove Columns button from the Power Query UI (User Interface):

image

This looks more like it! but wait, we need to add some data type and perhaps rename those columns so it can be readable for an end-user:

So first we change the data type for both columns as desired:

image

and later we proceed with renaming the columns just by double clicking each column name:

image

in Chandoo’s solution he adds an index column for the end result so we’ll do the same:

image

and this is our final result after we reorder our columns just by drag-n-drop:

image

and once its loaded into our Worksheet it’ll look like this

image

Things to take in consideration

  1. This is a dynamic solution. You can add new columns and/or more rows and it will still work as desired.
  2. This WILL work on the cloud with Power BI so you can rest-assured that what we just described is something that we can refresh on the cloud. (VBA doesn't work on the cloud.)
  3. We could potentially transform this into a function and execute that function into multiple sheets, tables and even defined names within one or multiple workbooks. (Read more about that here.)
  4. If the result of this transformation has an extensive amount of rows, we could load it directly into our Data Model instead of experiencing the performance drawback of loading it into a worksheet. (Things get seriously slow when a workbook gets way too many rows.)

Power Query is a tool that was created specifically for scenarios that needed transformation like this one.  Microsoft is on a roll by delivering monthly updates of Power Query and you can bet that it will continue to get better and better each month, as it already has for the past year and a half to date.

Try Power Query if you haven’t done so yet. Download the latest version here