Transforming Data with Power Query

This week we've got a guest post from Miguel Escobar, who is going to tell us why transforming data with Power Query is his first preference versus other tools and methods.  In fact, this is part 1 of 2, and we're going to turn this week over to Miguel in full (I'll be back next week with a new post of my own.

Miguel, the floor is yours!

Transforming data with Power Query

Looking for some cool scenarios to test the power of Power Query in terms of transformation I stumble upon a really cool post that Chandoo made a few weeks ago where he basically needed this:

(Image taken from Chandoo.org)

he used a really cool formula as option #1 and for option #2 a VBA approach. Now it’s time to make an option #3 that I think will become your #1 option after you finish reading this! Sonrisa

Step #1: Make this a table

In order for this to work, you’ll need to make the range a table with no headers and that should give you a table like this one:

Column1 Column2 Column3 Column4 Column5 Column6 Column7
29-Sep 30-Sep 1-Oct 2-Oct 3-Oct 4-Oct 5-Oct
58,312 62,441 60,467 59,783 51,276 23,450 23,557
6-Oct 7-Oct 8-Oct 9-Oct 10-Oct 11-Oct 12-Oct
53,194 60,236 62,079 66,489 59,258 27,140 25,181
13-Oct 14-Oct 15-Oct 16-Oct 17-Oct 18-Oct 19-Oct
56,231 62,397 60,978 60,928 52,430 24,772 25,630
20-Oct 21-Oct 22-Oct 23-Oct 24-Oct 25-Oct 26-Oct
59,968 61,044 57,305 54,357 48,704 22,318 23,605
27-Oct 28-Oct 29-Oct 30-Oct 31-Oct 1-Nov 2-Nov
56,655 62,788 62,957 64,221 52,697 25,940 27,283
3-Nov 4-Nov          
53,490 5,302          

Also, if you want to follow along, you can download the workbook from here.

Since now we have a table, we can use that as our datasource in order to work with Power Query:

image

(Choose From Table in the Power Query Ribbon)

Step 2: The actual transformation process

Note: Before we start, I don’t want you to be scared of some M code that we’ll be showing you later. Instead, I’m going to show you how NOT to be scared of this crazy looking M code and tell you how it was originated. And no, I didn’t manually go into the advanced editor to do anything – I did all of my steps within the actual UI of Power Query Sonrisa 

So we start with 3 simple steps:

Add an Index column that starts from 0 and has an increment of 1:

image

image

Then we move forward with defining the odd and even rows with a custom column using the Index as a reference and the Number.IsOdd function:

image

After this, we will create another custom column based on the one we just created with a simple if function;

image

we can see that our table so far has 3 custom columns:

  • Index
    • that starts from 0 and has increments of 1
  • A TRUE/FALSE column that tells us if the row is odd or not
    • we used the Number.IsOdd function here
  • A column that basically gives the same number for the records in every 2 rows. So esentially we are creating a pair – a couple Sonrisa

The next step would be to remove the Index column as we no longer need it.

image

and now select the Custom and Custom.1 columns and hit the unpivot other columns option:

image

and now our query should look like this:

image

its all coming along great so farSonrisa

Now we create another custom column that will help us along the way. This will work as our column ID for the components of a row.

image

and now here’s where the interesting part comes. You need to hit the Add Step button from the formula bar:

image

and that should give us a query with the results from the last step which in our case is the “Added Custom 2” step. In this section we will filter the custom column to only give us the rows with the value FALSE.

image

You’ll go back to the “Added Custom 2” step and basically repeat the “Add a custom Step” process but this time you’ll filter it to be TRUE instead of false. I’ve renamed this 2 steps in my query to be DataSet1 and DataSet2.  (Right click the step in the "Applied Steps" box and choose Rename to do this.)

image

Tip: You can always check what’s going on in a step by hitting the gear icon right next to the step

and now we want to combine those 2. DataSet1 and DataSet2....but how?

image

so we hit the Merge Queries button but you’ll notice that it will only show you the tables and queries that have been loaded to our workbook or that are stored as connections. So for now we’re going to trick the system and just choose a merge between the tables that PQ tell us to choose from like this:

image

and that should give you this:

image

and we need to change what we have in our formula bar in either one of the DataSet2  to be DataSet1 so we can merge the data from DataSet1 and DataSet2 which should give me this:

image

And, as you can see, we have a column that needs our help. Let’s expand that new column but only the column with the name Value like this:

image

and the result of that should be:

image

We are almost there

So we have the data how we want it, but now we want to clean it as it has some columns that we don’t really need, so just remove the columns using the Remove Columns button from the Power Query UI (User Interface):

image

This looks more like it! but wait, we need to add some data type and perhaps rename those columns so it can be readable for an end-user:

So first we change the data type for both columns as desired:

image

and later we proceed with renaming the columns just by double clicking each column name:

image

in Chandoo’s solution he adds an index column for the end result so we’ll do the same:

image

and this is our final result after we reorder our columns just by drag-n-drop:

image

and once its loaded into our Worksheet it’ll look like this

image

Things to take in consideration

  1. This is a dynamic solution. You can add new columns and/or more rows and it will still work as desired.
  2. This WILL work on the cloud with Power BI so you can rest-assured that what we just described is something that we can refresh on the cloud. (VBA doesn't work on the cloud.)
  3. We could potentially transform this into a function and execute that function into multiple sheets, tables and even defined names within one or multiple workbooks. (Read more about that here.)
  4. If the result of this transformation has an extensive amount of rows, we could load it directly into our Data Model instead of experiencing the performance drawback of loading it into a worksheet. (Things get seriously slow when a workbook gets way too many rows.)

Power Query is a tool that was created specifically for scenarios that needed transformation like this one.  Microsoft is on a roll by delivering monthly updates of Power Query and you can bet that it will continue to get better and better each month, as it already has for the past year and a half to date.

Try Power Query if you haven’t done so yet. Download the latest version here

6 thoughts on “Transforming Data with Power Query

  1. Hi Miguel - Ive been enjoying learning Power Query from your posts & videos.
    I have been using PQ to automate manipulation of a google form that I use. I wanted to know whether it is possible to link directly to the google form in PQ? At the moment, I export the google spreadsheet to an Excel file and use PQ from file. Is there a more efficient way?

    Thanks!

  2. Hey Grant!

    I haven't tried it yet but I've seen the other Miguel (Llopis) do it using a google doc spreadsheet. I'm not familiar with google docs so I don't really know if the forms are something different but if you have a public link to a form or spreadsheet let me know so I can test it out!

  3. Chandoo Challenge in 12 steps

    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AlternatedRows = Table.AlternateRows(Source,1,1,1),
    UnpivotedColumns = Table.UnpivotOtherColumns(AlternatedRows, {}, "Attribute", "Value"),
    DataSet1 = Table.AddIndexColumn(UnpivotedColumns, "Index", 1, 1),
    Custom1 = Source,
    AlternatedRows1 = Table.AlternateRows(Custom1,0,1,1),
    UnpivotedColumns1 = Table.UnpivotOtherColumns(AlternatedRows1, {}, "Attribute", "Value"),
    DataSet2 = Table.AddIndexColumn(UnpivotedColumns1, "Index", 1, 1),
    Merge = Table.NestedJoin(DataSet1,{"Index"},DataSet2,{"Index"},"NewColumn"),
    ExpandNewColumn = Table.ExpandTableColumn(Merge, "NewColumn", {"Value"}, {"NewColumn.Value"}),
    RemovedColumns = Table.RemoveColumns(ExpandNewColumn,{"Attribute", "Index"}),
    ChangedType = Table.TransformColumnTypes(RemovedColumns,{{"Value", type date}})
    in
    ChangedType

Leave a Reply

Your email address will not be published. Required fields are marked *