Map Columns Between Data Sets

Image

Ken is at the PASS BA conference this week, so it seemed like a perfect time for me to publish my first Power Query post here.  In this installment, I'm going to show how to map columns between data sets.

Exploring how to map columns between data sets

I’m on a great new project where Power Query is the bread and butter of the solution. We’re pulling design information from Engineering Design Systems and building transforms to load into another engineering application. This application has a very strict requirement for data layout. Needless to say, the data structures and field names are seldom consistent between the two applications, so a key part of the transform is to map columns from one data set to the other.

Generally, Power Query’s ability to insert, rename, and move columns is useful in a case like this, however we are doing this for a large number of different data transfers, developing the maps in an ongoing process, and I REALLY don’t want to rewrite the Power Query steps for every change in every transform. (Also being able to document the transform is important for design and debugging).

Here’s a simplified example:

Source Data: The Flintstones

The Flintstones Sample data in a small table provides the source of data to be mapped 1

Target Data Structure: M*A*S*H

What does M*A*S*H have to do with the Flintstones? Not much really, that’s the point. But I want to convert data to this layout.

2

The "Map Table": The key piece needed to map columns between data sets

I set a goal to write a power query transform that was agnostic of specific column names and field counts, and so would not use Table.AddColumn, Table.RenameColumns, Table.RemoveColumns, or Table.ReorderColumns operations.

Mapping Strategy/Assumptions:

  • Data will not necessarily end up in the same column order between Source and Target.
  • Not all the columns map from Source to Target.
  • Not all the Target columns will be filled from the Source.

The solution: A "Map" table on an Excel sheet; A simple list of Source field names and Target Field names (I like using a column format for readability).

3

The Transforms

The "TblMap" Transform

The query reads the "Map" table data and flips it around so that the Source names are the table’s column names:

4

The complete M code used for this solution is shown here:

5

The "Output" Transform

This query references the tblMap transform and appends the original source data, giving something like this:

6

Now just promote the first row to Headers, overwriting the existing column names, and the new Target data structure is in place:

7

Dealing with un-mapped columns

But what about those pesky un-mapped columns (Column7 and Column8)? Normally I would use Table.RemoveColumns. I don't want to do that here, though, as this would hard code column names into the M code that might not exist next time, resulting in errors.

Instead, we just transpose the table and filter out any columns that begin with “Column”, and transpose it back.  The complete M code for the query is shown here:

8

And here is the output in Excel once we load it to a table:

9

Closing Thoughts

So there you go. One of the best things I like about this approach is how flexibly it can be modified. Spell “Klinger” wrong? Just modify the spelling in the Map table. Forget to add Rizzo or Nurse Able to the Target? Just add them to the table on the Target side and they are in the result. Forgot to include Dino in the Source data? Just add him to the list.

The sample file is attached. Give it a try. Hope it can be useful.

Column Name Translate

A thought on Data Types

I have not done a lot of testing with data types on this approach. My work will not do any math on the contents until after the re-mapping (I hope), so data typing can be done at the end. If there is any math to be done in the middle of the process, you would need to be careful not to have power Query treat numbers as integers (this has bitten me before).

Performance

The last step where extra column names are removed uses a transpose which could be really slow for long data sets. Another solution that could fix this would be to create a list from the Map table to automate a RemoveOtherColumns function.

 

Pass Parameters to SQL Queries

One of the questions I get quite frequently is how we can pass parameters to SQL queries, allowing us to make them dynamic. Someone asked a question in the forum on the topic, so that finally inspired me to write it up.

Before you do this…

There are a lot of reasons that you should NOT go this route.

Power Query is designed to take advantage of query folding when targeted against a database.  While this isn’t technically correct, you can look at query folding like this… As you connect to the database and start filtering and grouping via the user interface, Power Query passes these commands to the database to execute.  This means that the majority of the heavy lifting is done at the database, not your workstation.  This continues to occur until you hit a command that the database doesn’t know, at which point it returns the data to Power Query, and the rest is done in your instance of Excel.

Why is this an issue?  Dynamic parameters use custom lines of code, which can’t be folder.  In the case of the poster’s question, this means that he just asked to bring 40 million records into Excel, rather than filtering them down on the server.  Take a guess as to which one is faster?

But what if I want to Pass Parameters to SQL Queries dynamically?

Listen, I totally get it.  I love dynamic queries too.  And anyone who knows me is aware that I’m super obsessed with making my data tables as short and narrow as possible.  This almost always involves filtering based on parameters that the user needs to pass at run time.  So are we dead in the water?  No, not by a long shot.

How to Pass Parameters to SQL Queries – Method 1

I’m a big fan of query folding, and would encourage you to use it wherever possible.  In order to do that, you should use the user interface to connect to the database and drive as many filter, sort and group by operations as you possibly can.  The goal is to push as much work to the server as possible, resulting in the narrowest and shortest data table that you can accomplish.  Once you have that, land it in a Connection Only query.  And from there use your dynamic parameters to filter it down further to get just what you need.

I have no idea what was in the 40 million row data set that the user was working with, but let’s assume it was 40 years of data related to 4 divisions and 30 departments.  Assume that our user (let’s call him Mark) wants to bring in the last 2 years data, is focussing only on the Pacific division, and wants to give the user choice over which department they need to work with.  For ease of assumption, we’ll assume that each year is 1/40 of the annual record load and each division provides 1/4 of the total records.  (Yes, I’m aware that nothing is that easy… this is an illustration of concept only!)

The recommended steps would be to do this:

  • Create the Staging query
    • Connect to the raw database table
    • Filter to only the Pacific Division – Server reduces 40m to 10m records
    • Filter to only the 2 years of data – Server reduces 10m to 500k records (10/40*2)
    • Land this output into a staging query – 500k records total
  • Create the parameter table and the fnGetParameter query
  • Create a query that references the Staging query and filters the department to the one pulled via the fnGetParameter query

That should take as much advantage as possible, and means that Power Query only needs to run the processing of 500k records against our dynamic criteria.

Where Method 1 breaks down

But what if the data set is still too big?  What if you need to parameterize the Division, the Date Range and the Department?  In order to avoid issues from the formula firewall, you would have to do this:

  • Create the Staging query
    • Connect to the raw database table
  • Create the parameter table and the fnGetParameter query
  • Create a query that references the Staging query and…
    • Collects the Division, Date and Department variables
    • Filters to only the Pacific Division
    • Filters to only the 2 years of data
    • Filters to only the selected department

Seems about the same, doesn’t it?  Except that this time we can’t take advantage of query folding.  To pass a parameter to the database, we have to separate it from the parameters in order to avoid the formula firewall.  This means that we break query folding.  And this means that Power Query needs to pull in all 40 million records, and process them.  Not the server, your Excel instance.

I don’t know how much RAM you have (and don’t care unless you’re on 64 bit), or how many processor cores you have (as Power Query is single threaded), you’re in for a LOOONNNGGG wait… if it doesn’t just tip over on you.

So how do we fully parameterize this stuff?

How to Pass Parameters to SQL Queries – Method 2

The good news that it can be done, the bad news is that you need:

  1. SQL Skills, and
  2. An adjustment to the default Power Query load behaviour

Let’s talk SQL Skills

The reason you need SQL skills is that you need to be able to write the most efficient query you possibly can, and pass this into the query when you connect to the database. (Thos italics, as you’ll see, are key.)  So, let’s assume that I want to connect to the old AdventureWorks database and pull records from the Sales.SalesOrderHeader table where CustomerID = 29825.  You need to be able to write this:

SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID='29825'

Why?  Because you need to include that query when you’re building/testing your code.  It goes in the advanced options, below:

image

(You may need to trust the Native Database Query to get to the next step.)

So that created the following code for me:

let
Source = Sql.Database("azuredb.powerqueryworkshop.com", "AdventureWorks2012", [Query="SELECT * FROM Sales.SalesOrderHeader WHERE CustomerID='29825'"])
in
Source

Modifying the hard coded query to use dynamic parameters

I then set about fully parameterizing my query using the fnGetParameter query detailed here, and the table shown below:

image

Here’s the code I cobbled together to do this:

let
//Pull in a values from the parameter table
dbURL = fnGetParameter("Database URL"),
dbName = fnGetParameter("Database Name"),
dbTable = fnGetParameter("Database Table"),
sFilterField = fnGetParameter("Filter Field"),
sFieldValue = Text.From(fnGetParameter("Field Value")),

//Create the query
dbQuery = "Select * FROM " & dbTable & " WHERE " & sFilterField & "='" & sFieldValue & "'",

//Get the data
Source = Sql.Database(dbURL,dbName,[Query=dbQuery])
in
Source

 

I won’t go through the fnGetParameter function, as it’s quite well detailed in the blog post, but the key to understand here is that we are pulling a variety of items from the worksheet table, and putting them all together to feed the line starting with dbQuery.  This line dynamically sources the database path, database name and the SQL query.  Wouldn’t this be heaven if it worked?

Here comes the Formula Firewall again…

But no:

image

Now if that’s not enough to make you scream… and here’s the issue…

What I really wanted to do was create a power query that connects to the database without declaring the SQL query up front.  Something like this:

Source = Sql.Database("azuredb.powerqueryworkshop.com", "AdventureWorks2012")

My thought is that I could load this as a connection only query, then reference it and add the dynamic query afterwards.  Unfortunately, I got nowhere.  I’m not saying it can’t be done, but I couldn’t figure this out.  It seems that the only way to pass a query to the database is to pass it during the initial connection.  But doing so, by its very nature, violates the formula firewall.

So how can we get past it?

Bypassing the Formula Firewall

Yes, we can make this work.  But doing so involves turning off Privacy settings.  I’m not generally a big fan of turning off firewalls, but this one just aggravates me to no end. Normally we turn this off to avoid prompting messages.  In this case it just flat out prevents us from running.  So let’s kill it:

  • Go to Query Settings –> Privacy
  • Select Ignore the Privacy Levels and potentially improve performance

And at this point you’ll find out that it works beautifully… for you.  You’ll most likely need to have anyone else who uses the file set the above option as well.

Security Settings

The security setting we changed above is workbook specific.  Now I’m not recommending you do this, but if you get tired of the privacy settings overall, you can turn them off for all workbooks in the Query Options –> Security tab.  The second arrow is pointing at the relevant option:

image

And how about that?  Check out the first option… that one lets you avoid prompting about Native Database Queries.  (I seriously have to question why a SELECT query should trip a warning.)

Sample Workbook

Yes, I’ve got one, which you can download here, but it comes with a catch…

I’ve set it up to talk to a SQL Server in order to demo this, as it’s the big databases with query folding that cause us this issue.  You might notice that not too many people provide free access to a SQL server since they cost money to run.  For this reason, you need a login and password to refresh the solution and really see it work.

As it happens, if you’ve bought a copy of M is for Data Monkey, or attended our PowerQuery.Training workshop, then you’ve already got it the credentials. (They are on page 66 of the book.)  And if you haven’t… what are you waiting for?  🙂

Excel 2016: A fix and a new bug

Well, there’s good news and bad this time.  I just updated my Excel 2016 to 16.0.6868.2048 (First release version) and there is a fix and a new bug evident.

First, the fix

As I reported a couple of weeks ago, there was a new bug introduced in the April 2016 Excel update, as related to Power Query.

… if you are sourcing from a named range that doesn’t have an equal offset of rows/column.  I.e. if your source range doesn’t start in A1, B2, C3, D4, etc… then it pulls the wrong range.  Tables are fine, named ranges are the issue…

The issue was reported, fixed, and build 16.0.6868.2048 (which I finally got today) has fixed the issue.

I have to say that this is pretty cool.  Even though I was frustrated having to wait 2 weeks for a fix, the fact that it was only 2 weeks is pretty darned amazing.  In past cycles, this would have been several years until a new build of Excel came out. So even though we see new bugs, we also need to recognize that the team is working very hard to try and be responsive to them and get the fixes pushed VERY quickly.

Unfortunately, a new bug

I’m not actually sure if this is new in a more recent update, or if it was there in the previous build and I just didn’t notice.  (While I do check updates almost daily, I don’t actually use every feature of Power Query every day.)  While this bug doesn’t prevent you from using your models, it is pretty irritating… Since I know this was a pretty major contention point for many in the past, I figured I should talk about it.

Once again, this is Excel 2016 specific, and doesn’t affect Excel 2010/2013.

So let’s assume you have some data as shown below:

SNAGHTML158303b7

Here’s what’s going on with these tables:

  • The blue table is our raw data.
  • The green table is a simple query that imports the original data, and sets the first column to a Date data type, then loads it to the table.  (Nothing fancy, it’s just that simple.)
  • The orange table was created by right clicking the green table’s query and choosing “Duplicate”, then loading it to the worksheet.  It is an EXACT copy of the query that leads to the green table.

Make sense so far?  Now, let’s add a couple of rows to the blue table, then hit refresh all.  What you should expect to get is this:

SNAGHTML1591b06d

What you actually get is this:

SNAGHTML1590e741[4]

Check out the green table… those dates are pretty impressively formatted as serial numbers, not dates.  But yet, the orange table – an exact duplicate – is fine.  Huh?  Doesn’t this feel like a throw-back to early Power Query days, where tables didn’t hold the formatting properly?

Here’s the best we have from a temporary workaround point of view (courtesy of the Excel/Power Query team):

  • Select the green table –> Table Tools –> Design –> Properties
  • Un-check the Preserve Cell Formatting box

image

When we refresh now, we get…

 

SNAGHTML1591b06d[5]

And let’s just add some more data, then refresh again to make sure it sticks…

SNAGHTML15953acd

Stellar!  The number formats have remained, but the table style formatting has changed to a different one.  Ugh.

Now, it IS still a table.  But unfortunately the style and the number formats all seem to be controlled by that one selection.  So until they fix this, it appears that you can either have your tables pretty, or you can have your number formatting correct.

Or maybe you can create your query, immediately duplicate it for your reports, then delete the original, as the second one seems to behave properly.  (I have no idea why this is.)

Final thoughts on “a fix and a new bug”

The subscription model is a new thing for us, and personally, I’m pretty high on it, despite these kinds of issues.  My hope is that – with the connections I have at Microsoft – that I’m in the first ring of testers, and can get this stuff fixed before it hits you.  I’d highly suggest you also have one person in your company in the “First Release” program for this reason as well.

My understanding of this method is that the fixes we get into the First Release band are implemented before that version is shipped to the General Release band of users.  That’s a good thing, as the last thing we want to see is our end users having to experience two months of the first issue listed here!

With regards to the new bug, I’ve again reported this to the Power Query team.  They’re aware, and we are having some active dialog about it.  I know they are going to fix it, but I’m not sure how soon.  (I really hope it’s as quickly as the last one, as this is pretty visible!)

We need your votes!

Hey everyone, we need your votes to make a difference in Power Query and Power Pivot!  There are a couple of items in the uservoice forums that I’d like to bring your attention to, and hopefully entice you to vote them up.  The more votes we get, the easier it is for the program managers in the design teams to get the support to actually implement these features.  They ARE interested, they just need you to up-vote them to get them done.

Where we need your votes:

#1 - Add Intellisense to the M Editor

So the idea here is simple: Add Intellisense and better general editing capabilities in the Advanced Editor.  This would make a huge difference to those of us writing M code, and I’ve also suggested in the comments that this be extended to the Add Custom Column dialog.

What kills me on this is the signature of the original submitter:  “Software Engineer, Power BI Desktop”.  I don’t think I’ve ever seen a clearer case where they need our help to justify the budget to get this done.  Please go there and throw some votes on this.

Vote up the “Add Intellisense to the M Editor” idea.

#2 – Modification of the Power Pivot Field List experience

Back in November I posted a suggestion to improve the Pivot Table experience  which would benefit everyone, but especially Power Pivot users.  Full details of my suggestion can be found on the blog here, but the basic summary is this:  Allow the fields area to be collapsible in side by side view. This would make it WAY easier to rearrange fields by reducing unused whitespace.

I was really encourage to see Ashvini Sharma’s response which, paraphrased, says: “We want to do this too, so please get enough votes to help us justify it!”

Vote up the “Ability to collapse PivotTable field areas” idea.

My Ask To You

Please, take some time to throw some votes on these ideas, and encourage every other user you know to do the same.  It’s super easy to do, just go there, click the Vote button, assign as many as you want and verify you’re real with your email address.  (The only email I’ve ever received from this is when they confirmed a feature got implemented.)

Again, we need your votes.  Help us out!  I’d like to see both of these hit 500 votes in order to give Microsoft the justification they need to get these done.

Thanks!

April 2016 Power Query Update

The April 2016 Power Query Update was just released for Office 365 subscribers, and I can confirm that it is available to the First Release customers, as I’ve already got it installed.  (If you’re on a later branch it may be a bit longer.)  It’s also available for download for Excel 2010/2013 customers.

WARNING TO EXCEL 2016 USERS!!

If you read Power Query data from named ranges, I HIGHLY recommend that you avoid updating your software to the newest release right now if you can. The latest builds on the insider track have caused a rather large issue if you are sourcing from a named range that doesn’t have an equal offset of rows/column.  I.e. if your source range doesn’t start in A1, B2, C3, D4, etc… then it pulls the wrong range.  Tables are fine, named ranges are the issue.  Microsoft knows, has architected a fix, but it hasn’t been pushed out yet.  I’ll update this as soon as it has.

The problem is not an issue in Excel 2010/2013 running version 2.29.x or the current 2.30.x.  It is only affecting Excel 2016.

UPDATED 2016-04-21: I can confirm that this issue was fixed in Excel 2016 16.0.6868.2048.

What’s in the April 2016 Power Query update?

At first glance it doesn’t seem like a ton – only two that they are calling out – but I think that this will make a few people pretty happy.

ODBC Connectivity Improvement

The first is that they’ve added the ability to easily select from a list of a available DSN’s when you’re setting up a Power Query connection against an ODBC data source.  No new functionality there, but it saves you the headache of having to manually enter the connection string (which you can still do if needed.)

Get and Transform April 1

Image scooped from the Official Excel blog.

Ability to specify CSV column delimiters

We can now specify the type of delimiter for a CSV, including special characters:

image

 

Both small things, but should be quite impactful to the folks who need them.

Newly added Power Query specific help forum at Excelguru

I'm honestly not sure what's taken me so long to do this, but I'm pleased to say that I've finally added a Power Query specific help forum at Excelguru.  I'm hoping that this forum becomes THE place to ask and answer Power Query (or Get and Transform) related questions for both Excel and Power BI desktop.  After all, we wrote the book, so it only makes sense that we try and host the Q&A on the topic.  🙂

Extra monitoring of the Power Query Specific help forum

As the forum gets up and running there are a couple of key people I've added for email notification as well.  The intent here is that we get notified when people post questions, and will try to focus on making sure that they get addressed and (hopefully) solved.  If you are a Power Query expert and would like to be included in that list, just email me or post on this thread.  I'll get you set up.  (Make sure you've signed up for an account on the site, as I'll need your user ID to do this.)

Naturally, if you've got a question pertaining to the topic posted on the blog, you can still ask it here.  If the question is a bit more general though, I'd encourage you to sign up at www.excelguru.ca/forums and post the question in the Power Query forum.

Re-focusing on the Power Technologies

While I was setting this up, I also took the time to set up forums specific forums for some of the other "Power" stack:

Hope to see you there!

Replace Records Via Joins in Power Query

I got an email from a friend today who was using some complicated logic to replace specific records in a table with records from another table.  His query was running pretty slow, so he reached out for a little help. In this post I'll show how to replace records via joins in Power Query; a much easier (and what should be a faster) solution to his issue.

Data Background

The data footprint that was sent to me looked something like this:

image

And the desired output is shown below:

image

So basically, we want to take the record for Unit002 from the Override table and replace the Unit002 value in the Original Data table.

At first glance, this looks hard.  And my friend cooked up something pretty complicated to make this work.  Funny thing is (and believe me… I've had this happen to me as recently as last week…) when you put another pair of eyes on it, you suddenly realize it's much easier than you first saw.

In this case we can actually solve this very easily by using a couple of Power Query's different Join types!

Laying the Groundwork

If you want to follow along, grab the sample workbook here.  You'll notice that we have taken the following actions already:

  • Select any cell in the Original Data table
  • Create a New Query –> From Table
  • Go to Home –> Close & Load To… –> Connection Only
  • Select any cell in the Override With table
  • Create a New Query –> From Table
  • Go to Home –> Close & Load To… –> Connection Only

Which leaves us with the following queries in the Workbook Queries pane:

image

We are now set to replace the records.

Replace Records Via Joins in Power Query

This actually takes a Merge and an Append in order to complete the job.  So let's start at the merge.

  • Right click the "Original" query –> Reference

This creates a pointer to the data in the "Original" query, showing all four rows of data in the table.  The challenge here is that we only want the rows which are NOT being replaced.  The secret to getting those?  An Anti-Join!

  • Go to Home –> Combine –> Merge Queries
  • Choose the Override query
  • Select the Unit column on both the top and bottom queries
  • Change the Join Kind to "Left Anti (rows only in first)"

image

  • Click OK

At this point, you'll have 3 rows left, as shown below:

image

Why only 3 rows?  Because the Left Anti Join only returns the rows which don't match what is in the other table.  So where Unit002 exists in the second table, it cause it to pull everything EXCEPT Unit002 from the left table.  (For more on using Anti-Joins in Power Query, see this blog post.)

Joining tables does create a new column however, even if it is full of null values (as this one is.)  Since we don't need it, let's just delete that column:

  • Right click the NewColumn column –> Remove

Now we just need to add the record(s) from the Override table to this list.  That's fairly easy:

  • Go to Home –> Combine –> Append
  • Choose the Override table
  • Right click the Unit column –> Sort –> Ascending (this step is optional, and done for readability only.)

And you're done!  5 steps (after the connection only queries were created), 100% user interface drive, and should perform quite quickly. Smile

Running Totals using the List.Accumulate() Function

A while back I got an email from someone who had taken my Power Query training course online.  They were asking how to create a running total, although with some added twists and turns for calculating taxable gains and losses for a stock portfolio.  I decided to tackle that using the List.Accumulate() function.

Now, to be fair, I'm not going to demo the whole stock portfolio thing, but I do want to look at the List.Accumulate function as I found this a bit… confusing… to build.  It's super useful to be sure, but the help article… it needs work.

The Data

I'm using a pretty simple dialog box, inspired by my time in Australia.  You can download a copy from this link, but here's what it looks like:

image

Pretty simple, but now I want to create a running total that has 685 for Tim Tams, 741 for Stuffed Koala, and so on.

The List.Accumulate Function

So I headed over to MSDN, and found this helpful little article on the List.Accumulate function. It contains the following information.

Function:

List.Accumulate(list as list, seed as any, accumulator as function) as any

Arguments:

Argument Description
list The List to check.
seed The initial value seed.
accumulator The value accumulator function.

Example:

// This accumulates the sum of the numbers in the list provided.
List.Accumulate({1, 2, 3, 4, 5}, 0, (state, current) => state + current) equals 15

Using the List.Accumulate Function

So this formula looks pretty promising.  Let's go see how it works…

    • Click in the table of data –> create a new query –> From Table
    • Go to Add Column –> Add Custom Column
      • Formula Name:  Initial
      • Formula:

=List.Accumulate(
#"Changed Type"[Sales],
0,
(state, current) => state + current
)

The tricky part here is the #"Changed Type"[Sales], which provides the list of the sales values from the Changed Type step of the query (that was automatically created when we pulled the data in.)

And the result:

image

So this is a bit weird, as it shows the total for all rows, rather than the running total.  I figured that you should be able to change the accumulator function… except that there is no documentation about what the options are!  (I left some critical feedback on the MSDN site, and would suggest you do too, as that's pretty poor.)

At any rate, I tried dropping the "+ current" from the end, leaving just => state.  The result was a 0 value all the way down the column.  So that plainly didn't work. Then I tried modifying the formula again, leaving => current instead.  The result was 231 on all rows (so the last value in the accumulator.)  How 0 + 231 = 1095 I'm not quite sure but whatever.  state + current returns the overall total.

So plainly, we can't just use this function on it's own.

We need the List.Range function!

With the List.Accumulate function returning a total of all rows fed into it, it became plain that we needed to control what was being fed into the list used as a parameter.  So I reached back out to MSDN and browsed the site until I located the List.Range function.

Function:

List.Range(list as list, offset as number, optional count as number) as list

Arguments:

Argument Description
list The List to check.
offset The index to start at.
optional count Count of items to return.

Example:

List.Range({1..10},3,5) equals {4,5,6,7,8}

Using the List.Range function

In order to use the List.Range function, we are going to need to figure out which rows we want.  To do that, we need to add an Index column

  • Add Column –> Add Index Column –> From 1

Then add a column that makes use of List.Range()

    • Go to Add Column –> Add Custom Column
      • Formula Name:  Initial
      • Formula:

=List.Range(#"Added Index"[Sales],0,[Index])

So what I'm doing here is feeding in the Added Index step (from adding the Index column), and providing the [Sales] column to get a list.  But I'm asking it to return the list for the number of rows as contained in the [Index] column.  The result is a green word that says List all the way down the column.  But if I select the whitespace beside any of those List items, we can see what it is contained within.  Shown below is the list for the Stuffed Koala row:

image

Okay, so we now have a list of what we need…

Putting it all together

The final step is to put these together.  So let's add a new column again, but this time we'll use that List.Range() function instead of #"Changed Type"[Sales] as shown below

    • Go to Add Column –> Add Custom Column
      • Formula Name:  Success
      • Formula:

List.Accumulate(
List.Range(#"Added Index"[Sales],0,[Index]),
0,
(state, current) => state + current
)

And the result gives us what we were originally looking for:

image

The only thing left to do is remove the columns we used along the way.  Of course, we could just remove those steps, as they never really needed to happen, but I'm going to select them and remove them so that you can see the work in progress.

And sure enough, we get what we need!

image

Live Power Query and Power Pivot Training in Melbourne: Next week!

I know that this comes with limited notice but… as many of you know I'm currently in Sydney, Australia, and I'll be in Melbourne in a couple of days for Excel Summit South.  Well, as it happens, I'm actually staying in Melbourne for another week to deliver some live Power Query and Power Pivot training for a client.

Well guess what… we still have a bit of room, so we are going to open it up to the general public.  If you're interested in a full day of hands on training on either Power Pivot or Power Query, check out what we are doing at Parity Analytic's website, or download the individual brochures here:

(Registration information is included in the links above)

I'm very much looking forward to being able to share with a few more people, and hopefully you can be one!

Creating a Banding function in Power Query

I got a question on the blog recently about creating a banding function in Power Query, or creating buckets for Accounts Receivable transactions.  (30-60 days, 60-90 days, etc..)  As this is something that can be applied to a lot of areas, I thought it might make a good post to cover.

If you'd like a copy of the sample workbook, you can find that here.

 The need for a Banding function

Picture that you have a list of transactions that could be from 1 – 170 days overdue, and you'd like to group them as follows:

  • 0-30 days (current)
  • 31-60 days
  • 61-90 days
  • 91-120 days
  • >120 days

You could create a table with 365 days in column 1 and the appropriate description in column 2, then merge them, but that seems like a lot of work.  It would be much easier to create a simple little function that banded them correctly for us.  Especially if you happen to have a little template that you can refer to…

The Banding function

The banding function template we need is shown below:

image

Notice the key parts here:

  • days (highlighted in yellow) is the variable that we'll pass into our function to evaluate
  • ARBand is the name of our function
  • Between the indented curly braces we have a list of the potential outcomes we'd like to use for our bands.  If the value of x (which we will test) is less than 31, it is labelled "Current".  If not, then -- if it's less than 61 -- it is labelled "30-60 Days" and so on.  The final clause (=>true) basically returns an "else" statement.
  • The Result line then checks the days variable against the list and returns the correct match or the "else" clause if no match is found (">120 Days" in our case)

This banding function is a super useful template that you can modify to suit for any grouping needs.  If you are updating this function for your own scenario, make sure that the yellow pieces match, the orange pieces match, then change the number bands and offsetting text pairs (ensuring that the remain wrapped in quotes.)

You can add as many steps (bands) as you need, just make sure that each line ends with a comma, and the =>true line stays at the end of the list.

To implement the function:

  • Create a new query –> from blank query
  • Enter the Advanced Editor
  • Paste in the code shown above
  • Modify your bands to suit
  • Click OK to exit the advanced editor
  • Name the function

I obviously didn't need to edit mine, and I called mine "DayBanding".

Setting up the data

There are two pieces that I need to deal with for my scenario.  I have a transactions table, but it only lists the original transaction dates.  In order to work out the day bands, I need to create a way to show how many days have been elapsed.  Easy enough to do, I just need to pull in today's date from somewhere.

So I created a simple table that holds today's date:.  (It's hard coded in the same file, since the transaction dates are hard coded as well.)  Regardless, it looks like this.

image

And here is an excerpt from the table of transactions:

SNAGHTML5829669

Grabbing today's date

Since I'm going to need the date to work out the number of days outstanding, I'll start there.  The steps to accomplish this:

  • Select a cell in the parameter table –> New Query –> From Table
  • Rename the query to "Today"
  • Click the fx icon in the formula bar
  • Modify the formula to show as follows:
    • = Date.From(#"Changed Type"[Value]{0})

(I've discussed this technique a lot on the blog in the past – like in this post – but it basically we are drilling in to the first item in the [Value] column of that table, then wrapping the item with the Date.From() function to extract the date.  We'll use this shortly, but first…

  • Go to home –> Close & Load To… –> Only create connection

And we now have a way to pull up the date when need.

Grabbing the transactions table

Next I needed to pull in the ARTransactions table, include the date, work out the number of days outstanding, then band it all.  Here's the steps I used:

  • Select a cell in the ARTransactions table –> New Query –> From Table
  • Add a Custom Column
    • Name:  Today
    • Formula:  =Today

This works since we called our original function Today, and we drilled right in to the date.

SNAGHTML594999e

Next up, I needed to subtract the Transaction Date from Today's Date:

  • Select the Today's Date column
  • Hold down CTRL and select the Transaction Date column
  • Go to Add Column –> Date –> Subtract Days

SNAGHTML5972c5f

Using the Banding function

The final step is to call the banding function and classify our days:

  • Add Column –> Custom Column
    • Name:  Day OS
    • Formula:  =DayBanding([DateDifference])
  • Right click the Today's Date column –> remove

And we have a nice table that has the grouping level we need:

SNAGHTML59a3950

Another little trick…

Now I'd like to build a Pivot Table using this, but I'm not really in love with the idea that I have to load this data to a table first.  I mean really, I only added a single column.  Normally I'd load this to the data model, but I don't really need Power Pivot for what I want to do.  So let's take a look at another little trick that will let us avoid the data duplication that would be caused by loading this to either the Data Model or the Worksheet.

  • Close & Load To… –> Only Create Connection

Now we need to build the Pivot Table.  I'm going to show the steps for this in Excel 2016 (because I'm working on a computer that only has Excel 2016), but you should be able to make this work in Excel 2010/2013 as well.

  • Insert –> Pivot Table
  • Choose External Data Source (yes, you read that right) –> Choose Connection

In this window, your queries should show up!

image

  • Select the Query – ARTransactions –> Open
  • Choose to place your Pivot Table wherever you'd like it –> OK

Configure the Pivot Table as follows:

  • Rows:  Customer
  • Columns:  Days OS
  • Values:  Amount

And with a couple of sorting and formatting changes, I've got this thing of beauty:

image

Final Thoughts

I showed a couple of tricks here:  How to use a Banding function, and how to build a Pivot Table directly against a connection only query without having to go through Power Pivot.  Both useful things that you should have in your arsenal of tools.  Smile