Values Become Text After UnPivoting Other Columns

Have you ever set up a nice query to UnPivot other columns, only to find that the query data types change when you add new columns?  This post will cover why values become text after unpivoting other columns.


We’ve got a nice little table called “Data” showing here.  Nothing special, it just summarizes sales by region by month, and our goal is to unpivot this so that we can use it in future Pivot Tables.  (You can download the source file here.)


Now, you will notice that April’s sales are outside the table. This is by design, and we’ll pull it in to the table later when we want to break things.  Smile

UnPivoting Other Columns – The Hopeful Start

If you’ve been following my blog for any period of time, you’ve seen this, but let’s quickly go over how to unpivot this:

  • Select a cell in the table
  • Go to Power Query (or Data in Excel 2016) –> From Table

We’re now looking at the Power Query preview of the table:


Great, now to unpivot…

  • Hold down the Shift key and select the Country and Prov/State column
  • Right click the header of either of the selected columns and choose Unpivot Other Columns
  • Right click the headers of the two new columns and rename them as follows:
    • Attribute –> Month
    • Value –> Sales

Re-Pivoting from the Data Model

With the table complete, I’m going to load this to the data model and create a Pivot Table:

  • Go to Home –> Close & Load –> Close & Load To…
  • Choose to Load to the Data Model

The steps to create the Pivot depend on your version of Excel:

  • Excel 2013: Go in to Power Pivot –> Home –> PivotTable and choose a location to create it
  • Excel 2016: Click any blank cell and go to Insert –> PivotTable.  As you have no data source selected, it will default to using the data model as your source:


With the PivotTable created, I’ve configured it as follows:

  • Rows:  Country, Prov/State
  • Columns:  Month
  • Values:  Sales

And that gives me a nice Pivot like this:


Let’s Break This…

Okay, so all is good so far, what’s the issue?  Now we’re going to break things.  To do that, we’re going to go back to our original data table and expand the range:


In the picture above, I’ve left clicked and dragged the tiny little widget in the bottom right corner of the table to the right.  The table frame is expanding, and when I let go the Apr column turns blue, indicating that it is now in the boundaries of the table.

With that done, I’m going to right click and refresh my Pivot Table, leaving me with this:


Huh?  Why was the sales measure removed?  And if I drag it back to the table, I get a COUNT, not a SUM of the values?  And even worse, when I try and flip it back to SUM, I’m told that you can’t?  What the heck is going on here?


Importance of Power Query Step Order

To cut to the chase, the issue here is that when we first created the table in the data model, the Sales column was passed as values.  But when we updated the data to include the new column, then Sales column was then passed entirely as text, not values.  Naturally, Power Pivot freaks out when you ask for the SUM of textual columns.

The big question though, is why.  So let’s look back at our query.

Our original data set

If we edit our query, we see that the steps look like this:


To review this quickly, here’s what happened originally

  • Source is the connection that streams in the source data with the following columns:


  • Changed Type set the data type for all the columns.  In this case the Country and Prov/State fields were set to text, and the Jan, Feb & Mar columns were set to whole number.  We can see this by looking at the icons in the header:


Note that if you don’t have these icons, you should download a newer version of Power Query, as this feature is available to you and is SUPER handy


  • We then selected the Country and Prov/State columns and chose to Unpivot Other Columns.  Doing so returned a table with the following headers


Notice that the first three columns are all textual, but Sales is showing a numeric format?  Interestingly, it’s showing a decimal format now, but it shows the numeric format because all unpivoted columns had explicitly defined numeric formats already.

The final steps we did was to rename our columns and load to the data model, but the data types have been defined, so they were sent to the data model with Sales being a numeric type.

Why Values Become Text After UnPivoting Other Columns

Okay, so now that we know what happened, let’s look at what we get when we step through the updated data set.

  • First we pulled in all the columns.  We can plainly see that we have the new Apr column:


  • The Changed Type step is then applied:


Hmm… do you see that last data type?  Something is off here…

So when we originally created this query, Power Query helpfully pulled in the data and applied data types to all the existing columns.  The problem here is two-fold:  First, the Apr column didn’t exist at the time.  The second problem is that Power Query’s M language uses hard coded names when it sets the data types.  The end effect is that upon refresh, only the original columns have data types defined, leaving the new columns with a data type of “any” (or undefined if you prefer).

  • We then unpivoted the data, but now we see a difference in the output


Check out that Value column.  Previously this was a decimal number, now it’s an “any” data type.  Why?  Because there were multiple data types across the columns to be unpvioted, so Power Query doesn’t know which was the correct one.  If one was legitimately text and Power Query forced a numeric format on it you’d get errors, so they err on the side of caution here.  The problem is that this has a serious effect on the end load to Power Pivot…

  • Finally, we renamed the last two columns… which works nicely, but it doesn’t change the data type:


Okay, so who cares, right?  There is still a number in the “any” format, so what gives?

What you get here depends on where you load your data.  If you load it to the Excel worksheet, these will all be interpreted as values.  But Power Pivot is a totally different case.  Power Pivot defaults any column defined as “any” to a Text data type, resulting in the problems we’ve already seen.

Fixing the Issue

For as long as we’ve been teaching our Power Query Workshop, we’ve advocated defining data types as the last step you should do in your query, and this is exactly the reason why.  In fact, you don’t even need to define your data types in the mid point of this one, that’s just Power Query trying to be helpful.  To fix this query, here’s what I would recommend doing:

  • Delete the existing Changed Type step
  • Select the final step in the query (Renamed Columns)
  • Set the data type for each column to Text except the Sales column, which should be Decimal Number (or currency if you prefer)


When this is re-loaded to the Data Model, you’ll again be able to get the values showing on the Pivot Table as Sum of Sales.

Avoiding the Issue

Now, if you don’t want Power Query automatically choosing data types for you, there is a setting to toggle this.  The only problem is that it is controlled at a Workbook level, not at a global Excel level.  So if you don’t mind setting it for every new workbook, you can do so under the Power Query settings:


Is Changed Type Designed in the Correct Way?

It’s a tough call to figure out the best way to handle this.  Should the data types be automatically hard coded each time you add a new column?  If the UnPivot command had injected a Changed Type step automatically, we wouldn’t have seen this issue happen.  On the other hand, if a textual value did creep in there, we’d get an error, which would show up as a blank value when loaded to Power Pivot.  Maybe that’s fine in this case, but I can certainly see where that might not be desirable.

Personally, I’d prefer to get a prompt when leaving a query if my final step wasn’t defining data types.  Something along the lines of “We noticed your final step doesn’t declare data types.  Would you like me to do this for you now (recommended)” or something similar.  I do see this as an alternate to the up-front data type declaration, but to be honest, I think it would be a more logical place.

October News and Events

It’s a busy month here at Excelguru. Instead of a technical post we wanted to catch everyone up on our October news and events!

Live Course: Master Your Excel Data October News and Events

Ken is teaching a LIVE, hands on course in Victoria, BC on Friday, October 21 from 9:00am-4:30pm. This session is great for anyone who has to import and clean up data in Excel and will change the way you work with data forever! Ken will teach you how to use Excel Tables, Pivot Tables and Power Query. Space is limited to only 20 attendees, so don't miss out on your chance to sign up. For full details and to register for the session, visit:

October News and Events: Power BI Meet-up

The next Vancouver Power BI User Group meet-up is happening on Thursday, October 13 from 5:30-7:00pm. Scott Stauffer, Microsoft Data Platform MVP, will be presenting on How to Operationalize Power BI. Together we’ll look at some solutions that might help pass your Power BI solution over to IT to manage enterprise-wide. Dinner and soft drinks will be provided. View the full details and sign up to attend at:

Microsoft MVP Award Received

For the 11th straight year, Ken has received the 2016 Most Valuable Professional Award from Microsoft! The previous 10 years, Ken’s award has been in the Excel category, but this year’s award is in the Data Platform category. The new category reflects the work he’s been doing this past year with Power Query and Power BI. Congratulations Ken, your guru status remains assured.mvp_horizontal_fullcolor

Our Team Has Grown

As we mentioned the other day, Rebekah Sax has recently joined the Excelguru team. She brings with her a wealth of experience in marketing, communications, event planning and administration. Please join us in welcoming Rebekah as she helps us make new connections and continue to grow.

Opportunity: Influence Excel Charting Features

Hi everyone, I’ve got a quick message/opportunity for you from the Excel team.  In their words:

This survey is being conducted by the Microsoft Excel team. In the future, new charting features will be added to Excel. We want to better understand how you expect charts, that have new features, to be displayed and behave in older versions of Excel that don't have these new features.

This survey will take approximately 10 minutes. Feedback from this survey will be used to improve the user experience in Excel. Responses to this survey will not be associated with any personal information.  Please see the Microsoft Privacy Statement for further privacy details on all Microsoft products.

Thank you for participating,

Excel Team

Click to Start Survey

Fix: Excel Formulas don’t update in Power Query tables

If you’re new to Power Query, chances are you’re more comfortable doing tricky mathematics using Excel formulas, rather that Power Query formulas.  No shame there, but you’ve probably run into a situation where you set up the formulas, refresh your query and the Excel formulas don’t update in Power Query 's output table.

I’ve worked with this issue for a long time, and it’s actually caused me to avoid using Excel formulas in tables generated via Power Query all together. Having said that, there is now an easy way to fix this which renders that avoidance obsolete.

The Issue:  Excel Formulas don't update in Power Query tables

Let’s take a quick look at this scenario.  We have a simple table called Animals as follows:


And it gets landed in another table.  But in this table, we added a new column called “Est” to the end, which holds the following formula: =[@Price]*[@Quantity]


So far so good, but what happens when we add a new line to our Animals table and refresh it?


Plainly, this is not good at all!

The Fix:  Excel Formulas don't update in Power Query tables

The fix is remarkably simple, once you know what to do:

Step 1: Change the Table Design Properties

  • Select any cell in the OUTPUT table (the green one)
  • Go to Table Tools –> Design –> Properties (External Table Data group)


  • Check the box next to Preserve column soft/filter/layout and click OK


Now, at this point, nothing appears to change.  In fact, even refreshing the table seems to make no difference.

Step 2: Ensure the Formulas are consistent

The reason the formulas didn’t fill correctly for us is different now.  It is entirely based on the fact the formula in the last column is no longer consistent.  Naturally, that means that Excel won’t auto-fill the formula, as it doesn’t know which is correct (the formulas or the blank cell.)  We need to fix that before this will work for us.

  • Copy from the first formula cell down the entire column (I've got reports that this DOES matter, and that copying from another cell may not fix it.)

Our data should now look something like this:


Step 3:  Test it

And now, when we add new data and refresh the Power Query…


Wrap-up Thoughts

On my Excel 2016 this behavior is now default.  I don’t know when it changed, to be honest.  And if your behavior is different, I’d love to know.  I’m running the Office Pro Plus subscription – first release.

On Excel 2010/2013, the old default of not updating the tables appears to prevail.  It’s actually for this reason that I covered this, as it came up as a question in my Power Query forum.

I’m not sure if this is good or bad, but this setting can/must be managed for each output table individually.  There doesn’t seem to be a way to set one behavior or other to apply to all tables.  To be honest, I think they’ve got it right in Excel 2016, so at least it’s fixed if you’re current.  (And for reference, my understanding is that this required a patch to Excel, not Power Query, which is why I suspect that we likely won’t see it fixed for Excel 2010/2013.)

Do you use (legacy) Get External Data features?

Hey everyone,

A contact at Microsoft is looking for people who DO NOT use Power Query extensively, but DO use “legacy” methods to get external data into Excel.  These “legacy” methods include using:

  • Any/all of the commands from Data (Tab) –> Get External Data (group)


  • VBA to retrieve, clean up and/or land data in Excel

In his own (okay, slightly edited) words, my contact is looking for users:

…who are importing data using the classic Get-Data Excel capabilities and have very little (or zero) familiarity with Power Query. [I’d like to] get their feedback on new ribbon sketches.

UPDATE:  My contact at Microsoft now has sufficient users for his study, so no more users will be accepted at this point.  Thanks!

This is a cool opportunity to get in touch with Microsoft and give your impressions and feedback on some potential designs that they are thinking about for future product improvements.  If you are interested and are not a heavy Power Query user, just drop a comment below (please don’t include your email.  I can access that via the blog control panel.)

July 2016 Power Query Update

Hey folks,

I'm actually on vacation, so this post is going to be short.  I just wanted to make sure you all are aware that there is a new Power Query update available.

New features in the July 2016 Power Query update:

  • New SAP HANA connector.
  • New SharePoint Folder connector.
  • New Online Services connectors category.
  • Improved DB2 connector, now leveraging the Microsoft driver for IBM DB2.
  • Improved Text/CSV connector, now exposing editable settings in the preview dialog.
  • Improved relational database connectors, now including Schema information as a part of the Navigation hierarchy.
  • Data Source Settings enhancements, including “Change Source” capability.
  • Advanced Filter Rows dialog mode within the Query Editor.
  • Inline Input controls for Function invocation within the Query Editor.
  • Support for reordering Query Steps within the Query Editor by using drag and drop gestures.
  • Date picker support for input Date values in Filter Rows and Conditional Columns dialogs.
  • New context menu entry to create new queries from the Queries pane within the Query Editor.

My Thoughts (without actually using it yet)

Now you can get full pictures at the official blog from Microsoft, but I'll just call out a couple that I think are pretty darned important from a usability perspective.

  1. Continuing with last month's update where we got Drag and Drop for the query groups, we now get Drag and Drop for the query steps.  That is just plain AWESOME.
  2. The new Advanced Filter dialog looks pretty good.
  3. The Date Pikcer also looks pretty helpful.
  4. A context menu to create new queries is also SUPER helpful.  One thing I'd like to see added here, is the ability to set each new query to load to connection/table/data model from INSIDE the query editor.  (Currently, the choice you make is applied to ALL new queries - the main reason I have my defaults set to load to connection only.)

June 2016 Power Query Update

Yesterday, Microsoft released the June 2016 Power Query update.  Even though there are only four items on the list of new features, some of them are quite impactful.

What’s new in the June 2016 Power Query Update

The four new features are:

  • Conditional Columns
  • Column type indicator in Query Editor preview column headers
  • Reorder Queries and Query Groups inside Query Editor via drag and drop gestures
  • Query Management menu in Query Editor

Microsoft has a blog on this here, but let me hit these quickly in reverse order to give my comments as well:

Query Management Menu in Query Editor

Honestly, to me this is kind of a throw away waste “button for the sake of a button” kind of feature.


Does it make things more discoverable?  Maybe.  But we can get to all these features by right clicking the query in the Queries pane on the left of the editor.  Personally, I would have rather seen them give me a feature to “pin” the Queries pane open and set that as a default, as I find the navigation from that area much more useful:



Reorder Queries via Drag and Drop

This is great… so great in fact, that the only real question is why it hasn’t worked in the past.  Time & resources is the answer, but it’s now working the way you’d expect it to work.


PS, if you don’t know how to group your queries, right click on one, say “Move to Group” and select New Group.  Pretty handy for keeping things organized.

Column Type Indicator

This is BY FAR the most important of the upgrades.  The reason is that this has been a deadly area of weakness since day one.  If you’ve ever been burned by an “any” data type, you know why.  And if you haven’t… hopefully this will help ensure you don’t.

We can now plainly see which columns have been defined with each data type:


Notice how easy it is to tell that the “Client, Task and Notes” fields are text (as shown by the ABC icon in the column header.)  Hours is a decimal number, rate is a whole number, and Date… is undefined.  That one needs attention as indicated by the question mark.  Very visual, and very badly needed for  a long time.  This one feature is, in my opinion, worth the upgrade.

Conditional Columns

This is also a pretty cool feature, as it lets a non-coder build an if then else (if) statement.  Full caveat here: this is the image from the official Microsoft blog, not one of mine, but it shows you the general idea:

June-2016-updates-for-Get-Transform-in-Excel-2016-1.png (1282×809)

As cool as this is, there are some issues here:

  1. You can only feed out full columns as outputs, not formulas/equations.  So if I wanted to check a column and return [Hours]*[Rate] in one case and [Hours]*1.5*[Rate] in others, it won’t work.  (Instead I’ll get text.)  To do that you’ll still need to write your formulas manually.
  2. You can’t provide IFERROR style logic to check if something errors and react accordingly.  To do that you’ll still need to create your own custom column formula using the “try otherwise” formula.
  3. Assume you created a custom column using the “Add Custom Column” button, and manually wrote your “if then else” formula.  You then committed it and want to change the logic, so you click the gear icon in the applied steps window… and you’ll be taken to the Conditional Column interface shown above, not the original window where you can create more complex logic.  So if you want to modify that formula to be more complex than this new interface allows, you’re now going to have to go to the Advanced Editor window.  I have suggested to Microsoft that they need a button to return is to the previous interface for this scenario.

Despite the shortcomings, we should recognize that this is a great new feature.  You can test if one column compares (match, doesn’t match, greater than, etc) another column or specific value without having to manually write any M code formulas.  You also aren’t obligated to feed out a column’s value, but rather can feed out text or values too.  So as long as your logic needs are fairly simple, you can use this feature.

Download the June 2016 Power Query update

You can pick it up from Microsoft’s site here:

Also, I’ve started holding on to the previously released installers should you ever need to regress to a prior version.  You can find the installers I have in my forum here:

Display Last Refreshed Date in Power BI

One of my favourite tricks to use on a dashboard is to show the Last Refreshed Date.  This is important as it lets you know how stale your data is.  I think it’s even more important with Power BI, as it tells you if your automatic refresh is actually working!

Last Refreshed Date for Power Pivot

I’ve actually covered this topic in the past for Power Pivot solutions. You can find that article here if you’re interested.

Display the Last Refreshed Date in Power BI

Generate Last Refreshed Date with Power Query

Unlike the previous article, to work with Power BI, we need to generate the Last Refresh date ourselves using Power Query.  No big deal, it is as simple as this:

  • Open PowerBI Desktop
  • Get Data –-> Blank Query
  • Go to Home –> Advanced Editor and replace the code in the window with this:

Source = #table(type table[LastRefresh=datetime], {{DateTime.LocalNow()}})

  • Click Done and rename the query “LastRefresh_Local”
  • Click Close & Apply

This code makes a nice little table that returns the current date and time each time we refresh the solution.

Create the Last Refreshed Measure

To finish it off, we need to create a simple measure and add it to our dashboard.

  • Go to your Report window –> New Measure –> Define it as follows:

Last Refreshed (Local) = FORMAT(LASTDATE(LastRefresh_Local[LastRefresh]),"mmm dd, yyyy hh:mm:ss AM/PM")

  • Now let’s create a Card visual to hold it:


So far everything looks really good here.  The card is formatted nicely, and I updates to the current date and time every time I hit the refresh button.  It’s a thing of beauty!  What could possibly go wrong with this?

Publishing to the Power BI Service

Encouraged with our visual, we’ll now publish it to Power BI:

  • File –> Publish –> Publish to Power BI
  • Login to the Power BI service and scroll down to the Datasets area

At this point you should click the little … to the right of the dataset you uploaded (whatever you called it) and click Refresh.  It will prepare for a bit, then should refresh.  If your experience is anything like mine, your report shows this:


At first, you think “Hey cool!” until you realize that this is the data – to the second – of your original upload.  So even after you refresh the data set, you need to refresh the report:


And when you do, you get this:


Um… WHAT?  That’s 7 hours different!  How is this possible?

So I’m in the Pacific time zone, which is currently UTC-7.  (We are UTC-8 when we are not on daylight savings time.)  So apparently the date that is showing here is giving me the UTC date.

This really sucks, obviously. I was really hoping that the dates and times would show based on my browser settings or something, but obviously not.

I tried a bunch of things to fix this, including forcing the column to a DateTimeZone data type, but that made the DAX formula choke, as it only accepts DateTime data types.  I tried various combinations using DateTimeZone.UtcNow(), DateTimeZone.LocalNow(), adding periods, subtracting them and more.  The killer though was that as soon as the time zone got removed (which is necessary for the DAX measure to work), Power BI displayed the time assuming it was UTC. So the issue I’m seeing here is that – no matter the DateTime you feed Power BI – it thinks it has the same time zone as the server.  Not so good.

How to Display the (Correct) Last Refreshed Date with Power BI

Just to be clear here, the title above is actually a heinous misrepresentation.  The timestamp was correct, it’s just being interpreted incorrectly by Power BI (but not Power BI desktop).  So I had to fake it out.

I reached out to Chris Webb for his thoughts, and he suggested hitting an API.  I’d already tried looking for a page, but picked up on this API which turned out to be quite useful:  Even more interesting is that both and return the same thing.  I’m hoping that this means it is going to allow me to survive daylight savings time with no adverse impacts…

So armed with that, I decided that the only way I was going to get this to work was to pull back the UTC offset, and add it to my date.  That effectively returns a bogus date that Power BI will represent in UTC to match my time zone.  Complicated, no?

Collecting the Current PST Date/Time

I don’t think I’m going to clearly document all the steps for this, as that would make this article really long.  Instead, here’s how you can implement this:

  • Create a new Blank query and paste this code in the Advanced Editor

Source = Web.Page(Web.Contents("")),
Data0 = Source{0}[Data],
Children = Data0{0}[Children],
Children1 = Children{1}[Children],
#"Removed Other Columns" = Table.SelectColumns(Children1,{"Text"}),
#"Inserted Last Characters" = Table.AddColumn(#"Removed Other Columns", "Last Characters", each Text.End(Text.From([Text], "en-US"), 6), type text),
#"Changed Type2" = Table.TransformColumnTypes(#"Inserted Last Characters",{{"Text", type datetimezone}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type2",":00",".0",Replacer.ReplaceText,{"Last Characters"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",":30",".5",Replacer.ReplaceText,{"Last Characters"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Replaced Value1",{{"Last Characters", type number}, {"Text", type datetime}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "LastRefresh", each [Text]+#duration(0,Number.From([Last Characters]),0,0)),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom",{"LastRefresh"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns1",{{"LastRefresh", type datetime}})
#"Changed Type"

  • Click Done and rename the query as “LastRefresh_API”
  • Click Close & Apply

The basic gist is that it does the following:

  • Pulls the web page and drills into the single cell we need
  • Splits off the time zone information, replacing the minutes with fractional hours and turns it into a decimal number
  • Extracts the original date/time (without the time zone) and adds the time zone offset to that value.  This essentially subtracts 7 hours from the original date for me right now
  • Performs as bit of cleanup to drill into the faked date/time we need

(If you want to use this for your time zone, I believe all you should need to do is change the “pst” portion of the URL to your time zone name.)

Create a new Last Refreshed Measure

Creating this measure is pretty much the same as the last one.

  • Go to your Report window –> New Measure –> Define it as follows:

Last Refreshed (API) = FORMAT(LASTDATE(LastRefresh_API[LastRefresh]),"mmm dd, yyyy hh:mm:ss AM/PM")

  • And create a Card visual to hold it

You should now have this odd looking contrast:


Great, except that… it is 3:22 PM for me, not 8:22 AM.  But fine, whatever, let’s publish this and see what happens.

Publishing to the Power BI Service

To publish the file, we take the following steps again:

  • File –> Publish –> Publish to Power BI
  • Confirm that we want to overwrite the data set
  • Login to the Power BI service and scroll down to the Datasets area
  • Click Refresh Now

At this point you might be told you can’t as you haven’t provided credentials to the data set.  Just Sign In as Anonymous and you’ll be fine:


And of course, now we can refresh the Report as well.  And when we do…


As you can see, the API version is showing the correct time here, not the Local version.


The biggest takeaway here is that you have to care about how your audience is using your files.  If you are going to deploy your reports by emailing *.pbix files around the company, the setting up a table using the DateTime.LocalNow() will work great.  But if you are going to publish your files to the web… that obviously throws in some wrenches.  Here’s a side by side view of those two alternatives:


Probably the most frustrating thing to me is that I was trying to find a solution that will show it correctly in BOTH Power BI Desktop and Power BI Online.  There’s nothing worse than working with a report you know is broken depending on where you look at it.  🙁


If you want the file so that you can try it out in your own PowerBI service… here you go.  🙂

New Vancouver Power BI User Group

I’ve been thinking about this for a while and, after discussing it with a couple of others who are passionate about Power BI…  I’m pleased to announce that we have created a new Power BI User Group in Vancouver, BC!

What is the Power BI User Group about?

The goals of this user group are fairly simple:

We plan to meet monthly, and have a presentation on using Power BI technologies.  (This could be Power BI Desktop, Excel, Power Query or Power Pivot.)  Whatever it the presentation, and no matter how focussed it is on a specific area, it will ultimately be relevant to the over-arching Power BI path of taking your data from raw form to a published dashboard.  This user group is basically dedicated to bringing you content to inspire you and make you an expert in the Power BI technologies in your company.

Our secondary goal is to be a networking group for Power BI professionals.  If you’ve ever felt like the only one in your company that actually understands what you do… well that’s why we are here.  To give you someone to swap stories with, get ideas and maybe even change your career goals.  🙂

Oh… and did I mention that another goal we have is to keep these events free for attendees?

How can you get involved?

There’s actually a few ways you can get involved with us…

If you’re looking to attend…

Then it’s simple.  Sign up at our Meetup site.  Then attend a meeting. That’s it.  No cost, no fuss.  All we ask is that you register in advance and attend if you say you’re coming.  (We have limited space in our venue right now, only able to seat about 25 people.)

If you’d like to sponsor the event…

We are looking for a sponsor to cover the cost of pizza and soft drinks for our user group attendees.  It shouldn’t be much, and we’d be happy to tell everyone how awesome your company is.  If you’d like to come on board as a sponsor, please get in touch with me via my contact form.

If you’d like to speak…

Got something cool that you’ve built using the Power BI technology stack?  Would you like to talk about how to actually get Power BI traction in a corporate environment?  Got some other relevant topic that you’re passionate about?  Come to an event and chat with us.  One of our key goals is to make sure we have good variety in our speakers!

When is the first meeting?

Great question!  We’re going to be meeting Thursday, July 14 at 5:30pm in downtown Vancouver.  I’ll be presenting on how to build this self updating Power BI dashboard which is originally sourced from PDF files.

Keep The Most Recent Entry

This week’s post was inspired by a question in my Power Query help forum.  The poster has a set up data, and needs to keep the most recent entry for each person from a list of data.


I never saw the user’s real data, but instead mocked up this sample, which you can download here:


Obviously it’s fairly simple, and the key thing to recognize here is what we’re after.  What the user needed was this:


As you can see, we want to keep the most recent entry only for each person.  In the case of Fred and Mark that is Mar 31, but Jim didn’t have any activity for March, with his last entry being Feb 29.  So how do we do it?

1st attempt to keep the most recent entry

I figured this was pretty easy, so advised the poster to do the following:

  1. Pull the data into Power Query
  2. Sort the Date column in Descending order
  3. Use the Remove Duplicates command on the Student column
  4. Give the query a name (I called mine “Unbuffered” for reasons that will become clear)
  5. Load it to the worksheet

Easy enough, right?  Except that we actually got this:





Huh?  What the heck is going on?  I tried changing the dates to text in an attempt to steal away Power Query’s ability to sort based on dates.  (Okay, it was a shot in the dark, and it didn’t work.)

As it turns out, the “Table.Distinct” command that is used to remove duplicates IGNORES previous sorts, going back to the original data sort order.  I’ll admit that this completely shocks me, and is not at all what I’d expect.

So how do you keep the most recent entry?

There’s a few potential ways to deal with this:

  • Sort the data before it comes into your query.  This could potentially be done in a staging query, via a SQL sort command or some other method.  The challenge is that this isn’t always practical (using that custom SQL query breaks query folding, right?)
  • Issue some kind of command (like a group by) that creates a new table which is already sorted in the correct order.  Again, this would work, but really seems unnecessary unless you have some other need to do so.
  • Sort the table, then buffer it before removing duplicates.

Huh?  “Buffer” it?

Using Table.Buffer() to help keep the most recent entry

I’m not a master of explaining Table.Buffer() (yet), but basically you can look at it like this:  It pulls a copy of the table into memory, preventing Power Query from recalculating it.  This can be super useful if you’re passing tables to functions, but in this case can help us lock down the previous query steps before applying the duplicate removal.  When the query state is buffered, that is the “most recent” copy that Power Query will revert to.

Rather than adjust the previous query, here’s what I did in order to create the working solution:

  • Duplicated the “Unbuffered” query
  • Renamed the new query, calling it “Buffered”
  • Selected the “Sorted Rows” step we generated (just before the “Removed Duplicates” step
  • Clicked the fx icon in the formula bar


As I’ve mentioned a few times on this blog, this creates a new step that simply refers to the previous query step.  I then just wrapped the text for the new Custom1 step in the formula bar with Table.Buffer():



And when you hit Close & Load, you get a different result that our previous query… you get the result we actually wanted:


So what’s happening here?

First, just to be clear (before Bill or Imke call me out on this), inserting the new step wasn’t entirely necessary.  I only did this to demonstrate the key difference in a distinct step of it’s own.  I could have easily just wrapped the Sorted Rows step in Table.Buffer() and it would have worked fine.  🙂

The key difference here is that the Table.Distinct() command we use the Removed Duplicates step will go now only go back so far as the buffered table.  From the Excel user’s perspective, it’s kind of like we’ve been able to copy all the steps before this, and lock them in with a PasteSpecial command, and point Power Query to that version of the data instead of looking back at the original.

Cool!  I’m going to use Table.Buffer everywhere!

Um… don’t.  That’s actually a really bad idea.

Table.Buffer() needs to be used with a bit more care than that.  Every time you buffer a table, it needs to be processed and written into memory.  That takes resources.  You only want to use this command when it makes sense.  Some places where it does:

  • When you need to lock down previous steps to prevent things being ignored, like in the case above
  • When you want to pass a table to a function.  If you don’t buffer it first, the table may get re-calculated/refreshed before being passed into the function.  If you’re doing this for every row, that can be a lot of re-calculations.  In this case it may make sense to Buffer the table, then send the buffered table into the function.  Even though the buffering takes overhead, it only happens once, which can speed things up

Just also remember that the instant you buffer your table, you break any query folding ability, as the data is now in Excel’s memory space.  Something that is worth consideration if you’re doing large operations against a database.