Merge Files With Different Column Headers

A client contacted me today asking how to merge files with different column headers in Power Query.  The issue she's facing is that some of the files in her folder have a column called "customer", where others have a column called "ship to/customer".  Plainly there has been a specification change somewhere down the line, but it's causing issues in the combination - an issue that would affect either Excel or Power BI.

What happens when we try to merge files with different column headers?

In order to replicate this issue, I created two very simple CSV files as shown here:

image

I dropped these into a folder called "Test" and then

  • Created a new query From File --> From Folder
  • Renamed the query to FilesList (making a query that I can use to easily sort/filter the list of files later)
  • Right clicked the query in the Queries pane --> Reference
  • Renamed this query to Transactions
  • Clicked the Combine Binaries button

At this point I was presented with the following window:

image

The only thing I really want to point out here is that I choose the Example file which has the column name that I do NOT want.  (I want to rename "ship to/customer", so it's important that it show up here.)

I then clicked OK, and was presented with this:

image

Err.. wait… what happened to my customer column?

Why is the Customer column missing?

To understand this, we need to look at the steps in the Transaction query:

image

If you were to click on the "Invoked Custom Function1" step, you'd see that it adds a new column to the Transaction query.  The first table shows 3 columns where the first column is "ship to/customer".  The second table also shows 3 columns, but in this the first is "customer".  So all is working so far.

But then, when you get to the the "Expanded Table Column1" step of the Transactions query, it expands to show only the "ship to/customer" column.  Why?  It's because of the following M code generated by Power Query:

= Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Transactions", Table.ColumnNames(#"Transform File from Transactions"(#"Sample File")))

What this means in English is that it reads the columns from the table in the first sample file. That's not super helpful.

Now we could work on trying to enumerate all headers, but that would be a pain, as the code is complicated and still leaves us in a place where we would need to combine both columns anyway.  Let's fix this by dealing with it at the source.

How to merge files with different column headers properly

Step 1: Prepare the Transactions query:

Delete the Changed Type step at the end of Transactions query.  This is because it is setting the "ship to/customer" column to text, and by the time we're done, that column will be called "customer".  If we leave the step as is, it will cause an error.

Step 2: Modify the Transform Sample query:

Next we need to select the Transform Sample query:

image

Now, what we want to do is rename that "ship to/customer" column to make it "customer".  So let's do that:

  • Right click "ship to/customer" --> Rename --> "customer"

The problem here though, is that when we apply this to our other files, THIS will cause an error. Why?  They don't have a "ship to/customer" column to rename.  So we need to wrap this in an error handler.

To do this, we need to adjust the formula that was just created to wrap it in a "try/otherwise" clause.  This is essentially equivalent to Excel's IFERROR() formula.  If it works, it will return the result.  If not, it returns an alternate item, which we will set to be the previous step in the query.  In other words "Try to rename this column.  But if it fails, give me the original table.

The keys here are to

  • Insert the try and otherwise in the correct location (remember they are case sensitive)
  • Get the right syntax for the previous step name (remember to wrap it in #" " if the step name has a space in it.)

In this case, it should look like this:

image

Step 3: Revel in your success:

You got it.  At this point, returning to your Transactions query should leave you pretty pleased, as we've plainly been able to successfully merge files with different column headers into the table that we actually want:

image

The only thing left to do is set the data types, and we're done. Smile

Reduce Development Lag

How we reduce development lag when building queries in Power Query is a question that came up in my blog post at PowerPivotPro last week, even though that wasn't the main issue.  I thought it might be a good idea to throw out a development methodology that can help with this if you're struggling in this area.

Also, if you haven't read "Why Excel's Power Query Refresh Speeds Suck" on Rob's site, please do, as it really highlights a key issue that can affect Power Query refresh times, and we really need you to vote it up.

Why do we need to reduce development lag?

This isn't a problem that affects small data sets, but the bigger the data set is, the more you feel it.  The issue comes up because - during query development - Power Query doesn't load all of the data into the local system.  (If it did, you'd REALLY scream about crappy performance!)  What it does instead is pull in a preview of the data.

The number of rows varies based on the number of columns and data types, but for now, let's pretend that the Power Query preview:

  • pulls exactly 1000 rows
  • your data set is 70 million rows

I do want to be clear here though… this issue isn't restricted to data sets of this size.  If you're consolidating 30 CSV files with 50k rows each (a total of 1.5 million rows), you're certainly going to feel this pain.  Why?

To illustrate this, let's go through a workflow:

  • You connect to the data set
  • Power Query pulls the first 1000 rows and load them to your preview
  • You filter out all records for dept 105, removing half the rows in your data set

At this point, Power Query says "Hey, you eliminated half the rows.  I'll go pull in some more for you, so that you can keep operating on a full preview window."  It goes back to the data source, and essentially streams in more data, tossing the records you don't need, in order to fill up the preview of 1000 rows.

Now you drop 20 columns, allowing the data set to expand to 1200 rows in the preview.  Naturally, it needs to go back to the data set, pull values again, run it through all the steps to date in order to land you 1200 rows.

Now to be fair, I'm not certain if something like a "Change Type" or "Replace Values" step causes a refresh, but it very well may.  On large data sets it certainly feels like it does.

If you're not nodding your head at this point, I can pretty much assure you that the data sets you've been working with are tiny.  The bigger those sets get, the more this hurts.

And before anyone says "why not just add a Table.Buffer() command in there?"… in my experience this doesn't help.  From what I've seen, if you are in dev mode, this causes the buffering to get re-executed, which just adds time to the process.

A Strategy to Reduce Development Lag

So how do we reduce development lag and avoid the time wasted when developing our query chain?  Here's a strategy that may work for you.  It's a 5 step process and requires Excel, although you could certainly develop in Excel then copy your queries to Power BI Desktop afterwards. The reason?  Power BI has no grid to land your data to…

Step 1: Connect to the Data Source

The first part is easy.  You simply connect to your data source, whether that be a file, database, folder, or whatever.  The key thing here is that your output needs to be "flat".  (No nested lists, tables, binaries or values in any columns.)  Each column that contains nested lists/tables or values needs to be expanded before you can move to step 2.

In the example below, you can see that I pulled data from a database which has related fields:

SNAGHTML22cc8944

As the related "Value" is a complex data type, it needs to either be removed or expanded.  In this case, I wanted some data from within the tblItems, so I performed the following steps:

  • Expanded the tblItems column to extract the tblCategories column
  • Expanded the tblCategories column to extract the POSCategoryDescription field
  • Removed the tblChitHeaders column

And the result looks like this:

image

And now, let's pretend that every step I add is causing the preview to refresh and taking ages to do so.  (Ages is, of course, a relative term, but let's just assume that it has exceeded my tolerance for wait times.)  So what can I do?

What I do now is:

  • Stop
  • Give the query a name like "Raw-<tablename>" (where <tablename> is your table.  Mine is Raw-ChitDetails)
  • Choose Close & Load To…
  • Load to "Connection Only"

Once done, I can build myself a temporary stage to reduce development lag during the construction stage.

Step 2: Reference the Data Source query and land to an Excel Table

To create our temporary stage, we go to the Queries pane, right click the Raw-ChitDetails query, and choose Reference.  (And if you've read the post on Why Excel's Power Query Refresh Speeds Suck, you're cringing.  Go vote if you haven't already).

When launched into a new query, you're going to do two things and two things only:

  • Rename the Query to something like "Temp-<tablename>" (again, where <tablename> is your table.  I'm using Temp-ChitDetails.
  • Go to Home --> Keep Rows --> Keep Top Rows --> 10,000

And now, go to Home --> Close & Load and choose to load to an Excel table.  This is your temporary data stage.

The key here… never do more than just the Keep Top Rows part.  You don't want to do any additional filtering or manipulations in this query.  You'll see why a bit later.

Step 3: Load the Excel Table to Power Query and develop your solution

Next, you'll need to click a cell in the new table, and choose to load from Table or Range.  This will create you a new query that you can start to manipulate and do the cleanup you want.

The beauty here is that the table you've loaded from Excel needs to be manually refreshed, so we have essentially frozen our data preview.  The preview will still get refreshed (we can't avoid that), but since the data has been loaded to a worksheet in Excel, Power Query will treat that table as the data source.

The reason we needed 10,000 rows is to have enough data to identify the data patterns we are working with.  If that's not enough because you filter out larger quantities, then up the 10,000 to something larger or filter in the Raw query (and suffer the lag time there) before loading to the table.

So this is the query where you do all of your development work.  With a smaller "frozen" data set, this should allow you to work while suffering much less refresh lag time.

As a sample here, I ran a Group By on the smaller data set.  Is that a good query to illustrate this? Probably not, I just wanted to show that I did something here.

image

So this query only has two steps, but hopefully you get the idea that this could be a few dozen (or more) steps long, each taking MUCH less time to update than working against the whole data set.

Quick note… I deleted the Changed Type step that is automatically added when pulling data from an Excel table.  I did this as the Changed Type step often breaks Query Folding when you pull from a database.

With my development work done, I've named my query with the name I want to use for the final load destination.  In this case it's "ChitDetails", and set it to Close & Load To… --> Connection Only and Data Model to load the table for Power Pivot.

Of course, this query is only working against my frozen 10,000 row data set, which isn't going to cut it long term, so I need to fix that.

Step 4: Re-point the working query against the Data Source query

With the query loaded, I'll go back and edit it.  The key is that I need to select the Source Step.  In the formula bar we'll see the code used to call from the Temp data table:

image

This formula needs to be updated to pull from the Raw-ChitDetails table, so I'm going to do exactly that:

image

Notice that the formula is actually:

=#"Raw-ChitDetails"

This is to deal with the fact that I used a hyphen in the name, so it needs to be escaped with the hash-quotes in order to read correctly. If you forget to do that and just type =Raw-ChitDetails, you'll be told you've got a cyclical expression.

At this point, you've done your development work more quickly, and have now pointed back to the original data source.  (This is exactly why we only restrict the number of rows in Step 2.  If we did more, we'd need to retrofit this query.)

When you click Close & Load now, it will load to the data source you chose earlier.  (Power Pivot for me.)

Step 5: Delete the Temp Queries and Excel Table

Notice that we actually have two Temp queries now.  The first was used to reference the Raw query and land the data in the Excel table.  The second was pulling from the Excel table into Power Query.  Both can now be deleted in additional to deleting the worksheet that held the temporary table as their jobs are done.

What if I need to do this again?

Easy enough.  You can simply:

  1. Create a new query by referencing the Raw table.  (Queries Pane --> Right Click --> Reference)
  2. Name the query Temp-<tablename> and Close & Load to an Excel table
  3. Select a cell in the table and create a new query From Table or Range
  4. Rename this new query as "TempStage" (or something) and choose to Close & Load To… --> Connection Only
  5. Edit your "final" query's Source step to =TempStage.  (You only need the #"" if you include a character that Power Query uses for something else.

And at that point you're back in development mode.

What if my Query is already several steps long?  Do I have to start over?

Absolutely not!  If you've been working away building a very long query, you can split them apart into the staging setup I used above.  Open your query and walk through the steps to find where you wish you'd broken them apart.  Right click that step and choose "Extract Previous"

SNAGHTML22ffbe84

You'll be prompted for a new query name, so enter "Raw-<tablename>".  What this will do is extract all the previous steps into a new query for you, and set your current query's Source step to point to that query.  Then you can follow the steps outlined in the previous section to get back into development mode.

What about Power BI Desktop?

The problem with Power BI desktop is that it has no grid to land the data to, which means we can't get that "frozen preview" setup.  But we can adapt the solution to use Excel if you've got Power Query in Excel. You do this by:

  • Creating the same "Raw Data" stage in Power BI Desktop
  • Reference the Raw Data query to get into your "Final" query

Then create your data stage:

  • In Excel, replicate the Raw Data staging query EXACTLY as it is in Power BI desktop
  • Filter the query to 10,000 rows
  • Close & Load to an Excel table
  • Save & Close the file

Back in Power BI Desktop

  • Create a new query to pull from the data in Excel
  • Repoint your "Final" query to reference the Excel query
  • Do your development
  • Repoint to the original query when done

It's a bit clunky, but if you're really suffering performance, you may find it worth it.

Should Previews Always Refresh?

One thing I've thought about setting up a UserVoice item for is a setting in the Excel/Power BI UI that allows me to turn off preview refreshes until I click the button inside the PQ editor.  My initial thought is that this would help, as it would essentially cache the preview, and we could operate on it without tripping a recalculation to slow things down.

Naturally, when we first create the query, it would need to create a preview.  But after that, I would hope that it just worked with the data loaded in the preview until I hit the Refresh Preview button.  That would be the key to re-execute all query steps to date - in full - to generate a revised preview.

Having said that, I have a feeling that this would be pretty complicated to implement.  Some steps (like combine binaries and grouping) would require the preview to be updated or you'd get nothing.  But others, like filtering rows out, really shouldn't.  It could require a significant change to the underlying architecture as well.

Regardless, something to "freeze" the preview would certainly be welcome as the development experience with larger data sets can certainly be painful.

Adding null to values returns null

Today a user brought up something in the forums I've seen before; adding null to values returns null.  Simply put, they wanted to add the values in two columns together, but didn't get the results they expected.

The issue: Adding null to values returns null

Have a look at the following data:

image

Never mind that it's pivoted, pretend that you want to use Power Query or Power BI Desktop to sum across the columns and put the total in each row. The problem, of course, is that we can't sum across columns, as Power Query doesn't do that.  So our first temptation is to reach to a custom column and use the formula:

=[ProductA] + [ProductB] + [ProductC]

But notice how on the second and third rows we get null instead of the totals we'd expect:

image

The issue is obviously the null, which totally screws up the math.  (This still surprises me, as Excel ignores null values, which is what I believe should happen here.)

My original suggestion (was poor)

My original thought (and recommendation) was to select the columns and go to Transform --> Replace Values and replace null with 0.  It seems to work in this case.

Having said that, there's a potential issue here.  Null and zero are not the same things at all.  A null is a effectively an empty data point, where 0 is a value. In a Power Pivot or Power BI model that could lead to reporting zero dollars of sales for a day versus reporting an empty set.  Are those different?  To me they are… empty means we weren't even open, where 0 means that we were and sold nothing.  Maybe it's subtle, but it's a real difference.

Another concern comes up when we are dealing with columns that have real values in them.  Simply put:

  • Average(null,null,null,1,2,3,4,5) = 3
  • Average(0,0,0,1,2,3,4,5) = 1.875

So while my answer worked mathematically for a SUM, it was in truth not a good one.  And, in fact, the poster replied as said that he couldn't replace nulls with zero, as this compromised his data.

The evolution of solution

Bill Sysyz dropped in and posted a bit of M code that would sum all records on the row, which look like this:

= Table.AddColumn(YourPreviousStep, "Sum", each List.Sum(Record.ToList(_)))

I think this is really cool, but it also had an issue in that it summed ALL records in the row.  But there are also columns that include text too.  Now we can adjust this to only pick up certain columns, but it involves writing some code that provides each column name in the List.Sum() function.

That kicked off a discussion between Bill and I about complexity, which ended up with my stumbling into a solution that worked.  And it's simple…

Adding null to values and returning values

So here's all we need to do:

  • Select the columns you want to sum (hold down CTRL to select non-contiguous columns)
  • Go to Add Column --> Standard --> Add

I did exactly that for the ProductA, ProductB and ProductC columns from the data above, and here's the results (compared with the custom column method):

image

And just for reference, here is the formula that was generated:

= Table.AddColumn(#"Added Custom", "Sum", each List.Sum({[ProductA], [ProductB], [ProductC]}), Int64.Type)

So pretty much what Bill provided with specific hard coded columns, and 100% user interface driven, just the way it should be!

Final thoughts

The amusing part to me about this is that I have no idea how long this has worked.  I stumbled on this solution during as I was typing "you should be able to just do this… " and it worked!  I've been working with this tool so long that I sometimes miss that some of the old gaps got filled in.  At any rate, it works, it's awesome, and hopefully it helps someone.  Smile

Protect Power Queries

How you protect power queries is a question that will come up after you've built a solution that relies heavily on Power Query, especially if you're going to release it to other users.

(This is a quick post, as I'm in Australia at the Unlock Excel conference, but still wanted to get a post out this week.)

Can you Protect Power Queries?

The answer to this is yes, you can.  It’s actually very easy, and prevents your users from not only modifying your queries, but adding new queries to the workbook as well. Essentially, it shuts the door on any additions or modifications to query logic, while still allowing queries to be refreshed… at least, it should.

So how do we Protect Power Queries?

To protect Power Queries we simply need to take advantage of the Protect Workbook Structure settings:

  • In Excel (not Power Query), go to the Review tab
  • Choose Protect Workbook
  • Ensure that Structure is checked
  • Provide a password (optional)
  • Confirm the password (if provided)

Once you’ve done this, the Power Query toolsets will be greyed out, and there is no way for the user to get into the editor.

image

Does refresh work when you Protect Power Queries?

This part kills me.  Seriously.

The answer to this question depends on whether or not you use Power Pivot.  If you don't, then yes, you're good to go.  As long as all your tables land on worksheets or as connections, then a refresh will work even when you protect Power Queries via the Protect Workbook method.

If, however, you have a single Power Query that lands in the data model, your stuffed.  If Power Pivot is involved, then the refresh seems to silently fail when you protect Power Queries using this method (and I don't know of another short of employing VBA, which is a non-starter for a lot of people).

It's my feeling that this is a bug, and I've sent it off to Microsoft, hoping that they agree and will fix it.  We need a method to protect both Power Query and Power Pivot solutions, and this would do it, as long as the refresh will consistently work.

Caveats about locking your workbook structure:

Some caveats that are pretty standard with protection:

  • Losing your password can be detrimental to your solution long-term. Make sure you have some kind of independent system to log your passwords so this doesn’t happen to you. And if your team is doing this, make sure you audit them so you don’t get locked out when as staff member leaves for any reason.
  • Be aware that locking the workbook structure also locks the ability for users to get into Power Pivot.
  • Workbook security is hackable with brute force macro code available on the internet for free. (Please don’t bother emailing me asking for copies or links to this code. I don’t help in disseminating code which can be used to hack security.) While protecting the workbook structure will stop the majority of users from accessing your queries, it should not be mistaken for perfect security.

Create Column From Examples

I knew that the create Column From Examples feature had been released to Power BI Desktop, but it just showed up in my Excel 2016 build today.  And man is it cool!

A scenario for Create Column From Examples

A user on my forum asked how to build a function that would extract the "Show Name" from the following format:

DV1511H, Episode Name ( Show Name, SeriesNumber)

As the user pointed out, this can be done via the following Excel formula:

=LEFT((MID(A1,SEARCH(" ( ",A1)+3,200)), (SEARCH(", S",(MID(A1,SEARCH(" ( ",A1)+3,200)))-1))

But how do you do something similar in Power Query?

My initial thought

My first expectation was to use one of the techniques from M is for Data Monkey, using the equivalents of Excel's SEARCH and MID functions, basically emulating the Excel logic.  And while that would totally work, I got distracted by something when I opened up my Excel today:

image

The new create Column From Examples button was there on my ribbon.  So naturally, I had to see what it would come up with.

How to create Column From Examples

Naturally, it starts with clicking the created Column From Examples button, which gives you two options:

  1. From All Columns
  2. From Selection

In this case, they would do the same thing, but I'm going to choose "Selection" anyway, as I only need to look at one column.  When I do, I get a new message across the top, and a new column.  I put in the pattern I wanted to get:

image

And once I hit Enter, it actually shows me the pattern it used:

image

As I'm happy with it, I click OK.  The formula that it provided, (which I passed on to the questioner,) is:

Text.BetweenDelimiters([Column1], " ", ",", 3, 0)

Potential Improvements for create Column From Examples

Let me first say that I think this is fantastic.  I would not have come up with this function on my own, as I didn't even know that this function existed.

Some things that I wish we could change though:

  1. The column is created with a generic name.  I really wish we could have changed this during the creation phase instead of ending up with the generic "Part of Column 1" text.  In order to fix this, we either need to edit the M code formula or do another rename step, both of which could be avoided if we could simply rename the column during the creation phase.
  2. There is no gear icon in the applied steps window to take us back into the interface.  I'm sure that would be really hard to implement, but if you mess it up today, well… delete it and try again.
  3. There is no way to copy the function during the creation phase, and with the gear icon not available, the only way to copy/change the formula is via the formula bar.  Not a big deal if you know your M code, but for a novice/intermediate user picking out the correct parts with all the commas, quotes and parenthesis here could be a bit tricky.

image

Overall, despite what I would change here, this is a fantastic new function that is going to make life a lot easier for people.  Very cool!

‘DIY BI’ e-Book Launches Tomorrow!

Last week I announced that we are working on a series of free 'DIY BI' e-Books.  We've been hard at work on polishing it up, and I'm pleased to announce that the first DIY BI e-Book launches tomorrow!  It will be emailed at 9:00 AM Pacific Time to everyone on our newsletter list.

Sign up to get the free 'DIY BI' e-Book series

If you haven't already, sign for our mailing list to receive your copy!  You can do so at the bottom of this post.

Creating the 'DIY BI' e-Book

I'm really thankful that I have a team of people behind me for this.  For me, technical writing is actually the easy part.  It certainly takes time, don't get me wrong, but the magic of copy editing, proof reading and graphic design is a whole other story.

Deanna has done a great job of proofing the book, and making me re-write any paragraphs that sounded good in my head, but maybe didn't translate so well beyond that.  And Rebekah has done a phenomenal job on the graphic design and layout.

Each book will be themed as shown below:

image

Blue for Excel, based on the Excelguru website colour scheme.  Dark green for Power Query (like the powerquery.training site), light green for Power Pivot (like the Power Pivot logo) and yellow for Power BI like it's colour scheme.

The 'DIY BI' e-Book Cover

We wanted to create a cool cover, but most of the stock images for sale out there have a Mac in the picture.  Since 3/4 of these technologies won't work on the Mac, that plainly wasn't something we wanted to put out there.  So that led to us staging our own photo shoot to generate our cover - which I'll admit is a lot harder than I thought it would be.  Here's the finished cover for the first 'DIY BI' e-Book.

image

Our next e-Book will use the same cover image, but will be themed in the dark green of the Power Query series.

And yes, before you all ask, that IS a Pie Chart in the bottom left. And no, I don't love pie charts.  But sometimes you have to have one, because your boss asks for it.  (Just don't expect to find one INSIDE any of the e-Books!)  Smile

Reserve Your Free 'DIY BI' e-Book Now

If you're already receiving out newsletter, there is nothing else you need to do.  It will show up in your inbox shortly after 9:00 AM Pacific time on Apr 7, 2017.  If you're not on our newsletter list yet though, just sign up. It's that easy!

Subscribe to our mailing list

* indicates required





Free ‘DIY BI’ e-Books

Today I wanted to just make a quick announcement that we are currently working on a series of free 'DIY BI' e-Books.

Free 'DIY BI' e-Books? Tell me more!

Over the past few years of working with Excel an Power BI, I've obviously picked up a few different methods, tips and tricks for working with the software.  And looking at how successful our free e-Book "Magic Tricks for Data Wizards" has been through the Power Query Training site, I thought it would be nice to so something similar for Excelguru readers.

One of the cool things about the Excelguru audience at this site is the diversity.  A lot of people originally came here for Excel, but we've been exploring Power Query, Power Pivot and Power BI for the past few years as well.  The one thing that ties us all together is that we are building "Do it Yourself Business Intelligence" or "DIY BI".

My original plan was to release one e-Book with 20 different tips, tricks and techniques; 5 each for Excel, Power Query, Power Pivot and Power BI.  After getting started, however, I realized that it was going to take me a bit longer to get that all done than I wanted.  But since I want to get information out to our readers, I've decided to break this down into four separate e-Books which will be collected under the umbrella of "DIY BI Tips, Tricks and Techniques".  Each e-Book will focus on one specific area of the DIY BI story.

What will the free 'DIY BI' e-Books include?

Well… tips, tricks and techniques, of course.  Smile  Okay, seriously, each is fully illustrated and written to give you some great examples and ideas that I hope will help you in your DIY BI journey.

Here is what is covered in DIY BI Tips, Tricks and Techniques for Excel:

  • The easiest formula to return the end of the month
  • Show a message when cells are hidden
  • Quick alignment of objects
  • Easy to read variances
  • Show a message if your Pivot data is stale

Sample image from DIY BI Tips, Tricks & Techniques for Excel

What areas will the free 'DIY BI' e-Books cover (and when will they be released)?

Those e-Books will be released in the following order:

  • DIY BI Tips, Tricks and Techniques for Excel
  • DIY BI Tips, Tricks and Techniques for Power Query
  • DIY BI Tips, Tricks and Techniques for Power Pivot
  • DIY BI Tips, Tricks and Techniques for Power BI

The first is already written, we just need to lay it out and make it look a bit more awesome.  Our target is to get it released by the end of next week.

With regards to the rest, I'll go as fast as I can on them, but as you can imagine, doing things right does take time.  I would expect that each will take 2-3 weeks to build out properly, but if I can get them out faster I most certainly will.

Am I going to need Excel 2016 to get value from the free 'DIY BI' e-Books?

No.  While I highly advocate being on a subscription version of Excel 2016, you'll find content in each of the first three e-Books which can be used in prior versions of Excel.

How do I receive the free 'DIY BI' e-Books?

You sign up for the Excelguru newsletter.  It's just that easy.  As soon as each e-Book is finished, we'll be emailing it to everyone who is currently subscribed to our newsletter.

And in the mean time, you also get a monthly email from us which now includes news about the latest updates to both Excel and Power BI.

Longer term, once all four e-Books are written, any new subscribers will receive the first e-Book upon signup, and then the next in the series will arrive every couple of days until you have the full set.

So what are you waiting for?  Sign up right here and don't miss out on free DIY BI Tips, Tricks and Techniques for your work!

Subscribe to our mailing list

* indicates required





Calculate Start and End Dates

I got an email from a reader this morning who asked how to calculate start and end dates for a given employee when they have had multiple terms of employment.  Since it's been a while since we've had a technical post on the blog, I thought that this would be a good one to cover.

The Challenge

In this case (which you can download here) we are given the table shown below on the left, and we need to create the table shown on the right:

image

As you can see, John's start date needs to be listed as Jan 1, 2013 and his end date needs to be listed based on the last date he worked here; Oct 31, 2016.

How to Calculate Start and End Dates using Power Query

My first thought was "we'll need a custom function to do this", but as it turns out, there is a a MUCH easier way to accomplish this, and it's 100% user interface driven as well.  I'm virtually certain that the performance will also be much better over larger data sets as well (although I haven't specifically tested this.)

Let's take a look:

Step 1: Connect to the data

This is pretty easy, just select the table and use Power Query to connect to the data:

  • Excel 2010/2013:  Power Query --> From Table
  • Excel 2016: Data --> From Table/Range

We'll be launched into Power Query and will be looking at our short little table:

image

Step 2: Calculate Start and End Dates via Grouping

The trick here is to actually use Power Query's Grouping feature to calculate the start and end dates.  To do this:

  • Go to Transform --> Group By

The dialog will open and is already offering to group by Name, which is what we need.  Now we just need to select the grouping levels.  The first is going to be our Start Date, so we'll rename it as such and change to calculate a Min of the From column:

image

The effect here is that this will provide the lowest value from the "From" column for each employee.  A perfect start.

Next, we need to add a new grouping level to get the End Date.  To do that:

  • Click Add Aggregation
  • Configure the new column as follows:
    • New column name: End Date
    • Operation: Max
    • Column: To

It should look as follows:

image

And believe it or not, you're done!

image

How this works

The key here is that the grouping dialog in Power Query works for all records in the group.  This is really important, as the first column has no bearing on subsequent columns… if it did, we'd get the max for the first record, which is not at all what we'd be looking for.  Instead, the Group By will restrict to find all records for John, then will pull the Min and Max out of the remaining three rows, returning those as the values.

The other thing that is worth noting here is that the order of the source data is irrelevant.  We could have provided either of these options and the answers would still have been calculated correctly:

SNAGHTML19ec2a56

Final Notes

It's also worth mentioning that this technique to calculate start and end dates will also work in both Excel and in Power BI, as the feature set is identical between the two products.

Sometimes things that look hard, are actually really easy when we have the right tools in our hands, and this happens to be one of those situations.  Smile

Using Power BI When You’re Not a Data Person

This week's blog post on using Power BI when you're not a data person is a guest post by Rebekah Sax of Excelguru Consulting Inc.

Last week I participated Excelguru's first ever "Data to Dashboards" Power BI Boot Camp. Seventeen of us lucky attendess were introduced to the incredible power of Power BI. Just to be clear, I am NOT a data person. My background is in marketing and communications, which is the main focus of my role here at Excelguru. So I was a bit nervous about being the only participant that was not in finance/accounting, an Excel jockey or a data analyst of some sort. Would I be able to keep up with the class and learn this brand new platform? How could someone go about  using Power BI when you're not a data person who usually works with numbers?

Welcome to Power BI

The boot camp consisted of three jam-packed days of hands-on exercises led by Ken Puls. We started off nice and easy with creating some basic visuals we are all familiar with, such as map, bar and column charts. By the end of the first day, we had learned how to build reports and dashboards using multiple data sources, and how to publish and share them. Later, we began writing DAX measures for filtering and shaping our data. This is where I learned that I CAN CODE! However, there is also a great feature called Q&A. This is a "query interface" which allows you to pull insights from your data using regular English, no coding required. For example, you can type in "number of locations" or "total sales in May" and Power BI will build you a basic visual that shows the information, pulled from your source data table.

For the last part of the day, we had fun playing around with cool custom visuals from the Power BI Custom Visuals Library. These are a great way to tell a story with your data by making it more visually interesting and engaging. Some of the options are a little whimsical. I mean, I don't know why I would ever need to represent data as fish swimming in a tank, but I could if I wanted to.

Power BI Custom Visuals Library

The number of custom visuals available in the Power BI library is really impressive, and many of them are free.

Cleaning and Working with Data

Our second day began by looking at the ways Power BI gets, transforms and loads our data using Power Query. This amazing tool is also used in Excel, so the techniques we learned are transportable between Excel and Power BI. I'm glad I had been previously introduced to Power Query, but some things were still challenging. For example, it took a while to wrap my head around Power BI's six different types of joins for merging two tables together. Luckily, I can use the copies of the example slides to use as a reference later on.

Inner Join

The "Inner" join type in Power BI. This join gathers only records which exist in both tables.

It was interesting to learn different techniques for breaking apart and recombining data from various sources into one cohesive data set. We also dove into formulas using M code (the Power Query language) and learning intermediate DAX measures (yikes, more coding!).

The Loaded Pencil

What really blew me away was when we started applying these measures to sales data from a fictional restaurant called the Loaded Pencil. First, we set up a calendar table and used the Calculate measure to start pulling out key metrics such as sales for a particular period. Another measure allowed us to compare to sales for the same period the previous year, quarter or month. Using different visuals, we easily filtered the data to view specific sales details. Our report even allowed us to look at the broad categories and then drill down right into specific items. We then set up cool visuals displaying food and alcohol sales as compared to budgeted goals. Now this was something I could sink my teeth into from a marketing perspective. I began to see real-life applications for using Power BI when you're not a data person!

The Loaded Pencil Sales Report

Sample visuals looking at the sales data from our fictional restaurant, The Loaded Pencil.

Taking It Mobile, RLS and Developing Like a SQL Pro

The first part of day three was really fascinating, where we focused on making our reports and dashboards mobile-friendly. Essentially, we built a mobile app displaying the outlets of a well-known coffee shop chain. It showed all the locations around BC and even let us target specific regions or cities, so you could find where to get your next caffeine fix. (Unfortunately, star points aren't tracked on this app, so I'll keep using the official one.) I could really the possibilities of mobile-accessible reports for a team with members regularly out of the office on site visits, client meetings, sales calls, etc.

Making Data Mobile

An example of our coffee shop Power BI Desktop report and the mobile-friendly version. Note that on the mobile version, the data set has been filtered to show just the Lower Mainland locations.

While the topic of Row Level Security are not something I need to worry about in my role, it was interesting to see the different methods that they can be created and applied to restrict the information shown to only that which a reader is entitled to see.

Using Power BI When You're Not a Data Person

One of the things that sticks out the most for me was chatting with another participant at the beginning of day two. She was so excited that she was already able to start using some of the stuff we learned the day before to her job. I have to say, my head was full to bursting by the end of the boot camp, but I did gain a few big insights:

  1. Using Power BI when you're not a data person to create a report can make it easier to present and understand the key information. This is especially true if you are like me and get overwhelmed by row upon row of numbers as on a traditional spreadsheet.
  2. Because you can move from the "big picture" down into specific detail with just a few clicks, you can make better connections and insights. You may even be able to pick up on something that has gone unnoticed but is having an impact.
  3. Being able to access precise, up-to-date information while on the go is huge. Imagine being able to answer a specific question right on site, without having to go back to the office and calling or emailing your client later.

Just remember, you don't have to already be an Excel expert to be able to use Power BI. It's certainly complex and can seem daunting, but the instruction here was fantastic and - if there's stuff that wasn't covered - it's great to know that there is also passionate community always willing to help someone trying to learn.

Power BI Boot Camp

I'm pretty excited to announce that we have our very first Power BI Boot Camp coming up in Vancouver, BC on Feb 22-24.  This is something that I've wanted to do for a while, and it's going to be awesome!

image

The boot camp is going to be 3 full days of hands on Power BI, and is intended to get you up and running, building cool solutions that will impact your business.

Why do you care about the Power BI Boot Camp?

Okay, I get it… you're an Excel person, and while you've heard of Power BI, you're not quite sure if you need it or not, and certainly don't know if you can justify the cost of a 3 day Power BI Boot Camp...

I honestly believe that the answer to both questions is an emphatic yes!

Top 5 reasons you need to learn about Power BI:

We'll cover the "how do I convince my boss" near the end of the post, but first, let's talk about what Power BI is, and why it is relevant to you.

  1. A major reason that companies are behind in their Business Intelligence development is that they are using outdated versions of Excel and cannot (or are afraid) to upgrade.   Power BI Desktop is a separate program that sits completely outside Excel, meaning that you can run the most up to date business intelligence tool DESPITE the version of Excel you're stuck on.  I'm looking at you Excel 2003 & Excel 2007 users. Winking smile
  2. You can do a TON of stuff with Power BI for free.  Yes, there are services that cost a nominal fee, but you'll be surprised at how much you can do for absolutely nothing.  The investment you make here will pay back in spades as you begin to automate tasks that were taking you huge amounts of time, and get the ability to deliver true business intelligence to your stakeholders.
  3. Power BI's data collection experience is based on Power Query, and it's modelling experience is based on Power Pivot.  So if you have those skills from your Excel journey, they will be totally relevant in Power BI.  And if you don't?  The new skills you learn at this course related to these areas are portable back to Excel as well.  (So again, if you're stuck on an older version of Excel, this is your chance to upskill BEFORE your Excel is updated.)
  4. If you've ever tried to share Excel dashboards, you know how hard it can be to do so and maintain them.  And if your end users want their reports accessible and interactive on mobile?  Forget it.  But Power BI solves these issues, including delivering interactive reports to your mobile phone (yes, even your non-Windows phones!)  Rest assured, however, that if your company is cloud averse, this tool is not dead to you, and it's still worth coming.
  5. You'll be learning in a small class environment from a world class expert who cares about delivering value.  I've been there.  I know how hard your job is.  And I want to help you become more effective.  That is what my company is all about, and I'm very serious about it.

What will the Power BI Boot camp cover?

To be completely honest, the better question is probably "what won't we look at?"  I've got all kinds of stuff cooking for this.  When I first started drafting the Power BI Boot Camp outline I was wondering how I'd ever fill 3 full days.  Now I'm trying to figure out what to cut so that I can fit it all in.  Keep in mind that it's not going to necessarily be in the order below, as some topics are MUCH bigger than the summary I'm providing, but here's a bit of an overview…

Where it all starts…

We'll start by building a basic Power BI solution built on an Excel file.  Some minor data modifications, some cool visuals and then publishing it to the cloud service, we'll walk through the most basic of Power BI journeys.  I want to do this so that you can see just how easy it is to build a cool solution that works and interacts, like this one.  (Go ahead... click any of the bars and see how it cross filters in your web browser!)

More data, more transformations, and some modelling…

From there we'll start to dive deeper, teaching the techniques that allow you to build more robust models and solutions.  We'll spend a bunch of time looking at different data sources, as well as some of the more complex methods for fixing bad data to make sure that you can use it in your solutions.  We'll also show you how to link these tables together so that they filter nicely across multiple visuals.

image

Let's get visual…

Power BI is all about getting visuals with your data.  It has a big collection of visuals, as well as an ever growing gallery of custom visuals as well.

While I won't promise that we'll look at each different visual in the product (we only have 3 days), we are going to look at a lot of them, including some of my favourite custom visuals.  I want to show you how to create them, format them, and what data you need to get the best of them to work.

image

We're building dashboards to tell a story here, so I want to try and get that as close to real life as possible for you.  From bar, column and line charts, to maps and bullet charts, I really want to get you comfortable with these, as well as how to control the interactions between them.

And speaking of Dashboards… do you know the difference between a report and a dashboard in Power BI?  That will get clearly explained here as well.

Getting the numbers right…

We'll get into working with the DAX formula language, and understand how to add and override filter context to make your formulas show exactly what you want when you filter any visual.

How about Calendar intelligence?  I have a huge passion in this area too.  The Power BI Boot Camp won't just teach you how to build your own calendar table, you'll also leave armed with a collection of DAX calendar intelligence measures.  Those measure patterns are super important as they'll allow you to report results correctly when you ask for Month to Date data or Month to Date for 2 months ago.

image

Sharing is caring!

We'll look at sharing via the web first, both for internal and external users on an invite-only basis.  But even more, I'm also going to walk you through embedding your reports in a web page so that you can show the world how awesome your company is.

In addition, you'll want to bring your phone with the Power BI app installed.  Why?  Because I'm going to make sure you get set up with your reports and dashboards delivered to your phone in a mobile optimized fashion where you can even annotate your reports by hand.

image

And if you'd prefer not to be cloud based?  We'll look at creating Power BI templates that you can distribute within your organization as well.

You know… while we're talking about sharing… have you ever sent a report to a manager only to have them come back with a one of those "can you show me this?" type questions?  Let's try and solve that little problem by allowing our manager to ask their own questions using natural language queries.  In other words, how cool would it be if your manager could just type "Bar chart total stays by city" and it did it, even if you've never built that report?  Because they can.

image

And what about updates and security?

Wait… how do I keep this stuff up to date?  And how do I ensure that only the right people are looking at their data?  Well of course we'll cover those topics!

Not only will we look at scheduling refresh for cloud hosted data sources, but also your on-premises data sets via gateways and 3rd party applications.  Ooohh… connecting data to the cloud… can I hear anyone saying "Security!"  Well yes, so let's talk about both encryption and also how to restrict the data sets to show only relevant rows to the correct people.  (And let's face it, you will want that last part even if you keep your data local!)

So why Ken's Power BI Boot Camp then?

My goal is to deliver real value to you.  Anyone who has been to any of my courses in the past knows this:

  • It's virtually impossible to find someone who has more passion for what they teach than me
  • We limit the class size so that it doesn't get too big.  A huge part of the reason for this is because…
  • My style is to roam the room and help people keep up with the content so that everyone learns
  • I provide completed examples for all the work we go through, with catch up points along the way, just in case the above isn't achievable
  • I provide a ton of supplemental resources as well

I've been to classes before, and I know that it can be somewhat difficult to apply the course materials to your own work.  For that reason the Power BI Boot Camp is designed like all my courses: We will build solutions like you'd expect to do so in the office, exposing the pitfalls where the software doesn't work as you'd logically expect.  This is intentional, as we teach you not only how to do things correctly, but also how to debug problems as they happen.

What skill set do you need?

Do you need experience working with Power BI in order to come to the Power BI Boot Camp?  Heck no, that's what this course is about!  And while some Excel knowledge is helpful, you're certainly not going to see a test on Excel functions here.

Ultimately what you need is a hunger to learn; a desire to change your reporting world.  If you can bring me that enthusiasm, I'll teach you what you need to know in order to make that happen.

Making the Case to Your Boss

I managed an accounting department in my former life, so I totally get this.  We're all about data here, so here's the up front details:

The cost of the course is $1,495 plus GST.  It includes your breakfast and lunch all 3 days, but you need to look after your dinner and hotel room (if necessary).  So now you can work out the total cost.

I'll let you do the math and work out how many hours per week you need to save yourself in order to break even.  If you spend a significant amount of time cleaning up data to be used in Excel, you'll find that you'll break even on this alone.

Honestly though, the true value proposition of this course isn't the ability to break even on a labour basis, it's about the opportunity costs.  Opportunities like:

  • Identifying programs that are costing your company money which don't earn returns.
  • Finding the best paying leads to pursue for greater sales.
  • Delivering value to your stakeholders when and where they need it.
  • Reducing inventory during slow seasons to reduce carrying costs.
  • Highlighting key metrics that are under performing.
  • Building solutions to blow the minds of your clients or stakeholders.

Every business is different, but ultimately it's the opportunity to get better information into the hands of those who make the decisions, be it you or your boss.  That benefit FAR outweighs the price we charge, and will break even for you VERY quickly.

I hope to see you at the course.  It's going to be awesome, and it will change your data life forever. 🙂

Register for the Boot Camp here.