Analyzing text with Power Query and Excel

We all know how good Power Query is for cleaning, shaping and analyzing numerical data. But this powerful tool also opens doors for analysts that had been closed for a long time, such as analyzing text. In this article, I will give you some ideas on how Power Query can be used for analyzing unstructured text. This is a new area for me as well, so I’ll be grateful for any comments, thoughts, ideas and shared experience that will help to elaborate further on the methodology.

Our scenario

First, let’s put things in context with a simple but realistic example. Assume you are the newly appointed Marketing Manager of a 4-star SPA hotel. You want to analyze reviews of the hotel on sites like booking.com and tripadvisor.com to understand the major service issues. Thanks to Power Query, you will be able to extract, clean and shape data from those sites to get a nice table like this:

Comments table

* Table contains ~300 real opinions for several Bulgarian SPA hotels in Hissarya, Velingrad, Sandanski and Bansko. Source: booking.com and tripadvisor.com

But how can you get some insights from this data? Obviously you would spend a lot of time reading all the reviews. Even if you do that, you still need to summarize the information somehow. So let’s play a little within Power Query and see how it can help us analyze text.

Preparing data for analysis

First, we load the table in Power Query (From Table) and clean it of punctuation symbols like commas, semicolons, dashes, brackets, etc., as well as replace capital letters with lowercase ones. Why do that last one? Because we want to count the most frequently used words, and for Power Query ‘hotel’, ‘hotel,’ and ‘Hotel’ are different words.

The cleaning can easily be done with tools like Replace Values, Trim, and Lowercase. I believe they are well known to the average user of Power Query, so I will skip the detailed operations. Instead, here are some lessons learnt from practice:

  • First, duplicate the Opinions column so you keep the original text.
  • Then, replace all punctuation symbols with a space.
    • Later we will use a period as a separator to split opinions into sentences. Since a sentence can also finish with an exclamation mark or a question mark, it is useful to replace ‘!’ and ‘?’ with ‘.’
  • Since this is a hotel, clients may use the symbol ‘*’ in their opinions (e.g. ‘nice 4* hotel’ or ‘definitely not a 4* hotel’). So don’t replace this symbol.
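For readers who think in code, the cleaning rules above can be sketched outside Power Query. The Python below is only an illustration of the logic, not the M that Power Query generates; the function name `clean_opinion` and the exact punctuation set are assumptions made for this example.

```python
# Punctuation to blank out (an assumed set). '*' is deliberately absent so
# that '4*' survives, and '.' is kept because it is the sentence delimiter.
PUNCTUATION = ",;:-()[]"


def clean_opinion(text):
    """Lower-case a review, normalise sentence endings, strip punctuation."""
    text = text.lower()
    # Treat '!' and '?' as sentence endings, just like a period.
    text = text.replace("!", ".").replace("?", ".")
    # Replace each remaining punctuation symbol with a space.
    for symbol in PUNCTUATION:
        text = text.replace(symbol, " ")
    # Collapse repeated spaces and trim both ends.
    return " ".join(text.split())


print(clean_opinion("Nice 4* Hotel, great pool!"))
```

Replacing punctuation with a space rather than deleting it keeps neighbouring words from being glued together, which matters once we start counting words.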

The points above are nice examples of why, when analyzing text or other data, you need to do it in context and should explore the data first, but that is another story.

Here are the steps applied so far. As a final touch the text has been trimmed and cleaned.

Steps taken to clean data

Separating each comment

Our next step is to split the column using a period ('.') as a delimiter, and doing so at each occurrence of the delimiter.

Split columns by delimiter

Power Query creates a number of new columns, each one containing a sentence. I bet you are guessing what will follow. That’s right – select the original two columns and choose Unpivot Other Columns.

Unpivot other columns

After removing the Attribute column and filtering out rows that are blank, our table has a column for the original comments (Opinion) and a column for each sentence contained in the original comments (Sentence). Please be patient here, and you will understand why this is important.
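The split-then-unpivot step boils down to producing one row per sentence while keeping the original opinion alongside it. A minimal Python sketch of that idea follows; `split_sentences` is a hypothetical helper, and it assumes the text has already been cleaned so that every sentence ends with a period.

```python
def split_sentences(opinion):
    """Return (opinion, sentence) pairs, one per non-blank sentence."""
    rows = []
    for part in opinion.split("."):
        sentence = part.strip()
        if sentence:  # skip blanks, mirroring the filter on blank rows
            rows.append((opinion, sentence))
    return rows


for row in split_sentences("great pool. rude staff."):
    print(row)
```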

Table containing full comments and each individual sentence

Let’s now duplicate the Sentence column and repeat the above actions: split each sentence into its individual words, then unpivot. Below is the table after removing the Attribute column and filtering out blank rows:

Table containing full comments, each individual sentence, and each word

As a next step, I will add a custom column with a simple formula that calculates the length of each word. We can use it to remove common words that bring no value to our analysis, such as ‘a’, ‘the’, ‘is’, ‘or’ and ‘but’.

Custom function to calculate the length of each word

Finally, filter to exclude words that are 1, 2 or 3 letters long. We will use the resulting table as the basis for analyzing text.
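The word split and the length filter can be expressed in a couple of lines. This Python sketch illustrates the rule only (it is not the custom column formula itself); `split_words` and `MIN_LENGTH` are names invented for the example.

```python
MIN_LENGTH = 4  # keep words longer than 3 letters, as described above


def split_words(sentence):
    """Return the words of a cleaned sentence that pass the length filter."""
    return [word for word in sentence.split() if len(word) >= MIN_LENGTH]


print(split_words("the staff is very friendly"))
```

A length cut-off is a blunt tool: it also drops short but meaningful words such as ‘spa’ or ‘bed’, so an explicit stop-word list may be worth considering for other data sets.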

Table filtered for words greater than 3 letters

Finding Lookup words

As a first step in analyzing text, let’s Duplicate the query (you may use Reference as well).

Duplicating the query

In the new Query, simply use the Group By function on the Words column to count the rows.

Grouping by the Word column

Thus, we will have a list with all words (above 3 letters long) and how many times they have been used. For convenience, we can sort this and then load it into Excel.
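Conceptually, Group By with a row count followed by a descending sort is a frequency table. A small Python sketch of the same idea, using the standard library's `Counter` (the function name `count_words` is an assumption for this example):

```python
from collections import Counter


def count_words(words):
    """Count occurrences of each word, most frequent first."""
    return Counter(words).most_common()


print(count_words(["room", "pool", "room", "staff", "room", "pool"]))
```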

Word count table

Let’s take a look at the list. Obviously it contains words like ‘with’, ‘this’, ‘have’ that we need to ignore because they could be used in many different situations. On the other hand we see ‘good’ and ‘nice’ in the top 20. Those words are highly subjective – what is good for me might not be good for you. Besides, we don’t know whether the reviewer said ‘good food and nice service’ or ‘the food was not good and service far from nice’. So let’s skip them for now.

Apart from the above, we see that customers most often mention words like ‘room’, ‘pool’, ‘staff’ and ‘food’. These are our words of interest (I call them ‘Lookup words’). We may now filter all sentences that contain these words, allowing us to read much less information compared to the original list of opinions.

Honestly speaking, if you have enough experience or even some common sense, you can skip this step. I mean it is more than expected that ‘room’, ‘staff’ and ‘pool’ are most often mentioned in reviews for a SPA hotel. However, identifying lookup words might be needed for cases like analyzing text of more open-ended questions.

Filtering Lookup words

So far we have identified our words of interest and listed them in a table:

List of lookup words

So how do we filter all sentences containing these lookup words? We can make it dynamic by creating a custom function. The process is similar to the one used to load data from multiple internet pages, as described in M is for (Data) Monkey.

First, we create a query to Filter Rows of the Word column that contain one of our lookup words, ‘staff’ for example.

Filtering for a look up word

Then we go to Home > Advanced Editor and change the M code a bit as shown below:

Updating the M code

After we create the function, we rename it as something more user-friendly (I used fxFilter).

Then we load our LookupWords table into Power Query (From Table):

LookupWords table

Next, we Add Custom Column. (Note: in Excel 2016, you have to Invoke Custom Function.)

Adding a custom column

And when we expand the table, we get all rows that contain the words from our lookup list.
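The pattern here is "write a filter for one word, turn the word into a parameter, then invoke it for every row of the lookup table and expand the results". The Python below sketches that pattern under stated assumptions: it is not the M code of fxFilter, `fx_filter` and `filter_lookup_words` are hypothetical names, rows are modelled as (sentence, word) pairs, and a "contains" test is assumed for the match.

```python
def fx_filter(rows, lookup_word):
    """Keep the (sentence, word) rows whose word contains lookup_word."""
    return [row for row in rows if lookup_word in row[1]]


def filter_lookup_words(rows, lookup_words):
    """Invoke fx_filter once per lookup word and expand into one flat list."""
    result = []
    for lookup in lookup_words:
        for sentence, _word in fx_filter(rows, lookup):
            result.append((lookup, sentence))
    return result


rows = [
    ("staff was friendly", "staff"),
    ("staff was friendly", "friendly"),
    ("pool was cold", "pool"),
]
print(filter_lookup_words(rows, ["staff", "pool"]))
```

Because the word is a parameter, adding a new lookup word to the table is enough to extend the analysis; nothing in the filter itself has to change.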

Identifying Connected words

We now need to split the Sentence column again to identify the words that were most often used in sentences containing our Lookup words. I call these words ‘Connected words’.

There are some final steps, such as eliminating words that appear rarely (i.e. fewer than 5 times), removing duplicate rows and rearranging columns, that I will skip.
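To make the idea of Connected words concrete, here is a rough Python sketch: for every sentence that mentions a lookup word, count the other words in it and drop the rare ones. The function name and the defaults are assumptions chosen to match the thresholds mentioned in the text (words longer than 3 letters, at least 5 occurrences).

```python
from collections import Counter


def connected_words(sentences, lookup_word, min_count=5, min_length=4):
    """Count words that share a sentence with lookup_word, dropping rare ones."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if lookup_word not in words:
            continue  # sentence does not mention the lookup word
        for word in words:
            if word != lookup_word and len(word) >= min_length:
                counts[word] += 1
    # Keep only words that appear at least min_count times.
    return {word: n for word, n in counts.items() if n >= min_count}


sample = ["staff very friendly"] * 5 + ["staff rude", "pool friendly"]
print(connected_words(sample, "staff"))
```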

Our final table is on the left below:

Final table for analyzing the comments

Analyzing text

As you see, I added two slicers to our table - one for our Lookup words and one for our Connected words. By using them in combination, we can get more insights quite easily. For example, these are the words used most frequently together with ‘staff’:

Slicers used to identify comments regarding staff

Here, I have filtered for the Connected word ‘friendly’, which gives us just 10 rows to read:

Analyzing text for the word 'friendly'

In a separate column, we can manually type “Y” for each positive opinion and “N” for each negative. Now we can calculate that 30% of mentions are actually negative. This is a simple way to quantify and measure our data. And it takes just a few minutes to find out that the picture is similar with ‘helpful’ and ‘polite’: positive feedback prevails.
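The percentage is simple arithmetic over the manual flags. As a tiny sketch in Python (the function name is hypothetical, and the 7/3 split is just one combination consistent with 10 rows and 30% negative):

```python
def negative_share(flags):
    """Return the fraction of 'N' flags among rows flagged 'Y' or 'N'."""
    rated = [flag for flag in flags if flag in ("Y", "N")]
    return rated.count("N") / len(rated) if rated else 0.0


flags = ["Y"] * 7 + ["N"] * 3
print(f"{negative_share(flags):.0%} of mentions are negative")
```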

When we filter for ‘English’, it is easy to see that 8 people mentioned that the staff speak little or no English. That directly calls for some action: we need to train our staff in English. If you were reading through 300 opinions, you might miss this quite easily.

Analyzing text containing 'staff' and 'English'

It takes just a few clicks and a couple of minutes of reading to find out that clients are happy with the breakfast:

Analyzing text for comments on breakfast

But are disappointed by the prices and quality of food in the restaurant:

Analyzing text for comments on the restaurant

Of course this is just a starting point and each of the above observations needs further investigation and analysis. But the key takeaway here is that we can successfully use Power Query to summarize unstructured text data. We can then focus our attention on sentences that contain the most commonly used words instead of reading the whole bunch of reviews.

Applying the technique

Many other ideas come to mind for how we can use this to further analyze text data. We can use the Stayed column to study trends over time, we can quantify some key words (as shown with 'friendly') or make a word cloud showing our top 20 words. What are your ideas? How can we use Power Query and Excel to analyze unstructured text?

Nuthin’ ain’t nuthin’ in Power Query

There are two kinds of nuthin' in Power Query: null, and blank. I tripped on this issue the other day, and Ken thought it would be a good idea for a blog post.

Let's just call out the two types of nuthin' in Power Query:

  • null is literally "no value" for any data type from text to table.  In other words, the cell is completely empty.
  • A blank also looks like "no value", but it is equivalent to a cell formula of ="" in Excel.  In other words, the cell holds a value that renders as blank.

Why is this important?  It's because, inside Power Query (and indeed many programming languages) null and blank are not equal!  And it turns out that nuthin' matters more than getting the right nuthin' in Power Query!

Burned by nuthin' in Power Query

Some time ago I built a set of Excel Power Query transforms which report on data extracted from a client system. For some time the client data has been extracted into Excel files, but there were some problems. My solution was to extract the client data files as CSVs instead.

What I found, unexpectedly, was that blank data values are treated differently by the PQ import functions depending on whether the file being imported is an Excel file or a CSV file!

Here are the rules:

  • For an Excel import, blanks are converted to null - always.
  • For CSVs, blanks are imported as blanks (not nulls). But when a field type is changed in a Power Query step, the numeric and date column blanks are converted to nulls, while text column blanks remain as blanks (not nulls).
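The two rules can be written down as a small decision function. The Python below is only a model of the behaviour described in this post, not of Power Query's internals; `import_value` and its arguments are invented for illustration, with `None` standing in for null and `""` for a blank.

```python
def import_value(raw, source, column_type=None):
    """Model how a blank cell arrives, per the rules listed above.

    raw is the cell text, source is "excel" or "csv", and column_type is the
    type applied in a later type-change step (None if no such step exists).
    """
    if raw != "":
        return raw   # non-blank values are not the concern here
    if source == "excel":
        return None  # Excel import: blank becomes null, always
    if column_type in ("number", "date"):
        return None  # CSV plus numeric/date type change: blank becomes null
    return ""        # CSV text column, or no type change: stays blank


print(repr(import_value("", "excel")))
print(repr(import_value("", "csv", "text")))
```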

Seeing nuthin' in Power Query

What does it matter?  Have a look at the following cases...

1. Excel data with blanks loaded into PQ. Blank cells are imported as null.

2. CSV data with blanks imported to Power Query: Blanks are read as blanks.

3. CSV data with blanks: dates and numbers change to null after type change. Text blanks remain blank.

The impacts of nuthin' in Power Query

Nuthin' in Power Query could have caused me more issues here... my transforms used conditional columns to check for nulls in text columns, a test which fails when the value is blank.

As Ken pointed out, another key issue is that the Fill Down and Fill Up functions in Power Query are used to fill null values. But blanks are not nulls, so the functions do not work as intended in my case.
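A quick way to see why Fill Down trips over blanks is to model it. In this Python sketch, `None` plays the role of null and `""` the role of a blank; `fill_down` is a hypothetical stand-in that follows the behaviour described above (only nulls get filled), not Power Query's own implementation.

```python
def fill_down(values):
    """Replace each None with the last non-None value above it."""
    filled, last = [], None
    for value in values:
        if value is None:
            value = last  # null: inherit the value from above
        else:
            last = value  # anything else, including "", counts as a value
        filled.append(value)
    return filled


print(fill_down(["A", None, "B", None]))  # nulls are filled
print(fill_down(["A", "", "B", ""]))      # blanks are left alone
```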

How to deal? Caveat Emptor! It appears Power Query treats blank data differently in the CSV import and Excel import functions.

So, if you are building transforms based on a consistent source of imported data, then there is little impact. But if you need to change the type of data source like I did, then beware. I had to do some detailed testing, and I was looking at re-writing some longstanding and well-tested transforms.

Instead, for my specific situation, I added a value change step to the CSV import in PQ to replace nuthin' (blank) with the keyword null for the whole data set. That allowed me to maintain my downstream logic, but cost me some processing speed.
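The workaround amounts to one pass over the whole data set that swaps every blank for a null. Sketched in Python, with `None` again standing in for null and `blanks_to_null` a name made up for this example:

```python
def blanks_to_null(rows):
    """Return a copy of the table with every blank cell replaced by None."""
    return [[None if cell == "" else cell for cell in row] for row in rows]


table = [["A", ""], ["", 5]]
print(blanks_to_null(table))
```

Touching every cell is what costs the processing time mentioned above; restricting the replacement to the columns that downstream steps actually test would likely be cheaper.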

I hope this alerts PQ users to a potential issue in their transforms. If there are any alternate solutions to the problem let's see them in the comments.

Power Query Recipes Now Available

As you may have heard, we have been working on a set of helpful Power Query Recipes for Excel and Power BI and are excited to announce that they are now officially available!

Are our Power Query Recipes for you?

The Power Query Recipes are targeted at people who are familiar with the Power Query interface (in either Excel or Power BI), and will lead you step-by-step through the process needed to clean up and convert your data from one format into another.  And if you're not already comfortable with Power Query?  Consider joining the Power Query Academy so we can change that!

So what is in the Power Query Recipes package?

There are currently over 30 cards in the set, showing easy-to-follow steps that will deal with a variety of common data issues.  I personally am finding them super useful, and often refer back to them when I'm helping people clean up their data in person or in forums.

We've even marked each of the Power Query Recipes that has a video version in Power Query Academy.  If you are already subscribed to our Academy, simply click the video camera in the bottom corner, and it will take you straight to the appropriate video so that you can see the technique demonstrated in a live setting.

Are there samples of the Power Query Recipes?

Of course there are!

Here's an example of one of my favorites, which lays out how to create Full Anti Join, something that does not exist in the regular Power Query user interface:

image
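For anyone unfamiliar with the term, a full anti join returns the rows from either table that have no match in the other. The Python below sketches the concept only; it does not reproduce the steps on the recipe card, and `full_anti_join` is a hypothetical helper operating on lists of dictionaries.

```python
def full_anti_join(left, right, key):
    """Return rows from either table whose key has no match in the other."""
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    only_left = [row for row in left if row[key] not in right_keys]
    only_right = [row for row in right if row[key] not in left_keys]
    return only_left + only_right


left = [{"id": 1}, {"id": 2}]
right = [{"id": 2}, {"id": 3}]
print(full_anti_join(left, right, "id"))
```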

And another which shows how to create a Calendar Table on the fly:

image

We actually have a sample package available which includes four recipe cards (including the precedent card for the calendar recipe shown above), as well as the full table of contents for the current version.

Just some of the patterns included in the full version are:

  • Splitting data into new columns or rows
  • Pivoting, unpivoting, and transposing data
  • Several different ways to merge tables (including the Full Anti Join shown above)
  • Extracting a new column based on values in the prior or next rows
  • Six different ranking methods
  • Creating dynamic calendar tables
  • Adding a random number to all rows

Will there be more Power Query Recipes?

Oh yes!  While there are already over 30 recipes in the set, we already have another 20 on our "to-do" list, and are adding to that list regularly.  Our intention is to release new recipes on a regular basis, putting even more Power Query resources at your fingertips.

How do I get my copy of the Power Query Recipes?

There are a few different options:

  1. If you are a Power Query Academy member*, the recipes are already included in your registration.
  2. Pre-order a copy of the new 2nd edition of our M is for Data Monkey book from Amazon by September 30, 2018 and receive a FREE set of the initial recipes.**
  3. Purchase the current stand-alone set of recipes from the Power Query Training web store for $24.95 USD. There is also an option to add a subscription ($2.95 USD every 3 months) so that you get all the new recipe cards as we expand the set.

* Please note that the recipe cards are not included in the free Power Query Academy trial, only with the full course.
** Once we receive your Amazon proof of purchase by email at support@powerquery.training, we will provide a coupon code for $24.95 which can be applied to either the stand-alone or subscription purchase.  Act fast though, as this offer ends on October 31, 2018.

Disaster Recovery in Puerto Rico with Power Query

When Ken was at the Microsoft Business Applications Summit a few weeks ago, he met Mr. J.A. Garcia who has been doing some amazing work with Power Query. We wanted to share his story about how he has been using Power Query in helping with disaster recovery efforts in Puerto Rico:

"[In] my line of work there's been two defining moments that have changed the way we look at our tools. The first one was the Zika outbreak and the second one was Hurricane Maria.

The first time I saw Power Query was [as part of] Power BI during the Zika outbreak [in 2016]. One of our clients needed up-to-date information of the Zika outbreak and its effect on healthcare. With the help of a consultant, we started using Power BI and Power Query.

Aedes aegypti mosquito

An Aedes aegypti mosquito, one of the main transmitters of Zika virus.

I began taking courses during that time, and one of them was about Excel. That's when I learned about Get & Transform in Excel 2016.

Any new job that I received, I tried to use Power Query. I taught myself SQL so I could understand better the process of extracting data and how to integrate it into Power Query.

Our job was changing. We could give the tools to our clients that would let them refresh when they needed it the most. No more waiting [on] our area for a data refresh!

Then Hurricane Maria hit Puerto Rico [in September/October 2017]. It was a harsh two weeks of no communication. As soon as I came back from work, I noticed the change in attitude. As a healthcare company, we began doing Public Health.

Hurricane Maria - Disaster Recovery with Power Query

Hurricane Maria is regarded as being the worst natural disaster on record to affect Dominica and Puerto Rico and the deadliest Atlantic hurricane since Hurricane Stan in 2005.

My main job was identifying members with certain serious conditions. I used Power Query and Excel to create processes that obtain information from the assessment done to keep track of the efforts of the company. The clients could refresh the data and see who was missing, fix any data entry errors and more.

I'm very proud of my work, and Power Query in Excel and Power BI has been a large part of my growth. In the present, we have created a tool that refreshes constantly to help identify members with serious conditions. Now in case of any emergency, we'll know who to attend."

~ J.A. Garcia

We were very inspired by how Mr. Garcia began his Power Query journey as part of the disaster recovery efforts after these emergencies, and by how he and his team continue to leverage this powerful tool in both Excel and Power BI. Power Query really can help save lives!

Do you have a story to share about your Power Query journey? Maybe it hasn't saved your life literally, but perhaps it has saved you hours of time and effort, a significant amount of money, or even your sanity! Let us know in the comments below or contact us through the Excelguru site.

Update on the Master Your Data Book (Data Monkey v2)

Miguel and I were at the Microsoft Business Applications Summit last week, and we were frequently asked for an update on the Master Your Data book (aka M is for Data Monkey version 2.0). We were told that it’s time. People pointed out that they had pre-ordered it on Amazon ages ago. Enthusiasts asked why we don’t have a subscription model with monthly updates like Power Query does.

Master Your Data Book Cover

There’s a hunger to see the new version. We’re flattered that you rely on us, and honestly, we’re gutted that you are still waiting for it. And after attending the summit, we know that we need to give you an update on the Master Your Data book.

Some background on publishing…

Before we tell you what’s happening, I’d like to just explain a bit about the back story on what we have to consider when we write books on technology. The primary factors are:

  1. How to fit it in with our schedules. Even over the long term, books don’t come close to earning anywhere near the financial rewards of just dedicating time to consulting projects. (This is a big factor in the subscription question.)
  2. In today’s world of constant updates, we know that there are new features added on a monthly basis. The question is, which ones are serious enough to cause us to delay the release?

If you look back at M is for Data Monkey, we are really proud of its long-term value and continued relevance. It kills us that merges aren’t in there… they came out a few days after the book went to print. Would we have held the book for them if we had known? Yes. Would we have held the book for conditional columns? No. This is just one of the kinds of decisions we have to face.

No matter which way we go, we’ll always wish we waited for the next great feature. And we can’t. We know that. But our goal is to make sure that the material inside the book stands the test of time well and continues to hold relevance as features are added and changed. We believe that we’ve managed to do that with M is for Data Monkey fairly well, all things considered. Are there easier ways to do some things today? Yes. But does the book give you a deeper understanding and still let you accomplish the same goals? We believe it does.

The factors that lead to delays

Features are one thing. They generally add new functionality. But User Interface changes are something else entirely…

Earlier this year, we made the call to delay the Master Your Data book in order to get a clearer picture on what Excel 2019 was going to look like. We needed to know which Power Query features would be there, and which wouldn’t. It just doesn’t make sense to publish a book around the same time of Excel 2019’s release with Excel 2016 screen shots. We’ve already been told that some stuff in M is for Data Monkey looks “dated”. We certainly didn’t want the new Master Your Data book to be “dated” on the day of release due to a User Interface change.

And now, at the Microsoft Business Applications summit, we saw a preview of what is targeted for release into Power Query in the next few months. These new features are significant, they are impactful. But most relevant to us is that they contain a significant change to the Power Query User Interface. They will affect every single screen we use. They will affect every single screen shot we take. And if we don’t wait, we will deliver to you a brand-new book that has pictures that don’t look anything like the User Interface you see on screen. Even if we were to push material to the publisher today, it takes 2-3 months to get the book to Amazon, so best case, you get 2-3 months' use out of the book.

To us, that is irresponsible. We refuse to take your money and deliver you a substandard product. It’s just not right.

Just how significant were the MS Biz Apps announcements?

If you weren’t at the Microsoft Business Applications Summit, you might not know about these announcements. You can read the full list here, but let’s recap the key ones for us here. We can divide them into two categories:

  1. Awesome-but-not-critical (i.e. we would cry because we couldn’t include these, but wouldn’t delay the book for them)
    • New data connectors (including extract from PDF)
    • Fuzzy lookup
  2. Critical features (stuff that must be in the Excel version of Power Query before we can test material, write about nuances, shoot images and release the book)
    • Data profiling (quality) previews
    • M Intellisense in the Advanced Editor, formula bar and Add Custom Column window

These last two features will have a significant effect on the images of the book, as you can see here:

Power Query UI Preview

What is the revised timeline for the Master Your Data book?

The new Power Query features are estimated to arrive in Power BI Desktop by October 2018. Based on the historical pattern, these features will show up in Excel within 2-3 months of their Power BI release date. Allowing ourselves time to test the new features, take screenshots, and revamp the book order to best tell the data story the way we need to, we are hoping to have the book in print by the end of Q1 2019. It’s still aggressive on our side, but that is our refined target. If the builds ship later, or things take longer than anticipated, it could slide into Q2.

Yes, we know it’s a long way away. We know you’ve been waiting, and we wish it could be faster. But again, we hope that you understand that we are doing this to truly give you the best book that will last longer than it could otherwise.

In the meantime – can we give you something else?

We have been working on another product as well: Master Your Data in Excel & Power BI recipe cards.

Naturally, all members of our Power Query Academy will get a free copy of these. And due to the significant delay of the book, we’d also like to offer a free copy to anyone who has pre-ordered Master Your Data on Amazon. (More on this below.)

So, why can we do these, but not the book? It’s because they assume you already know Power Query’s User Interface, so only provide the steps on how to accomplish the goal. Loaded with before and after pictures, and the route to get from one to the other, we aren’t bound by User Interface design changes.

Here’s a quick sample of one of the cards:

Power Query Recipe: Pivoting Stacked Data

And another:

Power Query Recipe: Split Records into Columns

How will these be sold?

Ultimately, we plan on selling this product on a subscription basis through our web store as follows:

  • $14.99 for the purchase of the downloadable card set
  • $2.99/quarter for a subscription to updates

We already have 26 cards designed, with more on the way. As we expand the set, the original purchase will include the new cards. But for those on subscription, we will update your original purchase and give you access to the new cards when we release them. We’re not intending to hold these for quarterly release, but rather send a new one every time we build it. You might get five in one quarter and one in the next, but our intention is to keep delivering new patterns as we discover them and build summary tips cards to illustrate them.

Wait… didn’t you say subscription doesn’t work for publishers?

For books, yes, it’s really hard. They’re complicated and require ensuring that things are taught in the right order, with all the updated techniques along the way. These Master Your Data recipe cards are snapshots of what to do in certain data cases when working with data in Excel and Power BI, so are a much more refined scope.

It’s way easier for us to update cards, or add entirely new ones, as it doesn’t require re-writing precedent chapters. So in this case, it makes sense, as we can provide an initial catalogue of patterns, and add more over time. We’ve already got ideas for a bunch more to expand this set.

How can I get my hands on the Master Your Data Recipe Card set?

There are a few ways…

For those of you who are members of the Power Query Academy, we will add the tip cards as a resource as soon as we have them ready. It’s part of your subscription so, as long as you’re still an Academy member, you’ll get all the new ones we create. In addition, we will also make sure you get a copy of the new Master Your Data book as soon as it is released (even if your subscription has expired and you’re no longer an Academy member).

For those of you who have pre-ordered our new book on Amazon, please follow the Excelguru blog. We will post when the recipe cards are ready and will let you know what you need to do to get your free download of the initial package of cards. The subscription for updates will be available as well, but will be entirely optional.

And if you’ve just been waiting for the Master Your Data book and haven’t purchased yet, all good. We’ll be setting these up in an online store to allow you to buy the download version and (optionally) sign up for the updates as well.

When will the Master Your Data Recipe Card set be available?

Soon. We are in final design for the card set now and need to set up our web store to handle subscriptions. Our target is to have that all done by September 15, 2018, if not earlier. Keep watching here for the official announcement.

Ultimately…

…we wish we could send the Master Your Data book to you today, but hope that this will make a reasonable substitute to get you over the hump until we can. Thank you for your patience, understanding and trust in us as we work to deliver you the best version we possibly can.

Power Pivot eBook Coming Soon

It's been a long time coming, but we are putting the finishing touches on the third installment of our free 'DIY BI' series. Consequently, we are excited to announce that the Power Pivot eBook will be officially released on Tuesday, July 3, 2018!

Power Pivot eBook

This brand new book will feature five of Ken's top tips, tricks, and techniques for Power Pivot, including:

  • Hiding fields from a user
  • Hiding zeros in a measure
  • Using DAX variables
  • Retrieving a value from an Excel slicer
  • Comparing data using one field on multiple slicers

About the 'DIY BI' Series

This free eBook series is available to anyone who signs up for the monthly(ish) Excelguru email newsletter. The series includes four books, one edition each for Excel, Power Query, Power Pivot, and Power BI. Each book contains five of our favourite tips, tricks, and techniques which Ken developed over years of research and real-world experience.

DIYBI eBook Series

We first launched this series in the spring of 2017 with the Excel Edition, and the Power Query edition followed later that summer. You can read some more about why Ken decided to create this series in his initial blog post about it.

The Excelguru Newsletter

The monthly Excelguru email newsletter features the latest updates for Excel and Power BI, as well as upcoming training sessions and events, new products, and other information that might be of interest to the Excel and Power BI community.

Don't Miss Out, Get Your Free Copy of the Series

If you're not already a newsletter subscriber, you can sign up here. We will send you the Excel Edition right away, and the Power Query Edition a few days later. All of our current and new subscribers will receive the Power Pivot edition once it is released on July 3, 2018. Be sure to keep an eye on your inbox for the new book.

We will be continuing to work on the fourth and final book, the Power BI Edition, over the coming months so stay tuned for details!

Trick to Protect Excel Tables

Slobodan emailed me to describe a trick to protect Excel tables that he is using to drive data validation lists.  The data validation lists are sourced from tables loaded via Power Query, and leverage a little hack to hide them from prying users' eyes.  I thought it would be cool if he shared it with everyone, so I asked him to write up a little blog post on it, and here it is!

Take it away Slobodan…

Hello everybody,

Recently, my team and I faced a problem with refreshing PQ tables that we managed to solve with a simple trick (no VBA coding). We shared it with Ken, who asked me to share it with the community. Thank you Ken for this opportunity! I'm glad to make some kind of contribution to all of you PQ users.

Solution Background

We created a calculation model for our sales people (Full cost calculation).  Inside this Excel file, they have lots of dropdown lists from which they can choose a customer, partner, etc. The idea is to make these dropdown lists dynamic.  In other words, whenever a new customer is created in SAP, they should be able to select this customer in Excel using a dropdown list. This is where Power Query comes to the rescue.

We have scheduled a daily export of all our customers from SAP to a file on a network drive, and use this file as the data source for a local PQ table in the workbook. We then use our Power Query table “Customers” as the source for the dropdown lists in the calculation model.

The Challenge

How to make it fully automated? We have two goals here:

  1. We want Power Query to be scheduled for automatic refresh on a daily basis
  2. At the same time, we would like to protect Excel tables sourced via Power Query from careless users

For the first point, we have Power Update - a tool which allows you to schedule a daily refresh.

Note from Ken: I haven't seen Slobodan's model, so there may be a need to use Power Update to do what he's doing.  If you only need your Power Queries to update each time the Excel workbook is opened, however, you could force an update by changing the table's connection properties to force an update upon open.

For the second point, in order to protect the Power Query table, we need to hide these sheets and protect the workbook.  The end result is that our Customers table is hidden and cannot be unhidden, and everything looks promising.

clip_image002

Of course, Excel protects the whole workbook structure using this method, which causes Power Update to fail. In fact, query refreshes also fail if we try to refresh data manually.

clip_image001

So the obvious solution doesn't work.  I spent time Googling for a solution to this but could not find one 🙂

Our Solution

I am not a VBA guy, but I remembered one tip from Mynda Treacy’s dashboard course which I applied here.

Step 1

  • Hide the worksheet and open the Visual Basic Editor (press Alt+F11)

Step 2

  • In the Project Explorer Window (Ctrl + R if it's not showing) select the sheet which contains the Power Query table

clip_image003

Step 3

  • In the Properties Window (press F4 to display this), set the Visible property to "2 - xlSheetVeryHidden"

clip_image004

Step 4

  • Go to Tools --> VBAProject Properties --> Protection
  • Check the box next to "Lock Project for Viewing"
  • Set a password so only you can access it
  • Close the Visual Basic Editor

The Effect

Our sheet containing the Customers table is hidden, and there is no possibility to unhide it.  It doesn't even show up in the menu!

clip_image005

At this point the only way to unhide the worksheet is to go into the Visual Basic Editor, and reset the worksheet's Visible property - but you protected the VBA project with a password so no one can get in there.

The great thing is that refreshing the Power Query tables will work, because you didn’t actually lock the workbook structure.

Caveat

This solution is intended to protect data from regular Excel users, who can easily mess up your workbook.  Do be aware that users with VBA skills will be able to break the password, or extract the hidden sheet contents.

Hopefully someone finds this useful 🙂

Take care!

Ranking Method Choices in Power Query

My recent post on showing the Top X with Ties inspired a discussion on ranking methods.  Where I was looking to rank using what I now know as a standard competition rank, Daniil chose to use a dense ranking method instead.  Oddly, as an accountant, I've never really been exposed to how many different ways there are to rank things - and I'd certainly never heard the terms skip and dense before. (At least not tied to ranking!)

Naturally, after a few emails with Daniil and a bit of a read over at Wikipedia on 5 different common ranking methods, I had to see if I could reproduce them in Power Query.

What are the 5 different ranking methods?

Let's look at a visual example first.   These were all created in Excel using standard formulae:

image

The first thing I had to do was figure out what each ranking method actually does.  So here's a quick summary according to Wikipedia's article on the subject:

  • Ordinal Ranking - This ranking method uses a sequential number for each row of data, without concern for ties
  • Standard Competition Ranking - Also known as a form of Skip ranking, this method gives ties the same rank, but the following value(s) are skipped.  In this case, our values go 1, 2, 3, 4, 4, 6.  (5 is skipped as the 5th item is tied with the 4th)
  • Modified Competition Ranking - This is similar to the Standard Competition ranking method, but the skipped values come before the ties.  In this case, we would get 1, 2, 3, 5, 5, 6.  (As the 4th and 5th items are tied, they both get the lower rank of 5.)
  • Dense Rank - In this ranking method, ties are given the same value, but the next value is not skipped.  In this case we have 1, 2, 3, 4, 4, 5.
  • Fractional Rank - Now this one is just weird to me, and I'd love to know if anyone has actually used this ranking method in the real world.  In this algorithm, ties are ranked as the mean of the tied ordinal rank.  Very strange to me, but it won't stop me from building it!

So now that we know what they all are, let's build them in Power Query so that we can perform them in both Power BI and Excel.

Groundwork for demonstrating the ranking methods

If you download the sample workbook, you'll see that it has the full table shown above.  To make this easy, I set up a staging table called SalesData via the following steps:

  • Select a cell in the Excel table --> Data --> From Table/Range
  • Select the Item and Sales columns --> right click --> Remove Other Columns
  • Load it as a connection only

This gave me a simple table with only the product names and values as shown here:

image

As you can see, the values column has already been sorted in descending order, something that is key to ranking our ties.
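For readers who prefer to see the M behind those clicks, here is a minimal sketch of what the staging query might look like. The inline #table is only a hypothetical stand-in for the Excel table (the extra Ordinal column, "Item A" and the 900 value are placeholders, not the sample workbook's data), and Remove Other Columns is assumed to translate to Table.SelectColumns.

```powerquery
let
    // Hypothetical stand-in for the Excel table; in the workbook this step
    // would come from Data --> From Table/Range instead
    Source = #table(
        type table [Item = text, Sales = number, Ordinal = number],
        {
            {"Item A", 900, 1},
            {"Dark Lager", 557, 2},
            {"Winter Ale", 557, 3}
        }
    ),
    // Remove Other Columns: keep only Item and Sales
    SalesData = Table.SelectColumns(Source, {"Item", "Sales"})
in
    SalesData
```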

One thing I should just mention now is that - for every ranking method - we will actually start every new query by:

  • Referencing the SalesData query
  • Renaming the new query to represent the ranking method being demonstrated

That means that I'm just going to give the steps each time based on the view above, since that's what we should get from the referencing step.

Ranking Method 1: Ordinal Rank

This ranking method is super easy to create:

  • Sort the Sales column in descending order
  • Sort the Item column in ascending order (to rank ties alphabetically)
  • Go to Add Column --> Index Column --> From 1
  • Rename the Index column to Rank
  • Reorder the columns if desired

Yes, that's it.  It simply adds a row number to the way you sorted your data, as shown here:

Ordinal Ranking Method in Power Query
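If you'd rather read the M than click through the steps, the ordinal rank could be sketched roughly as below. To keep it self-contained, the SalesData step is an inline table: only the 557 tie between Dark Lager and Winter Ale comes from the sample, while Items A to C and the other Sales values are made-up placeholders.

```powerquery
let
    // Placeholder data, already in the Item/Sales shape of the SalesData query
    SalesData = #table(
        type table [Item = text, Sales = number],
        {
            {"Item A", 900}, {"Item B", 800}, {"Item C", 700},
            {"Winter Ale", 557}, {"Dark Lager", 557}, {"Member Pale Ale", 500}
        }
    ),
    // Sales descending, then Item ascending so ties are ranked alphabetically
    Sorted = Table.Sort(SalesData, {{"Sales", Order.Descending}, {"Item", Order.Ascending}}),
    // Index from 1, named Rank directly instead of renaming afterwards
    Ranked = Table.AddIndexColumn(Sorted, "Rank", 1, 1)
in
    Ranked
```

With this placeholder data, Dark Lager should land at rank 4 and Winter Ale at rank 5, since the alphabetical sort breaks the tie.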

Ranking Method 2: Standard Competition Rank

This ranking method involves using a little grouping to get the values correct:

  • Sort the Sales column in descending order
  • Add an Index column from 1
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following columns:
      • Rank which uses the Min operation on the Index column
      • Data which uses the All Rows operation
  • Expand the Item column
  • Reorder the columns if desired

The result correctly shows that the Dark Lager and Winter Ale (4th and 5th in the list, but tied at 557) each earn a rank of 4, and the Member Pale Ale (6th in the list) comes in with a rank of 6.  There is no item ranked 5th, since its rank was improved to be in a 4th place tie.

Standard Competition Ranking Method in Power Query
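As a rough sketch, the steps above might come out in M something like this. The inline SalesData table uses placeholder items and values (only the 557 tie is taken from the sample), and the step names are my own rather than what the UI would generate.

```powerquery
let
    // Placeholder data standing in for the SalesData query
    SalesData = #table(
        type table [Item = text, Sales = number],
        {
            {"Item A", 900}, {"Item B", 800}, {"Item C", 700},
            {"Dark Lager", 557}, {"Winter Ale", 557}, {"Member Pale Ale", 500}
        }
    ),
    Sorted = Table.Sort(SalesData, {{"Sales", Order.Descending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 1, 1),
    // Ties share a Sales value, so each group takes the smallest Index as its Rank
    Grouped = Table.Group(
        Indexed,
        {"Sales"},
        {
            {"Rank", each List.Min([Index]), type number},
            {"Data", each _, type table [Item = text, Sales = number, Index = number]}
        }
    ),
    Expanded = Table.ExpandTableColumn(Grouped, "Data", {"Item"}, {"Item"})
in
    Expanded
```

With the placeholder data this should give ranks of 1, 2, 3, 4, 4 and 6.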

Ranking Method 3: Modified Competition Rank

To create ranking following the Modified Competition ranking method, we need to:

  • Sort the Sales column in descending order
  • Add an Index column from 1
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following columns:
      • Rank which uses the Max operation on the Index column
      • Data which uses the All Rows operation
  • Expand the Item column
  • Reorder the columns if desired

The only real difference between this ranking method and the standard competition rank is that we create the Rank column using the Max of the Index column instead of the Min used in the previous method.

The result correctly shows that the Dark Lager and Winter Ale (4th and 5th in the list, but tied at 557) now earn a rank of 5 (not 4 like the standard rank).  There is no item ranked 4th, since its rank was dropped to reflect a 5th place tie.

Modified Competition Ranking Method in Power Query
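In M, the only change from the standard competition sketch would be swapping List.Min for List.Max. Again, this is just an illustration with a placeholder SalesData table (only the 557 tie comes from the sample) and step names of my own choosing.

```powerquery
let
    // Placeholder data standing in for the SalesData query
    SalesData = #table(
        type table [Item = text, Sales = number],
        {
            {"Item A", 900}, {"Item B", 800}, {"Item C", 700},
            {"Dark Lager", 557}, {"Winter Ale", 557}, {"Member Pale Ale", 500}
        }
    ),
    Sorted = Table.Sort(SalesData, {{"Sales", Order.Descending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 1, 1),
    // Max instead of Min: tied rows all take the largest Index in their group
    Grouped = Table.Group(
        Indexed,
        {"Sales"},
        {
            {"Rank", each List.Max([Index]), type number},
            {"Data", each _, type table [Item = text, Sales = number, Index = number]}
        }
    ),
    Expanded = Table.ExpandTableColumn(Grouped, "Data", {"Item"}, {"Item"})
in
    Expanded
```

With the placeholder data this should give ranks of 1, 2, 3, 5, 5 and 6.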

Ranking Method 4: Dense Rank

The dense ranking method requires a change to the order of the steps from what we did in the standard competition ranking method.  Namely the Group By command must come before the addition of the Index column:

  • Sort the Sales column in descending order
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following column:
      • Data which uses the All Rows operation
  • Add an Index column from 1
  • Rename the Index column to Rank
  • Expand the Item column
  • Reorder the columns if desired

This method will yield the results found here:

Dense Ranking Method in Power Query

The result correctly shows that the Dark Lager and Winter Ale (4th and 5th in the list, but tied at 557) are ranked in 4th place - just the same as the Standard Competition rank.  But where it differs can be seen in the ranking of the Member Pale Ale.  6th in the list, it is ranked 5th, as there are no gaps left after the ties.
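Here is how grouping before indexing might look in M. Because no Index column exists at the time of grouping, this sketch only creates the All Rows column in the Group step and then numbers the groups themselves. The SalesData table is placeholder data (only the 557 tie is from the sample) and the step names are my own.

```powerquery
let
    // Placeholder data standing in for the SalesData query
    SalesData = #table(
        type table [Item = text, Sales = number],
        {
            {"Item A", 900}, {"Item B", 800}, {"Item C", 700},
            {"Dark Lager", 557}, {"Winter Ale", 557}, {"Member Pale Ale", 500}
        }
    ),
    Sorted = Table.Sort(SalesData, {{"Sales", Order.Descending}}),
    // One row per distinct Sales value, with the original rows kept in Data
    Grouped = Table.Group(
        Sorted,
        {"Sales"},
        {{"Data", each _, type table [Item = text, Sales = number]}}
    ),
    // Numbering the groups (not the rows) is what leaves no gaps after ties
    Ranked = Table.AddIndexColumn(Grouped, "Rank", 1, 1),
    Expanded = Table.ExpandTableColumn(Ranked, "Data", {"Item"}, {"Item"})
in
    Expanded
```

With the placeholder data this should give ranks of 1, 2, 3, 4, 4 and 5.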

Ranking Method 5: Fractional Rank

As I mentioned at the outset, I find this to be one of the strangest methods of ranking.  Like the others though, it's actually really easy to create when you know how. (And certainly more straightforward than using an Excel formula to calculate it!)

  • Sort the Sales column in descending order
  • Add an Index column from 1
  • Go to Transform --> Group
    • Group by the Sales column
    • Create the following columns:
      • Rank which uses the Average operation on the Index column
      • Data which uses the All Rows operation
  • Expand the Item column
  • Reorder the columns if desired

One thing I will say… it certainly makes it obvious that there are other ties in the table.  Maybe that's the point of it?

Fractional Ranking Method in Power Query
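A possible M version of the fractional rank is below; it is the same pattern as the competition ranks, just with List.Average as the aggregation. The SalesData table is placeholder data (only the 557 tie is from the sample) and the step names are my own.

```powerquery
let
    // Placeholder data standing in for the SalesData query
    SalesData = #table(
        type table [Item = text, Sales = number],
        {
            {"Item A", 900}, {"Item B", 800}, {"Item C", 700},
            {"Dark Lager", 557}, {"Winter Ale", 557}, {"Member Pale Ale", 500}
        }
    ),
    Sorted = Table.Sort(SalesData, {{"Sales", Order.Descending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 1, 1),
    // Tied rows get the mean of the ordinal positions they occupy
    Grouped = Table.Group(
        Indexed,
        {"Sales"},
        {
            {"Rank", each List.Average([Index]), type number},
            {"Data", each _, type table [Item = text, Sales = number, Index = number]}
        }
    ),
    Expanded = Table.ExpandTableColumn(Grouped, "Data", {"Item"}, {"Item"})
in
    Expanded
```

With the placeholder data, the two tied beers occupy positions 4 and 5, so they should both come out at 4.5, with the full set of ranks being 1, 2, 3, 4.5, 4.5 and 6.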

Final Thoughts

I was actually surprised to see how easy it is to change the ranking methods with just some minor modifications to the order of steps and/or the aggregation chosen when applying the grouping method.  It certainly gives us some robust choices!

And while we can certainly create each ranking method using Excel formulas (each is demonstrated in the sample file if you're curious), this is even more awesome.  Now we don't need to load data and land it in the grid.  We can go straight to Power Pivot or Power BI should we need to.

If you'd like to download a file with each of the methods illustrated, just click here.

Return a Specific Day of the Next Month

In a comment on a previous post, a reader asked how you return a specific day of the next month from any given date.  In other words, I've got a date of March 5 and I want to use Power Query to return April 10 in Excel (or Power BI).  How do you do it?

The Excel User's First Guess

So my first thought was to jump straight into the Power Query Formula reference guide to review the date functions.  Surely there must be something in there to manipulate dates and such, right?

Here's a quick list of the functions I knew I'd need:

  • Date.Year()
  • Date.Month()
  • Date.AddMonths()

So those are awesome for ripping dates apart and shifting them, but what I really needed at the end was a way to put things back together.  I needed an equivalent of Excel's =DATE(year,month,day) function.  I couldn't find one.

Return a Specific Day of the Next Month

After poking around with this for a while, it suddenly occurred to me that I was doing this all wrong.  To return a specific day of the next month, I just needed to provide the "literal" #date() and I was good to go.

Let's take a simple table like this:

image

I pulled it into Power Query, went to Add Column --> Custom Column, and added the following formula:

=#date(
Date.Year(Date.AddMonths([Dates],1)),
Date.Month(Date.AddMonths([Dates],1)),
10
)

And at that point it works beautifully:

image

Basically, the #date() literal works just like Excel's DATE() function; you just case it differently and put a # in front of it:

#date(year,month,day)
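As a variation on the formula above (not the approach used in this post), the same result can also be reached without rebuilding the date from its parts: shift to the next month, snap to the first of that month, then add the remaining days. The sketch below assumes a one-column table of dates; the year 2018 and the second sample date are placeholders.

```powerquery
let
    // Placeholder dates; the column is called Dates to match the formula above
    Source = #table(
        type table [Dates = date],
        {{#date(2018, 3, 5)}, {#date(2018, 12, 20)}}
    ),
    // First day of next month plus 9 days = the 10th of next month
    AddTenth = Table.AddColumn(
        Source,
        "Tenth of Next Month",
        each Date.AddDays(Date.StartOfMonth(Date.AddMonths([Dates], 1)), 9),
        type date
    )
in
    AddTenth
```

March 5 becomes April 10, and the December date rolls over to January 10 of the following year. One design difference worth knowing about: if you ask for a day that doesn't exist in the target month (say the 31st of April), #date() should raise an error, whereas the add-days approach would quietly spill over into the following month. Which behaviour you want depends on your data.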

It's a weird one, for sure

Returning a specific day of the next month is one of those odd cases where you have to use one of Power Query's literals to create the date you want, rather than employing a function to convert values as you're used to in Excel.   The good news though?  Miguel does an amazing deep dive into the M coding language in our Power Query Academy, including explaining what literals, tokens, keywords and more are all about.

If you want to understand this in depth, check out our course:

image

PS:  Sign up for our free trial first, to make sure you like our style!  And when you're convinced… you won't find better Power Query training anywhere.

Unlock Excel Conference

Unlock Excel Returns to Australia and New Zealand

We're really excited that Ken is going to be heading back "down under" this April for CPA Australia's Unlock Excel conference. Spending two days in each of four different cities, Unlock Excel features sessions from a renowned group of Microsoft Most Valuable Professionals (MVPs). Based on how well-received the conference was last year, you don't want to miss out on this year's event!

Unlock Excel

The Unlock Excel conference will be coming to Melbourne, Sydney, Brisbane, and Wellington in April 2018.

Who is Unlock Excel For?

Unlock Excel is geared towards people who want to discover fresh and exciting ways to unlock the full potential of their data. While advancing your Excel skills, you will also learn how to become more efficient, explore new tools and applications, and streamline your day-to-day processes.

Who are the MVPs?

Microsoft chooses their MVPs annually based on their high level of public community contributions, but it's more than just quantity of materials that they share for free... they also have to be technical experts in their field.  One of the hallmarks of an MVP is their  continued dedication to discovering the best ways to use Excel and other Microsoft products, and another is their passion for sharing those techniques with the world.

Ken and the other presenters at Unlock Excel are passionate educators with a deep knowledge of Excel. Because of their experience bringing together diverse platforms, products, and solutions, they will share how they tackle real-world problems.

What Will I Learn?

The sessions at Unlock Excel will feature a variety of topics including charting, financial modelling, VBA, and the Power BI suite of tools. By attending, you will pick up valuable time-saving tips and tricks to take your current knowledge to the next level. Additionally, you will learn to manage your own complex data sets, thereby uncovering unique insights. As a result, you will add value to discussions and help drive effective decision making for your business.

When and Where Can I Attend?

This year's Unlock Excel conference will be visiting Melbourne, Sydney, Brisbane, and Wellington.

Take advantage of early bird pricing available only until March 13, 2018. Visit the CPA Australia site for more information and to register.