Using Rich Data Types in Power Query

If you’re on Office 365 and don’t have Excel’s new Rich Data Types, you should know that they’ll be coming to you soon.  Giving us the ability to create both Stocks and Geographies, these are going to add some exciting new capabilities to Excel, particularly if we want to enrich our data.  In this post, we'll quickly explore what Rich Data Types are, what they add, and how they are treated by Power Query.

What is a Rich Data Type?

Have a look at the following data:

Table of locations for experimenting with Rich Data Types

The challenge with this data is that it is completely text based.  What if we wanted to enrich this with more information like population, latitude or longitude?  The answer is to convert it to Excel’s new Rich Data Type.  To do this:

  • Select the data
  • Go to the Data tab -> Data Type -> Geography

This will then convert the text into “Entities” with a little map icon beside them.  And clicking on that little map icon shows some pretty cool new things:

Example of the Geography Data Type

This is the new geography data type. Unlike the original text entry, this object contains all of the properties you see on the card, adding a whole bunch of power to our original data.

NOTE:  The data on this card comes from a variety of sources such as Wikipedia and WeatherTrends360.  Full attribution can be found at the bottom of the card.

Working with a Rich Data Type

One of the very cool things about this new data type is the ability to expand the enriched data from the object.  To do this:

  • Mouse over the top right of the table
  • Click the Add Column dialog
  • Check the box(es) next to the columns you want to add

Adding enriched data to the Location table

Shown below, we’ve extracted Latitude, Longitude, Population and Name.

The Location table with enriched data added

Note:  This button just writes the formulas needed to extract the data from the Rich Data Type. We could have easily written formulas to do this ourselves, such as =@Location].Latitude or =A4.Latitude.

The impacts of this should be pretty clear… even though we started with text, we now have the ability to convert it into a real place and pull further data back from that area!

Rich Data Types and Power Query

The ability to enrich a plain text data source is huge.  One simple example of their impact is that we could add the Lat/Long coordinates to allow proper mapping in Power BI. But how will Power Query read these new Rich Data Types?  Not well as it turns out…

The enriched Locations table has been brought into Power Query but creates an error

Ideally, Power Query would pull in this data and recognize it as a proper record, which would allow you to extract the elements.  And while I’m sure that will happen one day, it won’t be possible when Rich Data Types hit your build of Excel.

The trick to getting at this data today is actually already evident in the image above: create new columns in the original table.  Even though Power Query (in Excel or Power BI) can’t read the Rich Data Type itself, it CAN read the columns you extract via formulas.  It’s a workaround, and one we’d prefer not to have to do, but at least we can get to the enriched data that these new data types give us.

Task Tracking with Power Query

Did you know Power Query can be used as a task tracking tool? This might sound quite unusual but the method described here has been used for solving a real business case. The example I will use is rather simplified but still close to reality, and will demonstrate how to build task tracking with Power Query.

Laying out the Scenario

Vicky is a manager of a small team that is dealing with customer questions on various topics. One of her duties is to distribute various questions among her subordinates. After that, each of them should take some actions and report what is the status of each task.

The problem is – how can each employee can see what tasks are assigned to him/her and fill in the respective information for each task? At the same time, Vicky should at any moment be able to assign a new task and review the statuses of old ones. This is the table Vicky needs:
Task Tracking with Power Query

Unfortunately, she has no other tool at hand except Excel. Luckily, she can set up task tracking with Power Query right in Excel, which could work perfectly in this case.

Setting up Task Tracking with Power Query

So let's start building the solution.

1. Load the left table (in this example, called Filled by Manager) into Power Query.
Manager's table to assign tasks

2. Next, create one query for each employee by filtering the Employee column.
Create individual employee queries

3. Load each Employee table into a separate Excel sheet. (Of course they can be on different files linked to the source table).
Sample employee table showing tasks assigned

4. Then, create a table for each employee to fill in the actions and statuses.
Employee's task tracking worksheet

You can see in the above picture what each employee will have in his/her worksheet - a green table on the left with the tasks assigned to them, and a yellow table on the right where he/she has to fill in the respective information.

Creating the Filled By Employees Table

5. Load all the Employee tables into Power Query.
Load all the employee tables into Power Query

6. Append them in a new query (in this example, called Statuses).
Append all the employee tables into new Statuses query

You are probably guessing what the next step is – load the Statuses query into Excel right next to the Filled By Manager table

However, the result is not what we would expect.
The Filled by Manager table is not matching the newly loaded Fill by Employees table

Note that on first row of the Manager’s table is a task assigned to Ivan on 27.01.2019, but row 1 of the Employee’s table shows the task assigned to Maria on 09.02.2019.

In order to fix this mess, we need one additional query.

Building the Task Code Query

7. Once again, load the Manager’s table into Power Query and remove all columns except for Task Code.
Task Code Column

Task Code is a unique identifier of each task. In this example, it is simply composed of the employee's name and the number of occurrences of this name in the register up to the respective row. In Excel language, the formula is:
Use COUNTIIF to create unique task identifiers

The trick is that we fix the first row of column F (containing the employees' names) but the end row is not fixed.

8. Merge the Register Employees and Statuses queries together.
Merge the Register Employees and Statuses tables

9. Finally, expand the table and voila - it is in the required order. The only thing left is to load it back into the Manager’s table.
Final table for task tracking with Power Query

Now, any time she needs to, Vicky can refresh the Filled by Employees table and see the updated statuses of each task.

Likewise, each one of her subordinates can simply refresh the Manager’s table (the green one that is on left of his/her tab) to see any new tasks that have been assigned.

You could also automate the refresh operation VBA. For more details, refer to Chapter 16 of Ken's M is for (Data) Monkey book.

Final Words

This article presents nothing new and unusual as a Power Query technique. What is new and unusual is the way Power Query has been used for solving a typical business problem. This is just additional proof of how powerful and useful this tool is.

You can find the file with example here: Task tracking with PQ

Become a Data Master with Power Query

Ken is really excited to be teaching his popular Master Your Data with Power Query class for the first time in New York City! Join us this spring for a small group hands-on workshop and learn how to become a Data Master with Power Query.

Learn to become a Data Master with Power Query

Join Ken Puls for a live hands-on session in New York, NY on April 17, 2019.

What's so great about Power Query?

If there is one thing you need to learn in Excel today, Power Query is it. With Power Query, you can clean, reshape, and combine your data with ease. No more tedious cutting and pasting between multiple files. No more manually removing garbage rows or adding new columns. And no more repeating the same time-consuming steps whenever you need to refresh the data.

Instead, once you have your data the way you want it, all you have to do is click refresh and it will be ready to be loaded into the next day's/week's/month's/quarter's report. You can even schedule these refreshes to happen automatically!

What will be covered in the workshop?

The day will begin with a quick overview of Excel tables, PivotTables, and what makes "good" data. Next, Ken will show you how to import data from a wide variety of files, including Excel workbooks, CSV and TXT files, databases, and even entire folders. You'll be able to clean, transform, and refresh your data in Power Query with just a few clicks.

Ken will also show you how to append (or stack) data from multiple tables and 7 ways to merge (or join) tables without any VLOOKUPs. You'll be able to pivot data like this:

Pivoting Data with Power Query

and unpivot data like this:

Unpivoting Data with Power Query

But wait, there's more!

Ken will teach you some more advanced techniques using conditional logic. He will also give an overview of best practices for structuring your queries and query folding. Additionally, you'll receive a copy of the course slides to refer back to. Many of these slides contain handy recipes that will lead you step-by-step through the data transformation techniques.

Not only that, you will be able to directly ask a leading Power Query expert to help you with challenges you are currently facing with your own data. That kind of in-person access is invaluable!

How can Power Query help me?

A data wrangler spends the majority of their time just gathering, cleaning, and preparing the data before it can be even used in a report, chart, or other data model. Instead, become a Data Master with Power Query and get hours of your time back. For example, Ken was able to help a workshop attendee automate a workflow in 30 seconds, saving them 6 hours per week!

Power Query is the data preparation tool of the future, not only for Excel but also for Power BI Desktop, Microsoft Flow, and more. Thus, everything you learn in this course is transferable to other technologies - giving you more bang for your buck.

How do I make the case to my boss for sending me to the course?

As Ken mentioned in a previous blog post, the cost of the course can look like a lot up front. This is especially true if you must also pay for travel, hotel, etc. But divide the $499 USD registration fee + any expenses by your hourly rate. You'll see a return on investment pretty quickly with the time you save using Power Query.

However, the real value of the training comes in when you look at what you can do for your company with that extra time. You can now focus on analyzing the data instead of preparing it. See, Power Query turns your data into information. It allows you to identify new opportunities, make better decisions, and add real value to your organization.

Where do I go to become a Data Master with Power Query?

Ken will be leading this full-day workshop on Wednesday, April 17, 2019 at the NYC Seminar and Conference Center in New York City. But this is a small group session so there are only a limited number of spots available. Register today to secure your chance to receive personal guidance from a world-class Power Query expert. Go to the Excelguru website to view the full course description and register online.

One Solution to Challenge 4

Yesterday, I posted a new Power Query Challenge, and in this post I'm going to show my solution to challenge 4.  You can pick up this solution as well as solutions created by the community in this thread of the Excelguru Forum.  And as a quick note - the very first answer posted there is much slicker than what I've written up here... but hopefully some tricks here will still help you up your Power Query game. 😉

Background on the Solution to Challenge 4

The original issue was to create a header from different rows in the data.  You can read the full reason for this in the original blog post, but basically put, I needed to convert this:

image

To this:

image

On a dynamic basis so that I could easily repoint the data set to this one:

image

And return this:

image

The major wrinkles in creating a Solution to Challenge 4

The biggest issues I had to deal with were these:

  • I couldn't just promote row 1 to headers and rename the first column.  Why?  Because Power Query would have hard coded "admin" in the first set, which would have triggered an error when pointed to Sales
  • I couldn't just rename the annual columns.  Why?  Because the years change, so Column2 is 2015 in one data set and 2016 in the other
  • After promoting the header row, I couldn't declare data types.  Why?  Because the column names get hard coded, meaning that the code would trigger an error when it couldn't find 2015 or 2018 in the data sets.

Fixing these on a static basis is easy, it's wanting it to be dynamic that is the issue.

Creating the Solution to Challenge 4

So how did I accomplish the goal?  I started with the workbook I posted in the forum, then took these steps.

As described in the original post, I edited the MakeMyHeaders query in order to do the work.  I then:

  • Demoted Headers (Home -> Use First Row as Headers -> Use Headers as First Row)
  • I then right clicked the "Changed Type" step in the Applied Steps area and renamed it to "AllData" (with no space)

image

This basically gives me an easy-to-come-back-to point of reference for later.  And since I did not include a space, it's super easy to type.  (If I left a space in there, it would be #"All Data" instead.)

Next, I needed to create my header row which involved:

  • Keeping the top 2 rows only (Home -> Keep Top Rows -> 2)
  • Right clicking and renaming the step "HeaderBase"

image

I then replaced the department name (Admin) with "Name". The trick was to make this dynamic, which involved a couple of steps.

  • Right click Column1 and do a replacement as follows:

image

  • Then, in the formula bar, remove the " characters around both HeaderBase[Column1]{1} and HeaderBase[Column1]{0}

SNAGHTML23bb1ac0

So what was that all about?  It's replacing the value in the 2nd row - row {1} - with the value from the first row - row {0}.  And it's completely dynamic!  (For reference, the reason I reduced this to only 2 rows was that if I tried this when all rows were showing, the departments on the data rows would be lost.)

I then went and removed the Top 1 row, leaving me with this:

image

So far so good.  Now I just needed to add back the original data.  To do that:

  • I went to Home -> Append and appended the MakeMyHeaders query (yes, I appended it to itself)
  • I then modified the formula from:

= Table.Combine({#"Removed Top Rows", #"Removed Top Rows"})

  • To

= Table.Combine({#"Removed Top Rows", AllData})

Which left this:

image

The final cleanup took a few more steps:

  • Promote First Row to Headers
  • Delete the automatically created Changed Type step (so we don't lock down the years in the code)
  • Remove Top Rows -> Top 2 Rows (to get ride of the rows I used to create the Header)

And we're done!

image

Proving that the Solution to Challenge 4 works

Naturally, loading this query to a table will show that it works.  But to really prove it out:

  • Edit the MakeMyHeaders query
  • Select the Source step
  • Change the formula to =Sales
  • Go to Home -> Close & Load

You'll see that it updates nicely with the updated headers

Why the solution to Challenge 4 even matters

If you ever decide to combine Excel worksheets, and want to hold on to the worksheet name as well as the data, you'll need this technique!

Power Query Challenge 4

It's time for Power Query Challenge 4!  This one is a tricky little challenge with creating a header row - but from different rows in the data set.

The real world scenario driving Power Query Challenge 4

Have you ever tried to combine Excel files in a folder, and wanted to preserve the worksheet name along with the contents?  If you have, you'll end up looking at data that follows this kind of pattern:

image

Notice that in step 1 we have the sheet name and a table with the contents.  And step 2 shows what happens when we expand all columns from the table.  So what's the issue?

This is the crux of Power Query Challenge 4… we need a header row that looks like this:

SNAGHTML236b9b5f

Easy right?  Not so fast!

The value in the Name column will change for each file in the folder.  In addition, the data in the columns may also have different names.  So you can't hard code anything here…

Sample data for Power Query Challenge 4

Let's be honest, this isn't simple or it wouldn't be a challenge, but we're going to try and keep it simpler by focussing on just the header row issue. (We're going to skip the whole combine files stuff, and just use some pre-formatted Excel tables that exhibit the problem.)

To build and test your solution I'm providing a file with two different data tables (Admin and Sales) and a query called "MakeMyHeaders" that just refers to the Admin data set right now:

image

The data in the Admin table looks like this:

image

And Sales looks like this:

image

Solving Power Query Challenge 4

To solve this challenge, you should work in the MakeMyHeaders query, and convert the data so that it outputs the data shown here:

image

And, if you did it correctly, you should then be able to edit the MakeMyHeaders query, select the Source step and change the formula to =Sales.  After loading, you should get the output shown here (without any errors):

image

Simple right?  Smile

Posting Your Solution

As mentioned in The Future of Power Query Challenges, we are no longer accepting submissions by email.  And don't post your solution here either.  (Comments are still welcome!)  Instead, we are collecting answers on the Power Query Challenge 4 thread in our forum.  Post your solution there, provide a short description of the approach you took, and have a look at other's submission there.

My solution will come out as a blog post tomorrow.

The Future of Power Query Challenges

Whether by email or in person at conferences, I've received a lot of requests for more Power Query Challenges.  You all seem to love testing your skills, and I really want to keep the Power Query Challenges going.  In order to do so, however, I need to evolve the format a bit to make it sustainable.

As awesome as the Power Query Challenges are, I was a bit unprepared for how much you all would love them, and how much time it would take out of my schedule.  And I'll be honest, that's why you haven't see any more so far - I just haven't had the cycles to do them.  Does that sound odd considering you're creating the solutions?

Why do the Power Query Challenges need to evolve?

The reality is that it takes me a good few hours to come up with the data set, come up with my own solution, then write up and publish both the challenge and solution posts that I do.  So that in its own is a bit of a time eater, but one that I'm prepared to do as sharing knowledge is part of what we do here at Excelguru.

The real killer though, was my original intent to read and review all of the submissions.  That part has already become unmanageable for me. Each of the first three Power Query Challenges has seen an increase in participation, with over 45 solutions provided by you all last time.  If you consider that it takes about 5 minutes just to open and review each submission, that's a good half day of work.  (And that's if they're easy to understand.)  Then try to group and write them up… I'm into this for probably another day.  I just can't put that much time into a non-chargeable service, as I need to earn funds to pay the awesome team that helps me do what I'm doing.

So the question I've been wrestling with is: how can I keep providing these challenges to you and still sustain the time I need to run my business?  So here's my thoughts…

What's the plan for new Power Query Challenges?

I still plan to curate and write up the challenge.  I still plan to write up my solution.  But rather than have you all email us with your solutions, I'm going to set up a post in the Excelguru Power Query forum with the original data.  If you want to play along, you can download the workbook there, and even upload your solution to that thread, with a short comment of how you approached the task - or what you think is special about your solution.

I see a number of benefits here:

  1. You can get your solution out there much more quickly than waiting for us to curate and post them.
  2. Solutions posted by others are available to you immediately upon submission (rather than waiting for use to share a link later).
  3. You can review and comment on the solutions posted by others (give kudos, ask questions, etc.).
  4. With a brief description, you can focus on the ones you feel are the most interesting to you.
  5. As it is a forum, it actually allows you a place to post challenges of your own, too.

Basically, it allows the challenge process to be a lot more "self-service", and isn't that what things are all about today?

I'm hoping that this might allow more interaction between the entire community around these challenges, without me becoming a road block to the process.  Do I anticipate fewer responses?  Yes, I do, only because people will read them before attempting it on their own, or may not share theirs if they see a similar solution.  I'm not sure what we can do about that, or if it's even an issue.

Your Thoughts?

I'm curious as to your thoughts on this approach. Will it work for you?  Do you have concerns about it?  Does it make it less attractive?  Please feel free to leave your thoughts in the comments.

Analyzing text with Power Query and Excel

We all know how good Power Query is for cleaning, shaping and analyzing numerical data. But this powerful tool opens doors for analysts that had been closed for long time, such as for analyzing text. In this article, I will give you some ideas how Power Query can be used for analyzing text that is unstructured. This is a new area for me as well so I’ll be grateful for any comments, thoughts, ideas and shared experience that will help to elaborate further on the methodology.

Our scenario

First, let’s put things in a context with simple but realistic example. Assume you are the newly appointed Marketing Manager of a 4-star SPA hotel. You want to analyze reviews about the hotel in sites like booking.com and tripadvisor.com to understand the major service issues. Thanks to Power Query, you will be able to extract, clean and shape data from those sites to receive a nice table like this:

Comments table

* Table contains ~300 real opinions for several Bulgarian SPA hotels in Hissarya, Velingrad, Sandanski and Bansko. Source: booking.com and tripadvisor.com

But how can you get some insights from this data? Obviously you will spend lots of time to read all reviews. Even if you do that, you need to summarize the information somehow. So let’s play a little within Power Query and see how it can help us analyze text.

Preparing data for analysis

First, we load the table in Power Query (From Table) and clean it of punctuation symbols like commas, semicolons, dashes, brackets, etc., as well as replace capital letters with lower cased one. Why do that last one? Because we want to count the most frequently used words and for Power Query ‘hotel’, ‘hotel,’, ‘Hotel’ are different words.

The cleaning can easily be done with tools like Replace Values, Trim, and Lowercase. I believe they are well-known to average user of Power Query so will I skip detailed operations. Instead, here some lessons learnt from practice:

  • First, duplicate the Opinions column so you can have original text.
  • Then, replace all punctuation symbols with a space.
    • Later we will use a period a separator to split opinions into sentences. And since a sentence can finish with exclamation mark or a question mark it is useful to replace ‘!’ and ‘?’ with ‘.’
  • Since this is a hotel, clients may use the symbol ‘*’ in their opinions (i.e. ‘nice 4* hotel’ or ‘definitely not a 4* hotel’). So don’t replace this symbol.

Above points are nice examples that when analyzing text or other data you need to do it in a context and should explore it first, but that is another story.

Here are the steps applied so far. As a final touch the text has been trimmed and cleaned.

Steps taken to clean data

Separating each comment

Our next step is to split the column using a period ('.') as a delimiter, and doing so at each occurrence of the delimiter.

Split columns by delimiter

Power Query creates a number of new columns, each one containing a sentence. I bet you are guessing what will follow. That’s right – select the original two columns and choose Unpivot Other Columns.

Unpivot other columns

After removing the Attribute column and filtering out rows that are blank, our table has a column for the original comments (Opinion) and a column for each sentence contained in the original comments (Sentence). Please be patient here, and you will understand why this is important.

Table containing full comments and each individual sentence

Let’s now duplicate the Sentence columns and repeat the above actions: split each sentence into its individual words, then unpivot. Below is the table after removing the Attribute column and filtering out blank rows:

Table containing full comments, each individual sentence, and each word

As a next step I will add custom column with a simple formula that calculates the length of each word. We can use it to remove common words that bring no value to our analysis – ‘a’, ‘the’, ‘is’, ‘or’, ‘but’ etc.

Custom function to calculate the length of each word

Finally, filter to exclude words that are 1, 2 or 3 letters long. We will use the resulting table as the basis for analyzing text.

Table filtered for words greater than 3 letters

Finding Lookup words

As a first step in analyzing text, let’s Duplicate the query (you may use Reference as well).

Duplicating the query

In the new Query, simply use the Group By function on the Words column to count the rows.

Grouping by the Word column

Thus, we will have a list with all words (above 3 letters long) and how many times they have been used. For convenience, we can sort this and then load it into Excel.

Word count table

Let’s take a look at the list. Obviously it contains words like ‘with’, ‘this’, ‘have’ that we need to ignore because they could be used in many different situations. On the other hand we see ‘good’ and ‘nice’ in the top 20. Those words are highly subjective – what is good for me might not be good for you. Besides, we don’t know whether the reviewer said ‘good food and nice service’ or ‘the food was not good and service far from nice’. So let’s skip them for now.

Apart from the above, we see that customers mention most often words like ‘room’, ‘pool’, staff’, ‘food’. These are our words of interest (I call them ‘Lookup words’). We may now filter all sentences that contain these words, allowing us to read much less information compared to original list of opinions.

Honestly speaking, if you have enough experience or even some common sense, you can skip this step. I mean it is more than expected that ‘room’, ‘staff’ and ‘pool’ are most often mentioned in reviews for a SPA hotel. However, identifying lookup words might be needed for cases like analyzing text of more open-ended questions.

Filtering Lookup words

So far we have identified our words of interest and listed them in a table:

List of lookup words

So how do we filter all sentences containing these lookup words? We can make it dynamic by creating a custom function. The process is similar to one used to load data from multiple internet pages, as described in M is for (Data) Monkey.

First, we create a query to Filter Rows of the Word column that contain one of our lookup words, ‘staff’ for example.

Filtering for a look up word

Then we go to Home > Advance Editor and change the M code a bit as shown below:

Updating the M code

After we create the function, we rename it as something more user-friendly (I used fxFilter).

Then we load our LookupWords table into Power Query (From Table):

LookupWords table

Next, we Add Custom Column. (Note: in Excel 2016, you have to Invoke Custom Function.)

Adding a custom column

And when we expand the table, we get all rows that contain the words from our lookup list.

Identifying Connected words

We now need to split the Sentence column again to identify the words that were most often used in sentences containing our Lookup words. I call these words ‘Connected words’.

There are some final steps such as eliminating words that appear rarely (i.e. less than 5 times), removing duplicate rows, rearranging columns etc. that I will skip.

Our final table is on the left below:

Final table for analyzing the comments

Analyzing text

As you see, I added two slicers to our table - one for our Lookup words and one for our Connected words. By using them in combination, we can get more insights quite easily. For example, these are the words used most frequently together with ‘staff’:

Slicers used to identify comments regarding staff

Here, I have filtered for the Connected word ‘friendly’, which gives us just 10 rows to read:

Analyzing text for the word 'friendly'

In a separate column, we can manually type “Y” for each positive opinion and “N” for each negative. Now we can calculate that 30% of mentions are actually negative. This is a simple way to quantify and measure our data. And it takes just few minutes to find out that the picture is similar with ‘helpful’ and ‘polite’ – positive feedback is prevailing.

When we filter ‘English’ it is easy to see that 8 people mentioned staff speaks no or poor English. That directly calls for some action - we need to train our staff in English. If you were reading through 300 opinions, you might miss this quite easily.

Analyzing text containing 'staff' and 'English'

It takes just few clicks and couple of minutes or reading to find out that clients are happy with the breakfast:

Analyzing text for comments on breakfast

But are disappointed by the prices and quality of food in the restaurant:

Analyzing text for comments on the restaurant

Of course this is just a starting point and each of the above observations needs further investigation and analysis. But the key takeaway here is that we can successfully use Power Query to summarize unstructured text data. We can then focus our attention on sentences that contain the most commonly used words instead of reading the whole bunch of reviews.

Applying the technique

Many other ideas come up into my mind of how we can use this for further analyzing text data. We can use the Stayed column to study trends in time, we can quantify some key words (as shown with 'friendly') or make a word cloud showing our top 20 words. What are your ideas - how can we use Power Query and Excel to analyse unstructured text?

Power Query Challenge 3 Results

Wow… 46 submissions for Power Query Challenge 3!  Crazy stuff. So many that I haven't even had time to read them all yet!

Due to the overwhelming responses, here's how I'm going to handle this:

  1. Show you how I approached this challenge
  2. Call out a couple of the submissions that I thought were cool and/or interesting
  3. Leave it to you to throw a note in the comments if you think that your (or someone else's) submission should have been listed.

You can find access to all of the submitted solutions (including mine) stored on my OneDrive account.

My solution to Power Query Challenge 3

Step 1 of Challenge 3: Assigning scores to letters

The most important component to Challenge 3 is to have a table that assigns a value to each letter, and there are a couple of ways you could do this.  I elected to do this via the following method:

  • Create a new blank query and name it LetterValue
  • Enter the following formula in the formula bar:
    • = {"A".."Z"}

This creates a nice list of letters from capital A through capital Z:

SNAGHTML99b7af

As great as this is, I still need to add values to the numbers (1 for A, 2 for B, etc..).  It's easy to do with an Index column, but the problem is that the button to do so is not available when you're working with a list.  No problem though, we just convert it to a table first:

  • Go to List Tools --> Transform --> To Table --> OK
  • Go to Add Column --> Index Column --> From 1
  • Rename the columns to Letter and Value
  • Set the Data Types to Text and Value

And that's it.  The query (as shown below) can now be loaded as a connection only for later use:

SNAGHTML9d4b74

Other options for Step 1 of Challenge 3

To be fair, I'm pretty comfortable knocking out a quick list, as I do it all the time for Calendar tables.  But as Bill Syzyz pointed out, you get bonus points for this being 100% UI driven.  Could I have done that?  Of course!  Instead of creating the list, you would:

  • Create an Excel table that holds the letters and values
  • Pull it into Power Query
  • Right click the Letter column --> Transform --> Upper Case
  • Right click the Letter column --> Remove Duplicates
  • Load it as connection only

Is it easier?  Maybe.  It it more robust?  It could be more robust if you also want to score other characters.

Step 2 of Challenge 3: Scoring Words

So now we get into the money part of Challenge 3 - scoring the entire word.  Let's assume that we have a nice little Excel table which stores all the words in a column called "Word" like this:

image

(To be fair, the data could come from a database or anywhere else, the source is really incidental to the problem.)

To score these words we can

  • Pull the data into Power Query
  • Right click the [Word] column --> Duplicate
  • Right click the [Word - Copy] column --> Transform --> UPPERCASE

This leaves us here:

image

Now the trick…(psst… I have a cool pattern card for this… watch this space in the next few days for news!)

  • Right click [Word - Copy] --> Split Column --> By Number of Characters
    • Choose to split by 1 character, repeatedly
    • From the Advanced Options, choose Rows

Your output should now look like this:

SNAGHTMLaa6bfd

Next:

  • Go to Home --> Merge Queries
  • Choose LetterValue and merge [Word - Copy] against [Letter]
  • Expand only the [Value] column from the [LetterValue] column (by clicking the expand icon at the top right of the column)

SNAGHTMLad0db2

The data is out of order now, but it doesn't matter.  It's time to put the finishing touches on Challenge 3…

  • To to Transform --> Group By
  • Configure a basic grouping like this:
    • Group by Word
    • Create a Score column that SUMs the [Value] column

image

And you're done!

image

You can now throw any value into the Excel table, hit refresh, and it will score the words…

image

Well how about that…

And our upcoming Power Query Recipe cards look like they could be way better than hard work too!  (Stay tuned for their release next week!)

Your Challenge 3 Submissions

With 46 submissions for Power Query Challenge 3, it was impossible to go through them all.  I'm hoping that you'll consider being part of the community of reviewers here and check out a few random ones, then post in the comments any that you think I should have mentioned.  I can tell you that in the first four submissions alone there are three different versions of custom functions, and a UI driven approach.

But there are a couple of submissions I looked at that I did want to highlight, as they brought more than just a query to the game.  Smile

Bart Titulaer

Bart's solution includes not only his Power Query work (which he tried more than one way), but he also thought it might be fun to include some frequency distributions with the solution:

image

It's a good reminder that the end goal of Power Query isn't just to clean up data, it's to actually use it.  (Something I probably don't cover enough on this blog!)

Kolyu Minevski

Kolyu decided to compare scoring English vs Bulgarian, and even summed it up for me with a note:

image

Nice to know since I'll be back in Sofia at Bulgaria Excel days on November 1!

Power Query Challenge 3

**Please note that the challenge is now closed, so we are no longer accepting submissions. However, you are still welcome to try it out on your own.

I've got a lot of feedback that you enjoy the Power Query Challenge series we've been running, so it's time for another!  Challenge 3 is just a fun one that was inspired by a conversation I had with Alex J.

Background for Challenge 3

I'm sure you've all seen this before:

If the letters A-Z are worth a value of 1-26 respectively, then:

  • Knowledge = 96%
  • Hardwork = 98%

Of course, the joke is that Attitude is worth 100%.

Your job for Challenge 3

For challenge 3, based on assigning a letter score of 1 for A, 2 for B, 3 for C, etc…:

  • Take a table of words
  • Return the total score using Power Query

So basically… do this:

image

But keep in mind that correctly solving Power Query Challenge 3 requires one very important thing… It needs to work no matter the case of the letters in the original column.

Before you submit your answers to Power Query Challenge 3

After our previous challenges, I got feedback that people really want to see the solutions submitted by others.  I haven't shared them only because I didn't specifically say I was going to, and I can't guarantee that there isn't personally identifiable information in them.  That will change here…

Submitting Your Answer to the Power Query Challenge

**The challenge is now closed, so we are no longer accepting submissions.

To submit your answer:

  • Please name your file using your name - or the name alter-ego if you prefer.  (Keep in mind that your real name could be in the file properties, and it's up to you to clean that out.)  The reason we need a name is so that anyone reading solutions can ask questions about your solution.
  • We have to cap this somewhere, so will allow submissions until the end of Sunday, Sep 16 only.
  • While the challenge is open, you can submit your answer to Rebekah at Excelguru dot ca with the subject Power Query Challenge 3.

After Submissions are closed:

I will post my solution to Power Query Challenge 3, and share a link to a folder of all the submitted solutions.

Please note that last challenge we had over 30 submissions.  While I may post some highlights from the solutions, I won't be doing a full write-up of all them due to the volume we expect to see.

Have fun!

Disaster Recovery in Puerto Rico with Power Query

When Ken was at the Microsoft Business Applications Summit a few weeks ago, he met Mr. J.A. Garcia who has been doing some amazing work with Power Query. We wanted to share his story about how he has been using Power Query in helping with disaster recovery efforts in Puerto Rico:

"[In] my line of work there's been two defining moments that have changed the way we look at our tools. The first one was the Zika outbreak and the second one was Hurricane Maria.

The first time I saw Power Query was [as part of] Power BI during the Zika outbreak [in 2016]. One of our clients needed up-to-date information of the Zika outbreak and its effect on healthcare. With the help of a consultant, we started using Power BI and Power Query.

Aedes aegypti mosquito

An Aedes aegypti mosquito, one of the main transmitters of Zika virus.

I began taking courses during that time, and one of them was about Excel. That's when I learned about Get & Transform in Excel 2016.

Any new job that I received, I tried to use Power Query. I taught myself SQL so I could understand better the process of extracting data and how to integrate it into Power Query.

Our job was changing. We could give the tools to our clients that would let them refresh when they needed it the most. No more waiting [on] our area for a data refresh!

Then Hurricane Maria hit Puerto Rico [in September/October 2017]. It was a harsh two weeks of no communication. As soon as I came back from work, I noticed the change in attitude. As a healthcare company, we began doing Public Health.

Hurricane Maria - Disaster Recovery with Power Query

Hurricane Maria is regarded as being the worst natural disaster on record to affect Dominica and Puerto Rico and the deadliest Atlantic hurricane since Hurricane Stan in 2005.

My main job was identifying members with certain serious conditions. I used Power Query and Excel to create processes that obtain information from the assessment done to keep track of the efforts of the company. The clients could refresh the data and see who was missing, fix any data entry errors and more.

I'm very proud of my work, and Power Query in Excel and Power BI has been a large part of my growth. In the present, we have created a tool that refreshes constantly to help identify members with serious conditions. Now in case of any emergency, we'll know who to attend."

~ J.A. Garcia

We were very inspired how Mr. Garcia began is Power Query journey as part of the disaster recovery efforts after these emergencies, and that he and his team continue to leverage this powerful tool in both Excel and Power BI. Power Query really can help save lives!

Do you have an story to share about your Power Query journey? Maybe it hasn't saved your life literally, but perhaps it has saved you hours of time and effort, a significant amount of money, or even your sanity! Let us know in the comments below or contact us through the Excelguru site.