Power Query Challenge 7 – Phone Words

Yesterday, Nick (one of our forum users) posted a Phone Words challenge in our forum, which looks like a great candidate for Power Query Challenge #7.  I haven't had time to tackle it myself, but if you're up for a Friday challenge, why not give it a go?

1-800-Get-Data converts to 1-800-438-3282

Here's the Challenge (word for word):

Take any Phone # and convert it to all possible Letters aka Phone Words.
Phone # can be input any format (strip out all other characters)

Example:

536-7857 = Ken Puls, etc...

Here's the thread where you can post your solution.

Thanks Nick!

PS, if anyone else has challenges they'd like to post, I'm more than happy to set them up in this series.  Obviously it's been a bit quiet here lately, for reasons that we hope to announce soon!

Solutions for Power Query Challenge 6

This morning I logged in to check the solutions for Power Query Challenge 6 that were submitted and... Wow!  There were a bunch, and some cool variety there.  So now it's time to show you all what I came up with here.

What was Power Query Challenge 6?

The full description and source data can be found in yesterday's post, but in short it was to convert this:

Data table with nested data sets

Data table with multiple data points per cell

To this:

Data in 1NF

Data shown in First Normal Form (1NF)

So how do we do it?

Two Solutions for Power Query Challenge 6

Wait, two solutions?  Why?

As it turns out, I put together two for this one. My first attempt was cooked up to solve the issue on short notice.  Then I built another that seems a bit more elegant.  So I'll show them both, albeit quickly.

Solution 1 - Splitting Individual Columns and Merging Back Together

The first of my solutions for Power Query Challenge 6 is actually quite similar to what Ali posted in the solution thread.  It basically follows this method:

  • Step 1:
    • Create a Data query that connects to the source data, and load it as connection only
  • Step 2:
    • Create 3 new queries for ItemID, Quantity and Price which
      • Reference the data query
      • Keep the InvoiceID column and the other relevant column
      • Split the relevant column by delimiter, ensuring it splits to rows (not columns as it defaults to)
      • Add an Index column
  • Step 3:
    • Reference one of the Step 2 tables, and merge the other two tables to it, based on matching the Index column in each

So when complete the query chain looks like this:

And returns the table we're after:

The only real trick to this one is that - when you are building the Price query - Power Query will pick the decimal point as the delimiter for the Price column, so you have to force it to use a line feed instead.  So building the Price query would go through the following steps:

  • Right click the Data query --> Reference
  • Select the InvoiceID and Price columns --> Right click --> Remove Other Columns
  • Right click the Price column --> Split column --> By Delimiter
    • Clear the decimal from the Custom area
    • Click the arrow to open the Advanced area
    • Change the selection to split to Rows
    • Check "Split using special characters"
    • Choose to insert a Line Feed character
    • Click OK
  • Set the Data Types
  • Go to Add Column --> Add Index Columns

Resulting in this:

The ItemID and Quantity queries follow the same steps, except that Power Query now correctly identifies the Line Feed as the character to split on.
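For reference, the split-to-rows step the UI generates for the Price query looks roughly like this in M (a sketch - the step and column names here are illustrative, and your generated code may differ slightly):

= Table.ExpandListColumn(
      // split each Price value into a list on the line feed character ("#(lf)" in M)
      Table.TransformColumns(
          #"Removed Other Columns",
          {{"Price", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv)}}),
      // then expand each list into one row per value
      "Price")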

Solution 2 - Group and Split

While the first solution to Power Query Challenge 6 worked, it left me less than satisfied as it took a bunch of queries.  While effective, it didn't feel elegant.  So I cooked up another solution that uses Grouping.  It's actually quite similar to the first solution that Bill Szysz posted.

The basic method is as follows:

  • Connect to the data
  • Right click the InvoiceID column --> UnPivot Other Columns
  • Right click the Value column --> Split Column --> By Delimiter --> OK

Following the steps above gets us to this state:

To unwind this, we group it:

  • Go to Transform --> Group By
    • Group By InvoiceID, Attribute
    • Aggregate a "Data" column using the All Rows operation

Grouping using the All Rows feature

At this point, we need to number these rows, so I just busted out the pattern to do that from our Power Query Recipe cards (recipe 50.125).

  • Go to Add Column --> Custom
    • Column Name:  Custom
    • Formula:  =Table.AddIndexColumn( [Data], "Row", 1, 1)
  • Right click the Custom column --> Remove Other Columns
  • Expand all fields from the Custom column

Leaving us with this data:

Data Grouped with Numbered Rows
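If you prefer to see it in M, the grouping and numbering steps above come out roughly like this (a fragment of the query's Applied Steps; step names are illustrative):

// group, number each group's rows, then re-expand
#"Grouped Rows" = Table.Group(#"Split Column by Delimiter", {"InvoiceID", "Attribute"}, {{"Data", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Data], "Row", 1, 1)),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom", {"Custom"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"InvoiceID", "Attribute", "Value", "Row"})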

The next step is to Pivot it:

  • Select the Attribute column --> Transform --> Pivot Column
    • Choose Value for the Values
    • Expand the Advanced arrow
    • Change the Aggregation to "Don't Aggregate"
    • Click OK
  • Select the "Row" column and Remove it.  (Yes, it was needed to unpivot this correctly, but now adds no value.)
  • Set the data types
  • Load it to the desired destination
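Behind the scenes, choosing "Don't Aggregate" simply drops the aggregation function from Table.Pivot, so the generated step looks roughly like this (step names are illustrative):

= Table.Pivot(#"Expanded Custom", List.Distinct(#"Expanded Custom"[Attribute]), "Attribute", "Value")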

At this point, the query (again) looks perfect:

The desired output

Now, I must admit, this felt far more professional and left me feeling good about it.

Which Solution to Power Query Challenge 6 is Better?

Naturally, solution 2 is better.  It takes fewer queries, and looks way cooler.  Right?  Not so fast...

The real question is in the performance.  And for this one I thought I'd test it.  But I needed more data.  I expanded the set to 11,000 rows and then used a tool we're working on to time the refreshes.  Privacy was left on, and all times shown are in seconds:

  • Solution 1:  1.43, 1.48, 1.11, 1.27  Avg ~1.32 seconds
  • Solution 2:  2.77, 2.65, 2.63, 2.68  Avg ~2.68 seconds

I'll be honest, this surprised me.  So I went back and added the GroupKind.Local parameter into the Grouped Rows step, like this (as that often speeds things up):

Table.Group(#"Changed Type1", {"InvoiceID", "Attribute"}, {{"Data", each _, type table [InvoiceID=number, Attribute=text, Value=number]}}, GroupKind.Local)

The revised timing for Solution 2 now gave me this:

  • Solution 2A:  2.54, 2.49, 2.56, 2.61.  Avg ~2.55 seconds

So while the local grouping did have a small impact, the message became pretty clear here.  Splitting this into smaller chunks was actually way more efficient than building a more elegant "all in one" solution!

My solution (including 5,000 rows of the data) can be found in the solution thread here.

Power Query Challenge 6

Are you ready for Power Query Challenge 6?  In this challenge we'll look at splitting nested data sets into a nice table in First Normal Form.

Let's take a look at Power Query Challenge 6:

Where did the challenge come from?

The inspiration for this challenge came from last week's Self Service BI Boot Camp that we hosted in Vancouver, BC (you should have been there, it was awesome).  At the outset, we talked about how to identify if your data is in First Normal Form (the format that is ideal for a PivotTable), and I showed this data set:

Data table with nested data sets

Data table with multiple data points per cell

Notice how the invoice has multiple ItemID, multiple Quantity and multiple Price fields per cell?  That's not good at all.

What we Really Need - Data in First Normal Form

As I explained in the Boot Camp, it is essential that the data source be in First Normal Form in order for a PivotTable to consume it.  What does that mean?  It means that it needs to look like this:

Data in 1NF

Data shown in First Normal Form (1NF)

Notice that in this case the data is atomic - it only has one data point per cell.  In addition, there is now one complete record per row.  The InvoiceID shows on every row now, and each data point has been correctly split up and applied to them.

So what's the thrust of the Power Query Challenge 6?

Well, as it turns out this is a bit tricky.  There are a few issues at play here:

  • The data is separated by line feeds within the cells
  • There are a different number of line feeds in each row
  • At least the number of line feeds is consistent for each cell in the entire row though!

So the challenge is this: break the table apart so that the data is in First Normal Form like the second image.

You can download the data source file (and post your solutions) in our forum here.  I'll give you my version tomorrow.

One Solution to Power Query Challenge 5

Hopefully you’ve had a chance to try and build a solution to Power Query Challenge 5.  In this post, I’ll show you how I approached it.

Challenge Goals

The challenge in this case was to allocate a fixed amount for a series of account IDs based on the data in this list:

Can you make a series from one to the other?

Creating this…

Desired Power Query Output

Here is the data we need to create

And as a reminder, the rules were:

  • The first 7 digits won’t change for the series
  • If there is no “To” value, then we only need the single value
  • If the final character is text, it will never exceed Z. (i.e. we’ll never go from 10A to 11C)

And you can click here to get the original data file (as well as see solutions posted by myself and others).

I would also like to quickly restate that another of my main goals was to do as much using the UI as possible, and make certain that the students in my workshop could maintain this solution.  During the three day workshop (similar to my upcoming self service BI bootcamp), we did a full day of Power Query work, but mainly around reshaping data through the user interface.  The deepest we got into M code was creating a list using a structure like {1..10}.  So moving to crazy conditional logic and List.Generate solutions was out of the scope of where I wanted to take this for them. 😉

One solution to Power Query Challenge 5

I broke my solution down into four distinct queries as follows:

Listing of all queries used

Let’s look at each of those steps.

The Base Query

The point of this query was basically to carve up the “From” and “To” portions into the components that I would need in order to get my solution.  With the preface remaining the same for every ID in the list, it was all about separating the last 3 digits to go from this table:

Can you make a series from one to the other?

To this one:

Adding From and To columns

The steps I used were as follows:

  • Select the [From] column --> Add Column --> Extract --> First Characters --> 7
  • Name the new column “Preface”
  • Select the [From] column --> Add Column --> Extract --> Last Characters --> 3
  • Name the new column “From3”
  • Add a new conditional column called “To3” with the following formula:
    • =if [To] = null then [From3] else Text.End([To],3)
  • Remove the [From] and [To] columns
  • Set the data types for each column (notice [From3] and [To3] are text, not numbers)
  • Load the query as a Connection Only query
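Although I built all of this through the UI, the resulting M for the Base query boils down to something like this sketch (column names as described above; step names are illustrative):

// carve the From/To IDs into Preface, From3 and To3
#"Added Preface" = Table.AddColumn(Source, "Preface", each Text.Start([From], 7), type text),
#"Added From3" = Table.AddColumn(#"Added Preface", "From3", each Text.End([From], 3), type text),
#"Added To3" = Table.AddColumn(#"Added From3", "To3", each if [To] = null then [From3] else Text.End([To], 3), type text),
#"Removed Columns" = Table.RemoveColumns(#"Added To3", {"From", "To"})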

The Numeric Query

Next up, I needed to generate the numeric series.  This was easy, as it works similarly to the Power Query Calendar table recipe.  (Recipe card 60.105.x, which you can find in our Power Query Recipe Cards)

The steps I used to do this were:

  • Referenced the Base query
  • Changed the type of the [From3] column to Whole Number
  • Filtered the [From3] column to remove errors
  • Added a new Custom Column with the following formula:
    • ={ [From3]..[To3] }
  • Removed the [From3] and [To3] columns
  • Expanded the new [Custom] column
  • Merged the [Preface] and [Custom] columns into a new “ID” column
  • Loaded this query as Connection only

This left me with the query output shown here:

Creating the Numeric Series

And the cool thing is that, using this technique, the final series – which only had a “From” ID and no “To” ID – just creates the single item.
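In M terms, the core of the Numeric query is just a list expansion along these lines (a sketch; it assumes both [From3] and [To3] hold whole numbers by the time the list is built, and the step names are illustrative):

// expand {From3..To3} into one row per number
#"Added Custom" = Table.AddColumn(#"Removed Errors", "Custom", each { [From3]..[To3] }),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom")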

The Alpha Query

Next up was the alpha query.  Once again, I referenced the Base query, meaning that I started from here:

Adding From and To columns

The steps I followed here were:

  • Duplicated the [From3] column
  • Changed the data type on the [From3 – Copy] column to Whole Number
  • Filtered the [From3 – Copy] column to KEEP errors
  • Removed the [From3 – Copy] column
  • Split the [From3] column by Number of Characters, splitting at the left-most 2 characters
  • Split the [To3] column by Number of Characters, splitting at the left-most 2 characters
  • Added a new Custom Column with the following formula
    • = { [From3.2]..[To3.2] }

So at this point, I’m looking at this:

Creating a text based list

Continuing on, I…

  • Removed columns [From3.2], [To3.1] and [To3.2]
  • Expanded the list from the [Custom] column
  • Merged the [Preface] , [From3.1] and [Custom] columns into a new “ID” column
  • Loaded this query as Connection only

And these steps left me with a nice list of the alphanumeric series as follows:

Creating the Alphanumeric Series
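The heart of the Alpha query is the same list trick, except here M builds the list between two single-character text values (again a sketch, with illustrative step names):

// {"A".."C"}-style list between two letters
#"Added Custom" = Table.AddColumn(#"Split Column by Position", "Custom", each { [From3.2]..[To3.2] }),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom")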

The IDs Query

At this point, things became super simple.  There’s no real magic to this query, as the data has already been prepared and normalized in the Numeric and Alpha queries.  So all that needs to be done here is:

  • Reference the Numeric query
  • Append the Alpha query
  • Set the data types (just to be sure)
  • Load the query to the intended destination

And that’s it.

That’s One Solution to Power Query Challenge 5

Of course, there are countless other ways to solve this.  My goal in this case was specifically to avoid a custom function, as I didn’t want to obfuscate the code for a new Power Query user.  Could I have done this all in one query with use of more complicated Custom Columns?  Sure.  But again, would it be maintainable by people newly introduced to Power Query?  Hopefully intellisense in the Custom Column Dialog will change my answer, but right now I’d say that’s unlikely.

One of the things I love about Power Query is the ability to break this down into separate queries, then put them back together.  The reality is that this series of data was mixed, needing slightly different treatment in each case.  This setup allowed me to clearly focus on the goal, getting it done without noise from the other type of data in the set, then just append them afterwards.  I call that a win as it’s easily auditable by someone closer to the beginner side of the Power Query learning curve.

Power Query Challenge 5

It’s been a while since our last challenge, so hopefully you’re ready for Power Query Challenge 5!  This is one that came up as I was teaching one of our 3 day Excel BI Bootcamp courses for a client, (similar to this public course!) which actually made it doubly tricky.  Why?  Because I needed the person to be able to both understand how I approached the task, as well as be able to maintain it after I left.  I didn’t want to just kick up some magic voodoo and walk away, as that isn’t really fair.

The issue for Power Query Challenge 5

The challenge in this case was to allocate a fixed amount for a series of account IDs.  They provided a starting ID, an ending ID, and a value.  Sounds easy so far right?  Here’s the problem… the key part of some of those IDs was purely numeric, but the key part of others was alphanumeric!  An example of the source data:

Can you make a series from one to the other?

And the required output:

Desired Power Query Output

Here is the data we need to create

The rules – as they were explained to me – were as follows:

  • The first 7 digits won’t change for the series
  • If there is no “To” value, then we only need the single value
  • If the final character is text, it will never exceed Z. (i.e. we’ll never go from 10A to 11C)

How would you solve Power Query Challenge 5?

My first thought was that I’d have to reach to a custom function to do this, but again, I didn’t feel that was fair to my audience with only a day of Power Query training under their belt.  (With Dimensional Modeling and DAX to cover in our workshop, we don’t have time to take people deep into custom functions.)  After playing with it for a bit though, I was able to work past that problem and come up with a UI driven solution which works.  (Having said that, how you decide to solve this is really up to you!)

Now last time, I posted my solution the same day, along with my steps.  The result was that we had less engagement, possibly because that gave too much away.  So this time, I’ve created the thread for you to post your solutions in the forum, but I’m going to hold off on posting my approach until tomorrow.

Good luck!

Creating a Fiscal Saturday Calendar

A recent question from one of our Power Query Academy registrants was about creating a fiscal Saturday calendar - or a calendar where periods end on the last Saturday of each month.  I cooked up a sample of one way to approach this task, but I'm curious if anyone out there has something better.

Basic Goal of the Fiscal Saturday Calendar

The basic goal here is to create a full calendar table with three columns:

  1. A primary "Date" column which holds a single value for every day of the period
  2. A "Fiscal ME" column that holds the fiscal month end based on the final Saturday of the month
  3. A "Fiscal YE" column that holds the fiscal year end based on the last Saturday of the year

The trick here is that, unlike a 4-4-5 calendar which has a consistent repeating pattern, the number of weeks per month could shift unpredictably - especially when you take leap years into account.

Starting with the basic Calendar framework

I started as I usually do with a Calendar table, by following my standard calendar pattern from our Power Query Recipe Card set.  I busted out pattern 60.100 to build both the calendar StartDate and EndDate, leaving me with two queries holding the appropriate values:

And from there, using recipe card 60.105, I expanded this into a full blown calendar with every day from StartDate to EndDate:

I named this table Calendar-Base and loaded it as a Staging (or Connection only) Query (covered in recipe card 0.110).

The only real thing to note here is that my StartDate is the first day of the first fiscal year (a Sunday), and the EndDate is the end of the next year (a Saturday that is 768 days later).
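If you don't have the recipe cards handy, a minimal sketch of one way to build the Calendar-Base query is shown below (this assumes StartDate and EndDate are queries that each return a date value; it's one common pattern, not necessarily the exact recipe):

let
    // serial numbers from StartDate to EndDate, converted back to dates
    Source = { Number.From(StartDate) .. Number.From(EndDate) },
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Date"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table", {{"Date", type date}})
in
    #"Changed Type"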

Creating Fiscal Month and Year ends for the Fiscal Saturday Calendar

The next step was to create a table that generated the fiscal month ends and fiscal year ends for the calendar.  So I created a new query that referenced the Calendar-Base query.

To ensure it was a fiscal Saturday calendar, the dates needed to be based on the last Saturday of the month and last Saturday of the year.  And in order to work these out, I needed two columns:  DayID and DayOfWeek.  To create these:

  • DayID:  I added an Index Column starting from 1. This generates a unique DayID for every row of the table
  • DayOfWeek:  I selected the Date column --> Add Column --> Date --> Day of Week

With these in place, I was now set to create the Month End and Year End columns as follows:

  • Add Column --> Custom Column
    • Name:  Fiscal ME
    • Formula:

=if Duration.Days( Duration.From(Date.EndOfMonth([Date]) - [Date])) <7 and [Day of Week] = 6 then [Date] else null

  • Add Column --> Custom Column
    • Name:  Fiscal YE
    • Formula:

=if Number.Mod([DayID],364)=0 then [Date] else null

These formulas flagged the fiscal Saturday calendar periods as shown here:

The final steps to this stage were then to:

  • Filter all the nulls out of the Fiscal ME column
  • Remove all but the Fiscal ME and Fiscal YE columns
  • Fill the Fiscal YE column up
  • Set the data types on both columns to Date
  • Name the table "Calendar-FiscalPeriods"
  • Load as a Staging query (recipe card 0.110 again)

At the end of the process, the calendar clearly shows our fiscal Saturday calendar period ends:

Finishing the Fiscal Saturday Calendar

The final step is now to put these together. The way I approached this was:

  • Create a new query that references the Calendar-Base table
  • Merge in the Calendar-FiscalPeriods query to get an Exact Match between the Date and Fiscal ME columns (recipe card 30.105)
  • Expand the Fiscal ME and Fiscal YE columns
  • Fill the Fiscal ME and Fiscal YE columns up
  • Name the query Calendar
  • Load it to the desired destination
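As a rough sketch, the M behind that merge-and-fill looks something like this (query, column and step names are illustrative):

// left join the period ends onto every date, then fill upwards
Source = #"Calendar-Base",
#"Merged Queries" = Table.NestedJoin(Source, {"Date"}, #"Calendar-FiscalPeriods", {"Fiscal ME"}, "FiscalPeriods", JoinKind.LeftOuter),
#"Expanded FiscalPeriods" = Table.ExpandTableColumn(#"Merged Queries", "FiscalPeriods", {"Fiscal ME", "Fiscal YE"}),
#"Filled Up" = Table.FillUp(#"Expanded FiscalPeriods", {"Fiscal ME", "Fiscal YE"})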

Now, to be fair, the calendar only looks like this at this point:

I could certainly add other components.  For a Fiscal Year column, I'd just select the Fiscal YE column and add a Year column based on it.  For months, I'd add a Month column based on the Fiscal ME column.  And many other patterns can be applied based on the standard date transforms.

And one caveat... the dates fed in must start on the first day of a fiscal year, and end on the last day of a fiscal year, to ensure it works correctly.

My sample file can be found here.

Do you have an easier way?

So here comes the thrust of this... I have easy patterns for standard 12-month calendars, 4-4-5 and its variants, and even 13 weeks per year.  But this one, with its shifting weeks per month, threw me off a bit.  I'm curious if anyone has an easier way to generate this which wouldn't rely on splitting this out into separate tables.

One Solution to Challenge 4

Yesterday, I posted a new Power Query Challenge, and in this post I'm going to show my solution to challenge 4.  You can pick up this solution as well as solutions created by the community in this thread of the Excelguru Forum.  And as a quick note - the very first answer posted there is much slicker than what I've written up here... but hopefully some tricks here will still help you up your Power Query game. 😉

Background on the Solution to Challenge 4

The original issue was to create a header from different rows in the data.  You can read the full reason for this in the original blog post, but basically put, I needed to convert this:

image

To this:

image

On a dynamic basis so that I could easily repoint the data set to this one:

image

And return this:

image

The major wrinkles in creating a Solution to Challenge 4

The biggest issues I had to deal with were these:

  • I couldn't just promote row 1 to headers and rename the first column.  Why?  Because Power Query would have hard coded "admin" in the first set, which would have triggered an error when pointed to Sales
  • I couldn't just rename the annual columns.  Why?  Because the years change, so Column2 is 2015 in one data set and 2016 in the other
  • After promoting the header row, I couldn't declare data types.  Why?  Because the column names get hard coded, meaning that the code would trigger an error when it couldn't find 2015 or 2018 in the data sets.

Fixing these on a static basis is easy; it's making it dynamic that is the issue.

Creating the Solution to Challenge 4

So how did I accomplish the goal?  I started with the workbook I posted in the forum, then took these steps.

As described in the original post, I edited the MakeMyHeaders query in order to do the work.  I then:

  • Demoted Headers (Home -> Use First Row as Headers -> Use Headers as First Row)
  • I then right clicked the "Changed Type" step in the Applied Steps area and renamed it to "AllData" (with no space)

image

This basically gives me an easy-to-come-back-to point of reference for later.  And since I did not include a space, it's super easy to type.  (If I left a space in there, it would be #"All Data" instead.)

Next, I needed to create my header row which involved:

  • Keeping the top 2 rows only (Home -> Keep Top Rows -> 2)
  • Right clicking and renaming the step "HeaderBase"

image

I then replaced the department name (Admin) with "Name". The trick was to make this dynamic, which involved a couple of steps.

  • Right click Column1 and do a replacement as follows:

image

  • Then, in the formula bar, remove the " characters around both HeaderBase[Column1]{1} and HeaderBase[Column1]{0}

image

So what was that all about?  It's replacing the value in the 2nd row - row {1} - with the value from the first row - row {0}.  And it's completely dynamic!  (For reference, the reason I reduced this to only 2 rows was that if I tried this when all rows were showing, the departments on the data rows would be lost.)
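In other words, once the quotes are removed, the replacement step ends up as M along these lines (a sketch; the replacer function and step names may differ slightly in your version):

= Table.ReplaceValue(HeaderBase, HeaderBase[Column1]{1}, HeaderBase[Column1]{0}, Replacer.ReplaceText, {"Column1"})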

I then went and removed the Top 1 row, leaving me with this:

image

So far so good.  Now I just needed to add back the original data.  To do that:

  • I went to Home -> Append and appended the MakeMyHeaders query (yes, I appended it to itself)
  • I then modified the formula from:

= Table.Combine({#"Removed Top Rows", #"Removed Top Rows"})

  • To

= Table.Combine({#"Removed Top Rows", AllData})

Which left this:

image

The final cleanup took a few more steps:

  • Promote First Row to Headers
  • Delete the automatically created Changed Type step (so we don't lock down the years in the code)
  • Remove Top Rows -> Top 2 Rows (to get rid of the rows I used to create the Header)

And we're done!

image

Proving that the Solution to Challenge 4 works

Naturally, loading this query to a table will show that it works.  But to really prove it out:

  • Edit the MakeMyHeaders query
  • Select the Source step
  • Change the formula to =Sales
  • Go to Home -> Close & Load

You'll see that it refreshes nicely with the updated headers.

Why the solution to Challenge 4 even matters

If you ever decide to combine Excel worksheets, and want to hold on to the worksheet name as well as the data, you'll need this technique!

Power Query Challenge 4

It's time for Power Query Challenge 4!  This one is a tricky little challenge with creating a header row - but from different rows in the data set.

The real world scenario driving Power Query Challenge 4

Have you ever tried to combine Excel files in a folder, and wanted to preserve the worksheet name along with the contents?  If you have, you'll end up looking at data that follows this kind of pattern:

image

Notice that in step 1 we have the sheet name and a table with the contents.  And step 2 shows what happens when we expand all columns from the table.  So what's the issue?

This is the crux of Power Query Challenge 4… we need a header row that looks like this:

image

Easy right?  Not so fast!

The value in the Name column will change for each file in the folder.  In addition, the data in the columns may also have different names.  So you can't hard code anything here…

Sample data for Power Query Challenge 4

Let's be honest, this isn't simple or it wouldn't be a challenge, but we're going to try and keep it simpler by focussing on just the header row issue. (We're going to skip the whole combine files stuff, and just use some pre-formatted Excel tables that exhibit the problem.)

To build and test your solution I'm providing a file with two different data tables (Admin and Sales) and a query called "MakeMyHeaders" that just refers to the Admin data set right now:

image

The data in the Admin table looks like this:

image

And Sales looks like this:

image

Solving Power Query Challenge 4

To solve this challenge, you should work in the MakeMyHeaders query, and transform the data so that it outputs the result shown here:

image

And, if you did it correctly, you should then be able to edit the MakeMyHeaders query, select the Source step and change the formula to =Sales.  After loading, you should get the output shown here (without any errors):

image

Simple right?  🙂

Posting Your Solution

As mentioned in The Future of Power Query Challenges, we are no longer accepting submissions by email.  And don't post your solution here either.  (Comments are still welcome!)  Instead, we are collecting answers on the Power Query Challenge 4 thread in our forum.  Post your solution there, provide a short description of the approach you took, and have a look at others' submissions there.

My solution will come out as a blog post tomorrow.

The Future of Power Query Challenges

Whether by email or in person at conferences, I've received a lot of requests for more Power Query Challenges.  You all seem to love testing your skills, and I really want to keep the Power Query Challenges going.  In order to do so, however, I need to evolve the format a bit to make it sustainable.

As awesome as the Power Query Challenges are, I was a bit unprepared for how much you all would love them, and how much time it would take out of my schedule.  And I'll be honest, that's why you haven't seen any more so far - I just haven't had the cycles to do them.  Does that sound odd considering you're creating the solutions?

Why do the Power Query Challenges need to evolve?

The reality is that it takes me a good few hours to come up with the data set, come up with my own solution, then write up and publish both the challenge and solution posts that I do.  So that on its own is a bit of a time eater, but one that I'm prepared to do as sharing knowledge is part of what we do here at Excelguru.

The real killer though, was my original intent to read and review all of the submissions.  That part has already become unmanageable for me. Each of the first three Power Query Challenges has seen an increase in participation, with over 45 solutions provided by you all last time.  If you consider that it takes about 5 minutes just to open and review each submission, that's a good half day of work.  (And that's if they're easy to understand.)  Then try to group and write them up… I'm into this for probably another day.  I just can't put that much time into a non-chargeable service, as I need to earn funds to pay the awesome team that helps me do what I'm doing.

So the question I've been wrestling with is: how can I keep providing these challenges to you and still sustain the time I need to run my business?  So here are my thoughts…

What's the plan for new Power Query Challenges?

I still plan to curate and write up the challenge.  I still plan to write up my solution.  But rather than have you all email us with your solutions, I'm going to set up a post in the Excelguru Power Query forum with the original data.  If you want to play along, you can download the workbook there, and even upload your solution to that thread, with a short comment of how you approached the task - or what you think is special about your solution.

I see a number of benefits here:

  1. You can get your solution out there much more quickly than waiting for us to curate and post them.
  2. Solutions posted by others are available to you immediately upon submission (rather than waiting for us to share a link later).
  3. You can review and comment on the solutions posted by others (give kudos, ask questions, etc.).
  4. With a brief description, you can focus on the ones you feel are the most interesting to you.
  5. As it is a forum, it actually allows you a place to post challenges of your own, too.

Basically, it allows the challenge process to be a lot more "self-service", and isn't that what things are all about today?

I'm hoping that this might allow more interaction between the entire community around these challenges, without me becoming a road block to the process.  Do I anticipate fewer responses?  Yes, I do, only because people will read them before attempting it on their own, or may not share theirs if they see a similar solution.  I'm not sure what we can do about that, or if it's even an issue.

Your Thoughts?

I'm curious as to your thoughts on this approach. Will it work for you?  Do you have concerns about it?  Does it make it less attractive?  Please feel free to leave your thoughts in the comments.

Power Query Challenge 3 Results

Wow… 46 submissions for Power Query Challenge 3!  Crazy stuff. So many that I haven't even had time to read them all yet!

Due to the overwhelming responses, here's how I'm going to handle this:

  1. Show you how I approached this challenge
  2. Call out a couple of the submissions that I thought were cool and/or interesting
  3. Leave it to you to throw a note in the comments if you think that your (or someone else's) submission should have been listed.

You can find access to all of the submitted solutions (including mine) stored on my OneDrive account.

My solution to Power Query Challenge 3

Step 1 of Challenge 3: Assigning scores to letters

The most important component to Challenge 3 is to have a table that assigns a value to each letter, and there are a couple of ways you could do this.  I elected to do this via the following method:

  • Create a new blank query and name it LetterValue
  • Enter the following formula in the formula bar:
    • = {"A".."Z"}

This creates a nice list of letters from capital A through capital Z:

image

As great as this is, I still need to assign values to the letters (1 for A, 2 for B, etc.).  It's easy to do with an Index column, but the problem is that the button to do so is not available when you're working with a list.  No problem though, we just convert it to a table first:

  • Go to List Tools --> Transform --> To Table --> OK
  • Go to Add Column --> Index Column --> From 1
  • Rename the columns to Letter and Value
  • Set the Data Types to Text and Whole Number

And that's it.  The query (as shown below) can now be loaded as a connection only for later use:

image
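Put together, the whole LetterValue query is only a few lines of M, something like this sketch (the column is named directly here rather than renamed afterwards, so the steps won't match the UI exactly):

let
    // A = 1, B = 2, ... Z = 26
    Source = {"A".."Z"},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Letter"}),
    #"Added Index" = Table.AddIndexColumn(#"Converted to Table", "Value", 1, 1),
    #"Changed Type" = Table.TransformColumnTypes(#"Added Index", {{"Letter", type text}, {"Value", Int64.Type}})
in
    #"Changed Type"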

Other options for Step 1 of Challenge 3

To be fair, I'm pretty comfortable knocking out a quick list, as I do it all the time for Calendar tables.  But as Bill Szysz pointed out, you get bonus points for this being 100% UI driven.  Could I have done that?  Of course!  Instead of creating the list, you would:

  • Create an Excel table that holds the letters and values
  • Pull it into Power Query
  • Right click the Letter column --> Transform --> Upper Case
  • Right click the Letter column --> Remove Duplicates
  • Load it as connection only

Is it easier?  Maybe.  Is it more robust?  It could be, especially if you also want to score other characters.

Step 2 of Challenge 3: Scoring Words

So now we get into the money part of Challenge 3 - scoring the entire word.  Let's assume that we have a nice little Excel table which stores all the words in a column called "Word" like this:

image

(To be fair, the data could come from a database or anywhere else, the source is really incidental to the problem.)

To score these words, we can:

  • Pull the data into Power Query
  • Right click the [Word] column --> Duplicate
  • Right click the [Word - Copy] column --> Transform --> UPPERCASE

This leaves us here:

image

Now the trick…(psst… I have a cool pattern card for this… watch this space in the next few days for news!)

  • Right click [Word - Copy] --> Split Column --> By Number of Characters
    • Choose to split by 1 character, repeatedly
    • From the Advanced Options, choose Rows

Your output should now look like this:

image
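For the curious, the split-into-rows step generated here uses Splitter.SplitTextByRepeatedLengths, roughly like this (a sketch with illustrative step names):

= Table.ExpandListColumn(
      // split each word into single characters, then expand to one row per character
      Table.TransformColumns(#"Uppercased Text", {{"Word - Copy", Splitter.SplitTextByRepeatedLengths(1)}}),
      "Word - Copy")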

Next:

  • Go to Home --> Merge Queries
  • Choose LetterValue and merge [Word - Copy] against [Letter]
  • Expand only the [Value] column from the [LetterValue] column (by clicking the expand icon at the top right of the column)

image

The data is out of order now, but it doesn't matter.  It's time to put the finishing touches on Challenge 3…

  • Go to Transform --> Group By
  • Configure a basic grouping like this:
    • Group by Word
    • Create a Score column that SUMs the [Value] column

image

And you're done!

image
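That Group By step boils down to a single Table.Group call, roughly like this (the prior step name is illustrative):

= Table.Group(#"Expanded Value", {"Word"}, {{"Score", each List.Sum([Value]), type number}})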

You can now throw any value into the Excel table, hit refresh, and it will score the words…

image

Well how about that…

And our upcoming Power Query Recipe cards look like they could be way better than hard work too!  (Stay tuned for their release next week!)

Your Challenge 3 Submissions

With 46 submissions for Power Query Challenge 3, it was impossible to go through them all.  I'm hoping that you'll consider being part of the community of reviewers here and check out a few random ones, then post in the comments any that you think I should have mentioned.  I can tell you that in the first four submissions alone there are three different versions of custom functions, and a UI driven approach.

But there are a couple of submissions I looked at that I did want to highlight, as they brought more than just a query to the game.  🙂

Bart Titulaer

Bart's solution includes not only his Power Query work (which he tried more than one way), but he also thought it might be fun to include some frequency distributions with the solution:

image

It's a good reminder that the end goal of Power Query isn't just to clean up data, it's to actually use it.  (Something I probably don't cover enough on this blog!)

Kolyu Minevski

Kolyu decided to compare scoring English vs Bulgarian, and even summed it up for me with a note:

image

Nice to know since I'll be back in Sofia at Bulgaria Excel days on November 1!