Using PowerQuery with Email

At our last MVP summit, I sat in on a session on PowerQuery. Our presenter, who is pretty passionate about it, at one point asked us “Who has wanted to use PowerQuery on their email inbox?” And honestly, I have to admit, I was kind of “meh” to the idea. After all, email is a distraction, and not something I generally like to go back and study. Why would I care how many emails someone sent me, or what time they do? Not interesting.

But oh wait… after fooling around with it when I was bored one day, I came across something pretty damn cool. PowerQuery can do much more than just count emails… it can parse the contents! (Okay, that’s uber-geek for “it can read and examine the contents of the email.”) Why is that important? It’s because PowerQuery is all about sourcing and reshaping data.

We can build this…

Check this out… this chart is created, in Excel, showing the affiliate income that I’ve earned from selling Charley Kyd’s dashboard kits from Exceluser.com.

image

Yeah, okay, it’s not a huge amount of money, but that’s not the point.

… From This…

The point is this:  Charley sends an automated email each time there is a sale, of which the contents are always in the same structure:

image

And here’s how…

Step 1: Create a folder and a rule in Outlook

The first thing I did was set up a rule in Outlook to funnel all emails like the one above into the “Affiliates’'\Exceluser.com” folder.  While it’s not entirely necessary – I could target the inbox itself – this makes the updates run a bit faster as they don’t have to examine ALL of my email.  It’s also good housekeeping.  That folder will now be the ENTIRE source for the chart above.  Even better, I didn’t even have to read the emails myself!

Step 2: Open Excel and create a new Power Query

To do this, you must be running the December 2013 update at a minimum.  (Earlier versions don’t support Exchange as a data source.)  To set it up:

POWER QUERY –> From Other Sources –> From Microsoft Exchange

If you haven’t used an Exchange connection, you’ll be prompted for your credentials, which it will then save for you.  Unfortunately only one Exchange account is supported at a time right now, so if you need to change it go to POWER QUERY –> Data Source Settings to modify your credentials.

Now, for me it pulled up a Navigator task pane on the right with a list of Mail, Calendar, People, Tasks and Meeting Requests.  I personally wish it would take you into Power Query, as this is just one more step to slow me down.  I have to select Mail, then click Edit Query (at the bottom) to go and start building my query.  Once I do so, then I see this:

image

Wow… that’s a lot of columns, but not really where I want to be.  Let’s filter this list down a bit…

I clicked the dropdown arrow beside “Folder Path”, and then clicked the “Load More” button, since there is little in the list:

SNAGHTML38a48c

Perfect, now I uncheck Select All, check \Affiliates\Exceluser.com\ and I’m set… my list will filter to only those emails from Exceluser.com.

Step 3: Removing Irrelevant Columns

There’s a lot of them, so I’m going to get rid of them.  I select and remove each of the following:

"Subject", "Sender", "DisplayTo", "DisplayCc", "ToRecipients", "CcRecipients", "BccRecipients", "DateTimeReceived", "Importance", "Categories", "IsRead", "HasAttachments", "Attachments", "Preview", "Attributes", "Id"

That leaves me with 3 columns in all:  “Folder Path”,”DateTimeSent” and “Body”:

SNAGHTML3d45e9

Step 4: Expand the Email Body

Now here’s where cool starts to get visible… the Body field contains a bunch of entries called “Record”, which are shown in green.  That means they are clickable and, if you do click it, will show you the entire body of the selected email.  But even better is that I can expand those into the table itself.  That funny little double headed arrow next to “Body” does just that.  When I click it, I can see all the items in the record which, in this case includes both the text and html bodies of the email:

image

I’d rather play with plain text than HTML, so I cleared the HTMLBody flag and click OK, resulting in the following:

image

So the body has expanded, and if I click on a specific record it shows the entire body in the window at the bottom.  Very cool, and I can see that this email shows earnings of $12.91.  Even better, as I click through the other records, Charley’s emails are identical, with only the value changing.  This is VERY good.

Step 5: Strip out the Value

Now the above is good, but it contains a bunch of hard returns and stuff.  I want to get rid of those.  So I select the Body.TextBody.1 column then go to Transform—>Text Transforms—>Clean.  That removes all the hard returns, resulting in this:

SNAGHTML46cabd

Now I’m going to do a find and replace to remove all characters before the value.  Since it’s always the same, I got to Replace values and I can replace "Hi Ken Puls,This email is to notify you that you have made a sale.You have earned $" with nothing.

The column will now start with the value, but I need to remove everything after the value.  Easiest way for me was to:

  • Replace " in commission for this sale." (notice the leading space) with a pipe: "|"
  • Choose Split Column –> By Delimiter –> Custom –> |

I then made sure that the Body.TextBody.1.1 column was selected and choose to set the Data Type to Number.

Cool, I’ve got my values now!

Step 6: Final cleanup

Finally, I did a bit of cleanup…

  • I removed the Body.TextBody.1.2 column as well as the Folder Path columns, since I don’t need either any more
  • I renamed the Body.TextBody.1.1 column to Commission
  • I renamed the table (on the right) to Commissions

And here it is:

SNAGHTML512197

And finally I clicked OK to load the table to the worksheet.

Step 7: Create the Report

And this part is probably the easiest… I created a simple Pivot:

  • DateTimeSent on rows
  • Group dates by Months and Years
  • Years on Columns
  • Commissions on Values

And then the PivotChart where I removed the Field Buttons, vertical axis and gridlines, and added data labels.  Pretty straight forward if you’re used to working with PivotTables or charts at all.

Final Thoughts

This tool is truly awesome.  All I need to do to get an update on my affiliate status is to open the file and refresh the data.  It will reach into my email folder and do the rest.  Simply amazing, really.

And while the setup may look like a lot of steps, this file took me less than 30 minutes to build, from setting up the rule in Outlook to pulling the data into PowerQuery through cleaning it and reporting on it.  Granted, I’ve been playing around with similar data lately, but still, it was quick from start to finish.

So… “Meh, it’s just email”?  Not any more.  🙂