Using PowerQuery with Email

At our last MVP summit, I sat in on a session on PowerQuery. Our presenter, who is pretty passionate about it, at one point asked us “Who has wanted to use PowerQuery on their email inbox?” And honestly, I have to admit, I was kind of “meh” to the idea. After all, email is a distraction, and not something I generally like to go back and study. Why would I care how many emails someone sent me, or what time they do? Not interesting.

But oh wait… after fooling around with it when I was bored one day, I came across something pretty damn cool. PowerQuery can do much more than just count emails… it can parse the contents! (Okay, that’s uber-geek for “it can read and examine the contents of the email.”) Why is that important? It’s because PowerQuery is all about sourcing and reshaping data.

We can build this…

Check this out… this chart is created, in Excel, showing the affiliate income that I’ve earned from selling Charley Kyd’s dashboard kits from Exceluser.com.

image

Yeah, okay, it’s not a huge amount of money, but that’s not the point.

… From This…

The point is this:  Charley sends an automated email each time there is a sale, of which the contents are always in the same structure:

image

And here’s how…

Step 1: Create a folder and a rule in Outlook

The first thing I did was set up a rule in Outlook to funnel all emails like the one above into the “Affiliates’'\Exceluser.com” folder.  While it’s not entirely necessary – I could target the inbox itself – this makes the updates run a bit faster as they don’t have to examine ALL of my email.  It’s also good housekeeping.  That folder will now be the ENTIRE source for the chart above.  Even better, I didn’t even have to read the emails myself!

Step 2: Open Excel and create a new Power Query

To do this, you must be running the December 2013 update at a minimum.  (Earlier versions don’t support Exchange as a data source.)  To set it up:

POWER QUERY –> From Other Sources –> From Microsoft Exchange

If you haven’t used an Exchange connection, you’ll be prompted for your credentials, which it will then save for you.  Unfortunately only one Exchange account is supported at a time right now, so if you need to change it go to POWER QUERY –> Data Source Settings to modify your credentials.

Now, for me it pulled up a Navigator task pane on the right with a list of Mail, Calendar, People, Tasks and Meeting Requests.  I personally wish it would take you into Power Query, as this is just one more step to slow me down.  I have to select Mail, then click Edit Query (at the bottom) to go and start building my query.  Once I do so, then I see this:

image

Wow… that’s a lot of columns, but not really where I want to be.  Let’s filter this list down a bit…

I clicked the dropdown arrow beside “Folder Path”, and then clicked the “Load More” button, since there is little in the list:

SNAGHTML38a48c

Perfect, now I uncheck Select All, check \Affiliates\Exceluser.com\ and I’m set… my list will filter to only those emails from Exceluser.com.

Step 3: Removing Irrelevant Columns

There’s a lot of them, so I’m going to get rid of them.  I select and remove each of the following:

"Subject", "Sender", "DisplayTo", "DisplayCc", "ToRecipients", "CcRecipients", "BccRecipients", "DateTimeReceived", "Importance", "Categories", "IsRead", "HasAttachments", "Attachments", "Preview", "Attributes", "Id"

That leaves me with 3 columns in all:  “Folder Path”,”DateTimeSent” and “Body”:

SNAGHTML3d45e9

Step 4: Expand the Email Body

Now here’s where cool starts to get visible… the Body field contains a bunch of entries called “Record”, which are shown in green.  That means they are clickable and, if you do click it, will show you the entire body of the selected email.  But even better is that I can expand those into the table itself.  That funny little double headed arrow next to “Body” does just that.  When I click it, I can see all the items in the record which, in this case includes both the text and html bodies of the email:

image

I’d rather play with plain text than HTML, so I cleared the HTMLBody flag and click OK, resulting in the following:

image

So the body has expanded, and if I click on a specific record it shows the entire body in the window at the bottom.  Very cool, and I can see that this email shows earnings of $12.91.  Even better, as I click through the other records, Charley’s emails are identical, with only the value changing.  This is VERY good.

Step 5: Strip out the Value

Now the above is good, but it contains a bunch of hard returns and stuff.  I want to get rid of those.  So I select the Body.TextBody.1 column then go to Transform—>Text Transforms—>Clean.  That removes all the hard returns, resulting in this:

SNAGHTML46cabd

Now I’m going to do a find and replace to remove all characters before the value.  Since it’s always the same, I got to Replace values and I can replace "Hi Ken Puls,This email is to notify you that you have made a sale.You have earned $" with nothing.

The column will now start with the value, but I need to remove everything after the value.  Easiest way for me was to:

  • Replace " in commission for this sale." (notice the leading space) with a pipe: "|"
  • Choose Split Column –> By Delimiter –> Custom –> |

I then made sure that the Body.TextBody.1.1 column was selected and choose to set the Data Type to Number.

Cool, I’ve got my values now!

Step 6: Final cleanup

Finally, I did a bit of cleanup…

  • I removed the Body.TextBody.1.2 column as well as the Folder Path columns, since I don’t need either any more
  • I renamed the Body.TextBody.1.1 column to Commission
  • I renamed the table (on the right) to Commissions

And here it is:

SNAGHTML512197

And finally I clicked OK to load the table to the worksheet.

Step 7: Create the Report

And this part is probably the easiest… I created a simple Pivot:

  • DateTimeSent on rows
  • Group dates by Months and Years
  • Years on Columns
  • Commissions on Values

And then the PivotChart where I removed the Field Buttons, vertical axis and gridlines, and added data labels.  Pretty straight forward if you’re used to working with PivotTables or charts at all.

Final Thoughts

This tool is truly awesome.  All I need to do to get an update on my affiliate status is to open the file and refresh the data.  It will reach into my email folder and do the rest.  Simply amazing, really.

And while the setup may look like a lot of steps, this file took me less than 30 minutes to build, from setting up the rule in Outlook to pulling the data into PowerQuery through cleaning it and reporting on it.  Granted, I’ve been playing around with similar data lately, but still, it was quick from start to finish.

So… “Meh, it’s just email”?  Not any more.  🙂

31 thoughts on “Using PowerQuery with Email

  1. Pingback: Power Query for Excel – on the rise | Microsoft Excel and Access Experts Blog

  2. Hey Ken - great post. Glad to know that you found value in the Exchange connectivity in Power Query.

    Looking forward to more posts about your experience with Power Query.

  3. Is there a way to deal with the email attachments from within PowerQuery? E.g. to load the content of uniformed attachments into the same table.

  4. Yes, there is. When you connect to PowerQuery and pull in the email, you get an "attachments' column. Providing the files are uniform (no necessarily named the same), and the data is in a structure that PQ can interpret, you should be able to expand them to get the content.

  5. Great post. I found power query is very helpful specially when MSFT added more data source.

    I am working on POC to create some reports pulling a mail box from the Exchange, I was able to get all the columns in the power query sheet.

    The requirement of the POC is a bit challenging. There are about 15 custom attributes were added to the mail item in Exchange. I don't see the custom attributes in the attribute column in the query sheet. I am able to see the out of the box attributes i.e. ConversationIndex, ConversationTopic, etc...

    Please suggest.

    Thanks,
    Michael Hanna

  6. Great !!!

    Thank's to your post I could done a great imporvment for my customers.

    I've created some "team sites" for my customers within Office 365, to provide some dashboards and data analysis. But the IT skills are some time really poor and the way to share data was not so easy.

    So far I've also created a "team site mailbox" with a specific mail adress, so that my customers can forward their data through emails with Excel attchements. Then I've created a POWER QUERY connexion to that mailbox to directly read the data from attachments. So I can refresh the dashboards with the new data directly forwarded by mail.

    The method is far more easy for my customers, and less time consuming for me !

    Regards.

  7. Any ideas how to troubleshoot why my PowerQuery is not reading all the records in my inbox. I have over 500 records in my inbox (its new) but my query only returns 130 records. This is really cool, but would be better if it were complete.

  8. Mike, just to be clear, are you talking about the output result, or what you see in the Power Query window? The PQ window is simply a preview of your data, but the end result should comb through it all...

  9. Ken,
    The connection to Exchange is useful, however, I am also interested in connecting to data in Outlook Data Files, which do not seem to be available from the PQ Exchange connection. In the past I have pulled this data into Excel using VBA, connecting directly to the Oulook App and the data files (not limited to Inbox, etc). Do you know if this is possible using Power Query?

    How about connecting to "additional mailboxes" associated with exchange via the Advanced Options dialogue in Outlook?

  10. Hey Alex,

    Good questions there. Not sure on the Data Files to be honest, but I thought you could connect to different mailboxes now... May be wrong there as well though.

  11. Hi Ken,

    I'm using the Exchange connection to the Calendar of Outlook. I have successfully done it and removed and filtered columns.

    The problem that I'm having is that it is not refreshing. For example, after analyzing the meetings that I had on my calendar, I went back and deleted some of them, but PowerQuery doesn't reflect that change after clicking all the refresh buttons.

    Any idea?

    Im using Excel 2010. Power Query Version: 2.23.4035.242

    Kind regards,

    Fernando

  12. Hi Ken,

    I was able to find a solution. The problem seems to be the order on which I was filtering the columns. For example, I was filter by date "2016" and then by the columns which has the DisplayTo by a name. For some reason, it didn't update. But when I changed by first filtering by the "DisplayTo" column and then the date, it worked fine.

    I don't quite understand why it would behave like that.

    Regards,
    Fernando

  13. Sorry Fernando, I just realized I never answered this. Honestly, my answer would have been to step through the query and see if you'd accidentally filtered something out. I'm not sure why this change would have affected it either, but I'm glad you got it sorted!

  14. hi there, thanks for this blog post, do you know if you can pull from a group address?

  15. Hi Vanessa,

    If you have the credentials (email & password), then yes, it shouldn't be an issue. Having said this, we can only provide one set of credentials at a time, so if you also want to build intelligence against your own email account at the same time, then this would be an issue.

  16. Is there an easy way to schedule it to run at a given time with a given frequency? E.g. pull incremental data every night at 11:59PM.

  17. Hey Guys,

    how can I get all appointments from all of my organizational Exchange calendars? I want to create a Dashboard to monitor all Events from all calendars - organization wide. Is there any "best practise" to collect all calendars from everyone? Thank you!

  18. Hi Ken,
    Thanks for yet another great post! I'm experimenting with connecting PQ to generic mailboxes. So far so good but I'm wondering if there is a way to identify who was the "real" sender of an email instead of having generic mailbox address as a sender?

  19. Hi Ken, thanks for yet another great post! I'm experimenting with connecting Power Query to generic mailboxes. So far so good but I am wondering if it is possible to extract the 'real' sender of a message rather then getting generic mailbox address for all sent items.

  20. I'm not sure I follow. If you're using a general mailbox, isn't the email going from that email? The only way I can think of to extract the real sender would be to try and parse the signature lines or something.

  21. Hi Ken. Brilliant. One thing do you have any idea where the Flag column is as can't find it anywhere. Found absolutely everything else!

    Mike

  22. Hi Mike, maybe try looking for a "follow up" or something similar? (I can't connect to my exchange mailbox at the moment to test.)

  23. Not that I'm aware of. If it doesn't show up in the meta data of email addresses that you see when querying the inbox, then you're out of luck.

  24. Hi Ken,
    Your suggest query is good for plain text in text body. However if it contains a table, a attachment, all information extract from becomes a list of data. How can I extract this table properly?

    Item# Our Job # Contact Phone
    1 5005355917 Mr Yeung 85263344233
    It looks like
    Item#
    Our Job #
    Contact
    Phone
    1
    5005355917
    Mr Yeung
    85263344233

    Thanks,
    WaiChung

Leave a Reply

Your email address will not be published. Required fields are marked *