Power Query “Trusted” Locations

My last two blog posts detailed some frustration that I ran into when working with Power Query.  First the issue that I couldn’t trust folders, but got stuck trusting files, and then the issue where I overloaded my credentials area.  I’ve actually got a solution, in a way, for both.

One thing to be aware of here… there are actually two distinct entities that show up in the Data Source Settings dialog: Security Credentials and Data Privacy Settings.  You can see this in the following picture:

image

Notice how Exchange and the Database don’t show a privacy level on the right?  Those are security credentials.  Interestingly here, you can see two entries for the same thing (like a website), where one is the security credential, the other the privacy setting.

Ok, with that out of the way, let’s get on to the solutions…

Clearing out Security Credentials

In my last post I had over 700 security credential entries for ip-api.com that needed to be cleared.  To clear them manually took clicking the item, clicking delete and clicking ok before moving on to the next item.  Even if I could do one click per second (I find this UI slow and sometimes I missed the target), it would still take me 35 minutes.  I really wanted a “remove all” feature.  Sure I’ll lose Exchange, sure I’ll lose my database, but so what.  Small pain vs spending 30-60 minutes clicking Delete.

So I coded something to do it.  The download link is at the end of the post, but one of the features of the tool is to let you clear out your entire list of security credentials with on click:

image

And voila!

image

Now, are you going to use this every day?  Hell no!  But if you screw up like I did, you just may need a weapon like this in your arsenal.  :)

“Trusted” Folders/Files and URLs

As I mentioned in my last post, Ehren, a developer on the Power Query team messaged me on Twitter to tell me of one solution to the “Trusted Folder” problem.

If you set the privacy level for the folder, it applies to everything within that folder, including files in subfolders

Now that sounded cool.  So I set out to test it out.  First thing I did was went to add a new Folder to my Data Source Settings dialog and… there’s no option to do that. Hmm… okay… so how?

Well, you could go and build a solution that references a folder, just so that you can get a folder in there, I suppose.  But that’s awkward and contrived really.  You have to waste a bunch of time concocting something you hope will work.  And the worst part to me is that I don’t just have to do it on my system, I somehow have to deploy it to other people as well.  Ugh.

Screw it, I’m building a tool.  Here’s the features of what we’ve got:

Clear out all privacy levels

To be fair, this was probably more useful for me while testing, but I included it in case you run into the same issues.  Basically you click this button:

image

And now all the Data Privacy Settings are gone too:

image

Add Folder

So this one is way more useful to solving my issues.  I’m going to click “Add Folder” on the left.  It will let me browse for a folder:

image

And prompt me for the Privacy Level:

image

And once I click OK, it sets it up as a folder in the Data Source Privacy settings.  And yes, it really does trust all files in all subfolders:

image

Add Files

Same thing really, it just allows you to select a specific file.  This is a bit redundant to just declaring them as you go along but hey, once I already wrote the code for the folder this was a snap to add:

image

Add URLs

I love this one.  Easy to use, just provide the URL and the privacy level:

image

And boom!  There you are:

image

And again, this trusts all sub-sites of the main domain.  This was particularly key for me as I’m querying data from this site and have to provide a different URL for each year.

Now, again, on this one, the first time you query the site you’ll get another entry for each URL you touch:

image

This is because you have declared the privacy level, but not the security (unfortunately I can’t get in to modify the security files, for obvious reasons.)  So here’s what I’m going to do to fix this:

  • Select the first URL in the list
  • Click Edit Credential
  • Change the setting to apply to the root domain:

image

  • Click Save

You’ll notice that the first one disappears now.  What actually happened was that it merged the security credentials with the privacy level in one entry.  Cool.  So now I’ll just go back and delete the other two. And it never asks me again on refresh.  :)

The “Big Red Button”

So when it all goes really wrong, and you want to reset Power Query to a default state, what then?  That’s what this button is for.  It will wipe out all of your security credentials, data privacy settings, reset your Power Query formula bar and more.  It’s like a total factory reset of Power Query.

image

Some More Technical Stuff

I’ve now had this installed on Office 2010, Office 2013 Professional Plus and Office 365 Pro Plus versions of Excel, both 32 and 64 bit.  In addition, it’s been installed on Windows 7 Pro and Windows 8.1 systems as well.

Interesting enough, despite being an .MSI installer file, in my tests it has NOT required admin privileges to install.  (On one machine I installed this first without prompting, then got prompted for admin credentials when I went to update Power Query.)

Some Final Observations

I found this project pretty interesting, and it’s given me some ideas for some more useful tools to work with Power Query and other Power BI add-ins.  While I can’t promise a timeline on delivery, I do plan on adding a bunch of new useful stuff to this add-in and releasing a Pro Tools version at some point.

Before you download and install this, I also want to make something very clear.  THIS IS BETA SOFTWARE.  I’m pretty sure it’s stable, and shouldn’t affect anything else, BUT YOU DOWNLOAD AND INSTALL AT YOUR OWN RISK.

The Installer

You can download the installer from this link.

Pain Points – Power Query Credential Management

After my last post on Power Query Security Woes, Ehren – a developer on the Power Query team – sent me a message on Twitter about it.  I will blog about that later, after I’ve had a chance to test it out, but before I did, I thought I’d go and clean up some PQ stuff that I messed up.

A little history…

A while back I got the bright idea to pull down our web leads (at my day job).  I then though, “Hey, wouldn’t it be cool if I could plot all of them on Power Map?”  Sure it would.  But all I had was their IP address.  “No problem!”, I figured, “I’ll just use Power Query to feed out the IP to a web service and return a function to turn it into a real address!”

Success!

I cooked up a little script fed the IP to http://ip-api.com, which did exactly that. Here’s a sample from a random address:

image

I actually tried several services before I settled on this one, but ultimately  it seemed to generate the most accurate results.  Perfect, so I let the script fly, and it was awesome!

Success turns to failure…

Except… somewhere during my batch of 3,500 addresses it stopped working.  And when I went back to the site, I found out that I’d been blocked.  Ooops.  Apparently if you send over a certain threshold of queries in a certain amount of time you’re not a good person. (That’s why I’m not sharing the script.)  Sorry about that ip-api.com, I promise I’ll be good in future!

… and creates a mess in the process…

At any rate, the issue I then ran into is that I ended up with an entry for each URL in my Data Source Settings.  I was a while back, so I don’t remember if it happened by default, or if it happened because I was learning and just clicked the wrong thing.  End result is that I have an entry in my list for each IP I queried.

image

Since I’m going to be a good boy and not steamroll their site any more, I kind of want to clean them up.  No problem, right?  Click on the first address on screen, hold down SHIFT and click on the last address on screen and… huh?  I’ve only got the last address?  Okay, let’s try with CTRL… click, click.. ARGH!!

… and the mess turns into major frustration!

This interface only allows you to select one item at a time?  Really?  So to delete my 200 entries I have to click the line item, click Delete (at the top), click “Yes I’m sure” and then move on to the next one.  Ouch.  I see a lot of wasted time in my future if I really want to clean up now.  :(

The missing feature(s)

Power Query team, (if you’re still listening,) please give us the ability to multi-select in this dialog, and multi-delete items.  We need to be able to clean up, as we’re all human and make mistakes as we go along.  This feels like a severe punishment right now!

In addition, while I’m here, why isn’t there an “Add” button in this interface?  When I go to edit one of these addresses, I see the following:

image

I’m guessing that if I’d just set that to the root at the beginning, I wouldn’t have an entry for every IP in that query.  Man!  If only I’d been able to declare this up front and realized that this was an option!  Yet there is no way to do this from a simple UI.  Instead, I (believe) I have to:

  • Create a query to one page at the domain
  • Set it’s security level as I pull data
  • Save the query
  • Go back to this interface and edit the anonymous access

Life would be SO much easier if I just had the ability to go into this UI and create my favourite and most used sites.  (And bonus points if I could push them out to users through group policy like I can with Office Trusted Locations!)

Okay, enough for now.  I’ll be back later once I’ve tested Ehren’s suggestion.  :)

Power Query Security Woes

Recently I’ve been working on automating a business process for a client.  It’s become a really interesting project that uses a mix of Excel tables, Power Query, Power Pivot and VBA, as well as a WinAutomation script.  Before I talk about the issue that is facing me, it would be helpful to provide a bit of background on what we’re doing, and why so many of the Power BI pieces are needed.

Solution Background

My client has a business in which they outsource employees to other companies.  Each of those companies maintains the records of hours, and lets my client download a spreadsheet version of the hours logged by the subcontracted employee.  This is fairly helpful, as my client is the one that pays all of the employees, so getting the hours lets him know who gets paid for how many hours.  Easy enough, right?  But each spreadsheet is in a different format, and that causes a lot of manual entry to try and standardize it into a file that can be uploaded to the payroll processor.

So here we enter Power Query.  With Power Query we can import each of the files individually, reformat the data into a consistent set of columns, and output it into a table.

Each pay period we start a new Excel file, and import the data files for that pay period by running a WinAutomation script.  The script logs in to the remote systems, passes the correct parameters to the system, and downloads the Excel and CSV files required.  It then saves them in a subdirectory of our application under the pay period end date.  My Power Query scripts then use the pay period end date, dynamically build a file path to the file and import the data.

It’s beautiful… except…

The Issue

Here’s where the pain begins…  Every time you touch a new data file in Power Query that you haven’t used before you get a security message asking you what type of data (Public, Private or Organizational) you are using.  (You can learn more about those here:  http://office.microsoft.com/en-us/excel-help/privacy-levels-HA104009800.aspx) The issue I have is that each payroll I create new files in a subdirectory, so Power Query sees them as unique.  To that end I can’t just trust the data files once and be done with it.

Now, there is a workaround… just enable the Fast Combine feature.  That kills off the warning and lets them go, but it also has an issue.  This setting is workbook specific, which means that when my VBA saves the master payroll control file under a new name (we preserve history this way) the setting doesn’t stay selected.  Grrr.  Given that there is no way to touch this setting via code, my user has to remember to click the Fast Combine button every time they run the update.  Is it minor?  Sure.  But it’s ridiculous, it gets forgotten, and concerns them when they get the permissions messages.

The Solution We Need

Now don’t get me wrong, I’m all for security, but where it makes sense.  In Office 2007 we got a new macro paradigm that allowed us to trust folders on our computers/networks.  This setting is set on an application level basis, and persists between Excel sessions.  Beautiful, as I can set it to a specific folder and forget it.  I throw trusted files in there, I don’t throw in the ones I don’t know.  It actually allows me to practice safer computing overall.

In my opinion, this setting is drastically lacking in Power Query.  I really need the ability to set my Power Query add-in to have trusted root folders with trusted subfolders.  This would allow me to trust my application’s data directory and not have to remember to click Fast Combine each time I create new files.  It also means that I might pay attention to the message when it does come up in future, as it would be unusual.  Currently I see the error so much I just cancel it and go straight to Fast Combine.  Hmm… kind of like how we set macro security to “Low” in Excel 2003 and earlier to bypass the annoying message, which essentially left us unprotected.

The Solution (Some of us) Can Implement

To be fair, I believe I do have a workaround for this now, but I don’t think it should be necessary.

Basically what I could do is use VBA to drive the refresh of my tables, and therefore the Power Query scripts.  Given that, I could use VBA to copy and replace the files in a central repository where the hierarchy does not change.  I.e. I could set up a folder like AppData\Timesheets\Current and use VBA to copy the required data files from my dated subfolders, replacing the ones in the “current” folder.  I would then target my Power Query scripts against the files in the “Current” folder, and build my solution off that.

Because Power Query holds a list of trusted files at an application level (rather than workbook level), these files should stay trusted even if I do create a new file, removing the need to constantly click the “Fast Combine” button.  So I think this should work.

So what’s the problem?  It only involves creating a VBA macro to do that.  Fine for me, as I know how to code with VBA and make this happen.  But for most of the Power Query target market I would suggest that this is probably outside of their normal skill set.

Final Thought

Even though it is technically possible to work around this issue, I still argue should not be necessary.  We need a proper option to trust the files in a local folder so that dynamically referring to a file in Power Query can be scripted without invoking a painful security paradigm.

Importing Dates in Power Query

A couple of weeks ago I was teaching a course on Power Query.  I distributed a bunch of sample files, started to walk the attendees through the steps, and a hand went up.

“I get an error in my date column…”

For a second it stumped me… I tested my examples 15 times before pushing them out to the class.  Why now?

As it turns out it’s that pesky issue that drives many non-North Americans crazy.  I keep my regional settings set to use the MM/DD/YYYY format.  I just find it makes life a lot easier for me.  What we ran up against here was a user who was using the Canadian standard of DD/MM/YYYY.  Yuck.

I promised a way to show how to fix it, and am finally getting around to posting it…

Replicating the Issue

The file I’m working with here is a csv file (saved in a txt format) that uses the DD/MM/YYYY format, as that replicates the issue for me.  So I import it and end up with this:

image

So what’s the problem?  Well, if you look at the areas I circled in red, the ChangedType step changed the data type for the TranDate column to Text.  That’s no good.  So I highlight the TranDate column and change it to Date:

image

Great.  An error.

So this is what I’m talking about.  The dates above have been converted to dates, but not the 1st of February, 2nd of February and so on.  Instead it’s January 2nd, February 2nd, etc.  As soon as it gets to the 13 it gags as there is no 13th month.

This is not an uncommon issue, by any means.  If you database exports in one format and your system settings are in another, you’ll run into this.  Multinationals that exchange files across borders see this all the time (just look in any forum for the amount of help posts on the topic.)

I REALLY wish that Power Query were smarter about this.  I also wish there was an easy way to just tell Power Query which date format your data (not your system) was using, or an easy way to convert it.  Anything to let me do something in one click versus what we need to do to fix this.  (As an added insult, the old text import wizard DID let you declare this.  Go figure!)

The Solution

All right, so how do we fix it?  I’m sure there are a variety of ways, and hopefully a new ribbon will come along to make this obsolete, but here is ONE way.

Step 1: Remove the garbage

First I deleted the “Changed Type” step of the query.

Step 2: Split by the / character

Next I selected the TranDate column, went to the Transform Tab (if you don’t have it download the latest Power Query update), chose Split Column, then By Delimiter, then Custom.  I put in a / and clicked OK.

As an aside here, I wish Power Query suggested what it thought you wanted to use as the delimiter, without actually clicking OK for you.  In this case I think it’s pretty obvious, even though it’s not in the list of defaults.  Granted, I might want to override it with something crazy like a 0, which is why I wouldn’t want it to just click OK for me, but it might save me time if it made an intelligent suggestion.

At any rate, I end up with this:

image

Step 3: Rename the columns

This part is important for the next piece.  Rename your columns to reflect which part of the date they belong to.  You can do this by right clicking the column header and choosing Rename…  (Alternately you can edit the M code in the formula bar manually, but you need to edit the “SplitColumnDelimiter” step first, then edit the ChangedType step to reflect the new column names.  Probably easier to just right click and rename them:

image

Step 4: Putting the date together… correctly

OK, now the tricky part.  We need to insert a column and write a Power Query formula.  To do this, go to the Insert tab, and click Insert Custom Column.  You’ll be presented with a dialog to write your formula.  And here’s the formula you need:

=#date([Year],[Month],[Day])

Sadly, there really isn’t a lot of documentation yet that explains this stuff.  I believe that the # sign tells PowerQuery that this is a data type, the “date” portion determines what type it is, and then we feed it the components that make it up (for each row).  I also changed the name at the top of the dialog to “Date” and clicked OK:

image

Step 5: Force the data type

I’m not sure why this is, but even after declaring this as a date, it formats the column visually as a date, but the data type is left blank.  For whatever reason, I had to select the column and force the data type to a date in order to make it recognize it properly as such.

Step 6: Cleanup

So this part is optional, but you can now delete the Day, Month and Year columns if you don’t need them any more.  Unlike standard Excel, this move won’t leave you with a bunch of #REF! errors down the Date column.  :)

Implications/Modification

The implications of this are somewhat interesting to me.  Let’s assume that your data goes the other way.  I send you a file in MM/DD/YYYY format, and you use DD/MM/YYYY format.  You should be able to follow all of the above steps, with the only difference being which column you identify as which when you rename them.  (In this case it would go Month Day Year after splitting them.)

I’d love to get some feedback from others on this, but I believe the formula in the custom column should work for you without modification as long as the columns are named correctly.

I’ve attached two files to this post for testing purposes.  Right click the links below and SaveAs to get copies to play with:

Try them both, you should be able to generate a correct date for your format from either using these steps.

PowerXL Course Live in Victoria, BC

We are very excited to announce that we will be hosting an “Introduction to Power Excel” session in Victoria, BC on June 6, 2014.

PowerPivot is revolutionizing the way that we look at data inside Microsoft Excel. Allowing us to link multiple tables together without a single VLOOKUP statement, it enables us to pull data together from different tables and databases where we never could before. But linking data from multiple sources, while powerful, only scratches the surface of the impact that it is making in the business intelligence landscape. Not only do we look at PowerPivot in this session, but we’ll also explore the incredible companion product Power Query; a tool that will surely blow your mind. Come join Ken as he walks you through the process of building a Business Intelligence system out of text files, databases and so much more.

Full details, including an early bird signup offer, can be found at http://www.excelguru.ca/forums/calendar.php?do=getinfo&e=42&day=2014-6-6

Activating Debugging Symbols with Add-In Express

Some time ago I published a blog post about an add-in that I’m building, and the reasons why I elected to start using Add-In Express to manage the process.  I still think Add-In Express is the best product for this kind of task, but figured I’d publish this post mainly to save me time the next time I get a new PC and have to go through this again.

The issue I ran into is that I need to set the CLR to work with .NET 3.5.  To do this you have to set the host file to v2.0x.  This is pretty easy, and Eugene has a good picture on how to do this here:  http://www.add-in-express.com/forum/read.php?FID=5&TID=7912

Following the steps that Eugene provided set me off to the races… with Excel 2010.  But when I tried to debug in Excel 2013 I wasn’t having much luck.  I changed the “Program to Start” to Excel 2013, set a breakpoint in my code, and started debugging.  Even though I’d set a breakpoint in my code, it was never activated.  Instead, the breakpoint goes from a red dot to a red circle with a white fill, and mouse-ing over it yields the message “The breakpoint will not currently be hit.  No symbols have been loaded for this document.”

SNAGHTML64bc16

This really sucks, as it’s pretty tough to work with.  So what’s different, and how come Excel 2010 works, but Excel 2013 doesn’t?

The action of modifying that host config file as described by Eugene actually creates a new file that is stored within the appropriate Office subfolder held within the Program Files folder.  Unfortunately, the program that is defined in your “Program to Start” settings seems to be ignored, and it actually creates the file in the folder that holds the oldest version of Office on your system.  So in my case it created the following file:

  • C:\Program Files\Microsoft Office\Office14\excel.exe.config

Now, I stand to be corrected on this, but I believe that this file is only used when launching the debugging tools from Visual Studio.  When the app is started it will look for this file and, if it finds it will use it to provides the debugging symbols back to Visual Studio for the .NET framework specified.

At any rate, the key here is to copy this file and paste it into the same folder that contains the exe file specified in the “Program to Start” area in the Project Properties of Visual Studio.  And once you do that:

SNAGHTML72dbee

Beautiful!  Works nicely.

It’s also worth mentioning that this setup works whether you are using a version of Office that comes from an MSI installer file (such as a DVD or volume license version of Excel/Office), or if you are using the Click to Run (C2R) version of Office that you can download from Office365.  The only thing you need to be concerned about is the file path in which to store the config file and find the Excel.exe executable:

  • 64bit MSI:  C:\Program Files\Microsoft Office\Office15
  • 32bit MSI:  C:\Program Files (x86)\Microsoft Office\Office15
  • 64bit C2R: C:\Program Files\Microsoft Office 15\root\office15
  • 32bit C2R: C:\Program Files (x86)\Microsoft Office 15\root\office15

I know this isn’t super Excel focussed, but hopefully it will help someone out there if they run into this error.

I should also throw a shout out to Andrei from Add-In Express who helped me sort this out last time I needed to figure this out.  The support there was awesome in helping me get it resolved.

Designing Software – How do YOU start?

Today I’m working on building a new reporting system at my day job.  Basically we’re looking at creating tools to budget and forecast results to be reviewed weekly, with a goal of being more on top of results so we can react better.  The stuff that every business really wants to do.

I’ve got the reports that we want at the end, and now I’m looking at them trying to figure out where to get the info from, how quickly it will need to be updated, where to source info that isn’t readily available and all that good stuff.  By the time we’re done I’m sure we’ll be hitting on Access, SQL Server, Excel, VBA, Power Pivot and Power Query, and possibly some other tools I haven’t thought of yet.

So at the end of the day, I’m building a system here.  Something to be used daily to capture, store and report on information.  Built of a collection of different pieces and technology, I’m essentially building my own software solution to a business problem.

I started with some sketching on paper, trying to illustrate the overall data collection/storage points and program flow.  Then I moved to Visio to try and draw it there.  (The pieces needed to move around as I added new dimensions and considerations.)  Then I got stuck on some of the considerations… some of them were big enough that they could change the way things were done.

Now I’m in Word trying to state – in English – the goals and non-goals of the overall project, so that I can get an overview on paper.

I’ve got a meeting this afternoon with a key player/driver to talk specifically about:

  • What the system should do
  • Key points that is absolutely will not be required to do
  • Key points that we want to make sure do not happen (different from above)
  • Who is going to use the system (each part)
  • How it’s going to be updated (each part)
  • How often updating will be needed for information drivers

I’m hoping with those stated that I can do two things:

  1. Get answers to the major design changing considerations I need before I build anything, and
  2. Generate a list that I can go back to later to ensure I haven’t overlooked or missed anything when I’m designing the system.

While I’m keenly aware that I’m not going to get everything covered, and that some design considerations will need to be made on the fly during development, I want to minimize the chance that any of those will major issues.  I hate coming up on a blocking point part way through the implementation that forces design re-work and retrofitting.

Now I’ve done this kind of things many times, but I don’t know that I’ve ever felt that I’ve settled into a piece of software that really works for me when trying to design this stuff.  I usually end up with Word to track goals, Visio for the flowcharts, but there’s a lot of iteration to get through it.

That’s the reason I’m reaching out here, is to see what others do in this area. I’m curious as to what tools/processes YOU use to scope and design a BI system or software.  Do you start with a goals document, do you start with flowcharts, or something else?  Are you happy with your process, or are there pain points and gotchas that you still get hit with?

Using PowerQuery with Email

At our last MVP summit, I sat in on a session on PowerQuery. Our presenter, who is pretty passionate about it, at one point asked us “Who has wanted to use PowerQuery on their email inbox?” And honestly, I have to admit, I was kind of “meh” to the idea. After all, email is a distraction, and not something I generally like to go back and study. Why would I care how many emails someone sent me, or what time they do? Not interesting.

But oh wait… after fooling around with it when I was bored one day, I came across something pretty damn cool. PowerQuery can do much more than just count emails… it can parse the contents! (Okay, that’s uber-geek for “it can read and examine the contents of the email.”) Why is that important? It’s because PowerQuery is all about sourcing and reshaping data.

We can build this…

Check this out… this chart is created, in Excel, showing the affiliate income that I’ve earned from selling Charley Kyd’s dashboard kits from Exceluser.com.

image

Yeah, okay, it’s not a huge amount of money, but that’s not the point.

… From This…

The point is this:  Charley sends an automated email each time there is a sale, of which the contents are always in the same structure:

image

And here’s how…

Step 1: Create a folder and a rule in Outlook

The first thing I did was set up a rule in Outlook to funnel all emails like the one above into the “Affiliates’’\Exceluser.com” folder.  While it’s not entirely necessary – I could target the inbox itself – this makes the updates run a bit faster as they don’t have to examine ALL of my email.  It’s also good housekeeping.  That folder will now be the ENTIRE source for the chart above.  Even better, I didn’t even have to read the emails myself!

Step 2: Open Excel and create a new Power Query

To do this, you must be running the December 2013 update at a minimum.  (Earlier versions don’t support Exchange as a data source.)  To set it up:

POWER QUERY –> From Other Sources –> From Microsoft Exchange

If you haven’t used an Exchange connection, you’ll be prompted for your credentials, which it will then save for you.  Unfortunately only one Exchange account is supported at a time right now, so if you need to change it go to POWER QUERY –> Data Source Settings to modify your credentials.

Now, for me it pulled up a Navigator task pane on the right with a list of Mail, Calendar, People, Tasks and Meeting Requests.  I personally wish it would take you into Power Query, as this is just one more step to slow me down.  I have to select Mail, then click Edit Query (at the bottom) to go and start building my query.  Once I do so, then I see this:

image

Wow… that’s a lot of columns, but not really where I want to be.  Let’s filter this list down a bit…

I clicked the dropdown arrow beside “Folder Path”, and then clicked the “Load More” button, since there is little in the list:

SNAGHTML38a48c

Perfect, now I uncheck Select All, check \Affiliates\Exceluser.com\ and I’m set… my list will filter to only those emails from Exceluser.com.

Step 3: Removing Irrelevant Columns

There’s a lot of them, so I’m going to get rid of them.  I select and remove each of the following:

“Subject”, “Sender”, “DisplayTo”, “DisplayCc”, “ToRecipients”, “CcRecipients”, “BccRecipients”, “DateTimeReceived”, “Importance”, “Categories”, “IsRead”, “HasAttachments”, “Attachments”, “Preview”, “Attributes”, “Id”

That leaves me with 3 columns in all:  “Folder Path”,”DateTimeSent” and “Body”:

SNAGHTML3d45e9

Step 4: Expand the Email Body

Now here’s where cool starts to get visible… the Body field contains a bunch of entries called “Record”, which are shown in green.  That means they are clickable and, if you do click it, will show you the entire body of the selected email.  But even better is that I can expand those into the table itself.  That funny little double headed arrow next to “Body” does just that.  When I click it, I can see all the items in the record which, in this case includes both the text and html bodies of the email:

image

I’d rather play with plain text than HTML, so I cleared the HTMLBody flag and click OK, resulting in the following:

image

So the body has expanded, and if I click on a specific record it shows the entire body in the window at the bottom.  Very cool, and I can see that this email shows earnings of $12.91.  Even better, as I click through the other records, Charley’s emails are identical, with only the value changing.  This is VERY good.

Step 5: Strip out the Value

Now the above is good, but it contains a bunch of hard returns and stuff.  I want to get rid of those.  So I select the Body.TextBody.1 column then go to Transform—>Text Transforms—>Clean.  That removes all the hard returns, resulting in this:

SNAGHTML46cabd

Now I’m going to do a find and replace to remove all characters before the value.  Since it’s always the same, I got to Replace values and I can replace “Hi Ken Puls,This email is to notify you that you have made a sale.You have earned $” with nothing.

The column will now start with the value, but I need to remove everything after the value.  Easiest way for me was to:

  • Replace ” in commission for this sale.” (notice the leading space) with a pipe: “|”
  • Choose Split Column –> By Delimiter –> Custom –> |

I then made sure that the Body.TextBody.1.1 column was selected and choose to set the Data Type to Number.

Cool, I’ve got my values now!

Step 6: Final cleanup

Finally, I did a bit of cleanup…

  • I removed the Body.TextBody.1.2 column as well as the Folder Path columns, since I don’t need either any more
  • I renamed the Body.TextBody.1.1 column to Commission
  • I renamed the table (on the right) to Commissions

And here it is:

SNAGHTML512197

And finally I clicked OK to load the table to the worksheet.

Step 7: Create the Report

And this part is probably the easiest… I created a simple Pivot:

  • DateTimeSent on rows
  • Group dates by Months and Years
  • Years on Columns
  • Commissions on Values

And then the PivotChart where I removed the Field Buttons, vertical axis and gridlines, and added data labels.  Pretty straight forward if you’re used to working with PivotTables or charts at all.

Final Thoughts

This tool is truly awesome.  All I need to do to get an update on my affiliate status is to open the file and refresh the data.  It will reach into my email folder and do the rest.  Simply amazing, really.

And while the setup may look like a lot of steps, this file took me less than 30 minutes to build, from setting up the rule in Outlook to pulling the data into PowerQuery through cleaning it and reporting on it.  Granted, I’ve been playing around with similar data lately, but still, it was quick from start to finish.

So… “Meh, it’s just email”?  Not any more.  :)

From TXT files to BI solution

Several years ago, when Power Pivot first hit the scene, I was talking to a buddy of mine about it. The conversation went something like this:

My Friend: “Ken, I’d love to use PowerPivot, but you don’t understand the database culture in my country. Our DB admins won’t let us connect directly to the databases, they will only email us text files on a daily, weekly or monthly basis.”

Me: “So what? Take the text file, suck it into PowerPivot and build your report. When they send you a new file, suck it into PowerPivot too, and you’re essentially building your own business intelligence system. You don’t need access to the database, just the data you get on a daily/weekly basis!”

My Friend: “Holy #*$&! I need to learn PowerPivot!”

Now, factually everything I said was true. Practically though, it was a bit more complicated than that. See, PowerPivot is great at importing individual files into tables and relating them. The issue here is that we really needed the data appended to an existing file, rather than related to an existing file. It could certainly be done, but it was a bit of a pain.

But now that we have Power Query the game changes in a big way for us; Power Query can actually import and append every file in a folder, and import it directly into the data model as one big contiguous table. Let’s take a look in a practical example…

Example Background

Assume we’ve got a corporate IT policy that blocks us from having direct access to the database, but our IT department sends us monthly listings of our sales categories. In the following case we’ve got 4 separate files which have common headers, but the content is different:

clip_image002

With PowerPivot it’s easy to link these as four individual tables in to the model, but it would be better to have a single table that has all the records in it, as that would be a better Pivot source setup. So how do we do it?

Building the Query

To begin, we need to place all of our files in a single folder. This folder will be the home for all current and future text files the IT department will ever send us.

Next we:

  • Go to Power Query –> From File –> From Folder
  • Browse to select our folder
  • Click OK

At this point we’ll be taken in to PowerQuery, and will be presented with a view similar to this:

clip_image004

Essentially it’s a list of all the files, and where they came from. The key piece here though, is the tiny little icon next to the “Content” header: the icon that looks like a double down arrow. Click it and something amazing will happen:

clip_image005

What’s so amazing is not that it pulled in the content. It’s that it has actually pulled in 128 rows of data which represents the entire content of all four files in this example. Now, to be fair, my data set here is small. I’ve also used this to pull in over 61,000 records from 30 csv files into the data model and it works just as well, although it only pulls the first 750 lines into Power Query’s preview for me to shape the actual query itself.

Obviously not all is right with the world though, as all the data came in as a single column. These are all Tab delimited files, (something that can be easily guessed by opening the source file in Notepad,) so this should be pretty easy to fix:

  • Go to Split Column –> Delimited –> Tab –> OK
  • Click Use First Row as Headers

We should now have this:

clip_image006

Much better, but we should still do a bit more cleanup:

  • Highlight the Date column and change the Data Type to Date
  • Highlight the Amount column and change the Data Type to Number

At this point it’s a good idea to scroll down the table to have a look at all the data. And it’s a good thing we did too, as there are some issues in this file:

clip_image007

Uh oh… errors. What gives there?

The issue is the headers in each text file. Because we changed the first column’s data type to Date, Power Query can’t render the text of “Date” in that column, which results in the error. The same is true of “Amount” which can’t be rendered as a number either since it is also text.

The manifestation as errors, while problematic at first blush, is actually a good thing. Why? Because now we have a method to filter them out:

  • Select the Date column
  • Click “Remove Errors”

We’re almost done here. The last things to do are:

  • Change the name (in the Query Settings task pane at right) from “Query 1” to “Transactions”
  • Uncheck “Load to worksheet” (in the bottom right corner)
  • Check “Load to data model” (in the bottom right corner)
  • Click “Apply & Close” in the top right

At this point, the data will all be streamed directly into a table in the Data Model! In fact, if we open PowerPivot and sort the dates from Oldest to Newest, we can see that we have indeed aggregated the records from the four individual files into one master table:

clip_image008

Implications

Because PowerQuery is pointed to the folder, it will stream in all files in the folder. This means that every month, when IT sends you new or updated files, you just need to save them into that folder. At that point refreshing the query will pull in all of the original files, as well as any you’ve added. So if IT sends me four files for February, then the February records get appended to the January ones that are already in my PowerPivot solution. If they send me a new file for a new category, (providing the file setup is 3 columns of tab delimited data,) it will also get appended to the table.

The implications of this are huge. You see, the issue that my friend complained about is not limited to his country at all. Every time I’ve taught PowerPivot to date, the same issue comes up from at least one participant. IT holds the database keys and won’t share, but they’ll send you a text file. No problem at all. Let them protect the integrity of the database, all you now need to do is make sure you receive your regular updates and pull them into your own purpose built solution.

Power Query and Power Pivot for the win here. That’s what I’m thinking. :)