Pain Points – Power Query Credential Management

After my last post on Power Query Security Woes, Ehren – a developer on the Power Query team – sent me a message on Twitter about it.  I will blog about that later, after I’ve had a chance to test it out, but before I did, I thought I’d go and clean up some PQ stuff that I messed up.

A little history…

A while back I got the bright idea to pull down our web leads (at my day job).  I then though, “Hey, wouldn’t it be cool if I could plot all of them on Power Map?”  Sure it would.  But all I had was their IP address.  “No problem!”, I figured, “I’ll just use Power Query to feed out the IP to a web service and return a function to turn it into a real address!”

Success!

I cooked up a little script fed the IP to http://ip-api.com, which did exactly that. Here’s a sample from a random address:

image

I actually tried several services before I settled on this one, but ultimately  it seemed to generate the most accurate results.  Perfect, so I let the script fly, and it was awesome!

Success turns to failure…

Except… somewhere during my batch of 3,500 addresses it stopped working.  And when I went back to the site, I found out that I’d been blocked.  Ooops.  Apparently if you send over a certain threshold of queries in a certain amount of time you’re not a good person. (That’s why I’m not sharing the script.)  Sorry about that ip-api.com, I promise I’ll be good in future!

… and creates a mess in the process…

At any rate, the issue I then ran into is that I ended up with an entry for each URL in my Data Source Settings.  I was a while back, so I don’t remember if it happened by default, or if it happened because I was learning and just clicked the wrong thing.  End result is that I have an entry in my list for each IP I queried.

image

Since I’m going to be a good boy and not steamroll their site any more, I kind of want to clean them up.  No problem, right?  Click on the first address on screen, hold down SHIFT and click on the last address on screen and… huh?  I’ve only got the last address?  Okay, let’s try with CTRL… click, click.. ARGH!!

… and the mess turns into major frustration!

This interface only allows you to select one item at a time?  Really?  So to delete my 200 entries I have to click the line item, click Delete (at the top), click “Yes I’m sure” and then move on to the next one.  Ouch.  I see a lot of wasted time in my future if I really want to clean up now.  🙁

The missing feature(s)

Power Query team, (if you’re still listening,) please give us the ability to multi-select in this dialog, and multi-delete items.  We need to be able to clean up, as we’re all human and make mistakes as we go along.  This feels like a severe punishment right now!

In addition, while I’m here, why isn’t there an “Add” button in this interface?  When I go to edit one of these addresses, I see the following:

image

I’m guessing that if I’d just set that to the root at the beginning, I wouldn’t have an entry for every IP in that query.  Man!  If only I’d been able to declare this up front and realized that this was an option!  Yet there is no way to do this from a simple UI.  Instead, I (believe) I have to:

  • Create a query to one page at the domain
  • Set it’s security level as I pull data
  • Save the query
  • Go back to this interface and edit the anonymous access

Life would be SO much easier if I just had the ability to go into this UI and create my favourite and most used sites.  (And bonus points if I could push them out to users through group policy like I can with Office Trusted Locations!)

Okay, enough for now.  I’ll be back later once I’ve tested Ehren’s suggestion.  🙂

Power Query Security Woes

Recently I’ve been working on automating a business process for a client.  It’s become a really interesting project that uses a mix of Excel tables, Power Query, Power Pivot and VBA, as well as a WinAutomation script.  Before I talk about the issue that is facing me, it would be helpful to provide a bit of background on what we’re doing, and why so many of the Power BI pieces are needed.

Solution Background

My client has a business in which they outsource employees to other companies.  Each of those companies maintains the records of hours, and lets my client download a spreadsheet version of the hours logged by the subcontracted employee.  This is fairly helpful, as my client is the one that pays all of the employees, so getting the hours lets him know who gets paid for how many hours.  Easy enough, right?  But each spreadsheet is in a different format, and that causes a lot of manual entry to try and standardize it into a file that can be uploaded to the payroll processor.

So here we enter Power Query.  With Power Query we can import each of the files individually, reformat the data into a consistent set of columns, and output it into a table.

Each pay period we start a new Excel file, and import the data files for that pay period by running a WinAutomation script.  The script logs in to the remote systems, passes the correct parameters to the system, and downloads the Excel and CSV files required.  It then saves them in a subdirectory of our application under the pay period end date.  My Power Query scripts then use the pay period end date, dynamically build a file path to the file and import the data.

It’s beautiful… except…

The Issue

Here’s where the pain begins…  Every time you touch a new data file in Power Query that you haven’t used before you get a security message asking you what type of data (Public, Private or Organizational) you are using.  (You can learn more about those here:  http://office.microsoft.com/en-us/excel-help/privacy-levels-HA104009800.aspx) The issue I have is that each payroll I create new files in a subdirectory, so Power Query sees them as unique.  To that end I can’t just trust the data files once and be done with it.

Now, there is a workaround… just enable the Fast Combine feature.  That kills off the warning and lets them go, but it also has an issue.  This setting is workbook specific, which means that when my VBA saves the master payroll control file under a new name (we preserve history this way) the setting doesn’t stay selected.  Grrr.  Given that there is no way to touch this setting via code, my user has to remember to click the Fast Combine button every time they run the update.  Is it minor?  Sure.  But it’s ridiculous, it gets forgotten, and concerns them when they get the permissions messages.

The Solution We Need

Now don’t get me wrong, I’m all for security, but where it makes sense.  In Office 2007 we got a new macro paradigm that allowed us to trust folders on our computers/networks.  This setting is set on an application level basis, and persists between Excel sessions.  Beautiful, as I can set it to a specific folder and forget it.  I throw trusted files in there, I don’t throw in the ones I don’t know.  It actually allows me to practice safer computing overall.

In my opinion, this setting is drastically lacking in Power Query.  I really need the ability to set my Power Query add-in to have trusted root folders with trusted subfolders.  This would allow me to trust my application’s data directory and not have to remember to click Fast Combine each time I create new files.  It also means that I might pay attention to the message when it does come up in future, as it would be unusual.  Currently I see the error so much I just cancel it and go straight to Fast Combine.  Hmm… kind of like how we set macro security to “Low” in Excel 2003 and earlier to bypass the annoying message, which essentially left us unprotected.

The Solution (Some of us) Can Implement

To be fair, I believe I do have a workaround for this now, but I don’t think it should be necessary.

Basically what I could do is use VBA to drive the refresh of my tables, and therefore the Power Query scripts.  Given that, I could use VBA to copy and replace the files in a central repository where the hierarchy does not change.  I.e. I could set up a folder like AppData\Timesheets\Current and use VBA to copy the required data files from my dated subfolders, replacing the ones in the “current” folder.  I would then target my Power Query scripts against the files in the “Current” folder, and build my solution off that.

Because Power Query holds a list of trusted files at an application level (rather than workbook level), these files should stay trusted even if I do create a new file, removing the need to constantly click the “Fast Combine” button.  So I think this should work.

So what’s the problem?  It only involves creating a VBA macro to do that.  Fine for me, as I know how to code with VBA and make this happen.  But for most of the Power Query target market I would suggest that this is probably outside of their normal skill set.

Final Thought

Even though it is technically possible to work around this issue, I still argue should not be necessary.  We need a proper option to trust the files in a local folder so that dynamically referring to a file in Power Query can be scripted without invoking a painful security paradigm.