From TXT files to BI solution

Several years ago, when Power Pivot first hit the scene, I was talking to a buddy of mine about it. The conversation went something like this:

My Friend: “Ken, I’d love to use PowerPivot, but you don’t understand the database culture in my country. Our DB admins won’t let us connect directly to the databases, they will only email us text files on a daily, weekly or monthly basis.”

Me: “So what? Take the text file, suck it into PowerPivot and build your report. When they send you a new file, suck it into PowerPivot too, and you’re essentially building your own business intelligence system. You don’t need access to the database, just the data you get on a daily/weekly basis!”

My Friend: “Holy #*$&! I need to learn PowerPivot!”

Now, factually everything I said was true. Practically though, it was a bit more complicated than that. See, PowerPivot is great at importing individual files into tables and relating them. The issue here is that we really needed the data appended to an existing file, rather than related to an existing file. It could certainly be done, but it was a bit of a pain.

But now that we have Power Query the game changes in a big way for us; Power Query can actually import and append every file in a folder, and import it directly into the data model as one big contiguous table. Let’s take a look in a practical example…

Example Background

Assume we’ve got a corporate IT policy that blocks us from having direct access to the database, but our IT department sends us monthly listings of our sales categories. In the following case we’ve got 4 separate files which have common headers, but the content is different:

clip_image002

With PowerPivot it’s easy to link these as four individual tables in to the model, but it would be better to have a single table that has all the records in it, as that would be a better Pivot source setup. So how do we do it?

Building the Query

To begin, we need to place all of our files in a single folder. This folder will be the home for all current and future text files the IT department will ever send us.

Next we:

  • Go to Power Query --> From File --> From Folder
  • Browse to select our folder
  • Click OK

At this point we’ll be taken in to PowerQuery, and will be presented with a view similar to this:

clip_image004

Essentially it’s a list of all the files, and where they came from. The key piece here though, is the tiny little icon next to the “Content” header: the icon that looks like a double down arrow. Click it and something amazing will happen:

clip_image005

What’s so amazing is not that it pulled in the content. It’s that it has actually pulled in 128 rows of data which represents the entire content of all four files in this example. Now, to be fair, my data set here is small. I’ve also used this to pull in over 61,000 records from 30 csv files into the data model and it works just as well, although it only pulls the first 750 lines into Power Query’s preview for me to shape the actual query itself.

Obviously not all is right with the world though, as all the data came in as a single column. These are all Tab delimited files, (something that can be easily guessed by opening the source file in Notepad,) so this should be pretty easy to fix:

  • Go to Split Column --> Delimited --> Tab --> OK
  • Click Use First Row as Headers

We should now have this:

clip_image006

Much better, but we should still do a bit more cleanup:

  • Highlight the Date column and change the Data Type to Date
  • Highlight the Amount column and change the Data Type to Number

At this point it’s a good idea to scroll down the table to have a look at all the data. And it’s a good thing we did too, as there are some issues in this file:

clip_image007

Uh oh… errors. What gives there?

The issue is the headers in each text file. Because we changed the first column’s data type to Date, Power Query can’t render the text of “Date” in that column, which results in the error. The same is true of “Amount” which can’t be rendered as a number either since it is also text.

The manifestation as errors, while problematic at first blush, is actually a good thing. Why? Because now we have a method to filter them out:

  • Select the Date column
  • Click “Remove Errors”

We’re almost done here. The last things to do are:

  • Change the name (in the Query Settings task pane at right) from “Query 1” to “Transactions”
  • Uncheck “Load to worksheet” (in the bottom right corner)
  • Check “Load to data model” (in the bottom right corner)
  • Click “Apply & Close” in the top right

At this point, the data will all be streamed directly into a table in the Data Model! In fact, if we open PowerPivot and sort the dates from Oldest to Newest, we can see that we have indeed aggregated the records from the four individual files into one master table:

clip_image008

Implications

Because PowerQuery is pointed to the folder, it will stream in all files in the folder. This means that every month, when IT sends you new or updated files, you just need to save them into that folder. At that point refreshing the query will pull in all of the original files, as well as any you’ve added. So if IT sends me four files for February, then the February records get appended to the January ones that are already in my PowerPivot solution. If they send me a new file for a new category, (providing the file setup is 3 columns of tab delimited data,) it will also get appended to the table.

The implications of this are huge. You see, the issue that my friend complained about is not limited to his country at all. Every time I’ve taught PowerPivot to date, the same issue comes up from at least one participant. IT holds the database keys and won’t share, but they’ll send you a text file. No problem at all. Let them protect the integrity of the database, all you now need to do is make sure you receive your regular updates and pull them into your own purpose built solution.

Power Query and Power Pivot for the win here. That’s what I’m thinking. 🙂

Un-Pivoting Data in Power Query

I was fooling around with the latest build of Power Query (for Excel 2010/2013), and I’ve got to say that I’m actually really impressed with this piece. I think the user experience still needs a bit of work, but once you know how, this is really simple.

Whenever I teach courses on PivotTables people want to eat them up, but often they’ve already got their data in a format that resembles the PivotTable output, rather than a format that is consumable by the Pivot engine. Take the following set of data.

SNAGHTML111988

A classic setup for how people think and want to see the data. But now what if the user wanted to see the records by month under the category breakdowns? An easy task for a Pivot, but it would require a bit of manipulation without it. So how do we get it back to a Pivot compliant format?

With Power Query it’s actually super simple:

  • Step 1: Select the data and turn it into a table
  • Step 2: Select Power Query –> From Table to suck it in to Power Query
  • Step 3: Un-pivot it

Okay, so this one takes a bit of know how right now. If you just click Unpivot, you get something really weird:

image 

What actually needs to happen, which isn’t totally intuitive, is we need to click the header for the January column, hold down CTRL and click the February and March column headers:

SNAGHTML18e3d9

Once we’ve done that we click Unpivot and TA-DA!

image

So by virtue of being able to choose one or more columns, it’s actually quite flexible as you can choose HOW to un-pivot, which is uber-awesome. It’s a shame the UI doesn’t help you realize that today, but hopefully that gets fixed in future.

At any rate, you can right click the headers to rename the columns, then click “Apply and Close” and we end up with the un-pivoted source in our worksheet. And now we can create a new PivotTable off that source:

SNAGHTML1b48f1

Even better, if we add some new data to our original report:

SNAGHTML1cfc48

We can then refresh the query on the worksheet and the records are pulled in. With a refresh on the Pivot as well (yeah, unfortunately you can’t just refresh the pivot to drive the query to update) it’s all pulled in:

SNAGHTML1dae1a

Pretty damn cool!

Newsletter and blog updates

Today marked a new milestone for me, as I actually sent out a newsletter to everyone who has ever signed up at Excelguru.ca.  I was little hesitant, but figured I'd give it a go and see how people reacted.

I actually got some really nice feedback from people, which was great, even asking how to get others to sign up.

To that end, I've added a newsletter signup button on my Facebook page at www.facebook.com/xlguru.  And, since my blog theme was getting a bit stale, I've updated that as well, adding a link on the right hand side here.

I'm not totally sure I'm sold on the new theme, but let me know what you think.

PowerPivot training live in Victoria, BC!

Anyone who follows my website or Facebook Fan Page knows that I’m a huge fan of PowerPivot. Well great news now, that you can come and learn not only why this is the best thing to happen to Excel in 20 years, but also how to take advantage of it yourself!

I’ll be teaching a course on PowerPivot and DAX in Victoria, BC on November 22nd, 2013. While the course is hosted by the Chartered Professional Accountants, it’s open to anyone who wishes to subscribe.

If you’ve been trying to figure out how to get started with Power Pivot, you’re confused as to how and why things work, or you want to master date/time intelligence in PowerPivot, this course is for you.

100% hands on, we’ll start with basic pivot tables (just as a refresher.) Next we’ll look at how to build the PowerPivot versions and why they are so much more powerful. From linking multiple tables together without a single VLOOKUP to gaining a solid understanding of relating data, you’ll learn the key aspects to building solid PowerPivot models. We’ll work through understanding filter context, measures and relationships, and finish the day by building measures that allow you to pull great stats such as Month to Date for the same month last year, among others.

Without question, you’ll get the most out of this course if you’re experienced with PivotTables. If you don’t feel like you’re a pro in that area though, don’t worry! As an added bonus, to anyone who signs up via this notice, please let me know. I’ll provide you with a free copy of my Magic of PivotTables course so that you can make sure you’re 100% Pivot Table compatible before you arrive.

This is a hands on course, so you need to bring a laptop that is pre-loaded with either:

  • Excel 2010 and the free PowerPivot download
  • Excel 2013 with Office 2013 Professional Plus installed (yes the PLUS is key!)
  • Excel 2013 with Excel 2013 standalone installed.

If you have any questions as to which you have installed, simply drop me a note via the contact form on my website, and I’ll help you figure it out.

Full details of the course contents as well as a registration link, can be found at http://www.icabc-pd.com/pd-seminars-seminar.php?id=2849. Don’t wait too long though, as registration deadline is November 14th!

Hope to see you there!