Power Query slow and unstable after February update

DirectorAlwyn

New member
Joined
Apr 4, 2016
Messages
8
Reaction score
0
Points
0
To start, I am also working with the Microsoft team on this (Minh specifically) after sending a number of frowns, but I am hoping someone has ideas on what we might be able to do to help with this, as right now Power Query is causing a number of issues, and as the local Power Query advocate I am in not the best place.

I have a tool that basically was my main learning piece for Power Query that incorporates Power Query, Power Pivot, Excel and Outlook VBA, and some other internal aspects to my company. It has been working nicely since October, with the main analysis step (where quite a few different queries are performed) taking between five and ten minutes. Unfortunately when I installed the February update, the tool completely ceased working and would just hang Excel. After a number of discussions we tried it on 64 bit Excel (32 bit Office 2013 is currently the standard at my company), and discovered that it would work there, but takes 30-50 minutes to run. Beyond this tool specifically, a number of other queries I built have started taking a lot longer than they used to, and one very simple query built to pull from SharePoint fails to even open and causes Excel to crash out entirely. While this doesn't explain everything, I have particularly noticed issues with combining files together, with even small numbers of files (three to five with less than ten rows/columns each) taking a couple of minutes. Excel is frequently using over 500MB of RAM itself, and the various Mashup processes often add enough that all told it's eating over 1 GB and 99% of my CPU. It used to be that it might spike that high for a few seconds right when opening a file, but never stayed that way, which it now does pretty much whenever I have anything with Power Query open (and the Excel usage is high even with a blank workbook, around 300 MB.

In addition, a number of others around the office that I have worked with on Power Query are now experiencing some pretty significant Excel instability even when not using Power Query actively, with Excel crashing out multiple times a day.

The details on the main tool problem are below (from my discussions with Minh at Microsoft), as that is where I have squeezed in most of the analysis on trying to lock this down. For reference most everyone (including me) was using Windows 7 Enterprise 64 bit, with Office 2013 32 bit, RAM varying between 4-16 GB (I have 8 GB), processors are a bit all over the place but nothing we are working on has more then 50,000 rows so I was not anticipating major issues.

If there is any other information I can provide, please let me know. This is an amazing tool that I love working with, I just wish it was actually working at the moment.

---------------------

The particular step of the process I’m using that was hanging for so long had a number of parts (several queries run, analysis performed in PowerPivot, updates done to the pivot table, etc.), but the part that’s having the problem is the step where multiple Excel workbooks have data combined into a single column (M code below). Before the update a couple of weeks back, this would take maybe a couple of minutes to run, but this is now taking over half an hour to run for ~3,000 workbooks, with a total of ~32,000 rows. If I remove the step where this query is refreshed, all the other steps combined complete in less than a minute, but the overall process with this is now taking ~40 minutes to run.

Visually, what I see on the bottom right corner is that it is “Connecting to Datasource…” for the entire time, once that shifts everything else completes rapidly. The workbooks are all saved locally on my PC, so it’s not a network issue. The workbook with all the Power Query normally is on our network, but I have tried it with a version on my local PC as well to see if that made a difference, but that did not change anything.

I was originally doing the combination with a custom function, but tried doing it using a different set of steps to see if that made a difference, but it seemed to run the same way.

If there is anything else I can provide on this, please let me know, I would love to get this working better.

Original Version of combination:

//Pull the raw data from the Expo Customs queries
let
//Bring in data from the folder that holds all of the query responses from Customs
Source = Folder.Files("C:\Expo Customs Query Responses"),
//Clear out other file types that might contaminate, for instance .xlsm of the tool being stored in the same folder
#"Only show .xlsx files" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
//Pull in the actual C1 records response so we can do our own formatting
#"Keep Name and Folder Path columns only" = Table.RemoveColumns(#"Only show .xlsx files",{"Content", "Date accessed", "Date modified", "Date created", "Attributes", "Extension"}),
#"Use GetC1 Function" = Table.AddColumn(#"Keep Name and Folder Path columns only", "File Contents", each GetC1([Folder Path],[Name])),
#"Pull in C1 Records" = Table.ExpandTableColumn(#"Use GetC1 Function", "File Contents", {"Column1"}, {"File Contents.Column1"}),
#"Keep C1 Records only" = Table.RemoveColumns(#"Pull in C1 Records",{"Name", "Folder Path"}),
#"Remove Header" = Table.SelectRows(#"Keep C1 Records only", each ([File Contents.Column1] <> "C1 Record")),
#"Rename C1 Records" = Table.RenameColumns(#"Remove Header",{{"File Contents.Column1", "C1 Records"}})
in
#"Rename C1 Records"

C1 Function definition:

//Function to extract the C1 Records raw data from each file in the defined folder
let ExcelFile = (FilePath, FileName) =>
let
//Pull data from a variable folder
Source = Folder.Files(FilePath),
//Pull data from a variable workbook
#"File" = Source{[#"Folder Path"=FilePath,Name=FileName]}[Content],
//Open the workbook up to see what tables are available
#"Imported Excel" = Excel.Workbook(#"File"),
//Navigate to the C1 Records sheet and pull all the data from it
#"C1 Records_Sheet" = #"Imported Excel"{[Item="C1 Records",Kind="Sheet"]}[Data]
in
#"C1 Records_Sheet"
in
ExcelFile



Alternate Version of combination:

//Pull the raw data from the Expo Customs queries
let
//Bring in data from the folder that holds all of the query responses from Customs
Source = Folder.Files("C:\Expo Customs Query Responses"),
//Clear out other file types that might contaminate, for instance .xlsm of the tool being stored in the same folder
#"Only show .xlsx files" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
#"Filtered Rows" = Table.SelectRows(#"Only show .xlsx files", each Date.IsInCurrentDay([Date created])),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "C1 Records", each Excel.Workbook([Content])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
#"Expanded C1 Records" = Table.ExpandTableColumn(#"Removed Columns", "C1 Records", {"Name", "Data"}, {"C1 Records.Name", "C1 Records.Data"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded C1 Records", each ([C1 Records.Name] = "C1 Records")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"C1 Records.Name"}),
#"Expanded C1 Records.Data" = Table.ExpandTableColumn(#"Removed Columns1", "C1 Records.Data", {"Column1"}, {"C1 Records.Data.Column1"}),
#"Filtered Rows2" = Table.SelectRows(#"Expanded C1 Records.Data", each ([C1 Records.Data.Column1] <> "C1 Record")),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows2",{{"C1 Records.Data.Column1", "C1 Records"}})
in
#"Renamed Columns"
 
Hey there, and welcome to the forum.

First thing I'd say is that with that drastic a change, I probably would have tried to loop you in with the Power Query team directly. It looks like you're already there though, so that's a good thing.

With regards to your actual queries, my guess is that the version using the C1 function runs slower than the other. The reason I say this is that you enumerate the list of all files in the folder every single time you call it. For that reason, I'd probably go to your alternate version, which enumerates all files once, then extracts using the Workbook([Content]) method.

In addition, you may want to try the following modification. This may or may not help anything, so I'm curious to hear the results:
Code:
[COLOR=#333333]Source = [/COLOR][COLOR=#ff0000][B]Table.Buffer([/B][/COLOR][COLOR=#333333]Folder.Files("C:\Expo Customs Query Responses")[/COLOR][COLOR=#ff0000][B])[/B][/COLOR][COLOR=#333333],[/COLOR]

And
Code:
[COLOR=#333333]#"Added Custom" = [/COLOR][COLOR=#ff0000][B]Table.Buffer([/B][/COLOR][COLOR=#333333]Table.AddColumn(#"Removed Other Columns", "C1 Records", each Excel.Workbook([Content]))[/COLOR][COLOR=#ff0000][B])[/B][/COLOR][COLOR=#333333],[/COLOR]

And finally
Code:
[COLOR=#333333]#"Expanded C1 Records.Data" = [/COLOR][COLOR=#ff0000][B]Table.Buffer([/B][/COLOR][COLOR=#333333]Table.ExpandTableColumn(#"Removed Columns1", "C1 Records.Data", {"Column1"}, {"C1 Records.Data.Column1"})[/COLOR][COLOR=#ff0000][B])[/B][/COLOR][COLOR=#333333],[/COLOR]


I've heard mixed reviews on whether this will have an effect or not. I'd be curious to know if it helps you here though.
 
Hello Ken,

Thank you for the assistance! I'll try the Table.Buffer on both versions and see how that goes. As for the relative performance of the two different solutions, it's a little hard to tell because the time is variable (and I can't do a lot of testing since this completely locks down my Excel while it runs, which as you might imagine impacts me fairly heavily) but there was not a significant difference, and the only time I did see a difference it was in the other direction, the function was faster. That's one of the reasons I think there's an actual bug somewhere in the update with regards to how my system is handling the combination (I was only using the function because I hadn't read your book yet, heh), as there shouldn't be any reason for that to be faster.

I have talked to the Power Query team on this a few times, but all of the suggestions were to modify various settings in Power Query (which I did or had already done, like Fast Data Load), haven't heard anything else in a couple of weeks.

The part that really confuses me is "CONNECTING TO DATASOURCE" being where it takes forever. Is that just a generic message for Power Query running? I've never had anything else stay showing that message for so long.
 
Okay, well if you're going to use the function, make sure you modify this inside it:

Code:
[COLOR=#333333]Source = Table.Buffer(Folder.Files(FilePath)),

That will make a difference.

Let me know how you make out with that first. I'm not ruling out an update issue by any stretch, but let's see if that fixes anything first.[/COLOR]
 
Oof,
Code:
[COLOR=#333333]Source = Table.Buffer(Folder.Files(FilePath)), [/COLOR]
added a good twenty minutes to the function version (and it was just that one, I had started running it with just the first buffer around the original folder call before I saw your second post).

I'm going to try the other version tomorrow, I have some meetings that I can run tests during.
 
Damn, sorry. That buffered inside the function. Ignore that, my brain wasn't in the right place. My intent was to buffer the table of files outside the function, avoiding any updates to Folder.Files. That can't happen without modifying the function to avoid making another call to Folder.Files the way you are. So that was totally the wrong thing to do.

Regardless, using the function should still end up slower overall. What's happening right now is you are doing this:
  • Listing all files in the folder
  • Getting the file path and passing it to a function
  • Listing all files in the folder
  • Filtering to just the file you want
  • Extracting the content

Your second alternative is doing this:
  • Listing all files in the folder
  • Filtering to just the file you want
  • Extracting the content

I'd lean towards the latter one...
 
Alrighty, I tried the alternate version without the function with the Table.Buffer at the steps indicated, but there was no significant difference in the time it took. Somewhere between 30-40 minutes again (I had to step away from my desk near the end). However, I did notice that my computer was using less in the way of resources, something like a third of the memory on the main Excel process (though the Mashups were just as resource intensive, and CPU usage did not seem to change).
 
Back
Top