I feel like I need to test this now, but my gut feel is csv. Given that it is a flat file, there should be a lot less overhead vs the more complicated structure of the Excel file.
I feel like I need to test this now, but my gut feel is csv. Given that it is a flat file, there should be a lot less overhead vs the more complicated structure of the Excel file.
Ken Puls, FCPA, FCMA, MS MVP
Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!
Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter
If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.
Okay, so based on a pretty simple data set with 4 columns and 958,751 rows...
Loading from CSV took 9.3 seconds on average vs 13.4 seconds for XLSX. SO CSV is about 30% faster based on my test.
Ken Puls, FCPA, FCMA, MS MVP
Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!
Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter
If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.
How I can measure the runtime for individual sections in VBA is clear to me. But how I can measure the runtime in Power Query, only for importing the data, is not clear to me. Can you explain how you did that.
In order to get a speed test of a block of code I create a query that performs just the steps I want to analyze, make sure it loads to a worksheet or the data model, and then run the TimeSleuth feature of my Monkey Tools add-in on it.
Originally Posted by Nick Burns
It's hard to say without knowing what your code does, as different operations can affect things in different ways. What you could try is making a backup of your file, duplicate the query that is taking too long, then split it in half by right clicking and choosing "Extract Previous" on one of the applied steps. Load both those and test the speed of each. The second query in that chain will always take longer than the first, but if the time is very close, then the problem step is in the first half. If the second query takes way longer, then you know the problem step is in that area.
To narrow it down, set the query you aren't concerned with to connection only, then repeat the process of splitting the problem one in half until you identify the step that is causing the longer load.
Honestly, this is a painful situation, and involves a lot of waiting. I wish we had the Query Diagnostics in Excel that Power BI has. But until we do, this is how I would approach it. (To be fair, the more you do this you can look at the operations in the Query Sleuth and identify things that might take longer... merges, sorts, reordering are all things that would become my first targets when perf testing.
Ken Puls, FCPA, FCMA, MS MVP
Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!
Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter
If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.
Interesting approach.
I do have some merges and sorts in the model. For reordering, are you talking about the column order? I try to keep them at a minimum and only ordering as a last step. Also I seem to recall that using Select Columns is a more efficient process then doing a reorder. Have you encountered this before?
Thanks again for your help!
Oh... by the way, YOU'RE WELCOME!
Yep, that's the type of of ordering I'm talking about. Personally, I never re-order columns as I always load data to the data model anyway.
I haven't actually tested if SelectColumns is faster that an explicit re-order step. Would be easy to do though, just duplicate the specific query, and change on to use the other method. At that point you could test to the speed pretty easily.
Ken Puls, FCPA, FCMA, MS MVP
Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!
Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter
If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.
Bookmarks