Results 1 to 8 of 8

Thread: Datasource: XLS(x) vs CSV?

  1. #1
    Conjurer Nick Burns's Avatar
    Join Date
    May 2017
    Posts
    102
    Articles
    0
    Excel Version
    Office 365

    Datasource: XLS(x) vs CSV?



    Register for a FREE account, and/
    or Log in to avoid these ads!

    I have the opportunity to either read my datasource from a CSV file or an Excel file (2007 or 2019).

    Which would be better as far as performance for running PQ?
    Oh... by the way, YOU'RE WELCOME!

  2. #2
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,450
    Articles
    44
    Blog Entries
    14
    Excel Version
    Excel Office 365 Insider
    I feel like I need to test this now, but my gut feel is csv. Given that it is a flat file, there should be a lot less overhead vs the more complicated structure of the Excel file.
    Ken Puls, FCPA, FCMA, MS MVP

    Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  3. #3
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,450
    Articles
    44
    Blog Entries
    14
    Excel Version
    Excel Office 365 Insider
    Okay, so based on a pretty simple data set with 4 columns and 958,751 rows...

    Loading from CSV took 9.3 seconds on average vs 13.4 seconds for XLSX. SO CSV is about 30% faster based on my test.
    Attached Thumbnails Attached Thumbnails Click image for larger version. 

Name:	csv_vs_xlsx.png 
Views:	10 
Size:	12.3 KB 
ID:	10286  
    Ken Puls, FCPA, FCMA, MS MVP

    Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  4. #4
    Conjurer pinarello's Avatar
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    112
    Articles
    0
    Excel Version
    Office 365
    How I can measure the runtime for individual sections in VBA is clear to me. But how I can measure the runtime in Power Query, only for importing the data, is not clear to me. Can you explain how you did that.

  5. #5
    Conjurer Nick Burns's Avatar
    Join Date
    May 2017
    Posts
    102
    Articles
    0
    Excel Version
    Office 365
    Quote Originally Posted by Ken Puls View Post
    Okay, so based on a pretty simple data set with 4 columns and 958,751 rows...

    Loading from CSV took 9.3 seconds on average vs 13.4 seconds for XLSX. SO CSV is about 30% faster based on my test.
    That was what I suspected - thanks for verifying!

    Now I just have to figure out why my report (45c x 3500r) takes so long to parse.

    I have MonkeyTools installed - any guidance on what how I can use it to improve my models?
    Oh... by the way, YOU'RE WELCOME!

  6. #6
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,450
    Articles
    44
    Blog Entries
    14
    Excel Version
    Excel Office 365 Insider
    Quote Originally Posted by pinarello View Post
    How I can measure the runtime for individual sections in VBA is clear to me. But how I can measure the runtime in Power Query, only for importing the data, is not clear to me. Can you explain how you did that.
    In order to get a speed test of a block of code I create a query that performs just the steps I want to analyze, make sure it loads to a worksheet or the data model, and then run the TimeSleuth feature of my Monkey Tools add-in on it.

    Quote Originally Posted by Nick Burns
    I have MonkeyTools installed - any guidance on what how I can use it to improve my models?


    It's hard to say without knowing what your code does, as different operations can affect things in different ways. What you could try is making a backup of your file, duplicate the query that is taking too long, then split it in half by right clicking and choosing "Extract Previous" on one of the applied steps. Load both those and test the speed of each. The second query in that chain will always take longer than the first, but if the time is very close, then the problem step is in the first half. If the second query takes way longer, then you know the problem step is in that area.

    To narrow it down, set the query you aren't concerned with to connection only, then repeat the process of splitting the problem one in half until you identify the step that is causing the longer load.

    Honestly, this is a painful situation, and involves a lot of waiting. I wish we had the Query Diagnostics in Excel that Power BI has. But until we do, this is how I would approach it. (To be fair, the more you do this you can look at the operations in the Query Sleuth and identify things that might take longer... merges, sorts, reordering are all things that would become my first targets when perf testing.
    Ken Puls, FCPA, FCMA, MS MVP

    Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

  7. #7
    Conjurer Nick Burns's Avatar
    Join Date
    May 2017
    Posts
    102
    Articles
    0
    Excel Version
    Office 365
    Quote Originally Posted by Ken Puls View Post
    In order to get a speed test of a block of code I create a query that performs just the steps I want to analyze, make sure it loads to a worksheet or the data model, and then run the TimeSleuth feature of my Monkey Tools add-in on it.



    [/COLOR]It's hard to say without knowing what your code does, as different operations can affect things in different ways. What you could try is making a backup of your file, duplicate the query that is taking too long, then split it in half by right clicking and choosing "Extract Previous" on one of the applied steps. Load both those and test the speed of each. The second query in that chain will always take longer than the first, but if the time is very close, then the problem step is in the first half. If the second query takes way longer, then you know the problem step is in that area.

    To narrow it down, set the query you aren't concerned with to connection only, then repeat the process of splitting the problem one in half until you identify the step that is causing the longer load.

    Honestly, this is a painful situation, and involves a lot of waiting. I wish we had the Query Diagnostics in Excel that Power BI has. But until we do, this is how I would approach it. (To be fair, the more you do this you can look at the operations in the Query Sleuth and identify things that might take longer... merges, sorts, reordering are all things that would become my first targets when perf testing.
    Interesting approach.

    I do have some merges and sorts in the model. For reordering, are you talking about the column order? I try to keep them at a minimum and only ordering as a last step. Also I seem to recall that using Select Columns is a more efficient process then doing a reorder. Have you encountered this before?

    Thanks again for your help!
    Oh... by the way, YOU'RE WELCOME!

  8. #8
    Administrator Ken Puls's Avatar
    Join Date
    Mar 2011
    Location
    Nanaimo, BC, Canada
    Posts
    2,450
    Articles
    44
    Blog Entries
    14
    Excel Version
    Excel Office 365 Insider
    Yep, that's the type of of ordering I'm talking about. Personally, I never re-order columns as I always load data to the data model anyway.

    I haven't actually tested if SelectColumns is faster that an explicit re-order step. Would be easy to do though, just duplicate the specific query, and change on to use the other method. At that point you could test to the speed pretty easily.
    Ken Puls, FCPA, FCMA, MS MVP

    Learn to Master Your Data at the Power Query Academy (the world's most comprehensive online Power Query training), with my book M is for Data Monkey, or our new Power Query Recipe cards!

    Main Site: http://www.excelguru.ca -||- Blog: http://www.excelguru.ca/blog -||- Forums: http://www.excelguru.ca/forums
    Check out the Excelguru Facebook Fan Page -||- Follow Me on Twitter

    If you've been given VBA code (a macro) for your solution, but don't know where to put it, CLICK HERE.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •