Recently, I had to import a large-ish CSV file at around 1.5 million rows into Excel using Power Query.
I used a From Text/CSV query then proceeded to transform the data and merge a couple other tables with it. I did this all in the one query then loaded it into Power Pivot to analyze. It worked great and Power Query/Pivot handled it easily.
Then I had to look at a longer time period.
I tried to export a new set of data with a longer timeline. The system I was exporting from had trouble before with a much smaller time period, and sure enough, it wasn't working with the new larger time period.
This meant I had to export a couple smaller files and glue them together. I needed a From Folder query instead of the single CSV query.
My original query had a lot of transformation steps in it after importing, so I didn't want to just throw it away and start over.
In this post, we'll look at how we can convert a From Text/CSV query into a From Folder query.
You can download the before and after workbooks along with the CSV files used in this post here.
The Original Query
Ok, this isn't my original query or data and the transformations are spurious and just for example.
Here's a From Text/CSV query. Excel will automatically create the first three steps in the query which define the source of the data, promote the first row of data to headers and then changes the data types.
The remaining steps are the transformations I've added to the query. These are the transformations I want to preserve and don't want to have to go through creating again by starting from scratch with a new From Folder query.
I've loaded this query into a table in the workbook and named it ReportData.
In this example, there aren't a lot of added steps. But you can imagine with multiple merges, added custom columns and other transformations it could be a pain to reproduce within a new query.
Create a New From Folder Query
We are going to need to create a new From Folder query.
Go to the Data tab and press the Get Data button then choose From File and then From Folder.
In the Folder dialog box we can either copy and paste the path of the folder which contains the set of reports or browse to it.
The next window will show a preview of the files in the folder and we can press the Combine button and then choose the Combine & Load To option.
The Combine Files window will appear and we can go with the default options and press the Ok button.
Now select Only Create Connection in the Import Data dialog box.
Power Query creates a new query named after the folder. In our example here, mine is called Reports.
Edit the M Code of our Previous Query
Now we are going to edit the M code of our original query which was called ReportData. In the Queries & Connections window pane, right click on the ReportData query and select Edit to open up the query editor.
We need to make it reference our newly created folder query as its source.
Go to the Home tab and press the Advanced Editor button to open up the M code editor.
We now need to edit a couple of lines of the code highlighted in yellow above. These are the lines of code from the original single file CSV import.
- The first line of code is the source of the data. It points to a single CSV file with a folder path and file name. We need to delete this part and replace it with a reference to our new folder query. The new source is now going to be the Reports folder query so we can replace this with #"Reports".
- The second line is another step automatically added to our original query which promotes the first row of data to column headers. This step now happens in our new folder query, so we don't need this and can delete it.
- The third line changes data types and is not needed we can also delete this step.
- The fourth line is the first step in our data transformation, but it references the previous step which we deleted. We need to update it to reference the Source step.
Finishing and Loading the Query
The revised M code should now look like this:
We have updated the source to reference our new folder query. We have removed the promote headers and change data type steps. The remaining step's references have been updated to reference the previous step.
Now we can press the Done button and then Close & Load the query. We should see our table update to include all the data from the files in our folder.
It's not a terribly complicated process to change the source of a query and update it from a single file import to a folder import. But why not avoid it in the first place?
Making mistakes allows for the opportunity to learn new thing and in this instance, I learned two things about importing text, CSV or Excel files into Power Query:
- Import a clean untouched version of the data as a connection only, then reference it in a new query for any transformation steps.
- Use a From Folder query instead of a From Text/CSV or From Workbook query. This way you can easily add more files to the folder for import at a later time.