Get a table from each Excel file in folder | speed up the process

marcob8986

New member
Joined
Dec 4, 2020
Messages
7
Reaction score
0
Points
0
Excel Version(s)
365
Hi,
I've setup a query to read the content of a Table inside several Excel files in a folder.
My problem is that those files are quite big (3MB+ each) and when the folder contains many files the process is getting really slow.
Is there a way to speed things up in M code?

This my origin query:

Code:
let
    Origine = Folder.Files("C:\Users\...myPath....\myFolder")
in
    Origine

and this is the query I use to extract the content of the tables:

Code:
let
    Origine = ODPcartellatutti_i_file,
    #"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains([Extension], "xls") or Text.Contains([Extension], "XLS")),
    #"Rimosse colonne" = Table.RemoveColumns(#"Filtrate righe1",{"Date accessed", "Date modified", "Date created", "Folder Path"}),
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne", "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true) otherwise ""),
    #"Tabella Tabella_Riepilogo espansa" = Table.ExpandTableColumn(#"Aggiunta colonna personalizzata", "Tabella_Riepilogo", {"Name", "Data", "Item"}, {"Tabella_Riepilogo.Name", "Tabella_Riepilogo.Data", "Tabella_Riepilogo.Item"}),
    #"Filtrate righe" = Table.SelectRows(#"Tabella Tabella_Riepilogo espansa", each try ([Tabella_Riepilogo.Name] = "TAB_ODP") otherwise ""),
    #"Tabella Tabella_Riepilogo.Data espansa" = Table.ExpandTableColumn(#"Filtrate righe", "Tabella_Riepilogo.Data", {"Cliente", "Oggetto", "Materiale", "Data ordine", "Data consegna", "Autore", "Importo", "Acconto", "Saldo", "TAGLIA-B.", "DISCO-FILO", "FILO-SAG", "TELAIO", "WATER JET", "INTERMAC", "OMAG/GMM", "FRESE", "VARIE"}, {"Tabella_Riepilogo.Data.Cliente", "Tabella_Riepilogo.Data.Oggetto", "Tabella_Riepilogo.Data.Materiale", "Tabella_Riepilogo.Data.Data ordine", "Tabella_Riepilogo.Data.Data consegna", "Tabella_Riepilogo.Data.Autore", "Tabella_Riepilogo.Data.Importo", "Tabella_Riepilogo.Data.Acconto", "Tabella_Riepilogo.Data.Saldo", "Tabella_Riepilogo.Data.TAGLIA-B.", "Tabella_Riepilogo.Data.DISCO-FILO", "Tabella_Riepilogo.Data.FILO-SAG", "Tabella_Riepilogo.Data.TELAIO", "Tabella_Riepilogo.Data.WATER JET", "Tabella_Riepilogo.Data.INTERMAC", "Tabella_Riepilogo.Data.OMAG/GMM", "Tabella_Riepilogo.Data.FRESE", "Tabella_Riepilogo.Data.VARIE"}),
    #"Rimosse colonne1" = Table.RemoveColumns(#"Tabella Tabella_Riepilogo.Data espansa",{"Extension", "Attributes", "Tabella_Riepilogo.Name"}),
    #"Rinominate colonne" = Table.RenameColumns(#"Rimosse colonne1",{{"Tabella_Riepilogo.Data.Cliente", "Nome cliente"}, {"Tabella_Riepilogo.Data.Materiale", "Materiale"}, {"Tabella_Riepilogo.Data.Data ordine", "Data ordine"}, {"Tabella_Riepilogo.Data.Data consegna", "Data consegna"}, {"Tabella_Riepilogo.Data.Autore", "Autore"}, {"Tabella_Riepilogo.Data.Importo", "Importo netto"}, {"Tabella_Riepilogo.Data.Acconto", "Acconto netto"}, {"Tabella_Riepilogo.Data.Saldo", "Dalso netto"}, {"Tabella_Riepilogo.Data.Oggetto", "Oggetto"}}),
    #"Modificato tipo" = Table.TransformColumnTypes(#"Rinominate colonne",{{"Importo netto", Currency.Type}, {"Dalso netto", Int64.Type}, {"Acconto netto", Int64.Type}}),
    #"Rimosse colonne2" = Table.RemoveColumns(#"Modificato tipo",{"Tabella_Riepilogo.Item", "Content"}),
    #"Rinominate colonne1" = Table.RenameColumns(#"Rimosse colonne2",{{"Name", "Name File"}, {"Dalso netto", "Saldo netto"}})
in
    #"Rinominate colonne1"


where I think this line of code is slowing things down
Code:
#"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne",  "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true)  otherwise ""),
because is getting all Excel files content and then in the next steps I'm filtering the rows that contain the table name "TAB_ODP" which is what I'm looking for.
Is there a way to directly get the table (if exists in the file) instead of getting ALL the content and then filtering?

Many thanks in advanced!
 
Last edited:
If the tables in all of the workbooks have the same name, then yes, it should be possible.

Can you provide a sample workbook?
 
If the tables in all of the workbooks have the same name, then yes, it should be possible.

Can you provide a sample workbook?


Yes, the tables in the excel workbooks have the same name but not all the workbooks contain the table. In that case it should just be a blank field.
That's why I used this code
Code:
 each try ([Tabella_Riepilogo.Name] = "TAB_ODP") otherwise "")
.

Attached the current file View attachment ODP_ricerca.xlsx, the query I'm trying yo speed up is ODP_file excel_ODP_NuovoVeloce:

Cattura.PNG
 
I asked if the TABLES had the same name. I will have a look, but I'm not sure I'm going to be the one who can help you.
 
I asked if the TABLES had the same name. I will have a look, but I'm not sure I'm going to be the one who can help you.
Well, i don't understand your reply.
I wrote "yes all the tables have the name but not all the worksheets contain the table". If there's a table with such name I need to pull data from the table. if there isn't, I just take the file name
 
Neither do I, but that's because I'd misread your post!!!

Sorry, can't help you with this - it's above my PQ skill level.
 
In an application of mine I now use the command: Table.AddColumn(Filter_out_temp, "Benutzerdefiniert", each Excel.Workbook([Content]))

When I have expanded the new column afterwards, I can directly filter the spreadsheets I need. But as it is only a very small application, it is always quick for me.
 
[...]
This my origin query:

Code:
let
    Origine = Folder.Files("C:\Users\...myPath....\myFolder")
in
    Origine

and this is the query I use to extract the content of the tables:

Code:
let
    Origine = ODPcartellatutti_i_file,
    #"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains([Extension], "xls") or Text.Contains([Extension], "XLS")),
    #"Rimosse colonne" = Table.RemoveColumns(#"Filtrate righe1",{"Date accessed", "Date modified", "Date created", "Folder Path"}),
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne", "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true) otherwise ""),
    #"Tabella Tabella_Riepilogo espansa" = Table.ExpandTableColumn(#"Aggiunta colonna personalizzata", "Tabella_Riepilogo", {"Name", "Data", "Item"}, {"Tabella_Riepilogo.Name", "Tabella_Riepilogo.Data", "Tabella_Riepilogo.Item"}),
    #"Filtrate righe" = Table.SelectRows(#"Tabella Tabella_Riepilogo espansa", each try ([Tabella_Riepilogo.Name] = "TAB_ODP") otherwise ""),
    #"Tabella Tabella_Riepilogo.Data espansa" = Table.ExpandTableColumn(#"Filtrate righe", "Tabella_Riepilogo.Data", {"Cliente", "Oggetto", "Materiale", "Data ordine", "Data consegna", "Autore", "Importo", "Acconto", "Saldo", "TAGLIA-B.", "DISCO-FILO", "FILO-SAG", "TELAIO", "WATER JET", "INTERMAC", "OMAG/GMM", "FRESE", "VARIE"}, {"Tabella_Riepilogo.Data.Cliente", "Tabella_Riepilogo.Data.Oggetto", "Tabella_Riepilogo.Data.Materiale", "Tabella_Riepilogo.Data.Data ordine", "Tabella_Riepilogo.Data.Data consegna", "Tabella_Riepilogo.Data.Autore", "Tabella_Riepilogo.Data.Importo", "Tabella_Riepilogo.Data.Acconto", "Tabella_Riepilogo.Data.Saldo", "Tabella_Riepilogo.Data.TAGLIA-B.", "Tabella_Riepilogo.Data.DISCO-FILO", "Tabella_Riepilogo.Data.FILO-SAG", "Tabella_Riepilogo.Data.TELAIO", "Tabella_Riepilogo.Data.WATER JET", "Tabella_Riepilogo.Data.INTERMAC", "Tabella_Riepilogo.Data.OMAG/GMM", "Tabella_Riepilogo.Data.FRESE", "Tabella_Riepilogo.Data.VARIE"}),
    #"Rimosse colonne1" = Table.RemoveColumns(#"Tabella Tabella_Riepilogo.Data espansa",{"Extension", "Attributes", "Tabella_Riepilogo.Name"}),
    #"Rinominate colonne" = Table.RenameColumns(#"Rimosse colonne1",{{"Tabella_Riepilogo.Data.Cliente", "Nome cliente"}, {"Tabella_Riepilogo.Data.Materiale", "Materiale"}, {"Tabella_Riepilogo.Data.Data ordine", "Data ordine"}, {"Tabella_Riepilogo.Data.Data consegna", "Data consegna"}, {"Tabella_Riepilogo.Data.Autore", "Autore"}, {"Tabella_Riepilogo.Data.Importo", "Importo netto"}, {"Tabella_Riepilogo.Data.Acconto", "Acconto netto"}, {"Tabella_Riepilogo.Data.Saldo", "Dalso netto"}, {"Tabella_Riepilogo.Data.Oggetto", "Oggetto"}}),
    #"Modificato tipo" = Table.TransformColumnTypes(#"Rinominate colonne",{{"Importo netto", Currency.Type}, {"Dalso netto", Int64.Type}, {"Acconto netto", Int64.Type}}),
    #"Rimosse colonne2" = Table.RemoveColumns(#"Modificato tipo",{"Tabella_Riepilogo.Item", "Content"}),
    #"Rinominate colonne1" = Table.RenameColumns(#"Rimosse colonne2",{{"Name", "Name File"}, {"Dalso netto", "Saldo netto"}})
in
    #"Rinominate colonne1"


where I think this line of code is slowing things down
Code:
#"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne",  "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true)  otherwise ""),
What makes you feel, this is your brake-line? To access data in Excel-Files, of course you need to 'look into' the files, one by one. Therefore I don't think of any way to avoid such a 'look-into'-step automatically.

Anyhow, do you have a chance to modify your source-files? What I could imagine, is, to edit the file-properties of the source-files - so that it indicates presence / absence of a (the desired?) table in the file-properties. I guess(!), checking file properties will be faster than opening the files. This way, you could filter your source files and only open the ones which really contain a table. These - of course - you MUST open anyhow. But I don't know of an (externally available) property to automatically indicate the existence of a table.

Ok, just guessing. Excel-files since XL2007 (with .xlsx-extension) are zipped files (with something like a folder-structure). When you open such a file with a zip-application (7-zip, for instance), you can have a more detailed look on the contents. Maybe(!) here you could see, too, whether a file carries a (or multiple?) tables. Therefore, an external application might be able to create a list (or something else), telling, which files have table(s), which don't - but, as I said: pure guessing!

Best regards,
Rainer

p.s.: just a remark: the second functional line of your code I would modify:
Code:
#"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains(Text.Upper([Extension]), "XLS")),
This avoids that files with mixed upper/lower extension might get lost...
 
What makes you feel, this is your brake-line? To access data in Excel-Files, of course you need to 'look into' the files, one by one. Therefore I don't think of any way to avoid such a 'look-into'-step automatically.

Anyhow, do you have a chance to modify your source-files? What I could imagine, is, to edit the file-properties of the source-files - so that it indicates presence / absence of a (the desired?) table in the file-properties. I guess(!), checking file properties will be faster than opening the files. This way, you could filter your source files and only open the ones which really contain a table. These - of course - you MUST open anyhow. But I don't know of an (externally available) property to automatically indicate the existence of a table.

Ok, just guessing. Excel-files since XL2007 (with .xlsx-extension) are zipped files (with something like a folder-structure). When you open such a file with a zip-application (7-zip, for instance), you can have a more detailed look on the contents. Maybe(!) here you could see, too, whether a file carries a (or multiple?) tables. Therefore, an external application might be able to create a list (or something else), telling, which files have table(s), which don't - but, as I said: pure guessing!

Best regards,
Rainer

p.s.: just a remark: the second functional line of your code I would modify:
Code:
#"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains(Text.Upper([Extension]), "XLS")),
This avoids that files with mixed upper/lower extension might get lost...

Thanks for Text.Upper trick!
 
I have now tested this with a directory containing more than 1,000 workbooks. When I create a list of contents with the command "Excel.Workbook", this happens very quickly. The subsequent filtering of tables is also very fast. Only when I want to load this list, it takes a very long time. Even if I do not expand the filtered tables. It seems as if all the workbooks are read when loading, although this is completely unnecessary.
 
Back
Top