Results 1 to 10 of 10

Thread: Get a table from each Excel file in folder | speed up the process

  1. #1
    Seeker marcob8986's Avatar
    Join Date
    Dec 2020
    Posts
    7
    Articles
    0
    Excel Version
    365

    Get a table from each Excel file in folder | speed up the process



    Register for a FREE account, and/
    or Log in to avoid these ads!

    Hi,
    I've setup a query to read the content of a Table inside several Excel files in a folder.
    My problem is that those files are quite big (3MB+ each) and when the folder contains many files the process is getting really slow.
    Is there a way to speed things up in M code?

    This my origin query:

    Code:
    let
        Origine = Folder.Files("C:\Users\...myPath....\myFolder")
    in
        Origine
    and this is the query I use to extract the content of the tables:

    Code:
    let
        Origine = ODPcartellatutti_i_file,
        #"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains([Extension], "xls") or Text.Contains([Extension], "XLS")),
        #"Rimosse colonne" = Table.RemoveColumns(#"Filtrate righe1",{"Date accessed", "Date modified", "Date created", "Folder Path"}),
        #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne", "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true) otherwise ""),
        #"Tabella Tabella_Riepilogo espansa" = Table.ExpandTableColumn(#"Aggiunta colonna personalizzata", "Tabella_Riepilogo", {"Name", "Data", "Item"}, {"Tabella_Riepilogo.Name", "Tabella_Riepilogo.Data", "Tabella_Riepilogo.Item"}),
        #"Filtrate righe" = Table.SelectRows(#"Tabella Tabella_Riepilogo espansa", each try ([Tabella_Riepilogo.Name] = "TAB_ODP") otherwise ""),
        #"Tabella Tabella_Riepilogo.Data espansa" = Table.ExpandTableColumn(#"Filtrate righe", "Tabella_Riepilogo.Data", {"Cliente", "Oggetto", "Materiale", "Data ordine", "Data consegna", "Autore", "Importo", "Acconto", "Saldo", "TAGLIA-B.", "DISCO-FILO", "FILO-SAG", "TELAIO", "WATER JET", "INTERMAC", "OMAG/GMM", "FRESE", "VARIE"}, {"Tabella_Riepilogo.Data.Cliente", "Tabella_Riepilogo.Data.Oggetto", "Tabella_Riepilogo.Data.Materiale", "Tabella_Riepilogo.Data.Data ordine", "Tabella_Riepilogo.Data.Data consegna", "Tabella_Riepilogo.Data.Autore", "Tabella_Riepilogo.Data.Importo", "Tabella_Riepilogo.Data.Acconto", "Tabella_Riepilogo.Data.Saldo", "Tabella_Riepilogo.Data.TAGLIA-B.", "Tabella_Riepilogo.Data.DISCO-FILO", "Tabella_Riepilogo.Data.FILO-SAG", "Tabella_Riepilogo.Data.TELAIO", "Tabella_Riepilogo.Data.WATER JET", "Tabella_Riepilogo.Data.INTERMAC", "Tabella_Riepilogo.Data.OMAG/GMM", "Tabella_Riepilogo.Data.FRESE", "Tabella_Riepilogo.Data.VARIE"}),
        #"Rimosse colonne1" = Table.RemoveColumns(#"Tabella Tabella_Riepilogo.Data espansa",{"Extension", "Attributes", "Tabella_Riepilogo.Name"}),
        #"Rinominate colonne" = Table.RenameColumns(#"Rimosse colonne1",{{"Tabella_Riepilogo.Data.Cliente", "Nome cliente"}, {"Tabella_Riepilogo.Data.Materiale", "Materiale"}, {"Tabella_Riepilogo.Data.Data ordine", "Data ordine"}, {"Tabella_Riepilogo.Data.Data consegna", "Data consegna"}, {"Tabella_Riepilogo.Data.Autore", "Autore"}, {"Tabella_Riepilogo.Data.Importo", "Importo netto"}, {"Tabella_Riepilogo.Data.Acconto", "Acconto netto"}, {"Tabella_Riepilogo.Data.Saldo", "Dalso netto"}, {"Tabella_Riepilogo.Data.Oggetto", "Oggetto"}}),
        #"Modificato tipo" = Table.TransformColumnTypes(#"Rinominate colonne",{{"Importo netto", Currency.Type}, {"Dalso netto", Int64.Type}, {"Acconto netto", Int64.Type}}),
        #"Rimosse colonne2" = Table.RemoveColumns(#"Modificato tipo",{"Tabella_Riepilogo.Item", "Content"}),
        #"Rinominate colonne1" = Table.RenameColumns(#"Rimosse colonne2",{{"Name", "Name File"}, {"Dalso netto", "Saldo netto"}})
    in
        #"Rinominate colonne1"

    where I think this line of code is slowing things down
    Code:
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne",  "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true)  otherwise ""),
    because is getting all Excel files content and then in the next steps I'm filtering the rows that contain the table name "TAB_ODP" which is what I'm looking for.
    Is there a way to directly get the table (if exists in the file) instead of getting ALL the content and then filtering?

    Many thanks in advanced!
    Last edited by marcob8986; 2021-04-15 at 09:59 AM. Reason: title typo

  2. #2
    Administrator AliGW's Avatar
    Join Date
    Nov 2015
    Location
    Ipswich, Suffolk, England
    Posts
    1,651
    Articles
    0
    Excel Version
    Office 365 Subscription
    If the tables in all of the workbooks have the same name, then yes, it should be possible.

    Can you provide a sample workbook?
    Ali
    Enthusiastic self-taught user of MS Excel!

  3. #3
    Seeker marcob8986's Avatar
    Join Date
    Dec 2020
    Posts
    7
    Articles
    0
    Excel Version
    365
    Quote Originally Posted by AliGW View Post
    If the tables in all of the workbooks have the same name, then yes, it should be possible.

    Can you provide a sample workbook?

    Yes, the tables in the excel workbooks have the same name but not all the workbooks contain the table. In that case it should just be a blank field.
    That's why I used this code
    Code:
     each try ([Tabella_Riepilogo.Name] = "TAB_ODP") otherwise "")
    .

    Attached the current file ODP_ricerca.xlsx, the query I'm trying yo speed up is ODP_file excel_ODP_NuovoVeloce:

    Click image for larger version. 

Name:	Cattura.PNG 
Views:	7 
Size:	15.1 KB 
ID:	10434

  4. #4
    Administrator AliGW's Avatar
    Join Date
    Nov 2015
    Location
    Ipswich, Suffolk, England
    Posts
    1,651
    Articles
    0
    Excel Version
    Office 365 Subscription
    I asked if the TABLES had the same name. I will have a look, but I'm not sure I'm going to be the one who can help you.
    Ali
    Enthusiastic self-taught user of MS Excel!

  5. #5
    Seeker marcob8986's Avatar
    Join Date
    Dec 2020
    Posts
    7
    Articles
    0
    Excel Version
    365
    Quote Originally Posted by AliGW View Post
    I asked if the TABLES had the same name. I will have a look, but I'm not sure I'm going to be the one who can help you.
    Well, i don't understand your reply.
    I wrote "yes all the tables have the name but not all the worksheets contain the table". If there's a table with such name I need to pull data from the table. if there isn't, I just take the file name

  6. #6
    Administrator AliGW's Avatar
    Join Date
    Nov 2015
    Location
    Ipswich, Suffolk, England
    Posts
    1,651
    Articles
    0
    Excel Version
    Office 365 Subscription
    Neither do I, but that's because I'd misread your post!!!

    Sorry, can't help you with this - it's above my PQ skill level.
    Ali
    Enthusiastic self-taught user of MS Excel!

  7. #7
    Conjurer pinarello's Avatar
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    112
    Articles
    0
    Excel Version
    Office 365
    In an application of mine I now use the command: Table.AddColumn(Filter_out_temp, "Benutzerdefiniert", each Excel.Workbook([Content]))

    When I have expanded the new column afterwards, I can directly filter the spreadsheets I need. But as it is only a very small application, it is always quick for me.

  8. #8
    Neophyte Rainer's Avatar
    Join Date
    Apr 2021
    Posts
    2
    Articles
    0
    Excel Version
    365
    Quote Originally Posted by marcob8986 View Post
    [...]
    This my origin query:

    Code:
    let
        Origine = Folder.Files("C:\Users\...myPath....\myFolder")
    in
        Origine
    and this is the query I use to extract the content of the tables:

    Code:
    let
        Origine = ODPcartellatutti_i_file,
        #"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains([Extension], "xls") or Text.Contains([Extension], "XLS")),
        #"Rimosse colonne" = Table.RemoveColumns(#"Filtrate righe1",{"Date accessed", "Date modified", "Date created", "Folder Path"}),
        #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne", "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true) otherwise ""),
        #"Tabella Tabella_Riepilogo espansa" = Table.ExpandTableColumn(#"Aggiunta colonna personalizzata", "Tabella_Riepilogo", {"Name", "Data", "Item"}, {"Tabella_Riepilogo.Name", "Tabella_Riepilogo.Data", "Tabella_Riepilogo.Item"}),
        #"Filtrate righe" = Table.SelectRows(#"Tabella Tabella_Riepilogo espansa", each try ([Tabella_Riepilogo.Name] = "TAB_ODP") otherwise ""),
        #"Tabella Tabella_Riepilogo.Data espansa" = Table.ExpandTableColumn(#"Filtrate righe", "Tabella_Riepilogo.Data", {"Cliente", "Oggetto", "Materiale", "Data ordine", "Data consegna", "Autore", "Importo", "Acconto", "Saldo", "TAGLIA-B.", "DISCO-FILO", "FILO-SAG", "TELAIO", "WATER JET", "INTERMAC", "OMAG/GMM", "FRESE", "VARIE"}, {"Tabella_Riepilogo.Data.Cliente", "Tabella_Riepilogo.Data.Oggetto", "Tabella_Riepilogo.Data.Materiale", "Tabella_Riepilogo.Data.Data ordine", "Tabella_Riepilogo.Data.Data consegna", "Tabella_Riepilogo.Data.Autore", "Tabella_Riepilogo.Data.Importo", "Tabella_Riepilogo.Data.Acconto", "Tabella_Riepilogo.Data.Saldo", "Tabella_Riepilogo.Data.TAGLIA-B.", "Tabella_Riepilogo.Data.DISCO-FILO", "Tabella_Riepilogo.Data.FILO-SAG", "Tabella_Riepilogo.Data.TELAIO", "Tabella_Riepilogo.Data.WATER JET", "Tabella_Riepilogo.Data.INTERMAC", "Tabella_Riepilogo.Data.OMAG/GMM", "Tabella_Riepilogo.Data.FRESE", "Tabella_Riepilogo.Data.VARIE"}),
        #"Rimosse colonne1" = Table.RemoveColumns(#"Tabella Tabella_Riepilogo.Data espansa",{"Extension", "Attributes", "Tabella_Riepilogo.Name"}),
        #"Rinominate colonne" = Table.RenameColumns(#"Rimosse colonne1",{{"Tabella_Riepilogo.Data.Cliente", "Nome cliente"}, {"Tabella_Riepilogo.Data.Materiale", "Materiale"}, {"Tabella_Riepilogo.Data.Data ordine", "Data ordine"}, {"Tabella_Riepilogo.Data.Data consegna", "Data consegna"}, {"Tabella_Riepilogo.Data.Autore", "Autore"}, {"Tabella_Riepilogo.Data.Importo", "Importo netto"}, {"Tabella_Riepilogo.Data.Acconto", "Acconto netto"}, {"Tabella_Riepilogo.Data.Saldo", "Dalso netto"}, {"Tabella_Riepilogo.Data.Oggetto", "Oggetto"}}),
        #"Modificato tipo" = Table.TransformColumnTypes(#"Rinominate colonne",{{"Importo netto", Currency.Type}, {"Dalso netto", Int64.Type}, {"Acconto netto", Int64.Type}}),
        #"Rimosse colonne2" = Table.RemoveColumns(#"Modificato tipo",{"Tabella_Riepilogo.Item", "Content"}),
        #"Rinominate colonne1" = Table.RenameColumns(#"Rimosse colonne2",{{"Name", "Name File"}, {"Dalso netto", "Saldo netto"}})
    in
        #"Rinominate colonne1"

    where I think this line of code is slowing things down
    Code:
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Rimosse colonne",  "Tabella_Riepilogo", each try Excel.Workbook([Content],null, true)  otherwise ""),
    What makes you feel, this is your brake-line? To access data in Excel-Files, of course you need to 'look into' the files, one by one. Therefore I don't think of any way to avoid such a 'look-into'-step automatically.

    Anyhow, do you have a chance to modify your source-files? What I could imagine, is, to edit the file-properties of the source-files - so that it indicates presence / absence of a (the desired?) table in the file-properties. I guess(!), checking file properties will be faster than opening the files. This way, you could filter your source files and only open the ones which really contain a table. These - of course - you MUST open anyhow. But I don't know of an (externally available) property to automatically indicate the existence of a table.

    Ok, just guessing. Excel-files since XL2007 (with .xlsx-extension) are zipped files (with something like a folder-structure). When you open such a file with a zip-application (7-zip, for instance), you can have a more detailed look on the contents. Maybe(!) here you could see, too, whether a file carries a (or multiple?) tables. Therefore, an external application might be able to create a list (or something else), telling, which files have table(s), which don't - but, as I said: pure guessing!

    Best regards,
    Rainer

    p.s.: just a remark: the second functional line of your code I would modify:
    Code:
    #"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains(Text.Upper([Extension]), "XLS")),
    This avoids that files with mixed upper/lower extension might get lost...

  9. #9
    Seeker marcob8986's Avatar
    Join Date
    Dec 2020
    Posts
    7
    Articles
    0
    Excel Version
    365
    Quote Originally Posted by Rainer View Post
    What makes you feel, this is your brake-line? To access data in Excel-Files, of course you need to 'look into' the files, one by one. Therefore I don't think of any way to avoid such a 'look-into'-step automatically.

    Anyhow, do you have a chance to modify your source-files? What I could imagine, is, to edit the file-properties of the source-files - so that it indicates presence / absence of a (the desired?) table in the file-properties. I guess(!), checking file properties will be faster than opening the files. This way, you could filter your source files and only open the ones which really contain a table. These - of course - you MUST open anyhow. But I don't know of an (externally available) property to automatically indicate the existence of a table.

    Ok, just guessing. Excel-files since XL2007 (with .xlsx-extension) are zipped files (with something like a folder-structure). When you open such a file with a zip-application (7-zip, for instance), you can have a more detailed look on the contents. Maybe(!) here you could see, too, whether a file carries a (or multiple?) tables. Therefore, an external application might be able to create a list (or something else), telling, which files have table(s), which don't - but, as I said: pure guessing!

    Best regards,
    Rainer

    p.s.: just a remark: the second functional line of your code I would modify:
    Code:
    #"Filtrate righe1" = Table.SelectRows(Origine, each Text.Contains(Text.Upper([Extension]), "XLS")),
    This avoids that files with mixed upper/lower extension might get lost...
    Thanks for Text.Upper trick!

  10. #10
    Conjurer pinarello's Avatar
    Join Date
    Jun 2019
    Location
    Germany
    Posts
    112
    Articles
    0
    Excel Version
    Office 365
    I have now tested this with a directory containing more than 1,000 workbooks. When I create a list of contents with the command "Excel.Workbook", this happens very quickly. The subsequent filtering of tables is also very fast. Only when I want to load this list, it takes a very long time. Even if I do not expand the filtered tables. It seems as if all the workbooks are read when loading, although this is completely unnecessary.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •