Excel or Access or both? Insane number of rows and columns = very slow

Adnandos

New member
Joined
Jun 12, 2018
Messages
25
Reaction score
0
Points
0
Excel Version(s)
365
Hi all

I work with an enormous amount of data monthly in Excel, and am looking at ways of making my processing more efficient.

The following facts apply:

Workbook 1:

I receive report 1, an Excel workbook from a third party that is live, i.e. contains cumulative data
  • In this excel workbook there are 6 tabs, each with about 20 columns
  • For the purposes of this thread I will only elaborate on worksheet POS (however each worksheet goes through the same processes)
    • POS contains about 120,000 rows and 20 columns
    • POS does not contain a field for "Location", and the third party does not maintain nor will they ever provide the location info.
  • This spreadsheet is provided monthly
  • In order to add the locations for each of these 120,000 rows, I do the following
    • Vlookup each row to the spreadsheet provided in the previous month which contains my completed location allocations as at the end of the prior month ("Workbook Prior Month")
    • Vlookup each row to a separate live workbook ("CDST") for the "new" data for that month
    • After performing the above two checks, I then have the locations for every row in that workbook
  • This process takes an obscene amount of time because of the number of rows and columns, and tabs, involved, i.e. formulas can run for an hour or more
  • Once the locations have been allocated to all tabs and all rows, probably 200,000 rows combined, I save this down as Workbook 2.
    • Within each of the 6 tabs I then run a pivot table to get a summary of numbers by location.
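As a concrete (hypothetical) illustration of the lookup step above — the sheet name, key column, and return-column position below are assumptions, not taken from the actual workbook — the per-row formula would look something like:

```
=VLOOKUP($A2, '[Workbook Prior Month.xlsx]POS'!$A:$T, 5, FALSE)
```

Full-column references like $A:$T, repeated across 120,000 rows and six tabs, are a large part of why recalculation can run for an hour or more.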

Workbook 3:


  • Internally generated one-page excel workbook which is a summary of all the pivot table summaries in Workbook 2.
  • This is not a time-consuming task, as it is simply a copy-paste from each individual pivot in Workbook 2.

Problem:

  • The activities involved in adding the locations take far too long, particularly the vlookup to the completed "Workbook Prior Month".
  • The vlookup to the "CDST" isn't as bad because CDST will contain ~1500 rows max for that month.

Question

  • How can the above problem best be resolved to reduce the time spent pulling data from Workbook Prior Month into Workbook 1 Current Month?
  • Is MS Access an option (to serve as the "Workbook Prior Month")?
    • If so, will I be able to "vlookup" from the externally provided Workbook 1 into the database to pull all the store locations?
    • Is this even the best way?
  • Please note, under no circumstances will the third party change the format of its report; as they see it, that's not their problem.

Happy to give more info where I wasn't clear enough.

Thanks in advance.
 
The activities involved in adding the locations take far too long, particularly the vlookup to the completed "Workbook Prior Month".

As your project seems rather complicated, I haven't understood it all.
Have you tried replacing the notoriously slow VLOOKUP with INDEX/MATCH combinations?
Are you using entire columns for your lookups?

Perhaps POWERPIVOT would be an option?
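A quick sketch of what that swap looks like — sheet and column references here are purely illustrative. The first line is a typical VLOOKUP; the second is its INDEX/MATCH equivalent, returning column E for the row where column A matches the key:

```
=VLOOKUP(A2, Prior!A:T, 5, FALSE)
=INDEX(Prior!E:E, MATCH(A2, Prior!A:A, 0))
```

MATCH scans only the key column A rather than the whole A:T block, and INDEX doesn't care where the return column sits relative to the key. Restricting the ranges to the rows actually used (e.g. A2:A120001) instead of entire columns usually speeds things up further.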
 

Hi there

I have never used index/match - any pointers?

I have started reading about powerpivot, where exactly do you see powerpivot being useful in the processes above?
 
Adnandos

I am with Pecoflyer on not fully grasping what you are actually doing. If INDEX/MATCH does not quite do it for you, consider sending some fake data, in the exact structure, for the sheets you are using VLOOKUP on — I think you can set it and forget it if you use merging within Power Query. However, without seeing the data it is notoriously difficult to help.
 

Hi again

I've used Power Query to pull the columns I need and it is so much faster!

On power query I have a question.
I've imported/queried data from a file for the month of June, which only has a few tabs; the displayed result is a new table 1.

I ran a second import/query from a file for the month of May, with all the exact same headers; the displayed result is a new table 2.

How can I combine table 2 into table 1, i.e. just append the rows? I don't want additional columns; they're all identical.
 
@adnandos
Please do not quote entire posts unnecessarily. They make the thread hard to read.
I will move your thread to the PQ forum.
In the future, please start a new thread when you have a new question, referring back to the original one if needed. You will get faster answers.
Thanks
 
AM
Adnandos

Consider saving both as connections, not tables, to reduce potential bloat; I'm assuming you have taken care of the Changed Type step in both and have named them appropriately. You can merge either from Excel, by highlighting the query and right-clicking to select Merge, or from within PQ in the query pane on the left-hand side. Choose the column common to both; a left outer join is probably best (you may have to select two columns, say Account and Dept, to find a unique key — this avoids the need to concatenate fields in both tables).

Note there is a reason your doctor does not diagnose over the telephone: they have you come into the surgery so they can make a proper assessment before suggesting a treatment. By not including fake data (visiting the surgery), you are in effect asking me to diagnose over the phone, which is prone to error and actually takes a lot longer. Seriously consider sending data (5 lines per table is sufficient); mostly this stuff takes a couple of minutes to do with data, and about five times longer to type a best guesstimate. This last note is not aimed solely at you, Adnandos — it is for everybody that requires help. There is an easy way (data) and a really hard way (no data). Let's make this easy!
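A minimal M sketch of the merge described above — the query names (June, May) and the key column are placeholders, assuming both monthly queries share the Concat 2 key:

```
let
    // start from the June query (loaded as a connection)
    Source = June,
    // left outer join to the May query on the shared key column
    Merged = Table.NestedJoin(Source, {"Concat 2"}, May, {"Concat 2"}, "MayRows", JoinKind.LeftOuter),
    // pull just the needed columns out of the nested tables
    Expanded = Table.ExpandTableColumn(Merged, "MayRows", {"Location"})
in
    Expanded
```

(If the goal is simply to stack May's rows under June's, with identical columns, then Append Queries — Table.Combine({June, May}) in M — is the simpler tool than a merge.)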
 
View attachment June 2018 - monthly report for sharing - Copy.xlsx

Hi all

Hope I attached the file correctly; I totally agree on needing an example. I'm new to the forums and still navigating my way around.

Let me try to explain the attached file. A "fresh" version of these spreadsheets is received monthly from a third party; I update it with a few columns and add the store/location for each line using the work I did on the previous month's file.

You will notice 6 tabs, all tabs with identical column headers.

In each tab I add a few columns at the end beginning with the column Concat 1.

I am finding the previous suggestion of Power Query may be the most appropriate way to handle my data going forward, it is already working out much better.

I've added Concat 2 as a column to serve as the key for removing duplicates when importing through Power Query.

At this point I think what I need is a way to consolidate all the tabs through PQ into one worksheet, with all duplicates in Concat 2 removed.

I only retain about 5 chosen columns on importing through PQ.

Given that this is a monthly file, I need to also find a way to consolidate all such monthly files, which are identical in layout.

So if I have a folder with all these files, I'd love a solution to be able to consolidate every tab in every file into just one sheet with duplicates removed [New Consol File].
 
I think this is what you need; if not, let me know and I will adjust.

Here is the code (from File > From Workbook) in the attached workbook:

let
    Source = Excel.Workbook(
        File.Contents("C:\Users\eamon\Downloads\From Gmail\June 2018 - monthly report for sharing - Copy.xlsx"),
        null, true),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Sheet")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Item], "Cat ")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1", {"Name", "Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(
        #"Removed Other Columns", "Data",
        {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6",
         "Column7", "Column8", "Column9", "Column10", "Column11", "Column12",
         "Column13", "Column14", "Column15", "Column16", "Column17", "Column18"},
        {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6",
         "Data.Column7", "Data.Column8", "Data.Column9", "Data.Column10", "Data.Column11", "Data.Column12",
         "Data.Column13", "Data.Column14", "Data.Column15", "Data.Column16", "Data.Column17", "Data.Column18"}),
    CurrentNameRow0 = #"Expanded Data"[Name]{0},
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Renamed Columns" = Table.RenameColumns(#"Promoted Headers", {{CurrentNameRow0, "File Name"}}),
    #"Filtered Rows2" = Table.SelectRows(#"Renamed Columns", each ([Dealer Code] = "Code 1")),
    #"Removed Duplicates" = Table.Distinct(#"Filtered Rows2", {"Concat 2"})
in
    #"Removed Duplicates"


And From Folder would be the way to go. Just rename the From Folder query something like Filelist, to future-proof it. Then reference that query and give it a name, say Consolidated. Remove the Changed Type step from the Consolidated master query (the last step). Do all transformations within the Consolidated sample query. Then close and load the puppy!
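In outline, the From Folder version might look like this — the folder path is a placeholder, and the per-sheet promote-headers and dedupe logic mirrors the single-file query above:

```
let
    // "Filelist": every file in the monthly-reports folder
    Source  = Folder.Files("C:\Reports\Monthly"),
    Excels  = Table.SelectRows(Source, each Text.EndsWith([Name], ".xlsx")),
    // open each workbook, keep real sheets, promote headers, stack the tabs
    PerFile = Table.AddColumn(Excels, "Sheets", each
                  Table.Combine(
                      List.Transform(
                          Table.SelectRows(Excel.Workbook([Content], null, true),
                                           each [Kind] = "Sheet")[Data],
                          Table.PromoteHeaders))),
    // "Consolidated": all tabs of all files in one table, deduped on Concat 2
    Stacked = Table.Combine(PerFile[Sheets]),
    Deduped = Table.Distinct(Stacked, {"Concat 2"})
in
    Deduped
```

New monthly files dropped into the folder are then picked up automatically on refresh.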

Knock 'em Dead Tiger!
 

Attachments

  • June 2018 Companion.xlsm
You're welcome,

Knock 'em dead Tiger!
 