
Thread: Excel 2010 Active Column CGU, Active Row 1048572 = 2.34 billion cells, can't delete

  1. #1
    Neophyte Andrew's Avatar
    Join Date
    Apr 2015
    Location
    Calgary
    Posts
    2
    Articles
    0


I have an enormous workbook with a large number of interdependent worksheets, a beast that has grown more ungainly over time. The pre-optimization full-calculation time on a dual-core i7 with 16 GB of RAM was 85 seconds on 64-bit Excel (average of 10 runs). The only way to use it is to keep calculation on manual and just update as required. It is an unpleasant tool to use, hence the pursuit of optimization.

In running individual sheet timer calcs, one worksheet stuck out like a sore thumb. Using Ctrl+End to find the last active cell puts me in column CGU (2281) at row 1048572, which translates to more than 2.3 billion cells in the active range. I have fewer than 500,000 actually in use. Ideally, I should be able to highlight the empty areas and delete them, or use one of the available tools to reset the active range, but nothing has worked. Highlighting a single row (fewer cells than highlighting a single column) grinds the machine to a halt, though it will eventually let you delete it. But I have roughly 1 million rows to delete, so that would take a very long time, and I'd still be left with thousands of extraneous columns. This grinding behaviour happens even with calculation set to manual.

I'm resigned to the fact that I will have to replace the worksheet entirely, but the number of interdependent formulas spread through the balance of the workbook makes this a horrifying prospect, as it would take days to find all of the broken links.

I am looking for suggestions on how I might minimize the pain. I'm also looking for insight into why Excel freezes, even in manual calculation mode, when I merely highlight an empty row or column. On any other sheet, the same action does not elicit the same response.

  2. #2
    Neophyte Andrew's Avatar
    Join Date
    Apr 2015
    Location
    Calgary
    Posts
    2
    Articles
    0
[solved]!! I'd edit the title, but I can't quite figure out how.

So after 3 days of digging, experimenting and googling the far reaches of MSDN, I stumbled across the key tip: changing the file extension from .xlsm to .zip (yes) converts the file into a zip folder stuffed with XML files. A quick check confirmed exactly that, and with the help of Notepad and some easy sleuthing, the problem worksheet was discovered - the giveaway being its massive, stand-out 250,000 KB file size. The document was so large it crashed Word and XML Notepad, but left to its own devices long enough, plain old Notepad eventually opened it up. Contained within were, quite literally, billions of cell descriptions.
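For anyone wanting to skip the manual sleuthing: since the renamed file is just a zip archive, the oversized worksheet part can be spotted with a few lines of Python's standard zipfile module. A rough sketch (largest_parts is my own name, not anything built in):

```python
import zipfile

def largest_parts(xlsm, n=5):
    """List the n largest members of an .xlsm archive as (name, uncompressed size) pairs."""
    with zipfile.ZipFile(xlsm) as z:
        sizes = [(info.filename, info.file_size) for info in z.infolist()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:n]
```

A bloated sheet shows up immediately as an xl/worksheets/sheetN.xml entry that dwarfs every other part.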

The descriptions are laid out in rows: a <row> element for row 1 holding cell entries like r="A1" [cell reference and contents], then r="A2", and so on, all the way up to r="CGU1048572". Dealing with the extraneous rows appeared easy enough, in theory. I searched for the last used row, highlighted everything from there down to the closing </sheetData> tag, and deleted it. That left me with 2281 columns on 50,000 rows - better than what I had before. I saved the file, changed the .zip back to .xlsm and... made Excel unhappy. It complained about unreadable data and removed something or other. Back to the drawing board.

I continued my fight with the XML files, and as a neophyte Python programmer, it was suggested by a co-worker that I write a script to trim all of the offending data out. However, as a Codecademy-qualified Python "programmer", the 13 hours of lessons hardly qualified me to take it on. I spent some time examining the various XML parsing modules like ElementTree, and saw some real potential, before finally calling it a night. As is often the case, I have my best ideas in that groggy moment between a night of sleep and an alarm that hasn't quite gone off yet.
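For what it's worth, the trimming can be done with ElementTree rather than hand-editing - the catch being that spreadsheet XML lives in a namespace, which trips up naive find() calls. A rough sketch under my own naming (trim_rows), assuming the last genuinely used row is already known:

```python
import xml.etree.ElementTree as ET

# The SpreadsheetML default namespace used by worksheet parts.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

def trim_rows(sheet_xml, last_row):
    """Drop every <row> whose r attribute is beyond last_row; return the new XML as bytes."""
    ET.register_namespace("", NS)  # keep the default namespace intact on output
    root = ET.fromstring(sheet_xml)
    sheet_data = root.find(f"{{{NS}}}sheetData")
    for row in list(sheet_data):  # iterate over a copy so removal is safe mid-loop
        if int(row.get("r", "0")) > last_row:
            sheet_data.remove(row)
    return ET.tostring(root, xml_declaration=True, encoding="UTF-8")
```

Note that the worksheet part also carries a <dimension ref="..."> element near the top that still records the old range, which is worth updating in the same pass.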

When I hopped out of bed with an absurd level of excitement for a pre-coffee Saturday morning, I had a new plan. I fired up the workbook, painstakingly duplicated the relevant data in a new worksheet, then saved the workbook as a new file. I changed the extension, went back inside the XML files, found the XML file for the duplicate sheet, extracted that file so it could be renamed, then dropped it back in, overwriting the 250,000 KB albatross in the process. Changed the extension back to .xlsm, opened the file and... more complaints from Excel, this time about the calcChain losing data.

The calcChain.xml file lists, cell by cell I believe, every calculated cell in order. Which meant that contained within the chain were references to the billions of cells I'd just made obsolete. By now I had enough knowledge to import and parse the file in Python, but after an hour it was still churning out references, so clearly my Python-fu needs extensive development. What to do with a massive calculation chain that won't cooperate? A handful of attempts at editing it got me nowhere, but once again, Google to the rescue. Short version: if the calcChain references are broken, Excel gets upset, but if the calcChain.xml file is missing altogether? It makes a new one without complaint. Delete calcChain.xml, open the .xlsm in Excel and... no complaints! Success!
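The whole swap - drop calcChain.xml, overwrite the bloated sheet - can also be scripted instead of done by hand inside the zip folder. Another sketch with the standard zipfile module (rebuild_xlsm and the example member names are my own; xl/calcChain.xml is the real, fixed part name):

```python
import zipfile

def rebuild_xlsm(src, dst, replacements=None, drop=("xl/calcChain.xml",)):
    """Copy the src archive to dst, swapping in replacement parts and skipping dropped ones.

    replacements maps member names (e.g. 'xl/worksheets/sheet3.xml') to new bytes.
    Excel rebuilds a missing calcChain.xml on open without complaint.
    """
    replacements = replacements or {}
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.filename in drop:
                continue  # leave this part out entirely
            if info.filename in replacements:
                data = replacements[info.filename]
            else:
                data = zin.read(info.filename)
            zout.writestr(info.filename, data)
```

Because the rest of the archive is copied byte-for-byte, every other sheet's formulas still point at the same sheet names afterwards.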

But man, was it slow. So slow. Impossibly slow. A re-run of the calculation timer exercise shot up from the previous semi-optimized 12 seconds to 87, then 92, then 147??? What was going on? I imagined a call to IT support and their first response, as always: "have you tried rebooting your computer?" With nothing to lose, I saved my excruciatingly slow file and rebooted. Straight into Excel, straight into the file, and wow, that opened fast. Calc check? 5 seconds! Holy happy dance. Stoked. Granted, 5 seconds isn't fast, but considering it had been chugging along at a pre-optimized 87 seconds - which meant operating entirely in manual calc mode all the time except for updates, for ages - I am over the moon. I learned about XML parsing in Python, learned that an .xlsm is just a zip folder of XML files, and I fixed those 2.34 billion extraneous cells!

    Couldn't be happier.

By duplicating the data (the formulas and tables contained within the sheet), the work was fairly easy - quite literally copy and paste the pivot tables and the handful of formulas that reside in a single row, then autofill the balance. The new sheet had nothing linking into it, only out of it; deleting the old sheet within Excel, by contrast, would have thrown all the formulas in all the other sheets that linked to it into a frenzy, and I'd have been hunting broken links for ages. By creating this simple duplicate and then overwriting the offending sheet at the XML file level, nothing changed for the rest of the worksheets. Their formulas still pointed to the correct cell on the correctly named sheet! No broken links whatsoever. Dead easy once you know how.
