Stripping data from an imported file

Posted on August 13th, 2007 in Excel, General by Ken Puls

Most of us, when we start dabbling with Excel macros, probably take the following steps:

  • Record code
  • Begin to make minor edits to the code
  • Eventually learn about loops and start using them everywhere
  • Learn that loops are "expensive" and cost time to do so start looking for fast ways
  • Discover "screenupdating=false" to cut loop execution time a bit
  • Start using Excel's built in functions to avoid loops and cut time significantly

This is pretty much the way my leaning went anyway. I now try to avoid loops wherever possible, and try to increase execution speed as I just hate waiting for my stuff to complete.

I work with a lot of ASCII files dumped out of our Property Management System, often pulling them into Excel to work with. One example of this is our Trial Balance. Because of the way the vendor built the software, it creates a Trial Balance with every account repeated for every department, even if our chart of accounts doesn't have that combination of account/department codes. This means that when I print it to file, my trial balance is over 15,700 lines long.

I actually use this report in a few different ways, and sometimes can need to update it several times in one day. Naturally I wrote a macro to go through the file, strip out garbage lines, reformat the entry so it can be used as I need it and a couple of other things. Like a total neophyte, I opened it in Excel, then proceeded to loop through every line. The end result, based on a single timer test, was 132.5 seconds to complete the process. 2.2 minutes of staring at the screen waiting for it to complete each time I ran it.

At the MVP summit, I had a discussion with Damon Longworth about doing this kind of task using Autofilters, but he recommended streaming it in from the text file and checking the data before it ever hit Excel. I decided to give it a try.

Despite re-writing to do the task (which I had to do anyway due to a change), I actually needed more case statements and if statements to work outside Excel than I did if I pulled the data in. Regardless, the execution time cut down from over 2 minutes to 25.5 seconds. Pretty darn cool. :)

For reference, I started with (what I believe was) Bernie Dietrick's large file import code, and modified it to do my stuff. His code, which I originally worked with is shown below:

Visual Basic:
  1. Sub LargeFileImport(Filename As String)
  2. 'Bernie Deitrick's code for opening very large text files in Excel
  3. 'Dimension Variables
  4. Dim ResultStr As String
  5. Dim FileNum As Integer
  6. Dim Counter As Double
  7.  
  8. 'Get Next Available File Handle Number
  9. FileNum = FreeFile()
  10.  
  11. 'Open Text File For Input
  12. Open Filename For Input As #FileNum
  13.  
  14. 'Turn Screen Updating Off
  15. Application.ScreenUpdating = False
  16.  
  17. 'Create A New WorkBook With One Worksheet In It
  18. Workbooks.Add Template:=xlWorksheet
  19.  
  20. 'Set The Counter to 1
  21. Counter = 1
  22.  
  23. 'Loop Until the End Of File Is Reached
  24. Do While Seek(FileNum) <= LOF(FileNum)
  25.  
  26. 'Display Importing Row Number On Status Bar
  27. Application.StatusBar = "Importing Row " & Counter & " of text file " & Filename
  28.  
  29. 'Store One Line Of Text From File To Variable
  30. Line Input #FileNum, ResultStr
  31.  
  32. 'Store Variable Data Into Active Cell
  33. If Left(ResultStr, 1) = "=" Then
  34. ActiveCell.Value = "'" & ResultStr
  35. Else
  36. ActiveCell.Value = ResultStr
  37. End If
  38. If ActiveCell.Row = Activesheet.Rows.Count Then
  39. 'If On The Last Row Then Add A New Sheet
  40. ActiveWorkbook.Sheets.Add
  41. Else
  42. 'If Not The Last Row Then Go One Cell Down
  43. ActiveCell.Offset(1, 0).Select
  44. End If
  45.  
  46. 'Increment the Counter By 1
  47. Counter = Counter + 1
  48.  
  49. 'Start Again At Top Of 'Do While' Statement
  50. Loop
  51.  
  52. 'Close The Open Text File
  53. Close
  54. End Sub

2 Responses to 'Stripping data from an imported file'

Subscribe to comments with RSS or TrackBack to 'Stripping data from an imported file'.

  1. Matt Vidas said,

    on August 24th, 2007 at 9:23 am

    Hey Ken,

    You really need to get more into regular expressions :) I take a 100,000 line account dump from our GL system, parse the thing with regexp, then summarize in excel and split into relevant text files in about 10 seconds. My monthly 450,000+ line data dump does take about 25 seconds to do, but there are too many calculations in there to speed it up anymore. You know how to reach me if you want to know more :)

  2. Ken Puls said,

    on August 24th, 2007 at 6:50 pm

    LOL!

    You're right, Matt. I should get into regExp. As soon as I have the time to re-write again, I'll call you up. :)

Post a comment