Merging multiple XML files

WizzardOfOz

New member
Joined
Sep 4, 2013
Messages
184
Reaction score
0
Points
0
Location
Australia
Excel Version(s)
Office 365
I've downloaded several large XML files (about 40 M each), each year is treated separate in the the files i.e. each node in the XML files has the structure of:

<report>
<Year>2000<Year>
rest of xml follows

The files change infrequently, (maybe twice a year). I need to statistically analyse the files. Should I combine the files into one large file (about 500 MB) or simply loop over each year?
Is below the best way to join the files and what am I doing wrong that it is not saving the file?

Thanks


Code:
Sub JoinXML()
Const mPath As String = "C:\tmp\data_"


Dim XDoc As MSXML2.DOMDocument60, XCombined As MSXML2.DOMDocument60
Dim Level1 As MSXML2.IXMLDOMNode, RootElement As MSXML2.IXMLDOMNode
Dim yr As Long


    Set XCombined = New MSXML2.DOMDocument60        'the new file
    Set RootElement = XCombined.createElement("data")    ' the root
    
    Set XDoc = New MSXML2.DOMDocument60             'the existing year
    
    For yr = 1999 To 2013
        XDoc.async = False
        XDoc.validateOnParse = False
        
        XDoc.Load (mPath & yr & ".xml")
        For Each Level1 In XDoc.DocumentElement.ChildNodes
            If Level1.NodeType = NODE_ELEMENT Then Call RootElement.appendChild(Level1)
        Next Level1
    Next yr
    XCombined.Save (mPath & ".xml")
    Application.StatusBar = " "
        
    Set XDoc = Nothing
    Set XCombined = Nothing


End Sub
 
Ok solved my own coding problem, after set RootElement = XCombined.createElement need to add the line
Xcombined.appendchild RootElement
Would have thought that would be defined in the set statement.

Anyway back to the original question, should one combine multiple XML files or loop over each file?
 
I combined the 15 files into one large file and then ran a test to select records that matched two conditions.
Using 15 yearly files took just under one minute (00:00:54) compared to one large file that took more than double the time (00:02:24).
Looking at the task manager, it seems like the slow step is reading the file into memory and it is much faster reading 15 files smaller files than one large file.
 
Back
Top