View Full Version : Merging multiple XML files

2014-09-15, 03:16 AM
I've downloaded several large XML files (about 40 M each), each year is treated separate in the the files i.e. each node in the XML files has the structure of:

rest of xml follows

The files change infrequently, (maybe twice a year). I need to statistically analyse the files. Should I combine the files into one large file (about 500 MB) or simply loop over each year?
Is below the best way to join the files and what am I doing wrong that it is not saving the file?


Sub JoinXML()
Const mPath As String = "C:\tmp\data_"

Dim XDoc As MSXML2.DOMDocument60, XCombined As MSXML2.DOMDocument60
Dim Level1 As MSXML2.IXMLDOMNode, RootElement As MSXML2.IXMLDOMNode
Dim yr As Long

Set XCombined = New MSXML2.DOMDocument60 'the new file
Set RootElement = XCombined.createElement("data") ' the root

Set XDoc = New MSXML2.DOMDocument60 'the existing year

For yr = 1999 To 2013
XDoc.async = False
XDoc.validateOnParse = False

XDoc.Load (mPath & yr & ".xml")
For Each Level1 In XDoc.DocumentElement.ChildNodes
If Level1.NodeType = NODE_ELEMENT Then Call RootElement.appendChild(Level1)
Next Level1
Next yr
XCombined.Save (mPath & ".xml")
Application.StatusBar = " "

Set XDoc = Nothing
Set XCombined = Nothing

End Sub

2014-09-15, 04:17 AM
Ok solved my own coding problem, after set RootElement = XCombined.createElement need to add the line
Xcombined.appendchild RootElement
Would have thought that would be defined in the set statement.

Anyway back to the original question, should one combine multiple XML files or loop over each file?

2014-09-15, 09:37 AM
I combined the 15 files into one large file and then ran a test to select records that matched two conditions.
Using 15 yearly files took just under one minute (00:00:54) compared to one large file that took more than double the time (00:02:24).
Looking at the task manager, it seems like the slow step is reading the file into memory and it is much faster reading 15 files smaller files than one large file.