For my earlier post, read-huge-xml-file-and-convert-to-csv, I needed to create a very large XML file (6 GB+) without crashing my machine.
My sample XML file looks like the following, with millions of <book> elements.
<catalog>
<book id="001" lang="ENG">
..
</book>
<book id="002" lang="ENG">
..
</book>
...
</catalog>
Steps:
1) Since the file starts with <catalog> and ends with </catalog>, I stripped the first and last lines and created a small file with just a few <book> elements.
//small.xml
<book id="001" lang="ENG">
..
</book>
<book id="002" lang="ENG">
..
</book>
<book id="003" lang="ENG">
..
</book>
<book id="004" lang="ENG">
..
</book>
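2) Used 'cat' to join files. The following creates bigger.xml by concatenating five copies of small.xml:
cat small.xml small.xml small.xml small.xml small.xml >> bigger.xml
Repeating the step on the output multiplies the size by five each time, so the file grows exponentially:
cat bigger.xml bigger.xml bigger.xml bigger.xml bigger.xml >> evenbigger.xml
Note that '>>' appends, so rerunning a command adds to the existing output file; delete the output first (or use '>') to start fresh.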
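If you would rather not repeat the 'cat' step by hand, the same idea can be put in a loop. The following is only a sketch: the file names (demo_small.xml, demo_grown.xml) and the tiny target size are made up for illustration, and it doubles the file on each pass instead of multiplying by five.

```shell
#!/bin/sh
# Hypothetical demo input so the sketch is self-contained;
# in practice this would be your own small.xml.
printf '<book id="001" lang="ENG">\n..\n</book>\n' > demo_small.xml

# Assumed target size in bytes; for a ~6 GB file use something like 6000000000.
target_bytes=1000

cp demo_small.xml demo_grown.xml
# Keep doubling until the file is at least as large as the target.
while [ "$(( $(wc -c < demo_grown.xml) ))" -lt "$target_bytes" ]; do
    # Write to a temp file first: cat must not read and write the same file.
    cat demo_grown.xml demo_grown.xml > demo_grown.tmp
    mv demo_grown.tmp demo_grown.xml
done

echo "final size in bytes: $(( $(wc -c < demo_grown.xml) ))"
```

Because every pass copies whole <book> elements, the result is still a clean sequence of elements, ready to be wrapped in <catalog>.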
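3) Finally, I used 'sed' to add <catalog> at the beginning and </catalog> at the end to create a proper XML file (the same commands work on evenbigger.xml):
sed -i '1s/^/<catalog> /' bigger.xml
sed -i -e '$a</catalog>' bigger.xml
These commands are written for GNU sed; BSD/macOS sed handles -i and the 'a' command differently.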
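One thing to keep in mind: each 'sed -i' call rewrites the whole file, which is noticeable on a multi-gigabyte file. As an alternative sketch (not what I used above), the wrapper tags can be added in a single pass by writing a new file. The file names demo_body.xml and demo_catalog.xml are made up for this example.

```shell
#!/bin/sh
# Hypothetical demo input standing in for the headerless bigger.xml.
printf '<book id="001" lang="ENG">\n..\n</book>\n' > demo_body.xml

# Emit the opening tag, the body, and the closing tag into one new file.
# The body is read once and nothing is edited in place.
{
    echo '<catalog>'
    cat demo_body.xml
    echo '</catalog>'
} > demo_catalog.xml

head -n 1 demo_catalog.xml
tail -n 1 demo_catalog.xml
```

The trade-off is disk space: the wrapped copy sits next to the original until you delete it.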
4) Let's verify using tail and head
head -10 bigger.xml
tail -10 bigger.xml
I can see the <catalog> at start and </catalog> at end. Hurray....
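head and tail only show the edges of the file. A slightly stronger sanity check is to count opening and closing <book> tags and confirm they match. This is a sketch using a made-up demo file (demo_check.xml); note that 'grep -c' counts matching lines, not individual matches, which is fine here because each tag sits on its own line.

```shell
#!/bin/sh
# Hypothetical demo file standing in for the generated XML.
printf '<catalog>\n<book id="001" lang="ENG">\n..\n</book>\n<book id="002" lang="ENG">\n..\n</book>\n</catalog>\n' > demo_check.xml

# Count lines containing an opening and a closing book tag.
opens=$(grep -c '<book ' demo_check.xml)
closes=$(grep -c '</book>' demo_check.xml)

echo "opening tags: $opens, closing tags: $closes"
if [ "$opens" -eq "$closes" ]; then
    echo "book tags are balanced"
else
    echo "book tags are NOT balanced"
fi
```

This does not prove the file is well-formed XML, but it catches a truncated or half-copied element cheaply.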