Showing posts with label XML. Show all posts
Showing posts with label XML. Show all posts

java read huge xml file and convert to csv

SAX parser uses event handler org.xml.sax.helpers.DefaultHandler to efficiently parse and handle the intermediate results of an XML file.  

It provides the following three important methods on each event where we can write custom logic to take specific action at each events:
  • startDocument() and endDocument() – Method called at the start and end of an XML document. 
  • startElement() and endElement() – Method called at the start and end of a document element.  
  • characters() – Method called with the text contents in between the start and end tags of an XML document element.
We will be using this class to read a HUGE xml file (6.58GB, it should support any size without any problem) efficiently and convert and write to CSV file.

I am going to use my existing code from my old blog xml-parsing-using-saxparser and updating it for this purpose. The final code is available on github project java-read-big-xml-to-csv


Java HUGE XML to CSV - project structure

How to Import/Run:

Its a simple maven project(with no dependencies). You can import it into your IDE or  use command line to compile and run.
If you plan on using Command Line, to compile and create a runnable jar file, go to the root of the project and run mvnw clean package .
Then you can run the executable as following:
java -jar target\xmltocsv-FINAL.jar  C:\folder\input.xml  C:\folder\output.csv

The code:

SaxParseEventHandler 
SaxParseEventHandler class takes the RecordWriter as constructor parameter
public SaxParseEventHandler(RecordWriter<Book> writer) {


We create new book record on startElement event
public void startElement(String s, String s1, String elementName, Attributes attributes) { /* handle start of a new Book tag and attributes of an element */ if (elementName.equalsIgnoreCase("book")) { //start bookTmp = new Book();


and we write the parsed book data to file on endElement() event.
public void endElement(String s, String s1, String element) { if (element.equals("book")) { //end writer.write(bookTmp, counter);





RecordWriter:
Its a simple wrapper for FileWriter to write content to file. We are currently writing T.toString() to file.
public void write(T t, int n) throws IOException { fw.write(t.toString()); if (n % 10000 == 0) { fw.flush(); } }

Main:
Its the main 'launcher' class
SAXParserFactory factory = SAXParserFactory.newInstance(); try (RecordWriter<Book> w = new RecordWriter<>(outputCSV)) { SAXParser parser = factory.newSAXParser(); parser.parse(inputXml, new SaxParseEventHandler(w)); }






Results at 16GB RAM, Core i5, 6MB L3 cache, SSD | Windows Machine
Max RAM usage: 190MB
Time Taken:
For the file big2.xml with size 118MB
- JDK8 - 8-9 sec
- JDK 11 - 6-7 sec
- JDK 14 - 5 sec 

big3.xml with size 6.58GB takes about 2 minutes


Next Steps: create a binary using GraalVM. I will keep posting !!

Java : Generate JTree from XML dynamically - code example

In this post, i am going to describe how to generate JTree in swing according to an XML document dynamically.


Used XML File : catalog.xml

XSLT : Using reusable XSL to generate HTML Form dynamically

In this article , am going to describe how to generate html form dynamically using reusable xsl file.


Problem Statement :

Suppose I have large number of xml files :  There is one condition that : there won't be any more nested tags . But the name of property tag will be different for each xmldata file.
First xml:

Xsl transform in Java working example (xml to html)

In order to display XML documents, it is necessary to have a mechanism to describe how the document should be displayed. One of these mechanisms is Cascading Style Sheets (CSS), but XSL (eXtensibleStylesheet Language) is the preferred style sheet language of XML.
XSL can be used to define how an XML file should be displayed by transforming the XML file into a format such as HTML, PDF, etc..

XML parsing using SaxParser with complete code

SAX parser use callback function (org.xml.sax.helpers.DefaultHandler) to informs clients of the XML document structure. You should extend DefaultHandler and override few methods to achieve xml parsing.
The methods to override are
  • startDocument() and endDocument() – Method called at the start and end of an XML document. 
  • startElement() and endElement() – Method called at the start and end of a document element.  
  • characters() – Method called with the text contents in between the start and end tags of an XML document element.

The following example demonstrates the uses of DefaultHandler to parse and XML document. It performs mapping of xml to model class and generate list of objects.