Create a huge data file for load testing

In this short blog, I am going to describe how you can create a big file for load testing. In most cases, you will only need step #2 to join files into a bigger one.

For my earlier blog read-huge-xml-file-and-convert-to-csv, I needed to create a very big XML file (6GB+) without crashing my machine!

My sample XML file looks like the following, with millions of <book> elements.

<catalog>  
   <book id="001" lang="ENG">  
     ..  
   </book>  
   <book id="002" lang="ENG">  
     ..  
   </book>  
   ...  
 </catalog>  

Steps:
1) Since the file starts with <catalog> and ends with </catalog>, I stripped the first and last lines and created a small file with just a few <book> elements.

//small.xml
<book id="001" lang="ENG">  
    ..  
 </book>  
 <book id="002" lang="ENG">  
    ..  
 </book>  
 <book id="003" lang="ENG">  
    ..  
 </book>  
 <book id="004" lang="ENG">  
    ..  
 </book>  


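2) Used 'cat' to join files. The following creates bigger.xml by combining five copies of small.xml. I use '>' here so that re-running the command overwrites bigger.xml; with '>>' a second run would append to the existing file.

cat small.xml  small.xml  small.xml  small.xml  small.xml > bigger.xml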


I can repeat this to grow the file size exponentially: each round multiplies the size by five.

cat bigger.xml  bigger.xml  bigger.xml  bigger.xml  bigger.xml > evenbigger.xml
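
To get a feel for how many rounds are needed, here is a minimal Java sketch. It assumes a hypothetical 1 MB starting file and a 6 GB target; the class and method names are only illustrative.

```java
class GrowthEstimate {
    // Counts how many rounds of "join `copies` copies of the previous file"
    // are needed to grow from startBytes to at least targetBytes.
    static int roundsNeeded(long startBytes, long targetBytes, int copies) {
        if (startBytes <= 0 || copies < 2) {
            throw new IllegalArgumentException("need startBytes > 0 and copies >= 2");
        }
        int rounds = 0;
        long size = startBytes;
        while (size < targetBytes) {
            size *= copies;   // each round multiplies the size
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;      // assumed starting size
        long sixGb = 6L * 1024L * oneMb; // assumed target size
        System.out.println("rounds needed: " + roundsNeeded(oneMb, sixGb, 5));
    }
}
```

With five copies per round, a handful of rounds is enough to go from megabytes to gigabytes, so keep an eye on free disk space.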

3) Finally, I used 'sed' (GNU sed syntax) to add <catalog> at the beginning and </catalog> at the end to create a proper XML file

 sed -i '1s/^/<catalog> /' bigger.xml
 sed -i -e '$a</catalog>' bigger.xml


4) Let's verify using tail and head

  head -10 bigger.xml
  tail -10 bigger.xml

I can see the <catalog>  at start and </catalog> at end. Hurray....
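
If 'cat' and 'sed' are not available (for example on a plain Windows machine), the same steps can be sketched in Java. This is only an illustration of the idea and not what I used for the blog: the class name BigXmlGenerator and its arguments are assumptions. It reads the small fragment once and streams it to the target file as many times as requested, so memory use stays small.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BigXmlGenerator {
    // Writes <catalog>, then `copies` copies of the fragment file, then </catalog>.
    static void generate(Path fragment, Path target, int copies) throws IOException {
        // The fragment (a few <book> elements) is small, so reading it fully is fine.
        String books = new String(Files.readAllBytes(fragment), StandardCharsets.UTF_8);
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<catalog>\n");
            for (int i = 0; i < copies; i++) {
                out.write(books);
            }
            out.write("</catalog>\n");
        }
    }

    // Hypothetical usage: java BigXmlGenerator small.xml bigger.xml 100000
    public static void main(String[] args) throws IOException {
        generate(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
    }
}
```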

Java: read a huge XML file and convert it to CSV

A SAX parser uses an event handler (a subclass of org.xml.sax.helpers.DefaultHandler) to parse an XML file as a stream and handle the intermediate results efficiently.

It provides the following callback methods, in which we can write custom logic to act on each event:
  • startDocument() and endDocument() – called at the start and end of an XML document.
  • startElement() and endElement() – called at the start and end of each element.
  • characters() – called with the text content between the start and end tags of an element.
We will use this class to read a HUGE XML file (6.58GB; because SAX streams the document instead of loading it into memory, the file size should not be a limiting factor) efficiently, and convert and write it to a CSV file.
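
To make the order of these callbacks concrete, here is a minimal sketch that records every event for a tiny in-memory document. The class name SaxEventDemo and the sample XML are made up for illustration; only the standard JDK SAX classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {
    // Parses the given XML and returns a log of the callbacks in the order they fired.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument() { log.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes attrs) {
                log.add("start:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks; whitespace-only chunks are skipped.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("text:" + text);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        events("<catalog><book id=\"001\"><title>Java</title></book></catalog>")
                .forEach(System.out::println);
    }
}
```

Because the parser only ever hands us one event at a time, the handler decides what to keep in memory, which is what makes this approach suitable for very large files.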

I am going to take the existing code from my old blog xml-parsing-using-saxparser and update it for this purpose. The final code is available in the GitHub project java-read-big-xml-to-csv


Java HUGE XML to CSV - project structure

How to Import/Run:

It's a simple Maven project (with no dependencies). You can import it into your IDE or use the command line to compile and run it.
To compile and create a runnable jar file from the command line, go to the root of the project and run mvnw clean package.
Then you can run the executable as follows:
java -jar target\xmltocsv-FINAL.jar  C:\folder\input.xml  C:\folder\output.csv

The code:

SaxParseEventHandler 
SaxParseEventHandler class takes the RecordWriter as constructor parameter
public SaxParseEventHandler(RecordWriter<Book> writer) {


We create a new Book record on the startElement() event:
public void startElement(String s, String s1, String elementName, Attributes attributes) {
    /* handle start of a new Book tag and attributes of an element */
    if (elementName.equalsIgnoreCase("book")) { //start
        bookTmp = new Book();


and we write the parsed book data to the file on the endElement() event:
public void endElement(String s, String s1, String element) {
    if (element.equals("book")) { //end
        writer.write(bookTmp, counter);





RecordWriter:
It's a simple wrapper around FileWriter that writes content to a file. We currently write T.toString() to the file.
public void write(T t, int n) throws IOException {
    fw.write(t.toString());
    // flush to disk every 10,000 records
    if (n % 10000 == 0) {
        fw.flush();
    }
}
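
For context, a wrapper of this kind could look roughly like the sketch below. This is not the project's actual class: the name RecordWriterSketch and the String path constructor are assumptions. Implementing AutoCloseable is what lets it be used in a try-with-resources block.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

class RecordWriterSketch<T> implements AutoCloseable {
    private final Writer fw;

    // Assumption: the output location is passed as a file path.
    RecordWriterSketch(String path) throws IOException {
        this.fw = new FileWriter(path);
    }

    // Writes the record's toString() and flushes every 10,000 records.
    void write(T t, int n) throws IOException {
        fw.write(t.toString());
        if (n % 10000 == 0) {
            fw.flush();
        }
    }

    // Closing the underlying writer also flushes whatever is still buffered.
    @Override
    public void close() throws IOException {
        fw.close();
    }
}
```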

Main:
It's the main 'launcher' class:
SAXParserFactory factory = SAXParserFactory.newInstance();
try (RecordWriter<Book> w = new RecordWriter<>(outputCSV)) {
    SAXParser parser = factory.newSAXParser();
    parser.parse(inputXml, new SaxParseEventHandler(w));
}
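
To show how the pieces fit together, here is a small self-contained sketch of the same flow. It is not the code from the GitHub project: the class BookCsvSketch, the "id,lang" CSV layout and the in-memory input are assumptions chosen so the example runs on its own. A record is started in startElement() and written out in endElement().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BookCsvSketch {
    // Hypothetical handler: emits one "id,lang" CSV line per <book> element.
    static class Handler extends DefaultHandler {
        private final Writer out;
        private String id;
        private String lang;

        Handler(Writer out) { this.out = out; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("book")) {   // a new record begins
                id = attrs.getValue("id");
                lang = attrs.getValue("lang");
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("book")) {   // the record is complete, write it out
                try {
                    out.write(id + "," + lang + "\n");
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        }
    }

    // Parses the XML text and returns the CSV produced by the handler.
    static String toCsv(String xml) throws Exception {
        StringWriter csv = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                        new Handler(csv));
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(toCsv("<catalog><book id=\"001\" lang=\"ENG\"></book>"
                + "<book id=\"002\" lang=\"ENG\"></book></catalog>"));
    }
}
```

Only one record is held in memory at a time, which is why memory use stays flat regardless of how many <book> elements the file contains.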






Results on a Windows machine (16GB RAM, Core i5, 6MB L3 cache, SSD)
Max RAM usage: 190MB
Time taken:
For the file big2.xml (118MB):
- JDK 8 - 8-9 sec
- JDK 11 - 6-7 sec
- JDK 14 - 5 sec

big3.xml (6.58GB) takes about 2 minutes


Next Steps: create a binary using GraalVM. I will keep posting !!

Java Compress/Decompress String/Data

Java provides the Deflater class for general purpose compression using the ZLIB compression library. It also provides the DeflaterOutputStream which uses the Deflater class to filter a stream of data by compressing (deflating) it and then writing the compressed data to another output stream. There are equivalent Inflater and InflaterOutputStream classes to handle the decompression.
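
The stream classes are the convenient route, but for illustration here is a minimal sketch of using Deflater and Inflater directly. The class name RawDeflaterSketch and the 1 KB buffer size are arbitrary choices for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class RawDeflaterSketch {
    // Compresses the input with the default compression level.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();                 // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf); // fills buf with the next compressed chunk
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();                    // release native resources
        }
    }

    // Reverses deflate(); throws DataFormatException on corrupt or truncated data.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("incomplete compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }
}
```

The stream-based versions in the rest of this post do the same buffer handling internally, which is why they are much shorter.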

Compression


Here is an example of how to use the DeflaterOutputStream to compress a byte array.
static byte[] compressBArray(byte[] bArray) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        // closing the DeflaterOutputStream finishes the compression and flushes it to os
        try (DeflaterOutputStream dos = new DeflaterOutputStream(os)) {
            dos.write(bArray);
        }
        return os.toByteArray();
}

Let's test:

byte[] input = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"        .getBytes();
byte[] op = CompressionUtil.compressBArray(input);
System.out.println("original data length " + input.length +
        ",  compressed data length " + op.length);

This prints 'original data length 71,  compressed data length 12'

Decompression

The decompress method mirrors compression, using InflaterOutputStream:

public static byte[] decompress(byte[] compressedTxt) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();    
        try (OutputStream ios = new InflaterOutputStream(os)) {
            ios.write(compressedTxt);    
        }
        return os.toByteArray();
}
Passing the compressed bytes from the earlier test to decompress returns the original 'input' bytes.


Let's convert the byte[] to Base64 to make it portable

In the above examples we get the compressed data as a byte array (byte[]), which is raw binary data.

But we might want to store the compressed data in a text-based format such as JSON, a text file, or a database text column, right? For that, we can convert it to Base64 as follows:

byte[] compressed = {}; //the compressed byte array
String b64Compressed = Base64.getEncoder().encodeToString(compressed);
//to read it back: decode the Base64 text, then decompress the bytes
byte[] decompressedBArray = decompress(Base64.getDecoder().decode(b64Compressed));
//convert to original string if input was string
new String(decompressedBArray, StandardCharsets.UTF_8);
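
One thing to keep in mind is that Base64 text is larger than the bytes it encodes: with the standard padded encoder, every group of 3 bytes becomes 4 characters. The sketch below (the class name Base64Overhead is just for illustration) computes the encoded length and compares it with what the JDK encoder produces.

```java
import java.util.Base64;

class Base64Overhead {
    // Length of the padded Base64 encoding of n bytes: 4 characters per 3-byte group.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] data = new byte[12]; // stand-in for 12 bytes of compressed data
        String b64 = Base64.getEncoder().encodeToString(data);
        System.out.println("bytes: " + data.length
                + ", Base64 chars: " + b64.length()
                + ", computed: " + encodedLength(data.length));
    }
}
```

So Base64 gives back roughly a third of the space that compression saved, which is worth it only when the destination really needs text.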

Here's the complete code and the test cases

package compress;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

public class CompressionUtil {

    public static String compressAndReturnB64(String text) throws IOException {
        return new String(Base64.getEncoder().encode(compress(text)));
    }

    public static String decompressB64(String b64Compressed) throws IOException {
        byte[] decompressedBArray = decompress(Base64.getDecoder().decode(b64Compressed));
        return new String(decompressedBArray, StandardCharsets.UTF_8);
    }

    public static byte[] compress(String text) throws IOException {
        // use UTF-8 explicitly so it matches the charset used in decompressB64
        return compress(text.getBytes(StandardCharsets.UTF_8));
    }

    public static byte[] compress(byte[] bArray) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(os)) {
            dos.write(bArray);
        }
        return os.toByteArray();
    }

    public static byte[] decompress(byte[] compressedTxt) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (OutputStream ios = new InflaterOutputStream(os)) {
            ios.write(compressedTxt);
        }
        return os.toByteArray();
    }

}

Test case:

package compress;

import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CompressionTest {

    String testStr = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";

    @Test
    void compressByte() throws IOException {
        byte[] input = testStr.getBytes();
        byte[] op = CompressionUtil.compress(input);
        System.out.println("original data length " + input.length + ",  compressed data length " + op.length);
        byte[] org = CompressionUtil.decompress(op);
        System.out.println(org.length);
        System.out.println(new String(org, StandardCharsets.UTF_8));
    }

    @Test
    void compress() throws IOException {

        String op = CompressionUtil.compressAndReturnB64(testStr);
        System.out.println("Compressed data b64: " + op);
        String org = CompressionUtil.decompressB64(op);
        System.out.println("Original text: " + org);
    }

}


 Note: Since the compress and decompress methods operate on byte[], we can compress/decompress any data that can be converted to a byte array.
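
As an illustration of that note, here is a minimal sketch that compresses an int[] by first copying it into a byte array with ByteBuffer. The class name IntArrayCompression is made up for this example, and it repeats the stream logic from above so that it is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

class IntArrayCompression {
    // Converts the ints to bytes (4 bytes each, big-endian) and compresses them.
    static byte[] compress(int[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        buf.asIntBuffer().put(values);
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (OutputStream dos = new DeflaterOutputStream(os)) {
            dos.write(buf.array());
        }
        return os.toByteArray();
    }

    // Decompresses the bytes and converts them back to ints.
    static int[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (OutputStream ios = new InflaterOutputStream(os)) {
            ios.write(compressed);
        }
        ByteBuffer buf = ByteBuffer.wrap(os.toByteArray());
        int[] values = new int[buf.remaining() / Integer.BYTES];
        buf.asIntBuffer().get(values);
        return values;
    }
}
```

The same pattern works for any type, as long as both sides agree on how the value is turned into bytes and back.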