Spring Data Pagination - set max page size and other customizations

Background:

HandlerMethodArgumentResolver is a strategy interface for resolving method parameters in the context of a given request. So, if you want the MyObject parameter in the following method to be resolved automatically, you can create a bean of HandlerMethodArgumentResolver and implement the logic to resolve the argument.

@GetMapping("/users")
public Page<User> getUsers(MyObject object) {

Spring Framework already provides a lot of resolvers to handle various parameters such as AuthenticationPrincipal, CSRF, Session, Cookie, MVC Model, and of course Pageable.
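For example, a minimal sketch of a custom resolver for the MyObject parameter might look like the following (the class, the query parameter, and the MyObject constructor are illustrative assumptions):

import org.springframework.core.MethodParameter;
import org.springframework.web.bind.support.WebDataBinderFactory;
import org.springframework.web.context.request.NativeWebRequest;
import org.springframework.web.method.support.HandlerMethodArgumentResolver;
import org.springframework.web.method.support.ModelAndViewContainer;

public class MyObjectArgumentResolver implements HandlerMethodArgumentResolver {

    @Override
    public boolean supportsParameter(MethodParameter parameter) {
        // only kick in for MyObject parameters
        return MyObject.class.equals(parameter.getParameterType());
    }

    @Override
    public Object resolveArgument(MethodParameter parameter, ModelAndViewContainer mavContainer,
                                  NativeWebRequest webRequest, WebDataBinderFactory binderFactory) throws Exception {
        // build MyObject from request data, e.g. a query parameter
        return new MyObject(webRequest.getParameter("value"));
    }
}

Such a resolver would then be registered via WebMvcConfigurer#addArgumentResolvers.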

 

Pageable Resolver:

Spring Data comes with PageableHandlerMethodArgumentResolver to resolve the Pageable parameter from the request URL.

If you send a request like /users?size=20&page=2, the Pageable object will be injected into the method parameter.


@GetMapping("/users")
public Page<User> getUsers(Pageable pageable) {
return userRepository.findAllByStatus(Status.ACTIVE, pageable);
}

Customize PageableHandlerMethodArgumentResolver

To customize the Pageable resolver, we need to create a bean of PageableHandlerMethodArgumentResolverCustomizer, which is applied in SpringDataWebConfiguration#customizePageableResolver before the resolver is created in SpringDataWebConfiguration#pageableResolver.

PageableHandlerMethodArgumentResolverCustomizer is a SAM (single abstract method) interface, aka a @FunctionalInterface, so we can implement it with a lambda.

Setting max page size

@Bean
public PageableHandlerMethodArgumentResolverCustomizer paginationCustomizer() {
    return pageableResolver -> {
        pageableResolver.setMaxPageSize(20);                      // default is 2000
        pageableResolver.setPageParameterName("pageNumber");      // default is "page"
        pageableResolver.setSizeParameterName("elementsPerPage"); // default is "size"
    };
}

Now the URL should be /users?elementsPerPage=20&pageNumber=2 instead of /users?size=20&page=2.

If you pass an elementsPerPage value greater than 20, it will be capped at 20; for example, /users?elementsPerPage=5000 still resolves to a page size of 20.

This is helpful to prevent potential attacks that try to trigger an OutOfMemoryError by requesting huge pages.

Protobuf + Apache Pulsar with Spring Boot

Protocol Buffers (protobuf) is a language- and platform-neutral mechanism for serializing structured data. We define the data structure in the protobuf format and generate source code to write and read that data to and from a variety of data streams, using a variety of languages.

Protocol Buffers currently supports generated code in Java, Python, and several other languages.

Apache Pulsar is a cloud-native, distributed messaging and streaming platform.

In this blog we are going to define simple protobuf messages, use protobuf-maven-plugin to generate the Java source code, and use protobuf as the message format (schema type) for an Apache Pulsar pub-sub application.

1) Protobuf model + Java Source generation:

By default, protobuf-maven-plugin looks in the src/main/proto folder for .proto files. We have the following proto files to represent the Person and Greeting objects.


Person.proto

syntax = "proto3";
package app.model;
message Person {
string fName = 1;
string lName = 2;
}


Greeting.proto:

syntax = "proto3";
package app.model;
message Greeting {
string greeting = 1;
}


Here's the basic configuration for protobuf-maven-plugin. It also requires os-maven-plugin. Please refer to the GitHub project for the full source.

 

<plugin>
    <groupId>org.xolstice.maven.plugins</groupId>
    <artifactId>protobuf-maven-plugin</artifactId>
    <version>0.6.1</version>
    <configuration>
        <protocArtifact>com.google.protobuf:protoc:3.12.0:exe:${os.detected.classifier}</protocArtifact>
        <pluginId>grpc-java</pluginId>
        <pluginArtifact>io.grpc:protoc-gen-grpc-java:1.25.0:exe:${os.detected.classifier}</pluginArtifact>
        <clearOutputDirectory>true</clearOutputDirectory>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
            </goals>
        </execution>
    </executions>
</plugin>
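After running the compile goal, the plugin generates Java classes for these messages. A quick sketch of how the generated builder API is typically used (assuming protoc's default naming, where the Person message ends up nested inside an app.model.PersonOuterClass outer class):

Person person = Person.newBuilder()
        .setFName("Joe")
        .setLName("Biden")
        .build();

byte[] bytes = person.toByteArray();     // serialize to the protobuf wire format
Person parsed = Person.parseFrom(bytes); // deserialize; throws InvalidProtocolBufferException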

 

2) Spring Boot + Apache Pulsar

PulsarClient Bean: Spring Boot doesn't have an official auto-configuration for Apache Pulsar yet, so we have to create the PulsarClient bean ourselves.

 

@Bean
PulsarClient pulsarClient() throws PulsarClientException {
    return PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650")
            .build();
}

Producer bean: We will need to create a producer bean for the Person model in main-app and for the Greeting model in greeting-service. Please note the parameter of newProducer: we are using PROTOBUF as the schema type.

@Bean
Producer<THE_MODEL> personProducer(PulsarClient pulsarClient) throws PulsarClientException {
    return pulsarClient.newProducer(Schema.PROTOBUF(THE_MODEL.class))
            .topic(THE_TOPIC)
            .create();
}
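With the producer bean in place, publishing is a one-liner. A hedged sketch using the generated Person class (Producer#send blocks until the broker acknowledges the message):

// publish a protobuf-encoded Person message
personProducer.send(Person.newBuilder()
        .setFName("Joe")
        .setLName("Biden")
        .build());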

 

Pulsar Listener Config: We can register the message listener using @PostConstruct as follows:

final PulsarClient pulsarClient;

@PostConstruct
private void initConsumer() throws PulsarClientException {
    pulsarClient
            .newConsumer(Schema.PROTOBUF(THE_MODEL.class))
            .topic(THE_TOPIC)
            .subscriptionName(SUBSCRIPTION_NAME)
            .messageListener((consumer, msg) -> {

                // message handler logic

            })
            .subscribe();
}
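The message handler logic is application-specific. A hedged sketch of what typically goes inside the listener lambda (msg.getValue() returns the protobuf-decoded object; Person stands in for THE_MODEL):

.messageListener((consumer, msg) -> {
    try {
        Person person = msg.getValue(); // deserialized via the PROTOBUF schema
        // ... business logic ...
        consumer.acknowledge(msg);      // mark the message as processed
    } catch (PulsarClientException e) {
        // ack failed; the broker will redeliver after the ack timeout
        e.printStackTrace();
    }
})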


That's all the configuration we will need.

3) How to run?

1) Clone the project https://github.com/gtiwari333/spring-protobuf-grpc-apache-pulsar and run mvn clean compile to generate the Java sources and compile the project.

2) Run an Apache Pulsar instance using Docker (or you can run it manually):

docker run -it -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:2.2.0 bin/pulsar standalone

3) Start GreetingApp and MainApp

4) Hit localhost:8082/greet/Joe/Biden to publish a message. You will see it being received by greeting-service, and the greeting being sent back to the queue, which is then received by main-app.


The project structure:

├── pom.xml
├── greeting-service
│   ├── pom.xml
│   └── src
│       └── main
│           ├── java
│           │   └── gt
│           │       └── greeting
│           │           └── GreetingApp.java
│           └── resources
│               └── application.properties
├── main-app
│   ├── pom.xml
│   └── src
│       └── main
│           ├── java
│           │   └── gt
│           │       └── mainapp
│           │           └── MainApp.java
│           └── resources
│               └── application.properties
├── protobuf-model
│   ├── pom.xml
│   └── src
│       └── main
│           └── proto
│               ├── Greeting.proto
│               └── Person.proto


 

References:

  • https://developers.google.com/protocol-buffers/docs/overview
  • https://pulsar.apache.org/docs/en/client-libraries-java/
  • GitHub project https://github.com/gtiwari333/spring-protobuf-grpc-apache-pulsar


Read all tables and columns in JPA/Hibernate

How do you get metadata about all the tables and columns managed by JPA/Hibernate?

There are many ways to get a list of the tables and columns in a project that uses JPA/Hibernate. Each has pros and cons.

Option A) Direct Query on INFORMATION_SCHEMA.

The simplest way is to query INFORMATION_SCHEMA or a similar schema that the database uses internally.

For MySQL, H2, MariaDB, etc., the following works; we need a database-specific query for each vendor.

SELECT * FROM INFORMATION_SCHEMA.TABLES;
SELECT * FROM INFORMATION_SCHEMA.COLUMNS;

Option B) DB Independent query using JDBC API

We can make a database-independent query by using the JDBC API to return the metadata. Under the hood, it uses the DB-specific queries provided by the JDBC driver.

DataSource ds = ...; // create/wire the DataSource object
DatabaseMetaData metaData = ds.getConnection().getMetaData();
ResultSet schemasRS = metaData.getSchemas();
ResultSet tablesRS = metaData.getTables(null, null, null, new String[]{"TABLE"});

We can iterate over the ResultSet to get the schemas, tables, and columns, as sketched below. Note that this returns everything the database has, not just the entities the application maps.
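A sketch of that iteration (standard JDBC API; wiring the DataSource `ds` is assumed):

try (Connection con = ds.getConnection()) {
    DatabaseMetaData metaData = con.getMetaData();
    try (ResultSet tables = metaData.getTables(null, null, null, new String[]{"TABLE"})) {
        while (tables.next()) {
            String tableName = tables.getString("TABLE_NAME");
            System.out.println("Table: " + tableName);
            // list the columns of this table
            try (ResultSet columns = metaData.getColumns(null, null, tableName, null)) {
                while (columns.next()) {
                    System.out.println("  Column: " + columns.getString("COLUMN_NAME")
                            + " (" + columns.getString("TYPE_NAME") + ")");
                }
            }
        }
    }
}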

Option C) Use EntityManager MetaModel to read Entity classes

In order to retrieve only the entities/tables that the application uses, we can rely on the Hibernate metamodel (MetamodelImplementor and AbstractEntityPersister are Hibernate classes) as follows:

EntityManager em; // autowire the bean

MetamodelImplementor metaModelImpl = (MetamodelImplementor) em.getMetamodel();
List<String> tableNames = metaModelImpl
        .entityPersisters()
        .values().stream()
        .map(ep -> ((AbstractEntityPersister) ep).getTableName())
        .toList();

Option D) Use Hibernate Magic

Use Hibernate's Metadata class, which stores the ORM model determined by the provided entity mappings.

org.hibernate.boot.Metadata metadata; // getting hold of the Metadata is tricky though

List<String> tableNames = new ArrayList<>();
for (PersistentClass persistentClass : metadata.getEntityBindings()) {
    tableNames.add(persistentClass.getTable().getExportIdentifier());
}
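One way to get hold of the Metadata is to capture it during bootstrap with Hibernate 5's Integrator SPI. A minimal sketch (the class would be registered via a META-INF/services/org.hibernate.integrator.spi.Integrator file):

import org.hibernate.boot.Metadata;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.integrator.spi.Integrator;
import org.hibernate.service.spi.SessionFactoryServiceRegistry;

public class MetadataCapturingIntegrator implements Integrator {

    // captured at bootstrap; readable once the SessionFactory is built
    public static Metadata metadata;

    @Override
    public void integrate(Metadata metadata, SessionFactoryImplementor sessionFactory,
                          SessionFactoryServiceRegistry serviceRegistry) {
        MetadataCapturingIntegrator.metadata = metadata;
    }

    @Override
    public void disintegrate(SessionFactoryImplementor sessionFactory,
                             SessionFactoryServiceRegistry serviceRegistry) {
        // nothing to clean up
    }
}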



Download files from FTP using the JSch Java library

SSH provides support for secure remote login (logging in to a remote server, similar to PuTTY), secure file transfer (SCP or SFTP), and secure TCP/IP and X11 forwarding. JSch is a Java implementation of the SSH2 protocol.

In this example, we will see how we can use the JSch library to log in to an SFTP server and download files.

First, add the following dependency to your pom.xml
     <dependency>  
       <groupId>com.jcraft</groupId>  
       <artifactId>jsch</artifactId>  
       <version>0.1.54</version>  <!-- or latest version -->
     </dependency>  

The JSch APIs are pretty simple. First you create a session and open a channel; then you can use one of the many functions such as cd, ls, put, and get to change directory, list contents, upload a file, or download a file, respectively.

Create Session:


JSch jsch = new JSch();
Session session = jsch.getSession("demo", "test.rebex.net", 22);
session.setPassword("password");
session.connect();

Create Channel:

ChannelSftp channelSftp = (ChannelSftp) session.openChannel("sftp");
channelSftp.connect();

Change folder:

channelSftp.cd("/a/folder");

List content of a folder:

Vector<ChannelSftp.LsEntry> entries = channelSftp.ls(folder);

Download file:

channelSftp.get(String fileNameInFtp, String destinationFile)

Upload File:

channelSftp.put(String src, String dst)           // default mode is OVERWRITE
channelSftp.put(String src, String dst, int mode)

Upload modes (constants from ChannelSftp):
public static final int OVERWRITE = 0;
public static final int RESUME = 1;
public static final int APPEND = 2;

A complete example to download files from FTP:

In this example, we are using a publicly available SFTP server, as described at https://test.rebex.net/
import com.jcraft.jsch.*;

import java.io.File;
import java.util.*;

public class JschDownload {

    public static void main(String[] args) {
        Session session = null;
        ChannelSftp channel = null;
        try {
            JSch jsch = new JSch();
            session = jsch.getSession("demo", "test.rebex.net", 22);
            session.setPassword("password");

            // to prevent the following exception for sftp:
            // com.jcraft.jsch.JSchException: UnknownHostKey: test.rebex.net. RSA key fingerprint is ..
            Properties config = new Properties();
            config.put("StrictHostKeyChecking", "no");
            session.setConfig(config);
            session.connect();
            System.out.println("session connected");

            // various channels are supported, eg: shell, x11, sftp
            channel = (ChannelSftp) session.openChannel("sftp");
            channel.connect();
            System.out.println("channel connected");

            downloadFromFolder(channel, "/");
            downloadFromFolder(channel, "/pub/example/");

            // in order to download all files including sub-folders/sub-sub-folders, we should iterate recursively
            System.out.println("Files downloaded from FTP server successfully.");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (channel != null) {
                channel.disconnect();
            }
            if (session != null) {
                session.disconnect();
            }
        }
    }

    static void downloadFromFolder(ChannelSftp channelSftp, String folder) throws SftpException {
        Vector<ChannelSftp.LsEntry> entries = channelSftp.ls(folder);
        new File("download").mkdir();

        // download all files (except the ., .. and folders) from given folder
        for (ChannelSftp.LsEntry en : entries) {
            if (en.getFilename().equals(".") || en.getFilename().equals("..") || en.getAttrs().isDir()) {
                continue;
            }
            System.out.println("Downloading " + (folder + en.getFilename()) + " ----> "
                    + "download" + File.separator + en.getFilename());
            channelSftp.get(folder + en.getFilename(), "download" + File.separator + en.getFilename());
        }
    }
}


Spock - call mocked method multiple times and return different results for same input


In unit testing, we create and use mock objects for any complex/real object that's impractical or impossible to incorporate into a unit test. Generally, any component that's outside the scope of the unit test is mocked. Mocking frameworks like JMock, EasyMock, and Mockito provide an easy way to describe the expected behavior of a component without writing the full implementation of the object being mocked.

In the Groovy world, the Spock testing framework includes powerful mocking capabilities without requiring additional mocking libraries.

In this article, I am going to describe how we can create mock objects that can be called multiple times and return a different value on each invocation.

First, let's see how we can return a fixed value from a mocked method. For this, we use the right-shift (>>) operator:

//the input parameter is _ (any) and it will return "ok" every time
mockObj.method(_) >> "ok"
To return different values for different parameters:

//return 'ok' for param1 and 'not-ok' for param2
mockObj.method("param1") >> "ok"
mockObj.method("param2") >> "not-ok"

Finally, to return different values for the same parameter, we use the triple-right-shift (>>>) operator:

mockObj.method("param") >>> ["", "ok", "not-ok"]

It will return an empty string for the first invocation, "ok" for the second, and "not-ok" for the third and every subsequent invocation.

GraalVM setup and native image generation using Maven plugin

Today we are going to generate a native image (one of the many features of GraalVM) for the XML parser that we developed earlier. A native image contains the whole program in machine code, ready for immediate execution. It has the following advantages (ref: https://www.graalvm.org/docs/why-graal/#create-a-native-image):
  • faster startup time
  • no need for a JVM (JDK/JRE) to execute the application
  • low memory footprint

Steps:

1) GraalVM setup

I used SDKMAN to install the GraalVM SDK on my Linux machine, using the following steps. First I listed all available JDK distributions, then I ran sdk install to install the latest GraalVM version. At the end of the installation, I selected Yes to set this version as the default JDK.

sdk list java
sdk install java 20.1.0.r11-grl 

Then I verified the installation using the following:
java -version 

I got the following output, so everything is working great so far:
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02, mixed mode, sharing)


If you want to do it manually, download the zip file, extract it, add it to the system path, and set it as the default JDK.

2) Native image tools installation

Before you can use the GraalVM native-image utility, you need a working C developer environment. For this:

- On Linux, you will need GCC, and the glibc and zlib headers. 
Examples for common distributions:

    # dnf (rpm-based)
    sudo dnf install gcc glibc-devel zlib-devel libstdc++-static
    # Debian-based distributions:
    sudo apt-get install build-essential libz-dev zlib1g-dev

- On macOS
    Xcode provides the required dependencies:

    xcode-select --install

- On Windows, you will need to install the Visual Studio 2017 Visual C++ Build Tools


After this, you can run the following to install the native-image utility:
$JAVA_HOME/bin/gu install native-image  

Here, $JAVA_HOME is your GraalVM installation directory

3) Finally, use the GraalVM native-image Maven plugin to generate the native image during the package phase

For this, I added the following to my XML parser's pom.xml file:

Dependency:
<dependency>
    <groupId>org.graalvm.sdk</groupId>
    <artifactId>graal-sdk</artifactId>
    <version>${graalvm.version}</version>
    <scope>provided</scope>
</dependency>



Plugin: It automatically detects the jar file and the main class from the jar. I've specified imageName = xmltocsv as the name of the generated executable.

<plugin>
    <groupId>org.graalvm.nativeimage</groupId>
    <artifactId>native-image-maven-plugin</artifactId>
    <version>${graalvm.version}</version>
    <executions>
        <execution>
            <goals>
                <goal>native-image</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <!--The plugin figures out what jar files it needs to pass to the native image
        and what the executable main class should be. -->
        <!--<mainClass>${app.mainClass}</mainClass>-->
        <imageName>xmltocsv</imageName>
        <buildArgs>
            --no-fallback
        </buildArgs>
        <skip>false</skip>
    </configuration>
</plugin>


The version:
<graalvm.version>20.1.0</graalvm.version>

And I ran the following to generate the native image:
mvnw clean package native-image:native-image




It produced the following files in my target folder:

── target
│   ├── xmltocsv            // this is the binary file; it can run without a JVM
│   └── xmltocsv-FINAL.jar  // this requires a JVM to run


4) Testing

On my Linux machine, I executed the xmltocsv binary:
$ ./target/xmltocsv ../big3.xml ../big3.csv

It started faster and used less memory, but took a little longer to convert the file than running the jar (because we lose the JVM's runtime optimizations).

The complete example code is available here: https://github.com/gtiwari333/java-read-big-xml-to-csv

Create a huge data file for load testing

In this short blog, I am going to describe how you can create a big file for load testing. In most cases, you will only need step #2, which combines files into bigger ones.

For my earlier blog read-huge-xml-file-and-convert-to-csv, I needed to create a very big XML file (6 GB+) without crashing my machine!

My sample XML file looks like the following, with millions of <book> elements.

<catalog>  
   <book id="001" lang="ENG">  
     ..  
   </book>  
   <book id="002" lang="ENG">  
     ..  
   </book>  
   ...  
 </catalog>  

Steps:
1) Since the file started with <catalog> and ended with </catalog>, I stripped the first and last lines and created a small file with just a few <book> elements.

//small.xml
<book id="001" lang="ENG">  
    ..  
 </book>  
 <book id="002" lang="ENG">  
    ..  
 </book>  
 <book id="003" lang="ENG">  
    ..  
 </book>  
 <book id="004" lang="ENG">  
    ..  
 </book>  


2) Used 'cat' to join files. The following creates bigger.xml by concatenating five copies of small.xml:

cat small.xml  small.xml  small.xml  small.xml  small.xml >> bigger.xml


I can repeat this with the output to grow the file size exponentially:

cat bigger.xml  bigger.xml  bigger.xml  bigger.xml  bigger.xml >> evenbigger.xml

3) Finally, I used 'sed' to add <catalog> at the beginning and </catalog> at the end to create a well-formed XML file:

 sed -i '1s/^/<catalog> /' bigger.xml
 sed -i -e '$a</catalog>' bigger.xml


4) Let's verify using head and tail:

  head -10 bigger.xml
  tail -10 bigger.xml

I can see <catalog> at the start and </catalog> at the end. Hurray!

Java - read a huge XML file and convert it to CSV

The SAX parser uses the event handler org.xml.sax.helpers.DefaultHandler to efficiently parse an XML file and handle the intermediate results.

It provides the following three important sets of methods, where we can write custom logic to take a specific action on each event:
  • startDocument() and endDocument() – called at the start and end of an XML document.
  • startElement() and endElement() – called at the start and end of a document element.
  • characters() – called with the text contents in between the start and end tags of an XML document element.
We will use this class to efficiently read a HUGE XML file (6.58 GB here; it should support any size without a problem) and convert and write the records to a CSV file.

I am going to use my existing code from my old blog xml-parsing-using-saxparser and update it for this purpose. The final code is available in the GitHub project java-read-big-xml-to-csv.


Java HUGE XML to CSV - project structure

How to Import/Run:

It's a simple Maven project (with no dependencies). You can import it into your IDE or use the command line to compile and run.
If you plan on using the command line, go to the root of the project and run mvnw clean package to compile and create a runnable jar file.
Then you can run the executable as follows:
java -jar target\xmltocsv-FINAL.jar  C:\folder\input.xml  C:\folder\output.csv

The code:

SaxParseEventHandler 
The SaxParseEventHandler class takes the RecordWriter as a constructor parameter:
public SaxParseEventHandler(RecordWriter<Book> writer) {


We create a new Book record on the startElement event:

public void startElement(String s, String s1, String elementName, Attributes attributes) {
    // handle start of a new Book tag and attributes of an element
    if (elementName.equalsIgnoreCase("book")) { // start
        bookTmp = new Book();
    }
}


and we write the parsed book data to the file on the endElement() event:

public void endElement(String s, String s1, String element) {
    if (element.equals("book")) { // end
        writer.write(bookTmp, counter);
    }
}
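The snippets above don't show characters(), the third callback mentioned earlier. A typical override buffers the text between tags into a StringBuilder field (the field name here is illustrative, not necessarily what the project uses):

@Override
public void characters(char[] ch, int start, int length) {
    // accumulate text content; consumed when the enclosing element ends
    currentValue.append(ch, start, length);
}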





RecordWriter:
It's a simple wrapper around FileWriter to write content to the file. We currently write T.toString() to the file.

public void write(T t, int n) throws IOException {
    fw.write(t.toString());
    if (n % 10000 == 0) {
        fw.flush();
    }
}
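Since Main (below) uses RecordWriter in a try-with-resources block, the wrapper presumably implements AutoCloseable. A minimal assumed shape of the whole class (see the GitHub project for the real one):

import java.io.FileWriter;
import java.io.IOException;

public class RecordWriter<T> implements AutoCloseable {

    private final FileWriter fw;

    public RecordWriter(String outputFile) throws IOException {
        this.fw = new FileWriter(outputFile);
    }

    public void write(T t, int n) throws IOException {
        fw.write(t.toString());
        if (n % 10000 == 0) {
            fw.flush(); // flush periodically instead of on every record
        }
    }

    @Override
    public void close() throws IOException {
        fw.close();
    }
}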

Main:
It's the main 'launcher' class.

SAXParserFactory factory = SAXParserFactory.newInstance();
try (RecordWriter<Book> w = new RecordWriter<>(outputCSV)) {
    SAXParser parser = factory.newSAXParser();
    parser.parse(inputXml, new SaxParseEventHandler(w));
}






Results on a Windows machine with 16 GB RAM, Core i5, 6 MB L3 cache, SSD:

Max RAM usage: 190 MB

Time taken for the file big2.xml (118 MB):
- JDK 8: 8-9 sec
- JDK 11: 6-7 sec
- JDK 14: 5 sec

big3.xml (6.58 GB) takes about 2 minutes.


Next steps: create a binary using GraalVM. I will keep posting!