This proof of concept shows the use of Spring Boot, Kafka and iText together to execute long-running tasks asynchronously with a REST API. The implemented service checks whether PDF files contain IBANs that are suspected of being used for money laundering. The implementation makes it possible to add other checks for the PDF files later.
Kafka runs with Docker Compose, which is integrated into Spring Boot. A working Docker setup must therefore be available to start the project. Java 21 and Maven are also required.
- Clone the repo
git clone https://github.com/murygin/malware-scanner.git
- Compile
./mvnw clean compile
- Run
./mvnw spring-boot:run
The blacklist with the suspicious IBANs is configured in the file src/main/resources/application.properties. The property in the file is iban.blacklist. IBANs are separated by commas.
iban.blacklist=BG18RZBB91550123456789,FO9264600123456789,GB33BUKB20201555555555
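As an illustration of how a comma-separated value like this could be turned into a lookup structure, here is a minimal sketch in plain Java. The class name IbanBlacklist and the normalization step (stripping whitespace, upper-casing) are assumptions made for this example; the project may read and compare the property differently.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not taken from the project sources.
class IbanBlacklist {

    private final Set<String> ibans;

    // rawProperty is the comma-separated value of iban.blacklist
    IbanBlacklist(String rawProperty) {
        this.ibans = Arrays.stream(rawProperty.split(","))
                .map(IbanBlacklist::normalize)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toUnmodifiableSet());
    }

    // Assumption: IBANs are compared without whitespace and in upper case
    static String normalize(String iban) {
        return iban.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    boolean contains(String iban) {
        return ibans.contains(normalize(iban));
    }

    int size() {
        return ibans.size();
    }
}
```

With the property value shown above, `new IbanBlacklist(value).contains("FO9264600123456789")` would return true.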
The API provides an endpoint for starting the check of a PDF file and an endpoint for loading the result. If the service is started with ./mvnw spring-boot:run, the base URL is http://localhost:8080.
POST /check/files starts the check of a PDF file. The PDFs are checked asynchronously, so the result is not returned directly in the response. Instead, the response confirms that the check was created and contains the ID of the check. The response header Location contains the URL for loading the result.
Request:
{
"url": "http://localhost:9090/pdf-with-iban.pdf",
"file-type": "pdf"
}
Response:
- Status: 202 Accepted
- Header: Location: /check/files/b3a5896f-387b-4363-a631-cfbf467db1ce
{
"state": "CREATED",
"results": [],
"id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
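A client could build this request with the JDK's built-in HTTP client. The following is only a sketch: the class name StartCheckRequest is made up for this example, and the Content-Type header is an assumption about what the endpoint expects.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper, not part of the project.
class StartCheckRequest {

    // Builds the POST /check/files request for the given service base URL and PDF URL.
    static HttpRequest build(String baseUrl, String pdfUrl) {
        // Hand-written JSON; assumes pdfUrl contains no characters that need JSON escaping
        String body = "{\"url\": \"" + pdfUrl + "\", \"file-type\": \"pdf\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/check/files"))
                .header("Content-Type", "application/json") // assumption: JSON request body
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the URL of the result can be read from `response.headers().firstValue("Location")`.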
GET /check/files/<UUID> loads the result of checking a PDF file. Because the PDFs are checked asynchronously, the returned state depends on the progress of the check. If the check has not yet been started, the state CREATED is returned. If the check is currently running, the state RUNNING is returned. When the check is completed, the state FINISHED and a result are returned.
Response:
- Status: 200 OK
{
"state": "FINISHED",
"results": [
{
"state": "SUSPICIOUS",
"name": "money-laundering",
"details": "Unique IBANs: 111, suspicious IBANs: 2"
}
],
"id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
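Because the result only becomes available once the state is FINISHED, a client typically has to poll. The sketch below shows one possible polling loop. The class CheckResultPoller and its parameters are hypothetical, and the actual HTTP call is hidden behind a Supplier that is assumed to return the value of the "state" field.

```java
import java.util.function.Supplier;

// Hypothetical client-side helper, not part of the project.
class CheckResultPoller {

    // Calls fetchState until it returns "FINISHED" or maxAttempts is reached.
    // Returns the last state that was seen.
    static String pollUntilFinished(Supplier<String> fetchState, int maxAttempts, long delayMillis) {
        String state = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            state = fetchState.get();
            if ("FINISHED".equals(state)) {
                return state;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // wait before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                    return state;
                }
            }
        }
        return state;
    }
}
```

A fixed delay keeps the example short. A real client might prefer a growing delay or an overall timeout.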
The REST endpoint POST /check/files can be used to trigger a PDF file check. When the endpoint is called, the method create is called in the controller. The Spring Boot REST controller o.d.m.rest.MalwareScannerController contains the methods that are executed when the endpoints are called. The controller is only a facade and passes the calls on to the o.d.m.service.CheckJobService.
If a new check is requested, the controller calls the method createCheckJob in the CheckJobService. The check is not started directly. It is only triggered by a Kafka event. This has the advantage that the caller of the REST endpoint is not blocked and does not have to wait, but receives a response immediately. The method createCheckJob in CheckJobService creates an o.d.m.model.CheckJob with the status CREATED and saves it in the database. An o.d.m.model.CheckEvent is then sent to the event streaming platform Kafka.
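The order of operations described above (persist the job, publish an event, return immediately) can be sketched in plain Java. This is not the project's code: the records are simplified stand-ins for CheckJob and CheckEvent, a map stands in for the database, and a queue stands in for the Kafka topic.

```java
import java.util.Map;
import java.util.Queue;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch with in-memory stand-ins; all names are assumptions.
class CheckJobFlowSketch {

    enum State { CREATED, RUNNING, FINISHED }

    // Simplified stand-ins for o.d.m.model.CheckJob and o.d.m.model.CheckEvent
    record CheckJob(UUID id, String url, State state) {}
    record CheckEvent(UUID jobId, String url) {}

    final Map<UUID, CheckJob> jobs = new ConcurrentHashMap<>();      // stand-in for the database
    final Queue<CheckEvent> topic = new ConcurrentLinkedQueue<>();   // stand-in for the Kafka topic

    CheckJob createCheckJob(String url) {
        CheckJob job = new CheckJob(UUID.randomUUID(), url, State.CREATED);
        jobs.put(job.id(), job);                    // 1. save the job with state CREATED
        topic.add(new CheckEvent(job.id(), url));   // 2. publish the event that triggers the check
        return job;                                 // 3. return without waiting for the check
    }
}
```

The caller gets the job ID back right away, and the check itself happens later when the event is consumed.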
The CheckEvents are consumed by the o.d.m.kafka.KafkaTopicListener. After receiving an event, the KafkaTopicListener sets the status of the CheckJob to RUNNING and starts a check by calling the checkPDFFile method in the o.d.m.service.IBANCheckService.
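The listener step can be sketched as follows. The class below is a plain Java stand-in and makes no use of Spring Kafka: the map replaces the persisted CheckJob, and the Consumer replaces the call to checkPDFFile. All names are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Illustrative sketch; not the project's KafkaTopicListener.
class CheckEventListenerSketch {

    // Job ID -> current state; stand-in for the persisted CheckJob
    final Map<String, String> jobStates = new ConcurrentHashMap<>();

    // Stand-in for IBANCheckService.checkPDFFile; receives the PDF URL
    private final Consumer<String> checkPdfFile;

    CheckEventListenerSketch(Consumer<String> checkPdfFile) {
        this.checkPdfFile = checkPdfFile;
    }

    // Called once per consumed check event
    void onCheckEvent(String jobId, String url) {
        jobStates.put(jobId, "RUNNING"); // mark the job first, so polling clients see RUNNING
        checkPdfFile.accept(url);        // then start the actual check
    }
}
```

Updating the state before starting the check means that a client polling during a long check sees RUNNING instead of CREATED.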
The checkPDFFile method in the IBANCheckService first loads the PDF file via the URL and finds all IBANs in the file. Afterward it checks whether the IBANs found are in the blacklist, which contains the IBANs suspected of being used for money laundering. An instance of the o.d.m.service.IBANFinder is created to search for IBANs in the PDF file. After the run method is called, the IBANFinder collects all IBANs found in a Set. The iText library is used to read the PDF files. After the check in the IBANCheckService is completed, an o.d.m.model.CheckResultEvent is sent to Kafka. The CheckResultEvent is consumed by the KafkaTopicListener. The KafkaTopicListener takes the result of the check from the event and saves it in the CheckJob. The status of the job is set to FINISHED. Now the client can load the result of the job via the REST endpoint GET /check/files/<UUID>.
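The core of the check, finding IBANs and comparing them with the blacklist, can be illustrated on text that has already been extracted from a PDF. The sketch below does not use iText and is not the project's IBANFinder. The regular expression is a simplified assumption that only matches IBANs written without spaces, and it validates neither check digits nor country-specific lengths.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch working on plain text; all names are assumptions.
class IbanTextCheckSketch {

    // Simplified IBAN shape: country code, two check digits, 11 to 30 alphanumerics
    private static final Pattern IBAN =
            Pattern.compile("\\b[A-Z]{2}\\d{2}[A-Z0-9]{11,30}\\b");

    // Collects every distinct IBAN-shaped token in the text
    static Set<String> findIbans(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = IBAN.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Builds a details string in the same form as the example response
    static String check(String text, Set<String> blacklist) {
        Set<String> found = findIbans(text);
        Set<String> suspicious = new LinkedHashSet<>(found);
        suspicious.retainAll(blacklist); // keep only IBANs that are on the blacklist
        return "Unique IBANs: " + found.size() + ", suspicious IBANs: " + suspicious.size();
    }
}
```

Collecting matches in a Set is what makes the count "unique": an IBAN that appears several times in a document is counted once.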
- The service should only be able to be used if a client is authenticated.
- Loading arbitrary external resources during the runtime of an application is a major security risk. The URL of the PDF files that the client sends to the service must not be trusted. The URL must be checked before it is processed. Only data from selected hosts should be loaded.
- The API should be documented with SpringDoc, OpenAPI and Swagger.
- Test coverage should be improved. Integration tests are to be implemented for the controller calls and the processing of Kafka events.
- A load test needs to be written to test how the system performs when many requests have to be processed simultaneously.
- Other check handlers can be added that consume Kafka events and check, for example, whether an IBAN actually exists.
- Exception handling should be improved if an invalid request body is sent to the POST /check/files endpoint.
- Exception handling when executing file checks should be improved if errors occur during execution.
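The URL check mentioned in the list above could start from a host allow-list. The following sketch is one possible approach, not something the project currently implements. The class name PdfUrlValidatorSketch and the rule set (only http or https, host must be in a configured set) are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical validator; the project does not contain this class.
class PdfUrlValidatorSketch {

    private final Set<String> allowedHosts; // lower-case host names that may be loaded from

    PdfUrlValidatorSketch(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isAllowed(String url) {
        if (url == null) {
            return false;
        }
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false; // not a syntactically valid URI
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false; // relative URIs and URIs without a host (e.g. file:) are rejected
        }
        boolean httpScheme = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
        return httpScheme && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

An allow-list like this only inspects the URL that was submitted. It does not cover redirects or what the host name later resolves to, so it would be one layer of protection among others.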
With the articles in this section you can learn more about frameworks and systems that are used in this application.
Kafka
- Apache Kafka Quickstart
- Run Kafka Streams Demo Application
- Is a Key Required as Part of Sending Messages to Kafka?
- What should I use as the key for my Kafka message?
API Design
Spring Boot
- Docker Compose Support in Spring Boot 3.1
- Getting started with Spring Boot 3, Kafka over docker with docker-compose.yaml
- Building REST services with Spring
- Spring Boot With H2 Database
- Getting started with unit testing in spring boot
IBAN
- Register of countries using the IBAN standard
- IBAN Validation API V4 Documentation
- IBAN Validation and Calculation - openiban
- Global IBAN regex
Daniel Murygin - linkedin.com/in/murygin
Project Link: https://github.com/murygin/malware-scanner