
Malware Scanner

This proof of concept shows the use of Spring Boot, Kafka and iText together to execute long-running tasks asynchronously with a REST API. The implemented service checks whether PDF files contain IBANs that are suspected of being used for money laundering. The implementation makes it possible to add other checks for the PDF files later.

Built With

Prerequisites

Kafka is started with Docker Compose through Spring Boot's Docker Compose support, so a working Docker installation is required to start the project. Java 21 and Maven are also required.

Build & Run

  1. Clone the repo
    git clone https://github.com/murygin/malware-scanner.git
  2. Compile
    ./mvnw clean compile
  3. Run
    ./mvnw spring-boot:run

IBAN Blacklist

The blacklist of suspicious IBANs is configured in the property iban.blacklist in the file src/main/resources/application.properties. The IBANs are separated by commas.

iban.blacklist=BG18RZBB91550123456789,FO9264600123456789,GB33BUKB20201555555555
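Spring can bind such a comma-separated property directly. The parsing it implies is sketched below in plain Java so that it runs without Spring. IbanBlacklist is a hypothetical helper, not a class from the project, and the removal of spaces and upper-casing of entries is an assumption that goes beyond the configuration format described above:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the project: parses the iban.blacklist property.
final class IbanBlacklist {

    private IbanBlacklist() {}

    // Splits the comma-separated value. Spaces inside an entry are removed and letters
    // upper-cased (an assumption) so that "de89 3704 ..." and "DE893704..." compare equal.
    static Set<String> parse(String value) {
        if (value == null || value.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(entry -> entry.replace(" ", "").toUpperCase(Locale.ROOT))
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Reads iban.blacklist from the text of a .properties file.
    static Set<String> load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parse(props.getProperty("iban.blacklist"));
    }
}
```

A Set is used because the blacklist is only ever queried for membership, and duplicate entries in the property should not matter.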

Usage

The API provides one endpoint that starts the check of a PDF file and one that loads the result. If the service is started with ./mvnw spring-boot:run, the base URL is http://localhost:8080.

POST /check/files

Starts the check of a PDF file. Files are checked asynchronously, so the response does not contain the result. Instead, it confirms that the check was created and contains the ID of the check. The Location response header contains the URL from which the result can be loaded.

Request:

{
  "url": "http://localhost:9090/pdf-with-iban.pdf",
  "file-type": "pdf"
}

Response:

  • Status: 202 Accepted
  • Header: Location: /check/files/b3a5896f-387b-4363-a631-cfbf467db1ce
{
   "state": "CREATED",
   "results": [],
   "id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
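A minimal client for this endpoint can be written with the JDK's java.net.http.HttpClient. This is a sketch under assumptions: CheckClient is a hypothetical name, the JSON body is built by hand only to avoid a JSON library (it assumes the PDF URL needs no JSON escaping), and the base URL assumes a local start with ./mvnw spring-boot:run:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Hypothetical client sketch for POST /check/files.
final class CheckClient {

    private CheckClient() {}

    // Builds the request body shown above (no JSON escaping is performed).
    static String requestBody(String pdfUrl) {
        return "{\"url\": \"" + pdfUrl + "\", \"file-type\": \"pdf\"}";
    }

    static HttpRequest createCheckRequest(URI baseUri, String pdfUrl) {
        return HttpRequest.newBuilder(baseUri.resolve("/check/files"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(pdfUrl)))
                .build();
    }

    // Sends the request; on 202 Accepted returns the absolute result URL built from the Location header.
    static Optional<URI> startCheck(HttpClient client, URI baseUri, String pdfUrl)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(createCheckRequest(baseUri, pdfUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 202) {
            return Optional.empty();
        }
        return response.headers().firstValue("Location").map(baseUri::resolve);
    }
}
```

Usage against a locally running service: `CheckClient.startCheck(HttpClient.newHttpClient(), URI.create("http://localhost:8080"), "http://localhost:9090/pdf-with-iban.pdf")`. Resolving the relative Location header against the base URL yields the URL for the GET request.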

GET /check/files/<UUID>

Loads the result of checking a PDF file. Because files are checked asynchronously, the response reflects the current state of the check: CREATED if the check has not started yet, RUNNING while the check is being executed, and FINISHED together with the results once the check is completed.
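The three states form a simple linear lifecycle. The enum below is a hypothetical model of it (the project's own enum may be named and structured differently), assuming no transitions other than those described above:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the check lifecycle: CREATED -> RUNNING -> FINISHED.
enum CheckState {
    CREATED, RUNNING, FINISHED;

    // States that may directly follow this one.
    Set<CheckState> successors() {
        return switch (this) {
            case CREATED -> EnumSet.of(RUNNING);
            case RUNNING -> EnumSet.of(FINISHED);
            case FINISHED -> EnumSet.noneOf(CheckState.class);
        };
    }

    boolean canMoveTo(CheckState next) {
        return successors().contains(next);
    }

    // Only a finished check carries results.
    boolean hasResults() {
        return this == FINISHED;
    }
}
```

Making FINISHED a terminal state means a client can stop polling as soon as it sees it.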

Response:

  • Status: 200 OK
{
   "state": "FINISHED",
   "results": [
      {
         "state": "SUSPICIOUS",
         "name": "money-laundering",
         "details": "Unique IBANs: 111, suspicious IBANs: 2"
      }
   ],
   "id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
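Since the result is only available once the state is FINISHED, a client has to poll this endpoint. A minimal polling sketch with exponential backoff: ResultPoller is hypothetical, and the fetchState supplier is assumed to wrap GET /check/files/<UUID> and extract the "state" field from the JSON response:

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical polling helper for GET /check/files/<UUID>.
final class ResultPoller {

    private ResultPoller() {}

    // Calls fetchState until it returns "FINISHED" or maxAttempts calls have been made,
    // doubling the delay between calls. Returns the last state seen.
    static String pollUntilFinished(Supplier<String> fetchState, int maxAttempts, Duration initialDelay) {
        Duration delay = initialDelay;
        String state = fetchState.get();
        for (int attempt = 1; attempt < maxAttempts && !"FINISHED".equals(state); attempt++) {
            try {
                Thread.sleep(delay.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return state;
            }
            delay = delay.multipliedBy(2);
            state = fetchState.get();
        }
        return state;
    }
}
```

Backoff keeps the load on the service low for long-running checks; the caller should treat a non-FINISHED return value as a timeout.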

How it works

The REST endpoint POST /check/files triggers the check of a PDF file. The endpoints are implemented in the Spring Boot REST controller o.d.m.rest.MalwareScannerController; a call to POST /check/files executes its create method. The controller is only a facade and delegates all calls to the o.d.m.service.CheckJobService.

When a new check is requested, the controller calls the createCheckJob method of the CheckJobService. This method does not start the check directly. It creates an o.d.m.model.CheckJob with the status CREATED, saves it in the database and then sends an o.d.m.model.CheckEvent to the event streaming platform Kafka. The check itself is only triggered by this Kafka event. As a result, the caller of the REST endpoint is not blocked while the check runs but receives a response immediately.
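The create-then-publish flow of createCheckJob can be sketched without Spring or Kafka. In this hypothetical model, a ConcurrentHashMap stands in for the database and a BlockingQueue for the Kafka topic; the class, record and field names are illustrative, not the project's:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, dependency-free model of CheckJobService.createCheckJob.
final class CheckJobServiceSketch {

    enum State { CREATED, RUNNING, FINISHED }

    record CheckJob(UUID id, String url, State state) {}

    record CheckEvent(UUID jobId, String url) {}

    final Map<UUID, CheckJob> repository = new ConcurrentHashMap<>(); // stands in for the database
    final BlockingQueue<CheckEvent> topic = new LinkedBlockingQueue<>(); // stands in for the Kafka topic

    // Persists a job in state CREATED, publishes an event and returns at once;
    // the check itself only runs when a consumer picks up the event.
    CheckJob createCheckJob(String url) {
        CheckJob job = new CheckJob(UUID.randomUUID(), url, State.CREATED);
        repository.put(job.id(), job);
        topic.add(new CheckEvent(job.id(), url));
        return job;
    }
}
```

The job is saved before the event is published so that a fast consumer always finds it. In a real Spring Kafka setup, the two steps are not atomic without further measures such as a transactional outbox.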

The CheckEvents are consumed by the o.d.m.kafka.KafkaTopicListener. After receiving an event, the KafkaTopicListener sets the status of the CheckJob to RUNNING and starts the check by calling the checkPDFFile method of the o.d.m.service.IBANCheckService.
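The listener's two responsibilities, marking a job RUNNING before the check and FINISHED when the result arrives, can be modeled in plain Java. This is a hypothetical sketch: in a Spring Kafka application the handler methods would typically be annotated with @KafkaListener, and the Function stands in for IBANCheckService.checkPDFFile:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical model of KafkaTopicListener without Kafka.
final class CheckEventListenerSketch {

    enum State { CREATED, RUNNING, FINISHED }

    record CheckEvent(UUID jobId, String url) {}

    final Map<UUID, State> jobStates = new ConcurrentHashMap<>();
    final Map<UUID, String> results = new ConcurrentHashMap<>();
    private final Function<String, String> checkPdfFile; // stands in for IBANCheckService.checkPDFFile

    CheckEventListenerSketch(Function<String, String> checkPdfFile) {
        this.checkPdfFile = checkPdfFile;
    }

    // Handles a CheckEvent: mark the job RUNNING, then run the check.
    String onCheckEvent(CheckEvent event) {
        jobStates.put(event.jobId(), State.RUNNING);
        return checkPdfFile.apply(event.url());
    }

    // Handles a CheckResultEvent: store the result and mark the job FINISHED.
    void onCheckResultEvent(UUID jobId, String details) {
        results.put(jobId, details);
        jobStates.put(jobId, State.FINISHED);
    }
}
```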

The checkPDFFile method of the IBANCheckService first loads the PDF file from the URL and finds all IBANs in it. To do so, it creates an instance of o.d.m.service.IBANFinder; after its run method has been called, the IBANFinder holds all IBANs found in a Set. The iText library is used to process the PDF files. The IBANCheckService then checks whether any of the IBANs found are on the blacklist of IBANs suspected of being used for money laundering.

After the check is completed, the IBANCheckService sends an o.d.m.model.CheckResultEvent to Kafka. The CheckResultEvent is also consumed by the KafkaTopicListener, which takes the result of the check from the event, saves it in the CheckJob and sets the status of the job to FINISHED. The client can now load the result via the REST endpoint GET /check/files/<UUID>.
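The find-and-compare step can be illustrated on text that has already been extracted from the PDF (for example with iText's PdfTextExtractor). IbanScan is a hypothetical stand-in, not the project's IBANFinder: it swaps in a simple regular expression (country code, two check digits, 11 to 30 alphanumerics, no spaces) and does not validate IBAN checksums. The details string mirrors the format in the example response above:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-based stand-in for IBANFinder plus the blacklist comparison.
final class IbanScan {

    // Country code, two check digits, then 11 to 30 alphanumeric characters.
    private static final Pattern IBAN = Pattern.compile("\\b[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}\\b");

    record Result(Set<String> uniqueIbans, Set<String> suspiciousIbans) {
        boolean suspicious() {
            return !suspiciousIbans.isEmpty();
        }

        String details() {
            return "Unique IBANs: " + uniqueIbans.size() + ", suspicious IBANs: " + suspiciousIbans.size();
        }
    }

    // Collects matches in a Set, so an IBAN that occurs several times is counted once.
    static Set<String> findIbans(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = IBAN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Intersects the IBANs found with the blacklist.
    static Result check(String text, Set<String> blacklist) {
        Set<String> unique = findIbans(text);
        Set<String> suspicious = new LinkedHashSet<>(unique);
        suspicious.retainAll(blacklist);
        return new Result(unique, suspicious);
    }
}
```

IBANs printed in groups of four ("DE89 3704 ...") would not be matched by this pattern. A production finder would need to handle that case.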

Improvements & Enhancements

  • The service should only be usable by authenticated clients.
  • Loading arbitrary external resources at runtime is a major security risk. The PDF URL sent by the client must not be trusted and must be validated before it is processed; only data from selected hosts should be loaded.
  • The API should be documented with OpenAPI and Swagger UI, for example using springdoc-openapi.
  • Test coverage should be improved. Integration tests should be written for the controller calls and the processing of Kafka events.
  • A load test should be written to measure how the system performs when many requests have to be processed simultaneously.
  • Other check handlers can be added that consume Kafka events and check, for example, whether an IBAN actually exists.
  • Exception handling should be improved for invalid request bodies sent to the POST /check/files endpoint.
  • Exception handling should be improved for errors that occur while file checks are executed.
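The host restriction from the list above could start with a syntactic allowlist check on the "url" field. PdfUrlPolicy is a hypothetical sketch that accepts only http/https URLs without user info whose host exactly matches an allowed host. It does not cover redirects or DNS rebinding, which would need separate handling in the HTTP client:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical allowlist check for the "url" field of POST /check/files.
final class PdfUrlPolicy {

    private final Set<String> allowedHosts; // lower-case host names, matched exactly

    PdfUrlPolicy(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isAllowed(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            // Reject URLs without scheme or host, and "user@host" tricks.
            if (scheme == null || host == null || uri.getUserInfo() != null) {
                return false;
            }
            boolean httpScheme = scheme.equalsIgnoreCase("https") || scheme.equalsIgnoreCase("http");
            return httpScheme && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // malformed URLs are never loaded
        }
    }
}
```

Exact host matching is deliberately strict. Allowing subdomains would require a suffix check that respects label boundaries.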


Contact

Daniel Murygin - linkedin.com/in/murygin - [email protected]

Project Link: https://github.com/murygin/malware-scanner