
PrankWeb Architecture

unknown edited this page Oct 29, 2020 · 7 revisions

The architecture of p2rank online differs from that of the original version.

The idea behind the design is to have several components with small responsibilities rather than one big component. This should allow us to replace any component easily. It also lowers the entry barrier for new developers, as it is easier to understand a single component than a whole system. The components fall into two main groups: web-api-gateway and web-runtime.

web-api-gateway

The web-api-gateway is responsible for handling incoming HTTP requests and serving all web application resources. In development mode, the frontend can cover all of this functionality. In production, however, we use Nginx, for the following reasons:

  • Node.js is not optimized to serve static resources.
  • We use Nginx to provide HTTPS.
  • Nginx is more suitable than Node.js as an API gateway (passing requests).

API gateway

Serves the frontend and proxies requests to web-runtime.
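
The routing decision made by the gateway can be illustrated with a small Python sketch. This is not the Nginx configuration used by PrankWeb; the "/api/" prefix, the upstream address, and the names Route and route_request are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical values: neither the "/api/" prefix nor the upstream
# address is taken from the PrankWeb configuration.
API_PREFIX = "/api/"
WEB_RUNTIME_URL = "http://web-runtime:8020"


@dataclass(frozen=True)
class Route:
    action: str  # "proxy" or "static"
    target: str  # upstream URL, or path of a frontend file


def route_request(path: str) -> Route:
    """Decide whether a request is proxied or answered with a frontend file."""
    if path.startswith(API_PREFIX):
        # API calls are passed through to web-runtime unchanged.
        return Route("proxy", WEB_RUNTIME_URL + path)
    if path.endswith("/"):
        # Directory-style paths map to the index page of the frontend.
        return Route("static", path + "index.html")
    return Route("static", path)
```

For example, route_request("/api/v1/task") yields a proxy route to web-runtime, while route_request("/css/site.css") yields a static route. In production this decision is made by Nginx rather than by application code.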

frontend

Frontend static files like images, CSS, HTML, JavaScript, etc.

web-runtime

The web-runtime is responsible for executing p2rank and providing the result files.

task-runner

An external component responsible for accepting HTTP requests, executing tasks, and serving output data.
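
The three responsibilities (accept a request, execute the task, serve the output) can be modelled with a minimal in-memory sketch. It leaves out the HTTP layer entirely and does not reflect the API of the actual task-runner; the names TaskRunner, Status, submit, run, and output are hypothetical.

```python
import enum
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class Status(enum.Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESSFUL = "successful"
    FAILED = "failed"


@dataclass
class Task:
    identifier: str
    configuration: dict
    status: Status = Status.QUEUED
    output: Optional[dict] = None


class TaskRunner:
    """In-memory stand-in: accepts tasks, executes them, serves their output."""

    def __init__(self, execute: Callable[[dict], dict]):
        # 'execute' plays the role of the execution script.
        self._execute = execute
        self._tasks: Dict[str, Task] = {}

    def submit(self, configuration: dict) -> str:
        """Register a new task and return its identifier."""
        identifier = uuid.uuid4().hex
        self._tasks[identifier] = Task(identifier, configuration)
        return identifier

    def run(self, identifier: str) -> None:
        """Execute a task and record whether it succeeded."""
        task = self._tasks[identifier]
        task.status = Status.RUNNING
        try:
            task.output = self._execute(task.configuration)
            task.status = Status.SUCCESSFUL
        except Exception:
            task.status = Status.FAILED

    def status(self, identifier: str) -> Status:
        return self._tasks[identifier].status

    def output(self, identifier: str) -> Optional[dict]:
        """Return the output of a finished task, or None if there is none."""
        return self._tasks[identifier].output
```

A client would call submit, poll status until the task is SUCCESSFUL or FAILED, and then fetch the result with output.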

execution script

A single-file component responsible for running a task. The task is defined by a configuration and any user-provided files. The script is responsible for downloading any necessary content, selecting a chain, running the conservation pipeline, and running p2rank itself.
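
The order of the steps can be sketched as a list of functions applied to a shared context. The sketch only records what each step would do; the configuration keys ("structure_file", "code", "chains", "conservation") and all function names are assumptions, not the keys or names used by the real script.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TaskContext:
    configuration: dict
    structure_file: Optional[str] = None
    log: List[str] = field(default_factory=list)


def prepare_structure(context: TaskContext) -> None:
    # Prefer a user-provided file; otherwise the structure must be downloaded.
    uploaded = context.configuration.get("structure_file")
    if uploaded is not None:
        context.structure_file = uploaded
        context.log.append("use uploaded structure")
    else:
        code = context.configuration["code"]
        context.structure_file = f"{code}.pdb"
        context.log.append(f"download {code}")


def select_chain(context: TaskContext) -> None:
    chains = context.configuration.get("chains")
    if chains:
        context.log.append("select chains " + ",".join(chains))


def run_conservation(context: TaskContext) -> None:
    # Assumption: conservation is switched on by a configuration flag.
    if context.configuration.get("conservation", False):
        context.log.append("run conservation")


def run_p2rank(context: TaskContext) -> None:
    context.log.append(f"run p2rank on {context.structure_file}")


# Steps run in the order given in the description of the execution script.
STEPS: List[Callable[[TaskContext], None]] = [
    prepare_structure,
    select_chain,
    run_conservation,
    run_p2rank,
]


def execute_task(configuration: dict) -> TaskContext:
    """Run every step in order and return the resulting context."""
    context = TaskContext(configuration)
    for step in STEPS:
        step(context)
    return context
```

Keeping the steps in a plain list makes the order explicit and lets a single step be replaced or skipped without touching the others.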

protein-utils

Handles manipulation of protein files and p2rank results. p2rank writes its results to CSV files, which are hard to use from JavaScript, so this component converts them. It can also extract a sequence from a PDB file so that the sequence can be used by the conservation pipeline. In general, this component is a good candidate for extraction from this repository as a general tool for protein manipulation.
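
Both conversions can be sketched in a few lines of Python. This is an illustration rather than the code of protein-utils: the function names are hypothetical, the CSV conversion assumes a header row whose cells may be padded with spaces, and the sequence extraction reads only ATOM records using the fixed column positions of the PDB format.

```python
import csv
import io
import json

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def predictions_csv_to_json(text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.reader(io.StringIO(text))
    # Assumption: header names and values may carry padding whitespace.
    header = [name.strip() for name in next(reader)]
    rows = [
        dict(zip(header, (value.strip() for value in row)))
        for row in reader
        if row  # skip blank lines
    ]
    return json.dumps(rows)


def extract_sequence(pdb_text: str, chain: str) -> str:
    """Return the one-letter sequence of one chain, based on ATOM records."""
    sequence = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain:  # column 22: chain identifier
            continue
        residue_key = line[22:27]  # columns 23-27: number + insertion code
        if residue_key in seen:
            continue  # another atom of a residue already recorded
        seen.add(residue_key)
        # Columns 18-20 hold the residue name; unknown residues become "X".
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)
```

All values stay strings after the conversion; a real converter would likely cast numeric columns before handing the data to the frontend.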

conservation pipeline

Pipeline responsible for running the conservation analysis. It consists of a script that calls multiple tools. The pipeline is mostly copied from another repository.

p2rank

The p2rank itself.