a FastAPI REST service that provides somewhat bespoke inference services on medical data
`pfms` is a FastAPI application that provides REST services for segmentation of input image data, in particular spleen volumetric data. Simplistically, it can be thought of as a so-called Model Server, although the technical semantics of that statement are not strictly correct. Several API endpoints are provided, suited to consumption by software clients. A typical life-cycle involves uploading a very specific neural network weights file, which is used to initialize the inference engine. Thereafter, NIfTI volumes can be uploaded to an inference API route, which returns a segmented NIfTI volume.
Conventional MLOps uses specialized terminology, such as "model server" and "inference endpoint", within the context of specialized image processing. Generally, a "model server" is a server that can accept an image, run some pre-trained model "inference" (almost always to perform image segmentation), and return the results. To remain general, MLOps servers typically communicate data as JSON representations.
The basic idea is simple: a client communicates with some remote server, using HTTP POST requests to send image data to a specific API endpoint associated with a specific set of operations. The server performs some operation on this data and returns processed image data in response.
Broadly speaking, `pfms` provides this exact behavior. However, it is uniquely tailored to providing services within the context of the `pl-monai_spleenseg` ChRIS plugin. Indeed, `pfms` uses this exact plugin as an internal module to perform the same segmentation. Moreover, unlike more conventional MLOps "model servers", `pfms` accepts NIfTI volumes as input and returns NIfTI volumes as results. This is considerably more efficient than a JSON serialization and deserialization of payload data to encode an image.
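In practice, the whole exchange can be pictured as a single HTTP round trip. The sketch below is purely illustrative -- the host and route are placeholders (the actual `pfms` routes are described later in this document):

```shell
# POST an image to a (hypothetical) inference endpoint and
# save the processed image that is streamed back
curl -X POST -F "file=@input.nii.gz" \
     "http://some.model.server:2024/api/v1/infer" \
     -o output.nii.gz
```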
To build a local version, clone this repo and then:

```shell
UID=$(id -u)      # for bash/zsh
set UID (id -u)   # for fish shell
docker build --build-arg UID=$UID -t local/pfms .
```
To use the version available on dockerhub (note: this might not be available at time of reading):

```shell
docker pull fnndsc/pfms
```
To start the services:

```shell
SESSIONUSER=localuser
docker run --gpus all --privileged \
    --env SESSIONUSER=$SESSIONUSER \
    --name pfms --rm -it -d \
    -p 2024:2024 \
    local/pfms /start-reload.sh
```
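Once the container is up, you can sanity-check it from the host. Since `pfms` is a FastAPI application, it should serve the auto-generated interactive API documentation (assuming the FastAPI default `/docs` route has not been disabled):

```shell
# Confirm the server is answering on port 2024
curl -sf http://localhost:2024/docs > /dev/null && echo "pfms is up"
```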
To start with source code debugging and live refreshing:

```shell
SESSIONUSER=localuser
docker run --gpus all --privileged \
    --env SESSIONUSER=$SESSIONUSER \
    --name pfms --rm -it -d \
    -p 2024:2024 \
    -v $PWD/pfms:/app:ro \
    local/pfms /start-reload.sh
```
(Note: if you pulled from dockerhub, use `fnndsc/pfms` instead of `local/pfms`.)
`pfms` can host/provide multiple "models" -- a model is understood here to be simply a pre-trained weights file in `pth` format, as generated by `pl-monai_spleenseg` during a training phase. This `pth` file can be uploaded to `pfms` by POSTing the file to this endpoint:

```
POST :2024/api/v1/spleenseg/modelpth/?modelID=<modelID>
```

Note that the URL query parameter `modelID` is used to "name" this model.
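For example, with `curl` (a sketch, not canonical usage: the weights file `model.pth` and the model name `spleenv1` are illustrative, and the multipart field name `file` is an assumption about the server's `FileUpload` parameter):

```shell
# Upload a pth weights file and register it under modelID=spleenv1
curl -X POST \
     -F "file=@model.pth" \
     "http://localhost:2024/api/v1/spleenseg/modelpth/?modelID=spleenv1"
```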
To run segmentation on a volume using a given model, POST the NIfTI volume to the inference endpoint, naming the model to use as a query parameter:

```
POST :2024/api/v1/spleenseg/NIfTIinference/?modelID=<modelID>
```
Here, a NIfTI volume is passed as a `FileUpload` request. The `pfms` instance will save/unpack this file within itself, and then run the `pl-monai_spleenseg` inference mode, using as model weights the `pth` file associated with `<modelID>`. The resultant NIfTI file, stored within the server, is then read and streamed back to the caller, which will typically save this file to disk or do further processing.

Note that this call blocks until processing is complete! Processing (depending on network speed, etc.) typically takes less than 30 seconds.
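Again as a `curl` sketch (same assumptions as above: the multipart field name `file` is assumed, and the filenames are illustrative):

```shell
# Segment a NIfTI volume using the model registered as spleenv1;
# this blocks until the segmented volume is streamed back
curl -X POST \
     -F "file=@spleen.nii.gz" \
     "http://localhost:2024/api/v1/spleenseg/NIfTIinference/?modelID=spleenv1" \
     -o spleen_seg.nii.gz
```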
To get a list of available models, simply GET the `modelpth` endpoint:

```
GET :2024/api/v1/spleenseg/modelpth
```
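For instance:

```shell
# List the modelIDs currently known to the server
curl -s "http://localhost:2024/api/v1/spleenseg/modelpth"
```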
-30-