Uploaded readme and minor changes
Tomáš Houfek committed Dec 16, 2024
1 parent ff47188 commit df20db5
Showing 6 changed files with 153 additions and 7 deletions.
14 changes: 14 additions & 0 deletions Dockerfile.prod
@@ -0,0 +1,14 @@
FROM bitnami/python:3.10

RUN mkdir /uploader

WORKDIR /uploader

ADD requirements.txt .
ADD main.py .
ADD uploader/ uploader/
ADD tests/ tests/

RUN pip install -r requirements.txt

USER 1005
86 changes: 86 additions & 0 deletions README.md
@@ -2,4 +2,90 @@

This repository is the third part of the [FAIRification pipeline](https://github.com/BBMRI-cz/NGS-data-FAIRification) and is responsible for uploading metadata to the [data.bbmri.cz](https://data.bbmri.cz/) catalogue.

## Supported sequencing types
Miseq, New Miseq, MammaPrint

## How to run the scripts
### Locally - Development
#### Using main.py
1. Install requirements
```bash
pip install -r requirements.txt
```
2. Run main.py
```bash
python main.py -r /path/to/pseudonymized/runs/folder -o /path/to/root/organisation/folder -p /path/to/patients/folder
```
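The commit does not show `main.py` itself, so as a rough illustration only: the `-r`/`-o`/`-p` flags above could be wired up with `argparse` along these lines. The long option names, `dest` attributes, and help strings are assumptions, not the project's actual code.

```python
# Hypothetical sketch of main.py's CLI; flag letters come from the README,
# everything else (long names, help text) is assumed for illustration.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Upload run metadata to the catalogue")
    parser.add_argument("-r", "--runs", required=True,
                        help="path to the pseudonymized runs folder")
    parser.add_argument("-o", "--organisation", required=True,
                        help="path to the root organisation folder")
    parser.add_argument("-p", "--patients", required=True,
                        help="path to the patients folder")
    return parser

# Parse an explicit argument list (as the README invocation would supply).
args = build_parser().parse_args(["-r", "/runs", "-o", "/org", "-p", "/patients"])
print(args.runs, args.organisation, args.patients)  # /runs /org /patients
```

With `argparse`, the attribute names (`args.runs` etc.) are derived from the long option names, so they would change if the real script uses different `--long` flags.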
#### Using docker-compose
```bash
docker-compose -f compose.yml up -d --build
```
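The compose file itself is not part of this commit, so the following is only a hedged sketch of what a `compose.yml` for this service could look like; the service name, build context, and mounted paths are assumptions, not the repository's actual file.

```yaml
# Hypothetical compose.yml sketch -- service name, build context and
# volume paths are assumptions for illustration only.
services:
  uploader:
    build: .
    volumes:
      - ./runs:/data/runs:ro
      - ./organisation:/data/organisation
      - ./patients:/data/patients
```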
### In production
Production runs on the SensitiveCloud Kubernetes cluster.
#### Using kubernetes (kubectl)
Deploy the dependent secrets:
```bash
kubectl apply -f kubernetes/catalog-secret.yaml -n bbmri-mou-ns
```
```bash
kubectl apply -f kubernetes/uploader-job.yaml -n bbmri-mou-ns
```
#### Deploying a new version in production
Build and push a new Docker image:
```bash
docker build --no-cache -t <public-dockerhub-repository>/data-catalogue-uploader:<version> .
docker push <public-dockerhub-repository>/data-catalogue-uploader:<version>
# then change the version in kubernetes/uploader-job.yaml
```
#### Debugging
You can open the Kubernetes UI [Rancher](https://rancher.cloud.trusted.e-infra.cz/), find the failing pod, and inspect its logs.
For how to use Rancher and SensitiveCloud, see the [Docs](https://docs.cerit.io/en/platform/overview).

Another option is to run a testing job and investigate from inside the cluster file manager (e.g. to check user permissions):
```bash
kubectl apply -f kubernetes/testing-job.yaml -n bbmri-mou-ns
```
Then connect to the terminal of this job's pod in [Rancher](https://rancher.cloud.trusted.e-infra.cz/).
File renamed without changes.
45 changes: 45 additions & 0 deletions kubernetes/testing-job.yaml
@@ -0,0 +1,45 @@
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-running
spec:
  securityContext:
    fsGroup: 1000
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    runAsUser: 1005
    runAsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: ubuntu-running
      image: ubuntu:latest
      resources:
        limits:
          memory: 512Mi
          cpu: "1"
        requests:
          memory: 256Mi
          cpu: "0.2"
      imagePullPolicy: Always
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 30; done;"]
      volumeMounts:
        - name: storage-catalogue-volume
          mountPath: /data/
        - name: data-wsi-volume
          mountPath: /wsi/
          subPath: tiff
  restartPolicy: "Never"
  volumes:
    - name: storage-catalogue-volume
      persistentVolumeClaim:
        claimName: pvc-storage-catalogue-secret
    - name: data-wsi-volume
      persistentVolumeClaim:
        claimName: pvc-osd-secret
4 changes: 2 additions & 2 deletions kubernetes/uploader-job.yaml
@@ -1,7 +1,7 @@
 apiVersion: batch/v1
 kind: CronJob
 metadata:
-  name: organise-sequencing
+  name: upload-sequencing
 spec:
   schedule: "0 22 * * 0,3,5"
   jobTemplate:
@@ -18,7 +18,7 @@ spec:
             fsGroupChangePolicy: "OnRootMismatch"
           containers:
             - name: organise-sequencing
-              image: tomashoufek/data-catalogue-uploader:1.0.0
+              image: tomashoufek/data-catalogue-uploader:1.0.1
               imagePullPolicy: Always
               securityContext:
                 allowPrivilegeEscalation: false
11 changes: 6 additions & 5 deletions uploader/file_helpers.py
@@ -10,10 +10,11 @@ def get_all_runs_with_data_for_catalogue(organised_folder: str, wanted_run_type:
             multiple_runs_path = os.path.join(organised_folder, year, run_type, "complete-runs")
         else:
             multiple_runs_path = os.path.join(organised_folder, year, run_type)
-        for run in os.listdir(multiple_runs_path):
-            run_catalog_info = os.path.join(multiple_runs_path, run, "catalog_info_per_pred_number")
-            already_uploaded = os.path.join(multiple_runs_path, run, ".uploaded")
-            if os.path.exists(run_catalog_info) and os.listdir(run_catalog_info) and not already_uploaded:
-                runs_to_precess_for_catalogue.append(os.path.join(multiple_runs_path, run))
+        if os.path.exists(multiple_runs_path):
+            for run in os.listdir(multiple_runs_path):
+                run_catalog_info = os.path.join(multiple_runs_path, run, "catalog_info_per_pred_number")
+                already_uploaded = os.path.join(multiple_runs_path, run, ".uploaded")
+                if os.path.exists(run_catalog_info) and os.listdir(run_catalog_info) and not already_uploaded:
+                    runs_to_precess_for_catalogue.append(os.path.join(multiple_runs_path, run))
 
     return runs_to_precess_for_catalogue
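The change above wraps the directory scan in an `os.path.exists` check so that a missing year/run-type folder no longer makes `os.listdir` raise `FileNotFoundError`. A minimal standalone sketch of the same guard pattern (the folder names are illustrative, not the pipeline's real layout):

```python
# Sketch of the guard pattern introduced by this commit: skip paths that
# do not exist instead of letting os.listdir() raise FileNotFoundError.
import os
import tempfile

def list_runs(multiple_runs_path: str) -> list[str]:
    """Return run folder paths, or [] when the parent folder is missing."""
    runs = []
    if os.path.exists(multiple_runs_path):  # the guard added by this commit
        for run in sorted(os.listdir(multiple_runs_path)):
            runs.append(os.path.join(multiple_runs_path, run))
    return runs

with tempfile.TemporaryDirectory() as root:
    existing = os.path.join(root, "2024", "miseq")
    os.makedirs(os.path.join(existing, "run-001"))
    print(len(list_runs(existing)))                       # 1
    print(list_runs(os.path.join(root, "1999", "none")))  # []
```

Without the guard, the first call for a year that has no folder of the wanted run type would abort the whole upload pass.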

0 comments on commit df20db5
