From 36a977f11c7dda429f75828714f70b28ac4b5e11 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Wed, 6 Sep 2023 19:36:06 +0200 Subject: [PATCH 01/11] Create dev-guide.md --- dev-guide.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 dev-guide.md diff --git a/dev-guide.md b/dev-guide.md new file mode 100644 index 000000000..83615f4dc --- /dev/null +++ b/dev-guide.md @@ -0,0 +1,13 @@ +# dev guide +This is a guide for Developers, starting to work on PROTzilla and will provide an overview on how to extend the application + +### workflow structure +To analyse protein data in PROTzilla a user will go through **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration". +Each section is divided into **steps** that can be added to the workflow. A step uses a **method** to achieve what it's supposed to do. +Also methods can have **parameters**. +All implemented methods are listed in `protzilla/constants/workflow_meta.json` following the tree structure `section > step > method > parameter`. + +### adding a new method +- add your method in `workflow_meta.json` within the corresponding step +- implement your method in `protzilla/
/.py` corresponding to section and step you added the method in `workflow_meta.json` +- link the implementation to `workflow_meta.json` in the `method_map` data structure located in `protzilla/constants/location_mapping.py` From 06ce5de7eaacaa3000af381cd9a08ce5e6455f96 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Wed, 6 Sep 2023 20:59:12 +0200 Subject: [PATCH 02/11] add user_data to dev-guide.md --- dev-guide.md | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/dev-guide.md b/dev-guide.md index 83615f4dc..2a3ff59ec 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -1,6 +1,11 @@ + # dev guide This is a guide for Developers, starting to work on PROTzilla and will provide an overview on how to extend the application +### TODO project structure +[...] + + ### workflow structure To analyse protein data in PROTzilla a user will go through **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration". Each section is divided into **steps** that can be added to the workflow. A step uses a **method** to achieve what it's supposed to do. @@ -10,4 +15,25 @@ All implemented methods are listed in `protzilla/constants/workflow_meta.json` f ### adding a new method - add your method in `workflow_meta.json` within the corresponding step - implement your method in `protzilla/
/.py` corresponding to section and step you added the method in `workflow_meta.json` -- link the implementation to `workflow_meta.json` in the `method_map` data structure located in `protzilla/constants/location_mapping.py` + - e.g. look at `protzilla/data_preprocessing/imputation.py` +- preprocessing methods also need a plot generation implementation. +- link the implementation to `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` +- TODO: is that everything? + +### what is the user_data directory? +The directory `user_data/workflows` contains workflow templates, that can be shared among users for reproducibility. +This workflows store the order of steps also their corresponding selected methods and parameters. +If no default methods or parameters are selected, default values from `workflow_meta.json` will be used. + +When a user creates a new run in PROTzilla a new folder in `user_data/runs` will be generated. +The user is asked to select a workflow template. +A copy of that workflow is copied into the run folder with the name `workflow.json`. +It shaddows the state of `run.workflow_config` in `protzilla/run.py`. + + +> [!WARNING] +> ⚠️ never delete the `hello123` run as for some reason the tests on github will fail then. We tried to identify the problem but our best solution was to just leave the run + +### why are there #TODOs with a number in the code? +Todos are small issues or suggestions listed in [the issues list with label "todo"](https://github.com/cschlaffner/PROTzilla2/issues?q=is%3Aissue+is%3Aopen+label%3Atodo). +The number corresponds to the issue id. 
From ce4442d0d2baf545269a02f15f71e83630081cf8 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Wed, 6 Sep 2023 22:02:20 +0200 Subject: [PATCH 03/11] improve readability of dev-guide.md --- dev-guide.md | 41 +++++++++++++++++++++++------------------ 1 file changed, 23 insertions(+), 18 deletions(-) diff --git a/dev-guide.md b/dev-guide.md index 2a3ff59ec..78bcb47f7 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -1,39 +1,44 @@ # dev guide -This is a guide for Developers, starting to work on PROTzilla and will provide an overview on how to extend the application +This is a guide for developers starting to work on PROTzilla and will provide an overview of how to extend the application. -### TODO project structure +### project structure [...] - ### workflow structure -To analyse protein data in PROTzilla a user will go through **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration". +To analyze protein data in PROTzilla a user will go through **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration". Each section is divided into **steps** that can be added to the workflow. A step uses a **method** to achieve what it's supposed to do. -Also methods can have **parameters**. +Also, methods can have **parameters**. All implemented methods are listed in `protzilla/constants/workflow_meta.json` following the tree structure `section > step > method > parameter`. +> [!NOTE] +> remember the hierarchy `section > step > method > parameter` + ### adding a new method -- add your method in `workflow_meta.json` within the corresponding step -- implement your method in `protzilla/
/.py` corresponding to section and step you added the method in `workflow_meta.json` +- add your method in `workflow_meta.json` within the corresponding step and also specify parameters and their defaults. +- implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there - e.g. look at `protzilla/data_preprocessing/imputation.py` - preprocessing methods also need a plot generation implementation. - link the implementation to `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` - TODO: is that everything? -### what is the user_data directory? -The directory `user_data/workflows` contains workflow templates, that can be shared among users for reproducibility. -This workflows store the order of steps also their corresponding selected methods and parameters. -If no default methods or parameters are selected, default values from `workflow_meta.json` will be used. +### what is the difference between run, workflow and history? +The directory `user_data/workflows` contains workflow templates. +They store the configuration of an analysis, which contains the order of steps and also their corresponding selected methods and parameters. +A workflow can be shared among users for reproducible results. + +When a user starts PROTzilla, they are asked to create a new run and select a workflow template. +A Run object from `protzilla/run.py` will be created, and a new folder with the name of the run will be generated in `user_data/runs`. +The selected workflow will be copied into the run directory and will be the `workflow_config` of this run. -When a user creates a new run in PROTzilla a new folder in `user_data/runs` will be generated. -The user is asked to select a workflow template. -A copy of that workflow is copied into the run folder with the name `workflow.json`. -It shaddows the state of `run.workflow_config` in `protzilla/run.py`. +The run will follow the steps listed in the `workflow_config` and selected methods and parameter values as default. 
+If no default methods or parameters are selected within `workflow_config`, default values from `workflow_meta.json` will be used. +The history contains the outputs of steps that have already been executed and is necessary for the back button. > [!WARNING] -> ⚠️ never delete the `hello123` run as for some reason the tests on github will fail then. We tried to identify the problem but our best solution was to just leave the run +> never delete the `hello123` run as for some reason the tests on github will fail then. We tried to identify the problem but our best solution was to just leave `hello123` where it was. ### why are there #TODOs with a number in the code? -Todos are small issues or suggestions listed in [the issues list with label "todo"](https://github.com/cschlaffner/PROTzilla2/issues?q=is%3Aissue+is%3Aopen+label%3Atodo). -The number corresponds to the issue id. +To-dos are small issues or suggestions listed in [the issues list with label "todo"](https://github.com/cschlaffner/PROTzilla2/issues?q=is%3Aissue+is%3Aopen+label%3Atodo). +The number corresponds to the issue ID. From bd5081b174d5d42305ee0ade4e4b1df9f9ac2a3b Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Sun, 10 Sep 2023 21:26:10 +0200 Subject: [PATCH 04/11] tutorial for adding step or methods --- dev-guide.md | 43 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 36 insertions(+), 7 deletions(-) diff --git a/dev-guide.md b/dev-guide.md index 78bcb47f7..314084919 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -1,3 +1,25 @@ +# tutorial + +### adding a new method/step +To provide an example, I will display the `knn` method within the step `imputation`. Feel free to have a look at the code by yourself. +1. Add your method in `protzilla/constants/workflow_meta.json` within the corresponding step and also specify parameters and their defaults. +

+ +2. Implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there.
+ Importing and preprocessing steps should return the dataframe, that gets handed to the next step and a dict with further results. Data analysis and integration steps just return a dict.
+ > [!NOTE] + > Do not forget to explain your method with python docstrings! + + in `protzilla/data_preprocessing/imputation.py`: +

+ +3. Preprocessing methods also need a plot generation implementation which returns a list of `plotly.graph_objects.Figure`: +

+4. Link the implementation to the entry in `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` +

+

+5. Write tests for your new method (experiment with TDD and write tests before implementing the method, it can save you some time:) ) + # dev guide This is a guide for developers starting to work on PROTzilla and will provide an overview of how to extend the application. @@ -14,13 +36,7 @@ All implemented methods are listed in `protzilla/constants/workflow_meta.json` f > [!NOTE] > remember the hierarchy `section > step > method > parameter` -### adding a new method -- add your method in `workflow_meta.json` within the corresponding step and also specify parameters and their defaults. -- implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there - - e.g. look at `protzilla/data_preprocessing/imputation.py` -- preprocessing methods also need a plot generation implementation. -- link the implementation to `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` -- TODO: is that everything? + ### what is the difference between run, workflow and history? The directory `user_data/workflows` contains workflow templates. @@ -42,3 +58,16 @@ The history contains the outputs of steps that have already been executed and is ### why are there #TODOs with a number in the code? To-dos are small issues or suggestions listed in [the issues list with label "todo"](https://github.com/cschlaffner/PROTzilla2/issues?q=is%3Aissue+is%3Aopen+label%3Atodo). The number corresponds to the issue ID. + +# tutorial + +### adding a new method/step +1. add your method in `workflow_meta.json` within the corresponding step and also specify parameters and their defaults. +2. +- ![image]( | width=100) + +- implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there + - e.g. look at `protzilla/data_preprocessing/imputation.py` +- preprocessing methods also need a plot generation implementation. +- link the implementation to `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` +- TODO: is that everything? From e554392c93b0376cdb5b94de17328ac500204a81 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Sun, 10 Sep 2023 21:48:04 +0200 Subject: [PATCH 05/11] dev-guide for runner --- dev-guide.md | 58 ++++++++++++++++++++++++---------------------------- 1 file changed, 27 insertions(+), 31 deletions(-) diff --git a/dev-guide.md b/dev-guide.md index 314084919..e5a53facf 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -1,25 +1,3 @@ -# tutorial - -### adding a new method/step -To provide an example, I will display the `knn` method within the step `imputation`. Feel free to have a look at the code by yourself. -1. Add your method in `protzilla/constants/workflow_meta.json` within the corresponding step and also specify parameters and their defaults. -

- -2. Implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there.
- Importing and preprocessing steps should return the dataframe, that gets handed to the next step and a dict with further results. Data analysis and integration steps just return a dict.
- > [!NOTE] - > Do not forget to explain your method with python docstrings! - - in `protzilla/data_preprocessing/imputation.py`: -

- -3. Preprocessing methods also need a plot generation implementation which returns a list of `plotly.graph_objects.Figure`: -

-4. Link the implementation to the entry in `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` -

-

-5. Write tests for your new method (experiment with TDD and write tests before implementing the method, it can save you some time:) ) - # dev guide This is a guide for developers starting to work on PROTzilla and will provide an overview of how to extend the application. @@ -27,6 +5,14 @@ This is a guide for developers starting to work on PROTzilla and will provide an ### project structure [...] +### the runner +PROTzilla can be used not only in the browser but also from the command line without a graphical user interface. The runner is a practical tool for this. +It can execute an entire workflow without further user input. In the future, the runner should also be executable from the browser. +This way, as soon as researchers created a workflow with the analysis they want to perform, they can get their results for multiple datasets with less effort. +PROTzilla is thus intended to be an independent package that also works independently of Django. +> [!NOTE] +> think about whether new features should be implemented in the `protzilla` or `ui` folder. + ### workflow structure To analyze protein data in PROTzilla a user will go through **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration". Each section is divided into **steps** that can be added to the workflow. A step uses a **method** to achieve what it's supposed to do. @@ -62,12 +48,22 @@ The number corresponds to the issue ID. # tutorial ### adding a new method/step -1. add your method in `workflow_meta.json` within the corresponding step and also specify parameters and their defaults. -2. -- ![image]( | width=100) - -- implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there - - e.g. look at `protzilla/data_preprocessing/imputation.py` -- preprocessing methods also need a plot generation implementation. -- link the implementation to `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` -- TODO: is that everything? +To provide an example, I will display the `knn` method within the step `imputation`. Feel free to have a look at the code by yourself. +1. Add your method in `protzilla/constants/workflow_meta.json` within the corresponding step and also specify parameters and their defaults. +

+ +2. Implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there.
+ Importing and preprocessing steps should return the dataframe, that gets handed to the next step and a dict with further results. Data analysis and integration steps just return a dict.
+ > [!NOTE] + > Do not forget to explain your method with python docstrings! + + in `protzilla/data_preprocessing/imputation.py`: +

+ +3. Preprocessing methods also need a plot generation implementation which returns a list of `plotly.graph_objects.Figure`: +

+4. Link the implementation to the entry in `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py` +

+

+5. Write tests for your new method (experiment with TDD and write tests before implementing the method, it can save you some time:) ) + From 6e608fe2dc07a3ecbbd25f23318a353b96fce531 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Sun, 10 Sep 2023 21:56:37 +0200 Subject: [PATCH 06/11] restructure dev-guide.md --- dev-guide.md | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/dev-guide.md b/dev-guide.md index e5a53facf..c36d74c03 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -5,13 +5,6 @@ This is a guide for developers starting to work on PROTzilla and will provide an ### project structure [...] -### the runner -PROTzilla can be used not only in the browser but also from the command line without a graphical user interface. The runner is a practical tool for this. -It can execute an entire workflow without further user input. In the future, the runner should also be executable from the browser. -This way, as soon as researchers created a workflow with the analysis they want to perform, they can get their results for multiple datasets with less effort. -PROTzilla is thus intended to be an independent package that also works independently of Django. -> [!NOTE] -> think about whether new features should be implemented in the `protzilla` or `ui` folder. ### workflow structure To analyze protein data in PROTzilla a user will go through **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration". @@ -22,21 +15,16 @@ All implemented methods are listed in `protzilla/constants/workflow_meta.json` f > [!NOTE] > remember the hierarchy `section > step > method > parameter` - - ### what is the difference between run, workflow and history? The directory `user_data/workflows` contains workflow templates. They store the configuration of an analysis, which contains the order of steps and also their corresponding selected methods and parameters. 
-A workflow can be shared among users for reproducible results. - +A workflow can be shared among users for reproducible results.
When a user starts PROTzilla, they are asked to create a new run and select a workflow template. A Run object from `protzilla/run.py` will be created, and a new folder with the name of the run will be generated in `user_data/runs`. -The selected workflow will be copied into the run directory and will be the `workflow_config` of this run. - +The selected workflow will be copied into the run directory and will be the `workflow_config` of this run.
The run will follow the steps listed in the `workflow_config` and selected methods and parameter values as default. -If no default methods or parameters are selected within `workflow_config`, default values from `workflow_meta.json` will be used. - -The history contains the outputs of steps that have already been executed and is necessary for the back button. +If no default methods or parameters are selected within `workflow_config`, default values from `workflow_meta.json` will be used.
+The history contains the outputs of steps that have already been executed. If the user chooses to return to the previous step and presses the back button, the previous step will be loaded from history. > [!WARNING] > never delete the `hello123` run as for some reason the tests on github will fail then. We tried to identify the problem but our best solution was to just leave `hello123` where it was. @@ -45,6 +33,14 @@ The history contains the outputs of steps that have already been executed and is To-dos are small issues or suggestions listed in [the issues list with label "todo"](https://github.com/cschlaffner/PROTzilla2/issues?q=is%3Aissue+is%3Aopen+label%3Atodo). The number corresponds to the issue ID. +### the runner +PROTzilla can be used not only in the browser but also from the command line without a graphical user interface. The runner is a practical tool for this. +It can execute an entire workflow without further user input. In the future, the runner should also be executable from the browser. +This way, as soon as researchers created a workflow with the analysis they want to perform, they can get their results for multiple datasets with less effort. +PROTzilla is thus intended to be an independent package that also works independently of Django. +> [!NOTE] +> think about whether new features should be implemented in the `protzilla` or `ui` folder. 
+ # tutorial ### adding a new method/step From 32375d920b1c8236c62e822744e6f0b334e496b8 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Sun, 10 Sep 2023 22:02:29 +0200 Subject: [PATCH 07/11] formating and highlighting in dev-guide.md --- dev-guide.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/dev-guide.md b/dev-guide.md index c36d74c03..c3847caf0 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -16,15 +16,15 @@ All implemented methods are listed in `protzilla/constants/workflow_meta.json` f > remember the hierarchy `section > step > method > parameter` ### what is the difference between run, workflow and history? -The directory `user_data/workflows` contains workflow templates. +The directory `user_data/workflows` contains **workflow** templates. They store the configuration of an analysis, which contains the order of steps and also their corresponding selected methods and parameters. A workflow can be shared among users for reproducible results.
-When a user starts PROTzilla, they are asked to create a new run and select a workflow template. +When a user starts PROTzilla, they are asked to create a new **run** and select a workflow template. A Run object from `protzilla/run.py` will be created, and a new folder with the name of the run will be generated in `user_data/runs`. The selected workflow will be copied into the run directory and will be the `workflow_config` of this run.
The run will follow the steps listed in the `workflow_config` and selected methods and parameter values as default. If no default methods or parameters are selected within `workflow_config`, default values from `workflow_meta.json` will be used.
-The history contains the outputs of steps that have already been executed. If the user chooses to return to the previous step and presses the back button, the previous step will be loaded from history. +The **history** contains the outputs of steps that have already been executed. If the user chooses to return to the previous step and presses the back button, the previous step will be loaded from history. > [!WARNING] > never delete the `hello123` run as for some reason the tests on github will fail then. We tried to identify the problem but our best solution was to just leave `hello123` where it was. @@ -34,10 +34,10 @@ To-dos are small issues or suggestions listed in [the issues list with label "to The number corresponds to the issue ID. ### the runner -PROTzilla can be used not only in the browser but also from the command line without a graphical user interface. The runner is a practical tool for this. +PROTzilla can be used not only in the browser but also from the **command line** without a graphical user interface. The runner is a practical tool for this. It can execute an entire workflow without further user input. In the future, the runner should also be executable from the browser. This way, as soon as researchers created a workflow with the analysis they want to perform, they can get their results for multiple datasets with less effort. -PROTzilla is thus intended to be an independent package that also works independently of Django. +PROTzilla is thus intended to be an **independent python package** that also works independently of UI and Django. > [!NOTE] > think about whether new features should be implemented in the `protzilla` or `ui` folder. @@ -49,9 +49,8 @@ To provide an example, I will display the `knn` method within the step `imputati

2. Implement your method in `protzilla/
/.py` corresponding to the section and step you added in `workflow_meta.json` with the parameters specified there.
- Importing and preprocessing steps should return the dataframe, that gets handed to the next step and a dict with further results. Data analysis and integration steps just return a dict.
- > [!NOTE] - > Do not forget to explain your method with python docstrings! + Importing and preprocessing steps should return the **dataframe**, that gets handed to the next step and a **dict** with further results. Data analysis and integration steps just return a dict.
+ **Do not forget to explain your method with python docstrings!** in `protzilla/data_preprocessing/imputation.py`:

@@ -61,5 +60,5 @@ To provide an example, I will display the `knn` method within the step `imputati 4. Link the implementation to the entry in `workflow_meta.json` in the `method_map` and `plot_map` data structures located in `protzilla/constants/location_mapping.py`

-5. Write tests for your new method (experiment with TDD and write tests before implementing the method, it can save you some time:) )
+5. Write **tests** for your new method (experiment with TDD and write tests before implementing the method, it can save you some time:) )

From 248ba211a8f67b5ae0f79b48a6e764c4b3bf79ad Mon Sep 17 00:00:00 2001
From: Sara Grau
Date: Fri, 6 Oct 2023 21:56:33 +0200
Subject: [PATCH 08/11] added ui, project structure and workflow meta file sections. Added also some sections from architecture

---
 dev-guide.md | 83 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 74 insertions(+), 9 deletions(-)

diff --git a/dev-guide.md b/dev-guide.md
index c3847caf0..2867072b5 100644
--- a/dev-guide.md
+++ b/dev-guide.md
@@ -2,20 +2,39 @@
 # dev guide
 This is a guide for developers starting to work on PROTzilla and will provide an overview of how to extend the application.

-### project structure
-[...]
+### Project Structure
+PROTzilla is structured as follows:
+- [PROTzilla2/protzilla](./protzilla) package: contains all methods to perform calculations and generate plots. It can be used without the UI. You can find more information about how each specific method works by looking up the docstring attached to each method.
+- [PROTzilla2/protzilla/constants/workflow_meta.json](./protzilla/constants/workflow_meta.json): contains the metadata of all available methods in PROTzilla.
+- [PROTzilla2/run.py](./run.py): The Run class oversees run management, calculations, workflow configuration, and history tracking, including result accumulation upon calling next.
+- [PROTzilla2/history.py](./history.py): the History class stores the chosen method, parameters, plots, output dataframe and specific outputs of all already calculated steps.
+- [PROTzilla2/runner.py](./runner.py): the runner is able to execute a given workflow without the UI.
+- [PROTzilla2/ui](./ui): contains the Django apps `main` and `runs`, and therefore all code that has to do with the frontend.
+- [PROTzilla2/user_data](./user_data): contains all data produced by the user, including the user's workflows and the processed data and plots produced by a run. For each run a new folder is created.

-### workflow structure
-To analyze protein data in PROTzilla a user will go through **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration".
-Each section is divided into **steps** that can be added to the workflow. A step uses a **method** to achieve what it's supposed to do.
-Also, methods can have **parameters**.
+### Concept of a Workflow and Workflow Structure
+A workflow serves as a blueprint for a run, containing all the necessary statistical methods, visualizations, and their respective parameters to perform a protein analysis.
+
+A workflow in PROTzilla is organized into four **sections**: "importing", "data_preprocessing", "data_analysis" and "data_integration". Each section is divided into **steps**, which are groupings of similar **methods** that can be added to the workflow. Also, methods can have **parameters**.
 All implemented methods are listed in `protzilla/constants/workflow_meta.json` following the tree structure `section > step > method > parameter`.

 > [!NOTE]
 > remember the hierarchy `section > step > method > parameter`
+
+
+### Run class
+On the class side, the Run deals with continuing existing runs and starting new ones.
+On the instance side, the Run performs calculations by calling methods that return a dataframe and other outputs in a dict (importing, data preprocessing) or just a dict of outputs (data analysis, data integration). It also creates plots that either belong to a calculation step (preprocessing) or are a step on their own (analysis, integration) and exports them.
+Workflow manipulation is also handled in the Run. The Run class is responsible for knowing the configuration of the run's workflow and the current location.
+Each run has a History, which holds the previous results that get added when `next` gets called.
+
+
+### History class
+The History is mainly responsible for knowing the chosen method, parameters, plots, output dataframe and specific outputs (e.g. dropouts or p-values) of all previous steps. Depending on the set storage mode, it returns the data from memory or disk when they are requested. It also records which outputs were saved by the user during data analysis.
+
+
-### what is the difference between run, workflow and history?
+### How do the run, workflow, and history interact with one another?
 The directory `user_data/workflows` contains **workflow** templates.
 They store the configuration of an analysis, which contains the order of steps and also their corresponding selected methods and parameters.
 A workflow can be shared among users for reproducible results.
@@ -29,11 +48,32 @@ The **history** contains the outputs of steps that have already been executed. I
 > [!WARNING]
 > never delete the `hello123` run as for some reason the tests on github will fail then. We tried to identify the problem but our best solution was to just leave `hello123` where it was.

-### why are there #TODOs with a number in the code?
+### Calculation methods
+The three folders below each represent a **section**. Each file in these folders represents a **step**. In a file there is a function for each method.
+
+#### PROTzilla2/protzilla/importing
+This folder contains the methods used for importing mass spectrometry data and metadata. Its methods always have the following signature: an input dataframe and any other method parameters. They return a protein intensities dataframe and a dict of other outputs.
+
+#### PROTzilla2/protzilla/data_preprocessing
+This folder contains the methods for filtering, imputation, normalisation, outlier detection and methods to create the corresponding plots. Its methods have the same signature as importing, receiving an input dataframe plus the method's parameters and returning a transformed dataframe together with a dict that contains other outputs specific to the method, such as dropouts, p-values or other values. The dataframe that is used as input is the output of the previous importing/data preprocessing step.
+
+#### PROTzilla2/protzilla/data_analysis
+This folder contains methods that are used to analyse the outputs of the data preprocessing section and previous data analysis steps. Instead of always using the previous output dataframe as input, the steps to be used are named and the appropriate named step is chosen as input. In this section, the calculation and the plots can be separated from each other. Calculation methods return a dictionary including dataframes and other outputs, whereas plot methods return a list of figures.
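
The return conventions described above can be sketched as follows. This is a minimal, dependency-free illustration and not PROTzilla's actual code: the function names, the step and method keys, and the layout of the lookup table are hypothetical, and a plain list of row dicts stands in for the pandas dataframe that the real methods exchange.

```python
# Hypothetical stand-ins: PROTzilla passes pandas dataframes; a list of row
# dicts is used here only so the sketch runs without third-party packages.


def drop_missing_rows(intensity_rows, column="Intensity"):
    """Preprocessing-style method: returns (transformed data, dict of other outputs)."""
    kept = [row for row in intensity_rows if row.get(column) is not None]
    dropped = [row for row in intensity_rows if row.get(column) is None]
    return kept, {"dropped_rows": dropped}


def mean_intensity(intensity_rows, column="Intensity"):
    """Analysis-style method: returns only a dict of outputs."""
    values = [row[column] for row in intensity_rows]
    return {"mean": sum(values) / len(values)}


# Illustrative lookup keyed by (section, step, method); the real `method_map`
# in protzilla/constants/location_mapping.py may be organised differently.
method_map = {
    ("data_preprocessing", "filter_rows", "drop_missing"): drop_missing_rows,
    ("data_analysis", "summary", "mean_intensity"): mean_intensity,
}

rows = [
    {"Protein ID": "P1", "Intensity": 10.0},
    {"Protein ID": "P2", "Intensity": None},
    {"Protein ID": "P3", "Intensity": 20.0},
]

# Preprocessing hands its first return value to the next step ...
preprocess = method_map[("data_preprocessing", "filter_rows", "drop_missing")]
filtered_rows, preprocessing_outputs = preprocess(rows)

# ... while an analysis method only produces a dict of results.
analyse = method_map[("data_analysis", "summary", "mean_intensity")]
analysis_outputs = analyse(filtered_rows)
```

Keeping the "data plus dict" and "dict only" shapes uniform per section is what lets a caller such as the Run treat every method in a section the same way.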
+ + + +### Workflow Meta File +The workflow meta file contains all the needed metainformation for each of the implemented methods in PROTzilla. A method's position in the JSON hierarchy determines its location (`section > step > method`). Also other information such as the name, description and parameters of the method are specified on this file. It is a key file to then build PROTzilla's frontend. + + +#### Method parameters specification +The parameters for each method can be specified in the form of a dict in the `protzilla/constants/workflow_meta.json` file. To each parameter a name, a type and a default are assigned. Then depending on the type of the parameter also other information needs to be specified in the workflow meta file, that will then affect how the input parameter fields in the frontend are built. + +### Why are there #TODOs with a number in the code? To-dos are small issues or suggestions listed in [the issues list with label "todo"](https://github.com/cschlaffner/PROTzilla2/issues?q=is%3Aissue+is%3Aopen+label%3Atodo). The number corresponds to the issue ID. -### the runner +### The runner PROTzilla can be used not only in the browser but also from the **command line** without a graphical user interface. The runner is a practical tool for this. It can execute an entire workflow without further user input. In the future, the runner should also be executable from the browser. This way, as soon as researchers created a workflow with the analysis they want to perform, they can get their results for multiple datasets with less effort. @@ -41,6 +81,31 @@ PROTzilla is thus intended to be an **independent python package** that also wor > [!NOTE] > think about whether new features should be implemented in the `protzilla` or `ui` folder. +### UI + +#### Structure of Django project. +The UI is built with the web framework Django. The folder /ui encapsulates all the frontend related code. 
+ +A Django project is structured in such way that the root folder (in this case /ui) contains application folders. In PROTzilla there are two applications main and runs. The main functionality of PROTzilla is in the runs application. + +In Django, web pages and other content are delivered by views. Each view is represented by a Python function, which is usually in the views.py file. A view function takes a HTTP request and returns a HTTP response, for example this can be a HTML page or a HTTP error. Also, Django allows to create HTML pages dynamically with the approach of templates. A template contains the static parts of the desired HTML output as well as some special syntax describing how dynamic content will be inserted. Templates are usually located at the templates folder inside the app folder. For further information on Django see their documentation here (add link!). + +PROTzilla's frontend has two main pages: the home page, where runs can be created and continued and the run page, where the user can upload, process, analyse and plot proteomics data according to the selected workflow or add and delete any steps in the current workflow. + +#### The "run" page +The "run" page is generated by the Django view function called `detail`, which uses the Django template `details.html`. This view, along with the template, serves as the foundation for PROTzilla's frontend and is therefore a good starting point for understanding how the frontend is structured. + +##### What is a good title for this? +The layout of the "run" page consists of the History displayed at the top, the current step at the bottom, and a sidebar. Regardless of the current step, the HTML page's structure remains the same. Whenever an action, such as adding a step or pressing the next/back button, occurs, it triggers the detail view to be called again by the corresponding add, next, or back view, and subsequently renders the page with the updated state obtained from the Run and History classes. 
+ +##### Input fields +The `fields.py` file contains methods for creating various input fields required for the method parameters. The field type and default information is specified in the workflow meta file. The `make_parameter_input` function selects the appropriate template based on the field type. But before creating the fields, the method `insert_special_parameters` in `run_helper` manages the display and retrieval of runtime-specific information, like outputs from previous steps, the groups present in the metadata uploaded by the user or protein database information. To see a more detailed example on how each special parameter works see (link to architecture) + +Certain fields need to be loaded without the web page being reloaded. This includes: selecting the method for the current step, populating dropdown categories based on a previous selection, and dynamically loading additional fields based on specific choices. To achieve this, asynchronous AJAX server requests are employed. HTML elements with special functionalities in the dropdowns are identified by class IDs that initiate this behavior. The JavaScript code responsible for handling these AJAX requests can be found in `run/templates/runs/dynamic_methods.html`. + + + + # tutorial ### adding a new method/step From 03a185c15ae87aa2b394caace25262cb11c179b9 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Sat, 14 Oct 2023 18:05:41 +0200 Subject: [PATCH 09/11] Update README.md link dev-guide in README --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 4dab5e29e..9fad5aed0 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,9 @@ Once the script has done most of its work, something along the lines of `Startin ## Start-Guide - a little more technical +> [!NOTE] +> For further information on how to contribute to PROTzilla read our [dev-guide](./dev-guide.md). 
+ PROTzilla2 uses Python 3.11 and conda to manage the environment and pip for installing packages. Use `conda create -n python=3.11` to create the environment, activate it with `conda activate ` (you might need to reopen your shell for this to work) and use `pip install -r requirements.txt` to install the relevant requirements in your environment. From d6fc3dceacc775677d2352a5ceb04d3cafed24e3 Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Thu, 2 Nov 2023 12:10:38 +0100 Subject: [PATCH 10/11] Apply suggestions from code review Co-authored-by: fynnkroeger <37335646+fynnkroeger@users.noreply.github.com> --- dev-guide.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/dev-guide.md b/dev-guide.md index 2867072b5..9415ba7f0 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -4,7 +4,7 @@ This is a guide for developers starting to work on PROTzilla and will provide an ### Project Structure PROTzilla is structured as follows: -- [PROTzilla2/protzilla](./protzilla) package: contains all methods to perform calculations and generate plots. It can be used without the UI. You can find more information about how each specific method works by looking up the docstring attached to each method. +- [PROTzilla2/protzilla](./protzilla) package: contains all methods to perform calculations and generate plots. It can be used without the UI. You can find more information about each method in the corresponding docstring. - [PROTzilla2/protzilla/constants/workflow_meta.json](./protzilla/constants/workflow_meta.json): contains the metadata of all available methods in PROTzilla. - [PROTzilla2/run.py](./run.py): The Run class oversees run management, calculations, workflow configuration, and history tracking, including result accumulation upon calling next. 
- [PROTzilla2/history.py](./history.py): the History class stores the chosen method, parameters, plots, output dataframe and specific outputs of all already calculated steps. @@ -14,9 +14,9 @@ PROTzilla is structured as follows: ### Concept of a Workflow and Workflow Structure -A workflow serves as a blueprint for a run, containing all the necessary statistical methods, visualizations, and their respective parameters to perform a protein analysis. +A workflow serves as a blueprint for a run, containing all the necessary statistical methods, visualizations, and their respective parameters used to analyse protein data. These blueprints can be reused by scientists to get reproducible results. -A workflow in protzilla is organized in four **sections** "importing", "data_preprocessing", "data_analysis" and "data_integration". Each section is divided into **steps**, which are an agrupation of similar **methods** that can be added to the workflow. Also, methods can have **parameters**. +A workflow in PROTzilla is organized in four **sections**: "importing", "data_preprocessing", "data_analysis" and "data_integration". Each section is divided into **steps**, which are groups of similar **methods** that can be added to the workflow. Also, methods can have **parameters**. All implemented methods are listed in `protzilla/constants/workflow_meta.json` following the tree structure `section > step > method > parameter`. > [!NOTE] @@ -25,13 +25,13 @@ All implemented methods are listed in `protzilla/constants/workflow_meta.json` f ### Run class On the class side, the Run deals with being able to continue runs and starting new ones. -On the instance side, the Run deals with making calculations by calling methods that return a dataframe and other outputs in a dict (importing, data preprocessing) or just a dict of outputs (data analysis, data integration). 
It also creates plots that belong to another same step (preprocessing) or are a step on their own (analysis, integration) and exports them. +On the instance side, the Run deals with making calculations by calling methods that return a dataframe and other outputs in a dict (importing, data preprocessing) or just a dict of outputs (data analysis, data integration). It also creates plots that belong to another step (preprocessing) or are a step on their own (analysis, integration) and exports them. Workflow manipulation is also handled in the Run. The run class is responsible for knowing the configuration of the run's workflow and the current location. -Each run has a History, which holds the previous results that get added when `next` gets called. +Each run has a History, which holds the previous results. When `next` is called, another step is added to the history. ### History class -The History is mainly responsible for knowing the chosen method, parameters, plots, output dataframe and specific outputs (e.g. dropouts or p-values) of all previous steps. Depending on the set storage mode, it returns the data from memory or disk when they are requested. It also holds the information what outputs were saved by the user during data analysis. +The History is mainly responsible for knowing the chosen method, parameters, plots, output dataframe and specific outputs (e.g. dropouts or p-values) of all previous steps. Depending on the chosen storage mode, it returns the data from memory or disk when it is requested. It also holds the names users can attach to executed methods to use them in later calculations. ### How do the run, workflow, and history interact with one another? @@ -65,7 +65,6 @@ This folder contains methods that are used to analyse the outputs of the data pr ### Workflow Meta File The workflow meta file contains all the needed metainformation for each of the implemented methods in PROTzilla. 
A method's position in the JSON hierarchy determines its location (`section > step > method`). Also other information such as the name, description and parameters of the method are specified on this file. It is a key file to then build PROTzilla's frontend. - #### Method parameters specification The parameters for each method can be specified in the form of a dict in the `protzilla/constants/workflow_meta.json` file. To each parameter a name, a type and a default are assigned. Then depending on the type of the parameter also other information needs to be specified in the workflow meta file, that will then affect how the input parameter fields in the frontend are built. @@ -88,7 +87,7 @@ The UI is built with the web framework Django. The folder /ui encapsulates all t A Django project is structured in such way that the root folder (in this case /ui) contains application folders. In PROTzilla there are two applications main and runs. The main functionality of PROTzilla is in the runs application. -In Django, web pages and other content are delivered by views. Each view is represented by a Python function, which is usually in the views.py file. A view function takes a HTTP request and returns a HTTP response, for example this can be a HTML page or a HTTP error. Also, Django allows to create HTML pages dynamically with the approach of templates. A template contains the static parts of the desired HTML output as well as some special syntax describing how dynamic content will be inserted. Templates are usually located at the templates folder inside the app folder. For further information on Django see their documentation here (add link!). +In Django, web pages and other content are delivered by views. Each view is represented by a Python function, which is usually found in the views.py file. A view function takes a HTTP request and returns a HTTP response, for example this can be a HTML page or a HTTP error. 
Also, Django allows to create HTML pages dynamically with templates. A template contains the static parts of the desired HTML output as well as some special syntax describing how dynamic content will be inserted. Templates are usually located at the templates folder inside the app folder. For further information on Django see their [documentation](https://docs.djangoproject.com/). PROTzilla's frontend has two main pages: the home page, where runs can be created and continued and the run page, where the user can upload, process, analyse and plot proteomics data according to the selected workflow or add and delete any steps in the current workflow. From c52c24a85f49f0589dd29ceaeb803f30e931929a Mon Sep 17 00:00:00 2001 From: brokkoli71 <44113112+brokkoli71@users.noreply.github.com> Date: Wed, 8 Nov 2023 13:40:58 +0100 Subject: [PATCH 11/11] Apply suggestions from code review --- dev-guide.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/dev-guide.md b/dev-guide.md index 9415ba7f0..026ba2a18 100644 --- a/dev-guide.md +++ b/dev-guide.md @@ -85,7 +85,7 @@ PROTzilla is thus intended to be an **independent python package** that also wor #### Structure of Django project. The UI is built with the web framework Django. The folder /ui encapsulates all the frontend related code. -A Django project is structured in such way that the root folder (in this case /ui) contains application folders. In PROTzilla there are two applications main and runs. The main functionality of PROTzilla is in the runs application. +A Django project is structured in such way that the root folder (in this case /ui) contains application folders. In PROTzilla there are two applications main and runs. The primary functionality of PROTzilla is in the runs application. In Django, web pages and other content are delivered by views. Each view is represented by a Python function, which is usually found in the views.py file. 
A view function takes a HTTP request and returns a HTTP response, for example this can be a HTML page or a HTTP error. Also, Django allows to create HTML pages dynamically with templates. A template contains the static parts of the desired HTML output as well as some special syntax describing how dynamic content will be inserted. Templates are usually located at the templates folder inside the app folder. For further information on Django see their [documentation](https://docs.djangoproject.com/). @@ -94,11 +94,11 @@ PROTzilla's frontend has two main pages: the home page, where runs can be create #### The "run" page The "run" page is generated by the Django view function called `detail`, which uses the Django template `details.html`. This view, along with the template, serves as the foundation for PROTzilla's frontend and is therefore a good starting point for understanding how the frontend is structured. -##### What is a good title for this? +##### Layout The layout of the "run" page consists of the History displayed at the top, the current step at the bottom, and a sidebar. Regardless of the current step, the HTML page's structure remains the same. Whenever an action, such as adding a step or pressing the next/back button, occurs, it triggers the detail view to be called again by the corresponding add, next, or back view, and subsequently renders the page with the updated state obtained from the Run and History classes. ##### Input fields -The `fields.py` file contains methods for creating various input fields required for the method parameters. The field type and default information is specified in the workflow meta file. The `make_parameter_input` function selects the appropriate template based on the field type. 
But before creating the fields, the method `insert_special_parameters` in `run_helper` manages the display and retrieval of runtime-specific information, like outputs from previous steps, the groups present in the metadata uploaded by the user or protein database information. To see a more detailed example on how each special parameter works see (link to architecture) +The `fields.py` file contains methods for creating various input fields required for the method parameters. The field type and default information is specified in the workflow meta file. The `make_parameter_input` function selects the appropriate template based on the field type. But before creating the fields, the method `insert_special_params` in `run_helper` manages the display and retrieval of runtime-specific information, like outputs from previous steps, the groups present in the metadata uploaded by the user or protein database information. For a more detailed example of how each special parameter works, see [`insert_special_params`](protzilla/run_helper.py). Certain fields need to be loaded without the web page being reloaded. This includes: selecting the method for the current step, populating dropdown categories based on a previous selection, and dynamically loading additional fields based on specific choices. To achieve this, asynchronous AJAX server requests are employed. HTML elements with special functionalities in the dropdowns are identified by class IDs that initiate this behavior. The JavaScript code responsible for handling these AJAX requests can be found in `run/templates/runs/dynamic_methods.html`. @@ -109,6 +109,8 @@ Certain fields need to be loaded without the web page being reloaded. This inclu ### adding a new method/step To provide an example, I will display the `knn` method within the step `imputation`. Feel free to have a look at the code by yourself. +0. 
Think about which method you want to implement, which parameters should be selectable by the user, and which step the method belongs to. + 1. Add your method in `protzilla/constants/workflow_meta.json` within the corresponding step and also specify parameters and their defaults.