The architecture of PROTzilla
The `Run` class in the `protzilla` project is responsible for managing and executing a single run in a data processing pipeline. An instance of this class holds all objects associated with a specific run, most importantly the `steps` attribute, which holds all steps, and a run-specific instance of a `DiskOperator`. The class also encapsulates general error handling and performs high-level operations such as exporting a workflow, saving the run, passing inputs to the current step for calculation, and adding or removing steps.
Key attributes of the `Run` class include:

- `run_name`: The name of the run.
- `workflow_name`: The name of the workflow associated with the run.
- `disk_operator`: An instance of the `DiskOperator` class that handles reading and writing data to disk.
- `steps`: An instance of the `StepManager` class that contains all the steps of the processing pipeline.
Key methods of the `Run` class include:

- `step_add`: Adds a step to the run.
- `step_remove`: Removes a step from the run.
- `step_calculate`: Executes the calculation for the current step.
- `step_change_method`: Changes the method of the current step.
The `Run` class also includes decorators for error handling and automatic saving of the run state. These decorators can be applied to methods to handle errors and save the run state automatically after the method is called.
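As a rough illustration of the pattern, such decorators typically wrap the decorated method in a try/except block and trigger a save afterwards. The names `error_handling`, `auto_save`, and the save hook below are placeholders for this sketch, not necessarily PROTzilla's actual identifiers:

```python
import functools
import logging


def error_handling(method):
    """Sketch: catch exceptions raised by the wrapped method and log them
    instead of letting them crash the run."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        except Exception as e:
            logging.exception("Error in %s: %s", method.__name__, e)
            return None
    return wrapper


def auto_save(method):
    """Sketch: persist the run state after the wrapped method has finished."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self._save()  # hypothetical save hook on the Run instance
        return result
    return wrapper
```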
The `StepManager` class in the `protzilla` project manages the execution of steps in a data processing pipeline. It keeps track of the current step, the previous steps, and the future steps, and it provides methods to navigate through the steps.
Key attributes of the `StepManager` class include:

- `df_mode`: Represents the mode of data storage, either in memory or on disk.
- `disk_operator`: An instance of the `DiskOperator` class that handles reading and writing data to disk.
- `current_step_index`: The index of the current step in the workflow.
- `importing`, `data_preprocessing`, `data_analysis`, `data_integration`: Lists of steps in each section of the pipeline.
Key methods of the `StepManager` class include:

- `get_step_output`: Gets the output of a specific step.
- `get_step_input`: Gets the input of a specific step.
The `StepManager` class ensures that the inputs and outputs of each step are validated and that the steps are executed in the correct order.
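For example, a later step or UI code might fetch a dataframe produced earlier in the run roughly like this; the step class and output key used here are placeholder assumptions for illustration:

```python
# Illustrative only: fetch the protein dataframe that an earlier step produced.
# "SomeImportStep" and "protein_df" are placeholder names.
protein_df = step_manager.get_step_output(SomeImportStep, "protein_df")
```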
The `Step` class in the `protzilla` project represents a single step in a data processing pipeline. Each step performs a specific operation and can have inputs and outputs.
Key attributes of the `Step` class include:

- `section`: The section of the pipeline this step belongs to.
- `display_name`: The display name of the step.
- `operation`: The operation this step performs.
- `input_keys`: The keys for the input data this step requires.
- `output_keys`: The keys for the output data this step produces.
Key methods of the `Step` class include:

- `calculate`: The core calculation method for all steps. It receives the inputs from the front-end and calculates the output.
- `method`: The main method that performs the operation of the step. This method must be implemented in a subclass.
- `handle_outputs`: Handles the outputs from the calculation method and creates an `Output` object from them.
- `validate_inputs`: Validates the inputs of the step.
- `validate_outputs`: Validates the outputs of the step.
The `Step` class is designed to be subclassed for specific types of steps. Each subclass should implement the `method` method and may also need to override other methods depending on its specific requirements.
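A minimal skeleton of such a subclass could look roughly like this; the class name, attribute values, and the backend function are placeholders, and the exact contract should be taken from existing steps:

```python
class MyAnalysisStep(Step):  # hypothetical example subclass
    section = "data_analysis"        # pipeline section this step belongs to
    display_name = "My analysis"     # name shown in the UI
    operation = "analysis"           # kind of operation the step performs

    input_keys = ["intensity_df", "alpha"]  # inputs required by method()
    output_keys = ["result_df"]             # outputs produced by method()

    def method(self, inputs: dict) -> dict:
        # Delegate the actual computation to a plain backend function.
        return my_analysis_function(**inputs)
```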
The `DiskOperator` class in the `protzilla` project is responsible for reading and writing data to and from the disk. It uses instances of `YamlOperator` and `DataFrameOperator` to handle YAML files and dataframes, respectively.
Key attributes of the `DiskOperator` class include:

- `run_name`: The name of the run.
- `workflow_name`: The name of the workflow associated with the run.
- `yaml_operator`: An instance of the `YamlOperator` class that handles reading and writing YAML files.
- `dataframe_operator`: An instance of the `DataFrameOperator` class that handles reading and writing dataframes.
Key methods of the `DiskOperator` class include:

- `read_run`: Reads a run from a YAML file and returns a `StepManager` object.
- `write_run`: Writes a run to a YAML file.
- `read_workflow`: Reads a workflow from a YAML file and returns a `StepManager` object.
- `export_workflow`: Exports a workflow to a YAML file.
- `check_file_validity`: Checks if a file is still needed or if it can be deleted.
- `clean_dataframes_dir`: Deletes unnecessary dataframes from the directory.
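A rough sketch of how these methods are meant to be used together; the constructor arguments and call signatures below are assumptions, not the exact API:

```python
# Illustrative only: persist the current steps of a run and read them back later.
disk_operator = DiskOperator(run_name="my_run", workflow_name="standard")
disk_operator.write_run(steps)    # serialize the StepManager (YAML + dataframes)
steps = disk_operator.read_run()  # reconstruct a StepManager from disk
```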
To add a new method to PROTzilla, the following steps are needed:

1. Determine the correct section in which to add the new function, e.g. `protzilla/data_analysis/t_test.py`, and implement the function, e.g. `t_test(...)`. A rough sketch is shown below.
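A hedged sketch of what such a function could look like; the dataframe layout, parameter names, and returned keys are assumptions made for illustration:

```python
# protzilla/data_analysis/t_test.py (sketch)
import pandas as pd
from scipy import stats


def t_test(intensity_df: pd.DataFrame, group1: list, group2: list, alpha: float = 0.05) -> dict:
    """Two-sample t-test per protein, assuming one row per protein and one
    column per sample. Returns the results as a dict of named outputs."""
    records = []
    for protein, row in intensity_df.iterrows():
        statistic, p_value = stats.ttest_ind(row[group1], row[group2])
        records.append(
            {
                "protein": protein,
                "t_statistic": statistic,
                "p_value": p_value,
                "significant": p_value < alpha,
            }
        )
    return {"t_test_df": pd.DataFrame(records)}
```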
2. Create a subclass of `Step` (or an appropriate subclass of `Step`) in the correct file in `protzilla/methods/`. Determine the required inputs for the function you created in step 1, as well as its outputs and the other metadata; take a look at other steps for reference. Important to note: if not all keys defined in `input_keys` are present, the input validation will fail, and all keys NOT mentioned in `input_keys` will be removed, to avoid passing too many parameters. A sketch follows below.
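Continuing the t-test example, the step class might look roughly like this; the file location, base class, and key names are assumptions based on the structure described above:

```python
# protzilla/methods/data_analysis.py (sketch)
from protzilla.data_analysis.t_test import t_test


class TTest(Step):  # or an appropriate analysis-specific subclass of Step
    section = "data_analysis"
    display_name = "T-test"
    operation = "differential_expression"

    # Every key listed here must be present in the inputs; any other key
    # is dropped before the calculation.
    input_keys = ["intensity_df", "group1", "group2", "alpha"]
    output_keys = ["t_test_df"]

    def method(self, inputs: dict) -> dict:
        return t_test(**inputs)
```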
3. If necessary, implement the `insert_dataframes(...)` method, i.e. if the input cannot be passed directly from the frontend and/or information from previous steps is required for the calculation. IMPORTANT: to get the outputs of previous steps, ALWAYS use the `get_step_output()` method of the `StepManager`. See the sketch below.
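A sketch of such an override, placed inside the step class from the previous sketch; the method signature, the step class fetched from, and the output key are placeholder assumptions:

```python
    # Inside the TTest step class from the sketch above.
    def insert_dataframes(self, steps: StepManager, inputs: dict) -> dict:
        # Fetch the intensity dataframe produced by an earlier step instead of
        # expecting it from the frontend; "ImportStep" and "protein_df" are
        # placeholder names.
        inputs["intensity_df"] = steps.get_step_output(ImportStep, "protein_df")
        return inputs
```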
4. Add the `MethodForm` class in the corresponding file in `ui/runs/forms/`.
   - Parameters are added by creating an attribute in the class as a `Field`; for reference, take a look at other steps.
   - In the `input` dictionary that will be passed to the backend, the keys will be the names of the attributes exactly as they are written in the code. Example:
```python
class TTestForm(MethodForm):
    alpha = CustomFloatField(label="Error rate (alpha)", min_value=0, max_value=1, initial=0.05)
```

This will result in an input dictionary like `{"alpha": 0.05}`.
Keep this in mind, as the keys need to match the parameter names of the function from step 1. If this is not feasible, you can rename the keys in the `insert_dataframes(...)` method from step 3.
5. If you need to reference other steps, give choices based on previous data of the run, or make the input fields dynamic in some way, implement the `fill_form(...)` method of the form. This allows you to activate/deactivate fields and fill them with data; `fill_helper.py` provides some useful tools. For reference usage, take a look at other steps that implement this method. A rough sketch follows below.
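A rough sketch of such an override on the form; the output key, the step class fetched from, the hypothetical `grouping` field, and the way choices are assigned are all assumptions for illustration:

```python
    # Inside TTestForm: make the grouping choices depend on earlier run data.
    def fill_form(self, run) -> None:
        metadata_df = run.steps.get_step_output(MetadataImport, "metadata_df")
        if metadata_df is not None:
            # "grouping" is a hypothetical choice field of this form.
            self.fields["grouping"].choices = [
                (column, column) for column in metadata_df.columns
            ]
```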
6. In the `ui/runs/form_mapping.py` file, link your step class from step 2 to the form from step 4, as sketched below. After doing this, you should be able to add and use your new step in PROTzilla.
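The registration is typically a single mapping entry; the exact structure of `form_mapping.py` may differ, so treat this as a sketch:

```python
# ui/runs/form_mapping.py (sketch)
from protzilla.methods.data_analysis import TTest
from ui.runs.forms.data_analysis import TTestForm

# Hypothetical mapping from backend step classes to their UI forms.
_forward_mapping = {
    TTest: TTestForm,
}
```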