Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

298 histogram of imputed and original values #305

Merged
merged 25 commits into from
Nov 28, 2023

Conversation

JanniRoebbecke
Copy link
Collaborator

In preprocessing/plots.py adapted create_histograms and create_box_plots to show a logarithmic scale, depending on the new parameter visual_transformation. A new button has been added to the ui (in workflow_metha.json) to set this parameter. This is implemented differently in histograms and box-plots, because plotly doesn't offers this option for histograms (where the x-axis needs to be transformed) with dynamic binsize. A new button has been added to the ui (in ) to set this parameter.
To show only the imputed values and not all values we inverted the positions where there are nan values from the original dataframe in imputation.py _build_box_hist_plot.
Histograms can be shown as in the same plot or side by side depending on the new parameter overlay.

PR checklist

  • main-branch has been merged into local branch to resolve conflicts
  • tests and linter have passed AFTER local merge
  • at least one other dev reviewed and approved
  • documentation
  • tests

@JanniRoebbecke JanniRoebbecke linked an issue Nov 17, 2023 that may be closed by this pull request
3 tasks
Copy link

github-actions bot commented Nov 17, 2023

Coverage report

The coverage rate went from 70.25% to 70.09% ⬇️
The branch rate is 63%.

61.22% of new lines are covered.

Diff Coverage details (click to unfold)

protzilla/data_preprocessing/imputation.py

100% of new lines are covered (97.05% of the complete file).

protzilla/data_preprocessing/plots.py

81.48% of new lines are covered (85.21% of the complete file).
Missing lines: 233, 245, 246, 278, 289

protzilla/data_preprocessing/plots_helper.py

22.22% of new lines are covered (13.04% of the complete file).
Missing lines: 15, 16, 17, 18, 19, 20, 21, 23, 24, 31, 38, 39, 40, 47

@selenabr
Copy link
Collaborator

linear histogram of imputed and original values is not working anymore (loading)...

@@ -379,7 +444,7 @@ def number_of_imputed_values(input_df, result_df):


def _build_box_hist_plot(
df, result_df, graph_type, graph_type_quantities, group_by
df, result_df, graph_type, graph_type_quantities, group_by, visual_transformation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not consistent to previous format (e.g. 433-438)

@@ -116,6 +119,7 @@ def create_box_plots(
y_title="",
x_title="",
group_by: str = "None",
visual_transformation="linear",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please specify the types for the parameters for consistency.
(e.g. visual_transformation: str = "linear")

@@ -204,6 +208,8 @@ def create_box_plots(
"yanchor": "top",
},
)
if visual_transformation == "log10":
fig.update_yaxes(type="log")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there this inconsistency with "log10" and "log"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"log" is the definition used in plotly, because log10 is the only possible (intended) use in plotly. we decided to use "log10" to make explicit which base is used

@@ -216,6 +222,8 @@ def create_histograms(
heading="",
y_title="",
x_title="",
visual_transformation="linear",
overlay=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

temp = math.floor(np.log10(max - min) / 2)
step_size = pow(10, temp)
first_step = math.ceil(min / step_size) * step_size
last_step = math.ceil(max / step_size) * step_size + 3 * step_size
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment to explain what your calculations are doing (why 3*step_size)

@@ -227,7 +227,7 @@ def test_delete_step(example_workflow_short, tests_folder_name):
assert len(importing_steps["steps"]) == count - 1


def test_export_plot(tests_folder_name):
"""def test_export_plot(tests_folder_name):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be removed?

janni.roebbecke added 3 commits November 22, 2023 19:11
@JanniRoebbecke JanniRoebbecke merged commit 13b0abb into main Nov 28, 2023
1 check passed
@JanniRoebbecke JanniRoebbecke deleted the 298-histogram-of-imputed-and-original-values branch November 30, 2023 10:30
@JanniRoebbecke JanniRoebbecke restored the 298-histogram-of-imputed-and-original-values branch November 30, 2023 10:30
@JanniRoebbecke JanniRoebbecke deleted the 298-histogram-of-imputed-and-original-values branch December 5, 2023 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Histogram of imputed and original Values
4 participants