Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sample size calculation #519

Draft
wants to merge 35 commits into
base: dev
Choose a base branch
from
Draft

Sample size calculation #519

wants to merge 35 commits into from

Conversation

selenabr
Copy link
Collaborator

Description

Implementation of sample_size_calculation to calculate the required sample size for a selected protein. For this purpose, the variance method was added and an output field was implemented to display the result. The variance and the sample size calculation-method are tested with test-data in test_power_analysis.py

Changes

sample_size_calculation + variance method:

  • protzilla/data_analysis/power_analysis.py
  • ui/runs/forms/data_analysis.py
  • protzilla/methods/data_analysis.py

Display output field:

  • ui/runs/templates/runs/details.html
  • ui/runs/views.py
  • protzilla/steps.py

Test:

  • tests/protzilla/data_analysis/test_power_analysis.py

Mergeability

  • main-branch has been merged into local branch to resolve conflicts
  • The tests and linter have passed AFTER local merge
  • The code has been formatted with black

Code review

  • I have self-reviewed my code.
  • At least one other developer reviewed and approved the changes

Copy link
Collaborator

@hendraet hendraet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. I have written pretty much the same code for the calculation :)

As usual, there are still minor things that could be improved, but that's mainly code style.

Comment on lines 28 to 29
if intensity_name is None:
intensity_name = "Normalised iBAQ"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumes that data has to be normalized before feeding into the step. Otherwise the column doesn't exist. I would say that is an unnecessary limitation that is not transparent to the user

Comment on lines 28 to 29
if intensity_name is None:
intensity_name = "Normalised iBAQ"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this could just the default argument if it is set anyways. Or is there a reason why the default has to be None?

protzilla/data_analysis/power_analysis.py Outdated Show resolved Hide resolved
protzilla/data_analysis/power_analysis.py Outdated Show resolved Hide resolved
self.fields["t_test_results"].choices = get_t_test_results(run)

class PowerAnalysisSampleSizeCalculationForm(MethodForm):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, you should run a code formatter. Often there too few blank lines (between methods or here above the class, you should leave two empty lines) and unnecessary whitespaces around equal signs (as below in the ...Field()

intensity_name=intensity_name,
)
sample_size = differentially_expressed_proteins_df.groupby('Group')['Sample'].count()
z_beta = fc_threshold * np.sqrt(sample_size/(2*variance_protein_group**2))-z_alpha
Copy link
Collaborator

@hendraet hendraet Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z_beta = fc_threshold * np.sqrt(sample_size/(2*variance_protein_group**2))-z_alpha
z_beta = fc_threshold * np.sqrt(sample_size / (2 * variance_protein_group)) - z_alpha

I think the square is too much since we are already dealing with variances and not standard deviations. (Also some minor formatting issues)

Comment on lines 310 to 320
fig = go.Figure()

fig.add_trace(
go.Violin(
x=["Protein Groups"] * len(required_sample_sizes),
y=required_sample_sizes,
line_color=colors[1],
**violin_plot_args
)
)
fig.update_layout(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't add traces to a figure dynamically (e.g. in a for loop), you can also pass the trace directly to go.Figure()

Comment on lines 301 to 308
violin_plot_args = dict(
meanline_visible=True,
box_visible=True,
scalemode='width',
spanmode='hard',
span=[0, required_sample_size_for_all_proteins],
fillcolor='rgba(0,0,0,0)'
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would pass these arguments directly to go.Violin(). Since you are not reusing them somewhere else, it just makes the code harder to read because these args are in a different place


fig.add_trace(
go.Violin(
x=["Protein Groups"] * len(required_sample_sizes),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would omit the x parameter and just use name="Protein Groups". It's easier to read

@henninggaertner henninggaertner marked this pull request as draft November 21, 2024 14:41
@henninggaertner henninggaertner changed the title WIP: Sample size calculation Sample size calculation Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants