Move calculation and plot methods #17

Merged
merged 56 commits into from
Mar 16, 2023
Changes from 26 commits
56 commits
a2ce1e5
history, test fixes
brokkoli71 Feb 28, 2023
d77e804
refactor history, introduce ExecutedStep dataclass
fynnkroeger Mar 4, 2023
7d8d651
start using history in run, think about CLI
fynnkroeger Mar 4, 2023
e1dbf08
add max_quant_import
fynnkroeger Mar 5, 2023
1c0cd40
expand CLI example
fynnkroeger Mar 5, 2023
6dfc5ad
add History.pop_step, disk_memory mode
fynnkroeger Mar 5, 2023
f36c286
added back function to run
fynnkroeger Mar 6, 2023
0a63fed
moved "small plots" to protzilla2 and created single methods for each…
saragrau Mar 8, 2023
627b79e
initial exploration of integration plots into data preprocessing
saragrau Mar 8, 2023
f3662a5
add docstrings for history
fynnkroeger Mar 9, 2023
77a6c5e
separated calculation and plots in data preprocessing
saragrau Mar 9, 2023
7f4704f
Merge branch 'history' into move-calculation-and-plot-methods
saragrau Mar 9, 2023
82468db
added create_plot and create_plot_from_location to Run class
saragrau Mar 9, 2023
bf21a93
black
saragrau Mar 9, 2023
55d47c9
small changes and fix test protein filter
saragrau Mar 10, 2023
e49ee78
moved normalisation methods and tests
saragrau Mar 10, 2023
281e2d4
fixed normalisation tests
saragrau Mar 12, 2023
988d0b1
added group_by parameter in normalisation
saragrau Mar 12, 2023
b8cf179
renamed fixtures
saragrau Mar 12, 2023
1a8fb36
moved test_plotting to test_run and renamed it to test_run_calculate_…
saragrau Mar 12, 2023
09a78cb
Merge remote-tracking branch 'origin/main' into move-calculation-and-…
saragrau Mar 12, 2023
02adc23
fixed some merge conflicts
saragrau Mar 12, 2023
546c0bf
added docstring to migrated methds
saragrau Mar 12, 2023
51509f8
add docstring to filter proteins and minor changes
saragrau Mar 12, 2023
b61713e
Merge branch 'main' into move-calculation-and-plot-methods
antonneubauer Mar 13, 2023
7f6c299
reformatting
antonneubauer Mar 13, 2023
62a52cc
Update protzilla/constants/method_mapping.py
saragrau Mar 13, 2023
572475f
renamed method_mapping to location_mapping and added short descriptio…
saragrau Mar 13, 2023
6eeed28
Update protzilla/data_preprocessing/normalisation.py
saragrau Mar 13, 2023
35b2b46
Update protzilla/data_preprocessing/plots.py
saragrau Mar 13, 2023
8e4d0d3
Update protzilla/data_preprocessing/plots.py
saragrau Mar 13, 2023
c942184
added conftest with option show-figures
saragrau Mar 13, 2023
0c87e46
added show_figure to test_filter_proteins and renamed fixture
saragrau Mar 13, 2023
d3c6f3c
normalisation plot methods return now a list
saragrau Mar 13, 2023
b615c79
added z_score normalisation
saragrau Mar 13, 2023
26aece1
small changes in docstring from normalisation and filter proteins
saragrau Mar 13, 2023
1937592
added log_transformation
brokkoli71 Mar 13, 2023
b7f1ffe
added outlier detection methds
saragrau Mar 13, 2023
7c7d0b6
Merge branch 'move-calculation-and-plot-methods' of https://github.co…
saragrau Mar 14, 2023
2803c37
added outlier detection plots
saragrau Mar 14, 2023
e3f204e
added tests for filter_samples
saragrau Mar 14, 2023
b01c6a2
added filter_samples plot
saragrau Mar 14, 2023
8d9b509
added knn imputation
brokkoli71 Mar 14, 2023
b43229c
added all imputation methods
brokkoli71 Mar 14, 2023
f2027be
fixed tests, added plot tests for imputation
brokkoli71 Mar 14, 2023
3ed7475
starting test_plots, not finished
brokkoli71 Mar 14, 2023
5a73658
adding test_plots and fixing pytest_dependencies
brokkoli71 Mar 15, 2023
8260a25
Merge branch 'main' into move-calculation-and-plot-methods
brokkoli71 Mar 15, 2023
330f46a
merge from main
brokkoli71 Mar 15, 2023
b417b11
Apply suggestions from code review
brokkoli71 Mar 16, 2023
119c6d4
adopting changes
brokkoli71 Mar 16, 2023
3870944
Merge branch 'move-calculation-and-plot-methods' of https://github.co…
brokkoli71 Mar 16, 2023
87aa9e5
reformatted with black
brokkoli71 Mar 16, 2023
fc877f6
reformat with precommithook
brokkoli71 Mar 16, 2023
425580e
Merge branch 'main' into move-calculation-and-plot-methods
brokkoli71 Mar 16, 2023
f41b977
remove comment
brokkoli71 Mar 16, 2023
10 changes: 10 additions & 0 deletions protzilla/constants/constants.py
@@ -3,3 +3,13 @@
 PATH_TO_PROJECT = Path(__file__).resolve().parent.parent.parent
 PATH_TO_RUNS = Path(f"{PATH_TO_PROJECT}/user_data/runs")
 PATH_TO_WORKFLOWS = Path(f"{PATH_TO_PROJECT}/user_data/workflows")
+
+# color schemes
+PROTZILLA_DISCRETE_COLOR_SEQUENCE = [
+    "#4A536A",
+    "#87A8B9",
+    "#CE5A5A",
+    "#8E3325",
+    "#E2A46D",
+]
+PROTZILLA_DISCRETE_COLOR_OUTLIER_SEQUENCE = ["#4A536A", "#CE5A5A"]
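As a rough illustration of how a discrete color sequence like the one added here might be consumed, the sketch below assigns colors to category labels and wraps around when there are more labels than colors. The helper name `assign_colors` is hypothetical and not part of this PR; only the color list itself is taken from the diff.

```python
from itertools import cycle

# Copied from the constants added in this diff.
PROTZILLA_DISCRETE_COLOR_SEQUENCE = [
    "#4A536A",
    "#87A8B9",
    "#CE5A5A",
    "#8E3325",
    "#E2A46D",
]


def assign_colors(labels, palette=PROTZILLA_DISCRETE_COLOR_SEQUENCE):
    """Map each distinct label to a palette color, in order of first appearance.

    Hypothetical helper for illustration only. Colors repeat once the
    palette is exhausted.
    """
    colors = cycle(palette)
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = next(colors)
    return mapping
```

With only five colors in the sequence, a sixth group would reuse the first color, which may be worth keeping in mind for plots with many groups.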
61 changes: 54 additions & 7 deletions protzilla/constants/method_mapping.py
@@ -1,15 +1,62 @@
-from ..data_preprocessing import filter_proteins, filter_samples
+from ..importing import main_data_import
+from ..data_preprocessing import filter_proteins, filter_samples, normalisation


 method_map = {
     (
-        "data_preprocessing",
-        "filter_proteins",
-        "by_low_frequency",
+        "importing",
+        "main-data-import",
+        "ms-data-import",
+    ): main_data_import.max_quant_import,
+    (
+        "data-preprocessing",
+        "filter-proteins",
+        "low-frequency-filter",
     ): filter_proteins.by_low_frequency,
     (
-        "data_preprocessing",
-        "filter_samples",
-        "by_protein_intensity_sum",
+        "data-preprocessing",
+        "filter-proteins",
+        "protein-intensity-sum-filter",
     ): filter_samples.by_protein_intensity_sum,
+    (
+        "data_preprocessing",
+        "normalisation",
+        "median",
+    ): normalisation.by_median,
+    (
+        "data_preprocessing",
+        "normalisation",
+        "totalsum",
+    ): normalisation.by_totalsum,
+    (
+        "data_preprocessing",
+        "normalisation",
+        "ref-protein",
+    ): normalisation.by_reference_protein,
 }
+
+# reverse_map = {v: k for k, v in method_map.items()}
+
+
+plot_map = {
+    (
+        "data-preprocessing",
+        "filter-proteins",
+        "low-frequency-filter",
+    ): filter_proteins.by_low_frequency_plot,
+    (
+        "data_preprocessing",
+        "normalisation",
+        "median",
+    ): normalisation.by_median_plot,
+    (
+        "data_preprocessing",
+        "normalisation",
+        "totalsum",
+    ): normalisation.by_totalsum_plot,
+    (
+        "data_preprocessing",
+        "normalisation",
+        "ref-protein",
+    ): normalisation.by_reference_protein_plot,
+}
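For readers new to this pattern: both maps key a callable by a `(section, step, method)` tuple, so looking up a step is a plain dictionary access. The sketch below shows one way such a lookup could be wrapped. The stand-in functions and the `resolve` helper are assumptions made for illustration; how the `Run` class actually consumes `method_map` and `plot_map` is not shown in this diff.

```python
def by_median(df):
    """Stand-in for normalisation.by_median (illustration only)."""
    return "calculated", df


def by_median_plot(df):
    """Stand-in for normalisation.by_median_plot (illustration only)."""
    return ["figure"]


method_map = {("data_preprocessing", "normalisation", "median"): by_median}
plot_map = {("data_preprocessing", "normalisation", "median"): by_median_plot}


def resolve(mapping, section, step, method):
    """Hypothetical lookup helper that fails loudly with the offending key."""
    key = (section, step, method)
    try:
        return mapping[key]
    except KeyError:
        raise ValueError(f"no entry registered for {key!r}") from None
```

Because the keys are compared exactly, `"data-preprocessing"` and `"data_preprocessing"` are different sections as far as the lookup is concerned. The maps in this diff use both spellings, so callers need to pass the spelling that matches the entry they want.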
Empty file.
48 changes: 47 additions & 1 deletion protzilla/data_preprocessing/filter_proteins.py
@@ -1,7 +1,28 @@
+from protzilla.data_preprocessing.plots import create_bar_plot, create_pie_plot
+
 from ..utilities.transform_dfs import long_to_wide


 def by_low_frequency(intensity_df, threshold):
+    """
+    This function filters proteins with a low frequency of occurrence from
+    a protein dataframe based on a set threshold. The threshold is defined
+    by the relative share of samples a protein is detected in.
+
+    :param intensity_df: the dataframe that should be filtered,\
+        in long format
+    :type intensity_df: pd.DataFrame
+    :param threshold: float ranging from 0 to 1, defining the\
+        relative share of samples in which a protein must be present\
+        to be included. Example: 0.5 - all proteins with intensities\
+        equal to zero in at least 50% of samples are discarded. Default: 0.5
+    :type threshold: float
+    :return: a dataframe with the proteins that meet the\
+        filtering criteria, and a dict with a list of the names of proteins\
+        that were discarded and a list of the names of proteins\
+        that were kept
+    :rtype: tuple[pd.DataFrame, dict]
+    """
     min_threshold = threshold * len(intensity_df.Sample.unique())
     transformed_df = long_to_wide(intensity_df)
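To make the threshold semantics concrete, here is a small pure-Python sketch of the idea described in the docstring. It is not the implementation in this PR: the real function works on a pandas dataframe via `long_to_wide`, and its filtering lines are collapsed between the two hunks. The sketch assumes that a protein counts as present in a sample when its intensity is not `None`, and that a protein is kept when it is present in at least `threshold * n_samples` samples. The function name and the record layout are hypothetical.

```python
def filter_low_frequency(records, threshold):
    """Split proteins into kept and filtered by their share of samples.

    records: list of (sample, protein_id, intensity) tuples in long format;
    intensity is None when the protein was not detected in that sample.
    """
    samples = {sample for sample, _, _ in records}
    min_count = threshold * len(samples)

    # For each protein, collect the samples in which it was detected.
    detected_in = {}
    for sample, protein, intensity in records:
        detected = detected_in.setdefault(protein, set())
        if intensity is not None:
            detected.add(sample)

    remaining = [p for p, s in detected_in.items() if len(s) >= min_count]
    filtered = [p for p, s in detected_in.items() if len(s) < min_count]

    keep = set(remaining)
    kept_records = [r for r in records if r[1] in keep]
    return kept_records, dict(
        filtered_proteins=filtered,
        remaining_proteins=remaining,
    )
```

The return shape mirrors the one in this diff: the filtered data first, then a dict holding both the discarded and the kept protein names.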

@@ -14,5 +35,30 @@ def by_low_frequency(intensity_df, threshold):
     # TODO: might be redundant to remaining_proteins
     return (
         intensity_df[~(intensity_df["Protein ID"].isin(filtered_proteins_list))],
-        dict(filtered_proteins=filtered_proteins_list),
+        dict(
+            filtered_proteins=filtered_proteins_list,
+            remaining_proteins=remaining_proteins.tolist(),
+        ),
     )
+
+
+def by_low_frequency_plot(df, result_df, current_out, graph_type):
+    if graph_type == "Pie chart":
+        fig = create_pie_plot(
+            values_of_sectors=[
+                len(current_out["remaining_proteins"]),
+                len(current_out["filtered_proteins"]),
+            ],
+            names_of_sectors=["Proteins kept", "Proteins filtered"],
+            heading="Number of Filtered Proteins",
+        )
+    if graph_type == "Bar chart":
+        fig = create_bar_plot(
+            values_of_sectors=[
+                len(current_out["remaining_proteins"]),
+                len(current_out["filtered_proteins"]),
+            ],
+            names_of_sectors=["Proteins kept", "Proteins filtered"],
+            heading="Number of Filtered Proteins",
+        )
+    return [fig]
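One thing worth noting about `by_low_frequency_plot` as written: if `graph_type` is neither `"Pie chart"` nor `"Bar chart"`, `fig` is never assigned and the `return` raises `UnboundLocalError`. A possible way to make that failure explicit is sketched below. The two plot functions are stand-ins for `create_pie_plot` and `create_bar_plot` so the snippet runs on its own, and the `_PLOTTERS` table is an assumption, not code from this PR.

```python
def create_pie_plot(values_of_sectors, names_of_sectors, heading):
    """Stand-in for protzilla's create_pie_plot (illustration only)."""
    return ("pie", values_of_sectors, names_of_sectors, heading)


def create_bar_plot(values_of_sectors, names_of_sectors, heading):
    """Stand-in for protzilla's create_bar_plot (illustration only)."""
    return ("bar", values_of_sectors, names_of_sectors, heading)


# Hypothetical lookup table from the graph_type string to the plot function.
_PLOTTERS = {"Pie chart": create_pie_plot, "Bar chart": create_bar_plot}


def by_low_frequency_plot(df, result_df, current_out, graph_type):
    try:
        plotter = _PLOTTERS[graph_type]
    except KeyError:
        raise ValueError(f"unknown graph_type: {graph_type!r}") from None
    # Both chart types take the same arguments, so they are built once.
    fig = plotter(
        values_of_sectors=[
            len(current_out["remaining_proteins"]),
            len(current_out["filtered_proteins"]),
        ],
        names_of_sectors=["Proteins kept", "Proteins filtered"],
        heading="Number of Filtered Proteins",
    )
    return [fig]
```

This also removes the duplicated argument lists, at the cost of one extra module-level name.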