function to find top 10 recalibrant series #44

KristinaGomoryova · 2024-08-22T17:44:01Z

This PR implements a function to find the 10 most suitable recalibrant series and resolves #22.

hechth

Good first step. Please see the comments on what should be changed.

hechth · 2024-09-02T09:24:30Z

R/FindRecalSeries.R

+tolerance <- 100
+global_min <- min(df$Min.Mass.Range) + tolerance
+global_max <- max(df$Max.Mass.Range) - tolerance
+
+# Create all combinations of ions
+iter <- combinations(nrow(df), 5, v = 1:nrow(df))
+
+# Helper dataframe with information which combinations do cover range
+coversRange <- data.frame(iter, coversRange = 0)
+
+# Check if the combinations cover the whole data range
+for (i in 1:nrow(iter)) {
+  comb <- iter[i, ]
+  subset <- df[comb, ]
+  local_min <- min(subset$Min.Mass.Range)
+  local_max <- max(subset$Max.Mass.Range)
+  if (local_min <= global_min & local_max >= global_max) {
+    coversRange$coversRange[i] <- 1
+  } 
+}
+
+# Subset only those, which cover whole range
+coversRangeTrue <- coversRange[coversRange$coversRange == 1, ]


This should also be its own function, which creates the combinations and filters them based on the coverage criteria. The tall peaks every 100 m/z should also be included as an optional criterion, so a boolean parameter should be added.
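
A minimal sketch of what such a helper could look like, assuming the column names from the diff above. The function name `filter_combinations` and its defaults are hypothetical, and base R `utils::combn` stands in for `gtools::combinations` so the example has no dependencies. The optional tall-peak criterion is left out because this thread does not say how those peaks are identified.

```r
# Hypothetical helper; name and defaults are assumptions, not the PR's final API.
filter_combinations <- function(df, n = 5, tolerance = 100) {
  # Range that a combination has to span, relaxed by the tolerance
  global_min <- min(df$Min.Mass.Range) + tolerance
  global_max <- max(df$Max.Mass.Range) - tolerance

  # All n-element combinations of row indices, one combination per row
  # (base R stand-in for gtools::combinations)
  combs <- t(utils::combn(nrow(df), n))

  # Keep only combinations whose joint mass range spans the required range
  covers <- apply(combs, 1, function(idx) {
    min(df$Min.Mass.Range[idx]) <= global_min &&
      max(df$Max.Mass.Range[idx]) >= global_max
  })
  combs[covers, , drop = FALSE]
}
```

The result is a matrix of row indices, so a later scoring step can take `df[combs[i, ], ]` without carrying a separate 0/1 helper column.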

hechth · 2024-09-02T09:27:22Z

R/FindRecalSeries.R

+for (i in 1:nrow(coversRangeTrue)) {
+  comb <- iter[i, ]
+  subset <- df[comb, ]
+  comb_score <- score_combination(subset)
+  scores <- append(scores, list(comb_score))
+}


This section can be parallelized into a single function call.
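
One way this might look with the base `parallel` package, as a sketch only. The `score_combination` below is a toy placeholder (the PR's real scorer is not shown in this thread), and `score_all` and `cores` are hypothetical names. The sketch indexes the already filtered combination matrix directly, so row `i` of the matrix and the scored subset always match.

```r
library(parallel)

# Toy placeholder scorer, assumed for illustration only.
score_combination <- function(subset) {
  data.frame(
    n_series = nrow(subset),
    total_range = sum(subset$Max.Mass.Range - subset$Min.Mass.Range)
  )
}

# Score every combination (one per row of `combs`) in a single call.
score_all <- function(df, combs, cores = 1L) {
  # mclapply forks worker processes; on Windows only mc.cores = 1 is supported
  mclapply(
    seq_len(nrow(combs)),
    function(i) score_combination(df[combs[i, ], ]),
    mc.cores = cores
  )
}
```

The return value is a list with one score record per combination, which matches the shape the original loop built up with `append`.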

hechth · 2024-09-02T09:27:52Z

R/FindRecalSeries.R

+# Append all scored combinations into a dataframe
+scores_df <- do.call(rbind, lapply(scores, as.data.frame))


This can be part of the combine step of the parallel for loop.
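
Assuming the parallel loop is written with the `foreach` package, the row-binding could be folded into its `.combine` argument roughly as follows. `score_all_df` and `score_fun` are hypothetical names used for illustration; the thread does not confirm which parallel backend the PR ended up using.

```r
library(foreach)

# Score each combination and row-bind the results while collecting them,
# so no separate do.call(rbind, ...) step is needed afterwards.
score_all_df <- function(df, combs, score_fun) {
  foreach(i = seq_len(nrow(combs)), .combine = rbind) %dopar% {
    as.data.frame(score_fun(df[combs[i, ], ]))
  }
}
```

Without a registered backend `%dopar%` falls back to sequential execution with a warning. Registering one first, for example with `doParallel::registerDoParallel(cores = 2)`, or `foreach::registerDoSEQ()` for an explicit sequential run, avoids that.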

hechth · 2024-09-02T09:30:51Z

R/FindRecalSeries.R

+  local_min <- min(subset$Min.Mass.Range)
+  local_max <- max(subset$Max.Mass.Range)
+  if (local_min <= global_min & local_max >= global_max) {
+    coversRange$coversRange[i] <- 1
+  } 


This first rough check should be removed and replaced with the finer, more detailed check based on the actual coverage percentage, which is currently part of the score calculation.
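
For illustration, a coverage percentage could be computed as the share of the global mass range covered by the union of the series' ranges. This definition, the function name `compute_coverage_percent`, and its arguments are assumptions; the PR's own coverage calculation is not shown in this thread.

```r
# Percentage of [global_min, global_max] covered by the union of the
# intervals [Min.Mass.Range, Max.Mass.Range] of the series in `subset`.
compute_coverage_percent <- function(subset, global_min, global_max) {
  # Sort intervals by lower bound and clip them to the global range
  ord <- order(subset$Min.Mass.Range)
  lo <- pmax(subset$Min.Mass.Range[ord], global_min)
  hi <- pmin(subset$Max.Mass.Range[ord], global_max)

  covered <- 0
  reach <- global_min # right edge of what has been counted so far
  for (k in seq_along(lo)) {
    start <- max(lo[k], reach)
    if (hi[k] > start) {
      # Count only the part of this interval not already covered
      covered <- covered + (hi[k] - start)
      reach <- hi[k]
    }
  }
  100 * covered / (global_max - global_min)
}
```

Unlike the min/max check, this notices gaps: two series covering 100 to 200 and 400 to 600 span the range 100 to 600 but cover only 60 % of it.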

hechth · 2024-09-02T09:34:05Z

R/FindRecalSeries.R

+# Filter for the 10 top scoring series
+finalSeries <- scores_df %>%
+  filter(coverage_percent > 90) %>%
+  rowwise() %>%
+  mutate(sum_score = sum(total_abundance, total_series_length, peak_proximity, peak_distance_proximity, coverage_percent)) %>%
+  arrange(desc(sum_score)) %>%
+  filter(!duplicated(series)) %>%
+  head(10)


This should be moved into its own function that chooses the best series, and the coverage percent should be part of the combination filtering.
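
A possible shape for that function, written in base R as a stand-in for the dplyr pipeline above. The score column names and `series` are taken from the diff; the function name `select_final_series` and the parameter `n` are hypothetical. The coverage filter is deliberately absent, on the assumption that it has already been applied during combination filtering.

```r
# Rank scored series by the sum of their individual scores and return the
# n best, keeping only the highest-scoring row for each series.
select_final_series <- function(scores_df, n = 10) {
  score_cols <- c(
    "total_abundance", "total_series_length", "peak_proximity",
    "peak_distance_proximity", "coverage_percent"
  )
  scores_df$sum_score <- rowSums(scores_df[, score_cols])

  # Sort by descending total score, then drop repeated series
  ranked <- scores_df[order(-scores_df$sum_score), ]
  ranked <- ranked[!duplicated(ranked$series), ]
  head(ranked, n)
}
```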

hechth · 2024-09-02T09:35:25Z

R/FindRecalSeries.R

+  head(10)
+
+# Return the top scoring series
+return(finalSeries)


This should be kept as the main output. Maybe we could also do something with the scoring and create some kind of report?

hechth · 2024-09-02T09:36:00Z

R/FindRecalSeries.R

+  mutate(sum_score = sum(total_abundance, total_series_length, peak_proximity, peak_distance_proximity, coverage_percent)) %>%
+  arrange(desc(sum_score)) %>%
+  filter(!duplicated(series)) %>%
+  head(10)


Filling the result up to 10 series should be removed or made optional.

hechth · 2024-09-02T12:23:01Z

R/FindRecalSeries.R

 #' @param df An output from RecalList, containing recalibrant CH2 series.
 #' @return A dataframe of 10 best-scoring series.



Descriptions for the other parameters are missing.

function to find top 10 recalibrant series

9503c51

KristinaGomoryova requested a review from hechth August 22, 2024 17:44

typos corrected

4502399

hechth reviewed Sep 2, 2024

KristinaGomoryova added 4 commits September 2, 2024 13:26

filter_input as a function

52978ba

description of the filter_input function added

99a1a37

description of filter_input function added

7ae9894

compute_scores moved out of top-level

bc75cf5

hechth reviewed Sep 2, 2024

KristinaGomoryova and others added 21 commits September 2, 2024 14:41

number of combinations and tolerance as arguments

eb6893e

abundance_score_threshold and peak_distance_threshold as global arguments

49dba75

filter_input changed to filter_recal_series

f65564b

test for compute_coverage

60898f7

test and test data for compute_coverage function

4368a58

gtools added as dependency

368abe0

tests for the compute_combinations and compute_subsets

4c23182

test for selecting final series

8a9445d

test data

7497e64

code refactored

204450f

test for filtering the input recal list

5f37bdd

functions documented

f388877

test for computing final scores

a0c72c8

test for compute_scores

6da3fca

linting done

00fd06b

linting done

066d94a

fixed filling

495a0b4

type annotations, slice_head

e1b7755

Merge branch 'master' into findRecalSeries

3617285

arrange instead of order

54613e2

find_final_series works only partially

dcd4051

upsated test output to ungrouped df

0380c83

hechth merged commit e08ee5f into RECETOX:master Sep 5, 2024
3 checks passed
