function to find top 10 recalibrant series #44
Good first step; please see the comments on what should be changed.
R/FindRecalSeries.R
Outdated
tolerance <- 100
global_min <- min(df$Min.Mass.Range) + tolerance
global_max <- max(df$Max.Mass.Range) - tolerance

# Create all combinations of ions
iter <- combinations(nrow(df), 5, v = 1:nrow(df))

# Helper dataframe with information which combinations do cover range
coversRange <- data.frame(iter, coversRange = 0)

# Check if the combinations cover the whole data range
for (i in 1:nrow(iter)) {
  comb <- iter[i, ]
  subset <- df[comb, ]
  local_min <- min(subset$Min.Mass.Range)
  local_max <- max(subset$Max.Mass.Range)
  if (local_min <= global_min & local_max >= global_max) {
    coversRange$coversRange[i] <- 1
  }
}

# Subset only those, which cover whole range
coversRangeTrue <- coversRange[coversRange$coversRange == 1, ]
This should also be its own function, which creates the combinations and filters them based on the coverage criteria. The tall peaks every 100 m/z should also be included as an optional criterion, so a boolean parameter should be added.
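A minimal sketch of what such a function could look like, offered as an illustration rather than the intended implementation. The name `filter_combinations()`, its parameters, the `Tall.Peak` column, and the reading of the tall-peak criterion as "every 100 m/z bin inside the covered range contains at least one tall peak" are all assumptions. It uses base `utils::combn()` in place of `combinations()` only to stay self-contained.

```r
# Hypothetical helper: enumerate combinations of series (rows of df) and
# keep only those that pass the coverage criteria.
filter_combinations <- function(df, n_series = 5, tolerance = 100,
                                require_tall_peaks = FALSE, bin_width = 100) {
  global_min <- min(df$Min.Mass.Range) + tolerance
  global_max <- max(df$Max.Mass.Range) - tolerance

  # One row per combination of row indices
  combs <- t(utils::combn(nrow(df), n_series))

  keep <- apply(combs, 1, function(idx) {
    sub <- df[idx, ]
    covers <- min(sub$Min.Mass.Range) <= global_min &&
      max(sub$Max.Mass.Range) >= global_max
    if (covers && require_tall_peaks) {
      # Assumed criterion: each bin between global_min and global_max
      # must contain at least one tall peak (column name is an assumption)
      needed_bins <- seq(floor(global_min / bin_width),
                         floor(global_max / bin_width))
      covers <- all(needed_bins %in% floor(sub$Tall.Peak / bin_width))
    }
    covers
  })

  combs[keep, , drop = FALSE]
}

# Toy input, not real RecalList output
toy <- data.frame(
  Min.Mass.Range = c(100, 200, 300, 400, 500, 600),
  Max.Mass.Range = c(200, 300, 400, 500, 600, 700),
  Tall.Peak      = c(150, 250, 350, 450, 550, 650)
)
all_covering  <- filter_combinations(toy)
with_tall_req <- filter_combinations(toy, require_tall_peaks = TRUE)
```

Returning a matrix of row indices keeps the filtering step independent of the scoring step, so the latter can iterate over the surviving rows only.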
R/FindRecalSeries.R
Outdated
for (i in 1:nrow(coversRangeTrue)) {
  comb <- iter[i, ]
  subset <- df[comb, ]
  comb_score <- score_combination(subset)
  scores <- append(scores, list(comb_score))
}
This section can be parallelized into a single function call.
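One possible shape for that call, sketched with `parallel::mclapply()` from base R. The `score_combination()` body, the toy data frame, and its column names are placeholders for illustration; the real scorer is the one defined in the PR. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Placeholder scorer, standing in for the PR's score_combination()
score_combination <- function(subset) {
  data.frame(series = paste(subset$Series, collapse = ";"),
             total_abundance = sum(subset$Abundance.Score))
}

# Toy data and combinations (assumed column names)
df <- data.frame(Series = paste0("S", 1:6), Abundance.Score = 1:6)
combs <- t(utils::combn(nrow(df), 5))

# mclapply() forks, so use one core on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Score every combination in one call instead of growing a list in a loop
scores <- mclapply(seq_len(nrow(combs)),
                   function(i) score_combination(df[combs[i, ], ]),
                   mc.cores = n_cores)
scores_df <- do.call(rbind, scores)
```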
R/FindRecalSeries.R
Outdated
# Append all scored combinations into a dataframe
scores_df <- do.call(rbind, lapply(scores, as.data.frame))
This can be part of the combine step of the parallel for loop.
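If the parallel loop is written with the `foreach` package, the row-binding could be folded into its `.combine` argument, roughly as below. This assumes `foreach` and `doParallel` are acceptable dependencies, which the PR does not state; the scorer and data are again placeholders.

```r
library(foreach)
library(doParallel)

# Placeholder scorer and toy inputs (assumptions for illustration)
score_combination <- function(subset) {
  data.frame(series = paste(subset$Series, collapse = ";"),
             total_abundance = sum(subset$Abundance.Score))
}
df <- data.frame(Series = paste0("S", 1:6), Abundance.Score = 1:6)
combs <- t(utils::combn(nrow(df), 5))

registerDoParallel(cores = 2)

# .combine = rbind stacks the per-combination data frames as results
# arrive, so no separate do.call(rbind, ...) step is needed
scores_df <- foreach(i = seq_len(nrow(combs)), .combine = rbind) %dopar% {
  score_combination(df[combs[i, ], ])
}

stopImplicitCluster()
```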
R/FindRecalSeries.R
Outdated
local_min <- min(subset$Min.Mass.Range)
local_max <- max(subset$Max.Mass.Range)
if (local_min <= global_min & local_max >= global_max) {
  coversRange$coversRange[i] <- 1
}
This first rough check should be removed and replaced with the finer, more detailed check based on the actual coverage percentage, which is currently part of the score calculation.
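For illustration, a coverage percentage could be computed as the length of the union of the series' mass ranges, clipped to the global range, divided by the global range. The function name and signature below are hypothetical, and the coverage calculation already in the score code may be defined differently.

```r
# Hypothetical helper: percentage of [global_min, global_max] covered by
# the union of the intervals [mins[k], maxs[k]].
coverage_percent <- function(mins, maxs, global_min, global_max) {
  # Clip every interval to the global range and drop empty ones
  lo <- pmax(mins, global_min)
  hi <- pmin(maxs, global_max)
  ok <- hi > lo
  lo <- lo[ok]
  hi <- hi[ok]
  if (length(lo) == 0) return(0)

  # Sweep over intervals sorted by start, merging overlaps
  ord <- order(lo)
  lo <- lo[ord]
  hi <- hi[ord]
  covered <- 0
  cur_lo <- lo[1]
  cur_hi <- hi[1]
  for (k in seq_along(lo)[-1]) {
    if (lo[k] <= cur_hi) {
      cur_hi <- max(cur_hi, hi[k])
    } else {
      covered <- covered + (cur_hi - cur_lo)
      cur_lo <- lo[k]
      cur_hi <- hi[k]
    }
  }
  covered <- covered + (cur_hi - cur_lo)

  100 * covered / (global_max - global_min)
}

# Ranges 100-200 and 150-300 merge into 100-300; 400-500 is separate
pct <- coverage_percent(c(100, 150, 400), c(200, 300, 500), 100, 500)
```

Unlike the min/max check, this notices gaps in the middle of the range: the example leaves 300 to 400 uncovered even though the extremes are reached.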
R/FindRecalSeries.R
Outdated
# Filter for the 10 top scoring series
finalSeries <- scores_df %>%
  filter(coverage_percent > 90) %>%
  rowwise() %>%
  mutate(sum_score = sum(total_abundance, total_series_length, peak_proximity, peak_distance_proximity, coverage_percent)) %>%
  arrange(desc(sum_score)) %>%
  filter(!duplicated(series)) %>%
  head(10)
This should be moved into its own function that chooses the best series, and the coverage-percent filter should be part of the combination filtering.
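A rough sketch of such a selection function, written in base R to keep it self-contained (the PR itself uses dplyr). The name `select_best_series()` and the optional `n` argument are assumptions; the score column names are taken from the diff above, and the coverage filter is deliberately left out since it would move to the combination filtering.

```r
# Hypothetical helper: rank scored series and optionally keep the top n.
select_best_series <- function(scores_df, n = NULL) {
  score_cols <- c("total_abundance", "total_series_length", "peak_proximity",
                  "peak_distance_proximity", "coverage_percent")

  # Row-wise sum of the individual scores
  scores_df$sum_score <- rowSums(scores_df[, score_cols, drop = FALSE])

  # Best first, then keep only the best-scoring row for each series
  ranked <- scores_df[order(scores_df$sum_score, decreasing = TRUE), ]
  ranked <- ranked[!duplicated(ranked$series), ]

  # Truncation is optional: n = NULL returns every unique series
  if (!is.null(n)) ranked <- head(ranked, n)
  ranked
}

# Toy scores, not real output
toy_scores <- data.frame(
  series = c("A", "A", "B"),
  total_abundance = c(1, 2, 1.5),
  total_series_length = c(1, 2, 1.5),
  peak_proximity = c(1, 2, 1.5),
  peak_distance_proximity = c(1, 2, 1.5),
  coverage_percent = c(1, 2, 1.5),
  stringsAsFactors = FALSE
)
best_all <- select_best_series(toy_scores)
best_one <- select_best_series(toy_scores, n = 1)
```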
R/FindRecalSeries.R
Outdated
  head(10)

# Return the top scoring series
return(finalSeries)
This should be kept as the main output. Maybe we can also do something with the scoring and create a report from it?
R/FindRecalSeries.R
Outdated
  mutate(sum_score = sum(total_abundance, total_series_length, peak_proximity, peak_distance_proximity, coverage_percent)) %>%
  arrange(desc(sum_score)) %>%
  filter(!duplicated(series)) %>%
  head(10)
Filling up to 10 series should be removed or made optional.
R/FindRecalSeries.R
Outdated
#' @param df An output from RecalList, containing recalibrant CH2 series.
#' @return A dataframe of 10 best-scoring series.
Descriptions for the other parameters are missing.
This PR implements a function to find the 10 most suitable recalibrant series and resolves #22.