Error messages in MFAssignCHO() and MFAssign() #28

Open
mooaw opened this issue Mar 3, 2023 · 9 comments

Comments

@mooaw

mooaw commented Mar 3, 2023

Hi,

Thanks for building this beautiful package; it has been extremely helpful for my research.

I encountered the following warning messages while using MFAssignCHO() and MFAssign():

1: There were 5210 warnings in `dplyr::filter()`.
The first warning was:
ℹ In argument: `... & CH2_num != (min(CH2_num[CH2_num != min(CH2_num)]) + 3)`.
ℹ In group 1: `KMDTest = -0.641`, `zstar = -6`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 5209 remaining warnings. 
2: There were 5210 warnings in `dplyr::filter()`.
The first warning was:
ℹ In argument: `... | CH2_num == (min(CH2_num[CH2_num != min(CH2_num)]) + 3)`.
ℹ In group 1: `KMDTest = -0.641`, `zstar = -6`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 5209 remaining warnings.

I still get assigned formulas despite these warnings, but the assignment is slower when they are present: MFAssignCHO() takes close to 10 minutes on around 4000–5000 masses, while MFAssign() is faster for the final assignment.

@skschum
Owner

skschum commented Mar 9, 2023 via email

@mooaw
Author

mooaw commented Mar 9, 2023

Hi,

Sure. I'm using R 4.1.3 in RStudio 2021.09.0. I used the latest version of MFAssignR, specifically the files in the folder marked as compatible with R > 4.0. I suspect the issue may result from unreasonable mass peaks that remain after filtering out noise peaks, but I may be wrong. Attached are my script (converted to a .txt file) and a peak list file (.csv) generated on an Orbitrap MS. When I tested my script on the example dataset included in the package, it didn't give me any error messages. They only showed up when I was processing my own data.

DOM_2020.csv
MF assignment for DOM_2020.txt

Thanks for your help!

@skschum
Owner

skschum commented Mar 13, 2023 via email

@skschum
Owner

skschum commented Mar 13, 2023 via email

@skschum
Owner

skschum commented Mar 13, 2023 via email

@mooaw
Author

mooaw commented Mar 15, 2023

Hi Simeon,

Thanks for looking into this issue. I upgraded my R to 4.2.2 (and also RStudio) following your suggestions. The issue persisted when I ran my own code. I couldn't find the RMarkdown file you mentioned in the previous comment, so I tested the example RMarkdown file you included in the package.

When I first ran the RMarkdown script, there were no errors or warning messages after running MFAssignCHO_RMD(). But I noticed that the chunk options set warning to FALSE, which I suspected was suppressing the display of warning messages. I tested again with warning set to TRUE, and the warning messages appeared again, as expected.

The other test I did was to set the signal-to-noise ratio much higher, which left far fewer peaks to be processed by the formula assignment function. After doing that, the number of warnings decreased greatly. This echoes my previous feeling that these warnings might come from chemically unreasonable mass peaks that don't form a series. I'm not sure if this makes sense or not.

Anyway, the assignment function works and I still get molecular formulas. I can look into your code when I have time and try to figure out exactly where things go wrong. Don't worry about not having clear notes :)
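One way to check this hypothesis is to count how many groups contain a single peak. The following is a minimal sketch in base R, assuming a hypothetical peak table with the grouping columns `KMDTest` and `zstar` named in the warning; the values are made up for illustration and are not taken from DOM_2020.csv.

```r
# Hypothetical peak table (illustrative values only). KMDTest and zstar are
# the grouping columns reported in the dplyr warning.
peak_table <- data.frame(
  KMDTest = c(-0.641, -0.641, -0.512, -0.333, -0.333, -0.333, -0.207),
  zstar   = c(-6, -6, -4, -2, -2, -2, -8)
)

# Count the peaks in each (KMDTest, zstar) group; drop = TRUE discards
# combinations that never occur in the data.
group_sizes <- table(interaction(peak_table$KMDTest, peak_table$zstar, drop = TRUE))

# Groups with one member cannot form a series.
n_groups     <- length(group_sizes)
n_singletons <- sum(group_sizes == 1)

cat(n_singletons, "of", n_groups, "groups contain a single peak\n")
```

If the share of single-peak groups falls as the signal-to-noise threshold rises, that would be consistent with the drop in warnings.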

PS: thanks for sharing the note on the CHOFIT algorithm, and now I understand what Ox does.

@mooaw
Author

mooaw commented Mar 17, 2023

Hi Simeon,

I have found a solution to the issue here. The warnings result from lines 254, 260, and 274 of the MFAssign() function and lines 203, 209, and 223 of the MFAssignCHO() function. The original code is:

peaksend <- dplyr::filter(Test, CH2_num != 0 & CH2_num != (min(CH2_num[CH2_num != min(CH2_num)]) + 1) & CH2_num != (min(CH2_num[CH2_num != min(CH2_num)]) + 3))

or

peaks <- dplyr::filter(Test, CH2_num == 0 | CH2_num == (min(CH2_num[CH2_num != min(CH2_num)]) + 1) | CH2_num == (min(CH2_num[CH2_num != min(CH2_num)]) + 3))

In my data, when masses are grouped by their Kendrick mass defects, many groups contain only one mass. For these groups, CH2_num[CH2_num!=min(CH2_num)] is an empty vector, and the base min() function returns Inf with a warning when given an empty vector. This produces the warning messages above and significantly slows down the run. My understanding of your code here is that you want to select the masses whose CH2 number is 0, or 1 more than the second-smallest CH2 number, or 3 more than the second-smallest CH2 number, all within a Kendrick mass series. To fix this issue, I changed the code to the following:

peaksend <- dplyr::filter(Test, CH2_num != 0 & CH2_num != (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 1) & CH2_num != (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 3))

and

peaks <- dplyr::filter(Test, CH2_num == 0 | CH2_num == (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 1) | CH2_num == (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 3))

I'm using the min_() function from the hablar package, which simply returns NA when the vector is empty, so the warnings no longer occur. This significantly speeds up my code: I can now get assignments in less than a minute, even for over 5000 masses.
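The behaviour described above can be reproduced without any packages. The sketch below uses two hypothetical helpers, `safe_min()` and `second_min()`, which are not part of MFAssignR or hablar; they only illustrate the idea of returning NA for an empty vector instead of letting base min() warn.

```r
# Hypothetical helper (not from MFAssignR or hablar): return NA for an empty
# vector instead of letting min() warn and return Inf.
safe_min <- function(x) {
  if (length(x) == 0) NA_real_ else min(x)
}

# Second-smallest distinct value of x, or NA when there is none.
second_min <- function(x) {
  safe_min(x[x != min(x)])
}

single <- c(4)           # a Kendrick series containing only one mass
series <- c(0, 1, 2, 4)  # a series with several members

# Base min() on the empty subset warns "no non-missing arguments to min"
# and returns Inf; the warning is suppressed here to keep the output clean.
base_result <- suppressWarnings(min(single[single != min(single)]))

print(base_result)         # Inf
print(second_min(single))  # NA, with no warning
print(second_min(series))  # 1
```

One thing to keep in mind when swapping Inf for NA inside dplyr::filter(): rows whose condition evaluates to NA are dropped. For the `peaks` filter, the result for a single-mass group is the same either way (only rows with CH2_num == 0 are kept). For the `peaksend` filter, rows that passed with Inf would be dropped with NA.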

I also noticed other issues during debugging:

  • The first is that peaksend doesn't have an obvious use in these functions. It is not referenced again after it is defined. Did you have a use in mind for this data?
  • A very small number of the masses produce CH2 numbers that are not integers, and they tend to be the higher-MW ones. Maybe this comes from a rounding error?
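To tell floating-point noise apart from genuinely fractional CH2 numbers, a tolerance check is one option. This is a minimal sketch; `is_whole()` and the tolerance of 1e-6 are assumptions chosen for illustration, not something taken from the MFAssignR code.

```r
# Hypothetical helper: TRUE when x is within `tol` of a whole number.
is_whole <- function(x, tol = 1e-6) {
  abs(x - round(x)) < tol
}

noisy      <- sqrt(2)^2  # mathematically 2, but not exactly 2 in floating point
fractional <- 2.5        # genuinely non-integer

print(noisy == 2)            # FALSE: exact comparison fails
print(is_whole(noisy))       # TRUE: within tolerance of an integer
print(is_whole(fractional))  # FALSE: a real non-integer value
```

Values that fail such a check even with a generous tolerance would point to something other than rounding.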

Let me know what you think.

@skschum
Owner

skschum commented Mar 17, 2023 via email

@mooaw
Author

mooaw commented Mar 17, 2023

Hi Simeon,

Sounds good. Looking forward to your code, and I'm happy to test it too.
