Error messages in MFAssignCHO() and MFAssign() #28
Hello Jianshu,
Thank you for your interest in MFAssignR.
Yes, those times are definitely much longer than they should be to run
those samples. Could you let me know what version of R, RStudio and
MFAssignR you are using? A year or so back I had to fix a few parts of
MFAssignR to ensure its compatibility with R. Also, if you would be willing
to send the script that you are using so I can see if I can recreate the
error, that would be helpful.
If you have any other questions, please let me know.
Thanks,
Simeon
On Fri, Mar 3, 2023 at 2:12 PM Jianshu Duan wrote:
Hi,
Thanks for building this beautiful package, and it has been extremely
helpful for my research.
I encountered the following error messages while using MFAssignCHO() and
MFAssign():
1: There were 5210 warnings in `dplyr::filter()`.
The first warning was:
ℹ In argument: `... & CH2_num != (min(CH2_num[CH2_num != min(CH2_num)]) + 3)`.
ℹ In group 1: `KMDTest = -0.641`, `zstar = -6`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 5209 remaining warnings.
2: There were 5210 warnings in `dplyr::filter()`.
The first warning was:
ℹ In argument: `... | CH2_num == (min(CH2_num[CH2_num != min(CH2_num)]) + 3)`.
ℹ In group 1: `KMDTest = -0.641`, `zstar = -6`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 5209 remaining warnings.
I can still get assigned formulas despite these warnings, but formula
assignment is slower when they occur. It takes close to 10 minutes when
using MFAssignCHO() on around 4000–5000 masses, but it is faster when using
MFAssign() for the final assignment.
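The warning text quoted above can be reproduced in base R without MFAssignR or dplyr. The following is a minimal sketch under stated assumptions: the `groups` list, the `second_smallest()` helper, and the `n_warnings` counter are hypothetical names made up for this illustration, and only the `min(CH2_num[CH2_num != min(CH2_num)])` sub-expression is taken from the warning message itself.

```r
# Hypothetical CH2_num values for three groups (illustration only).
groups <- list(
  a = c(0, 1, 2, 4),  # several members
  b = c(0),           # single member
  c = c(3)            # single member
)

# The sub-expression that appears in the dplyr warning message.
second_smallest <- function(ch2_num) {
  min(ch2_num[ch2_num != min(ch2_num)])
}

# Count the warnings instead of printing them.
n_warnings <- 0
results <- withCallingHandlers(
  vapply(groups, second_smallest, numeric(1)),
  warning = function(w) {
    n_warnings <<- n_warnings + 1
    invokeRestart("muffleWarning")
  }
)

results     # a = 1, b = Inf, c = Inf
n_warnings  # 2: one warning per single-member group
```

Each group with only one distinct value contributes one warning per evaluation of the sub-expression, which is consistent with the warning count scaling with the number of groups.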
Hi,
Sure. I'm using R 4.1.3 in RStudio 2021.09.0. I used the latest version of MFAssignR, specifically the files in the folder that says it's compatible with R > 4.0. I suspect the issue may result from unreasonable mass peaks that remain after filtering out noise peaks, but I may be wrong. Attached are my script (converted to a .txt file) and a peak list file (.csv) generated on an Orbitrap MS. When I tested my script on the example dataset included in the package, it didn't give me any error messages. They only showed up when I was processing my own data.
DOM_2020.csv
<https://github.com/skschum/MFAssignR/files/10933391/DOM_2020.csv>
MF assignment for DOM_2020.txt
<https://github.com/skschum/MFAssignR/files/10933401/MF.assignment.for.DOM_2020.txt>
Thanks for your help!
Hello Jianshu,
Thanks for the documents; I will be looking at them. However, it may be
worth upgrading to a newer version of R and trying again, because there
were some issues in newer versions of R (between 4.0 and now) that I had
to fix in MFAssignR. I know that R version 4.2.1 works with MFAssignR, but
I am not sure about version 4.2.3 (the latest version), so you could
upgrade to 4.2.1 and see if the issue is addressed, or go to R version
4.2.3 and see if that works for you as well.
Apologies for the unclear notes on MFAssignR, I am still more of a chemist
than a computer code writer.
I will take a look at what you sent me, but my first suggestion would be to
update R to the newest version and see if that addresses your issue. If it
doesn't, or causes other problems, I would recommend installing a previous
version of R such as 4.2.1 because that is what is currently on my computer
and it seems to work. I will be checking whether or not I have problems
with the newest version and working to fix any problems that may come up.
Also, I am attaching the version of MFAssignR I have on my computer. It
should be the same as the version on GitHub, but sometimes people have had
issues getting the files from GitHub to work.
Let me know if you have any other questions.
Thanks,
Simeon
Hello Jianshu,
As a quick additional comment, it looks like R version 4.2.3 is still in
pre-release. It will be released soon, but if you just want to update to
version 4.2.2 that would be fine as well.
Thanks,
Simeon
Hello Jianshu,
I just ran through your code on my machine and didn't run into any issues
(your data looks nice), so I think it may be related to the version of R
that you are currently running. I am attaching the script that I ran, but
it is just the txt file you sent me pasted into an RMarkdown file, so there
shouldn't really be any differences. Also, just so you know, you don't have
to set Ox to 40: the CHOFIT core algorithm takes care of the C, H, and O
elements without explicit limits. All Ox really does is explicitly define
the number of loops that the function will run. It doesn't hurt your
assignment either way, but I thought I would let you know.
I would recommend updating to version 4.2.2 and trying again, hopefully it
will work. If not, please let me know.
Thanks,
Simeon
Hi Simeon,
Thanks for looking into this issue. I upgraded R to 4.2.2 (and also RStudio) following your suggestions. The issue persisted when I ran my own code. I couldn't find the RMarkdown file you mentioned in the previous comment, so I tested the example RMarkdown file you included in the package. When I first ran the RMarkdown script, there were no errors or warning messages after running MFAssignCHO_RMD(). But I noticed that in the chunk options, you set [...]
The other test I did was to set the signal-to-noise ratio much higher, which gave me far fewer peaks to be processed by the formula assignment function. After doing that, the number of warnings greatly decreased. This echoes my previous feeling that these warnings might come from chemically unreasonable mass peaks that don't form a series. I'm not sure if this makes sense or not. Anyway, the assignment function works and I still get molecular formulas. I can look into your code when I have time and try to figure out where exactly it goes wrong. Don't worry about not having clear notes :)
PS: thanks for sharing the note on the CHOFIT algorithm; now I understand what Ox does.
Hello Jianshu,
That is very interesting, thank you for your time in investigating it. You
are correct that I wasn't seeing the warnings because I typically turned
them off and generally didn't think about them since the code was doing
what I wanted. However, it is definitely better to have them fixed.
I looked into those problematic lines of code to remember exactly what
their purpose was, and you were correct that peaksend was not doing
anything. At one point I believe I needed it to address an issue at the
end of the function, but I must have found a way around that and forgot
that peaksend was now redundant. It can be removed without any problems.
As for the actually useful "peaks", the purpose of them when MSMS = "off"
is to select different members of the CH2 homologous series and have it be
flexible enough to pick members even if there was a hole in the series.
Since it was causing problems I looked into better ways to do it, similar
to your using a function from the hablar package. I was able to simplify
the code and make it so that the warnings no longer occur while still using
the original "min" function. I need to do a bit more testing to ensure that
I haven't broken anything else, but if you are interested I would be glad
to send it to you so you can try it and see if it also addresses your speed
problem.
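One way to keep base min() while avoiding the warning is to check for an empty vector before calling it. The sketch below is an assumption about the general approach, not the revised MFAssignR code mentioned above; `select_series_members()` is a hypothetical name, and it restates the `peaks` selection (CH2 number 0, or 1 or 3 more than the second-smallest CH2 number) for a single Kendrick series.

```r
# Hypothetical re-statement of the `peaks` selection for ONE series.
# Not the MFAssignR implementation; it assumes ch2_num is non-empty,
# as is the case for every group in a grouped filter.
select_series_members <- function(ch2_num) {
  others <- ch2_num[ch2_num != min(ch2_num)]
  keep <- ch2_num == 0
  if (length(others) > 0) {
    second <- min(others)  # safe: `others` is non-empty here
    keep <- keep | ch2_num == second + 1 | ch2_num == second + 3
  }
  keep
}

select_series_members(c(0, 1, 2, 4))  # TRUE FALSE TRUE TRUE
select_series_members(c(0, 2, 3, 5))  # TRUE FALSE TRUE TRUE (hole at 1)
select_series_members(c(5))           # FALSE, and no warning
```

Because the offsets are taken relative to the second-smallest value actually present, the selection still finds members when the series has a gap, which matches the flexibility described above.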
I wasn't able to observe your speed issue myself with or without the
warnings, but if removing the warnings seems to have addressed your issue,
that is great.
When I am done testing I will send you the version of the functions that I
put together that eliminated the errors if you would like to try it out. Of
course you are always welcome to keep using your version of it.
Thanks again for your interest in the package and your troubleshooting, it
helped address a problem that I had not really noticed due to the way I
normally run the code.
Thanks,
Simeon
On Thu, Mar 16, 2023 at 11:25 PM Jianshu Duan wrote:
Hi Simeon,
I have found a solution to the issues here. The warnings result from lines
254, 260, and 274 of the MFAssign() function and lines 203, 209, and 223 of
the MFAssignCHO() function. The original code is:
    peaksend <- dplyr::filter(Test, CH2_num != 0 &
      CH2_num != (min(CH2_num[CH2_num != min(CH2_num)]) + 1) &
      CH2_num != (min(CH2_num[CH2_num != min(CH2_num)]) + 3))
or
    peaks <- dplyr::filter(Test, CH2_num == 0 |
      CH2_num == (min(CH2_num[CH2_num != min(CH2_num)]) + 1) |
      CH2_num == (min(CH2_num[CH2_num != min(CH2_num)]) + 3))
In my data, when masses are grouped by their Kendrick mass defects, many
groups contain only one mass. For these groups,
CH2_num[CH2_num != min(CH2_num)] is an empty vector, for which the base
min() function returns Inf and raises a warning. This produces the warning
messages above and significantly slows down the run. My understanding of
your code here is that you wanted to select the masses that have a CH2
number of 0, or 1 more than the second-smallest CH2 number, or 3 more than
the second-smallest CH2 number, all within a Kendrick mass series. To fix
this issue, I changed the code to the following:
    peaksend <- dplyr::filter(Test, CH2_num != 0 &
      CH2_num != (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 1) &
      CH2_num != (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 3))
and
    peaks <- dplyr::filter(Test, CH2_num == 0 |
      CH2_num == (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 1) |
      CH2_num == (hablar::min_(CH2_num[CH2_num != min(CH2_num)]) + 3))
I'm using the min_() function from the hablar package. min_() simply
returns NA if a vector is empty, which avoids the warning. This
significantly speeds up my code, and now I can get assignments in less than
a minute even for over 5000 masses.
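For readers who prefer not to add a dependency, the behaviour described here (NA for an empty vector, no warning) can be approximated in base R. This is a minimal sketch, assuming a hypothetical helper named `safe_min()`; it is not part of MFAssignR or hablar and is not claimed to match hablar::min_() in every detail.

```r
# Hypothetical helper (not from MFAssignR or hablar): like min(), but
# returns NA for an empty vector instead of Inf plus a warning.
safe_min <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  min(x)
}

ch2_single <- c(2)           # a Kendrick series with a single member
ch2_series <- c(0, 1, 2, 4)  # a series with several members

safe_min(ch2_single[ch2_single != min(ch2_single)])  # NA, no warning
safe_min(ch2_series[ch2_series != min(ch2_series)])  # 1
```

A comparison against NA evaluates to NA, and dplyr::filter() drops rows whose condition is NA, so in the `peaks` expression a single-member group keeps only a row with CH2_num == 0.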
I also noticed other issues during debugging:
- The first is that peaksend doesn't have an obvious use in these
  functions. It is not referenced again after it is defined. Do you have
  something in mind for the use of this data?
- A very small number of the masses produce CH2 numbers that are not
  integers, and they tend to be the higher-MW ones. Maybe this comes from
  a rounding error?
Let me know what you think.
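The thread does not establish why some CH2 numbers are non-integer. Whatever the cause, such values can be flagged with a tolerance check rather than an exact comparison. The sketch below is purely illustrative: the masses, the simulated error, the tolerance, and the `is_whole()` helper are all hypothetical, and the way MFAssignR actually computes CH2_num is not shown in the source.

```r
# Exact mass of a CH2 unit (12C + 2 x 1H), in Da.
ch2_mass <- 14.01565

# Hypothetical series: a base mass plus 0..3 CH2 units, with a small
# simulated measurement error on the heaviest member.
base_mass <- 201.04046
masses <- base_mass + c(0, 1, 2, 3) * ch2_mass
masses[4] <- masses[4] + 2e-4

# Raw (possibly non-integer) CH2 counts relative to the base mass.
ch2_raw <- (masses - base_mass) / ch2_mass

# TRUE when a value is within `tol` of a whole number.
is_whole <- function(x, tol = 1e-3) abs(x - round(x)) < tol

ch2_raw[4] == 3    # FALSE: exact comparison fails
is_whole(ch2_raw)  # TRUE for all four members
round(ch2_raw)     # 0 1 2 3
```

Rounding only values that pass such a check keeps genuinely off-series masses from being silently forced onto an integer CH2 number.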
Hi Simeon, Sounds good. Looking forward to your code, and I'm happy to test it too.