Hurdle Model on DIA-NN Data #65

Open
abadgerw opened this issue Oct 1, 2024 · 7 comments

@abadgerw

abadgerw commented Oct 1, 2024

@lievenclement @lgatto @ococrook Thank you for a great tool! I am looking to fit a hurdle model on data I have run through DIA-NN. To do this, should I use the protein intensity values and the number of precursors mapped to each protein as inputs to the model?

In addition, for my learning, what is the difference between the hurdle model approach and the approaches used by DEqMS and proDA that also seem to model peptide counts/missingness?

abadgerw changed the title from "Hurdle Model" to "Hurdle Model on DIA-NN Data" on Oct 3, 2024

@lgatto
Collaborator

lgatto commented Oct 3, 2024

I'll only answer the part I'm more familiar with. The hurdle model tests specifically for differential detection and, in the absence thereof, for differential abundance. proDA uses a dropout model that models the probability of missing a feature based on its abundance, and then uses this to test for differential abundance (even when no abundances have been measured in one condition). I don't remember what DEqMS does, and haven't used it.

@abadgerw
Author

abadgerw commented Oct 4, 2024

Thanks, @lgatto! This is helpful. I'll await your colleagues' guidance regarding the application of msqrob2 to DIA-NN outputs. Looking forward to trying it out.

@abadgerw
Author

@ococrook I just wanted to circle back and see if you had some insight into this query with regard to DIA-NN outputs, as I'd love to be able to use your fantastic tool.

@abadgerw
Author

@ococrook Hope you are doing well. I wanted to check in and see if you have had a chance to review my query, so that I can use your wonderful tool.

@ococrook
Collaborator

Hi!

Sorry for the delayed response. Yes, I would think that's a sensible model input. I would ask @lievenclement to clarify, though, as I didn't develop the tool.

@abadgerw
Author

Thanks, @ococrook!

@lievenclement any feedback/thoughts on the use of DIA-NN data as inputs for your wonderful tool?

@cvanderaa
Collaborator

Hi @abadgerw,

I'm answering on behalf of Lieven. Many apologies for our late reply. We are happy to read about your interest in using msqrob2.

As mentioned by Laurent, the hurdle approach will compute 2 models: one model for differential abundance (using observed intensity data) and one model for differential detection (using feature count data).
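To make the two parts concrete, here is a rough conceptual sketch in plain Python. It is not msqrob2's implementation: the function names (`welch_t`, `hurdle_summary`), the Welch t-statistic for the abundance part, and the pseudocount-stabilised log ratio of mean feature counts for the detection part are all assumptions chosen for illustration. It only shows which data feed which part of a hurdle-style analysis.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for the difference in means (b minus a)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se if se > 0 else float("nan")


def hurdle_summary(log_intensity, n_features, group):
    """Two-part summary for one protein (illustrative only).

    log_intensity: per-sample log2 protein intensity, None when not observed
    n_features:    per-sample number of precursors detected for the protein
    group:         per-sample label, "A" or "B"
    """
    # Part 1: differential abundance, using observed intensities only.
    obs_a = [y for y, g in zip(log_intensity, group) if g == "A" and y is not None]
    obs_b = [y for y, g in zip(log_intensity, group) if g == "B" and y is not None]
    abundance = None  # cannot be estimated without replicates in both groups
    if len(obs_a) >= 2 and len(obs_b) >= 2:
        abundance = {"log2fc": mean(obs_b) - mean(obs_a), "t": welch_t(obs_a, obs_b)}

    # Part 2: differential detection, using feature counts (zeros included).
    cnt_a = [n for n, g in zip(n_features, group) if g == "A"]
    cnt_b = [n for n, g in zip(n_features, group) if g == "B"]
    # The 0.5 pseudocount is an assumption to keep the ratio finite.
    detection = math.log2((mean(cnt_b) + 0.5) / (mean(cnt_a) + 0.5))

    return {"abundance": abundance, "detection_log2_ratio": detection}


example = hurdle_summary(
    log_intensity=[10.0, 10.2, None, 11.0, 11.4, 11.2],
    n_features=[3, 2, 0, 5, 6, 4],
    group=["A", "A", "A", "B", "B", "B"],
)
```

Note that the sample with a missing intensity still contributes a count of zero to the detection part, which is why imputed data would make that part meaningless.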

In order to use this package, you need to first process your data with QFeatures. The exact data processing workflow will depend on your research question and experimental setup, but it's important that your data is log-transformed and normalized, and that the last step is the aggregation to proteins (using aggregateFeatures()). Your data cannot be imputed, otherwise you can't perform differential detection.
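The order of these steps can be sketched as follows. This is a plain-Python illustration, not the QFeatures API: the `preprocess` function, the tuple layout of `rows`, median centring as the normalisation, and the median as the summary function are all assumptions standing in for whatever your actual workflow uses.

```python
import math
from collections import defaultdict
from statistics import median


def preprocess(rows):
    """rows: iterable of (protein, precursor, sample, raw_intensity).

    raw_intensity is None or 0 when the precursor was not quantified.
    Returns {(protein, sample): {"log_intensity": ..., "n_features": ...}}.
    """
    # 1. Log-transform. Zeros and None are treated as missing and dropped,
    #    never imputed.
    logged = [(prot, samp, math.log2(x)) for prot, _prec, samp, x in rows if x]

    # 2. Normalise: subtract each sample's median log-intensity.
    by_sample = defaultdict(list)
    for _prot, samp, y in logged:
        by_sample[samp].append(y)
    centre = {samp: median(values) for samp, values in by_sample.items()}

    # 3. Aggregate precursors to proteins, keeping the number of observed
    #    precursors alongside the summarised intensity.
    groups = defaultdict(list)
    for prot, samp, y in logged:
        groups[(prot, samp)].append(y - centre[samp])
    return {
        key: {"log_intensity": median(values), "n_features": len(values)}
        for key, values in groups.items()
    }


rows = [
    ("P1", "a", "S1", 1024.0),
    ("P1", "b", "S1", 4096.0),
    ("P2", "c", "S1", 2048.0),
    ("P1", "a", "S2", 0),
    ("P1", "b", "S2", 512.0),
    ("P2", "c", "S2", None),
]
result = preprocess(rows)
```

A protein with no observed precursor in a sample (here P2 in S2) is simply absent from the result, which corresponds to a missing intensity and a feature count of zero.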

msqrob2 will compute both models using the msqrobHurdle() function. The i argument should point to your aggregated set, and the function will automatically use the intensity data for the first model and the count data (internally stored by aggregateFeatures()) for the second model. For more flexibility, you can run the intensity-based model using msqrob() and the count-based model using msqrobGlm().

A sensible workflow for DIA-NN data would be to start with the precursor-level data and to aggregate to proteins. However, we still need to investigate how to specify the count-based model for DIA-NN data. For instance, we should take into account the differences in protein detection rates across samples, but how to compute these rates is still unclear to us. Hence, I would consider msqrob2's hurdle model on DIA-NN data as experimental. Our schedules are cramped until the end of the year, but we plan to work on this by the start of 2025.

I hope this can help.
