Hurdle Model on DIA-NN Data #65

abadgerw · 2024-10-01T10:52:26Z

@lievenclement @lgatto @ococrook Thank you for a great tool! I am looking to fit a hurdle model on data I have run through DIA-NN. To do this, should I use the protein intensity values and the number of precursors mapped to each protein as inputs to the model?

In addition, for my learning, what is the difference between the hurdle model approach and the approaches used by DEqMS and proDA that also seem to model peptide counts/missingness?

lgatto · 2024-10-03T19:09:51Z

I'll only answer the part I'm more familiar with. The hurdle model tests specifically for differential detection and, in the absence thereof, differential abundance. proDA uses a dropout model that models the probability of missing a feature based on its abundance, and then uses this to test for differential abundance (even when no abundances have been measured in one condition). I don't remember what DEqMS does, and haven't used it.

abadgerw · 2024-10-04T00:36:36Z

Thanks, @lgatto! This is helpful. I'll await your colleagues' guidance regarding the application of msqrob2 to DIA-NN outputs. Looking forward to trying it out.

abadgerw · 2024-10-10T20:29:11Z

@ococrook I just wanted to circle back and see if you had some insight into this query with regards to DIA-NN outputs as I'd love to be able to use your fantastic tool?

abadgerw · 2024-10-21T09:57:13Z

@ococrook Hope you are doing well. I wanted to check in and see if you had a chance to review my query so that I may utilize your wonderful tool?

ococrook · 2024-10-21T12:13:24Z

Hi!

Sorry for the delayed response. Yes, I would think that's a sensible model input. I would ask @lievenclement to clarify, though, as I didn't develop the tool.

abadgerw · 2024-10-22T10:40:20Z

Thanks, @ococrook!

@lievenclement any feedback/thoughts on the use of DIA-NN data as inputs for your wonderful tool?

cvanderaa · 2024-10-29T13:52:04Z

Hi @abadgerw,

I'm answering on behalf of Lieven. Many apologies for our late reply. We are happy to read about your interest in using msqrob2.

As mentioned by Laurent, the hurdle approach will compute 2 models: one model for differential abundance (using observed intensity data) and one model for differential detection (using feature count data).

To use this package, you first need to process your data with QFeatures. The exact data processing workflow will depend on your research question and experimental setup, but it's important that your data is log-transformed and normalized, and that the last step is the aggregation to proteins (using aggregateFeatures()). Your data must not be imputed; otherwise you can't perform differential detection.
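To make the order of these steps concrete, here is a minimal, language-agnostic sketch in plain Python. It is not QFeatures or msqrob2 code: the toy table, the median-centering normalization, and the median aggregation are assumptions chosen for illustration only. The point it shows is that missing values stay missing through every step, so the number of detected precursors per protein and sample is still available after aggregation.

```python
import math
from statistics import median

# Hypothetical precursor-level table: (protein, precursor) -> raw intensity per sample.
# None marks a precursor that was not detected; it is deliberately NOT imputed.
raw = {
    ("P1", "precA"): [1024.0, 2048.0, None],
    ("P1", "precB"): [4096.0, None, None],
    ("P2", "precC"): [512.0, 1024.0, 2048.0],
}
n_samples = 3

# 1. Log2-transform observed values, keeping missing values missing.
logged = {
    key: [None if v is None else math.log2(v) for v in vals]
    for key, vals in raw.items()
}

# 2. Normalize (assumed method: subtract each sample's median log-intensity).
sample_medians = [
    median(vals[j] for vals in logged.values() if vals[j] is not None)
    for j in range(n_samples)
]
normalized = {
    key: [None if v is None else v - sample_medians[j] for j, v in enumerate(vals)]
    for key, vals in logged.items()
}

# 3. Aggregate to proteins as the last step (assumed summary: median of the
#    observed precursors), and record how many precursors were observed.
protein_intensity = {}
protein_counts = {}
for protein in sorted({prot for prot, _ in normalized}):
    rows = [vals for (prot, _), vals in normalized.items() if prot == protein]
    intensities, counts = [], []
    for j in range(n_samples):
        observed = [row[j] for row in rows if row[j] is not None]
        counts.append(len(observed))
        intensities.append(median(observed) if observed else None)
    protein_intensity[protein] = intensities
    protein_counts[protein] = counts

print(protein_intensity)
print(protein_counts)
```

Had the missing entries been imputed before step 3, every count would equal the number of precursors per protein and the detection information would be lost.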

msqrob2 will compute both models using the msqrobHurdle() function. The i argument should point to your aggregated set, and the function will automatically use the intensity data for the first model and the count data (internally stored by aggregateFeatures()) for the second model. For more flexibility, you can run the intensity-based model using msqrob() and the count-based model using msqrobGlm().
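As a rough illustration of which data feed each part, the following plain-Python sketch summarizes one hypothetical protein in two ways. It is not how msqrob2 fits its models and involves no statistical testing; the sample labels, values, and helper function names are made up for the example.

```python
from statistics import mean

# Hypothetical protein-level data for one protein across six samples.
conditions = ["A", "A", "A", "B", "B", "B"]
log_intensity = [20.0, 21.0, 19.0, None, 17.0, None]  # None = not quantified
precursor_count = [4, 5, 3, 0, 1, 0]                  # precursors detected per sample


def abundance_summary(cond):
    """Mean log-intensity over the samples of `cond` where the protein was observed."""
    observed = [
        x for x, c in zip(log_intensity, conditions)
        if c == cond and x is not None
    ]
    return mean(observed) if observed else None


def detection_summary(cond):
    """Mean number of detected precursors per sample in `cond`."""
    counts = [n for n, c in zip(precursor_count, conditions) if c == cond]
    return sum(counts) / len(counts)


# Part 1 uses only the observed intensities.
abundance_diff = abundance_summary("B") - abundance_summary("A")

# Part 2 uses only the counts, including the zeros from undetected samples.
detection = {c: detection_summary(c) for c in ("A", "B")}

print(abundance_diff)
print(detection)
```

In this toy case the intensity part sees a single observed value in condition B, while the count part also uses the two samples where nothing was detected, which is the information an intensity-only analysis would ignore.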

A sensible workflow for DIA-NN data would be to start with the precursor-level data and to aggregate to proteins. However, we still need to investigate how to specify the count-based model for DIA-NN data. For instance, we should take into account the differences in protein detection rates across samples, but how to compute these rates is still unclear to us. Hence, I would consider msqrob2's hurdle model on DIA-NN data as experimental. Our schedules are full until the end of the year, but we plan to work on this by the start of 2025.

I hope this can help.

abadgerw changed the title ~~Hurdle Model~~ Hurdle Model on DIA-NN Data Oct 3, 2024
