add weights in pseudobulkDGE for edgeR analysis #107
In principle, the ability to include weights would be useful. However, there is a minor and major problem.

The minor problem is that

The major problem is that I don't see how these weights can be easily computed by your average user without asking them to go through the edgeR pipeline for each comparison. I assume that you're computing weights using knowledge of the design matrix, plus some normalization and filtering information, etc. If that's the case, you probably were already halfway through some kind of edgeR analysis, so why bother using

How are you deriving your weights? Unlike limma, we don't have many weighting functions in edgeR, and the interpretation of the weights isn't so clear-cut either... something to do with scaling the QL dispersion IIRC. But if you do have a function, a more ergonomic approach would be to allow users to pass a weighting function that runs on
Thank you for looking into this; I missed the subsetting and have adapted the pull request.

On the major problem, I think it depends. Personally, I am only interested in sample-level weights that do not depend on the gene. These can be estimated for all cell types in one go on the aggregated SCE, and subsequently provided to

Regarding the interpretation, how much of an issue do you think it is? You'll know better, but it seems like it may be more clear-cut when not using the QL framework and so, if weights are used, the

I understand that the suggestion may be niche, and if you think it does not fit within
The proposed changes to the code seem fine enough... provided the weights are sensible. And I do worry about whether you have sensible weights to pass in.

I was staring at edgeR's C++ code to jog my memory, and I think that the weights are being applied to the mean-variance function for each observation. So: if you were to say that an observation has a user-supplied weight of 2, you're claiming it's half as variable as it should be for its mean. This has the (superficially desired) effect of increasing its influence on the coefficient estimates, but it's also a very specific meaning, and I don't think you can just throw in any value and expect type I error control to be maintained.

On that note: IIRC, as soon as you have a non-unity weight, you're already halfway into QL territory, as the QL methods also involve scaling the variance function. Specifically, you can't just divide the variance by some arbitrary user-supplied weight and still consider the observations to follow a negative binomial distribution - these non-normal models don't scale up and down like that, and that's why we have QL. So the choice between QL/non-QL shouldn't be the issue here.

Having said all of that, I barely remember any of this stuff, and my grasp of it was fairly tenuous to begin with, so it might just all be fine. Perhaps @gksmyth or @yunshun could provide their thoughts. I remember some discussion about weighting a few years ago, but I can't remember what the conclusion was.
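To make the "weight of 2 means half as variable" reading concrete, here is a minimal Python sketch. It is not edgeR's code. It assumes, as the comment above tentatively suggests, that a weight simply divides the negative binomial variance function, and it restricts itself to an intercept-only model with equal library sizes and a shared dispersion, where the weighted score equation reduces to a weighted average. The function names `nb_variance` and `weighted_common_mean` are hypothetical.

```python
def nb_variance(mu, dispersion, weight=1.0):
    """NB variance function mu + dispersion * mu^2, divided by the weight.

    Assumption: the weight scales the variance function directly, so a
    weight of 2 claims the observation is half as variable for its mean.
    """
    return (mu + dispersion * mu ** 2) / weight


def weighted_common_mean(counts, weights):
    """Weighted estimate of a single shared mean (intercept-only model).

    With one common mean, every observation has the same variance function,
    so the weighted score equation sum_i w_i * (y_i - mu) = 0 is solved by
    the weighted average of the counts.
    """
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    return sum(w * y for w, y in zip(weights, counts)) / sum(weights)


counts = [10, 10, 40]
unweighted = weighted_common_mean(counts, [1, 1, 1])
upweighted = weighted_common_mean(counts, [1, 1, 2])  # last sample counts double

print("estimate with unit weights:", unweighted)
print("estimate with weight 2 on the last sample:", upweighted)
print("implied variance at mu=20:", nb_variance(20, 0.25),
      "vs. with weight 2:", nb_variance(20, 0.25, weight=2))
```

The upweighted sample pulls the estimate towards its own count, which is the "increased influence" effect. Whether the halved variance is a defensible claim about the data is the separate question raised in the comment.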
Please be careful. Weights used for edgeR glm fitting and weights input to NB dispersion estimation are not the same thing. For glms and the QL pipeline, weights have a clear meaning in terms of the quasi-dispersions. For non-QL functions like estimateDisp(), weights only have a meaning in terms of interpreting observations as averages, which I suspect will not have the flexibility that @koenvandenberge is after. Inverse probability weights are designed to correct biases in the linear model estimates and are not related to variances or dispersions. As far as I can see, inverse probability weights would not work correctly with dispersion estimation functions or with differential expression tests.
Hi,
In some cases, one might want to add weights to the dispersion estimation and fitting of the edgeR model when using the pseudobulkDGE function. In my case this would be sample-level weights (much like voom with sample quality weights), but I imagine that observation-level weights may also be of interest, so I have allowed a matrix of weights that is of the same dimension as the expression matrix. This seems to be possible using only a minor adjustment.
Let me know what you think.
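The weight layout described here (sample-level weights carried as a matrix with the same dimensions as the expression matrix) can be sketched in plain Python. This is an illustration only, not the code in this pull request. It assumes genes are rows and samples are columns, and that any per-label analysis subsets samples by column, so the weight matrix has to be subset with the same column indices. The helper names `expand_sample_weights` and `subset_columns` are hypothetical.

```python
def expand_sample_weights(sample_weights, n_genes):
    """Repeat one weight per sample across all genes.

    Returns a genes-by-samples matrix (list of rows) so that it has the
    same shape as the expression matrix.
    """
    return [list(sample_weights) for _ in range(n_genes)]


def subset_columns(matrix, keep):
    """Keep only the given column (sample) indices in every row."""
    return [[row[j] for j in keep] for row in matrix]


# Toy expression matrix: 2 genes (rows) x 4 pseudobulk samples (columns).
counts = [
    [5, 0, 12, 7],
    [3, 9, 1, 4],
]
sample_weights = [1.0, 0.5, 2.0, 1.5]

weights = expand_sample_weights(sample_weights, n_genes=len(counts))

# Subsetting samples (e.g. those belonging to one label) must be applied
# to the counts and to the weights with the same indices.
keep = [0, 2]
sub_counts = subset_columns(counts, keep)
sub_weights = subset_columns(weights, keep)

print(sub_counts)
print(sub_weights)
```

Observation-level weights would use the same matrix shape, just without the rows being identical.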