Nan in the resulting files #3

Open
zqsha opened this issue Apr 6, 2020 · 22 comments

zqsha commented Apr 6, 2020

Hi there,

Thanks for developing such an amazing toolbox.
When I run hlmm with a simple linear model rather than a linear mixed model, I get many NaN values in the output file, like this:
SNP n frequency likelihood add add_se add_t add_pval var var_se var_t var_pval av_pval
rs62224609 32057 0.09890195589106904 nan nan nan nan nan nan 0.018897 nan nan nan
rs376238049 28364 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs62224614 31900 0.09943573667711599 nan nan nan nan nan nan 0.018898 nan nan nan
rs7286962 31742 0.0996314031882049 nan nan nan nan nan nan 0.018928 nan nan nan
rs62224618 32256 0.10105096726190477 nan nan nan nan nan nan 0.018657 nan nan nan
rs372511672 27602 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs2844853 27472 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs370772954 30458 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs3949130 31617 0.07600341588385995 nan nan nan nan nan nan 0.021283 nan nan nan
rs200167968 31639 0.07527102626505262 nan nan nan nan nan nan 0.021369 nan nan nan
rs79847867 32256 0.07783048115079365 nan nan nan nan nan nan 0.020848 nan nan nan
rs200058026 31578 0.06846538729495218 nan nan nan nan nan nan 0.022361 nan nan nan
rs200867508 28405 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs199576657 28348 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs202050260 28001 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs7284947 29177 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131517 15518 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131522 17386 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131523 31749 0.30407256921477843 nan nan nan nan nan nan 0.012176 nan nan nan
rs185518626 30875 0.01259919028340081 nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131525 31783 0.30829374193751374 nan nan nan nan nan nan 0.012122 nan nan nan
rs131526 31845 0.04813942534149784 nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131527 31854 0.05019777735920139 nan nan nan nan nan nan 0.025782 nan nan nan

Any suggestions would be appreciated! Thanks.

Best,
Zhiqiang

Owner

AlexTISYoung commented Apr 6, 2020 via email

Owner

AlexTISYoung commented Apr 6, 2020 via email

Author

zqsha commented Apr 6, 2020

Hi Alex,

Thanks for your reply. OK, I see. I am focusing on the linear model rather than the linear mixed model, since the manual says the mixed model is really slow. I tried setting the MAF threshold with --min_maf 0.01, but it still did not work. As a test, I ran only the first 100 SNPs. Here is the log:
zhisha@lux13:~> python /data/clusterfs/lag/users/zhisha/asy_mean_variance/hlmm/hlmm-master/bin/hlmm_chr.py /data/clusterfs/lag/users/zhisha/snp_heritability/genetic_data/ukb_imp_chr22_v4_imagingT1_N32256.bed 0 100 /data/clusterfs/lag/users/zhisha/asy_mean_variance/hlmm/pheno1/asy_resid_pheno1.phen /data/clusterfs/lag/users/zhisha/asy_mean_variance/hlmm/pheno1/hlmm_p1_chr22 --min_maf 0.01
Number of non-missing phenotype observations: 32256
3 parameters in model
/home/zhisha/.local/lib/python2.7/site-packages/pysnptools/snpreader/bed.py:45: FutureWarning: 'count_A1' was not set. For now it will default to 'False', but in the future it will default to 'True'
warnings.warn("'count_A1' was not set. For now it will default to 'False', but in the future it will default to 'True'", FutureWarning)
Number of test loci: 100
Genotypes for 32256 individuals read
32256 individuals in genotype file with no missing phenotype or covariate observations
Fitting Null Model
/home/zhisha/.local/lib/python2.7/site-packages/hlmm/hetlm.py:88: RuntimeWarning: overflow encountered in exp
D_inv = np.exp(-self.V.dot(beta))
/home/zhisha/.local/lib/python2.7/site-packages/hlmm/hetlm.py:68: RuntimeWarning: overflow encountered in exp
L = np.sum(Vbeta) + np.sum(np.square(resid) * np.exp(-Vbeta))
/home/zhisha/.local/lib/python2.7/site-packages/hlmm/hetlm.py:111: RuntimeWarning: overflow encountered in exp
D_inv = np.exp(-self.V.dot(beta))
/home/zhisha/.local/lib/python2.7/site-packages/hlmm/hetlm.py:159: RuntimeWarning: overflow encountered in exp
D_inv=np.exp(-self.V.dot(beta))
Fitting models for specified loci
Fitting AV model for locus 0
Fitting AV model for locus 2
Fitting AV model for locus 3
Fitting AV model for locus 4
Fitting AV model for locus 8
Fitting AV model for locus 9
Fitting AV model for locus 10
............
No error!!

The results are as follows:
SNP n frequency likelihood add add_se add_t add_pval var var_se var_t var_pval av_pval
rs62224609 32057 0.09890195589106904 nan nan nan nan nan nan 0.018897 nan nan nan
rs376238049 28364 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs62224614 31900 0.09943573667711599 nan nan nan nan nan nan 0.018898 nan nan nan
rs7286962 31742 0.0996314031882049 nan nan nan nan nan nan 0.018928 nan nan nan
rs62224618 32256 0.10105096726190477 nan nan nan nan nan nan 0.018657 nan nan nan
rs372511672 27602 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs2844853 27472 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs370772954 30458 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs3949130 31617 0.07600341588385995 nan nan nan nan nan nan 0.021283 nan nan nan
rs200167968 31639 0.07527102626505262 nan nan nan nan nan nan 0.021369 nan nan nan
rs79847867 32256 0.07783048115079365 nan nan nan nan nan nan 0.020848 nan nan nan
rs200058026 31578 0.06846538729495218 nan nan nan nan nan nan 0.022361 nan nan nan
rs200867508 28405 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs199576657 28348 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs202050260 28001 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs7284947 29177 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131517 15518 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131522 17386 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131523 31749 0.30407256921477843 nan nan nan nan nan nan 0.012176 nan nan nan
rs185518626 30875 0.01259919028340081 nan nan nan nan nan nan 0.051084 nan nan nan
rs131525 31783 0.30829374193751374 nan nan nan nan nan nan 0.012122 nan nan nan
rs131526 31845 0.04813942534149784 nan nan nan nan nan nan 0.026285 nan nan nan
rs131527 31854 0.05019777735920139 nan nan nan nan nan nan 0.025782 nan nan nan
rs62223292 19003 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs62223293 19034 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs131528 32219 0.31130699276824236 nan nan nan nan nan nan 0.012009 nan nan nan
rs131529 32228 0.31145277398535437 nan nan nan nan nan nan 0.012005 nan nan nan
rs131530 32224 0.3114138530287984 nan nan nan nan nan nan 0.012006 nan nan nan

Any suggestions?
Best,
Zhiqiang
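
The `overflow encountered in exp` warnings in the log above suggest one way NaN can reach the output: once `np.exp` returns `inf`, later arithmetic such as `0 * inf` yields `nan`, and `nan` propagates through every sum that follows. The following is a minimal sketch of that mechanism only, not hlmm's actual code. The values of `V`, `beta`, and `resid` are made up to trigger the overflow; the likelihood expression mirrors the line quoted in the warning from hetlm.py line 68.

```python
import numpy as np

# Hypothetical values chosen only to trigger overflow; not taken from hlmm.
V = np.ones((3, 1))                    # design matrix of the log-variance model
beta = np.array([-800.0])              # an extreme log-variance parameter
resid = np.array([0.0, 0.01, -0.02])   # residuals, one of them exactly zero

# Silence the RuntimeWarnings so the values themselves are easy to inspect
with np.errstate(over="ignore", invalid="ignore"):
    Vbeta = V.dot(beta)
    D_inv = np.exp(-Vbeta)             # exp(800) exceeds the float64 range
    # Same form as the expression shown in the hetlm.py:68 warning
    L = np.sum(Vbeta) + np.sum(np.square(resid) * D_inv)

print(D_inv)   # every entry is inf
print(L)       # nan, because 0 * inf is nan and nan propagates through the sum
```

If this is what happens while fitting the null model, every per-SNP statistic derived from that fit would be expected to be NaN as well, regardless of the SNP's frequency or call rate.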

Author

zqsha commented Apr 7, 2020

Hi Alex,
Any thoughts about the NaN values mentioned above? Looking forward to your reply.

Best,
Zhiqiang

Owner

AlexTISYoung commented Apr 7, 2020 via email

Author

zqsha commented Apr 7, 2020

Hi Alex,
I just used cognitive function as an example. Does it have to do with the phenotype? Thanks.

Owner

AlexTISYoung commented Apr 8, 2020 via email

Author

zqsha commented Apr 8, 2020

I just randomly picked an item, fluid intelligence. So you mean that some traits may fit well with HLMM, but not all of them? OK, I will try BMI or height.

Owner

AlexTISYoung commented Apr 8, 2020 via email

Author

zqsha commented Apr 8, 2020

OK. Agreed! I will try to use BMI as an example. If it happens again, I will let you know.


trochet commented Apr 8, 2020

Hi Alex and Zhiqiang,

I was also finding that about a third of the SNPs (230,150 out of 670,131) I tested were NaN. My phenotype was a quantitative trait simulated using GCTA with 10 causal loci, of which four had opposite effects in men and women. I wasn't using UK Biobank (my cohort was French-Canadians) and my simulations are being conducted entirely on chromosome 1. There were no covariates (simulated or included in the model).

I hadn't done any SNP QC prior to running my simulations. What I found when I went back and removed SNPs with low minor allele frequency was that most but not all of the problem was solved. Removing SNPs with less than a 99% genotyping rate took care of the remaining NaNs. I haven't played around to see if a lower threshold will work just as well. Maybe Alex can tell us if there's a threshold imposed by the script?

-Holly
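
The two filters Holly describes (minor allele frequency and genotyping rate) can be sketched on a small genotype matrix. This is an illustration under stated assumptions, not part of hlmm: the function name `qc_mask`, the thresholds, and the 0/1/2 genotype coding with `np.nan` for missing calls are all assumptions made for the example.

```python
import numpy as np

def qc_mask(G, min_maf=0.01, min_call_rate=0.99):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    G is an individuals x SNPs array coded 0/1/2, with np.nan for missing
    calls (an assumed coding for this sketch).
    """
    G = np.asarray(G, dtype=float)
    observed = ~np.isnan(G)
    n_obs = observed.sum(axis=0)
    call_rate = n_obs / float(G.shape[0])
    # Allele frequency from observed calls only; all-missing SNPs give nan
    allele_sum = np.where(observed, G, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = allele_sum / (2.0 * n_obs)
        maf = np.minimum(freq, 1.0 - freq)
        # Comparisons with nan are False, so all-missing SNPs are dropped
        return (maf >= min_maf) & (call_rate >= min_call_rate)

# Toy data: 4 individuals, 3 SNPs
G = np.array([
    [0, 1, np.nan],
    [1, 0, 2],
    [2, 0, np.nan],
    [1, 0, 1],
])
mask = qc_mask(G, min_maf=0.2, min_call_rate=0.75)
print(mask)   # SNP 0 passes; SNP 1 fails on MAF; SNP 2 fails on call rate
```

In practice the same filtering is usually done on the genotype files before running the analysis, so that only SNPs passing QC are tested.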

Author

zqsha commented Apr 8, 2020

Hi Holly,

Thank you so much for your suggestions. Alex said I could set the MAF threshold using --min_maf. I tried, but I still get these NaN values and seemingly the same results. Based on the results shown previously,
SNP n frequency likelihood add add_se add_t add_pval var var_se var_t var_pval av_pval
rs62224609 32057 0.09890195589106904 nan nan nan nan nan nan 0.018897 nan nan nan
rs376238049 28364 nan nan NaN NaN NaN NaN NaN NaN NaN NaN nan
rs62224614 31900 0.09943573667711599 nan nan nan nan nan nan 0.018898 nan nan nan
rs7286962 31742 0.0996314031882049 nan nan nan nan nan nan 0.018928 nan nan nan
rs62224618 32256 0.10105096726190477 nan nan nan nan nan nan 0.018657 nan nan nan

We can see that a lot of SNPs with frequency >5% still have NaN. I am still confused. Anyway, thanks for sharing your advice.

Best,
Zhiqiang

Owner

AlexTISYoung commented Apr 8, 2020 via email

Author

zqsha commented Apr 8, 2020

Hi Alex,

Yes, I agree with you about the MAF. But based on the results above, we can see that many SNPs with MAF >5% were still ignored. My trait values look like this:
-0.00571
-0.00842
0.0245
0.027
0.00364
....
If the values are too small, I think I could multiply them by 100 and try again.

Best,
Zhiqiang


trochet commented Apr 8, 2020

Zhiqiang,

You've responded to the part about MAF, but you seem to have missed the discussion of SNP call rates. Have you tried filtering based on that?

-Holly

Author

zqsha commented Apr 8, 2020

Hi Holly,

Thanks for the reminder. My sample size is 32256, so we can calculate the missing rate from n. The SNPs below have a low missing rate (lower than 5%), yet they still have NaN. I am a little confused.

SNP n frequency likelihood add add_se add_t add_pval var var_se var_t var_pval av_pval
rs62224609 32057 0.09890195589106904 nan nan nan nan nan nan 0.018897 nan nan nan
rs62224614 31900 0.09943573667711599 nan nan nan nan nan nan 0.018898 nan nan nan
rs7286962 31742 0.0996314031882049 nan nan nan nan nan nan 0.018928 nan nan nan
rs62224618 32256 0.10105096726190477 nan nan nan nan nan nan 0.018657 nan nan nan

-Zhiqiang
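
The missing rate Zhiqiang refers to follows from the `n` column: it is 1 - n / N, with N = 32256. A short sketch computing it for the four SNPs listed above and comparing the call rate with the 99% genotyping-rate threshold Holly mentioned; the variable names are illustrative only.

```python
# n values copied from the rows quoted above; N is the reported sample size
N = 32256
n_by_snp = {
    "rs62224609": 32057,
    "rs62224614": 31900,
    "rs7286962": 31742,
    "rs62224618": 32256,
}

# Call rate is the fraction of individuals with a non-missing genotype
call_rate = {snp: n / float(N) for snp, n in n_by_snp.items()}

for snp, rate in sorted(call_rate.items()):
    missing = 1.0 - rate
    flag = "below 99%" if rate < 0.99 else "at or above 99%"
    print("%s call rate %.4f, missing %.4f (%s)" % (snp, rate, missing, flag))
```

With these numbers, rs62224614 and rs7286962 fall below a 99% call rate, but rs62224618 has no missing genotypes at all and still returns NaN, so call rate alone does not seem to explain these rows.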


ZXiaopu commented Mar 17, 2022

Hello,

I ran into the same problem. I'm working on simulated data. After comparing my phenotype data with the example file, I found that mine is 10-fold smaller than the example phenotype, i.e. mine is between 1e-3 and 1e-2. Once I multiplied my phenotype data by 10, the NaN values disappeared.

Could anyone give any suggestions for this situation please?

Thanks,
Xiaopu
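
Since rescaling the phenotype made the NaN values disappear here, one common way to rescale is to standardize the phenotype to mean 0 and variance 1 before writing the phenotype file. The sketch below shows that step. It is an assumption about a reasonable workaround, not a fix confirmed by the maintainer in this thread; `standardize` is a hypothetical helper, and the input values are the example trait values Zhiqiang posted earlier.

```python
import numpy as np

def standardize(y):
    """Center to mean 0 and scale to unit variance, ignoring missing values."""
    y = np.asarray(y, dtype=float)
    return (y - np.nanmean(y)) / np.nanstd(y)

# Example trait values quoted earlier in the thread
y = np.array([-0.00571, -0.00842, 0.0245, 0.027, 0.00364])
z = standardize(y)

print(np.var(y))   # very small variance on the original scale
print(np.var(z))   # approximately 1 after standardizing
```

Standardizing changes the units of the estimated effects, so any effect sizes would need to be interpreted on the standardized scale or converted back afterwards.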

Owner

AlexTISYoung commented Mar 17, 2022 via email


ZXiaopu commented Mar 18, 2022

Alex wrote: "Hi Xiaopu, What is the variance of the phenotype file you are inputting? Thanks, Alex."

Hi Alex,

Thanks for your quick reply.

I'm using residuals of adjusted DNA methylation data, so they contain both positive and negative values, but they are all small.

Thanks,
Xiaopu

Owner

AlexTISYoung commented Mar 18, 2022 via email


ZXiaopu commented Oct 11, 2022 via email

Owner

AlexTISYoung commented Oct 11, 2022 via email
