GWAS_parsing.py - AttributeError: 'DataFrame' object has no attribute 'sample_size' #182

Open
kateelliott opened this issue Sep 20, 2023 · 0 comments


Hi,

I am trying to harmonise my data as instructed in the tutorial. I am following the steps one by one, running the test data first and then my own data. I'm having trouble with gwas_parsing.py: it fails with an error saying that the data frame has no sample_size attribute. However, the test data input doesn't have a sample_size column either, and that run completes without problems. One major difference is that my data is already on build 38 (hg38), so I'm not carrying out a liftover. Could this be making a difference? Please see all of the commands I have run and the error messages below.

Any help resolving this would be very gratefully received!

Kate

(base) [vwy332@rescomp1 TWAS]$ python summary-gwas-imputation-master/src/gwas_parsing.py \
-gwas_file data/gwas/cad.add.160614.website.txt.gz \
-liftover data/liftover/hg19ToHg38.over.chain.gz \
-snp_reference_metadata data/reference_panel_1000G/variant_metadata.txt.gz METADATA \
-output_column_map markername variant_id \
-output_column_map noneffect_allele non_effect_allele \
-output_column_map effect_allele effect_allele \
-output_column_map beta effect_size \
-output_column_map p_dgc pvalue \
-output_column_map chr chromosome \
--chromosome_format \
-output_column_map bp_hg19 position \
-output_column_map effect_allele_freq frequency \
--insert_value sample_size 184305 --insert_value n_cases 60801 \
-output_order variant_id panel_variant_id chromosome position effect_allele non_effect_allele frequency pvalue zscore effect_size standard_error sample_size n_cases \
-output output/harmonized_gwas/CARDIoGRAM_C4D_CAD_ADDITIVE.txt.gz
INFO - Parsing input GWAS
INFO - loaded 9455778 variants
INFO - Performing liftover
INFO - 9455778 variants after liftover
INFO - Creating index to attach reference ids
INFO - Acquiring reference metadata
INFO - alligning alleles
INFO - 7919441 variants after restricting to reference variants
INFO - Ensuring variant uniqueness
INFO - 7919439 variants after ensuring uniqueness
INFO - Checking for missing frequency entries
INFO - Saving...
INFO - Finished converting GWAS in 577.7747920179972 seconds

(base) [vwy332@rescomp1 TWAS]$ zcat data/gwas/cad.add.160614.website.txt.gz | head
markername chr bp_hg19 effect_allele noneffect_allele effect_allele_freq median_info model beta se_dgc p_dgc het_pvalue n_studies
rs143225517 1 751756 C T .158264 .92 FIXED .013006 .017324 .4528019 .303481 35
rs3094315 1 752566 A G .763018 1 FIXED -.005243 .0157652 .7394597 .146867 36
rs3131972 1 752721 G A .740969 .96034 FIXED -.003032 .0156381 .8462652 .340843 36
rs3131971 1 752894 C T .744287 .793 FIXED .00464 .0162377 .7750657 .035821 37
rs61770173 1 753405 A C .775368 .91694 FIXED -.006291 .016708 .7065265 .377485 36
rs2073814 1 753474 G C .716742 .8848 FIXED .000407 .0157456 .9793782 .472543 35
rs2073813 1 753541 A G .194804 .91034 FIXED .005802 .0167808 .7295293 .36886 35
rs3131969 1 754182 G A .760434 .92385 FIXED -.006522 .0165571 .693647 .321205 34
rs3131968 1 754192 G A .759886 .920915 FIXED -.006791 .0165668 .6818679 .306346 34

(base) [vwy332@rescomp1 TWAS]$ python summary-gwas-imputation-master/src/gwas_parsing.py \
-gwas_file ../twas_inputs/PSO_meta_UKB_Stuart_meta_FinnGen_twas_input.txt.gz \
-snp_reference_metadata data/reference_panel_1000G/variant_metadata.txt.gz METADATA \
-output_column_map variant_id variant_id \
-output_column_map non_effect_allele non_effect_allele \
-output_column_map effect_allele effect_allele \
-output_column_map effect_size effect_size \
-output_column_map pvalue pvalue \
-output_column_map chromosome chromosome \
--chromosome_format \
-output_column_map position position \
-output_column_map frequency frequency \
--insert_value sample_size 550000 --insert_value n_cases 45000 \
-output_order variant_id chromosome position effect_allele non_effect_allele effect_size pvalue \
-output ../twas_inputs/PSO_meta_UKB_Stuart_meta_FinnGen_harmonized.txt.gz
INFO - Parsing input GWAS
WARNING - Encountered GWAS pvalues equal to zero. This might be caused by numerical resolution. Please consider using another scheme such as -beta- and -se- columns, or checking your input gwas for zeros.
WARNING - Applying thresholding to divergent zscores. You can disable this behavior by using '--input_pvalue_fix 0' in the command line
WARNING - Using 38.467406 to fill in divergent zscores
INFO - loaded 7398369 variants
INFO - Creating index to attach reference ids
INFO - Acquiring reference metadata
INFO - alligning alleles
INFO - 82891 variants after restricting to reference variants
INFO - Ensuring variant uniqueness
INFO - 82891 variants after ensuring uniqueness
INFO - Checking for missing frequency entries
Traceback (most recent call last):
  File "/gpfs3/well/jknight-kate/SNP/Janssen_GWAS/PSO/TWAS/summary-gwas-imputation-master/src/gwas_parsing.py", line 311, in <module>
    run(args)
  File "/gpfs3/well/jknight-kate/SNP/Janssen_GWAS/PSO/TWAS/summary-gwas-imputation-master/src/gwas_parsing.py", line 283, in run
    d = clean_up(d)
  File "/gpfs3/well/jknight-kate/SNP/Janssen_GWAS/PSO/TWAS/summary-gwas-imputation-master/src/gwas_parsing.py", line 243, in clean_up
    d = d.assign(sample_size=[int(x) if not math.isnan(x) else "NA" for x in d.sample_size])
  File "/apps/eb/2020b/skylake/software/Anaconda3/2022.05/lib/python3.9/site-packages/pandas/core/generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'sample_size'

(base) [vwy332@rescomp1 TWAS]$ python summary-gwas-imputation-master/src/gwas_parsing.py \
-gwas_file ../twas_inputs/PSO_meta_UKB_Stuart_meta_FinnGen_twas_input.txt.gz \
-snp_reference_metadata data/reference_panel_1000G/variant_metadata.txt.gz METADATA \
-output_column_map variant_id variant_id \
-output_column_map non_effect_allele non_effect_allele \
-output_column_map effect_allele effect_allele \
-output_column_map effect_size effect_size \
-output_column_map pvalue pvalue \
-output_column_map chromosome chromosome \
--chromosome_format \
-output_column_map position position \
-output_column_map frequency frequency \
-output_order variant_id chromosome position effect_allele non_effect_allele effect_size pvalue \
-output ../twas_inputs/PSO_meta_UKB_Stuart_meta_FinnGen_harmonized.txt.gz
INFO - Parsing input GWAS
WARNING - Encountered GWAS pvalues equal to zero. This might be caused by numerical resolution. Please consider using another scheme such as -beta- and -se- columns, or checking your input gwas for zeros.
WARNING - Applying thresholding to divergent zscores. You can disable this behavior by using '--input_pvalue_fix 0' in the command line
WARNING - Using 38.467406 to fill in divergent zscores
INFO - loaded 7398369 variants
INFO - Creating index to attach reference ids
INFO - Acquiring reference metadata
INFO - alligning alleles
INFO - 82891 variants after restricting to reference variants
INFO - Ensuring variant uniqueness
INFO - 82891 variants after ensuring uniqueness
INFO - Checking for missing frequency entries
Traceback (most recent call last):
  File "/gpfs3/well/jknight-kate/SNP/Janssen_GWAS/PSO/TWAS/summary-gwas-imputation-master/src/gwas_parsing.py", line 311, in <module>
    run(args)
  File "/gpfs3/well/jknight-kate/SNP/Janssen_GWAS/PSO/TWAS/summary-gwas-imputation-master/src/gwas_parsing.py", line 283, in run
    d = clean_up(d)
  File "/gpfs3/well/jknight-kate/SNP/Janssen_GWAS/PSO/TWAS/summary-gwas-imputation-master/src/gwas_parsing.py", line 243, in clean_up
    d = d.assign(sample_size=[int(x) if not math.isnan(x) else "NA" for x in d.sample_size])
  File "/apps/eb/2020b/skylake/software/Anaconda3/2022.05/lib/python3.9/site-packages/pandas/core/generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'sample_size'

(base) [vwy332@rescomp1 TWAS]$ zcat ../twas_inputs/PSO_meta_UKB_Stuart_meta_FinnGen_twas_input.txt.gz | head
variant_id chromosome position effect_allele non_effect_allele effect_size pvalue
rs4951859 1 729679 C G -0.00982308894876 0.464321
rs79010578 1 736289 T A 0.00622160561126 0.656529
rs28527770 1 751756 T C 0.00338127703432 0.805308
rs3094315 1 752566 G A -0.00428416393905 0.740094
rs3131972 1 752721 A G -0.0076642958235 0.551712
rs2073813 1 753541 G A 0.00709278661322 0.595115
rs3131969 1 754182 A G -0.00701957970291 0.597731
rs3131968 1 754192 A G -0.0071887774542 0.588993
rs3131967 1 754334 T C -0.00723813219138 0.586371
(base) [vwy332@rescomp1 TWAS]$
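
For reference, the failing expression from line 243 of the traceback can be reproduced outside the script. The sketch below is only an illustration: the function names `clean_up_sample_size` and `clean_up_sample_size_guarded` are hypothetical and not part of gwas_parsing.py, and the small data frame stands in for the harmonized table. It shows that the expression raises the same AttributeError whenever the data frame reaching it has no sample_size column, and works when the column is present. One visible difference between the runs above is that the test command lists sample_size and n_cases in -output_order while the failing commands do not; whether the script drops columns that are not listed there before clean_up runs is an assumption that has not been checked against the source.

```python
import math

import pandas as pd


def clean_up_sample_size(d):
    # Same expression as line 243 in the traceback: it reads d.sample_size,
    # so it requires a "sample_size" column to exist.
    return d.assign(
        sample_size=[int(x) if not math.isnan(x) else "NA" for x in d.sample_size]
    )


def clean_up_sample_size_guarded(d):
    # Hypothetical guard (not in the original script): skip the conversion
    # when the column is absent instead of raising.
    if "sample_size" not in d.columns:
        return d
    return clean_up_sample_size(d)


# Stand-in for a harmonized table whose columns do not include sample_size.
harmonized = pd.DataFrame(
    {"variant_id": ["rs4951859", "rs79010578"], "pvalue": [0.464321, 0.656529]}
)

try:
    clean_up_sample_size(harmonized)
    error_message = None
except AttributeError as e:
    error_message = str(e)

# With the column present (as floats, possibly NaN), the expression works.
with_n = harmonized.assign(sample_size=[550000.0, float("nan")])
cleaned = clean_up_sample_size(with_n)

# The guarded variant leaves a table without the column unchanged.
unchanged = clean_up_sample_size_guarded(harmonized)

print(error_message)
print(cleaned)
```

If the column is indeed being dropped, adding sample_size (and n_cases) to -output_order, as in the test command, might be worth trying; this is a guess based on the comparison above rather than a confirmed fix.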
