Input-Output Dimension #2
Sorry for the delayed reply. The Hi-C data were downloaded from GSE63525, and only .tar.gz files were available when we downloaded them. I checked the raw data from GSE63525 (e.g. GSE63525_GM12878_primary_intrachromosomal_contact_matrices.tar.gz). The largest bin start coordinate in the three-column, tab-separated file (chr12_10kb.RAWobserved) is 133840000, i.e. bin index 13384 at 10 kb resolution. However, there are 13398 values in the bias file (chr12_10kb.KRnorm/SQRTVCnorm/VCnorm), so the processed matrix is expanded to 13398 × 13398 to match the bias file. Values in the bias file are NaN when the row index is larger than 13384, so the corresponding values in the Hi-C matrix are all zeros.
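The construction described above can be sketched as follows. This is a minimal illustration, not this repository's preprocessing code: it assumes the RAWobserved records have already been parsed into `(start_i, start_j, raw_count)` tuples at 10 kb, that only one triangle of the symmetric matrix is stored, and that KR normalization divides each raw count by the product of the two bias entries. The function name `build_kr_matrix` is hypothetical.

```python
import numpy as np

def build_kr_matrix(records, bias, res=10_000):
    """Hypothetical sketch: dense KR-normalized matrix sized by the bias vector.

    records: iterable of (start_i, start_j, raw_count) from a *.RAWobserved file
    bias:    1-D sequence of KR values, one per bin (NaN where undefined)
    """
    bias = np.asarray(bias, dtype=float)
    n = len(bias)                        # matrix size is taken from the bias file
    mat = np.zeros((n, n), dtype=float)
    for start_i, start_j, count in records:
        i, j = start_i // res, start_j // res   # coordinate -> 0-based bin index
        value = count / (bias[i] * bias[j])     # assumed KR normalization
        if np.isnan(value):
            continue                     # NaN bias: leave the entry as zero
        mat[i, j] = value
        mat[j, i] = value                # assumed: only one triangle is stored
    return mat
```

Because `n` comes from the bias vector rather than from the largest observed coordinate, any trailing NaN bins become all-zero rows and columns, which matches the behavior described above.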
Yes, I realized that the dimensions are taken from the KRnorm vector. So I guess the final rows and columns are the "extra/trimmable" NaN values; would you agree with that? Thanks again for the reply :)
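Trimming those trailing rows and columns could look like the sketch below. The helper `trim_trailing_nan_bins` is hypothetical; it assumes the matrix and the bias vector have the same length and removes only the NaN tail, so NaN bins inside the chromosome (for example around the centromere) stay in place and bin indices keep matching genome coordinates.

```python
import numpy as np

def trim_trailing_nan_bins(mat, bias):
    """Hypothetical helper: drop trailing rows/columns whose bias value is NaN.

    Interior NaN bins are kept so that row i still corresponds to bin i.
    """
    bias = np.asarray(bias, dtype=float)
    valid = np.flatnonzero(~np.isnan(bias))        # indices of defined bins
    n = int(valid[-1]) + 1 if valid.size else 0    # last defined bin + 1
    return mat[:n, :n]
```

Trimming by the NaN tail and cropping to the chromosome length can differ by a bin or so when the last bins have no coverage, so it is worth deciding which convention downstream code expects.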
Yes, I agree with you. According to the description in the README file (GSE63525_GM12878_primary_README.rtf), genome locations are converted to line numbers in the bias vector without any shift at the beginning. So I think the final rows and columns can be omitted.
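Under that no-shift convention, a bin start coordinate maps directly to a position in the bias vector. A minimal sketch with hypothetical function names, assuming 10 kb bins and a bias file with one value per line:

```python
def coord_to_bin(pos, res=10_000):
    """0-based bin index for a genome coordinate (no offset at the start)."""
    return pos // res

def coord_to_bias_line(pos, res=10_000):
    """1-based line number in the bias file for the same bin."""
    return pos // res + 1

# Largest observed chr12 coordinate from the thread: 133840000
last_bin = coord_to_bin(133_840_000)         # 13384
last_line = coord_to_bias_line(133_840_000)  # 13385
```

With this mapping, the bias-file lines after line 13385 (up to 13398) lie beyond the last observed bin, which is why they can be dropped.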
Hello,
I tried to predict a high-resolution matrix using the processed file that you shared.
Here is what I did: I placed the above file into the specified directory and then, choosing the 1/16 model parameters, ran the following command:
python data_predict.py -lr 40kb -ckpt save/deephic_raw_16.pth -c GM12878
The chromosome I am interested in is 12. The size of this chromosome is 133851895 bases; so when it is binned at 10kb, one should have 13,386 bins. However, the predicted chromosome 12 matrix has dimensions of 13,398 x 13,398. When I checked the input file, I've seen that 'sizes' key in the dictionary holds this same value of 13398 for chromosome 12. That discrepancy occurs in other chromosomes too.
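The expected bin count in the question is a ceiling division of the chromosome length by the resolution. The sketch below reproduces that arithmetic and crops a square matrix to it; `expected_bins` and `crop_to_chrom` are hypothetical names, and cropping assumes the padded bins sit at the end of the matrix, as described earlier in the thread.

```python
import numpy as np

def expected_bins(chrom_size, res=10_000):
    """Number of res-sized bins needed to cover chrom_size bases (ceiling division)."""
    return -(-chrom_size // res)

def crop_to_chrom(mat, chrom_size, res=10_000):
    """Hypothetical helper: crop a square matrix to the chromosome's bin count."""
    n = expected_bins(chrom_size, res)
    return mat[:n, :n]

chr12_size = 133_851_895           # chr12 length from the question
n = expected_bins(chr12_size)      # 13386
extra = 13_398 - n                 # 12 padded bins in the 13,398-bin matrix

# Small demonstration: 25 kb at 10 kb resolution needs 3 bins.
small = crop_to_chrom(np.zeros((5, 5)), 25_000)   # shape (3, 3)
```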
So the question is:
How are these shapes/sizes calculated?
Thanks in advance!