SimkaMin output file symmetry #16

Open
hjruscheweyh opened this issue Aug 30, 2020 · 2 comments

Comments

@hjruscheweyh

Dear SimkaMin Dev,

I recently stumbled upon your SimkaMin tool and tried to use it to compare my 4000 datasets against each other to get information on the similarity of these samples.

I found something odd in the output matrices: they don't seem to be symmetric. Whereas the upper triangular part contains mostly values between 0.0 and 1, the lower triangular part contains mostly, but not exclusively, zeros. I would understand if the lower triangular part were empty, but a non-symmetric output is strange.

In fact, there always seems to be a sub-part that is symmetric, but most of the matrix is not.

I attached a screenshot of parts of the matrix.

Do you know what to do with this information? Should I only use the column-based distances?

Screenshot 2020-07-30 at 20 07 05

Best and thanks,
Hans

@hjruscheweyh
Author

Adding @qclayssen as he will evaluate the matrix

@clemaitre
Collaborator

Dear Hans,

Thank you for pointing out this behavior.

Since the last release, SimkaMin is supposed to output fully symmetric matrices, with the same values in the upper and lower triangular parts. So this is clearly a bug.
After investigation, it turns out to happen when more than 100 datasets are compared. The distances are computed in "blocks" of 100 datasets, and the problem lies in merging the different parts of the full matrix: during the merge, 100x100 blocks of zeros are written to the lower triangular part instead of the values from the corresponding upper triangular part.
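
To make the description concrete, here is a toy Python sketch of a block-wise merge. It is only an illustration under assumptions: it is not SimkaMin's actual code, the function `merge_blocks` and its `mirror` flag are hypothetical, and the block size is 2 rather than 100 to keep the example small. With `mirror=False` the lower blocks stay at zero, which matches the reported symptom; with `mirror=True` each off-diagonal block is also copied, transposed, into the lower triangular part.

```python
def merge_blocks(blocks, n, block_size, mirror=True):
    """Assemble an n x n matrix from per-block results (hypothetical helper).

    `blocks` maps (block_row, block_col), with block_row <= block_col,
    to a 2D list of distances. With mirror=False the lower blocks are
    left as zeros, reproducing the symptom described in this issue.
    """
    full = [[0.0] * n for _ in range(n)]
    for (bi, bj), block in blocks.items():
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                i = bi * block_size + r
                j = bj * block_size + c
                full[i][j] = value
                if mirror and bi != bj:
                    # Copy the value into the corresponding lower block.
                    full[j][i] = value
    return full


# Toy input: 4 datasets in blocks of 2 (made-up values).
blocks = {
    (0, 0): [[0.0, 0.1], [0.1, 0.0]],
    (0, 1): [[0.2, 0.3], [0.4, 0.5]],
    (1, 1): [[0.0, 0.6], [0.6, 0.0]],
}
buggy = merge_blocks(blocks, n=4, block_size=2, mirror=False)
fixed = merge_blocks(blocks, n=4, block_size=2, mirror=True)
```

In this toy case the diagonal blocks are symmetric in both results, which would be consistent with the observation that a sub-part of the matrix is always symmetric.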

We will try to fix this as soon as possible.
In the meantime, you can safely use the values in the upper triangular part of the matrix, which are correct.
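
If a full symmetric matrix is needed before a fix is released, the lower triangular part can be rebuilt from the upper one. A minimal pure-Python sketch, assuming the matrix has already been parsed into a square list of lists (the helper name `symmetrize_from_upper` is hypothetical and not part of SimkaMin; file parsing is left out because the output layout is not described in this thread):

```python
def symmetrize_from_upper(matrix):
    """Return a symmetric copy of `matrix` built from its upper triangle.

    Values on and above the diagonal are kept; every value below the
    diagonal is overwritten with its mirror from above the diagonal.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("expected a square matrix")
    result = [list(row) for row in matrix]
    for i in range(n):
        for j in range(i + 1, n):
            result[j][i] = matrix[i][j]
    return result


# Made-up 3 x 3 example with a zeroed lower triangular part.
broken = [
    [0.0, 0.2, 0.5],
    [0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0],
]
repaired = symmetrize_from_upper(broken)
```

With NumPy, the same result for a square array `m` can be obtained with `np.triu(m) + np.triu(m, 1).T`.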

Please let me know if this is not clear enough.

Best,
Claire
