Dev #61

lizhencmb · 2021-04-17T13:13:53Z

Hi Arthur,

I am going through the wgd code as a way to learn a bit more Python. It is actually fun :-) I have made some changes so that V2 produces output files similar to those of V1. I've seen that you've put the weighting in the visualization part, but I think it would still make sense to include the weights in the ksd Ks table, so that others can draw the distributions on their own if they want to.

As you can see, I've also added a function to strip the alignment, with a parameter to leave some gaps. I was thinking that codeml could deal with some gaps in its pairwise mode with cleandata=0. However, after some tests, that does not really seem to be the case, so the function is currently only used to remove all the gaps.

Best,
Zhen
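The gap-stripping idea described in the comment above can be sketched roughly as follows. This is a minimal illustration and not the `_strip_aln()` added in this PR: the function name `strip_gap_columns`, its parameters (`max_gap_fraction`, `unit`), and the dict-of-strings alignment representation are all assumptions made for the example.

```python
def strip_gap_columns(aligned_seqs, max_gap_fraction=0.0, unit=1, gap_char="-"):
    """Drop alignment columns that contain too many gaps.

    Hypothetical helper, not the wgd implementation.
    aligned_seqs: dict mapping sequence id -> aligned sequence (equal lengths).
    max_gap_fraction: highest tolerated fraction of gaps in a column;
        0.0 removes every column containing at least one gap.
    unit: number of columns kept or dropped together (3 for codon alignments).
    """
    seqs = list(aligned_seqs.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    keep = []
    for start in range(0, length, unit):
        block = range(start, min(start + unit, length))
        # The gappiest column decides the fate of the whole block.
        worst = max(sum(s[i] == gap_char for s in seqs) / n for i in block)
        if worst <= max_gap_fraction:
            keep.extend(block)
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in aligned_seqs.items()
    }


aln = {"a": "ATG---AAA", "b": "ATGCCCAAA", "c": "ATGCC-AAA"}
stripped = strip_gap_columns(aln, max_gap_fraction=0.0, unit=3)
tolerant = strip_gap_columns(aln, max_gap_fraction=0.7, unit=3)
```

With `max_gap_fraction=0.0` the middle codon is removed from all three sequences; with a tolerance of 0.7 it is kept, which is the "leave some gaps" behaviour the parameter is meant to allow.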


arzwa · 2021-04-19T06:30:31Z

Thanks Zhen, nice to see someone helping out! Concerning the cleandata thing in PAML, see this reported bug. I had changed some output file formats indeed, but I agree it may be better to keep them compatible with earlier versions.

It seems the tests are failing because

- the alignment length has changed due to the gap trimming you introduced, so I think we should just update the tests to test with/without trimming.
- something related to your last commit concerning the diamond output, which I don't see immediately.

So if we update the tests, I can merge this in.

lizhencmb · 2021-04-19T20:43:51Z

Hi Arthur, the failed tests were due to replacing gene ids in the multi-species diamond search and to the alignment trimming. I see that we trim sequence alignments twice (a bit redundant) and the tests only consider the first trimming. I did not change the tests for now, but will modify them a bit later, e.g. maybe for gene tree inference we can tolerate some gaps.

For the commit about diamond, I just added a diamond output file in the output folder (in wgd_dmd by default).

lizhencmb added 6 commits April 13, 2021 17:41

add diamond results in the output

6bf917d

rename and reformat mcl output

634b586

read gene family accordingly

60b08e1

add _strip_aln() to tolerate gaps for codeml using the whole alignment

99e1cd6

add statistics for pairwise alignments in the --pairwise mode

b854765

add statistics for pairwise alignments in the normal mode; reformating csv output with tabs

c6788ef

replace original gene ids correctly in the diamond output

2502a8f

save pairs without Ks in a different file; add weights in the ksd output

6561b02
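As an illustration of what "weights in the ksd output" could mean, here is a minimal sketch of one possible scheme: each pair gets weight 1/k, where k is the number of pairs attached to the same gene-tree node in the same family, so that every duplication node contributes a total weight of 1 to the Ks distribution. Whether this matches the weighting actually used in this PR is an assumption, and the keys `family`, `node`, `pair`, `ks` and `weight` are hypothetical names chosen for the example.

```python
from collections import Counter


def add_node_weights(rows):
    """Return copies of the rows with an extra 'weight' entry.

    Hypothetical helper, not the wgd implementation.
    rows: list of dicts with at least the keys 'family' and 'node'.
    Each pair is weighted by 1 / (number of pairs sharing its family and node).
    """
    counts = Counter((r["family"], r["node"]) for r in rows)
    return [
        dict(r, weight=1.0 / counts[(r["family"], r["node"])])
        for r in rows
    ]


rows = [
    {"family": "GF1", "node": "n1", "pair": "a__b", "ks": 0.1},
    {"family": "GF1", "node": "n2", "pair": "a__c", "ks": 0.9},
    {"family": "GF1", "node": "n2", "pair": "b__c", "ks": 1.0},
]
weighted = add_node_weights(rows)
```

Storing such a column next to each Ks value would let users plot a weighted histogram themselves, which is the motivation given in the first comment.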
