ncbi_egapx: form seems to work and generates what looks like valid yaml #29

fubar2 · 2024-09-05T03:57:27Z

Needs tests, but I cannot even run it here, let alone test it, because of the resource requirements baked into the Docker image: I have no machine with 120GB or 31 cores.

richard-burhans

Looks good!

richard-burhans · 2024-09-09T22:07:31Z

Attention: deployment failure!

https://github.com/richard-burhans/galaxytools/actions/runs/10781909881

nekrut · 2024-10-08T19:39:47Z

@fubar2 or @richard-burhans: we need to add an option that would allow EGAPx to take a Protein FASTA file as an annotation source (see https://github.com/ncbi/egapx?tab=readme-ov-file#input-data-format).

fubar2 · 2024-10-09T00:06:39Z

> would allow EGAPx to take a file with Protein FASTA

@nekrut: If that protein FASTA is independent of NCBI, then it may make sense to use it in the HMM.
It is easy to add another form input, but you probably need to be certain that the supplied protein annotation is statistically independent of the NCBI protein annotation for it to contribute useful, unbiased information.

If an NCBI protein FASTA exists for a taxon, it is, AFAIK, an output of the internal NCBI pipeline that has since become EGAPx. Predicting proteins with EGAPx while relying on a FASTA that was predicted by EGAPx's predecessor may therefore yield biased, hard-to-interpret results, because some of the inputs used for prediction would not be statistically independent.

I'm not an expert, but this is a good question for one of the NCBI authors.

marco91sol · 2024-10-09T13:19:12Z

I believe we can implement this and run some tests to compare the results with and without the additional protein FASTA and HMM files. EGAPx treats these as optional parameters:

However, the challenge is that the taxid-protein set is not available for all taxa (I've already contacted the CGR group for more information). For instance, I encountered this issue while testing on sharks, where the taxid wasn't connected to any protein set.
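Purely as an illustration of how the Galaxy form could expose the protein FASTA and HMM as optional inputs, here is a minimal Python sketch that writes an EGAPx-style input YAML and emits those entries only when a file is supplied, so runs with and without them can be compared. The function names are hypothetical, and the YAML key names (`genome`, `taxid`, `reads`, and especially `proteins` and `hmm`) are assumptions that should be checked against the EGAPx README's input data format section before use.

```python
# Hypothetical sketch: build an EGAPx-style input YAML with optional
# protein FASTA / HMM entries. All key names are ASSUMPTIONS; verify them
# against the EGAPx README before relying on this.
from typing import List, Optional


def yaml_quote(value: str) -> str:
    """Return a YAML single-quoted scalar (a literal ' is written as '')."""
    return "'" + value.replace("'", "''") + "'"


def build_egapx_yaml(
    genome: str,
    taxid: int,
    reads: List[str],
    proteins: Optional[str] = None,  # assumed key name: 'proteins'
    hmm: Optional[str] = None,       # assumed key name: 'hmm'
) -> str:
    """Assemble the YAML text line by line."""
    lines = [
        f"genome: {yaml_quote(genome)}",
        f"taxid: {taxid}",
    ]
    if reads:
        lines.append("reads:")
        lines += [f"  - {yaml_quote(r)}" for r in reads]
    else:
        lines.append("reads: []")
    # Emit the optional entries only when the user actually supplied a file.
    if proteins is not None:
        lines.append(f"proteins: {yaml_quote(proteins)}")
    if hmm is not None:
        lines.append(f"hmm: {yaml_quote(hmm)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_egapx_yaml("genome.fa", 3702, ["reads_1.fq"], proteins="proteins.fa"))
```

Leaving the optional keys out entirely, rather than writing empty values, keeps the generated file identical to the current form's output when no protein FASTA is given, which also covers taxa (such as the shark example) where no protein set is available.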

form seems to work and generates what looks like valid yaml

8963cfa


fubar2 changed the title ~~form seems to work and generates what looks like valid yaml~~ ncbi_egapx: form seems to work and generates what looks like valid yaml Sep 5, 2024

fubar2 added 3 commits September 5, 2024 15:12

update some wording

ba749a8

replace double quotes with singles in command lines

336da21

more tweaks..

24dcdc2

richard-burhans approved these changes Sep 9, 2024


richard-burhans marked this pull request as ready for review September 9, 2024 21:57

richard-burhans merged commit 050f870 into richard-burhans:main Sep 9, 2024
11 checks passed
