Draft spec for pangene search #9

Open
StevenCannon-USDA opened this issue Jul 17, 2023 · 9 comments

@StevenCannon-USDA
Contributor

Please see the draft spec for the pangene search query, which finds ~paralogous/allelic genes (corresponding by homology and synteny):
https://github.com/legumeinfo/website-ui-specs/tree/main/pangenes-search

... and provide feedback. Please respond via this issue.
@sammyjava @That-Thing @maxglycine @jd-campbell @alancleary @adf-ncgr @sdash-github

The pangene sets we have in the Data Store currently are for: Arachis, Cicer, Glycine, Medicago, Phaseolus, Vigna. I've tried to make the spec suitable for use at LegumeInfo, SoyBase, and PeanutBase.

This spec may again come before the mine backend is ready ... but it sounds like it is on the way.

@sammyjava
Contributor

Yeah, the mine 5.1.0.3 graphql-server is ready, and we can test against the dev MiniMine, which is on 5.1.0.3. So nothing holding us back pangene set-wise. The dev MiniMine is at https://mines.dev.lis.ncgr.org/minimine/begin.do

@sammyjava
Contributor

sammyjava commented Jul 17, 2023

FYI, here's what PanGeneSet looks like in the graphql-server branch, just a bucket o' genes and proteins.

<class name="PanGeneSet" extends="Annotatable" is-interface="true">
        <collection name="dataSets" referenced-type="DataSet"/>
        <collection name="genes" referenced-type="Gene" reverse-reference="panGeneSets"/>
        <collection name="proteins" referenced-type="Protein" reverse-reference="panGeneSets"/>
</class>

type PanGeneSet implements Annotatable {
  ## Annotatable
  id: ID!
  identifier: ID!
  ontologyAnnotations: [OntologyAnnotation!]!
  publications: [Publication!]!
  ## PanGeneSet
  dataSets: [DataSet]
  genes: [Gene]
  proteins: [Protein]
}
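
For anyone prototyping the search UI against this shape, here is a minimal Python sketch of the same record as plain dataclasses. Only the field names `identifier`, `genes`, and `proteins` come from the schema above; the class layout, the `gene_identifiers` helper, and the sample identifiers are made up for illustration and are not part of the mine or the graphql-server.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    identifier: str


@dataclass
class Protein:
    identifier: str


@dataclass
class PanGeneSet:
    # Mirrors the fields quoted above; everything else is hypothetical.
    identifier: str
    genes: list[Gene] = field(default_factory=list)
    proteins: list[Protein] = field(default_factory=list)

    def gene_identifiers(self) -> list[str]:
        """Return member gene identifiers, sorted for stable display."""
        return sorted(g.identifier for g in self.genes)


# Placeholder identifiers, not real Data Store records.
example = PanGeneSet(
    identifier="pan00001",
    genes=[Gene("geneB"), Gene("geneA")],
)
print(example.gene_identifiers())
```

Sorting in the helper is just one choice for a stable display order; the spec does not say how members should be ordered.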

@adf-ncgr
Contributor

Thanks @StevenCannon-USDA. I have a few minor (maybe) comments/questions on the initial spec:

  • the results you show seem to be displaying transcript/protein isoform ids; is this intended or should we just focus on the gene ids in what we present (seems cleaner to me)?
  • might we want to provide any additional details about the member genes such as their locations or sizes (e.g. to give at least a crude sense for variability)?
  • is there any implied sorting in how the pangene members are listed?
  • should the accession dropdown support multi-selection (e.g. suppose I want to get allelic comparisons between two favorite lines)? Also note that your first example seems to imply it shouldn't be a dropdown, but a text box matched as "contains".
  • might we want to make explicit when a given accession is absent from a pangene set? e.g. suppose I wanted to know about genes that are missing from my favorite soybean line: would I want to get empty pangene representations for those pangene sets in which the selected accession does not occur, or simply not get them in the returned results?
  • would we want a linkout for the set of genes belonging to a pangene (e.g. pushing them to the GCV multi-alignment view or to an intermine list)?

some of these are probably just stuff to think about for future iterations.
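
To make the "absent accession" question concrete, here is a rough Python sketch of the two behaviors: dropping pangene sets where the selected accession does not occur, or keeping them as empty entries. The data layout (a dict of `(accession, gene_id)` pairs), the function name, and the `keep_empty` flag are all assumptions for illustration, not anything in the spec or the mine.

```python
def filter_by_accession(pangene_sets, accession, keep_empty=False):
    """Restrict each set's genes to one accession.

    pangene_sets: dict mapping set id -> list of (accession, gene_id) pairs.
    keep_empty=True keeps sets where the accession is absent (as empty
    lists), which makes missing genes visible; False drops those sets.
    """
    result = {}
    for set_id, members in pangene_sets.items():
        hits = [gene_id for acc, gene_id in members if acc == accession]
        if hits or keep_empty:
            result[set_id] = hits
    return result


# Placeholder data, not real pangene sets.
sets = {
    "pan1": [("lineA", "gene1"), ("lineB", "gene2")],
    "pan2": [("lineB", "gene3")],
}
dropped = filter_by_accession(sets, "lineA")
kept = filter_by_accession(sets, "lineA", keep_empty=True)
print(dropped)
print(kept)
```

With `keep_empty=True`, `pan2` comes back with an empty gene list, which is the "empty pangene representation" case.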

@maxglycine

May want to add an output option to download query results to the user's computer. A query could return a large number of identifiers and the user may want to save them. Otherwise, the user would have to copy HTML text and paste it somewhere.
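
As a rough idea of what a download could contain, here is a small Python sketch that flattens results into tab-separated text, one gene per row. The input shape, column names, and function name are hypothetical; the real UI would presumably do this in the browser rather than in Python.

```python
import csv
import io


def results_to_tsv(results):
    """Serialize {pangene set id: [gene ids]} as TSV text, one gene per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["pangene_set", "gene"])
    for set_id, gene_ids in results.items():
        for gene_id in gene_ids:
            writer.writerow([set_id, gene_id])
    return buffer.getvalue()


# Placeholder identifiers, not real query results.
tsv_text = results_to_tsv({"pan1": ["gene1", "gene2"]})
print(tsv_text)
```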

@sammyjava
Contributor

sammyjava commented Jul 18, 2023

"Genes in this pangene set" would be best implemented by adding "size" to the PanGeneSet object in the mines and populating it in a post-processor, as we do with GeneFamily. That is not currently present in PanGeneSet in 5.1.0.3. Nor are there any other aggregate quantities like we have in GeneFamily 5.1.0.3:

<class name="GeneFamily" extends="Annotatable" is-interface="true" term="">
        <attribute name="description" type="java.lang.String"/>
        <attribute name="version" type="java.lang.String"/>
        <attribute name="size" type="java.lang.Integer"/>
        <reference name="phylotree" referenced-type="Phylotree" reverse-reference="geneFamily"/>
        <collection name="genes" referenced-type="Gene"/>
        <collection name="proteins" referenced-type="Protein"/>
        <collection name="proteinDomains" referenced-type="ProteinDomain" reverse-reference="geneFamilies"/>
        <collection name="dataSets" referenced-type="DataSet"/>
        <collection name="tallies" referenced-type="GeneFamilyTally" reverse-reference="geneFamily"/>
</class>

If this is a Big Deal, stop me from building 5.1.0.3 mines. GlycineMine 5.1.0.3 is almost built, took two weeks.
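
For reference, the logic of such a "size" step is small. This is only a Python sketch of the idea, assuming size means the number of distinct member genes and that records are plain dicts; the actual mine post-processor is a separate piece of code and may count differently.

```python
def populate_sizes(pangene_sets):
    """Add a 'size' entry to each record: the number of distinct member genes."""
    for record in pangene_sets:
        record["size"] = len(set(record["genes"]))
    return pangene_sets


# Placeholder record with a duplicated gene id to show the distinct count.
records = [{"identifier": "pan1", "genes": ["gene1", "gene2", "gene2"]}]
populate_sizes(records)
print(records[0]["size"])
```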

@sammyjava
Contributor

> May want to add an output option to download query results to the user's computer. A query could return a large number of identifiers and the user may want to save them. Otherwise, the user would have to copy HTML text and paste it somewhere.

This sounds like an across-the-board option that would be implemented for all results output, like pagination. Thoughts, @alancleary? After all, we all remember that "Every page should have a download button!" :)

@StevenCannon-USDA
Contributor Author

@sammyjava - "Genes in this pangene set" - I would say "not a big deal" (not a high priority in the first implementation).

@sammyjava
Contributor

@StevenCannon-USDA I'm a bit confused about the scope of this search. Are you saying that we'll have a list of pangene sets, each with its corresponding genes listed below it? For example, what happens if the only search element is "Glycine", all else left blank? A gigantic list of all Glycine pangene-sets with their genes? (Which is fine, if that's what you want.)

@sammyjava
Contributor

And, if so, are you specifying that pagination be on a pangene-set-to-pangene-set basis? Each page displays a single pangene set? (That's just setting the page size to 1, which is easy. The list of genes within a pangene set would be part of that pangene set record's display.) Just want some detail on pagination expectations when we've got results which are a list of lists.
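
To illustrate what I mean by paginating a list of lists, here is a minimal Python sketch where each page holds whole pangene set records, with a page size of 1 by default. The function names and record layout are hypothetical, just to pin down the question.

```python
def paginate(pangene_sets, page, page_size=1):
    """Return the pangene-set records for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return pangene_sets[start:start + page_size]


def page_count(pangene_sets, page_size=1):
    """Number of pages needed, rounding up."""
    return -(-len(pangene_sets) // page_size)


# Placeholder records; each carries its own gene list.
all_sets = [
    {"identifier": "pan1", "genes": ["gene1", "gene2"]},
    {"identifier": "pan2", "genes": ["gene3"]},
    {"identifier": "pan3", "genes": []},
]
second_page = paginate(all_sets, page=2)
print(second_page)
```

The gene list travels with its record, so the number of genes in a set never affects the page boundaries.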

@jd-campbell jd-campbell added the enhancement New feature or request label Aug 28, 2023