Integrate CoL-Data #12

Open
1 of 3 tasks
nleanba opened this issue Jul 24, 2024 · 7 comments · May be fixed by #13

nleanba commented Jul 24, 2024

@nleanba nleanba self-assigned this Jul 24, 2024

nleanba commented Jul 24, 2024

It might be worthwhile to let Synolib also accept URIs for CoL taxa and (our) taxon-concepts/names as search queries. This would allow Taxomplete to just have an IRI as a value and would allow for easy queries for non-binomial names or names with weird characters.

@nleanba nleanba linked a pull request Jul 24, 2024 that will close this issue

nleanba commented Jul 24, 2024

I have (in PR) implemented searching by URI.

I think it might be a good idea to only allow searches by URI, and make Taxomplete feed Synolib a URI directly.

We might add a more generic function, (tn/tc/CoL URI or latin name) → list of tn/tc/CoL URIs, that finds trivial synonyms where the name is the same. Taxomplete would use it to get a starting-point URI, and Synolib would use it to find trivial synonyms.

This would unify all places where we search for names by string literals and would make it easier to integrate CoL data.
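
A minimal sketch of the first step of such a function: deciding what kind of query was passed in. All names here are hypothetical and not taken from Synolib. The taxon-name and CoL prefixes are the ones that appear in the queries in this thread; the taxon-concept prefix is an assumption made by analogy.

```typescript
// Hypothetical sketch; none of these names are part of Synolib.
type Query =
  | { kind: "taxon-name"; uri: string }
  | { kind: "taxon-concept"; uri: string }
  | { kind: "col-taxon"; uri: string }
  | { kind: "latin-name"; name: string };

const TN_PREFIX = "http://taxon-name.plazi.org/id/";
// Assumed by analogy with the taxon-name prefix.
const TC_PREFIX = "http://taxon-concept.plazi.org/id/";
const COL_PREFIX = "https://www.catalogueoflife.org/data/taxon/";

// Classify a search input as one of the three URI kinds,
// falling back to treating it as a latin name.
function classifyQuery(input: string): Query {
  const q = input.trim();
  if (q.startsWith(TN_PREFIX)) return { kind: "taxon-name", uri: q };
  if (q.startsWith(TC_PREFIX)) return { kind: "taxon-concept", uri: q };
  if (q.startsWith(COL_PREFIX)) return { kind: "col-taxon", uri: q };
  return { kind: "latin-name", name: q };
}
```

Each kind would then map to its own lookup, all returning the same list of tn/tc/CoL URIs.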

@retog opinions?

edit: Taxomplete would still need its own code to find URIs, as it is the only place where we want to find partial matches (it's autocomplete, after all).

nleanba commented Jul 24, 2024

UI-wise, I think Taxomplete should show small badges in its autocomplete indicating whether the suggestion is a TN/TC or a CoL taxon (or both).

nleanba commented Jul 24, 2024

In general, there might be some benefit in reconsidering the data structures of synolib.

  • Having the list of synonyms be a flat list of taxon concepts (i.e. taxa with authority) causes problems for treatments associated with a taxon name (i.e. no authority) and for the ordering of results.
  • I think a better structure is a list of taxon names T (each of which may map to one of our taxon names tn or be implied by a CoL taxon), each containing "taxa with authorities" TC (each of which can be associated with an RDF taxon-concept tc, a CoL taxon, or both).
  • This would allow:
    • finding taxon names with no tc taxon-concepts (currently special-cased by Synospecies, not found by Synolib; see also Remove need for queries in SynoSpecies #10)
    • grouping in the timeline (I am imagining one row per taxon name T, expandable into one row per TC and potentially a row containing the treatments associated with the T directly)
    • finding taxa only present in CoL but not in any treatment (so no tn or tc)
    • a restricted search radius, e.g. a no-synonym mode that only looks for trivial synonyms where the latin name matches. This would be useful when a search only checks whether a certain treatment has been updated correctly and does not need the full list of synonyms (for large groups, e.g. T. rex, this is a performance consideration). Another option is to expand beyond e.g. five names (T) only if the user requests it.*

This would be a slight (but clean) divorce between the treatment RDF structure and the Synolib structure, which is necessary to integrate the CoL data.
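
To make the proposed shape concrete, here is a rough sketch of the T/TC structure and of grouping a flat list into it. The interface and field names are illustrative assumptions, not Synolib's actual types.

```typescript
// Hypothetical shapes; field names are illustrative only.
interface AuthorizedTaxon {
  // "TC": a taxon with authority
  authority: string;
  taxonConceptUri?: string; // RDF taxon-concept (tc), if any
  colUri?: string; // CoL taxon, if any
}

interface Name {
  // "T": a taxon name without authority
  latinName: string;
  taxonNameUri?: string; // our taxon name (tn), if any
  authorizedTaxa: AuthorizedTaxon[];
  treatments: string[]; // treatments associated with the name directly
}

type FlatEntry = AuthorizedTaxon & { latinName: string };

// Group a flat list of taxa with authority by latin name,
// keeping the order in which names were first seen.
function groupByName(flat: FlatEntry[]): Name[] {
  const byName = new Map<string, Name>();
  for (const { latinName, ...tc } of flat) {
    let name = byName.get(latinName);
    if (!name) {
      name = { latinName, authorizedTaxa: [], treatments: [] };
      byName.set(latinName, name);
    }
    name.authorizedTaxa.push(tc);
  }
  return Array.from(byName.values());
}
```

A name with an empty `authorizedTaxa` list then naturally represents a taxon name that has no taxon-concepts.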

Related to *: It might make sense to make this list lazily evaluated, where Synolib only fully loads all TCs of a T and further synonyms when requested/awaited by the library user (Synospecies). Internally, Synolib would still collect already found synonyms and not make duplicate queries, but only start expanding them to find more on request.
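
One way the "load only when awaited, never query twice" idea could look, as a sketch under assumptions: `Lazy`, `LazyName` and `fetchAuthorizedTaxa` are hypothetical, and the fetch function is a stand-in for a real SPARQL request.

```typescript
// Hypothetical helper; not part of Synolib.
class Lazy<T> {
  private cached?: Promise<T>;
  private readonly load: () => Promise<T>;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  // Starts loading on the first call; later calls reuse the same promise.
  get(): Promise<T> {
    if (this.cached === undefined) this.cached = this.load();
    return this.cached;
  }
}

interface LazyName {
  latinName: string;
  authorizedTaxa: Lazy<string[]>; // loaded only when requested
}

// Stand-in for a SPARQL request; counts calls so the caching is observable.
let queryCount = 0;
function fetchAuthorizedTaxa(latinName: string): Promise<string[]> {
  queryCount++;
  return Promise.resolve([`${latinName} (placeholder result)`]);
}

function lazyName(latinName: string): LazyName {
  return {
    latinName,
    authorizedTaxa: new Lazy(() => fetchAuthorizedTaxa(latinName)),
  };
}
```

Creating a `LazyName` issues no query; the first `authorizedTaxa.get()` issues one, and every later call returns the same promise.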

nleanba commented Jul 24, 2024

We have to be careful, however, when using CoL taxa to find synonyms:
different tcs (with differing latin names) can be linked to the same CoL taxon. Therefore, we must always check whether two potentially trivial synonyms actually have the same latin name (same T).

example:

SELECT DISTINCT * WHERE {
  ?s ?p <https://www.catalogueoflife.org/data/taxon/8295f6bf-59a3-431a-9e7b-eff343efa154> .
}

(for my own reference:

PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT
  ?tc
  (group_concat(DISTINCT ?auth; separator=" / ") as ?authority)
  (group_concat(DISTINCT ?colid; separator="|") as ?colids)
  (group_concat(DISTINCT ?aug; separator="|") as ?augs)
  (group_concat(DISTINCT ?def; separator="|") as ?defs)
  (group_concat(DISTINCT ?dpr; separator="|") as ?dprs)
  (group_concat(DISTINCT ?cite; separator="|") as ?cites)
WHERE {
  {
    ?tc treat:hasTaxonName <http://taxon-name.plazi.org/id/Animalia/Sadayoshia_miyakei> .
  } UNION {
    ?tc rdfs:seeAlso <https://www.catalogueoflife.org/data/taxon/8295f6bf-59a3-431a-9e7b-eff343efa154> .
  }
  OPTIONAL { ?tc dwc:scientificNameAuthorship ?auth . }
  OPTIONAL { ?tc rdfs:seeAlso ?colid . }
  OPTIONAL { ?aug treat:augmentsTaxonConcept ?tc . }
  OPTIONAL { ?def treat:definesTaxonConcept ?tc . }
  OPTIONAL { ?dpr treat:deprecates ?tc . }
  OPTIONAL { ?cite cito:cites ?tc . }
}
GROUP BY ?tc

)
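
The latin-name check described in this comment could be sketched as follows. `LinkedConcept` and `splitByLatinName` are hypothetical names, and the record shape is an assumption about what a query result would be mapped to.

```typescript
// Hypothetical record: one taxon concept, its latin name,
// and the CoL taxon it is linked to.
interface LinkedConcept {
  tc: string;
  latinName: string;
  colUri: string;
}

// Among concepts linked to the same CoL taxon as `start`, only those with
// the same latin name (same T) count as trivial synonyms.
function splitByLatinName(
  start: LinkedConcept,
  candidates: LinkedConcept[],
): { trivial: LinkedConcept[]; nonTrivial: LinkedConcept[] } {
  const trivial: LinkedConcept[] = [];
  const nonTrivial: LinkedConcept[] = [];
  for (const c of candidates) {
    // Skip the starting concept itself and anything not linked via this CoL taxon.
    if (c.tc === start.tc || c.colUri !== start.colUri) continue;
    (c.latinName === start.latinName ? trivial : nonTrivial).push(c);
  }
  return { trivial, nonTrivial };
}
```

Concepts in `nonTrivial` are still synonyms via the shared CoL taxon, but they belong to a different T and should not be merged into the same name.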

retog commented Jul 25, 2024

The search by URI change sounds reasonable.

As for the change in Synolib: should it only support the all-synonyms and no-synonyms modes, or should it also be possible to restrict the search partially, so some synonyms but maybe not all?

nleanba commented Jul 25, 2024

I think having the option to restrict the search (all, some, no synonyms) is only a future possibility and we don't really need to decide on details now.

If we do implement it, I would start with two modes only: all synonyms (same as now, default) and no non-trivial synonyms (i.e. only things that have the same latin name as was searched for).

Middle-ground modes only make sense to me if a) there is explicit demand for them or b) we can implement them in a way that is either invisible to the user or provides a somehow more ergonomic experience (e.g. load the next synonyms only on scroll or idk). I mostly included it in my list above because it is a possibility, not because I see an urgent need.
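
For illustration only, the two-mode variant could be as small as the sketch below. `SynonymMode`, `SearchOptions` and `shouldFollow` are hypothetical names, not an existing Synolib option.

```typescript
// Hypothetical option; "all" matches the current behaviour and is the default.
type SynonymMode = "all" | "trivial-only";

interface SearchOptions {
  mode?: SynonymMode;
}

// Decide whether a found name should be followed further:
// always in "all" mode, only on an exact latin-name match in "trivial-only" mode.
function shouldFollow(
  foundLatinName: string,
  searchedLatinName: string,
  options: SearchOptions = {},
): boolean {
  const mode = options.mode ?? "all";
  return mode === "all" || foundLatinName === searchedLatinName;
}
```

Keeping the mode as a string union would leave room for middle-ground modes later without changing the call sites.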
