Synolib2 #14

Draft
wants to merge 50 commits into
base: main
50 commits
5da326a
Foundation
nleanba Oct 25, 2024
13c13d5
finds initial name if given colUri
nleanba Oct 25, 2024
ed57af6
working on queuing
nleanba Oct 25, 2024
d58f9e6
mostly working for coL and tc
nleanba Oct 27, 2024
cef7cd9
small cleanup
nleanba Oct 27, 2024
18c96a6
deduplicated code
nleanba Oct 27, 2024
aaaa469
added support for TNuris
nleanba Oct 27, 2024
bd30d55
small tweak
nleanba Oct 27, 2024
b253e7d
justifications!
nleanba Oct 27, 2024
7507c72
doc cleanup
nleanba Oct 27, 2024
229f010
latin names
nleanba Oct 27, 2024
ce97f55
recursife justification display
nleanba Oct 27, 2024
c61f3c0
vernacular names
nleanba Oct 27, 2024
f8e0181
foloow CoL-acceptedName-links
nleanba Oct 27, 2024
9676a3f
fixed kingdom-alignment
nleanba Oct 27, 2024
1e17c3b
updated readme
nleanba Oct 27, 2024
e6c6958
added support for subtaxa
nleanba Oct 27, 2024
d8ead30
reasoning
nleanba Oct 27, 2024
bf3e141
some quick fixes for lindas
nleanba Oct 27, 2024
0658ab1
npm and minimal html example
nleanba Oct 27, 2024
be61578
query params
nleanba Oct 27, 2024
e068088
moved npm build stuff to toplevel, reorganized build and example code
nleanba Oct 28, 2024
4c22ecd
updated README.md to explain example
nleanba Oct 28, 2024
f2bb1d7
looks real nice
nleanba Oct 28, 2024
6d34a0b
tweaks
nleanba Oct 28, 2024
8877e5e
fixed build
nleanba Oct 29, 2024
d822086
order treatments by date
nleanba Oct 29, 2024
04f0a1c
expand treatment details on click
nleanba Oct 29, 2024
3778b51
Merge remote-tracking branch 'origin/main' into synolib2
nleanba Oct 29, 2024
38700ed
move to build example with deno
nleanba Oct 29, 2024
d9edaa0
deno fmt
nleanba Oct 29, 2024
5b28261
fix build and lint
nleanba Oct 30, 2024
36d464d
fixed handling of multiple col taxa for one name
nleanba Oct 30, 2024
c80dc94
improved handling of other infraspecific names
nleanba Oct 30, 2024
d9c1d17
fixed duplicate check
nleanba Oct 30, 2024
c1ff9ef
various small improvements
nleanba Oct 30, 2024
e89be67
fixed github action
nleanba Oct 30, 2024
fb76109
actually fix github action
nleanba Oct 31, 2024
d961840
handle col-taxa without authority
nleanba Oct 31, 2024
986931a
resolve ids to names if the are synonyms
nleanba Oct 31, 2024
35c8c3b
actually require tn to ba a tn
nleanba Nov 5, 2024
d932950
moved queries to own file
nleanba Nov 5, 2024
24af399
simplified treatments
nleanba Nov 5, 2024
936a99c
better figure support
nleanba Nov 5, 2024
9663f68
fixed queries to be usable with lindas
nleanba Nov 5, 2024
46d1d35
handle kingdoms
nleanba Nov 8, 2024
57a0684
sungenus
nleanba Nov 8, 2024
1c1aad9
v3.0.1
nleanba Nov 8, 2024
df61eb0
fix getNameFromCoL query
nleanba Nov 8, 2024
2b610f9
v3.1.1
nleanba Nov 8, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,5 +1,7 @@
synonym-group.js

docs

npm-package/node_modules
npm-package/index.js
npm-package/index.d.ts
57 changes: 57 additions & 0 deletions DESIGN.md
@@ -0,0 +1,57 @@
# Design

<!-- deno-fmt-ignore -->
> [!NOTE]
> This currently (2024-07-27) describes a potential future design, not
> the current one.

## Overview

The central taxonomic entity is one object `N` per latin name.\
Synolib returns a list of `N`s as its results.

Each `N` exists because of a taxon-name, taxon-concept or col-taxon in the
data.\
Each `N` is uniquely determined by its human-readable latin name (for taxa
ranking below genus, this is a multi-part name — binomial or trinomial) and
kingdom.\
Each `N` contains `N+A` objects which represent latin names with an authority.\
Each `N` contains, if present, `treatment`s directly associated with the
respective taxon-name.\
Other metadata (if present) of a `N` are the list of its parent names (family,
order, ...); vernacular names; and taxon-name URI.

Each `N+A` exists because of a taxon-concept or col-taxon in the data. It always
has a parent `N`.\
Each `N+A` is uniquely determined by its human-readable latin name (as above),
kingdom and (normalized [^1]) authority.\
Each `N+A` contains, if present, `treatment`s directly associated with the
respective taxon-concept.\
Other metadata (if present) of a `N+A` are CoL IDs; and taxon-concept URI.

A `treatment` exists because it is in the data, and is identified by its RDF
URI.\
A `treatment` may _define_, _augment_, _deprecate_ or _cite_ a `N+A`, and
_treat_ or _cite_ a `N`.\
If a `treatment` does _define_, _augment_, _deprecate_ or _treat_ different `N`
and/or `N+A`s, they are considered synonyms.\
Note that _cite_ does not create synonymic links.\
Other metadata of a `treatment` are its authors, material citations, and images.
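
The entities described above could be sketched as the following types. This is only an illustration of the relationships in this section: every type, field and function name here is an assumption and not taken from the synolib source.

```typescript
// Illustrative types only: every name here is hypothetical and
// not taken from the synolib source.

/** A treatment, identified by its RDF URI. */
type Treatment = {
  uri: string;
  authors: string[];
  materialCitations: string[];
  images: string[];
};

/** `N+A`: a latin name with an authority. */
type AuthorizedName = {
  latinName: string;
  kingdom: string;
  authority: string; // assumed to be already normalized
  colIds: string[];
  taxonConceptUri?: string;
  treatments: {
    define: Treatment[];
    augment: Treatment[];
    deprecate: Treatment[];
    cite: Treatment[];
  };
};

/** `N`: one object per latin name and kingdom. */
type Name = {
  latinName: string;
  kingdom: string;
  parentNames: string[];
  vernacularNames: string[];
  taxonNameUri?: string;
  authorizedNames: AuthorizedName[];
  treatments: { treat: Treatment[]; cite: Treatment[] };
};

/** Identity of an `N`: latin name plus kingdom. */
function nameKey(n: Pick<Name, "latinName" | "kingdom">): string {
  return `${n.kingdom}|${n.latinName}`;
}

/** Identity of an `N+A`: latin name, kingdom and authority. */
function authorizedNameKey(
  a: Pick<AuthorizedName, "latinName" | "kingdom" | "authority">,
): string {
  return `${a.kingdom}|${a.latinName}|${a.authority}`;
}
```

The two key functions mirror the uniqueness rules stated above: an `N` is keyed by name and kingdom, an `N+A` additionally by authority.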

Starting point of the algorithm is a latin name or the URI of either a
taxon-name, taxon-concept or col-taxon.\
It will first try to find the respective `N` and all associated metadata, `N+A`s
and `treatment`s.\
This `N` is the first result.\
Then it will recursively use all synonyms indicated by the found `treatment`s to
find new `N`s.\
For each new `N`, it will find all associated metadata, `N+A`s and `treatment`s;
and return it as the next result.\
Then it will continue to expand recursively until no more new `N`s are found.

The algorithm keeps track of which treatment links it followed and other reasons
it added a `N` to the results.\
This "justification" is also provided as metadata of a `N`.
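
A minimal sketch of this expansion, assuming an in-memory list of treatments instead of SPARQL queries. The names `TreatmentLinks`, `Result` and `expandSynonyms` are hypothetical, names are reduced to plain strings, and the recursion is written as a queue-based loop; it is not the library's implementation.

```typescript
// Hypothetical in-memory stand-in for the data; the real library queries a
// SPARQL endpoint instead. Each treatment lists the names it links as synonyms
// (via define, augment, deprecate or treat; "cite" links are left out).
type TreatmentLinks = { uri: string; names: string[] };

// One result per name, with the reason it was added.
type Result = { name: string; justification: string };

function expandSynonyms(start: string, treatments: TreatmentLinks[]): Result[] {
  const results: Result[] = [{ name: start, justification: "search term" }];
  const seen = new Set<string>([start]);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const t of treatments) {
      if (!t.names.includes(current)) continue;
      for (const other of t.names) {
        if (seen.has(other)) continue; // each name is returned only once
        seen.add(other);
        results.push({
          name: other,
          justification: `${t.uri} links ${current} and ${other}`,
        });
        queue.push(other); // newly found names are expanded in turn
      }
    }
  }
  return results;
}

const demo = expandSynonyms("A", [
  { uri: "t1", names: ["A", "B"] },
  { uri: "t2", names: ["B", "C"] },
  { uri: "t3", names: ["D", "E"] }, // unrelated, never reached
]);
console.log(demo.map((r) => r.name).join(", ")); // A, B, C
```

Each result records only the link that first led to it. Following those records back from any name to the starting name gives the chain of treatments that justifies its inclusion.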

[^1]: I.e. ignoring differences in punctuation, diacritics, capitalization and
such.
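
One possible normalization along the lines of the footnote, as a sketch. `normalizeAuthority` is a hypothetical helper, and the exact rules (in particular reducing everything to ASCII letters and digits) are an assumption rather than the library's behaviour.

```typescript
// Hypothetical helper, not the library's actual normalization.
function normalizeAuthority(authority: string): string {
  return authority
    .normalize("NFD") // split letters from their combining diacritics
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ") // punctuation and whitespace runs become one space
    .trim();
}

console.log(normalizeAuthority("(Linnaeus, 1767)")); // linnaeus 1767
console.log(normalizeAuthority("Müller, 1776")); // muller 1776
```

With this rule, "Müller, 1776" and "MULLER 1776." normalize to the same string and would identify the same `N+A`. Letters outside the Latin alphabet are dropped by this sketch, which a real implementation would likely need to handle differently.
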
114 changes: 0 additions & 114 deletions JustificationSet.ts

This file was deleted.

18 changes: 14 additions & 4 deletions README.md
@@ -3,11 +3,21 @@
A js module to get potential synonyms of a taxon name, the justifications for
such synonymity and treatments about these taxon names or the respective taxa.

See `index.html` for an example of a webpage using the library. Go to
[http://plazi.github.io/synolib/](http://plazi.github.io/synolib/) to open the example page in the browser
and execute the script.
For a command line example using the library see: `example/cli.ts`.

For a simple command line example using the library see: `main.ts`.
You can try it locally using Deno with

```sh
deno run --allow-net ./example/cli.ts Ludwigia adscendens
# or
deno run --allow-net ./example/cli.ts http://taxon-name.plazi.org/id/Plantae/Ludwigia_adscendens
# or
deno run --allow-net ./example/cli.ts http://taxon-concept.plazi.org/id/Plantae/Ludwigia_adscendens_Linnaeus_1767
# or
deno run --allow-net ./example/cli.ts https://www.catalogueoflife.org/data/taxon/3WD9M
```

(replace the argument with whatever name interests you)

## building

81 changes: 81 additions & 0 deletions SparqlEndpoint.ts
@@ -0,0 +1,81 @@
/** Resolves after `ms` milliseconds. */
async function sleep(ms: number): Promise<void> {
  const p = new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });
  return await p;
}

/** Describes the format of the JSON returned by SPARQL endpoints */
export type SparqlJson = {
  head: {
    vars: string[];
  };
  results: {
    bindings: {
      [key: string]:
        | { type: string; value: string; "xml:lang"?: string }
        | undefined;
    }[];
  };
};

/**
 * Represents a remote sparql endpoint and provides a uniform way to run queries.
 */
export class SparqlEndpoint {
  /** Create a new SparqlEndpoint with the given URI */
  constructor(private sparqlEnpointUri: string) {}

  /** @ignore */
  // reasons: string[] = [];

  /**
   * Run a query against the sparql endpoint
   *
   * It automatically retries up to 10 times on fetch errors (including non-ok response status codes), waiting 50ms before the first retry and doubling the wait each time.
   * Retries are logged to the console (`console.warn`)
   *
   * @throws If the request still fails (fetch error or non-ok response status code) after 10 retries, or immediately if the request was aborted via `fetchOptions.signal`.
   * @param query The sparql query to run against the endpoint
   * @param fetchOptions Additional options for the `fetch` request
   * @param _reason (Currently ignored, used internally for debugging purposes)
   * @returns Results of the query
   */
  async getSparqlResultSet(
    query: string,
    fetchOptions: RequestInit = {},
    _reason = "",
  ): Promise<SparqlJson> {
    // this.reasons.push(_reason);

    // Always request SPARQL JSON results.
    // Note: this mutates the caller's `fetchOptions` and treats `headers` as a plain record.
    fetchOptions.headers = fetchOptions.headers || {};
    (fetchOptions.headers as Record<string, string>)["Accept"] =
      "application/sparql-results+json";
    let retryCount = 0;
    const sendRequest = async (): Promise<SparqlJson> => {
      try {
        // console.info(`SPARQL ${_reason} (${retryCount + 1})`);
        const response = await fetch(
          this.sparqlEnpointUri + "?query=" + encodeURIComponent(query),
          fetchOptions,
        );
        if (!response.ok) {
          // Thrown inside the `try`, so non-ok responses are retried as well.
          throw new Error("Response not ok. Status " + response.status);
        }
        return await response.json();
      } catch (error) {
        if (fetchOptions.signal?.aborted) {
          // Never retry a request the caller has aborted.
          throw error;
        } else if (retryCount < 10) {
          // Exponential backoff: 50ms, 100ms, 200ms, ...
          const wait = 50 * (1 << retryCount++);
          console.warn(`!! Fetch Error. Retrying in ${wait}ms (${retryCount})`);
          await sleep(wait);
          return await sendRequest();
        }
        console.warn("!! Fetch Error:", query, "\n---\n", error);
        throw error;
      }
    };
    return await sendRequest();
  }
}
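
To make the retry timing of `getSparqlResultSet` concrete, the wait computation `50 * (1 << retryCount++)` can be tabulated with a small sketch. `backoffSchedule` is a hypothetical helper written for this illustration and is not part of the file above.

```typescript
// Mirrors the wait computation `50 * (1 << retryCount++)` from the code above.
// `backoffSchedule` is a hypothetical helper for illustration only.
function backoffSchedule(maxRetries = 10, baseMs = 50): number[] {
  const waits: number[] = [];
  for (let retry = 0; retry < maxRetries; retry++) {
    waits.push(baseMs * (1 << retry)); // 50, 100, 200, ...
  }
  return waits;
}

const waits = backoffSchedule();
const total = waits.reduce((sum, ms) => sum + ms, 0);
console.log(waits[0], waits[waits.length - 1]); // 50 25600
console.log(total); // 51150
```

A query that keeps failing therefore spends about 51 seconds waiting, plus the time of the eleven requests themselves, before the error is thrown.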