Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dereferencer #11

Merged
merged 90 commits into from
Jan 17, 2024
Merged

Dereferencer #11

merged 90 commits into from
Jan 17, 2024

Conversation

cedricdcc
Copy link
Contributor

Module dereferencer added that will perform tasks given a config file.

Set config file looks like this:

SPARQL: >
  PREFIX dc: <http://purl.org/dc/terms/> 
  SELECT ?o
  FROM <urn:lwua:INGEST:/data/mr_regions_ldes_test.ttl>
  WHERE {
    ?s dc:isVersionOf ?o .
      FILTER regex(str(?o), "marineregions.org")
  }
property_paths:
  - http://marineregions.org/ns/ontology#hasGeometry
  - http://marineregions.org/ns/ontology#isPartOf:
    - http://marineregions.org/ns/ontology#hasGeometry
    - https://schema.org/geo:
      - https://schema.org/latitude
      - https://schema.org/longitude

laurianvm and others added 30 commits September 8, 2023 07:17
make it easy to get the jupyer token
but moving the log elsewhere
reverses symlink direction to make docker happy (and the other usage satisfied)

applies the poetry build inside docker inage build to fix #1
further enhances the request from #3
some path changes allong the way (conform the reeversed symlinks)
this marks an important milestone for #4 as we can now insert triples!
completes the goal of "test driving the ingest of rdf into graphdb"

fixes  #4
cedricdcc and others added 26 commits December 5, 2023 19:42
Know issue with inserting wktstrings, look into postgres for solving this
TODO: discuss with marc what the correct approach is for this
- added re testing
- deleted make_tasks tests because function is no longer in use
- added edmo as test for the script tag search
- added pytest.ini so when doing pytests I can see what went wrong
@cedricdcc cedricdcc marked this pull request as ready for review January 17, 2024 16:24
@cedricdcc cedricdcc merged commit f1058bf into lifewatch:main Jan 17, 2024
?s ?p schema:Dataset .
}
assert-paths:
cache_lifetime: 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remind me what this does?

cache_lifetime: 18000
```

In this example, the subjects are the literal values of the URIs. The `assert-paths` are the property paths to follow in the results. The `cache_lifetime` is the lifetime of the cache in minutes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still does not explain what is being cached, and thus does not help me to consider what should drive my decision as a user to set, increase, decrease this thingy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a _test file should we not put it somewhere else?
at least grouped under ./data/tests for now but maybe even ./tests/data (although that might come natural if we repackage this as k-gap project without other data)

cc:prohibits cc:CCCommercialUse .

<https://data.arms-mbon.org/> a schema:Project .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this too is being added for testing purposes?

if so please consider moving to separate file and under ./data/tests

SPARQL: >
select DISTINCT ?s where {
?s ?p ?o .
FILTER regex(str(?s), "^http://marineregions.org/mrgid/[0-9]{1,5}$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

violates the adagium "sparql regex on URI considered bad"

please rephrase in terms of selecting subjects of a certain type maybe?

:return: The results of the query
:rtype: list
"""
template = "deref_property_trajectory.sparql"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select_property_path.sparql

as it should not be tied to us using this in a context of deciding to maybe deref and fetch

"subject": uri,
"property_trajectory": pp_trajectory,
"prefixes": prefixes}
query = J2RDF.build_syntax(template, **vars)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are not using template var anywhere else - pls consider inlining for readability?

alternative - make decent local constants for the provided templates you use in this file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop deref in this name, there is nothing in this template that ties its use to the conicidence of using it here in the deref component context

or put differently why not add deref_ to all folders and files in this space ¯_(ツ)_/¯

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also -- is this still in use / useful?
seems to be a precursor / double of the more complete next one in the list?


SELECT ?o
WHERE {
<{{ subject }}> <{{ property }}> ?o .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename property to path or property_path ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another deref_ prefix to drop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and kill that _trajectory thing as well - not meaningful, only confusing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now spotted the subtle < > difference --> does that make sense / enough motivation to keep it around?

Copy link
Contributor

@marc-portier marc-portier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently already merged - but with many todos imho


SELECT ?o
WHERE {
<{{ subject }}> {{ property_trajectory }} ?o .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

property_trajectory should be property_path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now spotted the subtle < > difference --> does that make sense / enough motivation to keep it around?

WHERE {
?s rdf:type schema:Person .
}
assert-pathsy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo? assert-pathsy has a trailing y

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange - what is the use?
if temporary disabling of a deref-config is the goal then it now requires renaming anyway -- so just make the arrangement to add a .bak suffix of sorts rather then alter the dereference* prefix ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and all following files in this directory

does not make sense that the PR branch for lwua-deref is making all these changes to deref-ingest ...

remove / move to different PR if needed ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally hate foldernames with suffix-s
./config/ will do nicely

motivation: there is only one folder with that name (so it is singular :) -- , and all folders are meant to possibly contain multiple files ... so what gives?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this general stuff in this specific PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is jupyter folder included in this PR for deref?

added some showcases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants