Commit
Merge branch 'master' into dev
karacolada authored Apr 10, 2024
2 parents eab06b8 + 2c2cad1 commit e90842c
Showing 21 changed files with 134 additions and 102 deletions.
18 changes: 13 additions & 5 deletions .github/workflows/ci.yml
@@ -30,7 +30,7 @@ jobs:
# with:
# python-version: '3.11'
# - name: Cache python dependencies
- # uses: actions/cache@v3
+ # uses: actions/cache@v4
# with:
# path: ~/.cache/pip
# key: pip-docs-${{ hashFiles('pyproject.toml') }}
@@ -52,7 +52,7 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Cache python dependencies
- uses: actions/cache@v3
+ uses: actions/cache@v4
with:
path: |
~/.cache/pip
@@ -71,6 +71,8 @@ jobs:

tests:
runs-on: ubuntu-22.04
+ permissions:
+ contents: write
steps:
- uses: actions/checkout@v4
- name: Set up Python 3.11
@@ -84,9 +86,6 @@
python -m pip install --upgrade hatch
- name: Run test suite with coverage
run: hatch run cov-ci
- - name: Generate badges
- if: always()
- run: hatch run badges
- name: Upload test results
if: always()
uses: actions/upload-artifact@v4
@@ -101,6 +100,15 @@ jobs:
name: coverage-results
retention-days: 1
path: pytest-cobertura.xml
+ - run: rm ./reports/coverage/.gitignore
+ - name: Generate coverage badge
+ if: github.ref == 'refs/heads/master'
+ run: hatch run cov-badge
+ - name: Deploy reports to GitHub Pages
+ if: github.ref == 'refs/heads/master'
+ uses: JamesIves/github-pages-deploy-action@65b5dfd4f5bcd3a7403bbc2959c144256167464e # v4.5.0
+ with:
+ folder: ./reports

event_file:
runs-on: ubuntu-22.04
2 changes: 1 addition & 1 deletion .github/workflows/reports.yml
@@ -69,7 +69,7 @@ jobs:
# Ref: https://github.com/marocchino/sticky-pull-request-comment#inputs
- name: Add Code Coverage PR Comment
if: ${{ steps.get-pr-number.outputs.number }} != null
- uses: marocchino/sticky-pull-request-comment@efaaab3fd41a9c3de579aba759d2552635e590fd # v2.8.0
+ uses: marocchino/sticky-pull-request-comment@331f8f5b4215f0445d3c07b4967662a32a2d3e31 # v2.9.0
with:
recreate: true
number: ${{ steps.get-pr-number.outputs.number }}
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -3,6 +3,7 @@ repos:
rev: v4.5.0
hooks:
- id: end-of-file-fixer
+ exclude_types: [svg]
- id: mixed-line-ending
types: [python]
- id: trailing-whitespace
3 changes: 2 additions & 1 deletion README.md
@@ -4,10 +4,11 @@ Developers: [Robert Huber](mailto:[email protected]), [Anusuriya Devaraju](mailto:
Thanks to [Heinz-Alexander Fuetterer](https://github.com/afuetterer) for his contributions and his help in cleaning up the code.

[![CI](https://github.com/pangaea-data-publisher/fuji/actions/workflows/ci.yml/badge.svg)](https://github.com/pangaea-data-publisher/fuji/actions/workflows/ci.yml)
+ [![Coverage](https://pangaea-data-publisher.github.io/fuji/coverage/coveragebadge.svg)](https://pangaea-data-publisher.github.io/fuji/coverage/)

[![Publish Docker image](https://github.com/pangaea-data-publisher/fuji/actions/workflows/publish-docker.yml/badge.svg)](https://github.com/pangaea-data-publisher/fuji/actions/workflows/publish-docker.yml)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4063720.svg)](https://doi.org/10.5281/zenodo.4063720)


## Overview

F-UJI is a web service to programmatically assess FAIRness of research data objects based on [metrics](https://doi.org/10.5281/zenodo.3775793) developed by the [FAIRsFAIR](https://www.fairsfair.eu/) project.
6 changes: 4 additions & 2 deletions fuji_server/controllers/fair_check.py
@@ -121,7 +121,7 @@ def __init__(
self.pid_url = None # full pid # e.g., "https://doi.org/10.1594/pangaea.906092 or url (non-pid)
self.landing_url = None # url of the landing page of self.pid_url
self.origin_url = None # the url from where all starts - in case of redirection we'll need this later on
- self.repository_urls = [] # urls identified which could represent the repository
+ self.repository_urls = [] # urls identified which could represent the repository will need this probably for FAIRiCAT things
self.landing_html = None
self.landing_content_type = None
self.landing_origin = None # schema + authority of the landing page e.g. https://www.pangaea.de
@@ -399,6 +399,8 @@ def retrieve_metadata_external(self, target_url=None, repeat_mode=False):
self.linked_namespace_uri.update(self.metadata_harvester.linked_namespace_uri)
self.related_resources.extend(self.metadata_harvester.related_resources)
self.metadata_harvester.get_signposting_object_identifier()
+ self.pid_url = self.metadata_harvester.pid_url
+ self.pid_scheme = self.metadata_harvester.pid_scheme
self.pid_collector.update(self.metadata_harvester.pid_collector)

"""def lookup_metadatastandard_by_name(self, value):
@@ -694,4 +696,4 @@ def set_repository_uris(self):
self.repository_urls.append(publisher_url)
if self.repository_urls:
self.repository_urls = list(set(self.repository_urls))
print("REPOSITORY: ", self.repository_urls)
# print("REPOSITORY: ", self.repository_urls)
2 changes: 1 addition & 1 deletion fuji_server/evaluators/fair_evaluator.py
@@ -110,7 +110,7 @@ def isTestDefined(self, testid):
if testid in self.metric_tests:
return True
else:
- self.logger.info(
+ self.logger.debug(
self.metric_identifier
+ " : This test is not defined in the metric YAML and therefore not performed: "
+ str(testid)
2 changes: 1 addition & 1 deletion fuji_server/evaluators/fair_evaluator_formal_metadata.py
@@ -138,7 +138,7 @@ def testExternalStructuredMetadataAvailable(self):
sparql_provider = SPARQLMetadataProvider(
endpoint=self.fuji.sparql_endpoint, logger=self.logger, metric_id=self.metric_identifier
)
- if self.fuji.pid_url == None:
+ if self.fuji.pid_url is None:
url_to_sparql = self.fuji.landing_url
else:
url_to_sparql = self.fuji.pid_url
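Note on the recurring `== None` → `is None` cleanups in this commit: `is None` tests object identity, so it cannot be misled by classes that overload equality. A minimal, hypothetical illustration (not code from the repository):

```python
class AlwaysEqual:
    """Contrived class whose __eq__ answers True for any comparison."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
print(value == None)  # True -- __eq__ hijacks the equality check
print(value is None)  # False -- identity comparison is unaffected
```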
30 changes: 15 additions & 15 deletions fuji_server/evaluators/fair_evaluator_license.py
@@ -44,33 +44,33 @@ def __init__(self, fuji_instance):
def setLicenseDataAndOutput(self):
self.license_info = []
specified_licenses = self.fuji.metadata_merged.get("license")
- # if specified_licenses is None: # try GitHub data
- # specified_licenses = self.fuji.github_data.get("license")
+ if specified_licenses is None and self.metric_identifier.startswith("FRSM"): # try GitHub data
+ specified_licenses = self.fuji.github_data.get("license")
if isinstance(specified_licenses, str): # licenses maybe string or list depending on metadata schemas
specified_licenses = [specified_licenses]
if specified_licenses is not None and specified_licenses != []:
- for l in specified_licenses:
+ for license in specified_licenses:
isurl = False
licence_valid = False
license_output = LicenseOutputInner()
- if isinstance(l, str):
- isurl = idutils.is_url(l)
+ if isinstance(license, str):
+ isurl = idutils.is_url(license)
if isurl:
- iscc, generic_cc = self.isCreativeCommonsLicense(l, self.metric_identifier)
+ iscc, generic_cc = self.isCreativeCommonsLicense(license, self.metric_identifier)
if iscc:
- l = generic_cc
- spdx_uri, spdx_osi, spdx_id = self.lookup_license_by_url(l, self.metric_identifier)
+ license = generic_cc
+ spdx_uri, spdx_osi, spdx_id = self.lookup_license_by_url(license, self.metric_identifier)
else: # maybe licence name
- spdx_uri, spdx_osi, spdx_id = self.lookup_license_by_name(l, self.metric_identifier)
- license_output.license = l
+ spdx_uri, spdx_osi, spdx_id = self.lookup_license_by_name(license, self.metric_identifier)
+ license_output.license = license
if spdx_uri:
licence_valid = True
license_output.details_url = spdx_uri
license_output.osi_approved = spdx_osi
self.output.append(license_output)
self.license_info.append(
{
"license": l,
"license": license,
"id": spdx_id,
"is_url": isurl,
"spdx_uri": spdx_uri,
@@ -231,14 +231,14 @@ def testLicenseIsValidAndSPDXRegistered(self):
)
)
if self.license_info:
- for l in self.license_info:
+ for license in self.license_info:
if test_required:
for rq_license_id in test_required:
if l.get("id"):
if fnmatch.fnmatch(l.get("id"), rq_license_id):
if license.get("id"):
if fnmatch.fnmatch(license.get("id"), rq_license_id):
test_status = True
else:
if l.get("valid"):
if license.get("valid"):
test_status = True
else:
self.logger.warning(
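For context on the `fnmatch` test above: required license IDs from the metric configuration are treated as shell-style glob patterns, so a single entry can match a whole license family. A small sketch (the pattern list is hypothetical, not taken from the actual metric YAML):

```python
import fnmatch

# Hypothetical required-license patterns, as a metric test might define them
required_patterns = ["MIT", "Apache-2.0", "CC-BY*"]

# An SPDX id as it might be resolved from harvested metadata
harvested_id = "CC-BY-4.0"

matched = any(fnmatch.fnmatch(harvested_id, p) for p in required_patterns)
print(matched)  # True: "CC-BY-4.0" matches the glob "CC-BY*"
```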
4 changes: 2 additions & 2 deletions fuji_server/harvester/data_harvester.py
@@ -95,7 +95,7 @@ def retrieve_all_data(self, scan_content=True):
fl["size"] = None
else:
fl["size"] = None
if fl.get("type") == None:
if fl.get("type") is None:
if fl["trust"] > 1:
fl["trust"] -= 1
elif "/" in str(fl.get("type")):
@@ -113,7 +113,7 @@
timeout = 10
if len(ft) > self.max_number_per_mime:
self.logger.warning(
f"FsF-F3-01M : Found more than -: {self.max_number_per_mime!s} data links (out of {len(ft)!s}) of type {fmime} will only take {self.max_number_per_mime!s}"
f"FsF-F3-01M : Found more than -: {self.max_number_per_mime!s} data links (out of {len(ft)!s}) of type {fmime} will only take {self.max_number_per_mime!s} for content analysis"
)
files_to_check = ft[: self.max_number_per_mime]
# add the fifth one for compatibility reasons < f-uji 3.0.1, when we took the last of list of length FILES_LIMIT
62 changes: 36 additions & 26 deletions fuji_server/harvester/metadata_harvester.py
@@ -150,7 +150,7 @@ def merge_metadata(self, metadict, url, method, format, mimetype, schema="", nam
"FsF-F2-01M : Harvesting of this metadata is explicitely disabled in the metric configuration-:"
+ str(metadata_standard)
)
- if isinstance(metadict, dict) and allow_merge == True:
+ if isinstance(metadict, dict) and allow_merge is True:
# self.metadata_sources.append((method_source, 'negotiated'))
for r in metadict.keys():
if r in self.reference_elements:
@@ -246,14 +246,14 @@
print("Metadata Merge Error: " + str(e), format, mimetype, schema)

def exclude_null(self, dt):
- if type(dt) is dict:
+ if isinstance(dt, dict):
return dict((k, self.exclude_null(v)) for k, v in dt.items() if v and self.exclude_null(v))
- elif type(dt) is list:
+ elif isinstance(dt, list):
try:
return list(set([self.exclude_null(v) for v in dt if v and self.exclude_null(v)]))
except Exception:
return [self.exclude_null(v) for v in dt if v and self.exclude_null(v)]
- elif type(dt) is str:
+ elif isinstance(dt, str):
return dt.strip()
else:
return dt
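One practical effect of the `type(...) is ...` → `isinstance(...)` rewrite in `exclude_null` above: `isinstance` also accepts subclasses, so metadata passed in as, say, an `OrderedDict` still takes the dict branch. A minimal sketch with hypothetical values:

```python
from collections import OrderedDict

metadata = OrderedDict(title="F-UJI", creator=None)  # a dict subclass

print(type(metadata) is dict)      # False: exact-type check rejects subclasses
print(isinstance(metadata, dict))  # True: subclass instances are handled too
```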
@@ -263,17 +263,22 @@ def check_if_pid_resolves_to_landing_page(self, pid_url=None):
candidate_landing_url = self.pid_collector[pid_url].get("resolved_url")
if candidate_landing_url and self.landing_url:
candidate_landing_url_parts = extract(candidate_landing_url)
# print(candidate_landing_url_parts )
# landing_url_parts = extract(self.landing_url)
input_id_domain = candidate_landing_url_parts.domain + "." + candidate_landing_url_parts.suffix
# landing_domain = landing_url_parts.domain + "." + landing_url_parts.suffix
if self.landing_domain != input_id_domain:
self.logger.warning(
"FsF-F1-02D : Landing page domain resolved from PID found in metadata does not match with input URL domain -:"
+ str(pid_url)
+ str(self.landing_domain)
+ " <> "
+ str(input_id_domain)
)
+ self.logger.warning(
+ "FsF-F2-01M : Landing page domain resolved from PID found in metadata does not match with input URL domain -:"
+ + str(pid_url)
+ + str(self.landing_domain)
+ + " <> "
+ + str(input_id_domain)
+ )
return False
else:
@@ -321,7 +326,8 @@ def check_pidtest_repeat(self):
validated = False
if idhelper.is_persistent and validated:
found_pids[found_id_scheme] = idhelper.get_identifier_url()
- if len(found_pids) >= 1 and self.repeat_pid_check == False:
+ if len(found_pids) >= 1 and self.repeat_pid_check is False:
+ # print(found_pids, next(iter(found_pids.items())))
self.logger.info(
"FsF-F2-01M : Found object identifier in metadata, repeating PID check for FsF-F1-02D"
)
@@ -345,12 +351,12 @@ def set_html_typed_links(self):
try:
dom = lxml.html.fromstring(self.landing_html.encode("utf8"))
links = dom.xpath("/*/head/link")
- for l in links:
+ for link in links:
source = MetadataOfferingMethods.TYPED_LINKS
- href = l.attrib.get("href")
- rel = l.attrib.get("rel")
- type = l.attrib.get("type")
- profile = l.attrib.get("format")
+ href = link.attrib.get("href")
+ rel = link.attrib.get("rel")
+ type = link.attrib.get("type")
+ profile = link.attrib.get("format")
type = str(type).strip()
# handle relative paths
linkparts = urlparse(href)
@@ -628,6 +634,7 @@ def retrieve_metadata_embedded_extruct(self):
pass

extracted = extruct.extract(extruct_target, syntaxes=syntaxes, encoding="utf-8")

except Exception as e:
extracted = {}
self.logger.warning(
@@ -673,7 +680,7 @@ def retrieve_metadata_embedded(self):
# requestHelper.setAcceptType(AcceptTypes.html_xml) # request
requestHelper.setAcceptType(AcceptTypes.default) # request
neg_source, landingpage_html = requestHelper.content_negotiate("FsF-F1-02D", ignore_html=False)
if not "html" in str(requestHelper.content_type):
if "html" not in str(requestHelper.content_type):
self.logger.info(
"FsF-F2-01M :Content type is "
+ str(requestHelper.content_type)
@@ -701,17 +708,17 @@
self.logger.error("FsF-F2-01M : Resource inaccessible -: " + str(e))
pass

- if self.landing_url and self.is_html_page:
+ if self.landing_url:
if self.landing_url not in ["https://datacite.org/invalid.html"]:
if response_status == 200:
if "html" in requestHelper.content_type:
self.raise_warning_if_javascript_page(requestHelper.response_content)

up = urlparse(self.landing_url)
upp = extract(self.landing_url)
self.landing_origin = f"{up.scheme}://{up.netloc}"
self.landing_domain = upp.domain + "." + upp.suffix
- self.landing_html = requestHelper.getResponseContent()
+ if self.is_html_page:
+ self.landing_html = requestHelper.getResponseContent()
self.landing_content_type = requestHelper.content_type
self.landing_redirect_list = requestHelper.redirect_list
self.landing_redirect_status_list = requestHelper.redirect_status_list
@@ -1440,16 +1447,19 @@ def retrieve_metadata_external(self, target_url=None, repeat_mode=False):
target_url_list = [self.origin_url, self.landing_url]
# specific target url
if isinstance(target_url, str):
- target_url_list = [target_url]
-
- target_url_list = set(tu for tu in target_url_list if tu is not None)
- self.retrieve_metadata_external_xml_negotiated(target_url_list)
- self.retrieve_metadata_external_schemaorg_negotiated(target_url_list)
- self.retrieve_metadata_external_rdf_negotiated(target_url_list)
- self.retrieve_metadata_external_datacite()
- if not repeat_mode:
- self.retrieve_metadata_external_linked_metadata()
- self.retrieve_metadata_external_oai_ore()
+ if self.use_datacite is False and "doi" == self.pid_scheme:
+ target_url_list = []
+ else:
+ target_url_list = [target_url]
+ if target_url_list:
+ target_url_list = set(tu for tu in target_url_list if tu is not None)
+ self.retrieve_metadata_external_xml_negotiated(target_url_list)
+ self.retrieve_metadata_external_schemaorg_negotiated(target_url_list)
+ self.retrieve_metadata_external_rdf_negotiated(target_url_list)
+ self.retrieve_metadata_external_datacite()
+ if not repeat_mode:
+ self.retrieve_metadata_external_linked_metadata()
+ self.retrieve_metadata_external_oai_ore()

"""if self.reference_elements:
self.logger.debug(f"FsF-F2-01M : Reference metadata elements NOT FOUND -: {self.reference_elements}")
2 changes: 1 addition & 1 deletion fuji_server/helper/linked_vocab_helper.py
@@ -110,7 +110,7 @@ def set_linked_vocab_index(self):
def get_overlap(self, s1, s2):
result = ""
for char in s1:
- if char in s2 and not char in result:
+ if char in s2 and char not in result:
result += char
return len(result)

4 changes: 2 additions & 2 deletions fuji_server/helper/metadata_collector.py
@@ -257,8 +257,8 @@ def getMetadataMapping(self):
def getLogger(self):
return self.logger

- def setLogger(self, l):
- self.logger = l
+ def setLogger(self, logger):
+ self.logger = logger

def getSourceMetadata(self):
return self.source_metadata
[Diffs for the remaining 9 changed files did not load and are not shown.]
