Cedric injest attempt #9

Merged: 66 commits, merged on Jan 17, 2024.
- 3ce6908 Merge pull request #5 from lifewatch/fix/docker-names (laurianvm, Sep 8, 2023)
- 08ab97c Merge pull request #7 from lifewatch/fix/jupyter-token (laurianvm, Sep 8, 2023)
- af750b2 Merge branch 'main' of github.com:lifewatch/user-analysis-2023 into t… (marc-portier, Sep 9, 2023)
- 3afba5d realized we need the data for the ingest (marc-portier, Sep 9, 2023)
- 342d63e docker image builds python using poetry (marc-portier, Sep 10, 2023)
- 3a3c169 apply image names (marc-portier, Sep 10, 2023)
- 8d6bc1a getting the graphdb to work together with the sparqlwrapper (marc-portier, Sep 10, 2023)
- d39c314 minor cleanup (marc-portier, Sep 10, 2023)
- 1215308 Merge branch 'fix/docker-names' of github.com:lifewatch/user-analysis… (marc-portier, Sep 11, 2023)
- c2d53c5 create a local graphd-db image that initializes the database (marc-portier, Sep 11, 2023)
- 97586d3 introduce the notebooks so they become available in the jupyter (marc-portier, Sep 11, 2023)
- e0eefa2 use the new feaures of the jupyter and graphdb images (marc-portier, Sep 11, 2023)
- 30bd04f cleanup not needed test script (marc-portier, Sep 11, 2023)
- a9b85af ingest of file succeeded (marc-portier, Sep 11, 2023)
- 417766c prefer https for schema.org (marc-portier, Oct 26, 2023)
- c71976b rename docker/info script, introducing jq and some enhancements (marc-portier, Oct 26, 2023)
- d9f8ce1 fix error in copy statement (2nd arg required) (marc-portier, Oct 26, 2023)
- edeb104 introduce external shared logging volume (marc-portier, Oct 26, 2023)
- b897e68 updated deps (marc-portier, Oct 26, 2023)
- 2f90b6b updated deps (marc-portier, Oct 26, 2023)
- 0983c1d ensure the log folder exists (marc-portier, Oct 26, 2023)
- 846ec54 fix path to data - as it is distinct to the location inside the grpah… (marc-portier, Oct 26, 2023)
- c31846c use the new external logging/ folder (marc-portier, Oct 26, 2023)
- 892abd6 extended readme (marc-portier, Oct 26, 2023)
- 907d87d room for more dependencies in ipynb context (marc-portier, Oct 26, 2023)
- 523df6e as is current dump of progress towards autodetection (marc-portier, Nov 14, 2023)
- 7eb3d96 normalise dos2unix for /docker/**/*.sh files (cedricdcc, Nov 14, 2023)
- 8b93eb1 added watcher to injest (cedricdcc, Nov 14, 2023)
- 4378715 deleted non essential code fr starting graphdb-database (cedricdcc, Nov 15, 2023)
- e41d322 watcher works, iri injest error on graph modifications though (cedricdcc, Nov 16, 2023)
- e66d915 working injest , no auto (cedricdcc, Nov 16, 2023)
- 6a83d3e auto injest complete (cedricdcc, Nov 16, 2023)
- 4731fe7 small refactoring (cedricdcc, Nov 17, 2023)
- 866c1b8 added rdf2j and refactoring of the graph functions (cedricdcc, Nov 17, 2023)
- 36d73e4 Update graph_functions.py (cedricdcc, Nov 17, 2023)
- 8fa637b deleted / commented out non used imports (cedricdcc, Nov 17, 2023)
- ee74345 performed autopep8 and black on all python files (cedricdcc, Nov 18, 2023)
- fd4879d refactoring of watcher.py , editied templates and graphdb.py function… (cedricdcc, Nov 21, 2023)
- 9f8cfdc beginning of tests (cedricdcc, Nov 21, 2023)
- c916ec5 changed const variables and reverted changes on update context lastmod (cedricdcc, Nov 22, 2023)
- 5f67142 done refactoring + tests made + workflows for autopep8 and black made (cedricdcc, Nov 27, 2023)
- f486db3 changed version for workflows (cedricdcc, Nov 27, 2023)
- 103b51c renaming workflow file + change in python test file to check if actio… (cedricdcc, Nov 27, 2023)
- 3eed9b7 changed python workflow versions to work with arch x64 (cedricdcc, Nov 27, 2023)
- a2c6cf8 attempt 4 at working linting (cedricdcc, Nov 27, 2023)
- c1af6c7 Automated code formatting (github-actions[bot], Nov 27, 2023)
- d982426 last reforctoring mods (cedricdcc, Nov 29, 2023)
- c6246c8 Automated code formatting (github-actions[bot], Nov 29, 2023)
- 9375895 added beginning of dereferencer (cedricdcc, Nov 29, 2023)
- 3fa7ada added dereferencing config and memory (cedricdcc, Nov 30, 2023)
- ccc7e87 Automated python code formatting (github-actions[bot], Nov 30, 2023)
- 04803d5 small updates lwua-ingest and added deref entity runs for orcid and mr (cedricdcc, Dec 1, 2023)
- b4fcec9 Automated python code formatting (github-actions[bot], Dec 1, 2023)
- 29417c6 deleted metadata management for now in search for more favorable system (cedricdcc, Dec 1, 2023)
- f070d16 Automated python code formatting (github-actions[bot], Dec 1, 2023)
- 1b5963d working dereferencer (cedricdcc, Dec 1, 2023)
- 9188dcc fixed linting workflow (cedricdcc, Dec 1, 2023)
- 0f5714a Update derefEntity.py (cedricdcc, Dec 1, 2023)
- dec05f6 wf-update (cedricdcc, Dec 1, 2023)
- cc35c2b Automated python code formatting (github-actions[bot], Dec 1, 2023)
- bbf25ef Revert "Automated code formatting" (cedricdcc, Dec 4, 2023)
- 7cac5e2 Automated python code formatting (github-actions[bot], Dec 4, 2023)
- 19f5a97 no deref (cedricdcc, Dec 4, 2023)
- 21d6243 Merge branch 'cedric_injest_attempt' of https://github.com/cedricdcc/… (cedricdcc, Dec 4, 2023)
- 5e939aa Update linting-python-files.yml (cedricdcc, Jan 17, 2024)
- 1e44ddd Merge branch 'main' into cedric_injest_attempt (cedricdcc, Jan 17, 2024)
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
docker/**/*.sh text eol=lf
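The `.gitattributes` rule above tells Git to treat `docker/**/*.sh` as text and check it out with LF line endings. This matters because a CRLF after the shebang makes the kernel look for an interpreter literally named `bash\r`, so the script fails with a "bad interpreter" error inside the Linux container. A minimal Python sketch, not part of this PR and with hypothetical function names, that reports scripts still containing CRLF endings:

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """True if the raw file contents contain Windows (CRLF) line endings."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CRLF to LF, the normalization `text eol=lf` asks Git to apply."""
    return data.replace(b"\r\n", b"\n")


def find_crlf_scripts(root: Path) -> list[Path]:
    """List shell scripts under <root>/docker that still contain CRLF endings."""
    return sorted(
        p
        for p in (root / "docker").glob("**/*.sh")
        if p.is_file() and has_crlf(p.read_bytes())
    )
```

Run from the repository root as `find_crlf_scripts(Path("."))`. Files that were committed before the attribute existed are only rewritten in the index after `git add --renormalize .`.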
43 changes: 43 additions & 0 deletions .github/workflows/linting-python-files.yml
@@ -0,0 +1,43 @@
name: Python Linting

on:
  pull_request:
    types: [closed]
    paths:
      - 'docker/lwua-ingest/**/*.py'
      - 'docker/lwua-dereferencer/**/*.py'

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Check out source repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.10.6

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install black autopep8
      - name: Run Black
        run: |
          black docker/lwua-ingest/
          black docker/lwua-dereferencer/
      - name: Run autopep8
        run: |
          autopep8 --in-place --aggressive --aggressive --max-line-length 79 --recursive docker/lwua-ingest/
          autopep8 --in-place --aggressive --aggressive --max-line-length 79 --recursive docker/lwua-dereferencer/
      - name: Commit and push changes
        run: |
          git config --global user.name 'cedricdcc'

Review comment (Contributor): ? this auto-lint at server side is odd -- pls check how things were setup in pykg2tbl as part of the client-side commit (so no need for user in config)

Review comment (Contributor): rather remove this one and add some todo to have the linting on the client - possibly with git-hook

          git config --global user.email 'github-actions[bot]@users.noreply.github.com'
          git add -A
          git commit -m "Automated python code formatting" || exit 0
          git push
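The review comments suggest moving formatting to the client, for example with a Git hook, rather than letting CI commit on the author's behalf. A minimal sketch of such a hook in Python, assuming it is saved as `.git/hooks/pre-commit` and made executable; the helper names and the shared line length of 79 are assumptions. It only checks staged files and fails the commit when `black` or `autopep8` would change something. Giving both tools the same limit also avoids them undoing each other, which can happen in the workflow above, where black runs with its default of 88 characters and autopep8 is told 79.

```python
#!/usr/bin/env python3
"""Hypothetical client-side pre-commit hook (e.g. .git/hooks/pre-commit).

Sketch only: checks staged Python files under the two service folders and
blocks the commit if black or autopep8 would change them. Nothing is
reformatted or pushed automatically.
"""
import subprocess

# Folders covered by the linting workflow in this PR.
LINT_PREFIXES = ("docker/lwua-ingest/", "docker/lwua-dereferencer/")
# Assumption: one shared limit so black and autopep8 agree on line length.
MAX_LINE_LENGTH = 79


def select_lint_targets(paths):
    """Keep only staged .py files that live under the linted folders."""
    return [p for p in paths if p.endswith(".py") and p.startswith(LINT_PREFIXES)]


def build_commands(files):
    """Check-only commands; each exits non-zero when a file would change."""
    if not files:
        return []
    length = str(MAX_LINE_LENGTH)
    return [
        ["black", "--check", "--line-length", length, *files],
        ["autopep8", "--diff", "--exit-code", "--aggressive", "--aggressive",
         "--max-line-length", length, *files],
    ]


def staged_files():
    """Paths of files added, copied or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main():
    """Run every check; return 1 if any of them failed, else 0."""
    failed = False
    for cmd in build_commands(select_lint_targets(staged_files())):
        if subprocess.run(cmd).returncode != 0:
            failed = True
    return 1 if failed else 0


# In the real hook file, finish with: raise SystemExit(main())
```

Hooks in `.git/hooks` are not versioned; keeping the script in the repository and pointing `git config core.hooksPath` at its folder is one way to share it with other contributors.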
33 changes: 33 additions & 0 deletions .github/workflows/lwua-ingest-testing.yml
@@ -0,0 +1,33 @@
name: Python Tests

on:
  push:
    paths:
      - 'docker/lwua-ingest/lwua-py/**/*.py'
  pull_request:
    paths:
      - 'docker/lwua-ingest/lwua-py/**/*.py'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Check out source repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.10.6

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install poetry
          cd docker/lwua-ingest/lwua-py
          poetry install
      - name: Run pytest
        run: |
          cd docker/lwua-ingest/lwua-py
          poetry run pytest ./tests/

Review comment (Contributor): better to run this kind of stuff via Makefile --> again pykg2tbl shows how to set it up

Review comment (Contributor): comment still applies
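The reviewer points to a Makefile (as set up in pykg2tbl) so that CI and developers call the same targets instead of repeating commands in workflow YAML. The idea is independent of the tool; a minimal Python sketch of the same task-target pattern, where the `TASKS` table and `run_task` are hypothetical and simply mirror the two workflow steps above:

```python
"""Hypothetical task runner: one place that defines how to install and test
lwua-py, so CI and developers call the same targets (the role a Makefile
would play)."""
import subprocess
from pathlib import Path

# Package folder used by the test workflow above.
PROJECT_DIR = Path("docker/lwua-ingest/lwua-py")

# Each task lists the commands the workflow currently spells out in YAML.
TASKS = {
    "init": [["poetry", "install"]],
    "test": [["poetry", "run", "pytest", "./tests/"]],
}


def run_task(name, runner=subprocess.run):
    """Run a task's commands inside PROJECT_DIR, stopping at the first failure.

    Returns the failing command's exit code, or 0 when all commands succeed.
    """
    if name not in TASKS:
        raise ValueError(f"unknown task {name!r}; choose from {sorted(TASKS)}")
    for cmd in TASKS[name]:
        result = runner(cmd, cwd=PROJECT_DIR)
        if result.returncode != 0:
            return result.returncode
    return 0
```

With `init` and `test` defined once (here or as Makefile targets), each workflow step shrinks to a single call, and changing how tests run no longer requires editing the workflow.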

2 changes: 1 addition & 1 deletion README.md
@@ -9,7 +9,7 @@ Steps:
2. to build the services simply run

```bash
.$ touch .env # make sure you have an .env file
.$ cp dotenv-example .env # make sure you have an .env file
.$ cd docker && docker-compose build # use docker to build the services
```
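The README now copies `dotenv-example` to `.env`, and the compose services read that file through `env_file: ../.env`. The contents of `dotenv-example` are not part of this diff, so the following is only a sketch, assuming simple `KEY=VALUE` lines: it reports keys the example declares but the local `.env` lacks, a common failure after the example file gains a new variable. Both function names are hypothetical.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines into a dict.

    Sketch only: skips blank lines and '#' comments and strips one pair of
    matching quotes around a value. Compose's env_file parser accepts more
    syntax than this.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key] = value
    return env


def missing_keys(example_text, env_text):
    """Keys declared in dotenv-example that the local .env does not define."""
    return sorted(parse_dotenv(example_text).keys() - parse_dotenv(env_text).keys())
```

From the repository root, `missing_keys(Path("dotenv-example").read_text(), Path(".env").read_text())` would list what still needs to be filled in.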

2 changes: 1 addition & 1 deletion data/project.ttl
@@ -13,7 +13,7 @@
schema:creator <https://edmo.seadatanet.org/report/422> .

<https://orcid.org/0000-0003-0663-5907> a schema:Person ;
schema:name "Laurian van Maldghem"^^xsd:string .
schema:name "Laurian van Maldeghem"^^xsd:string .

<https://orcid.org/0000-0002-9648-6484> a schema:Person ;
schema:name "Marc Portier"^^xsd:string .
64 changes: 52 additions & 12 deletions docker/docker-compose.yml
@@ -1,6 +1,20 @@
version: "3.7"
# -----------------------------------------------------------------------
services:
  graphdb:
    build:
      context: ./graphdb
    image: lwua/lwua_graphdb
    container_name: lwua_graphdb
    ports:
      - 7200:7200 # HTTP
    volumes:
      - ../data:/root/graphdb-import/data
    # todo - think about volumes for persistence of data ?
    labels:
      be.vliz.container.project: "LWUA"
      be.vliz.container.group: "services"

  jupyter:
    build:
      context: ./jupyter
@@ -12,21 +26,42 @@ services:
      - graphdb
    volumes:
      - "./jupyter/notebooks:/notebooks"
    env_file:
      - ../.env
    environment:
      - GDB_BASE=http://graphdb:7200/
    labels:
      be.vliz.container.project: "LWUA"
      be.vliz.container.group: "services"

  graphdb:
    # todo consider a local build that also initializes a repository for lwua inside this lwua_graphdb!
    build:
      context: ./graphdb
    image: lwua/lwua_graphdb
    container_name: lwua_graphdb
    ports:
      - 7200:7200 # HTTP
    volumes:
      - ../data:/root/graphdb-import/data
    # todo - think about volumes for persistence of data
  dereferencer:
    build:
      context: ./lwua-dereferencer/
      #args:
    image: lwua/dereferencer
    container_name: dereferencer
    depends_on:
      - graphdb
    volumes:
      - ../data:/data # Store for any input data
      - ../configs:/configs # Store for any input data
      - ../logging:/logging # Store for any input data
    env_file:
      - ../.env
    environment:
      - GDB_BASE=http://graphdb:7200/
      - INGEST_DATA_FOLDER=/data
    # for test / dev -- no restart and single run
    restart: "no"
    command: run
    # towards deploy -- make restart and keep service running -- consequence: use ctrl-c to stop
    # restart: unless-stopped
    # command: start
    links:
      - graphdb
    logging:
      driver: json-file
      options:
        max-size: 10m
    labels:
      be.vliz.container.project: "LWUA"
      be.vliz.container.group: "services"
@@ -37,11 +72,16 @@ services:
      #args:
    image: lwua/lwua_ingest
    container_name: lwua_ingest
    depends_on:
      - graphdb
    volumes:
      - ../data:/data # Store for any input data
      - ../logging:/logging # Store for any input data
    env_file:
      - ../.env
    environment:
      - GDB_BASE=http://graphdb:7200/
      - INGEST_DATA_FOLDER=/data
    # for test / dev -- no restart and single run
    restart: "no"
    command: run
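The comments in the `dereferencer` and `lwua_ingest` services describe two modes: a single pass for test/dev (`restart: "no"`, `command: run`) and a long-running service for deployment (`restart: unless-stopped`, `command: start`). The entrypoints themselves are not part of this diff; a minimal sketch, with a hypothetical `dispatch` function and an assumed polling interval, of how an entrypoint could honor that command:

```python
"""Hypothetical entrypoint dispatch for the compose `command:` values.

run   -> one pass, then exit   (pairs with restart: "no")
start -> repeat until stopped  (pairs with restart: unless-stopped)
"""
import time


def dispatch(command, work, interval_s=60.0, sleep=time.sleep, max_cycles=None):
    """Execute `work` according to `command`; return the number of passes run.

    `interval_s` and `max_cycles` are assumptions; `max_cycles=None` means
    loop forever, which is what a deployed `start` service would do.
    """
    if command == "run":
        work()
        return 1
    if command == "start":
        cycles = 0
        while max_cycles is None or cycles < max_cycles:
            work()
            cycles += 1
            sleep(interval_s)
        return cycles
    raise ValueError(f"unknown command {command!r}; expected 'run' or 'start'")
```

Injecting `sleep` and `max_cycles` keeps the loop testable without waiting. With `restart: unless-stopped`, Docker also restarts the container if the `start` loop crashes, so the loop itself does not need to survive every error.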
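Every service now receives `GDB_BASE=http://graphdb:7200/`, i.e. the GraphDB container addressed by its compose service name. GraphDB follows the RDF4J REST protocol, where a repository's SPARQL query endpoint is `<base>repositories/<id>` and updates go to `<base>repositories/<id>/statements`. A minimal sketch for deriving those URLs; the repository id `lwua23` is an assumption taken from the `REPODIR` path in `init_graphdb.sh`, and the `localhost` fallback is hypothetical:

```python
import os
from urllib.parse import urljoin

# Assumed fallback for running outside compose; inside compose GDB_BASE is set.
DEFAULT_GDB_BASE = "http://localhost:7200/"


def gdb_base(environ=os.environ):
    """GraphDB base URL, forced to end with '/' so urljoin appends to it."""
    base = environ.get("GDB_BASE", DEFAULT_GDB_BASE)
    return base if base.endswith("/") else base + "/"


def sparql_endpoint(repo_id, environ=os.environ):
    """SPARQL query endpoint of a repository: <base>repositories/<id>."""
    return urljoin(gdb_base(environ), f"repositories/{repo_id}")


def statements_endpoint(repo_id, environ=os.environ):
    """Endpoint for SPARQL updates and statement uploads."""
    return sparql_endpoint(repo_id, environ) + "/statements"
```

These are the URLs one would hand to a SPARQL client such as SPARQLWrapper (query and update endpoints respectively). Normalizing the trailing slash matters because `urljoin` replaces the last path segment of a base URL that does not end in `/`.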
42 changes: 1 addition & 41 deletions docker/graphdb/initdb/init_graphdb.sh
@@ -1,44 +1,4 @@
#!/bin/bash -ex

GDBPIDF="/tmp/init-graphdb-serv.pid"
GDBOUTF="/tmp/init-graphdb-out.txt"

start_graphdb(){
  rm -f ${GDBPIDF}
  graphdb -s -p ${GDBPIDF} >${GDBOUTF} 2>&1 &
  sleep 1
}

wait_graphdb(){
  count=0
  while ! nc -z localhost 7200; do
    count=$((count+1))
    if [ $count -gt 1000 ]; then
      return
    fi
    # else
    sleep 0.1 # wait for 1/10 of the second before check again
  done
}

stop_graphdb(){
  kill -9 $(cat ${GDBPIDF})
  sleep 1
  rm -f ${GDBPIDF}
  rm -f ${GDBOUTF}
}

createdb() {
  curl -X POST http://localhost:7200/rest/repositories -H 'Content-Type: multipart/form-data' -F …
}


# one could do it like this
#start_graphdb
#wait_graphdb
#createdb
#wait_configdb
#stop_graphdb
#!/bin/bash

# but actually this just works too:
REPODIR="/opt/graphdb/home/data/repositories/lwua23"
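The removed `wait_graphdb` helper polled port 7200 with `nc -z` (about 1000 checks, 0.1 s apart, so roughly 100 s) and, when it gave up, returned the status of the last test it ran, i.e. success. The same readiness concern applies to `lwua_ingest` and `dereferencer`: `depends_on` only orders container start-up and does not wait for GraphDB to accept connections unless a healthcheck with `condition: service_healthy` is used. A minimal Python sketch of a readiness wait that reports a timeout instead of hiding it; the function name and defaults are assumptions:

```python
import socket
import time


def wait_for_port(host, port, timeout_s=100.0, poll_s=0.1):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once the port accepts connections, or False after
    `timeout_s` seconds, so the caller can fail loudly instead of carrying on.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening; close at once.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:  # refused, unreachable or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

Inside the compose network the services would call it as `wait_for_port("graphdb", 7200)` before their first ingest. An open port only shows that GraphDB is listening, not that the `lwua23` repository is ready.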
8 changes: 4 additions & 4 deletions docker/jupyter/Dockerfile
@@ -1,6 +1,6 @@
FROM jupyter/base-notebook

RUN pip install pykg2tbl

VOLUME /notebooks
WORKDIR /notebooks
WORKDIR /notebooks

COPY ./requirements.txt /requirements.txt
RUN pip install -r /requirements.txt