Showing 9 changed files with 157 additions and 26 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
name: "Build and upload Docker image for releases" | ||
|
||
on: | ||
push: | ||
tags: ["*"] | ||
workflow_dispatch: | ||
|
||
jobs: | ||
build_and_push_docker_image: | ||
name: "Build Docker Image" | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: "Checkout" | ||
uses: actions/checkout@v4 | ||
|
||
- name: "Set up Docker Buildx" | ||
uses: docker/setup-buildx-action@v3 | ||
|
||
- name: "Login to Docker Hub 🐳" | ||
uses: docker/login-action@v3 | ||
with: | ||
username: ${{ secrets.DOCKER_USERNAME }} | ||
password: ${{ secrets.DOCKER_PASSWORD }} | ||
|
||
- name: "Add Docker metadata" | ||
id: meta | ||
uses: docker/metadata-action@v5 | ||
with: | ||
images: | | ||
timdanaos/app | ||
tags: | | ||
type=ref,event=tag | ||
type=ref,event=branch | ||
type=sha | ||
- name: "Publish Docker image" | ||
uses: docker/build-push-action@v5 | ||
with: | ||
context: . | ||
tags: | | ||
${{ steps.meta.outputs.tags }} | ||
push: true | ||
labels: ${{ steps.meta.outputs.labels }} | ||
cache-from: type=gha | ||
cache-to: type=gha,mode=max | ||
if: startsWith(github.ref, 'refs/tags/') |
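
The three `tags` rules passed to `docker/metadata-action` can be approximated in a few lines of Python. This is only a rough sketch of the documented default behaviour, not the action's actual implementation: the function name `sketch_docker_tags` is hypothetical, and the 7-character `sha-` prefix format and the replacement of `/` with `-` in branch names are assumptions based on the action's defaults.

```python
from typing import List


def sketch_docker_tags(image: str, ref: str, sha: str) -> List[str]:
    """Approximate the image tags produced by the workflow's metadata rules.

    Hypothetical helper for illustration only; it mirrors the rules
    type=ref,event=tag / type=ref,event=branch / type=sha.
    """
    tags = []
    if ref.startswith("refs/tags/"):
        # type=ref,event=tag: use the Git tag name as the image tag.
        tags.append(ref[len("refs/tags/"):])
    elif ref.startswith("refs/heads/"):
        # type=ref,event=branch: use the branch name (assumed: "/" becomes "-").
        tags.append(ref[len("refs/heads/"):].replace("/", "-"))
    # type=sha: assumed default is the "sha-" prefix plus a 7-character short SHA.
    tags.append(f"sha-{sha[:7]}")
    return [f"{image}:{tag}" for tag in tags]


if __name__ == "__main__":
    for tag in sketch_docker_tags("timdanaos/app", "refs/tags/v1.0.0", "0123456789abcdef"):
        print(tag)
```

Under these assumptions, a push of tag `v1.0.0` would yield one image tag named after the Git tag and one named after the short commit SHA.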
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
""" | ||
Oddpub is being actively developed, whereas rtransparent has stagnated. | ||
Oddpub implements parallelism, and its interface does not easily allow working | ||
with objects in memory, so we use its built-in parallelism to reduce IO overhead. | ||
The alternative would be to load the PDF file into memory (pdftools::pdf_data) | ||
and then pass that into oddpub private functions. This would make it easier to | ||
manage the parallelism, troubleshoot, and define the interface, but partially | ||
reinvents the wheel. | ||
""" | ||
|
||
import os | ||
from pathlib import Path | ||
|
||
import rpy2.robjects as ro | ||
from rpy2.robjects import pandas2ri | ||
from rpy2.robjects.packages import importr | ||
|
||
from osm.config import osm_config | ||
|
||
oddpub = importr("oddpub") | ||
future = importr("future") | ||
ro.r(f'Sys.setenv(VROOM_CONNECTION_SIZE = "{osm_config.vroom_connection_size}")') | ||
|
||
|
||
def oddpub_pdf_conversion( | ||
pdf_dir: Path, text_dir: Path, workers: int = len(os.sched_getaffinity(0)) | ||
): | ||
future.plan(future.multisession, workers=workers) | ||
oddpub.pdf_convert(str(pdf_dir), str(text_dir)) | ||
|
||
|
||
def oddpub_metric_extraction( | ||
text_dir: Path, workers: int = len(os.sched_getaffinity(0)) | ||
): | ||
future.plan(future.multisession, workers=workers) | ||
pdf_sentences = oddpub.pdf_load(f"{text_dir}/") | ||
open_data_results = oddpub.open_data_search(pdf_sentences) | ||
with (ro.default_converter + pandas2ri.converter).context(): | ||
metrics = ro.conversion.get_conversion().rpy2py(open_data_results) | ||
|
||
return metrics |
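
Both functions above default `workers` to `len(os.sched_getaffinity(0))`, which is evaluated once at import time and is only available on some platforms (notably Linux). A minimal sketch of a more portable default follows; the helper name `default_workers` is hypothetical and not part of this commit.

```python
import os


def default_workers() -> int:
    """Return the number of CPUs usable by this process, with a portable fallback.

    Hypothetical helper, not part of the original module.
    """
    # os.sched_getaffinity is missing on platforms such as macOS and Windows.
    sched_getaffinity = getattr(os, "sched_getaffinity", None)
    if sched_getaffinity is not None:
        return len(sched_getaffinity(0))
    # os.cpu_count() may return None, so fall back to a single worker.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(default_workers())
```

One way to use it would be to declare `workers: int | None = None` in the function signatures and call `default_workers()` inside the body when `workers` is `None`, so the value is computed at call time rather than at import.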