add gui and pyinstaller package on release. version to 3.0
gcerretani committed Oct 14, 2024
1 parent b6b2afc commit b9e54ac
Showing 6 changed files with 317 additions and 78 deletions.
48 changes: 48 additions & 0 deletions .github/workflows/pyinstaller.yml
@@ -0,0 +1,48 @@
name: Build executables

on:
push:
tags:
- 'v*'

jobs:
build:
name: Build executables

strategy:
matrix:
runs-on: ['ubuntu-20.04', 'macos-11', 'windows-2019']
runs-on: ${{ matrix.runs-on }}

steps:
- name: Checkout code
uses: actions/checkout@v2

- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: 3.7

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install pyinstaller
- name: Build executable with PyInstaller
run: |
pyinstaller --onefile antenati_gui.py
- name: Upload artifact for Windows
if: runner.os == 'Windows'
uses: actions/upload-artifact@v3
with:
name: antenati_gui_windows.exe
path: dist/antenati_gui.exe

- name: Upload artifact for macOS and Ubuntu
if: runner.os != 'Windows'
uses: actions/upload-artifact@v3
with:
name: antenati_gui_${{ matrix.runs-on }}
path: dist/antenati_gui
12 changes: 10 additions & 2 deletions .github/workflows/pythonapp.yml
@@ -4,20 +4,28 @@ on: [push]

jobs:
test:
name: Test

strategy:
matrix:
runs-on: ['ubuntu-20.04', 'macos-11', 'windows-2019']

runs-on: ${{ matrix.runs-on }}

steps:
- uses: actions/checkout@v2
- name: Checkout code
uses: actions/checkout@v2

- name: Set up Python 3.7
uses: actions/setup-python@v2
with:
python-version: 3.7

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements.txt
- name: Download test
run: |
python antenati.py "https://www.antenati.san.beniculturali.it/ark:/12657/an_ua19944535/w9DWR8x"
40 changes: 4 additions & 36 deletions PRINCIPIANTI.md
@@ -1,39 +1,7 @@
# Instructions for beginners

## Windows
### Install Python
You need Python 3, version 3.6 or later. The quickest way is through the Microsoft Store. You can open it and search for "Python 3.10", or for simplicity [click here](https://www.microsoft.com/it-it/p/python-310/9pjpw5ldxlz5).
Download the GUI version from the artifacts of the latest release. Versions for Windows, Linux and macOS are available.

### Download this repository
You can download the content of this repository from [here](https://github.com/gcerretani/antenati/archive/refs/heads/master.zip). Extract its content, which should be named **antenati-master**, somewhere, for example in the Documents folder.

### Open a terminal
Open a terminal. PowerShell is the simplest and most modern option: search for "Windows PowerShell" in the Start menu and open it. To change the working directory to the one where you downloaded the content of this repository, run:

cd $env:HOMEPATH\Documents\antenati-master

Check that you are in the right folder. Run:

ls

and check that the content of this repository is there.

### Install the dependencies
Then, run:

pip install -r requirements.txt

It should take a few seconds. This has to be done only the first time, and it installs the dependencies of this program. You can skip this step on later runs.

### Go!
Now you are ready. Try downloading an album by copying the URL of the Portale Antenati page after `python3 antenati.py`. Assuming you are interested in the people born in Viareggio in 1807, you should run something like this:

python3 antenati.py https://antenati.cultura.gov.it/ark:/12657/an_ua19944535/w9DWR8x

Have fun!

## Linux
TODO

## macos
TODO
0. Launch the executable!
1. As the URL, enter something like https://antenati.cultura.gov.it/ark:/12657/an_ua19944535/w9DWR8x.
2. Then select a destination folder. The program will download the content into a subfolder with a name like *archivio-di-stato-di-lucca-stato-civile-napoleonico-viareggio-1807-nati-19944549*.
24 changes: 19 additions & 5 deletions README.md
@@ -3,18 +3,32 @@ A tool to download data from the *[Portale Antenati](http://antenati.cultura.gov

Since the website tends to be pretty slow in the evening, we present a script to help the retrieval of the documents for your family tree. The script allows you to download **all the images of any archive at the same time**, without any human action. Just launch the script and have a coffee while it downloads all the stuff for you.

## Requirements
## GUI version

Just get the executable from the release artifacts, and have fun!

#### Example:
In the website, navigate to the archive you want to download. For example, for the people born in Viareggio in 1807 you should find the page:

[https://antenati.cultura.gov.it/ark:/12657/an_ua19944535/w9DWR8x](https://antenati.cultura.gov.it/ark:/12657/an_ua19944535/w9DWR8x)

Then, copy the link to the first page, and paste it in the Archive URL field of the window. Then, specify a destination folder:
the results will be placed there, in a new subfolder named *archivio-di-stato-di-lucca-stato-civile-napoleonico-viareggio-1807-nati-19944549*.
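
As a rough illustration of how such a subfolder name can be derived, the sketch below joins the archive context, year, typology and archive ID and turns the result into a slug. It is only a sketch: `simple_slug` is a hypothetical standard-library stand-in for the `slugify` call used in `antenati.py`, and the metadata values are assumed here rather than read from a real IIIF manifest.

```python
import re
import unicodedata


def simple_slug(text: str) -> str:
    """Hypothetical stdlib stand-in for slugify(): ASCII-fold, lowercase, hyphenate."""
    ascii_text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r'[^a-z0-9]+', '-', ascii_text.lower()).strip('-')


# Assumed metadata values; the script reads the real ones from the IIIF manifest.
archive_context = 'Archivio di Stato di Lucca > Stato civile napoleonico > Viareggio'
archive_year = '1807'
archive_typology = 'Nati'
archive_id = '19944549'

dirname = simple_slug(f'{archive_context}-{archive_year}-{archive_typology}-{archive_id}')
print(dirname)
```

With these assumed values the printed name matches the subfolder quoted above.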

## CLI version

### Requirements
The software is written in Python 3 and tested with Python 3.7. On Windows the version on the Microsoft Store is fine, on Linux use your distribution package manager.

## Usage
### Usage
Open your preferred terminal and change directory to where you've extracted the content of this repo. Then execute the following commands.

### Install the dependencies
#### Install the dependencies
The first time you will have to install the dependencies:

pip install -r requirements.txt

### Run
#### Run
To download the images of a gallery, execute the script passing the URL of a collection you want to download as argument:

python3 antenati.py <URL of the album>
@@ -23,7 +37,7 @@ The files will be downloaded to a new folder named as *ARCHIVE-PLACE-YEAR-TYPE-I

python3 antenati.py -h

### Example:
#### Example:
In the website, navigate to the archive you want to download. For example, for the people born in Viareggio in 1807 you should find the page:

[https://antenati.cultura.gov.it/ark:/12657/an_ua19944535/w9DWR8x](https://antenati.cultura.gov.it/ark:/12657/an_ua19944535/w9DWR8x)
82 changes: 47 additions & 35 deletions antenati.py
@@ -6,35 +6,49 @@
__author__ = 'Giovanni Cerretani'
__copyright__ = 'Copyright (c) 2022, Giovanni Cerretani'
__license__ = 'MIT License'
__version__ = '2.5'
__version__ = '3.0'
__contact__ = 'https://gcerretani.github.io/antenati/'

from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from email.message import EmailMessage
from json import loads
from mimetypes import guess_extension
from os import chdir, mkdir, path
from pathlib import Path
from re import findall, search
from typing import Any, Dict, List
from typing import Any, Callable, Dict, List, Optional

from certifi import where
from urllib3 import HTTPResponse, HTTPSConnectionPool, PoolManager, make_headers
from urllib3 import HTTPSConnectionPool, PoolManager, make_headers
from click import echo, confirm
from slugify import slugify
from humanize import naturalsize
from tqdm import tqdm


_UpdaterType = Optional[Callable[[], None]]

@dataclass
class ProgressBar:
set_total: Callable[[int], None]
update: Callable[[], None]


DEFAULT_N_THREADS: int = 8
DEFAULT_N_CONNECTIONS: int = 4


class AntenatiDownloader:
"""Downloader class"""

url: str
archive_id: str
manifest: Dict[str, Any]
canvases: List[Dict[str, Any]]
dirname: str
dirname: Path
gallery_length: int
gallery_size: int

def __init__(self, url: str, first: int, last: int):
self.url = url
@@ -43,7 +57,6 @@ def __init__(self, url: str, first: int, last: int):
self.canvases = self.manifest['sequences'][0]['canvases'][first:last]
self.dirname = self.__generate_dirname()
self.gallery_length = len(self.canvases)
self.gallery_size = 0

@staticmethod
def __http_headers() -> Dict[str, Any]:
@@ -87,7 +100,7 @@ def __get_iiif_manifest(url: str) -> Dict[str, Any]:
cert_reqs='CERT_REQUIRED',
ca_certs=where()
)
http_reply: HTTPResponse = pool.request('GET', url)
http_reply = pool.request('GET', url)
if http_reply.status != 200:
raise RuntimeError(f'{url}: HTTP error {http_reply.status}')
content_type = AntenatiDownloader.__parse_header(http_reply.headers['Content-Type'])
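
The body of `__parse_header` is not part of this diff. Given the `EmailMessage` and `guess_extension` imports, one plausible approach is sketched below; `parse_content_type` is a hypothetical name and the header value is an assumed example, so this should not be read as the actual implementation.

```python
from email.message import EmailMessage
from mimetypes import guess_extension


def parse_content_type(header: str) -> EmailMessage:
    """Hypothetical helper: let the email package parse a Content-Type header."""
    msg = EmailMessage()
    msg['Content-Type'] = header
    return msg


# Assumed example header, similar to what a manifest request might return.
parsed = parse_content_type('application/json; charset=UTF-8')
media_type = parsed.get_content_type()    # media type without parameters
charset = parsed.get_content_charset()    # charset parameter, lowercased
extension = guess_extension(media_type)   # file extension guessed from the media type
print(media_type, charset, extension)
```

Parsing through `EmailMessage` avoids splitting the header by hand, so parameters such as `charset` do not leak into the media type comparison.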
@@ -112,12 +125,12 @@ def __get_metadata_content(self, label: str) -> str:
except StopIteration as exc:
raise RuntimeError(f'Cannot get {label} from manifest') from exc

def __generate_dirname(self) -> str:
def __generate_dirname(self) -> Path:
"""Generate directory name from info in IIIF manifest"""
archive_context = self.__get_metadata_content('Contesto archivistico')
archive_year = self.__get_metadata_content('Titolo')
archive_typology = self.__get_metadata_content('Tipologia')
return slugify(f'{archive_context}-{archive_year}-{archive_typology}-{self.archive_id}')
return Path(slugify(f'{archive_context}-{archive_year}-{archive_typology}-{self.archive_id}'))

def print_gallery_info(self) -> None:
"""Print IIIF gallery info"""
@@ -127,11 +140,16 @@ def print_gallery_info(self) -> None:
print(f'{label:<25}{value}')
print(f'{self.gallery_length} images found.')

def check_dir(self) -> None:
def check_dir(self, dirname: Optional[str] = None, interactive: bool = True) -> None:
"""Check if directory already exists and chdir to it"""
if dirname is not None:
self.dirname = Path(dirname) / self.dirname
print(f'Output directory: {self.dirname}')
if path.exists(self.dirname):
echo(f'Directory {self.dirname} already exists.')
msg = f'Directory {self.dirname} already exists.'
if not interactive:
raise RuntimeError(msg)
echo(msg)
confirm('Do you want to proceed?', abort=True)
else:
mkdir(self.dirname)
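
The new `interactive` flag lets a caller without a terminal, such as a GUI, get an exception instead of a confirmation prompt. A minimal sketch of that behavior follows, assuming a hypothetical free function `prepare_dir` in place of the `check_dir` method and a temporary directory as the destination.

```python
import tempfile
from pathlib import Path
from typing import Optional


def prepare_dir(dirname: Path, base: Optional[str] = None, interactive: bool = True) -> Path:
    """Hypothetical stand-in for check_dir: create the folder or complain if it exists."""
    target = Path(base) / dirname if base is not None else dirname
    if target.exists():
        msg = f'Directory {target} already exists.'
        if not interactive:
            # Non-interactive callers cannot answer a prompt, so fail loudly.
            raise RuntimeError(msg)
        print(msg)  # an interactive caller would ask for confirmation here
    else:
        target.mkdir()
    return target


with tempfile.TemporaryDirectory() as base:
    created = prepare_dir(Path('album'), base, interactive=False)
    created_under_base = created.parent == Path(base)
    try:
        prepare_dir(Path('album'), base, interactive=False)
        second_call_failed = False
    except RuntimeError:
        second_call_failed = True
```

The first call creates the folder under the chosen base directory; the second call finds it already present and raises.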
@@ -140,7 +158,7 @@ def check_dir(self) -> None:
@staticmethod
def __thread_main(pool: HTTPSConnectionPool, canvas: Dict[str, Any]) -> int:
url = canvas['images'][0]['resource']['@id']
http_reply: HTTPResponse = pool.request('GET', url)
http_reply = pool.request('GET', url)
if http_reply.status != 200:
raise RuntimeError(f'{url}: HTTP error {http_reply.status}')
content_type = AntenatiDownloader.__parse_header(http_reply.headers['Content-Type'])
@@ -169,29 +187,23 @@ def __pool(maxsize: int) -> HTTPSConnectionPool:
ca_certs=where()
)

@staticmethod
def __progress(total: int) -> tqdm:
return tqdm(total=total, unit='img')
def run_cli(self, n_workers: int, n_connections: int) -> int:
"""Main function spanning run function in a thread pool, with tqdm progress bar"""
with tqdm(unit='img') as progress:
progress_bar = ProgressBar(progress.reset, progress.update)
return self.run(n_workers, n_connections, progress_bar)

def run(self, n_workers: int, n_connections: int) -> None:
def run(self, n_workers: int, n_connections: int, progress: ProgressBar) -> int:
"""Main function spanning run function in a thread pool"""
with self.__executor(n_workers) as executor, self.__pool(n_connections) as pool:
future_img = {executor.submit(self.__thread_main, pool, i): i for i in self.canvases}
with self.__progress(self.gallery_length) as progress:
for future in as_completed(future_img):
progress.update()
canvas = future_img[future]
label = canvas['label']
try:
size = future.result()
except RuntimeError as exc:
progress.write(f'{label} error ({exc})')
else:
self.gallery_size += size

def print_summary(self) -> None:
"""Print summary"""
print(f'Done. Total size: {naturalsize(self.gallery_size)}')
progress.set_total(self.gallery_length)
gallery_size = 0
for future in as_completed(future_img):
progress.update()
size = future.result()
gallery_size += size
return gallery_size
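
The `ProgressBar` dataclass decouples `run` from tqdm: any front end can pass its own pair of callbacks. The sketch below shows the same pattern with the standard library only; `fake_download` and the recorded `events` list are hypothetical stand-ins for the real per-image download and for a GUI progress widget.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProgressBar:
    set_total: Callable[[int], None]
    update: Callable[[], None]


def fake_download(item: int) -> int:
    """Hypothetical stand-in for a per-image download; returns a byte count."""
    return item * 100


def run(items: List[int], n_workers: int, progress: ProgressBar) -> int:
    """Submit all downloads, report progress via callbacks, return the total size."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(fake_download, i) for i in items]
        progress.set_total(len(items))
        total_size = 0
        for future in as_completed(futures):
            progress.update()
            total_size += future.result()
    return total_size


# A front end only has to supply two callables; here they just record calls.
events: List[str] = []
bar = ProgressBar(
    set_total=lambda n: events.append(f'total={n}'),
    update=lambda: events.append('tick'),
)
total = run([1, 2, 3], n_workers=2, progress=bar)
```

Both callbacks are invoked from the thread that iterates over `as_completed`, not from the worker threads, which keeps the progress reporting itself sequential.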


def main() -> None:
@@ -204,8 +216,8 @@ def main() -> None:
formatter_class=ArgumentDefaultsHelpFormatter
)
parser.add_argument('url', metavar='URL', type=str, help='url of the gallery page')
parser.add_argument('-n', '--nthreads', type=int, help='max n. of threads', default=8)
parser.add_argument('-c', '--nconn', type=int, help='max n. of connections', default=4)
parser.add_argument('-n', '--nthreads', type=int, help='max n. of threads', default=DEFAULT_N_THREADS)
parser.add_argument('-c', '--nconn', type=int, help='max n. of connections', default=DEFAULT_N_CONNECTIONS)
parser.add_argument('-f', '--first', type=int, help='first image to download', default=0)
parser.add_argument('-l', '--last', type=int, help='first image NOT to download', default=None)
parser.add_argument('-v', '--version', action='version', version=__version__)
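
The `--first` and `--last` options feed the slice `canvases[first:last]` in `__init__`, so `--last` names the first image that is not downloaded and its default of `None` means "until the end". A small sketch of that slicing, with a hypothetical `select_pages` helper and made-up page labels:

```python
from typing import List, Optional


def select_pages(canvases: List[str], first: int = 0, last: Optional[int] = None) -> List[str]:
    """Hypothetical helper mirroring the canvases[first:last] slice."""
    return canvases[first:last]


pages = [f'page-{i}' for i in range(5)]     # made-up labels for illustration
all_pages = select_pages(pages)             # defaults: every page
middle = select_pages(pages, first=1, last=3)
print(middle)
```

Because Python slices exclude the end index, `first=1, last=3` selects two pages, not three.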
@@ -221,10 +233,10 @@ def main() -> None:
downloader.check_dir()

# Run
downloader.run(args.nthreads, args.nconn)
gallery_size = downloader.run_cli(args.nthreads, args.nconn)

# Print summary
downloader.print_summary()
print(f'Done. Total size: {naturalsize(gallery_size)}')


if __name__ == '__main__':