Coins parser

A simple parser for euro coin images.

Context

For a deep learning project, we need a data set covering every euro coin. The main idea is to generate many different photographs in Blender (varying the lighting, camera angles, backgrounds, focus distances, etc.). To do this we need to build every coin dynamically in Blender, which in turn requires textures for all of them. That is why this project scrapes all these images.

The sum of the monetary values of all euro coins is €137.92. The sum of the commemorative €2 coins is €872. This does not take into account their real value on the numismatic market, where a single coin can be worth several hundred euros.

Description

This script scrapes image URLs from different websites and downloads the images. Some scrapers are already implemented, but you can easily add your own (see Add a scraper).

Currently, there are two scrapers implemented:

Images are downloaded in the following folder structure:

{root}/coins/{scraperDetails}/{countryCode}_{value}_{particularity}.{imageExtension}

Where:

{root} is the root folder given as an argument (cf. here)
{countryCode} is the two-letter country code of the coin (e.g. fr for France, ad for Andorra, etc.)
{scraperDetails} can be null, depending on the scraper settings (cf. Add a scraper); if null, images are downloaded in {root}/coins/
{value} is the value of the coin (e.g. 1euro, 2cents, etc.), cf. this list
{particularity} is the particularity of the coin (e.g. 2019, 2018, 2017, etc.), or null if there is only one coin for this country (cf. here)
{imageExtension} is the extension of the image (jpg, png, etc.) from the scraped website.

Examples of images:

./coins/va_50cents_2017.jpg
./coins/lv_1euro.jpg
./coins/va_10cents_SedeVacante.jpg
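
The naming scheme above can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the helper name `build_image_path` and its parameters are hypothetical, and it assumes that a null sub-folder or particularity is simply left out of the path.

```python
from pathlib import PurePosixPath
from typing import Optional


def build_image_path(
    root: str,
    country_code: str,
    value: str,
    image_extension: str,
    particularity: Optional[str] = None,
    scraper_details: Optional[str] = None,
) -> str:
    """Return the target path of a coin image (hypothetical helper)."""
    folder = PurePosixPath(root) / "coins"
    # Assumption: a null sub-folder means the image goes directly in {root}/coins/.
    if scraper_details:
        folder = folder / scraper_details
    # Assumption: a null particularity is omitted together with its underscore.
    parts = [country_code, value]
    if particularity:
        parts.append(particularity)
    # Accept the extension with or without a leading dot.
    extension = image_extension.lstrip(".")
    return str(folder / f"{'_'.join(parts)}.{extension}")


print(build_image_path("images", "va", "50cents", "jpg", "2017"))
print(build_image_path("images", "lv", "1euro", ".jpg"))
```

With a root of `images`, the two calls give `images/coins/va_50cents_2017.jpg` and `images/coins/lv_1euro.jpg`.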

Installation

It depends on which part of the script you want to run. If you already have the JSON files and don't want to run the scraping part, you can simply run:

pip install -r requirements-downloader.txt

Otherwise, you also need to install Playwright:

pip install -r requirements-scraper.txt

requirements-scraper.txt includes requirements-downloader.txt, so you don't need to install both.

Usages

If, for example, you want to provide the root folder argument, you can do it like this:

python ./main.py -r ./images

Arguments

This is the list of different arguments that can be passed to the command.

short	long	default	short description
-r	--root	`"./images"`	root folder where images will be downloaded, and JSON file created
-s	--scrape	`false`	if true, no image download, only JSON scraping
-h	--help		display help
-d	--debug	`false`	if true, debug mode (verbose logging)
-q	--quiet	`false`	if true, quiet mode (no output)

`-r`, `--root`

This is the root folder that contains the JSON file. All images are downloaded to {root}/coins/. If this argument is not provided, the default value is read from src/constants.py (cf. DEFAULT_ROOT_FOLDER).

`-s`, `--scrape`

If true, images are not downloaded; only the JSON scraping is done. If a JSON file ({root}/{json}) already exists, it will be overwritten.
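
For illustration, the arguments listed above could be declared with `argparse` roughly like this. It is only a sketch: it assumes the three boolean options are plain on/off switches, and `build_parser` is a hypothetical name, not necessarily what main.py uses. `argparse` adds `-h`/`--help` on its own.

```python
import argparse

# Assumed to mirror DEFAULT_ROOT_FOLDER in src/constants.py.
DEFAULT_ROOT_FOLDER = "./images"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Scrape and download euro coin images.")
    parser.add_argument("-r", "--root", default=DEFAULT_ROOT_FOLDER,
                        help="root folder where images are downloaded and the JSON file is created")
    parser.add_argument("-s", "--scrape", action="store_true",
                        help="only scrape to JSON, do not download images")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="debug mode (verbose logging)")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="quiet mode (no output)")
    return parser


args = build_parser().parse_args(["-r", "./images", "-s"])
print(args.root, args.scrape, args.debug, args.quiet)  # ./images True False False
```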

Add a scraper

I didn't have time to write docs about this, but you can see examples in ./src/scrapers/.

Your classes have to inherit from Scraper and implement the scraper method. The constructor has to contain this line: super().__init__(args, logger, NAME, BASE_URL), where:

args is the set of arguments passed to the command (cf. here)
logger is the logger from the app (main.py)
NAME is the name of the scraper. It is compulsory and is used to create the JSON file.
BASE_URL is the base URL of the website.

The method has to return a dictionary with the following keys:

key	type	description
`countryCode`	`string`	country code in two letters, cf. this list
`value`	`string`	value of the coin, cf. this list
`url`	`string`	URL of the coin
`particularity`	`string` or `null`	particularity of the coin, cf. this list
`imageExtension`	`string` or `null`	extension of the image
`special_path`	`string` or `null`	special path of the image

The special path is resolved inside the root folder. If it is null (default), the image is downloaded in {root}/coins/. If it is not null, the image is downloaded in {root}/coins/{special_path}/.
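
A custom scraper might then look roughly like the sketch below. The `Scraper` class here is only a stand-in so the example runs on its own (the real base class is in the repository), the method name `scrape` is an assumption, and `ExampleScraper`, `NAME`, `BASE_URL` and the returned values are made up for illustration.

```python
import logging
from typing import Any, Dict, Optional


class Scraper:
    """Stand-in for the project's base class, reduced to what the text describes."""

    def __init__(self, args: Any, logger: logging.Logger, name: str, base_url: str) -> None:
        self.args = args
        self.logger = logger
        self.name = name
        self.base_url = base_url

    def scrape(self) -> Dict[str, Optional[str]]:
        raise NotImplementedError


NAME = "example"  # hypothetical scraper name
BASE_URL = "https://example.com"  # hypothetical website


class ExampleScraper(Scraper):
    def __init__(self, args: Any, logger: logging.Logger) -> None:
        # super() already binds the instance, so self is not passed again.
        super().__init__(args, logger, NAME, BASE_URL)

    def scrape(self) -> Dict[str, Optional[str]]:
        # A real scraper would fetch and parse pages here; this returns fixed data.
        self.logger.debug("scraping %s", self.base_url)
        return {
            "countryCode": "va",
            "value": "50cents",
            "url": f"{self.base_url}/va_50cents_2017.jpg",
            "particularity": "2017",
            "imageExtension": "jpg",
            "special_path": None,  # None: download directly in {root}/coins/
        }


scraper = ExampleScraper(args=None, logger=logging.getLogger("coins"))
print(scraper.scrape()["value"])  # 50cents
```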

JSON file

When data is scraped, a new JSON file is created containing data for each image. It must respect the schema defined in ./schema.json.

Example of JSON file:

[
    {
        "countryCode": "va",
        "value": "50cents",
        "particularity": "2017",
        "url": "https://www.ecb.europa.eu/euro/coins/html/va/50c_2017.en.html"
    }
]

Where:

field	type	description
`countryCode`	`string`	country code in two letters, cf. this list
`value`	`string`	value of the coin, cf. this list
`particularity`	`string` or `null`	particularity of the coin, cf. here
`url`	`uri` (`string`)	URL of the scraped website
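
As a rough illustration of the field table above, the following sketch checks a JSON document by hand using only the standard library. It does not read ./schema.json and is not a replacement for real schema validation; `check_entries` is a hypothetical helper, and the rules it applies are only those listed in the table.

```python
import json
from typing import List


def check_entries(raw: str) -> List[str]:
    """Return the problems found in a JSON document (empty list if none)."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        return ["top-level value must be an array"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: must be an object")
            continue
        # countryCode, value and url must all be strings.
        for field in ("countryCode", "value", "url"):
            if not isinstance(entry.get(field), str):
                problems.append(f"entry {index}: '{field}' must be a string")
        # particularity may be a string or null (or absent, by assumption).
        particularity = entry.get("particularity")
        if particularity is not None and not isinstance(particularity, str):
            problems.append(f"entry {index}: 'particularity' must be a string or null")
    return problems


sample = """
[
    {
        "countryCode": "va",
        "value": "50cents",
        "particularity": "2017",
        "url": "https://www.ecb.europa.eu/euro/coins/html/va/50c_2017.en.html"
    }
]
"""
print(check_entries(sample))  # []
```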

Special coins

Some countries have had a single coin design since 2002, while others have had several. For example, Vatican City has a new coin for each pope. For these countries, the particularity is saved; it can be a year or a name. It is null if there is only one coin.

List of countries

You can find the list of countries here.

List of coins

You can find the list of all euro coins here.

Regular coins

coin values
`2euro`
`1euro`
`50cents`
`20cents`
`10cents`
`5cents`
`2cents`
`1cent`
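
If the face value is needed as a number, the strings above can be converted to cents. This is a small sketch: `value_to_cents` is a hypothetical helper, not part of the script, and it assumes every value follows the `<number>euro`, `<number>cents` or `<number>cent` pattern shown in the list.

```python
import re

REGULAR_VALUES = ["2euro", "1euro", "50cents", "20cents", "10cents", "5cents", "2cents", "1cent"]


def value_to_cents(value: str) -> int:
    """Convert a value such as '2euro' or '50cents' to a number of cents."""
    match = re.fullmatch(r"(\d+)(euro|cents?)", value)
    if match is None:
        raise ValueError(f"unknown coin value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * (100 if unit == "euro" else 1)


# One full set of regular coins adds up to 388 cents, i.e. 3.88 euros.
print(sum(value_to_cents(value) for value in REGULAR_VALUES))  # 388
```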

photonsquid/Recoinize-scraper