Deploy CCC

Dependencies

Replace the content of bee/conf.py with the content of bee/conf_local.py
Replace the content of ccc/conf_spacin.py with the content of ccc/conf_spacin_local.py
Shell (from ccc/scripts): python3 -m script.ccc.run_bee. This creates a folder called test inside the scripts folder.
OUTPUT JSON: scripts/test/share/ref/todo
ERRORS: scripts/test/index/ref/issue
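
The two configuration swaps above can be scripted. The following is a minimal sketch, assuming each local variant sits next to its target file and is named by appending _local to the file stem, as in the two cases listed. The helper name use_local_conf is hypothetical and is not part of CCC.

```python
import shutil
from pathlib import Path


def use_local_conf(conf_path):
    """Overwrite conf_path with its sibling '<stem>_local<suffix>' file.

    Hypothetical helper, not part of CCC. Returns the path of the local
    file that was copied.
    """
    conf_path = Path(conf_path)
    local_path = conf_path.with_name(conf_path.stem + "_local" + conf_path.suffix)
    if not local_path.is_file():
        raise FileNotFoundError(local_path)
    # The previous content of conf_path is lost; back it up first if needed.
    shutil.copyfile(local_path, conf_path)
    return local_path


# Example usage (paths relative to the folder that holds bee/ and ccc/):
# use_local_conf("bee/conf.py")
# use_local_conf("ccc/conf_spacin.py")
```

The copy overwrites the target without keeping a backup, so keep the original under version control or copy it aside if you need it later.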

RUN SPACIN LOCALLY

INPUT JSON: scripts/test/share/ref/todo
OUTPUT RDF (dump): scripts/ccc/
Run Blazegraph: java -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -server -Xmx1g -Djetty.port=9999 -Dbigdata.propertyFile=ccc.properties -jar blazegraph.jar
Run: python3 -m script.ccc.run_spacin
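
The two commands above can also be driven from a single Python script, so that Blazegraph is always stopped once SPACIN finishes. This is only a sketch under stated assumptions: both commands are started from the scripts folder, and a fixed sleep is a crude stand-in for a real readiness check. The function names and the startup_wait parameter are hypothetical.

```python
import subprocess
import time


def blazegraph_command(jar="blazegraph.jar", properties="ccc.properties",
                       port=9999, heap="1g"):
    """Build the Blazegraph command line given above as an argument list."""
    return [
        "java",
        "-Dfile.encoding=UTF-8",
        "-Dsun.jnu.encoding=UTF-8",
        "-server",
        f"-Xmx{heap}",
        f"-Djetty.port={port}",
        f"-Dbigdata.propertyFile={properties}",
        "-jar",
        jar,
    ]


def spacin_command(python="python3"):
    """Build the SPACIN command line given above as an argument list."""
    return [python, "-m", "script.ccc.run_spacin"]


def run_spacin_with_blazegraph(scripts_dir, startup_wait=10):
    """Start Blazegraph, run SPACIN, then stop Blazegraph.

    Assumption: both commands are meant to run from scripts_dir.
    startup_wait (seconds) is a guess; tune it for your machine.
    """
    server = subprocess.Popen(blazegraph_command(), cwd=scripts_dir)
    try:
        time.sleep(startup_wait)
        subprocess.run(spacin_command(), cwd=scripts_dir, check=True)
    finally:
        # Stop the .jar so that ccc.jnl can be removed safely afterwards.
        server.terminate()
        server.wait()


# Example usage:
# run_spacin_with_blazegraph("scripts")
```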

Empty scripts/ccc/ but do not remove scripts/ccc/context.json
Remove scripts/ccc.jnl (stop the Blazegraph .jar first!)
If you want to rerun SPACIN on the same JSON files, move the content of scripts/test/share/ref/done into scripts/test/share/ref/todo
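
The cleanup steps above can be collected in one helper. This is a minimal sketch, assuming the folder layout named in this section; the function name reset_spacin_run is hypothetical and is not part of CCC. Stop Blazegraph before calling it.

```python
import shutil
from pathlib import Path


def reset_spacin_run(scripts_dir):
    """Reset the local SPACIN output so the same JSON files can be reprocessed.

    Hypothetical helper, not part of CCC. Returns the number of entries
    moved from ref/done back to ref/todo.
    """
    scripts_dir = Path(scripts_dir)

    # 1. Empty scripts/ccc/ but keep scripts/ccc/context.json.
    dump_dir = scripts_dir / "ccc"
    if dump_dir.is_dir():
        for entry in dump_dir.iterdir():
            if entry.name == "context.json":
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # 2. Remove the Blazegraph journal (the .jar must not be running).
    journal = scripts_dir / "ccc.jnl"
    if journal.exists():
        journal.unlink()

    # 3. Move the processed JSON files from ref/done back to ref/todo.
    done = scripts_dir / "test" / "share" / "ref" / "done"
    todo = scripts_dir / "test" / "share" / "ref" / "todo"
    moved = 0
    if done.is_dir():
        todo.mkdir(parents=True, exist_ok=True)
        for entry in done.iterdir():
            shutil.move(str(entry), str(todo / entry.name))
            moved += 1
    return moved


# Example usage:
# reset_spacin_run("scripts")
```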

Other notes:

BEE has been enhanced to exploit a CSV dataset generated with the europe-pubmed-central-dataset tool, and SPACIN has been enhanced to exploit the papendex tool.

(BEE) scripts/script/bee/conf.py contains:
- PARALLEL_PROCESSING: set to True to enable the enhancement described above
- dataset_reference: absolute path to the generated CSV
- article_path_reference: absolute path to the directory where all the XML articles are stored
- n_process: the number of processes that will be spawned
- doc_for_process: the CSV will be split into chunks (one for each process), each containing the number of docs specified here
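
As an illustration of the BEE settings above, the fragment below shows what the values might look like, followed by a small function that splits rows into chunks of doc_for_process items. All values are placeholders, the module-level assignment form is an assumption about conf.py, and chunk_rows is a hypothetical illustration of the chunking idea, not BEE's actual implementation.

```python
# Placeholder values; adjust the paths and numbers to your machine.
PARALLEL_PROCESSING = True
dataset_reference = "/absolute/path/to/dataset.csv"
article_path_reference = "/absolute/path/to/xml/articles/"
n_process = 4
doc_for_process = 100


def chunk_rows(rows, docs_per_chunk):
    """Split rows into consecutive chunks of at most docs_per_chunk items.

    Hypothetical illustration only; it does not show how BEE assigns
    chunks to processes.
    """
    if docs_per_chunk < 1:
        raise ValueError("docs_per_chunk must be at least 1")
    return [rows[i:i + docs_per_chunk]
            for i in range(0, len(rows), docs_per_chunk)]


# Example usage with five fake rows and two docs per chunk:
# chunk_rows(["a", "b", "c", "d", "e"], 2)
```
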
(SPACIN) script/ccc/conf_spacin.py contains:
- crossref_query_interface_type: set to 'local' if you want to exploit the local index, otherwise 'remote'
- orcid_query_interface_type: set to 'local' if you want to exploit the local index, otherwise 'remote'
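
For example, a conf_spacin.py that uses the local index for both lookups might contain the lines below. This is only a sketch: the module-level assignment form is an assumption, and the two settings can be chosen independently.

```python
# Use the local index for both Crossref and ORCID lookups.
# Set either value to 'remote' to fall back to the remote option instead.
crossref_query_interface_type = 'local'
orcid_query_interface_type = 'local'
```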