Currently each (Stetl-based) ETL process, such as Top10nl, BRK and BGT, has its own configuration and way of being executed, even though all of them are very similar. This makes it hard for a user to grasp how to perform a specific ETL, and it also makes Dockerization harder to develop.
The following needs to (or can) be done to restructure the repo and its (Stetl-based) ETL processes:
Move each process to its own (sub)directory named after the Basisregistratie, e.g. brt/top10, brk/dkk. Call each a "Project" (or "Process").
Give each Project consistent directory and file naming, e.g. Stetl config files at brt/top10/etl/config/default.cfg, gfs files, etc.
Provide a single script in the top directory, such as nlextract.sh (or perhaps nlextract.py, to be cross-platform).
Give each Project/Process a default argument file and, optionally, a host-named args file.
Allow the user to easily override default options such as the database host and credentials.
Something like:
nlextract.sh -p brt/top250 -a brt/top250/options/default.args -a /home/me/nlx/top250.args
The only problem is how to deal with the BAG, which is not Stetl-based and has more extensive command-line options. Possibly the default "convert to PostGIS" action can be performed by nlextract.sh|py.
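To make the proposed command line more concrete, here is a minimal sketch of how a hypothetical nlextract.py could parse `-p` and a repeatable `-a`. Nothing here exists in the repo: the function names, the long option names and the rule that the Project's `options/default.args` is always loaded first are assumptions based on the example invocation above.

```python
import argparse
from pathlib import Path


def build_parser():
    """Command-line parser for a hypothetical top-level nlextract.py."""
    parser = argparse.ArgumentParser(prog="nlextract")
    parser.add_argument(
        "-p", "--project", required=True,
        help="Project directory, e.g. brt/top250",
    )
    parser.add_argument(
        "-a", "--args-file", dest="args_files", action="append", default=[],
        help="args file; may be given several times, later files override earlier ones",
    )
    return parser


def resolve_args_files(project, args_files):
    """Return the args files in load order, with the Project default first.

    Assumption: each Project keeps its defaults in <project>/options/default.args.
    """
    default = Path(project) / "options" / "default.args"
    files = [Path(name) for name in args_files]
    if default not in files:
        files.insert(0, default)
    return files


if __name__ == "__main__":
    options = build_parser().parse_args()
    for path in resolve_args_files(options.project, options.args_files):
        print(path)
```

With this approach a user would only need to pass their own args file; the Project default is picked up automatically, which keeps the common case short.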
Stetl (master/latest) now supports multiple -a options. See the example usage in top10nl (README): https://github.com/nlextract/NLExtract/tree/master/brt/top10nl/etl . File names have also been standardized: default.args (now allowed in .gitignore, unlike other .args files) holds all default args, so your own .args file only needs to contain the changes to those, e.g. only the DB credentials.
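The layering described above (a default.args with every default, plus a personal .args file with only the changes) can be illustrated with a small sketch. This is not Stetl's implementation: it assumes the args files are plain `key=value` lines with `#` comments, and the helper names are hypothetical.

```python
from pathlib import Path


def read_args_file(path):
    """Read key=value pairs from an args file, skipping blanks and # comments.

    Assumption: a simple properties-style format, one key=value per line.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line, ignore it
        values[key.strip()] = value.strip()
    return values


def merge_args_files(paths):
    """Merge args files in order; values from later files override earlier ones."""
    merged = {}
    for path in paths:
        merged.update(read_args_file(path))
    return merged
```

Calling `merge_args_files(["options/default.args", "/home/me/nlx/top250.args"])` would then yield all defaults, with any keys from the personal file (for example the database credentials) taking precedence.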
For Stetl an issue has been opened to allow multiple -a args.