Skip to content

A small single page application to bulk-match names again database entities and export structured data

License

Notifications You must be signed in to change notification settings

EHRI/ehri-entity-matcher

Repository files navigation

EHRI Entity Matching Tool

A small (prototype) single page app which matches a list of names against a Solr index of entities (places, people, corporate bodies) and lets the user refine and export structured (CSV) data.

The Solr schema (on coredir) is not yet optimised and is quite slow to index due to the use of a Beider Morse filter analyser for the optional phonetic matching.

The datasets used are the Geonames allCountries.zip and this set of entities from the EHRI portal database.

You can set up a dev instance by running sudo docker-compose up from the project directory, which will start an instance of Solr 6.6.6 using the supplied core config on port 8984. You can then run the ./bin/ingest script to download the datasets and import them into Solr using, respectively, the CSV and JSON update handlers. Note: this will take quite some time since Solr has to ingest ~1.5GB of Geonames data.

Running the app on port 9000 can be done by starting the SBT shell and running the run command.

Screenshot

About

A small single page application to bulk-match names again database entities and export structured data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published