When automatic subject indexing methods (multi-label classification) are applied on big and diverse data sets, the quality of indexing can be impaired by different factors. As a consequence, quality is typically controlled before using automatically generated information in operative information retrieval systems. The releasetool intends to assist libraries in such quality control workflows.
+---------+ +-----------------------+
|automatic| | releasetool |
|subject +--------->+ (quality assessment) |
|indexing | import | |
+---+-----+ release +----------+------------+
^ candidate |
| |
| documents | if quality == ok:
| without | release controlled subject terms
| controlled v
| subject terms +--------+--------+
+------------------+ library catalog |
| (used for IR) |
+-----------------+
The project aims to build a straight-forward web application as a starting point for quality assessment of automatic subject indexing. The intended audience of this project are software developers at libraries as well as librarians involved in automatic subject indexing projects.
Here you can find the current development version. A prototype of the tool has been presented under the title Towards Semantic Quality Control of Automatic Subject Indexing (Martin Toepfer & Christin Seifert, 2017) at TPDL-2017, Thessaloniki, Greece, September 18-21, 2017. That predecessor used semantic web technology as a database backend. By contrast to the prototype, the new releasetool has been reimplemented and revised substantially in several aspects. Semantic web technology is now only utilized as a service to access the thesaurus. The reimplementation has been performed in order to put more emphasize on business processes and system integration.
The releasetool is a project of ZBW's automatic subject indexing working group. It has been written by Martin Toepfer, 2017-2018. Please visit our website for more information.
ZBW -- Leibniz Information Centre for Economics
The releasetool has been developed for the ZBW, hence testing and application yet focused on economics and the STW. Interfaces and configuration options are, however, implemented to apply the tool in different domains.
The code has not yet reached a stable state and coding style has been handled tolerantly.
In a nutshell:
- review:
- assess quality at the document-level (complete topic representation)
- assess quality at the subject-level (give feedback for each single suggested subject)
- add missing subjects, using auto-suggest lists based on your thesaurus
- monitor and maintain (admin):
- upload automatic subject indexing results (release candidate files)
- compare reviews
- create new collections
- explain:
- analyse errors, have a look at which method suggested a specific subject and how confident it was
Several operations for release processes are supported with basic default implementations, which can be customized. Some import/export features are implemented, so that you can use your favorite tools, like for example R or standard office software suites, to create more complex samples with fine-grained control, and then import the samples as collections easily via the admin interface or at the command line using commands.
- Technical documentation
- User documentation for the two main types of users:
The easiest way to run releasetool is via docker.
- Check out the repository.
- Copy
zaptain_ui/settings_template.py
tozaptain_ui/settings_local.py
and adjust the File to your needs. Especially change theSECRET_KEY
variable and remove the line below it. One possibility to create a secret is< /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c${1:-64};echo;
. The database file is defined in theDATABASES
setting. The database file should be placed somewhere in/rt_data
(because this folder is created in the Dockerfile for that purpose). E.g.
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': '/rt_data/rt_v1alpha.sqlite',
}
}
The release candidates are placed in $MEDIA_ROOT/releasecandidates
. MEDIA_ROOT
should be a relative path to /releasetool
, e.g. MEDIA_ROOT=rt_data
.
3. Run docker build -t rt .
from inside the base directory of this repository.
4. Run docker run -it --rm -p 8085:8000 rt
.
To persists the data use docker volumes.
/rt_data
should be bound to a local folder in order to persist the database.
/releasetool/$MEDIA_ROOT/
should be bound to a local folder in order to persist the release candidates.
It should be bound to a local folder.
Furthermore you should expose port 8000 of the container.
5. Access the UI in a browser at http://<your_host>:<port_defined_above>/releasetool
6. Login using the credentials rtadmin
with password changeme
.
7. Click on Admin
in the upper right corner.
8. Change the password. This functionality is again available in the upper right corner.
9. Inside the admin view select Rt configs
10. Now change all available keys to your desired values.
This can be done by clicking on them, then hit SAVE
in the lower left corner.
Key | Description | Example |
---|---|---|
catalog API pattern | URL where the tool can get metadata from. Needs to be parametrized with `{docid}. | `https://api.econbiz.de/v1/{docid} |
document weblink pattern | URL where the user can access the publication. Needs to be parametrized with {docid} |
https://www.econbiz.de/Record/{docid} |
name of the main automatic indexing method | Name of the indexing method that has generated the suggestions. This needs to be the same as the one you set for your imported collcection. | zaptain-rules |
support email | Email where users can reach support | [email protected] |
thesaurus category type | Type for thesausurus category. Used to structure the graph view. | http://zbw.eu/namespaces/zbw-extensions/Thsys |
thesaurus descriptor type | Type for thesaurus concepts. | http://zbw.eu/namespaces/zbw-extensions/Descriptor |
thesaurus sparql endpoint | URL where the thesaurus can be queried using SPARQL | http://zbw.eu/beta/sparql/stw/query |
thesaurus sparql query | Not used |
This code was created as part of the subject indexing automatization effort at ZBW – Leibniz Information Centre for Economics. See our homepage for more information, publications, and contact details. The releasetool has been created by Martin Toepfer, 2017–2018.