Merge pull request #154 from informagi/hasibi-patch-2
Update README.md
Showing 1 changed file with 32 additions and 35 deletions.
@@ -12,41 +12,11 @@ REL utilizes *English* Wikipedia as a knowledge base and can be used for the fol | |
- **Entity linking (EL)**: Given a text, the system outputs a list of mention-entity pairs, where each mention is an n-gram from the text and each entity is an entity in the knowledge base. | ||
- **Entity Disambiguation (ED)**: Given a text and a list of mentions, the system assigns an entity (or NIL) to each mention. | ||
|
||
Documentation: <https://rel.readthedocs.io> | ||
|
||
To install: | ||
|
||
```console | ||
pip install radboud-el | ||
``` | ||
|
||
The source code is available from [GitHub](https://github.com/informagi/REL). | ||
**Documentation available at <https://rel.readthedocs.io>** | ||
|
||
Suggestions, improvements, and edits are most welcome. | ||
|
||
|
||
## REL variants | ||
|
||
REL comes in two variants for identifying entity mentions: | ||
|
||
- **Case-sensitive**: This setup is suitable for properly written texts (e.g., news articles) and is the default setup of the REL package. In this setup, we use the `ner-fast` FLAIR model, which is case-sensitive. The results reported in the REL paper are based on this model. | ||
|
||
- **Case-insensitive**: This setup is well suited for noisy texts (e.g., queries), where entity mentions are often lowercased. In this setup, we use the `ner-fast-with-lowercase` model, which is the `ner-fast` FLAIR architecture trained on randomly cased and uncased text. This variant is the default setup of our API. | ||
|
||
Below is a comparison of these two models on the [CoNLL-2003 NER](https://www.clips.uantwerpen.be/conll2003/ner/) dataset. | ||
|
||
| Model | CoNLL-2003 test | F1 | | ||
| ------ | --------------- | -- | | ||
| `ner-fast` | original | 92.78 | | ||
| `ner-fast` | lower-cased | 58.42 | | ||
| `ner-fast` | random | 70.64 | | ||
| `ner-fast-with-lowercase` | original | 91.53 | | ||
| `ner-fast-with-lowercase` | lower-cased | 89.73 | | ||
| `ner-fast-with-lowercase` | random | 89.66 | | ||
|
||
See [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/) for further information on switching between these variants. | ||
|
||
|
||
## Calling our API | ||
|
||
Users may access our API by using the example script below. | ||
|
@@ -71,11 +41,17 @@ ed_result = requests.post(API_URL, json={ | |
}).json() | ||
``` | ||
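
The diff hunk above elides most of the example script, so the request body is not visible here. As a minimal sketch, assuming the API accepts a JSON object with a `text` field and a `spans` field holding `(start, length)` pairs (both field names and the span convention are assumptions, as is the hypothetical helper `build_payload`), a request body could be assembled like this:

```python
from typing import Dict, List, Tuple


def build_payload(text: str, mentions: List[str]) -> Dict[str, object]:
    """Build a request body for a piece of text (hypothetical helper).

    Assumption: an empty "spans" list asks the server to detect mentions
    itself (entity linking), while a non-empty list supplies the mentions
    to disambiguate (entity disambiguation).
    """
    spans: List[Tuple[int, int]] = []
    for mention in mentions:
        # Locate the first occurrence of the mention in the text.
        start = text.find(mention)
        if start == -1:
            raise ValueError(f"mention {mention!r} not found in text")
        # Each span is (character offset, length of the mention).
        spans.append((start, len(mention)))
    return {"text": text, "spans": spans}


if __name__ == "__main__":
    print(build_payload("Obama visited Paris", ["Paris"]))
```

The resulting dictionary could then be passed as the `json=` argument of `requests.post(API_URL, ...)`, as in the script shown in the hunk.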
|
||
## Setup package | ||
## Installation | ||
|
||
This section describes how to deploy REL on a local machine and set up the API. If you want to do anything more than simply run our API locally, you can skip the Docker steps and continue with installation from source. | ||
|
||
### Option 1: Installation using Docker | ||
### Option 1: Installation using pip | ||
|
||
```console | ||
pip install radboud-el | ||
``` | ||
|
||
### Option 2: Installation using Docker | ||
|
||
First, download the necessary data; you need the generic files and a Wikipedia version (2014 or 2019) (see [Download](#download)). Extract them anywhere; we will bind the directories to the Docker container as volumes. | ||
|
||
|
@@ -119,7 +95,7 @@ docker build . -t informagi/rel | |
|
||
To run the API locally, use the same commands as mentioned in the previous section. | ||
|
||
### Option 2: Installation from source code | ||
### Option 3: Installation from source code | ||
|
||
Run the following command in a terminal to install REL: | ||
|
||
|
@@ -155,6 +131,27 @@ The remainder of the tutorials are optional and for users who wish to e.g. train | |
6. [REL as systemd service](https://rel.readthedocs.io/en/latest/tutorials/systemd_instructions/) | ||
7. [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/) | ||
|
||
## REL variants | ||
|
||
REL comes in two variants for identifying entity mentions: | ||
|
||
- **Case-sensitive**: This setup is suitable for properly written texts (e.g., news articles) and is the default setup of the REL package. In this setup, we use the `ner-fast` FLAIR model, which is case-sensitive. The results reported in the REL paper are based on this model. | ||
|
||
- **Case-insensitive**: This setup is well suited for noisy texts (e.g., queries), where entity mentions are often lowercased. In this setup, we use the `ner-fast-with-lowercase` model, which is the `ner-fast` FLAIR architecture trained on randomly cased and uncased text. This variant is the default setup of our API. | ||
|
||
Below is a comparison of these two models on the [CoNLL-2003 NER](https://www.clips.uantwerpen.be/conll2003/ner/) dataset. | ||
|
||
| Model | CoNLL-2003 test | F1 | | ||
| ------ | --------------- | -- | | ||
| `ner-fast` | original | 92.78 | | ||
| `ner-fast` | lower-cased | 58.42 | | ||
| `ner-fast` | random | 70.64 | | ||
| `ner-fast-with-lowercase` | original | 91.53 | | ||
| `ner-fast-with-lowercase` | lower-cased | 89.73 | | ||
| `ner-fast-with-lowercase` | random | 89.66 | | ||
|
||
See [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/) for further information on switching between these variants. | ||
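
To make the trade-off in the comparison table concrete, the short sketch below computes how many F1 points each model loses when the CoNLL-2003 test set is lower-cased or randomly cased. The numbers are copied from the table; the helper name `f1_drop` is only illustrative.

```python
# F1 scores copied from the comparison table above.
F1 = {
    "ner-fast": {"original": 92.78, "lower-cased": 58.42, "random": 70.64},
    "ner-fast-with-lowercase": {"original": 91.53, "lower-cased": 89.73, "random": 89.66},
}


def f1_drop(model: str, condition: str) -> float:
    """Return the F1 points lost relative to the original-cased test set."""
    return round(F1[model]["original"] - F1[model][condition], 2)


if __name__ == "__main__":
    for model in F1:
        for condition in ("lower-cased", "random"):
            print(f"{model:26s} {condition:12s} -{f1_drop(model, condition):.2f}")
```

On these figures, `ner-fast` loses 34.36 points on lower-cased text, whereas `ner-fast-with-lowercase` loses 1.80, at a cost of 1.25 points on the original casing.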
|
||
## Efficiency of REL | ||
|
||
We measured the efficiency of REL on a per-document basis. We ran our API on 50 documents from AIDA-B with more than 200 words each, averaging 323 (± 105) words and 42 (± 19) mentions per document. The results are shown in the table below. | ||
|
@@ -190,7 +187,7 @@ If you are using REL, please cite the following paper: | |
|
||
## Contact | ||
|
||
If you find any bugs or experience difficulties when using REL, please create an issue on this GitHub page. If you have any specific questions with respect to our research with REL, please email [Mick van Hulst](mailto:[email protected]). | ||
If you find any bugs or experience difficulties when using REL, please create an issue on this GitHub page. If you have any specific questions with respect to our research with REL, please email [Faegheh Hasibi](mailto:[email protected]). | ||
|
||
## Acknowledgements | ||
|
||
|