Commit

Merge pull request #154 from informagi/hasibi-patch-2
Update README.md
chriskamphuis authored Feb 8, 2023
2 parents 0018b57 + f6ea097 commit a61bfc0
Showing 1 changed file with 32 additions and 35 deletions.
67 changes: 32 additions & 35 deletions README.md
@@ -12,41 +12,11 @@ REL utilizes *English* Wikipedia as a knowledge base and can be used for the fol
- **Entity Linking (EL)**: Given a text, the system outputs a list of mention-entity pairs, where each mention is an n-gram from the text and each entity is an entity in the knowledge base.
- **Entity Disambiguation (ED)**: Given a text and a list of mentions, the system assigns an entity (or NIL) to each mention.
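In ED the caller supplies the mentions as character spans, while in EL the span list is left empty and REL detects the mentions itself. The sketch below builds both request bodies. It assumes the `{"text": ..., "spans": [...]}` format with `(start, length)` spans used in the API example script in REL's README. `mention_spans`, `el_payload`, and `ed_payload` are hypothetical helpers, not part of REL.

```python
# Minimal sketch of REL API request bodies for EL and ED.
# Assumption: the API accepts {"text": str, "spans": [(start, length), ...]},
# as in REL's README example. The helper names below are hypothetical.
from typing import List, Tuple

Span = Tuple[int, int]  # (character offset, length)


def mention_spans(text: str, mentions: List[str]) -> List[Span]:
    """Locate each mention in `text`, scanning left to right, as (start, length)."""
    spans: List[Span] = []
    cursor = 0
    for mention in mentions:
        start = text.find(mention, cursor)
        if start == -1:
            raise ValueError(f"mention not found in text: {mention!r}")
        spans.append((start, len(mention)))
        cursor = start + len(mention)  # repeated mentions map to later occurrences
    return spans


def el_payload(text: str) -> dict:
    # Entity linking: no spans given, so mention detection is left to REL.
    return {"text": text, "spans": []}


def ed_payload(text: str, mentions: List[str]) -> dict:
    # Entity disambiguation: mentions are given; REL assigns an entity (or NIL) to each.
    return {"text": text, "spans": mention_spans(text, mentions)}


if __name__ == "__main__":
    doc = "If you're going to try, go all the way - Charles Bukowski"  # illustrative text
    print(el_payload(doc))
    print(ed_payload(doc, ["Charles Bukowski"]))
```

You can pass either dictionary as the `json=` argument of `requests.post` to the API URL. JSON serialization turns the span tuples into two-element lists.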

Documentation: <https://rel.readthedocs.io>

To install:

```console
pip install radboud-el
```

The source code is available from [GitHub](https://github.com/informagi/REL).
**Documentation available at <https://rel.readthedocs.io>**

Suggestions, improvements, and edits are most welcome.


## REL variants

REL comes in two variants for identifying entity mentions:

- **Case-sensitive**: This setup is suitable for properly written texts (e.g., news articles) and is the default setup of the REL package. In this setup, we use the `ner-fast` FLAIR model, which is case-sensitive. The results reported in the REL paper are based on this model.

- **Case-insensitive**: This setup is well suited for noisy texts (e.g., queries), where entity mentions are often lowercased. In this setup, we use the `ner-fast-with-lowercase` model, which is the `ner-fast` FLAIR architecture trained on randomly cased and uncased text. This variant is the default setup of our API.

Below is a comparison of these two models on the [CoNLL-2003 NER](https://www.clips.uantwerpen.be/conll2003/ner/) dataset.

| Model | CoNLL-2003 test set casing | F1 (%) |
| ------ | --------------- | -- |
| `ner-fast` | original | 92.78 |
| `ner-fast` | lower-cased | 58.42 |
| `ner-fast` | random | 70.64 |
| `ner-fast-with-lowercase` | original | 91.53 |
| `ner-fast-with-lowercase` | lower-cased | 89.73 |
| `ner-fast-with-lowercase` | random | 89.66 |

See [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/) for further information on switching between these variants.


## Calling our API

Users may access our API by using the example script below.
@@ -71,11 +41,17 @@ ed_result = requests.post(API_URL, json={
}).json()
```

## Setup package
## Installation

This section describes how to deploy REL on a local machine and set up the API. If you want to do anything more than simply run our API locally, you can skip the Docker steps and continue with installation from source.

### Option 1: Installation using Docker
### Option 1: Installation using pip

```console
pip install radboud-el
```

### Option 2: Installation using Docker

First, download the necessary data: you need the generic files and a Wikipedia version (2014 or 2019) (see [Download](#download)). Extract them anywhere; we will bind the directories to the Docker container as volumes.

@@ -119,7 +95,7 @@ docker build . -t informagi/rel

To run the API locally, use the same commands as mentioned in the previous section.

### Option 2: Installation from source code
### Option 3: Installation from source code

Run the following command in a terminal to install REL:

@@ -155,6 +131,27 @@ The remainder of the tutorials are optional and for users who wish to e.g. train
6. [REL as systemd service](https://rel.readthedocs.io/en/latest/tutorials/systemd_instructions/)
7. [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/)

## REL variants

REL comes in two variants for identifying entity mentions:

- **Case-sensitive**: This setup is suitable for properly written texts (e.g., news articles) and is the default setup of the REL package. In this setup, we use the `ner-fast` FLAIR model, which is case-sensitive. The results reported in the REL paper are based on this model.

- **Case-insensitive**: This setup is well suited for noisy texts (e.g., queries), where entity mentions are often lowercased. In this setup, we use the `ner-fast-with-lowercase` model, which is the `ner-fast` FLAIR architecture trained on randomly cased and uncased text. This variant is the default setup of our API.

Below is a comparison of these two models on the [CoNLL-2003 NER](https://www.clips.uantwerpen.be/conll2003/ner/) dataset.

| Model | CoNLL-2003 test set casing | F1 (%) |
| ------ | --------------- | -- |
| `ner-fast` | original | 92.78 |
| `ner-fast` | lower-cased | 58.42 |
| `ner-fast` | random | 70.64 |
| `ner-fast-with-lowercase` | original | 91.53 |
| `ner-fast-with-lowercase` | lower-cased | 89.73 |
| `ner-fast-with-lowercase` | random | 89.66 |

See [Notes on using custom models](https://rel.readthedocs.io/en/latest/tutorials/custom_models/) for further information on switching between these variants.
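The three test conditions in the comparison differ only in casing. Here is a minimal sketch that derives the lower-cased and randomly cased inputs from tokenized sentences. This text does not describe how the random casing was actually produced, so the sketch assumes that each token independently keeps its casing, is lower-cased, or is upper-cased. `lowercase_tokens` and `random_case_tokens` are illustrative names, not REL functions.

```python
# Sketch of building "lower-cased" and "random" variants of a tokenized test set.
# Assumption: random casing is applied per token (keep / lower / upper);
# the actual procedure used for the table may differ.
import random
from typing import List


def lowercase_tokens(tokens: List[str]) -> List[str]:
    # Lower-cased condition: every token is lower-cased.
    return [t.lower() for t in tokens]


def random_case_tokens(tokens: List[str], seed: int = 0) -> List[str]:
    # Random condition: a seeded RNG keeps the perturbed copy reproducible.
    rng = random.Random(seed)
    out = []
    for token in tokens:
        choice = rng.choice(("keep", "lower", "upper"))
        if choice == "lower":
            out.append(token.lower())
        elif choice == "upper":
            out.append(token.upper())
        else:
            out.append(token)
    return out


if __name__ == "__main__":
    sentence = "Charles Bukowski lived in Los Angeles".split()
    print(lowercase_tokens(sentence))
    print(random_case_tokens(sentence, seed=42))
```

Seeding the generator means both taggers can be scored on exactly the same perturbed input. The F1 scores themselves would then come from a standard NER evaluator.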

## Efficiency of REL

We measured the efficiency of REL on a per-document basis. We ran our API on 50 documents from AIDA-B with more than 200 words each, averaging 323 (± 105) words and 42 (± 19) mentions per document. The results are shown in the table below.
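The measurement protocol above can be sketched in three steps: filter documents by length, time one call per document, and report mean ± standard deviation. The sketch makes three assumptions. Words are counted by whitespace splitting. "323 (± 105)" is read as mean ± sample standard deviation. `annotate` is a hypothetical callable standing in for a request to a locally running REL API. None of the function names come from REL.

```python
# Sketch of per-document efficiency measurement (mean ± sample standard deviation).
# Assumptions: whitespace word counts; `annotate` is a hypothetical stand-in
# for one call to a locally running REL API.
import statistics
import time
from typing import Callable, Dict, List, Tuple


def select_long_documents(docs: List[str], min_words: int = 200) -> List[str]:
    # Keep documents with more than `min_words` whitespace-separated words.
    return [d for d in docs if len(d.split()) > min_words]


def word_count_summary(docs: List[str]) -> Tuple[float, float]:
    # Mean and sample standard deviation of per-document word counts.
    counts = [len(d.split()) for d in docs]
    spread = statistics.stdev(counts) if len(counts) > 1 else 0.0
    return statistics.mean(counts), spread


def time_per_document(docs: List[str], annotate: Callable[[str], object]) -> Dict[str, float]:
    # Time one `annotate` call per document using a high-resolution performance counter.
    if not docs:
        raise ValueError("no documents to time")
    durations = []
    for doc in docs:
        start = time.perf_counter()
        annotate(doc)
        durations.append(time.perf_counter() - start)
    return {
        "n_docs": len(durations),
        "mean_s": statistics.mean(durations),
        "stdev_s": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Toy documents and a trivial annotator, for illustration only.
    corpus = ["word " * 250, "word " * 120, "word " * 300]
    long_docs = select_long_documents(corpus)
    print(len(long_docs), word_count_summary(long_docs))
    print(time_per_document(long_docs, annotate=lambda doc: doc.split()))
```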
@@ -190,7 +187,7 @@ If you are using REL, please cite the following paper:

## Contact

If you find any bugs or experience difficulties when using REL, please create an issue on this GitHub page. If you have any specific questions with respect to our research with REL, please email [Mick van Hulst](mailto:[email protected]).
If you find any bugs or experience difficulties when using REL, please create an issue on this GitHub page. If you have any specific questions with respect to our research with REL, please email [Faegheh Hasibi](mailto:[email protected]).

## Acknowledgements
