Commit e07457a (1 parent: a364fc8). Showing 1 changed file with 251 additions and 0 deletions.

# NER model

This page collects the model cards for the NER models in PyThaiNLP.

## Thai NER

### v1.4

**Model Details**

- Developer: Wannaphong Phatthiyaphaibun
- Report author: Wannaphong Phatthiyaphaibun
- Model date: 2020-05-21
- Model version: 1.4
- Used in PyThaiNLP version: 2.2+
- Filename: `~/pythainlp-data/thai-ner-1-4.crfsuite`
- CRF model
- License: CC0
- GitHub for Thai NER 1.4 (data and training notebook): [https://github.com/wannaphong/thai-ner/tree/master/model/1.4](https://github.com/wannaphong/thai-ner/tree/master/model/1.4)

**Intended Use**

- Named-entity tagging for Thai.
- Not suitable for other languages or for non-news domains.

**Factors**

- Based on known problems in Thai natural language processing.

**Metrics**

- Evaluation metrics include precision, recall, and F1-score.

**Training Data**

ThaiNER 1.3 Corpus train set

**Evaluation Data**

ThaiNER 1.3 Corpus test set

**Quantitative Analyses**

```
                precision    recall  f1-score   support
B-DATE               0.92      0.86      0.89       375
I-DATE               0.94      0.94      0.94       747
B-EMAIL              1.00      1.00      1.00         5
I-EMAIL              1.00      1.00      1.00        28
B-LAW                0.71      0.56      0.62        43
I-LAW                0.74      0.70      0.72       154
B-LEN                0.96      0.93      0.95        29
I-LEN                0.98      0.94      0.96        69
B-LOCATION           0.88      0.77      0.82       864
I-LOCATION           0.86      0.73      0.79       852
B-MONEY              0.98      0.85      0.91       105
I-MONEY              0.96      0.95      0.95       239
B-ORGANIZATION       0.90      0.78      0.84      1166
I-ORGANIZATION       0.84      0.77      0.81      1338
B-PERCENT            1.00      0.97      0.99        34
I-PERCENT            1.00      0.96      0.98        51
B-PERSON             0.96      0.82      0.88       676
I-PERSON             0.94      0.92      0.93      2424
B-PHONE              1.00      0.72      0.84        29
I-PHONE              0.96      0.92      0.94        78
B-TIME               0.87      0.73      0.79       172
I-TIME               0.94      0.83      0.88       336
B-URL                0.89      1.00      0.94        24
I-URL                0.96      1.00      0.98       371
B-ZIP                1.00      1.00      1.00         4
micro avg            0.91      0.84      0.87     10213
macro avg            0.93      0.87      0.89     10213
weighted avg         0.91      0.84      0.87     10213
samples avg          0.17      0.17      0.17     10213
```
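The `B-` and `I-` prefixes in the report suggest the usual BIO tagging scheme, where `B-` marks the first token of an entity and `I-` marks a continuation. The sketch below is a minimal illustration of how such tags could be grouped into entity spans. It assumes an `O` tag for non-entity tokens; the function name `bio_to_spans` and the sample tokens are hypothetical, hand-written for illustration, and are not model output or part of the PyThaiNLP API.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity that is currently open, if any.
        if current_type is not None:
            # Thai is written without spaces, so tokens are joined directly.
            spans.append((current_type, "".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity
            # (such a stray I- token is dropped in this sketch).
            flush()
            current_type = None
            current_tokens = []
    flush()
    return spans


# Hypothetical tokens and tags, written by hand for illustration.
tokens = ["นาย", "สมชาย", " ", "ไป", "เชียงใหม่"]
tags = ["B-PERSON", "I-PERSON", "O", "O", "B-LOCATION"]
spans = bio_to_spans(tokens, tags)
print(spans)
```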

**Ethical Considerations**

- This model inherits bias from the corpus creator (Wannaphong Phatthiyaphaibun).
- This model was built with the help of a part-of-speech model, so it also carries that model's bias.

**Caveats and Recommendations**

- Thai text only

### v1.5

**Model Details**

- Developer: Wannaphong Phatthiyaphaibun
- Report author: Wannaphong Phatthiyaphaibun
- Model date: 2021-01-16
- Model version: 1.5
- Used in PyThaiNLP version: 2.3+
- Filename: `~/pythainlp-data/thai-ner-1-5-newmm-lst20.crfsuite`
- CRF model
- License: CC0
- GitHub for Thai NER 1.5 (data and training notebook `thai-ner-1-5-newmm-lst20.ipynb`): [https://github.com/wannaphong/thai-ner/tree/master/model/1.5](https://github.com/wannaphong/thai-ner/tree/master/model/1.5)

**Intended Use**

- Named-entity tagging for Thai.
- Not suitable for other languages or for non-news domains.

**Factors**

- Based on known problems in Thai natural language processing.

**Metrics**

- Evaluation metrics include precision, recall, and F1-score.

**Training Data**

ThaiNER 1.5 Corpus train set (5,089 sentences)

**Evaluation Data**

ThaiNER 1.5 Corpus test set (1,274 sentences)

**Quantitative Analyses**

```
                precision    recall  f1-score   support
B-DATE               0.93      0.82      0.87       350
I-DATE               0.95      0.94      0.95       665
B-LAW                0.85      0.54      0.66        87
I-LAW                0.85      0.64      0.73       253
B-LEN                1.00      0.75      0.86        12
I-LEN                1.00      0.69      0.82        26
B-LOCATION           0.81      0.70      0.75       620
I-LOCATION           0.74      0.72      0.73       533
B-MONEY              1.00      0.91      0.95       131
I-MONEY              0.99      0.95      0.97       321
B-ORGANIZATION       0.92      0.70      0.80      1334
I-ORGANIZATION       0.80      0.73      0.76      1198
B-PERCENT            0.94      0.88      0.91        17
I-PERCENT            0.91      0.95      0.93        22
B-PERSON             0.96      0.78      0.86       607
I-PERSON             0.94      0.88      0.91      2181
B-PHONE              1.00      0.50      0.67         2
I-PHONE              1.00      1.00      1.00         8
B-TIME               0.93      0.66      0.77        87
I-TIME               0.97      0.77      0.86       158
B-URL                0.91      0.83      0.87        12
I-URL                0.93      0.96      0.94        94
micro avg            0.89      0.79      0.84      8718
macro avg            0.92      0.79      0.84      8718
weighted avg         0.90      0.79      0.84      8718
samples avg          0.16      0.16      0.16      8718
```
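The `macro avg` and `weighted avg` rows can be read as an unweighted mean over labels and a support-weighted mean, respectively. The sketch below recomputes both for the precision column, using the per-label values copied from the v1.5 report above. Because the inputs are already rounded to two decimals, the results should only be expected to land close to the reported 0.92 and 0.90; the variable names are illustrative.

```python
# (label, precision, support) copied from the v1.5 report.
rows = [
    ("B-DATE", 0.93, 350), ("I-DATE", 0.95, 665),
    ("B-LAW", 0.85, 87), ("I-LAW", 0.85, 253),
    ("B-LEN", 1.00, 12), ("I-LEN", 1.00, 26),
    ("B-LOCATION", 0.81, 620), ("I-LOCATION", 0.74, 533),
    ("B-MONEY", 1.00, 131), ("I-MONEY", 0.99, 321),
    ("B-ORGANIZATION", 0.92, 1334), ("I-ORGANIZATION", 0.80, 1198),
    ("B-PERCENT", 0.94, 17), ("I-PERCENT", 0.91, 22),
    ("B-PERSON", 0.96, 607), ("I-PERSON", 0.94, 2181),
    ("B-PHONE", 1.00, 2), ("I-PHONE", 1.00, 8),
    ("B-TIME", 0.93, 87), ("I-TIME", 0.97, 158),
    ("B-URL", 0.91, 12), ("I-URL", 0.93, 94),
]

# Total number of tagged tokens across all labels.
total_support = sum(support for _, _, support in rows)

# Macro average: every label counts equally, regardless of frequency.
macro_precision = sum(p for _, p, _ in rows) / len(rows)

# Weighted average: each label contributes in proportion to its support.
weighted_precision = sum(p * s for _, p, s in rows) / total_support

print(f"total support:      {total_support}")
print(f"macro precision:    {macro_precision:.2f}")
print(f"weighted precision: {weighted_precision:.2f}")
```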

**Ethical Considerations**

- This model inherits bias from the corpus creator (Wannaphong Phatthiyaphaibun).
- This model was built with the help of a part-of-speech model, so it also carries that model's bias.

**Caveats and Recommendations**

- Thai text only

### v1.5.1

**Model Details**

- Developer: Wannaphong Phatthiyaphaibun
- Report author: Wannaphong Phatthiyaphaibun
- Model date: 2021-06-21
- Model version: 1.5.1
- Used in PyThaiNLP version: 2.4+
- Filename: `pythainlp/corpus/thainer_crf_1_5_1.model`
- CRF model
- License: CC0
- GitHub for Thai NER 1.5.1 (data and training notebook): [https://github.com/wannaphong/thai-ner/tree/master/model/1.5.1](https://github.com/wannaphong/thai-ner/tree/master/model/1.5.1)

**Intended Use**

- Named-entity tagging for Thai.
- Not suitable for other languages or for non-news domains.

**Factors**

- Based on known problems in Thai natural language processing.

**Metrics**

- Evaluation metrics include precision, recall, and F1-score.

**Training Data**

ThaiNER 1.5 Corpus train set (5,089 sentences)

**Evaluation Data**

ThaiNER 1.5 Corpus test set (1,274 sentences)

**Quantitative Analyses**

```
                precision    recall  f1-score   support
B-DATE               0.93      0.81      0.87       350
I-DATE               0.94      0.94      0.94       665
B-LAW                0.85      0.54      0.66        87
I-LAW                0.87      0.65      0.74       253
B-LEN                1.00      0.75      0.86        12
I-LEN                1.00      0.69      0.82        26
B-LOCATION           0.80      0.70      0.75       620
I-LOCATION           0.75      0.72      0.73       533
B-MONEY              1.00      0.90      0.95       131
I-MONEY              0.99      0.94      0.97       321
B-ORGANIZATION       0.91      0.70      0.79      1334
I-ORGANIZATION       0.80      0.73      0.76      1198
B-PERCENT            0.94      0.88      0.91        17
I-PERCENT            0.91      0.95      0.93        22
B-PERSON             0.96      0.78      0.86       607
I-PERSON             0.94      0.88      0.91      2181
B-PHONE              1.00      0.50      0.67         2
I-PHONE              1.00      1.00      1.00         8
B-TIME               0.93      0.66      0.77        87
I-TIME               0.97      0.77      0.86       158
B-URL                0.91      0.83      0.87        12
I-URL                0.93      0.96      0.94        94
micro avg            0.89      0.79      0.84      8718
macro avg            0.92      0.79      0.84      8718
weighted avg         0.89      0.79      0.84      8718
samples avg          0.16      0.16      0.16      8718
```
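For readers less familiar with the three metrics, the sketch below computes precision, recall, and F1-score from true-positive, false-positive, and false-negative counts using the standard definitions. The counts in the example are an assumption: they are one set of values consistent with the `B-PHONE` row above (support 2, precision 1.00, recall 0.50), not figures taken from the evaluation itself, and the function name is illustrative.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) from raw counts; 0.0 where undefined."""
    # Precision: share of predicted labels that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of gold labels that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Assumed counts consistent with the B-PHONE row:
# one of the two gold tokens found, with no false alarms.
precision, recall, f1 = precision_recall_f1(tp=1, fp=0, fn=1)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```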

**Ethical Considerations**

- This model inherits bias from the corpus creator (Wannaphong Phatthiyaphaibun).
- This model was built with the help of a part-of-speech model, so it also carries that model's bias.

**Caveats and Recommendations**

- Thai text only

### v2.0

Host: [https://huggingface.co/pythainlp/thainer-corpus-v2-base-model](https://huggingface.co/pythainlp/thainer-corpus-v2-base-model)