-
Notifications
You must be signed in to change notification settings - Fork 3
/
fastText_run_log.txt
44 lines (42 loc) · 1.97 KB
/
fastText_run_log.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
1. Convert dataset for fastText classifier:
python Conv2FastText.py Datasets/CnonC_train.txt 0 > Datasets/CnonC_train_ft.txt
python Conv2FastText.py Datasets/CnonC_test.txt 0 > Datasets/CnonC_test_ft.txt
2. Refer to: https://medium.com/@ageitgey/text-classification-is-your-new-secret-weapon-7ca4fad15788
2.1 Train the model for classifying CnonC dataset:
$ mkdir Out
$ fastText/fasttext supervised -input Datasets/CnonC_train_ft.txt -output Out/CnonC_model
Read 0M words
Number of words: 1110
Number of labels: 2
Progress: 101.1% words/sec/thread: 10862 lr: -0.001143 loss: 0.692283 ETA: 0Progress: 100.0% words/sec/thread: 10815 lr: 0.000000 loss: 0.692283 ETA: 0h 0m
2.2 Test the classification performance:
$ fastText/fasttext test Out/CnonC_model.bin Datasets/CnonC_test_ft.txt
N 100
P@1 0.85
R@1 0.85
Number of examples: 100
3. After removing punctuations in the dataset, run again:
(1) Use default options:
$ fastText/fasttext supervised -input Datasets/CnonC_train_ft.txt -output Out/CnonC_model
Read 0M words
Number of words: 1087
Number of labels: 2
Progress: 100.3% words/sec/thread: 10047 lr: -0.000306 loss: 0.348531 ETA: 0Progress: 100.0% words/sec/thread: 10011 lr: 0.000000 loss: 0.348531 ETA: 0h 0m
$ fastText/fasttext test Out/CnonC_model.bin Datasets/CnonC_test_ft.txt
N 100
P@1 0.9
R@1 0.9
Number of examples: 100
# The performance jump from 0.85 to 0.9!!!
(2) Use different options from fastText:
$ fastText/fasttext supervised -input Datasets/CnonC_train_ft.txt -output Out/CnonC_model -lr 1.0 -epoch 25
Read 0M words
Number of words: 1087
Number of labels: 2
Progress: 100.3% words/sec/thread: 10047 lr: -0.000306 loss: 0.348531 ETA: 0Progress: 100.0% words/sec/thread: 10011 lr: 0.000000 loss: 0.348531 ETA: 0h 0m
$ fastText/fasttext test Out/CnonC_model.bin Datasets/CnonC_test_ft.txt
N 100
P@1 0.9
R@1 0.9
Number of examples: 100
# Did not change