TypeError: object of type 'int' has no len() #53

masakuri · 2017-12-11T05:08:50Z

When I trained with English train/dev files, it worked.
But when I trained with Japanese train/dev files (and set pre-trained Japanese word embeddings file), I got the following error.

  File "build/bdist.linux-x86_64/egg/deepcrf/__init__.py", line 66, in train
  File "build/bdist.linux-x86_64/egg/deepcrf/main.py", line 98, in run
  File "build/bdist.linux-x86_64/egg/deepcrf/util.py", line 102, in read_conll_file
TypeError: object of type 'int' has no len()

I want to set a pre-trained Japanese char embeddings file, but it looks like there is no --char_emb_file option.
I am wondering if this is the cause of the error.
Does it support Japanese train/dev files (or a --char_emb_file option)?
Thank you.

masakuri · 2017-12-11T05:31:09Z

I'm sorry, I typed an incorrect command.
~~The error was solved.~~
I still have the same error...

aonotas · 2017-12-11T06:37:43Z

Ok, please let me know your command.

masakuri · 2017-12-11T06:42:31Z

$ deep-crf train input_train_jp.txt --delimiter=" " --dev_file input_dev_jp.txt --save_dir save_jpmodel_dir --save_name bilstm-cnn-crf_adam_jp --optimizer adam --word_emb_file jp_word_emb300.txt --word_emb_vocab_type replace_only --gpu 0

Thank you.

aonotas · 2017-12-11T06:53:33Z

I think this error occurs because the format of your training file input_train_jp.txt is wrong (invalid input feature sizes).

I just fixed the code, so please use the latest version and let me know the result.
I think input_train_jp.txt should look like this:

彼 O
は O
オバマ大統領 S-PERSON
です O

彼 O
は O

masakuri · 2017-12-11T07:12:35Z

I got the following error.
ValueError: Invalid input feature sizes: "3". Please check at line [1298]

I checked line 1298 in input_train_jp.txt and found that the "word" itself contains a space, like:

ほげ[space]ほげ[space]O

"ほげ[space]ほげ" is a proper noun.

Thank you for helping me find the cause of this error.
Is it OK to solve this problem by using --delimiter="\t" and formatting input_train_jp.txt like ほげ[space]ほげ[tab]O ?
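
A quick way to locate such lines before training is to count the columns on every non-blank line. The following is a minimal sketch, not deep-crf's own check; the function name `find_bad_lines` and its parameters are hypothetical and only illustrate the idea.

```python
def find_bad_lines(lines, delimiter=" ", expected_columns=2):
    """Return (line_number, column_count) for each non-blank line whose
    column count differs from expected_columns. Line numbers are 1-based."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # blank lines only separate sentences
        columns = text.split(delimiter)
        if len(columns) != expected_columns:
            bad.append((number, len(columns)))
    return bad


# The third line has a space inside the word, so it splits into 3 columns.
sample = ["彼 O\n", "は O\n", "ほげ ほげ O\n", "\n", "です O\n"]
print(find_bad_lines(sample))  # [(3, 3)]
```

To check a real file, pass an open file object instead of the sample list, for example `open("input_train_jp.txt", encoding="utf-8")`.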

masakuri · 2017-12-11T07:19:06Z

I fixed the input_train_jp.txt format and ran the command ($ deep-crf train input_train_jp.txt --delimiter="\t" --dev_file input_dev_jp.txt --save_dir save_jpmodel_dir --save_name bilstm-cnn-crf_adam_jp --optimizer adam --word_emb_file jp_word_emb300.txt --word_emb_vocab_type replace_only --gpu 0), but I got the following error:

  File "build/bdist.linux-x86_64/egg/deepcrf/__init__.py", line 66, in train
  File "build/bdist.linux-x86_64/egg/deepcrf/main.py", line 102, in run
ValueError: Invalid training sizes: 0 sentences.

Any ideas?

aonotas · 2017-12-11T07:28:55Z

> Is it OK to solve this problem by using --delimiter="\t" and input_train_jp.txt format is like ほげ[space]ほげ[tab]O ?

Yes! I think it is a good solution.

Note that sentences in input_train_jp.txt must be separated by a blank line (an empty line, \n). This is called the CoNLL format.

For example, if you have two sentences:

$ cat input_file.txt
Barack  B-PERSON
Hussein I-PERSON
Obama   E-PERSON
is      O
a       O
man     O
.       O

Yuji   B-PERSON
Matsumoto E-PERSON
is     O
a      O
man    O
.      O
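
To illustrate how blank lines delimit sentences in this format, here is a minimal sketch of a reader. It is not deep-crf's `read_conll_file`; the name `read_sentences` is hypothetical, and the default `delimiter=None` (split on any run of whitespace) is an assumption chosen so that column-aligned files like the one above also parse.

```python
def read_sentences(lines, delimiter=None):
    """Group CoNLL-style lines into sentences.

    A blank line closes the current sentence. Each token line becomes a
    tuple of its columns. delimiter=None splits on any run of whitespace.
    """
    sentences, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            if current:  # ignore repeated blank lines
                sentences.append(current)
                current = []
            continue
        current.append(tuple(text.split(delimiter)))
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


sample = """Barack  B-PERSON
Obama   E-PERSON
is      O

Yuji   B-PERSON
Matsumoto E-PERSON
""".splitlines()

sentences = read_sentences(sample)
print(len(sentences))   # 2
print(sentences[0][0])  # ('Barack', 'B-PERSON')
```

Without the blank line, the whole file would be read as a single sentence.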

masakuri · 2017-12-11T07:42:16Z

My input_train_jp.txt file has a blank line ("\n") between sentences (more precisely, between tweets), but I still got the error...

aonotas · 2017-12-11T08:00:24Z

So your input_train_jp.txt now looks like the following?

あああ[tab]O

あ[tab]O
い[tab]O
う[tab]O

お[space]お[tab]O
お[tab]O

masakuri · 2017-12-11T08:03:35Z

> Now your input_train_jp.txt seems following?
>
> あああ[tab]O
>
> あ[tab]O
> い[tab]O
> う[tab]O
>
> お[space]お[tab]O
> お[tab]O

Yes.

aonotas · 2017-12-11T08:11:25Z

OK. Could you send me your input file by e-mail, if that is all right with you?
nanigashi03[at] gmail.com

aonotas · 2017-12-11T08:22:40Z

Or, please try replacing the space inside the word with an underscore, and [tab] with [space]:

お[space]お   =>    お_お

[tab]   => [space]

and please use --delimiter=" ".

Maybe the [tab] character is what causes this error?
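
The replacement described above can be scripted. This is a minimal sketch under the assumption that each token line has exactly one word column and one label column separated by the last tab on the line; the function name `tab_to_space_format` is hypothetical.

```python
def tab_to_space_format(line):
    """Convert 'word<TAB>label' to 'word label', replacing spaces inside
    the word with underscores. Blank lines are returned as empty strings."""
    text = line.rstrip("\r\n")
    if not text.strip():
        return ""  # keep sentence boundaries as blank lines
    if "\t" not in text:
        raise ValueError("no tab found in line: %r" % text)
    word, label = text.rsplit("\t", 1)
    return word.replace(" ", "_") + " " + label


sample = ["お お\tO\n", "お\tO\n", "\n", "あ\tO\n"]
converted = [tab_to_space_format(line) for line in sample]
print(converted)  # ['お_お O', 'お O', '', 'あ O']
```

Writing the converted lines back with `"\n".join(converted)` gives a file that can be used with --delimiter=" ".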

masakuri · 2017-12-11T08:36:03Z

I replaced [tab] with [space]:

お[space]お => お_お

[tab] => [space]
and used --delimiter=" ".

It worked!!!
Thank you very much for your help!!!

aonotas · 2017-12-11T08:41:49Z

OK.
It seems that either our code or the [tab]-delimited input format causes that error.
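
One possible explanation, offered as an assumption and not as a confirmed diagnosis: in bash, --delimiter="\t" passes the two characters backslash and t, not a tab character. Whether deep-crf converts that sequence into a real tab is not stated in this thread. The sketch below only shows how Python's `str.split` behaves with each kind of delimiter.

```python
line = "お お\tO"  # word and label separated by a real tab character

literal = "\\t"   # backslash followed by 't' (2 characters)
real_tab = "\t"   # a single tab character

print(len(literal), len(real_tab))  # 2 1
print(line.split(literal))          # ['お お\tO']
print(line.split(real_tab))         # ['お お', 'O']
```

If this were the cause, passing a real tab from bash with --delimiter=$'\t' would be one way to test it.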

masakuri · 2017-12-11T08:47:00Z

I see. Thank you very much.
I changed the issue title to reflect the content.

masakuri changed the title ~~Does it support Japanese train/dev file (or --char_emb_file option) ?~~ TypeError: object of type 'int' has no len() Dec 11, 2017
