TypeError running cell extraction with downloaded model #15

Open
creisle opened this issue Sep 14, 2021 · 6 comments

Comments

Contributor

creisle commented Sep 14, 2021

I am probably doing something wrong here, but I am not sure what. Everything ran up to and including the combine script, src/baseline/retriever/combine_retrieval.py. I then downloaded the models linked in the README and tried to run the table cell extraction step, which fails with the error below (see trace).

/projects/creisle_prj/creisle_scratch/FEVEROUS/venv/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
[INFO] 2021-09-14 12:58:40,249 - LogHelper - Log Helper set up
[INFO] 2021-09-14 12:58:41,154 - __main__ - Start extracting cells from Tables...

  0%|                                                  | 0/7890 [00:00<?, ?it/s]
100%|███████████████████████████████████| 7890/7890 [00:00<00:00, 259469.96it/s]
Ignored unknown kwargs option trim_offsets
Traceback (most recent call last):
  File "src/baseline/retriever/predict_cells_from_table.py", line 322, in <module>
    main()
  File "src/baseline/retriever/predict_cells_from_table.py", line 317, in main
    extract_cells_from_tables(annotations, args)
  File "src/baseline/retriever/predict_cells_from_table.py", line 260, in extract_cells_from_tables
    predictions =  (model_output.predictions > 0.25).astype(int)
TypeError: '>' not supported between instances of 'NoneType' and 'float'

Have you seen this before? Any idea what I might have done wrong?

System specs
OS: centos07
python: 3.7
conda or pip: pip

Owner

Raldir commented Sep 17, 2021

Sorry for the delay! The processing step looks suspicious to me: it normally takes a while (about 40 seconds for me) for the data to process, but in your log it finishes almost instantly. Could you check the input, e.g. by printing the first entries of text_test at l.250?
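
For anyone following along, a check like the one suggested here could be sketched as follows. `preview_entries` is a hypothetical helper, not part of the FEVEROUS code, and the structure of `text_test` is an assumption: it is only treated as a sequence that supports `len()` and slicing.

```python
import logging

logger = logging.getLogger(__name__)


def preview_entries(entries, n=3):
    """Log how many entries there are and return the first n for inspection.

    `entries` stands in for text_test; its real structure is an assumption.
    """
    logger.info("Number of entries: %d", len(entries))
    if not entries:
        # An empty list here means nothing will be passed to the model.
        logger.warning("No entries to predict on; check the upstream output.")
        return []
    head = list(entries[:n])
    for i, entry in enumerate(head):
        logger.info("Entry %d: %r", i, entry)
    return head
```

Calling `preview_entries(text_test)` right before the prediction call would show whether the list is empty or contains malformed items.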

Contributor Author

creisle commented Sep 20, 2021

I ran it again and printed out the first entry of text_test. Oddly, it looks like it's trying to run a batch of zero examples. Something must be wrong with one of the inputs, but I haven't seen any upstream errors so far.

Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
[INFO] 2021-09-20 13:35:23,636 - LogHelper - Log Helper set up
[INFO] 2021-09-20 13:35:24,767 - __main__ - Start extracting cells from Tables...

  0%|          | 0/7890 [00:00<?, ?it/s]
100%|██████████| 7890/7890 [00:00<00:00, 302936.25it/s]
***** Running Prediction *****
  Num examples = 0
  Batch size = 16
Encoding(num_tokens=2, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])
Traceback (most recent call last):
  File "src/baseline/retriever/predict_cells_from_table.py", line 323, in <module>
    main()
  File "src/baseline/retriever/predict_cells_from_table.py", line 318, in main
    extract_cells_from_tables(annotations, args)
  File "src/baseline/retriever/predict_cells_from_table.py", line 261, in extract_cells_from_tables
    predictions =  (model_output.predictions > 0.25).astype(int)
TypeError: '>' not supported between instances of 'NoneType' and 'float'
[Mon Sep 20 13:36:23 2021]
Error in rule extract_table_cells:
    jobid: 0
    output: data/dev.combined.not_precomputed.p5.s5.t3.cells.jsonl
    shell:
        source /projects/creisle_prj/creisle_scratch/FEVEROUS/venv3.7-unbuntu/bin/activate; python src/baseline/retriever/predict_cells_from_table.py --input_path data/dev.combined.not_precomputed.p5.s5.t3.jsonl --max_sent 5 --wiki_path data/feverous_wikiv1.db --model_path models/feverous_cell_extractor
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
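
The log above reports `Num examples = 0`, and the traceback shows `model_output.predictions` being `None` when it is compared with `0.25`. A minimal sketch of a guard that fails with a clearer message is shown below. `threshold_predictions` is a hypothetical helper, not part of the repository, and it assumes a flat sequence of float scores rather than the array the script actually receives.

```python
def threshold_predictions(predictions, threshold=0.25):
    """Turn scores into 0/1 labels, failing early if there are no scores.

    Assumes `predictions` is a flat sequence of floats, or None when the
    prediction step was given zero examples.
    """
    if predictions is None:
        raise ValueError(
            "No predictions returned: the model was given zero examples. "
            "Inspect the input file and the annotation processing step."
        )
    # Same comparison as in the traceback, applied element by element.
    return [int(score > threshold) for score in predictions]
```

This does not fix the underlying problem (the empty input), but it replaces the `TypeError` with a message that points at the likely cause.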

Owner

Raldir commented Sep 24, 2021

Yes, there is something wrong with the input. In the latest commit, I have added some additional logging that might help to find out what's going on. Could you pull the current version and try again?

Contributor Author

creisle commented Sep 27, 2021

I gave this a try and ran into some namespace/path errors with the latest version (#18). I am trying this again with the path fixes I put in the latest PR.

Contributor Author

creisle commented Oct 5, 2021

Sorry for the delay. This is the error message I get with the changes in main:

[INFO] 2021-10-05 12:25:46,778 - __main__ - Start extracting cells from Tables...

  0%|          | 0/7890 [00:00<?, ?it/s]
100%|██████████| 7890/7890 [00:00<00:00, 289959.33it/s]
Traceback (most recent call last):
  File "src/feverous/baseline/retriever/predict_cells_from_table.py", line 325, in <module>
    main()
  File "src/feverous/baseline/retriever/predict_cells_from_table.py", line 320, in main
    extract_cells_from_tables(annotations, args)
  File "src/feverous/baseline/retriever/predict_cells_from_table.py", line 245, in extract_cells_from_tables
    logger.info('Sample entry: {}'.format(all_input[0]))

Should I try from the start? Does the new error help?

Owner

Raldir commented Oct 6, 2021

Is that already the end of the error message? It does not tell us much beyond the fact that the entries are not processed correctly. I would recommend going through l.237 to l.242 line by line and checking whether anno contains information. Maybe call anno.get_claim() and anno.get_evidence() to sanity-check the annotation processor.
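
A rough sketch of such a sanity check is given below. It relies only on the two methods named above, `get_claim()` and `get_evidence()`, and assumes both return something falsy (empty string, empty list, or `None`) when the annotation carries no data. `find_empty_annotations` and `FakeAnnotation` are hypothetical names used for illustration only.

```python
def find_empty_annotations(annotations):
    """Return the indices of annotations that lack a claim or evidence.

    Assumes get_claim() and get_evidence() return a falsy value when empty.
    """
    empty = []
    for i, anno in enumerate(annotations):
        if not anno.get_claim() or not anno.get_evidence():
            empty.append(i)
    return empty


class FakeAnnotation:
    """Stand-in for the project's annotation objects, for demonstration only."""

    def __init__(self, claim, evidence):
        self._claim = claim
        self._evidence = evidence

    def get_claim(self):
        return self._claim

    def get_evidence(self):
        return self._evidence
```

If every index comes back as empty for the real annotations, the problem is likely in how the input file is read rather than in the model.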
