
Questions #2

Open
evangeliazve opened this issue Nov 12, 2022 · 7 comments

@evangeliazve

evangeliazve commented Nov 12, 2022

Hi again,

Thank you again for this solution. I would be happy to share your work in any way you wish. Please let me know.

I have the following questions:

  • Do you have the code that allows use of the RE model's predictions?
  • How can I take the output from the NER model inference and feed it to the RE model for training?

Best Regards,
Evangelia Zve

@sujitpal (Owner)

Answers to your questions:

  • Not directly, but you can adapt the evaluation code (for example, cell 29 in 05a-nyt-re-bert.ipynb) and the preprocessing code to take a single sentence with entities, embed the PHRASE spans, and encode them using the tokenizer into a batch size of 1.
  • You can do batch inference on the NER model by passing in sentences (again, you would need to adapt the code; I didn't do it here). The output would be 2 or more entities detected in each sentence. You then generate multiple instances of the training sentence with exactly 2 entities in each. For example, if your NER model predicted entities (A, B, C) in the sentence, your training set for RE would be the sentence with entity pairs (A, B), (B, C), (A, C); a sketch of this pairing step follows below.
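
Not code from the repo, just a minimal sketch of that pairing step: the make_re_instances helper and the (label, start, end) span format are made up for illustration, spans are assumed end-exclusive and non-overlapping, and the <S:…>/<O:…> marker tokens are assumed to already be registered with the tokenizer as in the notebook.

from itertools import combinations

def make_re_instances(tokens, entities):
    # entities: NER-predicted (label, start, end) spans over `tokens`,
    # e.g. [("PER", 14, 16), ("LOC", 37, 38)] -- format is illustrative
    instances = []
    for subj, obj in combinations(entities, 2):
        marked = list(tokens)
        # insert the rightmost pair of markers first so the offsets
        # of the span to its left stay valid
        for (label, start, end), role in sorted(
                [(subj, "S"), (obj, "O")], key=lambda p: -p[0][1]):
            marked.insert(end, "</{}:{}>".format(role, label))
            marked.insert(start, "<{}:{}>".format(role, label))
        instances.append({"tokens": marked})
    return instances

# NER entities (A, B, C) yield RE instances for pairs (A, B), (A, C), (B, C)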

@evangeliazve (Author)

evangeliazve commented Nov 12, 2022

Hello,

Thanks for your quick reply; that's clear.
I have one more question: do you think it is possible to publish the Relation Extraction model you proposed on Hugging Face?

Best,
Evangelia Zve

@sujitpal (Owner)

I haven't tried it myself, but it should be possible with push_to_hub, as detailed on this page -- https://huggingface.co/docs/transformers/v4.15.0/model_sharing
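
For reference, a minimal sketch of what that might look like; the repo name "my-re-model" is a placeholder, and this assumes a prior huggingface-cli login and that BertForRelationExtraction subclasses PreTrainedModel, so it inherits push_to_hub():

# sketch only -- "my-re-model" is a placeholder repo name
model.push_to_hub("my-re-model")
# push the tokenizer as well, since it carries the added entity marker tokens
tokenizer.push_to_hub("my-re-model")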

@evangeliazve (Author)

Thank you very much for your help

@evangeliazve (Author)

Hello,

"Not directly, but you can adapt the Evaluation code (for example cell 29 in 05a-nyt-re-bert.ipynb) and the Preprocessing code to take a single sentence with entities, embed the PHRASE spans and encode them using the tokenizer into a batch size of 1."

Regarding this, I cannot handle the preprocessing part as I need to define relationship to create the span_idxs
Should I create span ids for every possible relationship and then predict if it is actually a relationship or not ?

Thanks again

@darebfh

darebfh commented Jul 3, 2023

Hello! I am also very interested in performing NER based on 🤗 Transformers. I managed to adapt the code of 05-nyt-re-bert for inference (see below). You can then retrieve the name of the relation class using id2label() with the predicted output.
@evangeliazve the span_idxs array does not contain relationships but simply the positions of the two spans containing the named entities.

import os

import numpy as np
import torch

# BertForRelationExtraction, tokenizer, MODEL_DIR, epoch and valid_relations
# are defined earlier in the notebook
model = BertForRelationExtraction.from_pretrained(
    os.path.join(MODEL_DIR, "ckpt-{:d}".format(epoch)), len(valid_relations))

input_object = {
    "tokens": ["But", "that", "spasm", "of", "irritation", "by", "a", "master", "intimidator", "was", "minor", "compared", "with", "what", "<S:PER>", "Bobby", "Fischer", "</S:PER>", ",", "the", "erratic", "former", "world", "chess", "champion", ",", "dished", "out", "in", "March", "at", "a", "news", "conference", "in", "Reykjavik", ",", "<O:LOC>", "Iceland", "</O:LOC>", "."]
}

def encode_data_inference(examples):
    # return_tensors="pt" is needed because for training, conversion to
    # tensors is performed by the DataLoader
    tokenized_inputs = tokenizer(examples["tokens"],
                                 is_split_into_words=True,
                                 truncation=True,
                                 return_tensors="pt")
    span_idxs = []
    for input_id in tokenized_inputs.input_ids:
        tokens = tokenizer.convert_ids_to_tokens(input_id)
        print(tokens)
        # record the positions of the subject and object entity marker tokens
        span_idxs.append([
            [idx for idx, token in enumerate(tokens) if token.startswith("<S:")][0],
            [idx for idx, token in enumerate(tokens) if token.startswith("</S:")][0],
            [idx for idx, token in enumerate(tokens) if token.startswith("<O:")][0],
            [idx for idx, token in enumerate(tokens) if token.startswith("</O:")][0]
        ])
    # manually create a tensor containing the span positions
    tokenized_inputs["span_idxs"] = torch.from_numpy(np.array(span_idxs))
    return tokenized_inputs

inputs = encode_data_inference(input_object)

with torch.no_grad():
    logits = model(**inputs).logits
    print(logits)
    predictions = torch.argmax(logits, dim=-1).cpu().numpy()
print(predictions)

Output:
['[CLS]', 'But', 'that', 'spa', '##sm', 'of', 'irritation', 'by', 'a', 'master', 'in', '##ti', '##mi', '##da', '##tor', 'was', 'minor', 'compared', 'with', 'what', '<S:PER>', 'Bobby', 'Fischer', '</S:PER>', ',', 'the', 'erratic', 'former', 'world', 'chess', 'champion', ',', 'dish', '##ed', 'out', 'in', 'March', 'at', 'a', 'news', 'conference', 'in', 'Rey', '##k', '##ja', '##vik', ',', '<O:LOC>', 'Iceland', '</O:LOC>', '.', '[SEP]']
tensor([[-4.2859, -0.3964, -0.4866, -1.7542, 6.7569, -5.2384, 0.4867, 2.8524, 2.5765]])
[6]

@sujitpal (Owner)

sujitpal commented Jul 5, 2023

Hello,

"Not directly, but you can adapt the Evaluation code (for example cell 29 in 05a-nyt-re-bert.ipynb) and the Preprocessing code to take a single sentence with entities, embed the PHRASE spans and encode them using the tokenizer into a batch size of 1."

Regarding this, I cannot handle the preprocessing part as I need to define relationship to create the span_idxs Should I create span ids for every possible relationship and then predict if it is actually a relationship or not ?

Thanks again

Sorry for the delay in responding; it looks like I missed this comment. And thanks for the nice example @darebfh! It looks like it predicted an incorrect relationship id, 6, which is location/neighborhood/neighborhood_of, but given that there does not seem to be anything specifically defined for (person, ?, location), maybe this is the best it could do.
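
For anyone following along, a small sketch of the id-to-name lookup @darebfh mentioned; this assumes relation ids were assigned by enumerating valid_relations, as in the notebook, so adapt it if your model config stores id2label differently:

# illustrative only: map the predicted class id back to its relation name,
# assuming ids were assigned in enumeration order over valid_relations
id2label = {i: rel for i, rel in enumerate(valid_relations)}
print(id2label[int(predictions[0])])  # 6 -> "location/neighborhood/neighborhood_of"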
